1
Han J, Zhang H, Ning K. Techniques for learning and transferring knowledge for microbiome-based classification and prediction: review and assessment. Brief Bioinform 2024; 26:bbaf015. [PMID: 39820436 PMCID: PMC11737891 DOI: 10.1093/bib/bbaf015] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/08/2024] [Revised: 12/10/2024] [Accepted: 01/06/2025] [Indexed: 01/19/2025] Open
Abstract
The volume of microbiome data is growing at an exponential rate, and the current methodologies for big data mining are encountering substantial obstacles. Effectively managing and extracting valuable insights from these vast microbiome datasets has emerged as a significant challenge in the field of contemporary microbiome research. This comprehensive review delves into the utilization of foundation models and transfer learning techniques within the context of microbiome-based classification and prediction tasks, advocating for a transition away from traditional task-specific or scenario-specific models towards more adaptable, continuous learning models. The article underscores the practicality and benefits of initially constructing a robust foundation model, which can then be fine-tuned using transfer learning to tackle specific context tasks. In real-world scenarios, the application of transfer learning empowers models to leverage disease-related data from one geographical area and enhance diagnostic precision in different regions. This transition from relying on "good models" to embracing "adaptive models" resonates with the philosophy of "teaching a man to fish," thereby paving the way for advancements in personalized medicine and accurate diagnosis. Empirical research suggests that the integration of foundation models with transfer learning methodologies substantially boosts the performance of models when dealing with large-scale and diverse microbiome datasets, effectively mitigating the challenges posed by data heterogeneity.
Affiliation(s)
- Jin Han
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center of AI Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Luoyu Road 1037, Wuhan 430074, Hubei, China
- Haohong Zhang
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center of AI Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Luoyu Road 1037, Wuhan 430074, Hubei, China
- Kang Ning
- Key Laboratory of Molecular Biophysics of the Ministry of Education, Hubei Key Laboratory of Bioinformatics and Molecular-imaging, Center of AI Biology, Department of Bioinformatics and Systems Biology, College of Life Science and Technology, Huazhong University of Science and Technology, Luoyu Road 1037, Wuhan 430074, Hubei, China
2
Matchado MS, Rühlemann M, Reitmeier S, Kacprowski T, Frost F, Haller D, Baumbach J, List M. On the limits of 16S rRNA gene-based metagenome prediction and functional profiling. Microb Genom 2024; 10:001203. [PMID: 38421266 PMCID: PMC10926695 DOI: 10.1099/mgen.0.001203] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Accepted: 02/05/2024] [Indexed: 03/02/2024] Open
Abstract
Molecular profiling techniques such as metagenomics, metatranscriptomics or metabolomics offer important insights into the functional diversity of the microbiome. In contrast, 16S rRNA gene sequencing, a widespread and cost-effective technique to measure microbial diversity, only allows for indirect estimation of microbial function. To mitigate this, tools such as PICRUSt2, Tax4Fun2, PanFP and MetGEM infer functional profiles from 16S rRNA gene sequencing data using different algorithms. Prior studies have cast doubts on the quality of these predictions, motivating us to systematically evaluate these tools using matched 16S rRNA gene sequencing, metagenomic datasets, and simulated data. Our contribution is threefold: (i) using simulated data, we investigate if technical biases could explain the discordance between inferred and expected results; (ii) considering human cohorts for type two diabetes, colorectal cancer and obesity, we test if health-related differential abundance measures of functional categories are concordant between 16S rRNA gene-inferred and metagenome-derived profiles; and (iii) since 16S rRNA gene copy number is an important confounder in functional profile inference, we investigate if a customised copy number normalisation with the rrnDB database could improve the results. Our results show that 16S rRNA gene-based functional inference tools generally do not have the necessary sensitivity to delineate health-related functional changes in the microbiome and should thus be used with care. Furthermore, we outline important differences in the individual tools tested and offer recommendations for tool selection.
Affiliation(s)
- Monica Steffi Matchado
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Malte Rühlemann
- Institute of Clinical Molecular Biology, Kiel University, Kiel, Germany
- Sandra Reitmeier
- ZIEL - Institute for Food & Health, Core Facility Microbiome, Technical University of Munich, Freising, Germany
- Tim Kacprowski
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics of Technische Universität Braunschweig and Hannover Medical School, Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), TU Braunschweig, Braunschweig, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Fabian Frost
- Department of Medicine A, University Medicine Greifswald, Greifswald, Germany
- Dirk Haller
- ZIEL - Institute for Food & Health, Core Facility Microbiome, Technical University of Munich, Freising, Germany
- Chair of Nutrition and Immunology, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
- Jan Baumbach
- Institute for Computational Systems Biology, University of Hamburg, Hamburg, Germany
- Institute of Mathematics and Computer Science, University of Southern Denmark, Odense, Denmark
- Markus List
- Data Science in Systems Biology, TUM School of Life Sciences, Technical University of Munich, Freising, Germany
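The copy number normalisation mentioned in the Matchado et al. abstract above can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes the per-taxon 16S rRNA gene copy numbers have already been looked up (for example from rrnDB) and are supplied as a plain dictionary. The function name, the taxon labels and the numeric values are all hypothetical.

```python
def normalise_by_copy_number(read_counts, copy_numbers, default_copies=1.0):
    """Divide each taxon's 16S read count by its gene copy number,
    then rescale the corrected values so they sum to 1."""
    corrected = {
        taxon: count / copy_numbers.get(taxon, default_copies)
        for taxon, count in read_counts.items()
    }
    total = sum(corrected.values())
    if total == 0:
        # No reads at all: return an all-zero profile instead of dividing by zero.
        return {taxon: 0.0 for taxon in corrected}
    return {taxon: value / total for taxon, value in corrected.items()}


# Hypothetical example values, not taken from rrnDB.
reads = {"taxon_a": 700, "taxon_b": 300}
copies = {"taxon_a": 7, "taxon_b": 3}
profile = normalise_by_copy_number(reads, copies)
```

In this toy input, taxon_a contributes 70% of the raw reads, but after dividing by copy number both taxa have the same corrected abundance. This is the kind of shift a copy number correction is meant to capture.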
3
Meng D, Ai S, Spanos M, Shi X, Li G, Cretoiu D, Zhou Q, Xiao J. Exercise and microbiome: From big data to therapy. Comput Struct Biotechnol J 2023; 21:5434-5445. [PMID: 38022690 PMCID: PMC10665598 DOI: 10.1016/j.csbj.2023.10.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/13/2023] [Revised: 10/16/2023] [Accepted: 10/17/2023] [Indexed: 12/01/2023] Open
Abstract
Exercise is a vital component in maintaining optimal health and serves as a prospective therapeutic intervention for various diseases. The human microbiome, comprised of trillions of microorganisms, plays a crucial role in overall health. Given the advancements in microbiome research, substantial databases have been created to decipher the functionality and mechanisms of the microbiome in health and disease contexts. This review presents an initial overview of microbiomics development and related databases, followed by an in-depth description of the multi-omics technologies for microbiome. It subsequently synthesizes the research pertaining to exercise-induced modifications of the microbiome and diseases that impact the microbiome. Finally, it highlights the potential therapeutic implications of an exercise-modulated microbiome in intestinal disease, obesity and diabetes, cardiovascular disease, and immune/inflammation-related diseases.
Affiliation(s)
- Danni Meng
- Institute of Geriatrics (Shanghai University), Affiliated Nantong Hospital of Shanghai University (The Sixth People’s Hospital of Nantong), School of Medicine, Shanghai University, Nantong 226011, China
- Cardiac Regeneration and Ageing Lab, Institute of Cardiovascular Sciences, Shanghai Engineering Research Center of Organ Repair, School of Life Science, Shanghai University, Shanghai 200444, China
- Songwei Ai
- Institute of Geriatrics (Shanghai University), Affiliated Nantong Hospital of Shanghai University (The Sixth People’s Hospital of Nantong), School of Medicine, Shanghai University, Nantong 226011, China
- Cardiac Regeneration and Ageing Lab, Institute of Cardiovascular Sciences, Shanghai Engineering Research Center of Organ Repair, School of Life Science, Shanghai University, Shanghai 200444, China
- Michail Spanos
- Cardiovascular Division of the Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Xiaohui Shi
- Institute of Geriatrics (Shanghai University), Affiliated Nantong Hospital of Shanghai University (The Sixth People’s Hospital of Nantong), School of Medicine, Shanghai University, Nantong 226011, China
- Cardiac Regeneration and Ageing Lab, Institute of Cardiovascular Sciences, Shanghai Engineering Research Center of Organ Repair, School of Life Science, Shanghai University, Shanghai 200444, China
- Guoping Li
- Cardiovascular Division of the Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Dragos Cretoiu
- Department of Medical Genetics, Carol Davila University of Medicine and Pharmacy, Bucharest 020031, Romania
- Materno-Fetal Assistance Excellence Unit, Alessandrescu-Rusescu National Institute for Mother and Child Health, Bucharest 011062, Romania
- Qiulian Zhou
- Institute of Geriatrics (Shanghai University), Affiliated Nantong Hospital of Shanghai University (The Sixth People’s Hospital of Nantong), School of Medicine, Shanghai University, Nantong 226011, China
- Cardiac Regeneration and Ageing Lab, Institute of Cardiovascular Sciences, Shanghai Engineering Research Center of Organ Repair, School of Life Science, Shanghai University, Shanghai 200444, China
- Junjie Xiao
- Institute of Geriatrics (Shanghai University), Affiliated Nantong Hospital of Shanghai University (The Sixth People’s Hospital of Nantong), School of Medicine, Shanghai University, Nantong 226011, China
- Cardiac Regeneration and Ageing Lab, Institute of Cardiovascular Sciences, Shanghai Engineering Research Center of Organ Repair, School of Life Science, Shanghai University, Shanghai 200444, China
4
Dundore-Arias JP, Michalska-Smith M, Millican M, Kinkel LL. More Than the Sum of Its Parts: Unlocking the Power of Network Structure for Understanding Organization and Function in Microbiomes. ANNUAL REVIEW OF PHYTOPATHOLOGY 2023; 61:403-423. [PMID: 37217203 DOI: 10.1146/annurev-phyto-021021-041457] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 05/24/2023]
Abstract
Plant and soil microbiomes are integral to the health and productivity of plants and ecosystems, yet researchers struggle to identify microbiome characteristics important for providing beneficial outcomes. Network analysis offers a shift in analytical framework beyond "who is present" to the organization or patterns of coexistence between microbes within the microbiome. Because microbial phenotypes are often significantly impacted by coexisting populations, patterns of coexistence within microbiomes are likely to be especially important in predicting functional outcomes. Here, we provide an overview of the how and why of network analysis in microbiome research, highlighting the ways in which network analyses have provided novel insights into microbiome organization and functional capacities, the diverse network roles of different microbial populations, and the eco-evolutionary dynamics of plant and soil microbiomes.
Affiliation(s)
- J P Dundore-Arias
- Department of Biology and Chemistry, California State University, Monterey Bay, Seaside, California, USA
- M Michalska-Smith
- Department of Plant Pathology, University of Minnesota, St. Paul, Minnesota, USA
- Department of Ecology, Evolution, and Behavior, University of Minnesota, St. Paul, Minnesota, USA
- L L Kinkel
- Department of Plant Pathology, University of Minnesota, St. Paul, Minnesota, USA
5
Yu Y, Zhang Y, Liu Y, Lv M, Wang Z, Wen LL, Li A. In situ reductive dehalogenation of groundwater driven by innovative organic carbon source materials: Insights into the organohalide-respiratory electron transport chain. JOURNAL OF HAZARDOUS MATERIALS 2023; 452:131243. [PMID: 36989787 DOI: 10.1016/j.jhazmat.2023.131243] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 10/02/2022] [Revised: 02/24/2023] [Accepted: 03/17/2023] [Indexed: 05/03/2023]
Abstract
In situ bioremediation using organohalide-respiring bacteria (OHRB) is a prospective method for the removal of persistent halogenated organic pollutants from groundwater, as OHRB can utilize H2 or organic compounds produced by carbon source materials as electron donors for cell growth through organohalide respiration. However, few previous studies have determined the suitability of different carbon source materials to the metabolic mechanism of reductive dehalogenation from the perspective of electron transfer. The focus of this critical review was to reveal the interactions and relationships between carbon source materials and functional microbes, in terms of the electron transfer mechanism. Furthermore, this review illustrates some innovative strategies that have used the physiological characteristics of OHRB to guide the optimization of carbon source materials, improving the abundance of indigenous dehalogenated bacteria and enhancing electron transfer efficiency. Finally, it is proposed that future research should combine multi-omics analysis with machine learning (ML) to guide the design of effective carbon source materials and optimize current dehalogenation bioremediation strategies to reduce the cost and footprint of practical groundwater bioremediation applications.
Affiliation(s)
- Yang Yu
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
- Yueyan Zhang
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
- Yuqing Liu
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
- Mengran Lv
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
- Zeyi Wang
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
- Li-Lian Wen
- College of Resource and Environmental Science, Hubei University, Wuhan 430062, China
- Ang Li
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
6
Zhang M, Zhang W, Chen Y, Zhao J, Wu S, Su X. Flex Meta-Storms elucidates the microbiome local beta-diversity under specific phenotypes. Bioinformatics 2023; 39:btad148. [PMID: 36946295 PMCID: PMC10082668 DOI: 10.1093/bioinformatics/btad148] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/10/2022] [Revised: 03/13/2023] [Accepted: 03/19/2023] [Indexed: 03/23/2023] Open
Abstract
MOTIVATION: Beta-diversity quantitatively measures the difference among microbial communities, thus illuminating the association between microbiome composition and environment properties or host phenotypes. Beta-diversity analysis mainly relies on distances among microbiomes that are calculated from all microbial features. However, in some cases, only a small fraction of the members of a community play crucial roles. Such a tiny proportion is insufficient to alter the overall distance and is therefore always missed by end-to-end comparison. On the other hand, the beta-diversity pattern can also be distorted by data sparsity when focusing only on nonabundant microbes. RESULTS: Here, we develop the Flex Meta-Storms (FMS) distance algorithm, which implements the "local alignment" of microbiomes for the first time. Using a flexible extraction that considers the weighted phylogenetic and functional relations of microbes, FMS produces a normalized phylogenetic distance among members of interest for microbiome pairs. We demonstrated the advantage of FMS in detecting the subtle variations of microbiomes among different states using artificial and real datasets, which were neglected by regular distance metrics. Therefore, FMS effectively discriminates microbiomes with higher sensitivity and flexibility, thus contributing to in-depth comprehension of microbe-host interactions, as well as promoting the utilization of microbiome data such as disease screening and prediction. AVAILABILITY AND IMPLEMENTATION: FMS is implemented in C++, and the source code is released at https://github.com/qdu-bioinfo/flex-meta-storms.
Affiliation(s)
- Mingqian Zhang
- College of Computer Science and Technology, Qingdao University, Qingdao, China
- Wenke Zhang
- College of Computer Science and Technology, Qingdao University, Qingdao, China
- Yuzhu Chen
- College of Computer Science and Technology, Qingdao University, Qingdao, China
- Jin Zhao
- College of Computer Science and Technology, Qingdao University, Qingdao, China
- Shunyao Wu
- College of Computer Science and Technology, Qingdao University, Qingdao, China
- Xiaoquan Su
- College of Computer Science and Technology, Qingdao University, Qingdao, China
7
Noman SM, Zeeshan M, Arshad J, Deressa Amentie M, Shafiq M, Yuan Y, Zeng M, Li X, Xie Q, Jiao X. Machine Learning Techniques for Antimicrobial Resistance Prediction of Pseudomonas Aeruginosa from Whole Genome Sequence Data. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2023; 2023:5236168. [PMID: 36909968 PMCID: PMC9995192 DOI: 10.1155/2023/5236168] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2022] [Revised: 10/21/2022] [Accepted: 02/02/2023] [Indexed: 03/05/2023]
Abstract
AIM: Due to the growing availability of genomic datasets, machine learning models have shown impressive diagnostic potential in identifying emerging and reemerging pathogens. This study aims to use machine learning techniques to develop and compare models for predicting bacterial resistance to a panel of 12 classes of antibiotics using whole genome sequence (WGS) data of Pseudomonas aeruginosa. METHOD: Random Forest (RF) and BioWeka were used for classification accuracy assessment, and logistic regression (LR) for statistical analysis. RESULTS: Our results show that 44.66% of isolates were resistant to twelve antimicrobial agents and 55.33% were sensitive. The mean classification accuracy was ≥98% for BioWeka and ≥96% for RF on these families of antimicrobials. By drug, the two reported accuracies were 99.31% and 94.00% for ampicillin, 99.02% and 95.21% for amoxicillin, 98.27% and 96.63% for meropenem, 99.73% and 98.34% for cefepime, 96.44% and 99.23% for fosfomycin, 98.63% and 94.31% for ceftazidime, 98.71% and 96.00% for chloramphenicol, 95.76% and 97.63% for erythromycin, 99.27% and 98.25% for tetracycline, 98.00% and 97.30% for gentamycin, 99.57% and 98.03% for butirosin, and 96.17% and 98.97% for ciprofloxacin, with 10-fold cross-validation. In addition, for eight of the twelve drugs, no false-positive or false-negative bacterial strains were found. CONCLUSION: The ability to accurately detect antibiotic resistance could help clinicians make educated decisions about empiric therapy based on the local antibiotic resistance pattern. Moreover, if such prescribing practices become widespread, infection prevention may have major consequences for human health.
Affiliation(s)
- Sohail M. Noman
- Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, Guangdong 515041, China
- Muhammad Zeeshan
- Department of Medicine and Surgery, Al-Nafees Medical College and Hospital, Isra University, Islamabad 44000, Pakistan
- Jehangir Arshad
- Department of Electrical and Computer Engineering, Comsats University Islamabad, Lahore Campus 44000, Lahore, Pakistan
- Muhammad Shafiq
- Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, Guangdong 515041, China
- Yumeng Yuan
- Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, Guangdong 515041, China
- Mi Zeng
- Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, Guangdong 515041, China
- Xin Li
- Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, Guangdong 515041, China
- Qingdong Xie
- Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, Guangdong 515041, China
- Xiaoyang Jiao
- Department of Cell Biology and Genetics, Shantou University Medical College, Shantou, Guangdong 515041, China
8
Loganathan T, Priya Doss C G. The influence of machine learning technologies in gut microbiome research and cancer studies - A review. Life Sci 2022; 311:121118. [DOI: 10.1016/j.lfs.2022.121118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 10/19/2022] [Accepted: 10/19/2022] [Indexed: 11/18/2022]
9
Karasikov M, Mustafa H, Rätsch G, Kahles A. Lossless indexing with counting de Bruijn graphs. Genome Res 2022; 32:1754-1764. [PMID: 35609994 PMCID: PMC9528980 DOI: 10.1101/gr.276607.122] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/17/2022] [Accepted: 05/05/2022] [Indexed: 11/25/2022]
Abstract
Sequencing data are rapidly accumulating in public repositories. Making this resource accessible for interactive analysis at scale requires efficient approaches for its storage and indexing. There have recently been remarkable advances in building compressed representations of annotated (or colored) de Bruijn graphs for efficiently indexing k-mer sets. However, approaches for representing quantitative attributes such as gene expression or genome positions in a general manner have remained underexplored. In this work, we propose counting de Bruijn graphs, a notion generalizing annotated de Bruijn graphs by supplementing each node-label relation with one or many attributes (e.g., a k-mer count or its positions). Counting de Bruijn graphs index k-mer abundances from 2652 human RNA-seq samples in over eightfold smaller representations compared with state-of-the-art bioinformatics tools and are faster to construct and query. Furthermore, counting de Bruijn graphs with positional annotations losslessly represent entire reads in indexes on average 27% smaller than the input compressed with gzip for human Illumina RNA-seq and 57% smaller for Pacific Biosciences (PacBio) HiFi sequencing of viral samples. A complete searchable index of all viral PacBio SMRT reads from NCBI's Sequence Read Archive (SRA) (152,884 samples, 875 Gbp) comprises only 178 GB. Finally, on the full RefSeq collection, we generate a lossless and fully queryable index that is 4.6-fold smaller than the MegaBLAST index. The techniques proposed in this work naturally complement existing methods and tools using de Bruijn graphs, and significantly broaden their applicability: from indexing k-mer counts and genome positions to implementing novel sequence alignment algorithms on top of highly compressed graph-based sequence indexes.
Affiliation(s)
- Mikhail Karasikov
- Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Biomedical Informatics Research, University Hospital Zurich, 8091 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Harun Mustafa
- Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Biomedical Informatics Research, University Hospital Zurich, 8091 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Gunnar Rätsch
- Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Biomedical Informatics Research, University Hospital Zurich, 8091 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Department of Biology at ETH Zurich, 8093 Zurich, Switzerland
- ETH AI Center, ETH Zurich, 8092 Zurich, Switzerland
- André Kahles
- Department of Computer Science, ETH Zurich, 8092 Zurich, Switzerland
- Biomedical Informatics Research, University Hospital Zurich, 8091 Zurich, Switzerland
- Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
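To make the idea in the Karasikov et al. abstract above concrete, the following toy sketch attaches a count to every k-mer node and derives de Bruijn edges from (k-1)-character overlaps. It uses an uncompressed in-memory `Counter`, so it illustrates only the data model (node plus count attribute), not the succinct representations the paper describes. The function names and the example reads are hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Return a Counter mapping each k-mer (graph node) to its number of occurrences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def successors(kmer, counts):
    """List indexed k-mers that overlap `kmer` by k-1 characters,
    i.e. the outgoing de Bruijn edges of this node."""
    suffix = kmer[1:]
    return [suffix + base for base in "ACGT" if suffix + base in counts]


# Hypothetical toy reads for illustration only.
reads = ["ACGTAC", "CGTACG"]
index = count_kmers(reads, k=3)
```

Querying `index["ACG"]` gives the abundance of that k-mer across the reads, and `successors("ACG", index)` walks one edge of the implicit graph. A k-mer absent from the reads has a count of zero.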
10
Explainable Machine Learning for Longitudinal Multi-Omic Microbiome. MATHEMATICS 2022. [DOI: 10.3390/math10121994] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2023]
Abstract
Over the years, research studies have shown there is a key connection between the microbial community in the gut, genes, and immune system. Understanding this association may help discover the cause of complex chronic idiopathic disorders such as inflammatory bowel disease. Even though important efforts have been put into the field, the functions, dynamics, and causation of the dysbiosis state performed by the microbial community remain unclear. Machine learning models can help elucidate important connections and relationships between microbes in the human host. Our study aims to extend the current knowledge of associations between the human microbiome and health and disease through the application of dynamic Bayesian networks to describe the temporal variation of the gut microbiota and dynamic relationships between taxonomic entities and clinical variables. We develop a set of preprocessing steps to clean, filter, select, integrate, and model informative metagenomics, metatranscriptomics, and metabolomics longitudinal data from the Human Microbiome Project. This study produces novel network models with satisfactory predictive performance (accuracy = 0.648) for each inflammatory bowel disease state, validating Bayesian networks as a framework for developing interpretable models to help understand the basic ways the different biological entities (taxa, genes, metabolites) interact with each other in a given environment (human gut) over time. These findings can serve as a starting point to advance the discovery of novel therapeutic approaches and new biomarkers for precision medicine.
11
Agostinetto G, Bozzi D, Porro D, Casiraghi M, Labra M, Bruno A. SKIOME Project: a curated collection of skin microbiome datasets enriched with study-related metadata. Database (Oxford) 2022; 2022:6586378. [PMID: 35576001 PMCID: PMC9216470 DOI: 10.1093/database/baac033] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 09/08/2021] [Revised: 02/25/2022] [Accepted: 05/09/2022] [Indexed: 04/07/2023]
Abstract
Large amounts of data from microbiome-related studies have been (and are currently being) deposited on international public databases. These datasets represent a valuable resource for the microbiome research community and could serve future researchers interested in integrating multiple datasets into powerful meta-analyses. However, this huge amount of data lacks harmonization and is far from being exploited to its full potential to build a foundation that places microbiome research at the nexus of many subdisciplines within and beyond biology. Thus, there is an urgent need for data accessibility and reusability, according to findable, accessible, interoperable and reusable (FAIR) principles, as supported by the National Microbiome Data Collaborative and FAIR Microbiome. To tackle the challenge of accelerating discovery and advances in skin microbiome research, we collected, integrated and organized existing microbiome data resources from human skin 16S rRNA amplicon-sequencing experiments. We generated a comprehensive collection of datasets, enriched in metadata, and organized this information into data frames ready to be integrated into microbiome research projects and advanced post-processing analyses, such as data science applications (e.g. machine learning). Furthermore, we have created a data retrieval and curation framework built on three different stages to maximize the retrieval of datasets and metadata associated with them. Lastly, we highlighted some caveats regarding metadata retrieval and suggested ways to improve future metadata submissions. Overall, our work resulted in a curated collection of skin microbiome datasets accompanied by a state-of-the-art analysis of the last 10 years of the skin microbiome field. Database URL: https://github.com/giuliaago/SKIOMEMetadataRetrieval.
Affiliation(s)
- Giulia Agostinetto
- *Corresponding author: Giulia Agostinetto. E-mail: and Antonia Bruno. Tel: +0039 0264483413; E-mail:
- Danilo Porro
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Piazza della Scienza, 2, Milan 20126, Italy
- Institute of Molecular Bioimaging and Physiology (IBFM), National Research Council (CNR), via Fratelli Cervi, 93, Segrate (MI) 20054, Italy
- Maurizio Casiraghi
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Piazza della Scienza, 2, Milan 20126, Italy
- Massimo Labra
- Department of Biotechnology and Biosciences, University of Milano-Bicocca, Piazza della Scienza, 2, Milan 20126, Italy
- Antonia Bruno
- *Corresponding author: Giulia Agostinetto. E-mail: and Antonia Bruno. Tel: +0039 0264483413; E-mail:
12
Giulia A, Anna S, Antonia B, Dario P, Maurizio C. Extending Association Rule Mining to Microbiome Pattern Analysis: Tools and Guidelines to Support Real Applications. FRONTIERS IN BIOINFORMATICS 2022; 1:794547. [PMID: 36303759 PMCID: PMC9580939 DOI: 10.3389/fbinf.2021.794547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2021] [Accepted: 12/07/2021] [Indexed: 11/24/2022] Open
Abstract
Boosted by the exponential growth of microbiome-based studies, analyzing microbiome patterns is now a hot topic with many fields of application. In particular, the use of machine learning techniques is increasing in microbiome studies, providing deep insights into microbial community composition. In this context, in order to investigate microbial patterns from 16S rRNA metabarcoding data, we explored the effectiveness of the Association Rule Mining (ARM) technique, a supervised-machine learning procedure, to extract patterns (in this work, intended as groups of species or taxa) from microbiome data. ARM can generate huge amounts of data, making the removal of spurious information and the visualization of results challenging. Our work sheds light on the strengths and weaknesses of the pattern mining strategy in the study of microbial patterns, in particular from 16S rRNA microbiome datasets, applying ARM to real case studies and providing guidelines for future usage. Our results highlighted issues related to the type of input and the use of metadata in microbial pattern extraction, identifying the key steps that must be considered to apply ARM consciously to 16S rRNA microbiome data. To promote the use of ARM and the visualization of microbiome patterns, we developed microFIM (microbial Frequent Itemset Mining), a versatile Python tool that facilitates the use of ARM by integrating common microbiome outputs, such as taxa tables. microFIM implements interest measures to remove spurious information and merges the results of ARM analysis with the common microbiome outputs, providing similar microbiome strategies that help scientists to integrate ARM in microbiome applications. With this work, we aimed at creating a bridge between microbial ecology researchers and the ARM technique, making researchers aware of the strengths and weaknesses of the association rule mining approach.
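To make the ARM idea in this abstract concrete, the following is a minimal Python sketch of frequent itemset mining and rule extraction on presence/absence data. It is not microFIM's code or API: the sample data, the function names, and the brute-force enumeration are all assumptions chosen for illustration, and the interest measures shown (support, confidence, lift) are the standard textbook ones.

```python
from itertools import combinations

# Hypothetical presence/absence data: each sample is the set of taxa detected in it.
samples = [
    {"Bacteroides", "Prevotella", "Faecalibacterium"},
    {"Bacteroides", "Faecalibacterium"},
    {"Bacteroides", "Prevotella"},
    {"Prevotella", "Faecalibacterium"},
    {"Bacteroides", "Prevotella", "Faecalibacterium"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force enumeration of itemsets; adequate only for toy inputs."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result


def association_rules(itemsets, transactions, min_confidence):
    """Derive rules antecedent -> consequent with support, confidence and lift."""
    rules = []
    for itemset, supp in itemsets.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                antecedent = frozenset(antecedent)
                consequent = itemset - antecedent
                confidence = supp / support(antecedent, transactions)
                lift = confidence / support(consequent, transactions)
                if confidence >= min_confidence:
                    rules.append((antecedent, consequent, supp, confidence, lift))
    return rules


itemsets = frequent_itemsets(samples, min_support=0.5)
found_rules = association_rules(itemsets, samples, min_confidence=0.7)
for antecedent, consequent, supp, confidence, lift in found_rules:
    print(sorted(antecedent), "->", sorted(consequent),
          f"support={supp:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Filtering on lift as well as confidence is one way to discard the spurious rules the abstract mentions: a lift below 1 means the taxa co-occur less often than expected if they were independent.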
Affiliation(s)
- Agostinetto Giulia
  - Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy
  - *Correspondence: Agostinetto Giulia
- Bruno Antonia
  - Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy
- Pescini Dario
  - Department of Statistics and Quantitative Methods, University of Milano-Bicocca, Milan, Italy
- Casiraghi Maurizio
  - Department of Biotechnology and Biosciences, University of Milano-Bicocca, Milan, Italy
|
13
|
Abstract
Quantitative comparison among microbiomes can link microbial beta-diversity to environmental features, thus enabling prediction of ecosystem properties or dissection of host-microbiome interaction. However, to compute beta-diversity, current methods mainly employ the entire community profiles of taxa or functions, which can miss the subtle differences caused by low-abundance community members that may play crucial roles in the properties of interest. In this work, I review the distance metrics and search engines that we developed to match microbiomes at a large scale based on whole-community-level similarities, as well as their limitations in tackling the microbiome changes caused by less abundant community features. Then I propose the concept of microbiome "local alignment," including an algorithm to measure microbiome similarity on specific fractions of biodiversity and an indexing strategy for rapidly fetching microbiome local-alignment matches from the data repository.
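The contrast between whole-community comparison and comparison on a fraction of the community can be illustrated with a minimal Python sketch. This is not the author's local-alignment algorithm or indexing strategy: it simply applies the standard Bray-Curtis dissimilarity first to full profiles and then to a renormalized subset of taxa, and the profiles and taxon names are hypothetical.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance dicts (taxon -> abundance)."""
    taxa = set(a) | set(b)
    numerator = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    denominator = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    return numerator / denominator if denominator else 0.0


def restrict(profile, taxa):
    """Keep only the selected taxa and renormalize their abundances to sum to 1."""
    subset = {t: v for t, v in profile.items() if t in taxa and v > 0}
    total = sum(subset.values())
    return {t: v / total for t, v in subset.items()} if total else {}


# Hypothetical profiles: identical dominant taxon, different rare members.
sample_a = {"dominant": 0.98, "rare_1": 0.02, "rare_2": 0.0}
sample_b = {"dominant": 0.98, "rare_1": 0.0, "rare_2": 0.02}

whole = bray_curtis(sample_a, sample_b)

rare_taxa = {"rare_1", "rare_2"}
local = bray_curtis(restrict(sample_a, rare_taxa), restrict(sample_b, rare_taxa))

print(f"whole-community dissimilarity: {whole:.3f}")
print(f"dissimilarity on the rare fraction only: {local:.3f}")
```

On the full profiles the two samples look almost identical because the dominant taxon swamps the distance, whereas on the rare fraction they share nothing. That is the kind of difference the abstract argues whole-community metrics can miss.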
Affiliation(s)
- Xiaoquan Su
  - College of Computer Science and Technology, Qingdao University, Qingdao, China
  - Single-Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, China
|
15
|
Cai Z, Lin S, Hu S, Zhao L. Structure and Function of Oral Microbial Community in Periodontitis Based on Integrated Data. Front Cell Infect Microbiol 2021; 11:663756. [PMID: 34222038 PMCID: PMC8248787 DOI: 10.3389/fcimb.2021.663756] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 02/03/2021] [Accepted: 05/31/2021] [Indexed: 02/05/2023]
Abstract
Objective: Microorganisms play a key role in the initiation and progression of periodontal disease. Research studies have focused on seeking specific microorganisms for diagnosing and monitoring the outcome of periodontitis treatment. Large samples may help to discover novel potential biomarkers and capture the common characteristics among different periodontitis patients. This study examines how to screen and merge high-quality periodontitis-related sequence datasets from several similar projects to analyze and mine the potential information comprehensively. Methods: In all, 943 subgingival samples from nine publications were included based on predetermined screening criteria. A uniform pipeline (QIIME2) was applied to clean the raw sequence datasets and merge them together. Microbial structure, biomarkers, and correlation networks were compared between periodontitis and healthy individuals. The microbiota patterns at different periodontal pocket depths were described. Additionally, potential microbial functions and metabolic pathways were predicted using PICRUSt to assess the differences between health and periodontitis. Results: The subgingival microbial communities and functions in subjects with periodontitis were significantly different from those in healthy subjects. Treponema, TG5, Desulfobulbus, Catonella, Bacteroides, Aggregatibacter, Peptostreptococcus, and Eikenella were periodontitis biomarkers, while Veillonella, Corynebacterium, Neisseria, Rothia, Paludibacter, Capnocytophaga, and Kingella were signatures of a healthy periodontium. As pocket depth increased from shallow to deep, the proportion of Spirochaetes, Bacteroidetes, TM7, and Fusobacteria increased, whereas that of Proteobacteria and Actinobacteria decreased. Synergistic relationships were observed among different pathobionts and negative relationships were noted between periodontal pathobionts and healthy microbiota.
Conclusion: This study shows significant differences in the oral microbial community and potential metabolic pathways between the periodontitis and healthy groups. Our integrated analysis provides potential biomarkers and directions for in-depth research. Moreover, a new method for integrating similar sequence data is shown here that can be applied to other microbial-related areas.
Affiliation(s)
- Zhengwen Cai
  - State Key Laboratory of Oral Diseases, West China College of Stomatology, Sichuan University, Chengdu, China
  - National Clinical Research Center for Oral Diseases, West China College of Stomatology, Sichuan University, Chengdu, China
- Shulan Lin
  - State Key Laboratory of Oral Diseases, West China College of Stomatology, Sichuan University, Chengdu, China
  - National Clinical Research Center for Oral Diseases, West China College of Stomatology, Sichuan University, Chengdu, China
  - Department of Periodontics, West China Hospital of Stomatology, Sichuan University, Chengdu, China
- Shoushan Hu
  - State Key Laboratory of Oral Diseases, West China College of Stomatology, Sichuan University, Chengdu, China
  - National Clinical Research Center for Oral Diseases, West China College of Stomatology, Sichuan University, Chengdu, China
- Lei Zhao
  - State Key Laboratory of Oral Diseases, West China College of Stomatology, Sichuan University, Chengdu, China
  - National Clinical Research Center for Oral Diseases, West China College of Stomatology, Sichuan University, Chengdu, China
  - Department of Periodontics, West China Hospital of Stomatology, Sichuan University, Chengdu, China
|
16
|
Zhang Y, Jing G, Chen Y, Li J, Su X. Hierarchical Meta-Storms enables comprehensive and rapid comparison of microbiome functional profiles on a large scale using hierarchical dissimilarity metrics and parallel computing. BIOINFORMATICS ADVANCES 2021; 1:vbab003. [PMID: 36700101 PMCID: PMC9710644 DOI: 10.1093/bioadv/vbab003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 04/20/2021] [Accepted: 05/06/2021] [Indexed: 01/28/2023]
Abstract
Functional beta-diversity analysis on numerous microbiomes interprets the linkages between metabolic functions and their meta-data. To evaluate microbiome beta-diversity, widely used distance metrics only count overlapping gene families but omit their inherent relationships, resulting in erroneous distances due to the sparsity of high-dimensional function profiles. Here we propose Hierarchical Meta-Storms (HMS) to tackle this problem. HMS contains two core components: (i) a dissimilarity algorithm that comprehensively measures functional distances among microbiomes using a multi-level metabolic hierarchy and (ii) a fast Principal Co-ordinates Analysis (PCoA) implementation, optimized by parallel computing, that deduces the beta-diversity pattern. Results showed that HMS can detect variations of microbial functions in upper-level metabolic pathways that are always missed by other methods. In addition, HMS computed the pairwise distance matrix and PCoA for 20 000 microbiomes in 3.9 h on a single computing node, which was 23 times faster and consumed 80% less RAM than existing methods, enabling in-depth, high-resolution data mining among microbiomes. HMS takes microbiome functional profiles as input and produces their pairwise distance matrix and PCoA coordinates. Availability and implementation: It is coded in C/C++ with parallel computing and released in two alternative forms: a standalone software (https://github.com/qdu-bioinfo/hierarchical-meta-storms) and an equivalent R package (https://github.com/qdu-bioinfo/hrms). Supplementary information: Supplementary data are available at Bioinformatics Advances online.
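The problem described above, where flat metrics ignore the relationships among gene families, can be sketched in a few lines of Python. This is not the HMS algorithm: the two-level hierarchy, the gene-family and pathway names, the use of Bray-Curtis at each level, and the equal level weights are all assumptions made for illustration.

```python
# Hypothetical two-level hierarchy: gene family -> pathway.
PATHWAY_OF = {
    "gene_A1": "pathway_A",
    "gene_A2": "pathway_A",
    "gene_B1": "pathway_B",
    "gene_B2": "pathway_B",
}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance dicts."""
    keys = set(a) | set(b)
    numerator = sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)
    denominator = sum(a.get(k, 0.0) + b.get(k, 0.0) for k in keys)
    return numerator / denominator if denominator else 0.0


def aggregate(profile, mapping):
    """Sum gene-family abundances into their parent pathways."""
    out = {}
    for feature, value in profile.items():
        parent = mapping[feature]
        out[parent] = out.get(parent, 0.0) + value
    return out


def hierarchical_dissimilarity(a, b, mapping, weights=(0.5, 0.5)):
    """Weighted mean of the gene-family-level and pathway-level dissimilarities."""
    leaf_level = bray_curtis(a, b)
    pathway_level = bray_curtis(aggregate(a, mapping), aggregate(b, mapping))
    return weights[0] * leaf_level + weights[1] * pathway_level


# Two samples with no gene family in common but identical pathway totals.
sample_a = {"gene_A1": 0.5, "gene_B1": 0.5}
sample_b = {"gene_A2": 0.5, "gene_B2": 0.5}

flat = bray_curtis(sample_a, sample_b)
hier = hierarchical_dissimilarity(sample_a, sample_b, PATHWAY_OF)
print(f"flat gene-family dissimilarity: {flat:.2f}")
print(f"hierarchy-aware dissimilarity:  {hier:.2f}")
```

The flat metric treats the two samples as maximally different because no gene family overlaps, while the hierarchy-aware score is lower because both samples carry the same pathway-level abundances.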
Affiliation(s)
- Yufeng Zhang
  - College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
- Gongchao Jing
  - Single-Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong 266101, China
- Yuzhu Chen
  - College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
- Jinhua Li
  - College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
  - To whom correspondence should be addressed.
- Xiaoquan Su
  - College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
  - Single-Cell Center, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, Shandong 266101, China
  - To whom correspondence should be addressed.
|
17
|
Wu S, Chen Y, Li Z, Li J, Zhao F, Su X. Towards multi-label classification: Next step of machine learning for microbiome research. Comput Struct Biotechnol J 2021; 19:2742-2749. [PMID: 34093989 PMCID: PMC8131981 DOI: 10.1016/j.csbj.2021.04.054] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/24/2021] [Revised: 04/21/2021] [Accepted: 04/22/2021] [Indexed: 11/22/2022]
Abstract
Machine learning (ML) has been widely used in microbiome research for biomarker selection and disease prediction. By training on microbial profiles of samples from patients and healthy controls, ML classifiers construct data models from community features that are highly correlated with the target diseases, so as to determine the status of new samples. To clearly understand the host-microbe interaction of specific diseases, previous studies always focused on well-designed cohorts, in which each sample was labeled with exactly one status type. However, an individual may in fact be associated with multiple diseases simultaneously, which introduces additional variation in microbial patterns that interferes with status detection. More importantly, comorbidities or complications can be missed by regular ML models, limiting the practical application of microbiome techniques. In this review, we summarize the typical ML approaches of single-label classification for microbiome research, and demonstrate their limitations in multi-label disease detection using a real dataset. We then outline a further step of ML towards multi-label classification that potentially solves the aforementioned problem, including a series of promising strategies and key technical issues for applying multi-label classification in microbiome-based studies.
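One common multi-label strategy is binary relevance, which trains one independent binary classifier per label. The Python sketch below illustrates it with a deliberately simple nearest-centroid classifier. It is not taken from the review: the class name, the two-feature toy data, and the label names are hypothetical, and a real study would substitute a stronger base classifier.

```python
def centroid(rows):
    """Column-wise mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


def squared_distance(u, v):
    return sum((x - y) ** 2 for x, y in zip(u, v))


class BinaryRelevance:
    """One nearest-centroid binary classifier per label (toy illustration).

    Assumes every label has at least one positive and one negative sample.
    """

    def fit(self, X, Y):
        self.labels = sorted(set().union(*Y))
        self.centroids = {}
        for label in self.labels:
            positives = [x for x, y in zip(X, Y) if label in y]
            negatives = [x for x, y in zip(X, Y) if label not in y]
            self.centroids[label] = (centroid(positives), centroid(negatives))
        return self

    def predict(self, x):
        """Return the set of labels whose positive centroid is closer to x."""
        return {
            label
            for label, (pos, neg) in self.centroids.items()
            if squared_distance(x, pos) < squared_distance(x, neg)
        }


# Hypothetical features: scaled abundances of two marker taxa per sample.
X = [[0.1, 0.1], [0.9, 0.1], [0.1, 0.9], [0.9, 0.9], [0.2, 0.2], [0.8, 0.8]]
# Each sample may carry zero, one, or several disease labels.
Y = [set(), {"disease_A"}, {"disease_B"}, {"disease_A", "disease_B"},
     set(), {"disease_A", "disease_B"}]

model = BinaryRelevance().fit(X, Y)
print(model.predict([0.85, 0.85]))  # sample resembling both disease patterns
print(model.predict([0.90, 0.05]))  # sample resembling only one pattern
print(model.predict([0.05, 0.05]))  # sample resembling healthy controls
```

A single-label classifier would be forced to pick one status for the first query sample. Binary relevance can report both labels, at the cost of ignoring correlations between them.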
Affiliation(s)
- Shunyao Wu
  - College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
- Yuzhu Chen
  - College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
- Zhiruo Li
  - School of Mathematics and Statistics, Qingdao University, Qingdao, Shandong 266071, China
- Jian Li
  - College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
- Fengyang Zhao
  - College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
- Xiaoquan Su
  - College of Computer Science and Technology, Qingdao University, Qingdao, Shandong 266071, China
|
18
|
Xu L, Pierroz G, Wipf HML, Gao C, Taylor JW, Lemaux PG, Coleman-Derr D. Holo-omics for deciphering plant-microbiome interactions. MICROBIOME 2021; 9:69. [PMID: 33762001 PMCID: PMC7988928 DOI: 10.1186/s40168-021-01014-z] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 12/11/2020] [Accepted: 02/02/2021] [Indexed: 05/02/2023]
Abstract
Host-microbiome interactions are recognized for their importance to host health. An improved understanding of the molecular underpinnings of host-microbiome relationships will advance our capacity to accurately predict host fitness and manipulate interaction outcomes. Within the plant microbiome research field, unlocking the functional relationships between plants and their microbial partners is the next step to effectively using the microbiome to improve plant fitness. We propose that strategies that pair host and microbial datasets, referred to here as holo-omics, provide a powerful approach for hypothesis development and advancement in this area. We discuss several experimental design considerations and present a case study to highlight the potential for holo-omics to generate a more holistic perspective of molecular networks within the plant microbiome system. In addition, we discuss the biggest challenges for conducting holo-omics studies; specifically, the lack of vetted analytical frameworks, publicly available tools, and required technical expertise to process and integrate heterogeneous data. Finally, we conclude with a perspective on appropriate use-cases for holo-omics studies, the need for downstream validation, and new experimental techniques that hold promise for the plant microbiome research field. We argue that utilizing a holo-omics approach to characterize host-microbiome interactions can provide important opportunities for broadening system-level understandings and significantly inform microbial approaches to improving host health and fitness.
Affiliation(s)
- Ling Xu
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA USA
- Grady Pierroz
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA USA
- Heidi M.-L. Wipf
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA USA
- Cheng Gao
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA USA
- John W. Taylor
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA USA
- Peggy G. Lemaux
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA USA
- Devin Coleman-Derr
  - Department of Plant and Microbial Biology, University of California, Berkeley, CA USA
  - Plant Gene Expression Center, USDA-ARS, Albany, CA USA
|
19
|
Microbiome Search Engine 2: a Platform for Taxonomic and Functional Search of Global Microbiomes on the Whole-Microbiome Level. mSystems 2021; 6:6/1/e00943-20. [PMID: 33468706 PMCID: PMC7820668 DOI: 10.1128/msystems.00943-20] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 12/20/2022]
Abstract
Metagenomic data sets from diverse environments have been growing rapidly. To ensure accessibility and reusability, tools that quickly and informatively correlate new microbiomes with existing ones are in demand. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes in the global metagenome data space based on the taxonomic or functional similarity of a whole microbiome to those in the database. MSE 2 consists of (i) a well-organized and regularly updated microbiome database that currently contains over 250,000 metagenomic shotgun and 16S rRNA gene amplicon samples associated with unified metadata collected from 798 studies, (ii) an enhanced search engine that enables real-time and fast (<0.5 s per query) searches against the entire database for best-matched microbiomes using overall taxonomic or functional profiles, and (iii) a Web-based graphical user interface for user-friendly searching, data browsing, and tutoring. MSE 2 is freely accessible via http://mse.ac.cn. For standalone searches of customized microbiome databases, the kernel of the MSE 2 search engine is provided at GitHub (https://github.com/qibebt-bioinfo/meta-storms).
IMPORTANCE: A search-based strategy is useful for large-scale mining of microbiome data sets, such as a bird’s-eye view of the microbiome data space and disease diagnosis via microbiome big data. Here, we introduce Microbiome Search Engine 2 (MSE 2), a microbiome database platform for searching query microbiomes against the existing microbiome data sets on the basis of their similarity in taxonomic structure or functional profile. Key improvements include database extension, data compatibility, a search engine kernel, and a user interface. The new ability to search the microbiome space via functional similarity greatly expands the scope of search-based mining of the microbiome big data.
|
20
|
Mukherjee S, Stamatis D, Bertsch J, Ovchinnikova G, Sundaramurthi J, Lee J, Kandimalla M, Chen IMA, Kyrpides NC, Reddy TBK. Genomes OnLine Database (GOLD) v.8: overview and updates. Nucleic Acids Res 2021; 49:D723-D733. [PMID: 33152092 PMCID: PMC7778979 DOI: 10.1093/nar/gkaa983] [Citation(s) in RCA: 129] [Impact Index Per Article: 32.3] [Received: 09/14/2020] [Revised: 10/08/2020] [Accepted: 10/19/2020] [Indexed: 12/28/2022]
Abstract
The Genomes OnLine Database (GOLD) (https://gold.jgi.doe.gov/) is a manually curated, daily updated collection of genome projects and their metadata accumulated from around the world. The current version of the database includes over 1.17 million entries organized broadly into Studies (45 770), Organisms (387 382) or Biosamples (101 207), Sequencing Projects (355 364) and Analysis Projects (283 481). These four levels contain over 600 metadata fields, including 76 controlled vocabulary (CV) tables containing 3873 terms. GOLD provides an interactive web user interface for browsing and searching by a wide range of project and metadata fields. Users can enter details about their own projects in GOLD, which acts as a gatekeeper to ensure that metadata is accurately documented before sequence information is submitted to the Integrated Microbial Genomes (IMG) system for analysis. In order to maintain a reference dataset for use by members of the scientific community, GOLD also imports projects from public repositories such as GenBank and SRA. The current status of the database, along with recent updates and improvements, is described in this manuscript.
Affiliation(s)
- Supratim Mukherjee
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Dimitri Stamatis
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Jon Bertsch
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Galina Ovchinnikova
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Janey Lee
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Mahathi Kandimalla
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- I-Min A Chen
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- Nikos C Kyrpides
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
- T B K Reddy
  - DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
|
21
|
Jing G, Zhang Y, Cui W, Liu L, Xu J, Su X. Meta-Apo improves accuracy of 16S-amplicon-based prediction of microbiome function. BMC Genomics 2021; 22:9. [PMID: 33407112 PMCID: PMC7788972 DOI: 10.1186/s12864-020-07307-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 04/14/2020] [Accepted: 12/07/2020] [Indexed: 12/13/2022]
Abstract
BACKGROUND: Due to their much lower costs in experiment and computation than metagenomic whole-genome sequencing (WGS), 16S rRNA gene amplicons have been widely used for predicting the functional profiles of microbiomes, via software tools such as PICRUSt 2. However, due to potential PCR bias and gene profile variation among phylogenetically related genomes, functional profiles predicted from 16S amplicons may deviate from WGS-derived ones, resulting in misleading results. RESULTS: Here we present Meta-Apo, which greatly reduces or even eliminates such deviation and thus yields much more consistent diversity patterns between the two approaches. Tests of Meta-Apo on >5000 16S-rRNA amplicon human microbiome samples from 4 body sites showed that the deviation between the two strategies is significantly reduced by using only 15 WGS-amplicon training sample pairs. Moreover, Meta-Apo enables cross-platform functional comparison between WGS and amplicon samples, thus greatly improving 16S-based microbiome diagnosis, e.g. accuracy of gingivitis diagnosis via 16S-derived functional profiles was elevated from 65 to 95% by WGS-based classification. Therefore, with the low cost of 16S-amplicon sequencing, Meta-Apo can produce a reliable, high-resolution view of microbiome function equivalent to that offered by shotgun WGS. CONCLUSIONS: This suggests that large-scale, function-oriented microbiome sequencing projects can probably benefit from the lower cost of the 16S-amplicon strategy, without sacrificing the precision in functional reconstruction that otherwise requires WGS. An optimized C++ implementation of Meta-Apo is available on GitHub (https://github.com/qibebt-bioinfo/meta-apo) under a GNU GPL license. It takes the functional profiles of a few paired WGS:16S-amplicon samples as training, and outputs the calibrated functional profiles for the much larger number of 16S-amplicon samples.
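The general idea of calibrating amplicon-derived profiles against a few paired WGS samples can be sketched in Python as follows. This is not Meta-Apo's algorithm or interface: the per-function ratio model, the function names `fit_scaling` and `calibrate`, and the toy profiles are assumptions used only to show the train-on-pairs, apply-to-many workflow described in the abstract.

```python
def fit_scaling(pairs):
    """Learn one scaling factor per function from paired profiles.

    pairs: list of (amplicon_profile, wgs_profile) tuples, each profile a dict
    mapping function name -> relative abundance.
    The factor is mean(WGS abundance) / mean(amplicon abundance); functions
    never seen in the amplicon training profiles keep a neutral factor of 1.0.
    """
    functions = set()
    for amplicon, wgs in pairs:
        functions |= set(amplicon) | set(wgs)
    factors = {}
    for f in functions:
        amplicon_mean = sum(a.get(f, 0.0) for a, _ in pairs) / len(pairs)
        wgs_mean = sum(w.get(f, 0.0) for _, w in pairs) / len(pairs)
        factors[f] = wgs_mean / amplicon_mean if amplicon_mean > 0 else 1.0
    return factors


def calibrate(profile, factors):
    """Rescale an amplicon-derived profile and renormalize it to sum to 1."""
    scaled = {f: v * factors.get(f, 1.0) for f, v in profile.items()}
    total = sum(scaled.values())
    return {f: v / total for f, v in scaled.items()} if total else scaled


# Hypothetical paired training samples (amplicon-predicted vs WGS-derived).
training_pairs = [
    ({"func_1": 0.6, "func_2": 0.4}, {"func_1": 0.3, "func_2": 0.7}),
    ({"func_1": 0.4, "func_2": 0.6}, {"func_1": 0.2, "func_2": 0.8}),
]
factors = fit_scaling(training_pairs)

# Apply the learned factors to a new amplicon-only sample.
new_sample = {"func_1": 0.5, "func_2": 0.5}
calibrated = calibrate(new_sample, factors)
print(calibrated)
```

In this toy case the amplicon profiles systematically overestimate `func_1`, so the learned factor pulls its abundance down in the new sample. Any real calibration would need validation against held-out WGS data.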
Affiliation(s)
- Gongchao Jing
  - Single-Cell Center, CAS Key Lab of Biofuels, Shandong Key Lab of Energy Genetics and Shandong Institute of Energy Research, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, China
- Yufeng Zhang
  - Single-Cell Center, CAS Key Lab of Biofuels, Shandong Key Lab of Energy Genetics and Shandong Institute of Energy Research, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, China
  - College of Computer Science and Technology, Qingdao University, Qingdao, China
- Wenzhi Cui
  - College of Control Science and Engineering, China University of Petroleum, Qingdao, China
- Lu Liu
  - Single-Cell Center, CAS Key Lab of Biofuels, Shandong Key Lab of Energy Genetics and Shandong Institute of Energy Research, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, China
- Jian Xu
  - Single-Cell Center, CAS Key Lab of Biofuels, Shandong Key Lab of Energy Genetics and Shandong Institute of Energy Research, Qingdao Institute of BioEnergy and Bioprocess Technology, Chinese Academy of Sciences, Qingdao, China
- Xiaoquan Su
  - College of Computer Science and Technology, Qingdao University, Qingdao, China
|