1
Sadurski J, Polak-Berecka M, Staniszewski A, Waśko A. Step-by-Step Metagenomics for Food Microbiome Analysis: A Detailed Review. Foods 2024; 13:2216. [PMID: 39063300 PMCID: PMC11276190 DOI: 10.3390/foods13142216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2024] [Revised: 07/11/2024] [Accepted: 07/12/2024] [Indexed: 07/28/2024] [Open Access]
Abstract
This review article offers a comprehensive overview of the current understanding of using metagenomic tools in food microbiome research. It covers the scientific foundation and practical application of genetic analysis techniques for microbial material from food, including bioinformatic analysis and data interpretation. The method discussed in the article for analyzing microorganisms in food without traditional culture methods is known as food metagenomics. This approach, along with other omics technologies such as nutrigenomics, proteomics, metabolomics, and transcriptomics, collectively forms the field of foodomics. Food metagenomics allows swift and thorough examination of bacteria and potential metabolic pathways by utilizing foodomic databases. Despite its established scientific basis and available bioinformatics resources, the research approach of food metagenomics outlined in the article is not yet widely implemented in industry. The authors believe that the integration of next-generation sequencing (NGS) with rapidly advancing digital technologies such as artificial intelligence (AI), the Internet of Things (IoT), and big data will facilitate the widespread adoption of this research strategy in microbial analysis for the food industry. This adoption is expected to enhance food safety and product quality in the near future.
Affiliation(s)
- Jan Sadurski
- Department of Biotechnology, Microbiology and Human Nutrition, Faculty of Food Science and Biotechnology, University of Life Sciences in Lublin, 20-704 Lublin, Poland
2
Music of metagenomics-a review of its applications, analysis pipeline, and associated tools. Funct Integr Genomics 2021; 22:3-26. [PMID: 34657989 DOI: 10.1007/s10142-021-00810-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/29/2021] [Revised: 09/25/2021] [Accepted: 10/03/2021] [Indexed: 10/20/2022]
Abstract
This humble effort highlights the intricate details of metagenomics in a simple, poetic, and rhythmic way. The paper enforces the significance of the research area, provides details about major analytical methods, examines the taxonomy and assembly of genomes, emphasizes some tools, and concludes by celebrating the richness of the ecosystem populated by the "metagenome."
3
Kayani MUR, Huang W, Feng R, Chen L. Genome-resolved metagenomics using environmental and clinical samples. Brief Bioinform 2021; 22:bbab030. [PMID: 33758906 PMCID: PMC8425419 DOI: 10.1093/bib/bbab030] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 09/21/2020] [Revised: 11/29/2020] [Accepted: 01/20/2021] [Indexed: 12/25/2022] [Open Access]
Abstract
Recent advances in high-throughput sequencing technologies and computational methods have added a new dimension to metagenomic data analysis i.e. genome-resolved metagenomics. In general terms, it refers to the recovery of draft or high-quality microbial genomes and their taxonomic classification and functional annotation. In recent years, several studies have utilized the genome-resolved metagenome analysis approach and identified previously unknown microbial species from human and environmental metagenomes. In this review, we describe genome-resolved metagenome analysis as a series of four necessary steps: (i) preprocessing of the sequencing reads, (ii) de novo metagenome assembly, (iii) genome binning and (iv) taxonomic and functional analysis of the recovered genomes. For each of these four steps, we discuss the most commonly used tools and the currently available pipelines to guide the scientific community in the recovery and subsequent analyses of genomes from any metagenome sample. Furthermore, we also discuss the tools required for validation of assembly quality as well as for improving quality of the recovered genomes. We also highlight the currently available pipelines that can be used to automate the whole analysis without having advanced bioinformatics knowledge. Finally, we will highlight the most widely adapted and actively maintained tools and pipelines that can be helpful to the scientific community in decision making before they commence the analysis.
Affiliation(s)
- Masood ur Rehman Kayani
- Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 2000025, China
- Wanqiu Huang
- Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 200000, China
- Ru Feng
- Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 2000025, China
- Lei Chen
- Center for Microbiota and Immunological Diseases, Shanghai General Hospital, Shanghai Institute of Immunology, Shanghai Jiao Tong University, School of Medicine, Shanghai 2000025, China
4
Ayling M, Clark MD, Leggett RM. New approaches for metagenome assembly with short reads. Brief Bioinform 2021; 21:584-594. [PMID: 30815668 PMCID: PMC7299287 DOI: 10.1093/bib/bbz020] [Citation(s) in RCA: 115] [Impact Index Per Article: 28.8] [Received: 11/12/2018] [Revised: 01/31/2019] [Accepted: 02/01/2019] [Indexed: 02/07/2023] [Open Access]
Abstract
In recent years, the use of longer range read data combined with advances in assembly algorithms has stimulated big improvements in the contiguity and quality of genome assemblies. However, these advances have not directly transferred to metagenomic data sets, as assumptions made by the single genome assembly algorithms do not apply when assembling multiple genomes at varying levels of abundance. The development of dedicated assemblers for metagenomic data was a relatively late innovation and for many years, researchers had to make do using tools designed for single genomes. This has changed in the last few years and we have seen the emergence of a new type of tool built using different principles. In this review, we describe the challenges inherent in metagenomic assemblies and compare the different approaches taken by these novel assembly tools.
Affiliation(s)
- Martin Ayling
- Earlham Institute, Norwich Research Park, Norwich, UK
5
Deng Z, Delwart E. ContigExtender: a new approach to improving de novo sequence assembly for viral metagenomics data. BMC Bioinformatics 2021; 22:119. [PMID: 33706720 PMCID: PMC7953547 DOI: 10.1186/s12859-021-04038-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 10/01/2019] [Accepted: 02/21/2021] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND: Metagenomics is the study of microbial genomes for pathogen detection and discovery in human clinical, animal, and environmental samples via Next-Generation Sequencing (NGS). Metagenome de novo sequence assembly is a crucial analytical step in which longer contigs, ideally whole chromosomes/genomes, are formed from shorter NGS reads. However, the contigs generated from the de novo assembly are often very fragmented and rarely longer than a few kilo base pairs (kb). Therefore, a time-consuming extension process is routinely performed on the de novo assembled contigs.
RESULTS: To facilitate this process, we propose a new tool for metagenome contig extension after de novo assembly. ContigExtender employs a novel recursive extending strategy that explores multiple extending paths to achieve highly accurate longer contigs. We demonstrate that ContigExtender outperforms existing tools in synthetic, animal, and human metagenomics datasets.
CONCLUSIONS: A novel software tool ContigExtender has been developed to assist and enhance the performance of metagenome de novo assembly. ContigExtender effectively extends contigs from a variety of sources and can be incorporated in most viral metagenomics analysis pipelines for a wide variety of applications, including pathogen detection and viral discovery.
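The idea of extending a fragmented contig with overlapping reads can be illustrated with a minimal Python sketch. This is not ContigExtender's recursive multi-path strategy, which the abstract does not specify in detail; it is a simple greedy single-path extension, and the function name `extend_right` and the `min_overlap` parameter are hypothetical choices made for illustration only.

```python
def extend_right(contig, reads, min_overlap=4):
    """Greedily extend `contig` to the right.

    At each step, pick the unused read whose prefix has the longest exact
    overlap (>= min_overlap bases) with the contig's suffix, append the
    non-overlapping tail of that read, and repeat until no read fits.
    Illustrative only: no mismatches, no reverse complements, one path.
    """
    unused = list(reads)
    while True:
        best = None  # (overlap_length, index_in_unused)
        for i, read in enumerate(unused):
            # The read must contribute at least one new base.
            longest = min(len(read) - 1, len(contig))
            for k in range(longest, min_overlap - 1, -1):
                if contig.endswith(read[:k]):
                    if best is None or k > best[0]:
                        best = (k, i)
                    break  # longest overlap for this read found
        if best is None:
            return contig
        k, i = best
        contig += unused.pop(i)[k:]  # each read is used at most once


if __name__ == "__main__":
    reads = ["GTACGG", "ACGGTTA", "TTTT"]
    print(extend_right("ACGTAC", reads))
```

A greedy choice like this can follow a wrong path when repeats create several equally good overlaps, which is one reason a tool might explore multiple extension paths instead.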
Affiliation(s)
- Zachary Deng
- Vitalant Research Institute, San Francisco, CA, 94118, USA.
- Department of Laboratory Medicine, University of California at San Francisco, San Francisco, CA, 94107, USA.
- Eric Delwart
- Vitalant Research Institute, San Francisco, CA, 94118, USA.
- Department of Laboratory Medicine, University of California at San Francisco, San Francisco, CA, 94107, USA.
6
Shah RM, McKenzie EJ, Rosin MT, Jadhav SR, Gondalia SV, Rosendale D, Beale DJ. An Integrated Multi-Disciplinary Perspective for Addressing Challenges of the Human Gut Microbiome. Metabolites 2020; 10:E94. [PMID: 32155792 PMCID: PMC7143645 DOI: 10.3390/metabo10030094] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 01/23/2020] [Revised: 02/18/2020] [Accepted: 02/27/2020] [Indexed: 02/06/2023] [Open Access]
Abstract
Our understanding of the human gut microbiome has grown exponentially. Advances in genome sequencing technologies and metagenomics analysis have enabled researchers to study microbial communities and their potential function within the context of a range of human gut related diseases and disorders. However, up until recently, much of this research has focused on characterizing the gut microbiological community structure and understanding its potential through system wide (meta) genomic and transcriptomic-based studies. Thus far, the functional output of these microbiomes, in terms of protein and metabolite expression, and within the broader context of host-gut microbiome interactions, has been limited. Furthermore, these studies highlight our need to address the issues of individual variation, and of samples as proxies. Here we provide a perspective review of the recent literature that focuses on the challenges of exploring the human gut microbiome, with a strong focus on an integrated perspective applied to these themes. In doing so, we contextualize the experimental and technical challenges of undertaking such studies and provide a framework for capitalizing on the breadth of insight such approaches afford. An integrated perspective of the human gut microbiome and the linkages to human health will pave the way forward for delivering against the objectives of precision medicine, which is targeted to specific individuals and addresses the issues and mechanisms in situ.
Affiliation(s)
- Rohan M. Shah
- Department of Chemistry and Biotechnology, Faculty of Science, Engineering and Technology, Swinburne University of Technology, Hawthorn, VIC 3122, Australia
- Land and Water, Commonwealth Scientific and Industrial Research Organization (CSIRO), Dutton Park, QLD 4102, Australia
- Elizabeth J. McKenzie
- Liggins Institute, The University of Auckland, Grafton, Auckland 1142, New Zealand
- Magda T. Rosin
- Liggins Institute, The University of Auckland, Grafton, Auckland 1142, New Zealand
- Snehal R. Jadhav
- Centre for Advanced Sensory Science, School of Exercise and Nutrition Sciences, Deakin University, Burwood, VIC 3125, Australia
- Shakuntla V. Gondalia
- Centre for Human Psychopharmacology, Swinburne University of Technology, Hawthorn, VIC 3122, Australia
- David J. Beale
- Land and Water, Commonwealth Scientific and Industrial Research Organization (CSIRO), Dutton Park, QLD 4102, Australia
7
Cuadrat RRC, Cury JC, Dávila AMR. Metagenomic Analysis of Upwelling-Affected Brazilian Coastal Seawater Reveals Sequence Domains of Type I PKS and Modular NRPS. Int J Mol Sci 2015; 16:28285-95. [PMID: 26633360 PMCID: PMC4691048 DOI: 10.3390/ijms161226101] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 10/05/2015] [Revised: 11/16/2015] [Accepted: 11/18/2015] [Indexed: 11/23/2022] [Open Access]
Abstract
Marine environments harbor a wide range of microorganisms from the three domains of life. These microorganisms have great potential to enable discovery of new enzymes and bioactive compounds for industrial use. However, only ~1% of microorganisms from the environment can currently be identified through cultured isolates, limiting the discovery of new compounds. To overcome this limitation, a metagenomics approach has been widely adopted for biodiversity studies on samples from marine environments. In this study, we screened metagenomes in order to estimate the potential for new natural compound synthesis mediated by diversity in the Polyketide Synthase (PKS) and Nonribosomal Peptide Synthetase (NRPS) genes. The samples were collected from the Praia dos Anjos (Angel’s Beach) surface water in Arraial do Cabo (Rio de Janeiro state, Brazil), an environment affected by upwelling. In order to evaluate the potential for screening natural products in Arraial do Cabo samples, we used KS (keto-synthase) and C (condensation) domains (from PKS and NRPS, respectively) to build Hidden Markov Models (HMMs). From both samples, a total of 84 KS and 46 C novel domain sequences were obtained, showing the potential of this environment for the discovery of new genes of biotechnological interest. These domains were classified by phylogenetic analysis, and this was the first study conducted to screen PKS and NRPS genes in an upwelling-affected sample.
Affiliation(s)
- Rafael R C Cuadrat
- Computational and Systems Biology Laboratory, Oswaldo Cruz Institute, Fiocruz, Avenida Brasil 4365, Rio de Janeiro CEP 21040-360, Brazil.
- Leibniz-Institute of Freshwater Ecology and Inland Fisheries, Alte Fischerhuette 2, OT Neuglobsow, Stechlin 16775, Germany.
- Berlin Center for Genomics in Biodiversity Research, Königin-Luise-Straße 6-8, Berlin 14195, Germany.
- Juliano C Cury
- Molecular Microbiology Laboratory, Federal University of São João del-Rei, Sete Lagoas Campus Rua Sétimo Moreira Martins 188, Itapoã II, Sete Lagoas CEP 35702-031, Brazil.
- Alberto M R Dávila
- Computational and Systems Biology Laboratory, Oswaldo Cruz Institute, Fiocruz, Avenida Brasil 4365, Rio de Janeiro CEP 21040-360, Brazil.
8
Cunha MLR, Meijers JCM, Middeldorp S. Introduction to the analysis of next generation sequencing data and its application to venous thromboembolism. Thromb Haemost 2015; 114:920-32. [PMID: 26446408 DOI: 10.1160/th15-05-0411] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 05/15/2015] [Accepted: 08/26/2015] [Indexed: 12/13/2022]
Abstract
Despite knowledge of various inherited risk factors associated with venous thromboembolism (VTE), no definite cause can be found in about 50% of patients. The application of data-driven searches such as GWAS has not been able to identify genetic variants with implications for clinical care, and unexplained heritability remains. In the past years, the development of several so-called next generation sequencing (NGS) platforms is offering the possibility of generating fast, inexpensive and accurate genomic information. However, so far their application to VTE has been very limited. Here we review basic concepts of NGS data analysis and explore the application of NGS technology to VTE. We provide both computational and biological viewpoints to discuss potentials and challenges of NGS-based studies.
Affiliation(s)
- Marisa L R Cunha
- Department of Experimental Vascular Medicine, Academic Medical Center, Meibergdreef 9, 1105 AZ Amsterdam, The Netherlands. Tel.: +31 20 5662824, Fax: +31 20 6968833
9
Lai B, Wang F, Wang X, Duan L, Zhu H. InteMAP: Integrated metagenomic assembly pipeline for NGS short reads. BMC Bioinformatics 2015; 16:244. [PMID: 26250558 PMCID: PMC4545859 DOI: 10.1186/s12859-015-0686-x] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 10/21/2014] [Accepted: 07/24/2015] [Indexed: 12/03/2022] [Open Access]
Abstract
Background: Next-generation sequencing (NGS) has greatly facilitated metagenomic analysis but also raised new challenges for metagenomic DNA sequence assembly, owing to its high-throughput nature and extremely short reads generated by sequencers such as Illumina. To date, how to generate a high-quality draft assembly for metagenomic sequencing projects has not been fully addressed.
Results: We conducted a comprehensive assessment on state-of-the-art de novo assemblers and revealed that the performance of each assembler depends critically on the sequencing depth. To address this problem, we developed a pipeline named InteMAP to integrate three assemblers, ABySS, IDBA-UD and CABOG, which were found to complement each other in assembling metagenomic sequences. Making a decision of which assembling approaches to use according to the sequencing coverage estimation algorithm for each short read, the pipeline presents an automatic platform suitable to assemble real metagenomic NGS data with uneven coverage distribution of sequencing depth. By comparing the performance of InteMAP with current assemblers on both synthetic and real NGS metagenomic data, we demonstrated that InteMAP achieves better performance with a longer total contig length and higher contiguity, and contains more genes than others.
Conclusions: We developed a de novo pipeline, named InteMAP, that integrates existing tools for metagenomics assembly. The pipeline outperforms previous assembly methods on metagenomic assembly by providing a longer total contig length, a higher contiguity and covering more genes. InteMAP, therefore, could potentially be a useful tool for the research community of metagenomics.
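Routing reads by estimated coverage, as the abstract describes, could be sketched roughly as follows. The abstract does not say how InteMAP estimates per-read coverage, so this Python sketch substitutes a common stand-in: the median count of a read's k-mers across the whole read set. The function names, the value of `k`, and the `threshold` are all hypothetical, and reverse complements are ignored for brevity.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping k-mers of `seq` (empty if len(seq) < k)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def read_coverage(reads, k=4):
    """Estimate coverage of each read as the median count of its k-mers.

    Assumes every read is at least k bases long.
    """
    counts = Counter(km for read in reads for km in kmers(read, k))
    return [median(counts[km] for km in kmers(read, k)) for read in reads]


def partition_by_coverage(reads, threshold, k=4):
    """Split reads into (low, high) coverage groups.

    In a pipeline of this kind, each group could then be handed to the
    assembler that performs best at that sequencing depth.
    """
    low, high = [], []
    for read, cov in zip(reads, read_coverage(reads, k)):
        (high if cov >= threshold else low).append(read)
    return low, high


if __name__ == "__main__":
    reads = ["ACGTAC"] * 3 + ["TTGGCCAA"]
    low, high = partition_by_coverage(reads, threshold=2)
    print(len(low), "low-coverage read(s);", len(high), "high-coverage read(s)")
```

Using the median rather than the mean keeps a single repeated or erroneous k-mer from dominating a read's estimate.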
Affiliation(s)
- Binbin Lai
- State Key Lab for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China.
- Center for Quantitative Biology, Peking University, Beijing, 100871, China.
- Fumeng Wang
- State Key Lab for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China.
- Xiaoqi Wang
- State Key Lab for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China.
- Liping Duan
- Department of Gastroenterology, Peking University Third Hospital, Beijing, 100191, China.
- Huaiqiu Zhu
- State Key Lab for Turbulence and Complex Systems and Department of Biomedical Engineering, College of Engineering, Peking University, Beijing, 100871, China.
- Center for Quantitative Biology, Peking University, Beijing, 100871, China.
10
Afiahayati, Sato K, Sakakibara Y. MetaVelvet-SL: an extension of the Velvet assembler to a de novo metagenomic assembler utilizing supervised learning. DNA Res 2014; 22:69-77. [PMID: 25431440 PMCID: PMC4379979 DOI: 10.1093/dnares/dsu041] [Citation(s) in RCA: 75] [Impact Index Per Article: 6.8] [Indexed: 11/21/2022] [Open Access]
Abstract
The assembly of multiple genomes from mixed sequence reads is a bottleneck in metagenomic analysis. A single-genome assembly program (assembler) is not capable of resolving metagenome sequences, so assemblers designed specifically for metagenomics have been developed. MetaVelvet is an extension of the single-genome assembler Velvet. It has been proved to generate assemblies with higher N50 scores and higher quality than single-genome assemblers such as Velvet and SOAPdenovo when applied to metagenomic sequence reads and is frequently used in this research community. One important open problem for MetaVelvet is its low accuracy and sensitivity in detecting chimeric nodes in the assembly (de Bruijn) graph, which prevents the generation of longer contigs and scaffolds. We have tackled this problem of classifying chimeric nodes using supervised machine learning to significantly improve the performance of MetaVelvet and developed a new tool, called MetaVelvet-SL. A Support Vector Machine is used for learning the classification model based on 94 features extracted from candidate nodes. In extensive experiments, MetaVelvet-SL outperformed the original MetaVelvet and other state-of-the-art metagenomic assemblers, IDBA-UD, Ray Meta and Omega, to reconstruct accurate longer assemblies with higher N50 scores for both simulated data sets and real data sets of human gut microbial sequences.
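Several of the abstracts in this list compare assemblers by N50. As a reminder of what that statistic measures, here is a minimal Python sketch using the standard definition: the length of the contig at which the running sum of contig lengths, taken from longest to shortest, first reaches half of the total assembled bases. The function name `n50` and the example length lists are illustrative and are not taken from any of the cited papers.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly given its contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches at least half
    of the total assembled bases.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    fragmented = [2, 2, 2, 3, 3, 4, 8, 8]  # 32 bases in 8 contigs
    merged = [4, 6, 6, 16]                 # 32 bases in 4 contigs
    print(n50(fragmented), n50(merged))
```

Because N50 rewards contiguity only, a chimeric assembly that wrongly joins contigs can score higher than a correct one, which is why it is usually reported alongside accuracy measures.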
Affiliation(s)
- Afiahayati
- Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan
- Kengo Sato
- Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan
- Yasubumi Sakakibara
- Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama 223-8522, Japan
11
Koren S, Treangen TJ, Hill CM, Pop M, Phillippy AM. Automated ensemble assembly and validation of microbial genomes. BMC Bioinformatics 2014; 15:126. [PMID: 24884846 PMCID: PMC4030574 DOI: 10.1186/1471-2105-15-126] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Received: 02/07/2014] [Accepted: 04/24/2014] [Indexed: 11/12/2022] [Open Access]
Abstract
Background: The continued democratization of DNA sequencing has sparked a new wave of development of genome assembly and assembly validation methods. As individual research labs, rather than centralized centers, begin to sequence the majority of new genomes, it is important to establish best practices for genome assembly. However, recent evaluations such as GAGE and the Assemblathon have concluded that there is no single best approach to genome assembly. Instead, it is preferable to generate multiple assemblies and validate them to determine which is most useful for the desired analysis; this is a labor-intensive process that is often impossible or unfeasible.
Results: To encourage best practices supported by the community, we present iMetAMOS, an automated ensemble assembly pipeline; iMetAMOS encapsulates the process of running, validating, and selecting a single assembly from multiple assemblies. iMetAMOS packages several leading open-source tools into a single binary that automates parameter selection and execution of multiple assemblers, scores the resulting assemblies based on multiple validation metrics, and annotates the assemblies for genes and contaminants. We demonstrate the utility of the ensemble process on 225 previously unassembled Mycobacterium tuberculosis genomes as well as a Rhodobacter sphaeroides benchmark dataset. On these real data, iMetAMOS reliably produces validated assemblies and identifies potential contamination without user intervention. In addition, intelligent parameter selection produces assemblies of R. sphaeroides comparable to or exceeding the quality of those from the GAGE-B evaluation, affecting the relative ranking of some assemblers.
Conclusions: Ensemble assembly with iMetAMOS provides users with multiple, validated assemblies for each genome. Although computationally limited to small or mid-sized genomes, this approach is the most effective and reproducible means for generating high-quality assemblies and enables users to select an assembly best tailored to their specific needs.
Affiliation(s)
- Sergey Koren
- National Biodefense Analysis and Countermeasures Center, 110 Thomas Johnson Drive, Frederick, MD 21702, USA.