1
Meyer F, Robertson G, Deng ZL, Koslicki D, Gurevich A, McHardy AC. CAMI Benchmarking Portal: online evaluation and ranking of metagenomic software. Nucleic Acids Res 2025:gkaf369. [PMID: 40331433 DOI: 10.1093/nar/gkaf369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/26/2025] [Revised: 04/17/2025] [Accepted: 04/23/2025] [Indexed: 05/08/2025]
Abstract
Finding appropriate software and parameter settings to process shotgun metagenome data is essential for meaningful metagenomic analyses. To enable objective and comprehensive benchmarking of metagenomic software, the community-led initiative for the Critical Assessment of Metagenome Interpretation (CAMI) promotes standards and best practices. Since 2015, CAMI has provided comprehensive datasets, benchmarking guidelines, and challenges. However, benchmarking had to be conducted offline, requiring substantial time and technical expertise and leading to gaps in results between challenges. We introduce the CAMI Benchmarking Portal, a central repository of CAMI resources and a web server for the evaluation and ranking of metagenome assembly, binning, and taxonomic profiling software. The portal simplifies evaluation, enabling users to easily compare their results with previous and other users' submissions through a variety of metrics and visualizations. As a demonstration, we benchmark software performance on the marine dataset of the CAMI II challenge. The portal currently hosts 28 675 results and is freely available at https://cami-challenge.org/.
Affiliation(s)
- Fernando Meyer
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), 38124 Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, 38106 Braunschweig, Germany
- Initiative for the Critical Assessment of Metagenome Interpretation (CAMI)
- Gary Robertson
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), 38124 Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, 38106 Braunschweig, Germany
- Initiative for the Critical Assessment of Metagenome Interpretation (CAMI)
- Zhi-Luo Deng
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), 38124 Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, 38106 Braunschweig, Germany
- Initiative for the Critical Assessment of Metagenome Interpretation (CAMI)
- David Koslicki
- Initiative for the Critical Assessment of Metagenome Interpretation (CAMI)
- Computer Science and Engineering, Penn State University, University Park, PA 16802, United States
- Biology, Penn State University, University Park, PA 16802, United States
- Alexey Gurevich
- Initiative for the Critical Assessment of Metagenome Interpretation (CAMI)
- Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), 66123 Saarbrücken, Germany
- Center for Bioinformatics Saar and Saarland University, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- Alice C McHardy
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), 38124 Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, 38106 Braunschweig, Germany
- Initiative for the Critical Assessment of Metagenome Interpretation (CAMI)
- German Center for Infection Research (DZIF), partner site Hannover Braunschweig, 38124 Braunschweig, Germany
- Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, 30625 Hannover, Germany
2
Fulke AB, Eranezhath S, Raut S, Jadhav HS. Recent toolset of metagenomics for taxonomical and functional annotation of marine associated viruses: A review. Regional Studies in Marine Science 2024; 77:103728. [DOI: 10.1016/j.rsma.2024.103728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/02/2025]
3
Ulrich JU, Renard BY. Fast and space-efficient taxonomic classification of long reads with hierarchical interleaved XOR filters. Genome Res 2024; 34:914-924. [PMID: 38886068 PMCID: PMC11293544 DOI: 10.1101/gr.278623.123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2023] [Accepted: 05/23/2024] [Indexed: 06/20/2024]
Abstract
Metagenomic long-read sequencing is gaining popularity for various applications, including pathogen detection and microbiome studies. To analyze the large data created in those studies, software tools need to taxonomically classify the sequenced molecules and estimate the relative abundances of organisms in the sequenced sample. Because of the exponential growth of reference genome databases, the current taxonomic classification methods have large computational requirements. This issue motivated us to develop a new data structure for fast and memory-efficient querying of long reads. Here, we present Taxor as a new tool for long-read metagenomic classification using a hierarchical interleaved XOR filter data structure for indexing and querying large reference genome sets. Taxor implements several k-mer-based approaches, such as syncmers, for pseudoalignment to classify reads and an expectation-maximization algorithm for metagenomic profiling. Our results show that Taxor outperforms state-of-the-art tools regarding precision while having a similar recall for long-read taxonomic classification. Most notably, Taxor reduces the memory requirements and index size by >50% and is among the fastest tools regarding query times. This enables real-time metagenomics analysis with large reference databases on a small laptop in the field.
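The abstract names syncmers as one of the k-mer selection schemes. As a rough illustration only, and not Taxor's implementation, the sketch below selects open syncmers: a k-mer is kept when its smallest s-mer starts at offset 0. The function name `open_syncmers`, the lexicographic ordering of s-mers, and the toy values of `k` and `s` are assumptions made for this example.

```python
def open_syncmers(seq, k, s):
    """Return (position, k-mer) pairs whose smallest s-mer starts at offset 0.

    Assumption: s-mers are compared lexicographically; real tools
    typically compare hash values instead.
    """
    selected = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # All s-mers contained in this k-mer, in left-to-right order.
        smers = [kmer[j:j + s] for j in range(k - s + 1)]
        # list.index returns the leftmost occurrence of the minimum.
        if smers.index(min(smers)) == 0:
            selected.append((i, kmer))
    return selected


picked = open_syncmers("ACGTACGT", k=4, s=2)
print(picked)
```

Because selection depends only on the k-mer itself, the same k-mer is chosen wherever it occurs, which is the property that makes syncmers attractive for indexing.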
Affiliation(s)
- Jens-Uwe Ulrich
- Data Analytics and Computational Statistics, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Germany
- Phylogenomics Unit, Center for Artificial Intelligence in Public Health Research, Robert Koch Institute, 15745 Wildau, Germany
- Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
- Bernhard Y Renard
- Data Analytics and Computational Statistics, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Germany
4
Zachariasen T, Russel J, Petersen C, Vestergaard GA, Shah S, Atienza Lopez P, Passali M, Turvey SE, Sørensen SJ, Lund O, Stokholm J, Brejnrod A, Thorsen J. MAGinator enables accurate profiling of de novo MAGs with strain-level phylogenies. Nat Commun 2024; 15:5734. [PMID: 38977664 PMCID: PMC11231285 DOI: 10.1038/s41467-024-49958-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2023] [Accepted: 06/21/2024] [Indexed: 07/10/2024]
Abstract
Metagenomic sequencing has provided great advantages in the characterisation of microbiomes, but currently available analysis tools lack the ability to combine subspecies-level taxonomic resolution and accurate abundance estimation with functional profiling of assembled genomes. To define the microbiome and its associations with human health, improved tools are needed to enable comprehensive understanding of the microbial composition and elucidation of the phylogenetic and functional relationships between the microbes. Here, we present MAGinator, a freely available tool, tailored for profiling of shotgun metagenomics datasets. MAGinator provides de novo identification of subspecies-level microbes and accurate abundance estimates of metagenome-assembled genomes (MAGs). MAGinator utilises the information from both gene- and contig-based methods yielding insight into both taxonomic profiles and the origin of genes and genetic content, used for inference of functional content of each sample by host organism. Additionally, MAGinator facilitates the reconstruction of phylogenetic relationships between the MAGs, providing a framework to identify clade-level differences.
Affiliation(s)
- Trine Zachariasen
- Department of Health and Technology, Section of Bioinformatics, Technical University of Denmark, Lyngby, Denmark
- Jakob Russel
- Department of Biology, Section of Microbiology, University of Copenhagen, Copenhagen, Denmark
- Charisse Petersen
- Department of Pediatrics, BC Children's Hospital, University of British Columbia, 950 West 28th Avenue, Vancouver, BC, Canada
- Gisle A Vestergaard
- Department of Health and Technology, Section of Bioinformatics, Technical University of Denmark, Lyngby, Denmark
- Shiraz Shah
- COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
- Pablo Atienza Lopez
- Danish Multiple Sclerosis Center, Department of Neurology, Copenhagen University Hospital, Rigshospitalet-Glostrup, Glostrup, Denmark
- Department of Food Science, University of Copenhagen, Copenhagen, Denmark
- Moschoula Passali
- Danish Multiple Sclerosis Center, Department of Neurology, Copenhagen University Hospital, Rigshospitalet-Glostrup, Glostrup, Denmark
- Stuart E Turvey
- Department of Pediatrics, BC Children's Hospital, University of British Columbia, 950 West 28th Avenue, Vancouver, BC, Canada
- Søren J Sørensen
- Department of Biology, Section of Microbiology, University of Copenhagen, Copenhagen, Denmark
- Ole Lund
- Department of Health and Technology, Section of Bioinformatics, Technical University of Denmark, Lyngby, Denmark
- Jakob Stokholm
- Department of Biology, Section of Microbiology, University of Copenhagen, Copenhagen, Denmark
- COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
- Asker Brejnrod
- Department of Health and Technology, Section of Bioinformatics, Technical University of Denmark, Lyngby, Denmark
- Jonathan Thorsen
- COPSAC, Copenhagen Prospective Studies on Asthma in Childhood, Herlev and Gentofte Hospital, University of Copenhagen, Copenhagen, Denmark
5
Pham DT, Phan V. MetaBIDx: a new computational approach to bacteria identification in microbiomes. Microbiome Research Reports 2024; 3:25. [PMID: 38841411 PMCID: PMC11149084 DOI: 10.20517/mrr.2024.01] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2024] [Revised: 03/04/2024] [Accepted: 03/25/2024] [Indexed: 06/07/2024]
Abstract
Objectives: This study introduces MetaBIDx, a computational method designed to enhance species prediction in metagenomic environments. The method addresses the challenge of accurate species identification in complex microbiomes, which is due to the large number of generated reads and the ever-expanding number of bacterial genomes. Bacterial identification is essential for disease diagnosis and tracing outbreaks associated with microbial infections. Methods: MetaBIDx utilizes a modified Bloom filter for efficient indexing of reference genomes and incorporates a novel strategy for reducing false positives by clustering species based on their genomic coverages by identified reads. The approach was evaluated and compared with several well-established tools across various datasets. Precision, recall, and F1-score were used to quantify the accuracy of species prediction. Results: MetaBIDx demonstrated superior performance compared to other tools, especially in terms of precision and F1-score. The application of clustering based on approximate coverages significantly improved precision in species identification, effectively minimizing false positives. We further demonstrated that other methods can also benefit from our approach to removing false positives by clustering species based on approximate coverages. Conclusion: With a novel approach to reducing false positives and the effective use of a modified Bloom filter to index species, MetaBIDx represents an advancement in metagenomic analysis. The findings suggest that the proposed approach could also benefit other metagenomic tools, indicating its potential for broader application in the field. The study lays the groundwork for future improvements in computational efficiency and the expansion of microbial databases.
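The abstract reports accuracy as precision, recall, and F1-score over predicted species. A minimal sketch of how these three numbers are usually computed from a predicted species set and a ground-truth set is given below; the function name `precision_recall_f1` and the toy species lists are hypothetical and are not taken from the MetaBIDx evaluation.

```python
def precision_recall_f1(predicted, truth):
    """Compute precision, recall, and F1 for a set of predicted species."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)   # species correctly reported
    fp = len(predicted - truth)   # species reported but not present
    fn = len(truth - predicted)   # species present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example (hypothetical data): 3 true positives, 2 false positives, 1 miss.
truth = {"E. coli", "B. subtilis", "S. aureus", "P. aeruginosa"}
predicted = {"E. coli", "B. subtilis", "S. aureus", "K. pneumoniae", "L. monocytogenes"}
p, r, f1 = precision_recall_f1(predicted, truth)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Removing false positives, as the clustering step in the abstract aims to do, lowers `fp` and therefore raises precision without changing recall.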
Affiliation(s)
- Vinhthuy Phan
- Department of Computer Science, University of Memphis, Memphis, TN 38152, USA
6
Koslicki D, White S, Ma C, Novikov A. YACHT: an ANI-based statistical test to detect microbial presence/absence in a metagenomic sample. Bioinformatics 2024; 40:btae047. [PMID: 38268451 PMCID: PMC10868342 DOI: 10.1093/bioinformatics/btae047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Revised: 01/05/2024] [Accepted: 01/22/2024] [Indexed: 01/26/2024]
Abstract
MOTIVATION: In metagenomics, the study of environmentally associated microbial communities from their sampled DNA, one of the most fundamental computational tasks is that of determining which genomes from a reference database are present or absent in a given sample metagenome. Existing tools generally return point estimates, with no associated confidence or uncertainty. This has led to practitioners experiencing difficulty when interpreting the results from these tools, particularly for low-abundance organisms as these often reside in the "noisy tail" of incorrect predictions. Furthermore, few tools account for the fact that reference databases are often incomplete and rarely, if ever, contain exact replicas of genomes present in an environmentally derived metagenome. RESULTS: We present solutions for these issues by introducing the algorithm YACHT: Yes/No Answers to Community membership via Hypothesis Testing. This approach introduces a statistical framework that accounts for sequence divergence between the reference and sample genomes, in terms of ANI, as well as incomplete sequencing depth, thus providing a hypothesis test for determining the presence or absence of a reference genome in a sample. After introducing our approach, we quantify its statistical power and how this changes with varying parameters. Subsequently, we perform extensive experiments using both simulated and real data to confirm the accuracy and scalability of this approach. AVAILABILITY AND IMPLEMENTATION: The source code implementing this approach is available via Conda and at https://github.com/KoslickiLab/YACHT. We also provide the code for reproducing experiments at https://github.com/KoslickiLab/YACHT-reproducibles.
Affiliation(s)
- David Koslicki
- Department of Computer Science and Engineering, Pennsylvania State University, State College, PA 16802, United States
- Department of Biology, Pennsylvania State University, State College, PA 16802, United States
- Huck Institutes of the Life Sciences, Pennsylvania State University, State College, PA 16802, USA
- One Health Microbiome Center, Pennsylvania State University, State College, PA 16802, United States
- Stephen White
- Department of Mathematics, Pennsylvania State University, State College, PA 16802, United States
- Chunyu Ma
- Huck Institutes of the Life Sciences, Pennsylvania State University, State College, PA 16802, USA
- Alexei Novikov
- Department of Mathematics, Pennsylvania State University, State College, PA 16802, United States
7
Ibañez-Lligoña M, Colomer-Castell S, González-Sánchez A, Gregori J, Campos C, Garcia-Cehic D, Andrés C, Piñana M, Pumarola T, Rodríguez-Frias F, Antón A, Quer J. Bioinformatic Tools for NGS-Based Metagenomics to Improve the Clinical Diagnosis of Emerging, Re-Emerging and New Viruses. Viruses 2023; 15:587. [PMID: 36851800 PMCID: PMC9965957 DOI: 10.3390/v15020587] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 01/13/2023] [Revised: 02/16/2023] [Accepted: 02/17/2023] [Indexed: 02/24/2023]
Abstract
Epidemics and pandemics have occurred since the beginning of time, resulting in millions of deaths. Many such disease outbreaks are caused by viruses. Some viruses, particularly RNA viruses, are characterized by their high genetic variability, and this can affect certain phenotypic features: tropism, antigenicity, and susceptibility to antiviral drugs, vaccines, and the host immune response. The best strategy to face the emergence of new infectious genomes is prompt identification. However, currently available diagnostic tests are often limited for detecting new agents. High-throughput next-generation sequencing technologies based on metagenomics may be the solution to detect new infectious genomes and properly diagnose certain diseases. Metagenomic techniques enable the identification and characterization of disease-causing agents, but they require a large amount of genetic material and involve complex bioinformatic analyses. A wide variety of analytical tools can be used in the quality control and pre-processing of metagenomic data, filtering of untargeted sequences, assembly and quality control of reads, and taxonomic profiling of sequences to identify new viruses and ones that have been sequenced and uploaded to dedicated databases. Although there have been huge advances in the field of metagenomics, there is still a lack of consensus about which of the various approaches should be used for specific data analysis tasks. In this review, we provide some background on the study of viral infections, describe the contribution of metagenomics to this field, and place special emphasis on the bioinformatic tools (with their capabilities and limitations) available for use in metagenomic analyses of viral pathogens.
Affiliation(s)
- Marta Ibañez-Lligoña
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
- Sergi Colomer-Castell
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
- Alejandra González-Sánchez
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Josep Gregori
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Carolina Campos
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
- Damir Garcia-Cehic
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Cristina Andrés
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Maria Piñana
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Tomàs Pumarola
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Microbiology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
- Francisco Rodríguez-Frias
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Department of Basic Sciences, Universitat Internacional de Catalunya, Sant Cugat del Vallès, 08195 Barcelona, Spain
- Andrés Antón
- Microbiology Department, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Microbiology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
- Josep Quer
- Liver Diseases-Viral Hepatitis, Liver Unit, Vall d’Hebron Institut de Recerca (VHIR), Vall d’Hebron Hospital Universitari, Vall d’Hebron Barcelona Hospital Campus, Passeig Vall d’Hebron 119-129, 08035 Barcelona, Spain
- Centro de Investigación Biomédica en Red de Enfermedades Hepáticas y Digestivas (CIBERehd), Instituto de Salud Carlos III, Av. Monforte de Lemos, 3-5, 28029 Madrid, Spain
- Biochemistry and Molecular Biology Department, Universitat Autònoma de Barcelona (UAB), Campus de la UAB, Plaça Cívica, 08193 Bellaterra, Spain
8
Shen W, Xiang H, Huang T, Tang H, Peng M, Cai D, Hu P, Ren H. KMCP: accurate metagenomic profiling of both prokaryotic and viral populations by pseudo-mapping. Bioinformatics 2023; 39:btac845. [PMID: 36579886 PMCID: PMC9828150 DOI: 10.1093/bioinformatics/btac845] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Received: 04/11/2022] [Revised: 12/17/2022] [Accepted: 12/28/2022] [Indexed: 12/30/2022]
Abstract
MOTIVATION: The growing number of microbial reference genomes enables the improvement of metagenomic profiling accuracy but also imposes greater requirements on the indexing efficiency, database size and runtime of taxonomic profilers. Additionally, most profilers focus mainly on bacterial, archaeal and fungal populations, while less attention is paid to viral communities. RESULTS: We present KMCP (K-mer-based Metagenomic Classification and Profiling), a novel k-mer-based metagenomic profiling tool that utilizes genome coverage information by splitting the reference genomes into chunks and stores k-mers in a modified and optimized Compact Bit-Sliced Signature Index for fast alignment-free sequence searching. KMCP combines k-mer similarity and genome coverage information to reduce the false positive rate of k-mer-based taxonomic classification and profiling methods. Benchmarking results based on simulated and real data demonstrate that KMCP, despite a longer running time than all other methods, not only allows the accurate taxonomic profiling of prokaryotic and viral populations but also provides more confident pathogen detection in clinical samples of low depth. AVAILABILITY AND IMPLEMENTATION: The software is open-source under the MIT license and available at https://github.com/shenwei356/kmcp. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wei Shen
- Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Department of Infectious Diseases, Institute for Viral Hepatitis, The Second Affiliated Hospital, Chongqing Medical University, Chongqing 400010, China
- Hongyan Xiang
- Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Department of Infectious Diseases, Institute for Viral Hepatitis, The Second Affiliated Hospital, Chongqing Medical University, Chongqing 400010, China
- Tianquan Huang
- Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Department of Infectious Diseases, Institute for Viral Hepatitis, The Second Affiliated Hospital, Chongqing Medical University, Chongqing 400010, China
- Hui Tang
- Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Department of Infectious Diseases, Institute for Viral Hepatitis, The Second Affiliated Hospital, Chongqing Medical University, Chongqing 400010, China
- Mingli Peng
- Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Department of Infectious Diseases, Institute for Viral Hepatitis, The Second Affiliated Hospital, Chongqing Medical University, Chongqing 400010, China
- Dachuan Cai
- Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Department of Infectious Diseases, Institute for Viral Hepatitis, The Second Affiliated Hospital, Chongqing Medical University, Chongqing 400010, China
- Peng Hu
- Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Department of Infectious Diseases, Institute for Viral Hepatitis, The Second Affiliated Hospital, Chongqing Medical University, Chongqing 400010, China
- Hong Ren
- Key Laboratory of Molecular Biology for Infectious Diseases (Ministry of Education), Department of Infectious Diseases, Institute for Viral Hepatitis, The Second Affiliated Hospital, Chongqing Medical University, Chongqing 400010, China
9
PathoLive—Real-Time Pathogen Identification from Metagenomic Illumina Datasets. Life (Basel) 2022; 12:1345. [PMID: 36143382 PMCID: PMC9505849 DOI: 10.3390/life12091345] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/13/2022] [Revised: 08/24/2022] [Accepted: 08/24/2022] [Indexed: 11/18/2022]
Abstract
Over the past years, NGS has become a crucial workhorse for open-view pathogen diagnostics. Yet, long turnaround times result from using massively parallel high-throughput technologies as the analysis can only be performed after sequencing has finished. The interpretation of results can further be challenged by contaminations, clinically irrelevant sequences, and the sheer amount and complexity of the data. We implemented PathoLive, a real-time diagnostics pipeline for the detection of pathogens from clinical samples hours before sequencing has finished. Based on real-time alignment with HiLive2, mappings are scored with respect to common contaminations, low-entropy areas, and sequences of widespread, non-pathogenic organisms. The results are visualized using an interactive taxonomic tree that provides an easily interpretable overview of the relevance of hits. For a human plasma sample that was spiked in vitro with six pathogenic viruses, all agents were clearly detected after only 40 of 200 sequencing cycles. For a real-world sample from Sudan, the results correctly indicated the presence of Crimean-Congo hemorrhagic fever virus. In a second real-world dataset from the 2019 SARS-CoV-2 outbreak in Wuhan, we found the presence of a SARS coronavirus as the most relevant hit without the novel virus reference genome being included in the database. For all samples, clinically irrelevant hits were correctly de-emphasized. Our approach is valuable to obtain fast and accurate NGS-based pathogen identifications and correctly prioritize and visualize them based on their clinical significance: PathoLive is open source and available on GitLab and BioConda.
10
Meyer F, Fritz A, Deng ZL, Koslicki D, Lesker TR, Gurevich A, Robertson G, Alser M, Antipov D, Beghini F, Bertrand D, Brito JJ, Brown CT, Buchmann J, Buluç A, Chen B, Chikhi R, Clausen PTLC, Cristian A, Dabrowski PW, Darling AE, Egan R, Eskin E, Georganas E, Goltsman E, Gray MA, Hansen LH, Hofmeyr S, Huang P, Irber L, Jia H, Jørgensen TS, Kieser SD, Klemetsen T, Kola A, Kolmogorov M, Korobeynikov A, Kwan J, LaPierre N, Lemaitre C, Li C, Limasset A, Malcher-Miranda F, Mangul S, Marcelino VR, Marchet C, Marijon P, Meleshko D, Mende DR, Milanese A, Nagarajan N, Nissen J, Nurk S, Oliker L, Paoli L, Peterlongo P, Piro VC, Porter JS, Rasmussen S, Rees ER, Reinert K, Renard B, Robertsen EM, Rosen GL, Ruscheweyh HJ, Sarwal V, Segata N, Seiler E, Shi L, Sun F, Sunagawa S, Sørensen SJ, Thomas A, Tong C, Trajkovski M, Tremblay J, Uritskiy G, Vicedomini R, Wang Z, Wang Z, Wang Z, Warren A, Willassen NP, Yelick K, You R, Zeller G, Zhao Z, Zhu S, Zhu J, Garrido-Oter R, Gastmeier P, Hacquard S, Häußler S, Khaledi A, Maechler F, Mesny F, Radutoiu S, Schulze-Lefert P, Smit N, Strowig T, Bremges A, Sczyrba A, McHardy AC. Critical Assessment of Metagenome Interpretation: the second round of challenges. Nat Methods 2022; 19:429-440. [PMID: 35396482 PMCID: PMC9007738 DOI: 10.1038/s41592-022-01431-4] [Citation(s) in RCA: 153] [Impact Index Per Article: 51.0] [Received: 07/22/2021] [Accepted: 02/14/2022] [Indexed: 12/20/2022]
Abstract
Evaluating metagenomic software is key for optimizing metagenome interpretation and is the focus of the Initiative for the Critical Assessment of Metagenome Interpretation (CAMI). The CAMI II challenge engaged the community to assess methods on realistic and complex datasets with long- and short-read sequences, created computationally from around 1,700 new and known genomes, as well as 600 new plasmids and viruses. Here we analyze 5,002 results by 76 program versions. Substantial improvements were seen in assembly, some due to long-read data. Related strains still were challenging for assembly and genome recovery through binning, as was assembly quality for the latter. Profilers markedly matured, with taxon profilers and binners excelling at higher bacterial ranks, but underperforming for viruses and Archaea. Clinical pathogen detection results revealed a need to improve reproducibility. Runtime and memory usage analyses identified efficient programs, including top performers with other metrics. The results identify challenges and guide researchers in selecting methods for analyses. This study presents the results of the second round of the Critical Assessment of Metagenome Interpretation challenges (CAMI II), which is a community-driven effort for comprehensively benchmarking tools for metagenomics data analysis.
Affiliation(s)
- Fernando Meyer
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany.,Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
| | - Adrian Fritz
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany.,Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany.,German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany
| | - Zhi-Luo Deng
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany.,Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany.,Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany
| | | | - Till Robin Lesker
- German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany.,Helmholtz Centre for Infection Research, Braunschweig, Germany
| | | | - Gary Robertson
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany.,Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany
| | - Mohammed Alser
- Department of Information Technology and Electrical Engineering, ETH Zürich, Zurich, Switzerland
| | - Dmitry Antipov
- Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
| | | | | | | | | | - Jan Buchmann
- Institute for Biological Data Science, Heinrich-Heine-University, Düsseldorf, Germany
| | - Aydin Buluç
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA.,University of California, Berkeley, Berkeley, CA, USA
| | - Bo Chen
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA.,University of California, Berkeley, Berkeley, CA, USA
| | | | - Philip T L C Clausen
- National Food Institute, Division of Global Surveillance, Technical University of Denmark, Lyngby, Denmark
| | - Alexandru Cristian
- Drexel University, Philadelphia, PA, USA.,Google Inc., Philadelphia, PA, USA
| | - Piotr Wojciech Dabrowski
- Robert Koch-Institut, Berlin, Germany.,Hochschule für Technik und Wirtschaft Berlin, Berlin, Germany
| | | | - Rob Egan
- DOE Joint Genome Institute, Berkeley, CA, USA.,Lawrence Berkeley National Laboratories, Berkeley, CA, USA
| | - Eleazar Eskin
- University of California, Los Angeles, Los Angeles, CA, USA
| | | | - Eugene Goltsman
- DOE Joint Genome Institute, Berkeley, CA, USA.,Lawrence Berkeley National Laboratories, Berkeley, CA, USA
| | - Melissa A Gray
- Drexel University, Philadelphia, PA, USA.,Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Philadelphia, PA, USA
| | - Lars Hestbjerg Hansen
- University of Copenhagen, Department of Plant and Environmental Science, Frederiksberg, Denmark
| | - Steven Hofmeyr
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA.,University of California, Berkeley, Berkeley, CA, USA
| | - Pingqin Huang
- School of Computer Science, Fudan University, Shanghai, China
| | - Luiz Irber
- University of California, Davis, Davis, CA, USA
| | - Huijue Jia
- BGI-Shenzhen, Shenzhen, China.,Shenzhen Key Laboratory of Human Commensal Microorganisms and Health Research, BGI-Shenzhen, Shenzhen, China
| | - Tue Sparholt Jørgensen
- Technical University of Denmark, Novo Nordisk Foundation Center for Biosustainability, Lyngby, Denmark.,Aarhus University, Department of Environmental Science, Roskilde, Denmark
| | - Silas D Kieser
- Department of Cell Physiology and Metabolism, Faculty of Medicine, University of Geneva, Geneva, Switzerland.,Swiss Institute of Bioinformatics, Geneva, Switzerland
| | | | - Axel Kola
- Charité-Universitätsmedizin Berlin, Berlin, Germany
| | - Mikhail Kolmogorov
- Department of Computer Science and Engineering, University of California San Diego, San Diego, CA, USA
| | - Anton Korobeynikov
- Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia.,Department of Statistical Modelling, Saint Petersburg State University, Saint Petersburg, Russia
| | - Jason Kwan
- University of Wisconsin-Madison, Madison, WI, USA
| | | | | | - Chenhao Li
- Genome Institute of Singapore, Singapore, Singapore
| | | | - Fabio Malcher-Miranda
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
| | | | - Vanessa R Marcelino
- Sydney Medical School, The University of Sydney, Sydney, Australia.,Centre for Innate Immunity and Infectious Diseases, Hudson Institute of Medical Research, Clayton, Australia
| | | | - Pierre Marijon
- Department of Computer Science, Inria, University of Lille, CNRS, Lille, France
| | - Dmitry Meleshko
- Center for Algorithmic Biotechnology, Saint Petersburg State University, Saint Petersburg, Russia
| | - Daniel R Mende
- Amsterdam University Medical Center, Amsterdam, the Netherlands
| | - Alessio Milanese
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland.,Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
| | - Niranjan Nagarajan
- Genome Institute of Singapore, A*STAR, Singapore, Singapore.,National University of Singapore, Singapore, Singapore
| | | | - Sergey Nurk
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
| | - Leonid Oliker
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA.,University of California, Berkeley, Berkeley, CA, USA
| | - Lucas Paoli
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
| | | | - Vitor C Piro
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany
| | | | - Simon Rasmussen
- Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Evan R Rees
- University of Wisconsin-Madison, Madison, WI, USA
| | - Knut Reinert
- Institute for Bioinformatics, FU Berlin, Berlin, Germany
| | - Bernhard Renard
- Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, Potsdam, Germany.,Bioinformatics Unit (MF1), Robert Koch Institute, Berlin, Germany
| | | | - Gail L Rosen
- Drexel University, Philadelphia, PA, USA.,Ecological and Evolutionary Signal-Processing and Informatics Laboratory, Philadelphia, PA, USA.,Center for Biological Discovery from Big Data, Philadelphia, PA, USA
| | - Hans-Joachim Ruscheweyh
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
| | - Varuni Sarwal
- University of California, Los Angeles, Los Angeles, CA, USA
| | - Nicola Segata
- Department CIBIO, University of Trento, Trento, Italy
| | - Enrico Seiler
- Institute for Bioinformatics, FU Berlin, Berlin, Germany
| | - Lizhen Shi
- Florida Polytechnic University, Lakeland, FL, USA
| | - Fengzhu Sun
- Quantitative and Computational Biology Department, University of Southern California, Los Angeles, CA, USA
| | - Shinichi Sunagawa
- Department of Biology, Institute of Microbiology and Swiss Institute of Bioinformatics, ETH Zürich, Zürich, Switzerland
| | | | - Ashleigh Thomas
- DOE Joint Genome Institute, Berkeley, CA, USA.,University of British Columbia, Vancouver, British Columbia, Canada
| | | | - Mirko Trajkovski
- Department of Cell Physiology and Metabolism, Faculty of Medicine, University of Geneva, Geneva, Switzerland.,Diabetes Center, Faculty of Medicine, University of Geneva, Geneva, Switzerland
| | - Julien Tremblay
- Energy, Mining and Environment, National Research Council Canada, Montreal, Quebec, Canada
| | | | | | - Zhengyang Wang
- School of Computer Science, Fudan University, Shanghai, China
| | - Ziye Wang
- School of Mathematical Sciences, Fudan University, Shanghai, China
| | - Zhong Wang
- Department of Energy Joint Genome Institute, Berkeley, CA, USA.,Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.,School of Natural Sciences, University of California at Merced, Merced, CA, USA
| | | | | | - Katherine Yelick
- Lawrence Berkeley National Laboratory, Berkeley, CA, USA.,University of California, Berkeley, Berkeley, CA, USA
| | - Ronghui You
- School of Computer Science, Fudan University, Shanghai, China
| | - Georg Zeller
- Structural and Computational Biology Unit, EMBL, Heidelberg, Germany
| | | | - Shanfeng Zhu
- Institute of Science and Technology for Brain-Inspired Intelligence, Fudan University, Shanghai, China.,Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Shanghai, China
| | - Jie Zhu
- BGI-Shenzhen, Shenzhen, China.,Shenzhen Key Laboratory of Human Commensal Microorganisms and Health Research, BGI-Shenzhen, Shenzhen, China
| | | | | | | | - Susanne Häußler
- Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Ariane Khaledi
- Helmholtz Centre for Infection Research, Braunschweig, Germany
| | | | - Fantin Mesny
- Max Planck Institute for Plant Breeding Research, Köln, Germany
| | | | | | - Nathiana Smit
- Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Till Strowig
- Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Andreas Bremges
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany.,German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany
| | - Alexander Sczyrba
- Center for Biotechnology (CeBiTec), Bielefeld University, Bielefeld, Germany
| | - Alice Carolyn McHardy
- Computational Biology of Infection Research, Helmholtz Centre for Infection Research, Braunschweig, Germany.,Braunschweig Integrated Centre of Systems Biology (BRICS), Technische Universität Braunschweig, Braunschweig, Germany.,German Center for Infection Research (DZIF), Hannover-Braunschweig Site, Braunschweig, Germany.,Cluster of Excellence RESIST (EXC 2155), Hannover Medical School, Hannover, Germany.
|
11
|
Music of metagenomics-a review of its applications, analysis pipeline, and associated tools. Funct Integr Genomics 2021; 22:3-26. [PMID: 34657989 DOI: 10.1007/s10142-021-00810-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2021] [Revised: 09/25/2021] [Accepted: 10/03/2021] [Indexed: 10/20/2022]
Abstract
This humble effort highlights the intricate details of metagenomics in a simple, poetic, and rhythmic way. The paper enforces the significance of the research area, provides details about major analytical methods, examines the taxonomy and assembly of genomes, emphasizes some tools, and concludes by celebrating the richness of the ecosystem populated by the "metagenome."
|
12
|
Mäklin T, Kallonen T, David S, Boinett CJ, Pascoe B, Méric G, Aanensen DM, Feil EJ, Baker S, Parkhill J, Sheppard SK, Corander J, Honkela A. High-resolution sweep metagenomics using fast probabilistic inference. Wellcome Open Res 2021; 5:14. [PMID: 34746439 PMCID: PMC8543175 DOI: 10.12688/wellcomeopenres.15639.2] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 09/30/2021] [Indexed: 01/13/2023] Open
Abstract
Determining the composition of bacterial communities beyond the level of a genus or species is challenging because of the considerable overlap between genomes representing close relatives. Here, we present the mSWEEP pipeline for identifying and estimating the relative sequence abundances of bacterial lineages from plate sweeps of enrichment cultures. mSWEEP leverages biologically grouped sequence assembly databases, applying probabilistic modelling, and provides controls for false positive results. Using sequencing data from major pathogens, we demonstrate significant improvements in lineage quantification and detection accuracy. Our pipeline facilitates investigating cultures comprising mixtures of bacteria, and opens up a new field of plate sweep metagenomics.
Affiliation(s)
- Tommi Mäklin
- Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
| | - Teemu Kallonen
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK
| | - Sophia David
- Centre for Genomic Pathogen Surveillance, Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK
| | - Christine J. Boinett
- Hospital for Tropical Diseases, Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- Centre for Tropical Medicine and Global Health, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, UK
| | - Ben Pascoe
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, UK
| | - Guillaume Méric
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, UK
| | - David M. Aanensen
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK
- Department of Infectious Disease Epidemiology, Imperial College London, London, UK
- Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Medicine, University of Oxford, Oxford, UK
| | - Edward J. Feil
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, UK
| | - Stephen Baker
- Hospital for Tropical Diseases, Wellcome Trust Major Overseas Programme, Oxford University Clinical Research Unit, Ho Chi Minh City, Vietnam
- Centre for Tropical Medicine and Global Health, Nuffield Department of Clinical Medicine, University of Oxford, Oxford, UK
| | - Julian Parkhill
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK
| | - Samuel K. Sheppard
- The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath, UK
| | - Jukka Corander
- Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Department of Biostatistics, University of Oslo, Oslo, Norway
- Wellcome Trust Sanger Institute, Hinxton, Cambridgeshire, UK
| | - Antti Honkela
- Helsinki Institute for Information Technology HIIT, Department of Mathematics and Statistics, University of Helsinki, Helsinki, Finland
- Department of Public Health, University of Helsinki, Helsinki, Finland
- Helsinki Institute for Information Technology HIIT, Department of Computer Science, University of Helsinki, Helsinki, Finland
|
13
|
Accessing Dietary Effects on the Rumen Microbiome: Different Sequencing Methods Tell Different Stories. Vet Sci 2021; 8:vetsci8070138. [PMID: 34357930 PMCID: PMC8310016 DOI: 10.3390/vetsci8070138] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2021] [Revised: 07/02/2021] [Accepted: 07/14/2021] [Indexed: 12/29/2022] Open
Abstract
The current study employed both amplicon and shotgun sequencing to examine and compare the rumen microbiome in Angus bulls fed with either a backgrounding diet (BCK) or finishing diet (HG), to assess if both methods produce comparable results. Rumen digesta samples from 16 bulls were subjected for microbial profiling. Distinctive microbial profiles were revealed by the two methods, indicating that choice of sequencing approach may be a critical facet in studies of the rumen microbiome. Shotgun-sequencing identified the presence of 303 bacterial genera and 171 archaeal species, several of which exhibited differential abundance. Amplicon-sequencing identified 48 bacterial genera, 4 archaeal species, and 9 protozoal species. Among them, 20 bacterial genera and 5 protozoal species were differentially abundant between the two diets. Overall, amplicon-sequencing showed a more drastic diet-derived effect on the ruminal microbial profile compared to shotgun-sequencing. While both methods detected dietary differences at various taxonomic levels, few consistent patterns were evident. Opposite results were seen for the phyla Firmicutes and Bacteroidetes, and the genus Selenomonas. This study showcases the importance of sequencing platform choice and suggests a need for integrative methods that allow robust comparisons of microbial data drawn from various omic approaches, allowing for comprehensive comparisons across studies.
|
14
|
Zeng T, Yu X, Chen Z. Applying artificial intelligence in the microbiome for gastrointestinal diseases: A review. J Gastroenterol Hepatol 2021; 36:832-840. [PMID: 33880762 DOI: 10.1111/jgh.15503] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 01/09/2021] [Revised: 03/18/2021] [Accepted: 03/18/2021] [Indexed: 12/20/2022]
Abstract
For a long time, gut bacteria have been recognized for their important roles in the occurrence and progression of gastrointestinal diseases like colorectal cancer, and the ever-increasing amounts of microbiome data combined with other high-quality clinical and imaging datasets are leading the study of gastrointestinal diseases into an era of biomedical big data. The "omics" technologies used for microbiome analysis continuously evolve, and the machine learning or artificial intelligence technologies are key to extract the relevant information from microbiome data. This review intends to provide a focused summary of recent research and applications of microbiome big data and to discuss the use of artificial intelligence to combat gastrointestinal diseases.
Affiliation(s)
- Tao Zeng
- CAS Key Laboratory of Computational Biology, Bio-Med Big Data Center, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, Shanghai, China
- Key Laboratory of Systems Biology, Shanghai Institute of Biochemistry and Cell Biology, Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, Shanghai, China
| | - Xiangtian Yu
- Clinical Research Center, Shanghai Jiao Tong University Affiliated Sixth People's Hospital, Shanghai, China
| | - Zhangran Chen
- Institute for Microbial Ecology, School of Medicine, Xiamen University, Xiamen, China
|
15
|
Bharti R, Grimm DG. Current challenges and best-practice protocols for microbiome analysis. Brief Bioinform 2021; 22:178-193. [PMID: 31848574 PMCID: PMC7820839 DOI: 10.1093/bib/bbz155] [Citation(s) in RCA: 284] [Impact Index Per Article: 71.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/24/2019] [Revised: 10/23/2019] [Accepted: 11/06/2019] [Indexed: 12/15/2022] Open
Abstract
Analyzing the microbiome of diverse species and environments using next-generation sequencing techniques has significantly enhanced our understanding of the metabolic, physiological and ecological roles of environmental microorganisms. However, the analysis of the microbiome is affected by experimental conditions (e.g. sequencing errors and genomic repeats) and computationally intensive and cumbersome downstream analysis (e.g. quality control, assembly, binning and statistical analyses). Moreover, the introduction of new sequencing technologies and protocols led to a flood of new methodologies, which also have an immediate effect on the results of the analyses. The aim of this work is to review the most important workflows for 16S rRNA sequencing and shotgun and long-read metagenomics, as well as to provide best-practice protocols on experimental design, sample processing, sequencing, assembly, binning, annotation and visualization. To simplify and standardize the computational analysis, we provide a set of best-practice workflows for 16S rRNA and metagenomic sequencing data (available at https://github.com/grimmlab/MicrobiomeBestPracticeReview).
Affiliation(s)
- Richa Bharti
- Weihenstephan-Triesdorf University of Applied Sciences and Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Straubing, Germany
| | - Dominik G Grimm
- Weihenstephan-Triesdorf University of Applied Sciences and Technical University of Munich, TUM Campus Straubing for Biotechnology and Sustainability, Straubing, Germany
|
16
|
Tran Q, Phan V. Assembling Reads Improves Taxonomic Classification of Species. Genes (Basel) 2020; 11:E946. [PMID: 32824429 PMCID: PMC7465921 DOI: 10.3390/genes11080946] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/16/2020] [Revised: 08/11/2020] [Accepted: 08/13/2020] [Indexed: 11/22/2022] Open
Abstract
Most current approaches to metagenomic classification employ short next generation sequencing (NGS) reads that are present in metagenomic samples to identify unique genomic regions. NGS reads, however, might not be long enough to differentiate similar genomes. This suggests a potential for using longer reads to improve classification performance. Presently, longer reads tend to have a higher rate of sequencing errors. Thus, given the pros and cons, it remains unclear which type of reads is better for metagenomic classification. We compared two taxonomic classification protocols: a traditional assembly-free protocol and a novel assembly-based protocol. The novel assembly-based protocol consists of assembling short reads into longer reads, which are subsequently classified by a traditional taxonomic classifier. We discovered that most classifiers made fewer predictions with longer reads and that they achieved higher classification performance on synthetic metagenomic data. Generally, we observed a significant increase in precision, while having similar recall rates. On real data, we observed similar characteristics, which suggest that the classifiers might likewise achieve higher precision at similar recall with longer reads. We have shown a noticeable difference in performance between assembly-based and assembly-free taxonomic classification. This finding strongly suggests that classifying species in metagenomic environments can be achieved with higher overall performance simply by assembling short reads. Further, it also suggests that long-read technologies might be better for species classification.
Affiliation(s)
- Quang Tran
- Department of Computer Science, University of Memphis, Memphis, TN 38152, USA;
|
17
|
Sim M, Lee J, Lee D, Kwon D, Kim J. TAMA: improved metagenomic sequence classification through meta-analysis. BMC Bioinformatics 2020; 21:185. [PMID: 32397982 PMCID: PMC7218625 DOI: 10.1186/s12859-020-3533-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/16/2020] [Accepted: 05/05/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Microorganisms are important occupants of many different environments. Identifying the composition of microbes and estimating their abundance promote understanding of interactions of microbes in environmental samples. To understand their environments more deeply, the composition of microorganisms in environmental samples has been studied using metagenomes, which are the collections of genomes of the microorganisms. Although many tools have been developed for taxonomy analysis based on different algorithms, variability of analysis outputs of existing tools from the same input metagenome datasets is the main obstacle for many researchers in this field. RESULTS: Here, we present a novel meta-analysis tool for metagenome taxonomy analysis, called TAMA, by intelligently integrating outputs from three different taxonomy analysis tools. Using an integrated reference database, TAMA performs taxonomy assignment for input metagenome reads based on a meta-score by integrating scores of taxonomy assignment from different taxonomy classification tools. TAMA outperformed existing tools when evaluated using various benchmark datasets. It was also successfully applied to obtain relative species abundance profiles and difference in composition of microorganisms in two types of cheese metagenome and human gut metagenome. CONCLUSION: TAMA can be easily installed and used for metagenome read classification and the prediction of relative species abundance from multiple numbers and types of metagenome read samples. TAMA can be used to more accurately uncover the composition of microorganisms in metagenome samples collected from various environments, especially when the use of a single taxonomy analysis tool is unreliable. TAMA is an open source tool, and can be downloaded at https://github.com/jkimlab/TAMA.
Affiliation(s)
- Mikang Sim
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
| | - Jongin Lee
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
| | - Daehwan Lee
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
| | - Daehong Kwon
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
| | - Jaebum Kim
- Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea.
|
18
|
Shi W, Qi H, Sun Q, Fan G, Liu S, Wang J, Zhu B, Liu H, Zhao F, Wang X, Hu X, Li W, Liu J, Tian Y, Wu L, Ma J. gcMeta: a Global Catalogue of Metagenomics platform to support the archiving, standardization and analysis of microbiome data. Nucleic Acids Res 2019; 47:D637-D648. [PMID: 30365027 PMCID: PMC6324004 DOI: 10.1093/nar/gky1008] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/14/2018] [Accepted: 10/13/2018] [Indexed: 11/26/2022] Open
Abstract
Meta-omics approaches have been increasingly used to study the structure and function of the microbial communities. A variety of large-scale collaborative projects are being conducted to encompass samples from diverse environments and habitats. This change has resulted in enormous demands for long-term data maintenance and capacity for data analysis. The Global Catalogue of Metagenomics (gcMeta) is a part of the ‘Chinese Academy of Sciences Initiative of Microbiome (CAS-CMI)’, which focuses on studying the human and environmental microbiome, establishing depositories of samples, strains and data, as well as promoting international collaboration. To accommodate and rationally organize massive datasets derived from several thousands of human and environmental microbiome samples, gcMeta features a database management system for archiving and publishing data in a standardized way. Another main feature is the integration of more than ninety web-based data analysis tools and workflows through a Docker platform which enables data analysis by using various operating systems. This platform has been rapidly expanding, and now hosts data from the CAS-CMI and a number of other ongoing research projects. In conclusion, this platform presents a powerful and user-friendly service to support worldwide collaborative efforts in the field of meta-omics research. This platform is freely accessible at https://gcmeta.wdcm.org/.
Affiliation(s)
- Wenyu Shi
- Microbial Resource and Big Data Center, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
| | - Heyuan Qi
- Microbial Resource and Big Data Center, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
| | - Qinglan Sun
- Microbial Resource and Big Data Center, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
| | - Guomei Fan
- Microbial Resource and Big Data Center, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
| | - Shuangjiang Liu
- Microbial Resource and Big Data Center, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China.,State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
| | - Jun Wang
- CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Science, Beijing 100101, China
| | - Baoli Zhu
- CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Science, Beijing 100101, China.,University of Chinese Academy of Sciences, Beijing 100049, China.,Collaborative Innovation Centre for Diagnosis and Treatment of Infectious Diseases First Attainted Hospital, College of Medicine, Zhejiang University, Hangzhou 310058, China.,Beijing Key Laboratory of Antimicrobial Resistance and Pathogen Genomics, Beijing 100101, China
| | - Hongwei Liu
- State Key Laboratory of Mycology, Institute of Microbiology, Chinese Academy of Science, Beijing 100101, China
| | - Fangqing Zhao
- Computational Genomics Lab, Beijing Institutes of Life Science, Chinese Academy of Sciences, Beijing 100101, China
| | - Xiaochen Wang
- Microbial Resource and Big Data Center, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
| | - Xiaoxuan Hu
- Microbial Resource and Big Data Center, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
| | - Wei Li
- Microbial Resource and Big Data Center, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
| | - Jia Liu
- Internet of Things Information Technology and Application Laboratory, Computer Network Information Center, Chinese Academy of Sciences, Beijing 100101, China
| | - Ye Tian
- Internet of Things Information Technology and Application Laboratory, Computer Network Information Center, Chinese Academy of Sciences, Beijing 100101, China
| | - Linhuan Wu
- Microbial Resource and Big Data Center, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China.,State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
| | - Juncai Ma
- Microbial Resource and Big Data Center, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China.,State Key Laboratory of Microbial Resources, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China
|
19
|
Chan AWY, Naphtali J, Schellhorn HE. High-throughput DNA sequencing technologies for water and wastewater analysis. Sci Prog 2019; 102:351-376. [PMID: 31818206 PMCID: PMC10424514 DOI: 10.1177/0036850419881855] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 12/12/2022]
Abstract
Conventional microbiological water monitoring uses culture-dependent techniques to screen indicator microbial species such as Escherichia coli and fecal coliforms. With high-throughput, second-generation sequencing technologies becoming less expensive, water quality monitoring programs can now leverage the massively parallel nature of second-generation sequencing technologies for batch sample processing to simultaneously obtain compositional and functional information of culturable and as yet uncultured microbial organisms. This review provides an introduction to the technical capabilities and considerations necessary for the use of second-generation sequencing technologies, specifically 16S rDNA amplicon and whole-metagenome sequencing, to investigate the composition and functional potential of microbiomes found in water and wastewater systems.
Affiliation(s)
| | - James Naphtali
- Department of Biology, McMaster University, Hamilton, ON, Canada
|
20
|
Dadi TH, Siragusa E, Piro VC, Andrusch A, Seiler E, Renard BY, Reinert K. DREAM-Yara: an exact read mapper for very large databases with short update time. Bioinformatics 2019; 34:i766-i772. [PMID: 30423080 DOI: 10.1093/bioinformatics/bty567] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 12/30/2022] Open
Abstract
Motivation: Mapping-based approaches have become limited in their application to very large sets of references since computing an FM-index for very large databases (e.g. >10 GB) has become a bottleneck. This affects many analyses that need such an index as an essential step for approximate matching of the NGS reads to reference databases. For instance, in typical metagenomics analysis, the size of the reference sequences has become prohibitive to compute a single full-text index on standard machines. Even on large memory machines, computing such an index takes about 1 day of computing time. As a result, updates of indices are rarely performed. Hence, it is desirable to create an alternative way of indexing while preserving fast search times. Results: To solve the index construction and update problem we propose the DREAM (Dynamic seaRchablE pArallel coMpressed index) framework and provide an implementation. The main contribution is the introduction of an approximate search distributor via a novel use of Bloom filters. We combine several Bloom filters to form an interleaved Bloom filter and use this new data structure to quickly exclude reads for parts of the databases where they cannot match. This allows us to keep the databases in several indices which can be easily rebuilt if parts are updated while maintaining a fast search time. The second main contribution is an implementation of DREAM-Yara, a distributed version of a fully sensitive read mapper under the DREAM framework. Availability and implementation: https://gitlab.com/pirovc/dream_yara/.
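The interleaved Bloom filter idea described in this abstract can be illustrated with a minimal Python sketch. This is not the DREAM-Yara implementation: the class name, hash choice, k-mer size and threshold are all assumptions made for illustration. Each hash position holds one bit per database bin, so a single lookup per hash function yields, after AND-ing the rows, the set of bins that may contain a k-mer. Bins with too few k-mer hits can then be skipped for a read.

```python
import hashlib


class InterleavedBloomFilter:
    """Toy interleaved Bloom filter (hypothetical, for illustration only)."""

    def __init__(self, num_bins, rows, num_hashes=3):
        self.num_bins = num_bins
        self.rows = rows
        self.num_hashes = num_hashes
        # One integer per hash position; bit i of a row belongs to bin i.
        self.table = [0] * rows

    def _positions(self, kmer):
        # Derive num_hashes row indices from salted BLAKE2b digests.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(kmer.encode(), salt=bytes([seed])).digest()
            yield int.from_bytes(digest[:8], "big") % self.rows

    def add(self, kmer, bin_id):
        for pos in self._positions(kmer):
            self.table[pos] |= 1 << bin_id

    def bins_for(self, kmer):
        # AND the rows: a bin's bit survives only if all its hash bits are set.
        mask = (1 << self.num_bins) - 1
        for pos in self._positions(kmer):
            mask &= self.table[pos]
        return mask


def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def candidate_bins(ibf, read, k, min_hits):
    """Return bins where at least min_hits k-mers of the read may occur."""
    counts = [0] * ibf.num_bins
    for kmer in kmers(read, k):
        mask = ibf.bins_for(kmer)
        for b in range(ibf.num_bins):
            if (mask >> b) & 1:
                counts[b] += 1
    return [b for b, c in enumerate(counts) if c >= min_hits]


# Two toy reference bins; the read is a substring of the bin 0 reference.
ibf = InterleavedBloomFilter(num_bins=2, rows=4096)
for km in kmers("ACGTACGTTGCA", 4):
    ibf.add(km, 0)
for km in kmers("TTTTGGGGCCCC", 4):
    ibf.add(km, 1)
print(candidate_bins(ibf, "GTACGTTG", 4, min_hits=5))
```

Like any Bloom filter, this sketch has no false negatives, so bin 0 is always reported for the example read. Other bins can appear as false positives, which in a real mapper would only cost an unnecessary search in that bin's index.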
Affiliation(s)
| | - Enrico Siragusa
- Computational Genomics, IBM Thomas J Watson Research Center, Yorktown Heights, NY, USA
| | - Vitor C Piro
- Bioinformatics Unit (MF1), Robert Koch Institute, Berlin, Germany.,CAPES Foundation, Ministry of Education of Brazil, Brasília DF, Brazil
| | - Andreas Andrusch
- Centre for Biological Threats and Special Pathogens (ZBS1), Robert Koch Institute, Berlin, Germany
| | - Enrico Seiler
- Algorithmic Bioinformatics, Institute for Bioinformatics, FU Berlin, Berlin, Germany
| | | | - Knut Reinert
- Algorithmic Bioinformatics, Institute for Bioinformatics, FU Berlin, Berlin, Germany
|
21
|
Breitwieser FP, Lu J, Salzberg SL. A review of methods and databases for metagenomic classification and assembly. Brief Bioinform 2019; 20:1125-1136. [PMID: 29028872 PMCID: PMC6781581 DOI: 10.1093/bib/bbx120] [Citation(s) in RCA: 294] [Impact Index Per Article: 49.0] [Received: 06/07/2017] [Revised: 08/22/2017] [Indexed: 12/13/2022] Open
Abstract
Microbiome research has grown rapidly over the past decade, with a proliferation of new methods that seek to make sense of large, complex data sets. Here, we survey two of the primary types of methods for analyzing microbiome data: read classification and metagenomic assembly, and we review some of the challenges facing these methods. All of the methods rely on public genome databases, and we also discuss the content of these databases and how their quality has a direct impact on our ability to interpret a microbiome sample.
Affiliation(s)
| | | | - Steven L Salzberg
- Corresponding author: Steven L. Salzberg, Center for Computational Biology, Johns Hopkins University, 1900 E. Monument St., Baltimore, MD, 21205, USA. E-mail:
|
22
|
Seiler E, Trappe K, Renard BY. Where did you come from, where did you go: Refining metagenomic analysis tools for horizontal gene transfer characterisation. PLoS Comput Biol 2019; 15:e1007208. [PMID: 31335917 PMCID: PMC6677323 DOI: 10.1371/journal.pcbi.1007208] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/11/2019] [Revised: 08/02/2019] [Accepted: 06/24/2019] [Indexed: 12/22/2022] Open
Abstract
Horizontal gene transfer (HGT) has changed the way we regard evolution. Instead of waiting for the next generation to establish new traits, bacteria in particular can take a shortcut via HGT, which enables them to pass genes from one individual to another, even across species boundaries. The tool Daisy offers the first HGT detection approach based on read mapping that provides complementary evidence compared to existing methods. However, Daisy relies on the acceptor and donor organisms involved in the HGT being known. We introduce DaisyGPS, a mapping-based pipeline that is able to identify acceptor and donor reference candidates of an HGT event based on sequencing reads. Acceptor and donor identification is akin to species identification in metagenomic samples based on sequencing reads, a problem addressed by metagenomic profiling tools. However, acceptor and donor references have certain properties such that these methods cannot be directly applied. DaisyGPS uses MicrobeGPS, a metagenomic profiling tool tailored towards estimating the genomic distance between organisms in the sample and the reference database. We enhance the underlying scoring system of MicrobeGPS to account for the sequence patterns in terms of mapping coverage of an acceptor and donor involved in an HGT event, and report a ranked list of reference candidates. These candidates can then be further evaluated by tools like Daisy to establish HGT regions. We successfully validated our approach on both simulated and real data, and show its benefits in an investigation of an outbreak involving methicillin-resistant Staphylococcus aureus data.
Affiliation(s)
- Enrico Seiler
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
- Efficient Algorithms for Omics Data, Max Planck Institute for Molecular Genetics, and Algorithmic Bioinformatics, Institute for Bioinformatics, Freie Universität Berlin, Berlin, Germany
| | - Kathrin Trappe
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
| | - Bernhard Y. Renard
- Bioinformatics Unit (MF1), Department for Methods Development and Research Infrastructure, Robert Koch Institute, Berlin, Germany
|
23
|
Meyer F, Bremges A, Belmann P, Janssen S, McHardy AC, Koslicki D. Assessing taxonomic metagenome profilers with OPAL. Genome Biol 2019; 20:51. [PMID: 30832730 PMCID: PMC6398228 DOI: 10.1186/s13059-019-1646-y] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 08/08/2018] [Accepted: 01/31/2019] [Indexed: 12/25/2022] Open
Abstract
The explosive growth in taxonomic metagenome profiling methods over the past years has created a need for systematic comparisons using relevant performance criteria. The Open-community Profiling Assessment tooL (OPAL) implements commonly used performance metrics, including those of the first challenge of the initiative for the Critical Assessment of Metagenome Interpretation (CAMI), together with convenient visualizations. In addition, we perform in-depth performance comparisons with seven profilers on datasets of CAMI and the Human Microbiome Project. OPAL is freely available at https://github.com/CAMI-challenge/OPAL.
Affiliation(s)
- Fernando Meyer
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Andreas Bremges
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Braunschweig, Germany
| | - Peter Belmann
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- Faculty of Technology and Center for Biotechnology, Bielefeld University, Bielefeld, Germany
| | - Stefan Janssen
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Braunschweig, Germany
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Department of Pediatric Oncology, Hematology and Clinical Immunology, Heinrich-Heine University Dusseldorf, Dusseldorf, Germany
| | - Alice C McHardy
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany.
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany.
| | - David Koslicki
- Mathematics Department, Oregon State University, Corvallis, OR, USA.
|
24
|
Nasko DJ, Koren S, Phillippy AM, Treangen TJ. RefSeq database growth influences the accuracy of k-mer-based lowest common ancestor species identification. Genome Biol 2018; 19:165. [PMID: 30373669 PMCID: PMC6206640 DOI: 10.1186/s13059-018-1554-6] [Citation(s) in RCA: 91] [Impact Index Per Article: 13.0] [Received: 05/02/2018] [Accepted: 10/01/2018] [Indexed: 12/05/2022] Open
Abstract
In order to determine the role of the database in taxonomic sequence classification, we examine the influence of the database over time on k-mer-based lowest common ancestor taxonomic classification. We present three major findings: the number of new species added to the NCBI RefSeq database greatly outpaces the number of new genera; as a result, more reads are classified with newer database versions, but fewer are classified at the species level; and Bayesian-based re-estimation mitigates this effect but struggles with novel genomes. These results suggest a need for new classification approaches specially adapted for large databases.
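The effect described in this abstract can be reproduced in a minimal Python sketch of k-mer-based lowest common ancestor (LCA) assignment. The taxonomy, genome strings and function names are hypothetical, and the read-level rule (LCA of all k-mer hits) is a simplification rather than the scoring of any specific classifier. The sketch shows how adding a closely related species to the database moves shared k-mers, and with them the read, from species level up to genus level.

```python
# Hypothetical toy taxonomy: child -> parent.
PARENT = {
    "root": None,
    "genusA": "root",
    "speciesA1": "genusA",
    "speciesA2": "genusA",
}


def lineage(taxon):
    """Return the path from a taxon up to the root (taxon first)."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def lca(a, b):
    """Lowest common ancestor of two taxa in the toy taxonomy."""
    ancestors = set(lineage(a))
    for taxon in lineage(b):
        if taxon in ancestors:
            return taxon
    raise ValueError("taxa share no ancestor")


def build_db(genomes, k):
    """Map every k-mer to the LCA of all taxa whose genome contains it."""
    db = {}
    for taxon, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            db[kmer] = lca(db[kmer], taxon) if kmer in db else taxon
    return db


def classify(read, db, k):
    """Assign a read to the LCA of all its k-mer hits (simplified rule)."""
    hits = [db[read[i:i + k]] for i in range(len(read) - k + 1)
            if read[i:i + k] in db]
    if not hits:
        return None
    result = hits[0]
    for taxon in hits[1:]:
        result = lca(result, taxon)
    return result


K = 5
old_db = build_db({"speciesA1": "ACGTACGTGG"}, K)
new_db = build_db({"speciesA1": "ACGTACGTGG", "speciesA2": "ACGTACGTCC"}, K)

read = "ACGTACG"
print(classify(read, old_db, K))  # speciesA1
print(classify(read, new_db, K))  # genusA
```

With the older, smaller database the read receives a species-level label. Once the second species is added, every k-mer in the read is shared within the genus, so the same read can only be placed at genus level.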
Affiliation(s)
- Daniel J Nasko
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA
| | - Sergey Koren
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
| | - Adam M Phillippy
- Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, Bethesda, MD, USA
| | - Todd J Treangen
- Department of Computer Science, Rice University, Houston, TX, USA.
|
25
|
Nooij S, Schmitz D, Vennema H, Kroneman A, Koopmans MPG. Overview of Virus Metagenomic Classification Methods and Their Biological Applications. Front Microbiol 2018; 9:749. [PMID: 29740407 PMCID: PMC5924777 DOI: 10.3389/fmicb.2018.00749] [Citation(s) in RCA: 83] [Impact Index Per Article: 11.9] [Received: 12/08/2017] [Accepted: 04/03/2018] [Indexed: 12/20/2022] Open
Abstract
Metagenomics poses opportunities for clinical and public health virology applications by offering a way to assess complete taxonomic composition of a clinical sample in an unbiased way. However, the techniques required are complicated and analysis standards have yet to develop. This, together with the wealth of different tools and workflows that have been proposed, poses a barrier for new users. We evaluated 49 published computational classification workflows for virus metagenomics in a literature review. To this end, we described the methods of existing workflows by breaking them up into five general steps and assessed their ease-of-use and validation experiments. Performance scores of previous benchmarks were summarized and correlations between methods and performance were investigated. We indicate the potential suitability of the different workflows for (1) time-constrained diagnostics, (2) surveillance and outbreak source tracing, (3) detection of remote homologies (discovery), and (4) biodiversity studies. We provide two decision trees for virologists to help select a workflow for medical or biodiversity studies, as well as directions for future developments in clinical viral metagenomics.
Affiliation(s)
- Sam Nooij
- Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands.,Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
| | - Dennis Schmitz
- Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands.,Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
| | - Harry Vennema
- Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
| | - Annelies Kroneman
- Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands
| | - Marion P G Koopmans
- Emerging and Endemic Viruses, Centre for Infectious Disease Control, National Institute for Public Health and the Environment (RIVM), Bilthoven, Netherlands.,Viroscience Laboratory, Erasmus University Medical Centre, Rotterdam, Netherlands
|
26
|
Neves ALA, Li F, Ghoshal B, McAllister T, Guan LL. Enhancing the Resolution of Rumen Microbial Classification from Metatranscriptomic Data Using Kraken and Mothur. Front Microbiol 2017; 8:2445. [PMID: 29270165 PMCID: PMC5725470 DOI: 10.3389/fmicb.2017.02445] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 09/28/2017] [Accepted: 11/24/2017] [Indexed: 12/23/2022] Open
Abstract
The advent of next-generation sequencing and bioinformatics tools has greatly advanced our knowledge about the phylogenetic diversity and ecological role of microbes inhabiting the mammalian gut. However, there is a lack of information on the evaluation of these computational tools in the context of the rumen microbiome as these programs have mostly been benchmarked on real or simulated datasets generated from human studies. In this study, we compared the outcomes of two methods, Kraken (mRNA based) and a pipeline developed in-house based on Mothur (16S rRNA based), to assess the taxonomic profiles (bacteria and archaea) of rumen microbial communities using total RNA sequencing of rumen fluid collected from 12 cattle with differing feed conversion ratios (FCR). Both approaches revealed a similar phyla distribution of the most abundant taxa, with Bacteroidetes, Firmicutes, and Proteobacteria accounting for approximately 80% of total bacterial abundance. For bacterial taxa, although 69 genera were commonly detected by both methods, an additional 159 genera were exclusively identified by Kraken. Kraken detected 423 species, while Mothur was not able to assign bacterial sequences to the species level. For archaea, both methods generated similar results only for the abundance of Methanomassiliicoccaceae (previously referred to as RCC), which comprised more than 65% of the total archaeal families. Taxon R4-41B was exclusively identified by Mothur in the rumen of feed-efficient bulls, whereas Kraken uniquely identified Methanococcaceae in inefficient bulls. Although Kraken enhanced the microbial classification at the species level, identification of bacteria or archaea in the rumen is limited due to a lack of reference genomes for the rumen microbiome. The findings from this study suggest that the development of combined pipelines using Mothur and Kraken is needed for a more inclusive and representative classification of microbiomes.
Affiliation(s)
- Andre L A Neves
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada
| | - Fuyong Li
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada
| | - Bibaswan Ghoshal
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada
| | - Tim McAllister
- Lethbridge Research Centre, Agriculture and Agri-Food Canada, Lethbridge, AB, Canada
| | - Le L Guan
- Department of Agricultural, Food and Nutritional Science, University of Alberta, Edmonton, AB, Canada
|
27
|
Sczyrba A, Hofmann P, Belmann P, Koslicki D, Janssen S, Dröge J, Gregor I, Majda S, Fiedler J, Dahms E, Bremges A, Fritz A, Garrido-Oter R, Jørgensen TS, Shapiro N, Blood PD, Gurevich A, Bai Y, Turaev D, DeMaere MZ, Chikhi R, Nagarajan N, Quince C, Meyer F, Balvočiūtė M, Hansen LH, Sørensen SJ, Chia BKH, Denis B, Froula JL, Wang Z, Egan R, Don Kang D, Cook JJ, Deltel C, Beckstette M, Lemaitre C, Peterlongo P, Rizk G, Lavenier D, Wu YW, Singer SW, Jain C, Strous M, Klingenberg H, Meinicke P, Barton MD, Lingner T, Lin HH, Liao YC, Silva GGZ, Cuevas DA, Edwards RA, Saha S, Piro VC, Renard BY, Pop M, Klenk HP, Göker M, Kyrpides NC, Woyke T, Vorholt JA, Schulze-Lefert P, Rubin EM, Darling AE, Rattei T, McHardy AC. Critical Assessment of Metagenome Interpretation-a benchmark of metagenomics software. Nat Methods 2017; 14:1063-1071. [PMID: 28967888 DOI: 10.1101/099127] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 12/29/2016] [Accepted: 08/25/2017] [Indexed: 05/25/2023]
Abstract
Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.
Affiliation(s)
- Alexander Sczyrba
- Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Center for Biotechnology, Bielefeld University, Bielefeld, Germany
| | - Peter Hofmann
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Peter Belmann
- Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - David Koslicki
- Mathematics Department, Oregon State University, Corvallis, Oregon, USA
| | - Stefan Janssen
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Department of Pediatrics, University of California, San Diego, California, USA
- Department of Computer Science and Engineering, University of California, San Diego, California, USA
| | - Johannes Dröge
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Ivan Gregor
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Stephan Majda
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
| | - Jessika Fiedler
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
| | - Eik Dahms
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Andreas Bremges
- Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Center for Biotechnology, Bielefeld University, Bielefeld, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Braunschweig, Germany
| | - Adrian Fritz
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Ruben Garrido-Oter
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- Department of Plant Microbe Interactions, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS)
| | - Tue Sparholt Jørgensen
- Department of Environmental Science, Section of Environmental microbiology and Biotechnology, Aarhus University, Roskilde, Denmark
- Department of Microbiology, University of Copenhagen, Copenhagen, Denmark
- Department of Science and Environment, Roskilde University, Roskilde, Denmark
| | - Nicole Shapiro
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Philip D Blood
- Pittsburgh Supercomputing Center, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
| | - Alexey Gurevich
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
| | - Yang Bai
- Department of Plant Microbe Interactions, Max Planck Institute for Plant Breeding Research, Cologne, Germany
| | - Dmitrij Turaev
- Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
| | - Matthew Z DeMaere
- The ithree institute, University of Technology Sydney, Sydney, New South Wales, Australia
| | - Rayan Chikhi
- Department of Computer Science, Research Center in Computer Science (CRIStAL), Signal and Automatic Control of Lille, Lille, France
- National Centre of the Scientific Research (CNRS), Rennes, France
| | - Niranjan Nagarajan
- Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore
| | - Christopher Quince
- Department of Microbiology and Infection, Warwick Medical School, University of Warwick, Coventry, UK
| | - Fernando Meyer
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Monika Balvočiūtė
- Department of Computer Science, University of Tuebingen, Tuebingen, Germany
| | - Lars Hestbjerg Hansen
- Department of Environmental Science, Section of Environmental microbiology and Biotechnology, Aarhus University, Roskilde, Denmark
| | - Søren J Sørensen
- Department of Microbiology, University of Copenhagen, Copenhagen, Denmark
| | - Burton K H Chia
- Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore
| | - Bertrand Denis
- Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore
| | - Jeff L Froula
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Zhong Wang
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Robert Egan
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Dongwan Don Kang
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | | | - Charles Deltel
- GenScale-Bioinformatics Research Team, Inria Rennes-Bretagne Atlantique Research Centre, Rennes, France
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
| | - Michael Beckstette
- Department of Molecular Infection Biology, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Claire Lemaitre
- GenScale-Bioinformatics Research Team, Inria Rennes-Bretagne Atlantique Research Centre, Rennes, France
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
| | - Pierre Peterlongo
- GenScale-Bioinformatics Research Team, Inria Rennes-Bretagne Atlantique Research Centre, Rennes, France
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
| | - Guillaume Rizk
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
- Algorizk-IT consulting and software systems, Paris, France
| | - Dominique Lavenier
- National Centre of the Scientific Research (CNRS), Rennes, France
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
| | - Yu-Wei Wu
- Joint BioEnergy Institute, Emeryville, California, USA
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
| | - Steven W Singer
- Joint BioEnergy Institute, Emeryville, California, USA
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Chirag Jain
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
| | - Marc Strous
- Energy Engineering and Geomicrobiology, University of Calgary, Calgary, Alberta, Canada
| | - Heiner Klingenberg
- Department of Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Goettingen, Germany
| | - Peter Meinicke
- Department of Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Goettingen, Germany
| | - Michael D Barton
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | | | - Hsin-Hung Lin
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan Town, Taiwan
| | - Yu-Chieh Liao
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan Town, Taiwan
| | | | - Daniel A Cuevas
- Computational Science Research Center, San Diego State University, San Diego, California, USA
| | - Robert A Edwards
- Computational Science Research Center, San Diego State University, San Diego, California, USA
| | - Surya Saha
- Boyce Thompson Institute for Plant Research, New York, New York, USA
| | - Vitor C Piro
- Research Group Bioinformatics (NG4), Robert Koch Institute, Berlin, Germany
- Coordination for the Improvement of Higher Education Personnel (CAPES) Foundation, Ministry of Education of Brazil, Brasília, Brazil
| | - Bernhard Y Renard
- Research Group Bioinformatics (NG4), Robert Koch Institute, Berlin, Germany
| | - Mihai Pop
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA
- Department of Computer Science, University of Maryland, College Park, Maryland, USA
| | - Hans-Peter Klenk
- School of Biology, Newcastle University, Newcastle upon Tyne, UK
| | - Markus Göker
- Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
| | - Nikos C Kyrpides
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Tanja Woyke
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | | | - Paul Schulze-Lefert
- Department of Plant Microbe Interactions, Max Planck Institute for Plant Breeding Research, Cologne, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS)
| | - Edward M Rubin
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Aaron E Darling
- The ithree institute, University of Technology Sydney, Sydney, New South Wales, Australia
| | - Thomas Rattei
- Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
| | - Alice C McHardy
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
- Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
- Cluster of Excellence on Plant Sciences (CEPLAS)
|
28
|
Beisser D, Graupner N, Grossmann L, Timm H, Boenigk J, Rahmann S. TaxMapper: an analysis tool, reference database and workflow for metatranscriptome analysis of eukaryotic microorganisms. BMC Genomics 2017; 18:787. [PMID: 29037173 PMCID: PMC5644092 DOI: 10.1186/s12864-017-4168-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 07/13/2017] [Accepted: 10/05/2017] [Indexed: 12/17/2022] Open
Abstract
Background: High-throughput sequencing (HTS) technologies are increasingly applied to analyse complex microbial ecosystems by mRNA sequencing of whole communities, also known as metatranscriptome sequencing. This approach is at the moment largely limited to prokaryotic communities and communities of few eukaryotic species with sequenced genomes. For eukaryotes the analysis is hindered mainly by a low and fragmented coverage of the reference databases to infer the community composition, but also by lack of automated workflows for the task. Results: From the databases of the National Center for Biotechnology Information and Marine Microbial Eukaryote Transcriptome Sequencing Project, 142 references were selected in such a way that the taxa represent the main lineages within each of the seven supergroups of eukaryotes and possess predominantly complete transcriptomes or genomes. From these references, we created an annotated microeukaryotic reference database. We developed a tool called TaxMapper for reliable mapping of sequencing reads against this database and filtering of unreliable assignments. For filtering, a classifier was trained and tested on each of the following: sequences of taxa in the database, sequences of taxa related to those in the database, and random sequences. Additionally, TaxMapper is part of a metatranscriptomic Snakemake workflow developed to perform quality assessment, functional and taxonomic annotation and (multivariate) statistical analysis including environmental data. The workflow is provided and described in detail to empower researchers to apply it for metatranscriptome analysis of any environmental sample. Conclusions: TaxMapper shows superior performance compared to standard approaches, resulting in a higher number of true positive taxonomic assignments.
Both the TaxMapper tool and the workflow are available as open-source code at Bitbucket under the MIT license: https://bitbucket.org/dbeisser/taxmapper and as a Bioconda package: https://bioconda.github.io/recipes/taxmapper/README.html. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-017-4168-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Daniela Beisser
- Biodiversity, University of Duisburg-Essen, Universitätsstr. 5, Essen, 45141, Germany
| | - Nadine Graupner
- Biodiversity, University of Duisburg-Essen, Universitätsstr. 5, Essen, 45141, Germany
| | - Lars Grossmann
- Biodiversity, University of Duisburg-Essen, Universitätsstr. 5, Essen, 45141, Germany
| | - Henning Timm
- Genome Informatics, University of Duisburg-Essen, University Hospital Essen, Hufelandstr. 55, Essen, 45147, Germany
| | - Jens Boenigk
- Biodiversity, University of Duisburg-Essen, Universitätsstr. 5, Essen, 45141, Germany
| | - Sven Rahmann
- Genome Informatics, University of Duisburg-Essen, University Hospital Essen, Hufelandstr. 55, Essen, 45147, Germany
|
29
|
Sczyrba A, Hofmann P, Belmann P, Koslicki D, Janssen S, Dröge J, Gregor I, Majda S, Fiedler J, Dahms E, Bremges A, Fritz A, Garrido-Oter R, Jørgensen TS, Shapiro N, Blood PD, Gurevich A, Bai Y, Turaev D, DeMaere MZ, Chikhi R, Nagarajan N, Quince C, Meyer F, Balvočiūtė M, Hansen LH, Sørensen SJ, Chia BKH, Denis B, Froula JL, Wang Z, Egan R, Don Kang D, Cook JJ, Deltel C, Beckstette M, Lemaitre C, Peterlongo P, Rizk G, Lavenier D, Wu YW, Singer SW, Jain C, Strous M, Klingenberg H, Meinicke P, Barton MD, Lingner T, Lin HH, Liao YC, Silva GGZ, Cuevas DA, Edwards RA, Saha S, Piro VC, Renard BY, Pop M, Klenk HP, Göker M, Kyrpides NC, Woyke T, Vorholt JA, Schulze-Lefert P, Rubin EM, Darling AE, Rattei T, McHardy AC. Critical Assessment of Metagenome Interpretation-a benchmark of metagenomics software. Nat Methods 2017; 14:1063-1071. [PMID: 28967888 DOI: 10.1038/nmeth.4458] [Citation(s) in RCA: 487] [Impact Index Per Article: 60.9] [Received: 12/29/2016] [Accepted: 08/25/2017] [Indexed: 12/12/2022]
Abstract
Methods for assembly, taxonomic profiling and binning are key to interpreting metagenome data, but a lack of consensus about benchmarking complicates performance assessment. The Critical Assessment of Metagenome Interpretation (CAMI) challenge has engaged the global developer community to benchmark their programs on highly complex and realistic data sets, generated from ∼700 newly sequenced microorganisms and ∼600 novel viruses and plasmids and representing common experimental setups. Assembly and genome binning programs performed well for species represented by individual genomes but were substantially affected by the presence of related strains. Taxonomic profiling and binning programs were proficient at high taxonomic ranks, with a notable performance decrease below family level. Parameter settings markedly affected performance, underscoring their importance for program reproducibility. The CAMI results highlight current challenges but also provide a roadmap for software selection to answer specific research questions.
Affiliation(s)
- Alexander Sczyrba
- Faculty of Technology, Bielefeld University, Bielefeld, Germany; Center for Biotechnology, Bielefeld University, Bielefeld, Germany
| | - Peter Hofmann
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany; Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Peter Belmann
- Faculty of Technology, Bielefeld University, Bielefeld, Germany; Center for Biotechnology, Bielefeld University, Bielefeld, Germany; Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - David Koslicki
- Mathematics Department, Oregon State University, Corvallis, Oregon, USA
| | - Stefan Janssen
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany; Department of Pediatrics, University of California, San Diego, California, USA; Department of Computer Science and Engineering, University of California, San Diego, California, USA
| | - Johannes Dröge
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany; Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Ivan Gregor
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany; Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Stephan Majda
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany
| | - Jessika Fiedler
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany; Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany
| | - Eik Dahms
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany; Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Andreas Bremges
- Faculty of Technology, Bielefeld University, Bielefeld, Germany; Center for Biotechnology, Bielefeld University, Bielefeld, Germany; Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany; German Center for Infection Research (DZIF), partner site Hannover-Braunschweig, Braunschweig, Germany
| | - Adrian Fritz
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Ruben Garrido-Oter
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany; Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany; Department of Plant Microbe Interactions, Max Planck Institute for Plant Breeding Research, Cologne, Germany; Cluster of Excellence on Plant Sciences (CEPLAS)
| | - Tue Sparholt Jørgensen
- Department of Environmental Science, Section of Environmental microbiology and Biotechnology, Aarhus University, Roskilde, Denmark; Department of Microbiology, University of Copenhagen, Copenhagen, Denmark; Department of Science and Environment, Roskilde University, Roskilde, Denmark
| | - Nicole Shapiro
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Philip D Blood
- Pittsburgh Supercomputing Center, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
| | - Alexey Gurevich
- Center for Algorithmic Biotechnology, Institute of Translational Biomedicine, Saint Petersburg State University, Saint Petersburg, Russia
| | - Yang Bai
- Department of Plant Microbe Interactions, Max Planck Institute for Plant Breeding Research, Cologne, Germany
| | - Dmitrij Turaev
- Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
| | - Matthew Z DeMaere
- The ithree institute, University of Technology Sydney, Sydney, New South Wales, Australia
| | - Rayan Chikhi
- Department of Computer Science, Research Center in Computer Science (CRIStAL), Signal and Automatic Control of Lille, Lille, France; National Centre of the Scientific Research (CNRS), Rennes, France
| | - Niranjan Nagarajan
- Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore
| | - Christopher Quince
- Department of Microbiology and Infection, Warwick Medical School, University of Warwick, Coventry, UK
| | - Fernando Meyer
- Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany
| | - Monika Balvočiūtė
- Department of Computer Science, University of Tuebingen, Tuebingen, Germany
| | - Lars Hestbjerg Hansen
- Department of Environmental Science, Section of Environmental microbiology and Biotechnology, Aarhus University, Roskilde, Denmark
| | - Søren J Sørensen
- Department of Microbiology, University of Copenhagen, Copenhagen, Denmark
| | - Burton K H Chia
- Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore
| | - Bertrand Denis
- Department of Computational and Systems Biology, Genome Institute of Singapore, Singapore
| | - Jeff L Froula
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Zhong Wang
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Robert Egan
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Dongwan Don Kang
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | | | - Charles Deltel
- GenScale-Bioinformatics Research Team, Inria Rennes-Bretagne Atlantique Research Centre, Rennes, France; Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
| | - Michael Beckstette
- Department of Molecular Infection Biology, Helmholtz Centre for Infection Research, Braunschweig, Germany
| | - Claire Lemaitre
- GenScale-Bioinformatics Research Team, Inria Rennes-Bretagne Atlantique Research Centre, Rennes, France; Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
| | - Pierre Peterlongo
- GenScale-Bioinformatics Research Team, Inria Rennes-Bretagne Atlantique Research Centre, Rennes, France; Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
| | - Guillaume Rizk
- Institute of Research in Informatics and Random Systems (IRISA), Rennes, France; Algorizk-IT consulting and software systems, Paris, France
| | - Dominique Lavenier
- National Centre of the Scientific Research (CNRS), Rennes, France; Institute of Research in Informatics and Random Systems (IRISA), Rennes, France
| | - Yu-Wei Wu
- Joint BioEnergy Institute, Emeryville, California, USA; Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei, Taiwan
| | - Steven W Singer
- Joint BioEnergy Institute, Emeryville, California, USA; Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
| | - Chirag Jain
- School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
| | - Marc Strous
- Energy Engineering and Geomicrobiology, University of Calgary, Calgary, Alberta, Canada
| | - Heiner Klingenberg
- Department of Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Goettingen, Germany
| | - Peter Meinicke
- Department of Bioinformatics, Institute for Microbiology and Genetics, University of Goettingen, Goettingen, Germany
| | - Michael D Barton
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | | | - Hsin-Hung Lin
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan Town, Taiwan
| | - Yu-Chieh Liao
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan Town, Taiwan
| | | | - Daniel A Cuevas
- Computational Science Research Center, San Diego State University, San Diego, California, USA
| | - Robert A Edwards
- Computational Science Research Center, San Diego State University, San Diego, California, USA
| | - Surya Saha
- Boyce Thompson Institute for Plant Research, New York, New York, USA
| | - Vitor C Piro
- Research Group Bioinformatics (NG4), Robert Koch Institute, Berlin, Germany; Coordination for the Improvement of Higher Education Personnel (CAPES) Foundation, Ministry of Education of Brazil, Brasília, Brazil
| | - Bernhard Y Renard
- Research Group Bioinformatics (NG4), Robert Koch Institute, Berlin, Germany
| | - Mihai Pop
- Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, USA; Department of Computer Science, University of Maryland, College Park, Maryland, USA
| | - Hans-Peter Klenk
- School of Biology, Newcastle University, Newcastle upon Tyne, UK
| | - Markus Göker
- Leibniz Institute DSMZ-German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany
| | - Nikos C Kyrpides
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Tanja Woyke
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | | | - Paul Schulze-Lefert
- Department of Plant Microbe Interactions, Max Planck Institute for Plant Breeding Research, Cologne, Germany; Cluster of Excellence on Plant Sciences (CEPLAS)
| | - Edward M Rubin
- Department of Energy, Joint Genome Institute, Walnut Creek, California, USA
| | - Aaron E Darling
- The ithree institute, University of Technology Sydney, Sydney, New South Wales, Australia
| | - Thomas Rattei
- Department of Microbiology and Ecosystem Science, University of Vienna, Vienna, Austria
| | - Alice C McHardy
- Formerly Department of Algorithmic Bioinformatics, Heinrich Heine University (HHU), Duesseldorf, Germany; Department of Computational Biology of Infection Research, Helmholtz Centre for Infection Research (HZI), Braunschweig, Germany; Braunschweig Integrated Centre of Systems Biology (BRICS), Braunschweig, Germany; Cluster of Excellence on Plant Sciences (CEPLAS)
|
30
|
Abstract
Microbiome analysis involves determining the composition and function of a community of microorganisms in a particular location. For the gastroenterologist, this technology opens up a rapidly evolving set of challenges and opportunities for generating novel insights into the health of patients on the basis of microbiota characterizations from intestinal, hepatic or extraintestinal samples. Alterations in gut microbiota composition correlate with intestinal and extraintestinal disease and, although only a few mechanisms are known, the microbiota are still an attractive target for developing biomarkers for disease detection and management as well as potential therapeutic applications. In this Review, we summarize the major decision points confronting new entrants to the field or for those designing new projects in microbiome research. We provide recommendations based on current technology options and our experience of sequencing platform choices. We also offer perspectives on future applications of microbiome research, which we hope convey the promise of this technology for clinical applications.
|
31
|
Olson ND, Zook JM, Morrow JB, Lin NJ. Challenging a bioinformatic tool's ability to detect microbial contaminants using in silico whole genome sequencing data. PeerJ 2017; 5:e3729. [PMID: 28924496 PMCID: PMC5600177 DOI: 10.7717/peerj.3729] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 04/03/2017] [Accepted: 08/02/2017] [Indexed: 12/20/2022] Open
Abstract
High sensitivity methods such as next generation sequencing and polymerase chain reaction (PCR) are adversely impacted by organismal and DNA contaminants. Current methods for detecting contaminants in microbial materials (genomic DNA and cultures) are not sensitive enough and require either a known or culturable contaminant. Whole genome sequencing (WGS) is a promising approach for detecting contaminants due to its sensitivity and lack of need for a priori assumptions about the contaminant. Prior to applying WGS, we must first understand its limitations for detecting contaminants and potential for false positives. Herein we demonstrate and characterize a WGS-based approach to detect organismal contaminants using an existing metagenomic taxonomic classification algorithm. Simulated WGS datasets from ten genera as individuals and binary mixtures of eight organisms at varying ratios were analyzed to evaluate the role of contaminant concentration and taxonomy on detection. For the individual genomes the false positive contaminants reported depended on the genus, with Staphylococcus, Escherichia, and Shigella having the highest proportion of false positives. For nearly all binary mixtures the contaminant was detected in the in-silico datasets at the equivalent of 1 in 1,000 cells, though F. tularensis was not detected in any of the simulated contaminant mixtures and Y. pestis was only detected at the equivalent of one in 10 cells. Once a WGS method for detecting contaminants is characterized, it can be applied to evaluate microbial material purity, in efforts to ensure that contaminants are characterized in microbial materials used to validate pathogen detection assays, generate genome assemblies for database submission, and benchmark sequencing methods.
Affiliation(s)
- Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, United States of America
| | - Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, United States of America
| | - Jayne B Morrow
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, United States of America
| | - Nancy J Lin
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, United States of America
|
32
|
Piro VC, Matschkowski M, Renard BY. MetaMeta: integrating metagenome analysis tools to improve taxonomic profiling. Microbiome 2017; 5:101. [PMID: 28807044 PMCID: PMC5557516 DOI: 10.1186/s40168-017-0318-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 12/22/2016] [Accepted: 07/25/2017] [Indexed: 05/11/2023]
Abstract
BACKGROUND: Many metagenome analysis tools are presently available to classify sequences and profile environmental samples. In particular, taxonomic profiling and binning methods are commonly used for such tasks. Tools in these two categories make use of several techniques, e.g., read mapping, k-mer alignment, and composition analysis. Variations in the construction of the corresponding reference sequence databases are also common. In addition, different tools provide good results on different datasets and configurations. All this variation makes it complicated for researchers to decide which methods to use. Installation, configuration and execution can also be difficult, especially when dealing with multiple datasets and tools. RESULTS: We propose MetaMeta: a pipeline to execute and integrate results from metagenome analysis tools. MetaMeta provides an easy workflow to run multiple tools with multiple samples, producing a single enhanced output profile for each sample. MetaMeta includes database generation, pre-processing, execution, and integration steps, allowing easy execution and parallelization. The integration relies on the co-occurrence of organisms from different methods as the main feature to improve community profiling while accounting for differences in their databases. CONCLUSIONS: In a controlled case with simulated and real data, we show that the integrated profiles of MetaMeta outperform the best single profile. Using the same input data, it provides more sensitive and reliable results, with the presence of each organism being supported by several methods. MetaMeta uses Snakemake and has six pre-configured tools, all available through the Bioconda channel for easy installation (conda install -c bioconda metameta). The MetaMeta pipeline is open-source and can be downloaded at: https://gitlab.com/rki_bioinformatics.
Affiliation(s)
- Vitor C. Piro
- Research Group Bioinformatics (NG4), Robert Koch Institute, Nordufer 20, Berlin, 13353 Germany
- CAPES Foundation, Ministry of Education of Brazil, Brasília, 70040-020 DF Brazil
| | - Marcel Matschkowski
- Research Group Bioinformatics (NG4), Robert Koch Institute, Nordufer 20, Berlin, 13353 Germany
| | - Bernhard Y. Renard
- Research Group Bioinformatics (NG4), Robert Koch Institute, Nordufer 20, Berlin, 13353 Germany
|
33
|
Dadi TH, Renard BY, Wieler LH, Semmler T, Reinert K. SLIMM: species level identification of microorganisms from metagenomes. PeerJ 2017; 5:e3138. [PMID: 28367376 PMCID: PMC5372838 DOI: 10.7717/peerj.3138] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 10/07/2016] [Accepted: 03/02/2017] [Indexed: 12/21/2022] Open
Abstract
Identification and quantification of microorganisms is a significant step in studying the alpha and beta diversities within and between microbial communities, respectively. Both identification and quantification of a given microbial community can be carried out using whole genome shotgun sequences with less bias than when using 16S-rDNA sequences. However, shared regions of DNA among reference genomes and taxonomic units pose a significant challenge in assigning reads correctly to their true origins. The existing microbial community profiling tools commonly deal with this problem by either preparing signature-based unique references or assigning an ambiguous read to its least common ancestor in a taxonomic tree. The former method is limited to making use of the reads which can be mapped to the curated regions, while the latter suffers from the lack of uniquely mapped reads at lower (more specific) taxonomic ranks. Moreover, even if the tools exhibit good performance in calling the organisms present in a sample, there is still room for improvement in determining the correct relative abundance of the organisms. We present a new method, Species Level Identification of Microorganisms from Metagenomes (SLIMM), which addresses the above issues by using coverage information of reference genomes to remove unlikely genomes from the analysis and subsequently gain more uniquely mapped reads to assign at lower ranks of a taxonomic tree. SLIMM is based on a few, seemingly easy steps which, when combined, create a tool that outperforms state-of-the-art tools in run-time and memory usage while being on par or better in computing quantitative and qualitative information at species level.
Affiliation(s)
- Temesgen Hailemariam Dadi
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany; International Max Planck Research School for Computational Biology and Scientific Computing (IMPRS-CBSC), Berlin, Germany; Department of Veterinary Medicine, Freie Universität Berlin, Berlin, Germany
| | | | | | - Torsten Semmler
- Department of Veterinary Medicine, Freie Universität Berlin, Berlin, Germany; Robert Koch Institute, Berlin, Germany
| | - Knut Reinert
- Department of Mathematics and Computer Science, Freie Universität Berlin, Berlin, Germany; Max Planck Institute for Molecular Genetics, Berlin, Germany
|
34
|
DeCastro ME, Rodríguez-Belmonte E, González-Siso MI. Metagenomics of Thermophiles with a Focus on Discovery of Novel Thermozymes. Front Microbiol 2016; 7:1521. [PMID: 27729905 PMCID: PMC5037290 DOI: 10.3389/fmicb.2016.01521] [Citation(s) in RCA: 72] [Impact Index Per Article: 8.0] [Received: 07/28/2016] [Accepted: 09/12/2016] [Indexed: 11/24/2022] Open
Abstract
Microbial populations living in environments with temperatures above 50°C (thermophiles) have been widely studied, increasing our knowledge of the composition and function of these ecological communities. Since these populations express a broad range of heat-resistant enzymes (thermozymes), they also represent an important source of novel biocatalysts that can potentially be used in industrial processes. The integrated study of the whole-community DNA from an environment, known as metagenomics, coupled with the development of next generation sequencing (NGS) technologies, has allowed the generation of large amounts of data from thermophiles. In this review, we summarize the main approaches commonly utilized for assessing the taxonomic and functional diversity of thermophiles through metagenomics, including several bioinformatics tools and some metagenome-derived methods to isolate their thermozymes.
Affiliation(s)
- María-Eugenia DeCastro
- Grupo EXPRELA, Centro de Investigacións Científicas Avanzadas (CICA), Departamento de Bioloxía Celular e Molecular, Facultade de Ciencias, Universidade da Coruña, A Coruña, Spain
| | - Esther Rodríguez-Belmonte
- Grupo EXPRELA, Centro de Investigacións Científicas Avanzadas (CICA), Departamento de Bioloxía Celular e Molecular, Facultade de Ciencias, Universidade da Coruña, A Coruña, Spain
| | - María-Isabel González-Siso
- Grupo EXPRELA, Centro de Investigacións Científicas Avanzadas (CICA), Departamento de Bioloxía Celular e Molecular, Facultade de Ciencias, Universidade da Coruña, A Coruña, Spain
|