1
Bayer PE, Bennett A, Nester G, Corrigan S, Raes EJ, Cooper M, Ayad ME, McVey P, Kardailsky A, Pearce J, Fraser MW, Goncalves P, Burnell S, Rauschert S. A Comprehensive Evaluation of Taxonomic Classifiers in Marine Vertebrate eDNA Studies. Mol Ecol Resour 2025:e14107. [PMID: 40243260 DOI: 10.1111/1755-0998.14107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 12/05/2024] [Accepted: 03/11/2025] [Indexed: 04/18/2025]
Abstract
Environmental DNA (eDNA) metabarcoding is a widely used tool for surveying marine vertebrate biodiversity. To this end, many computational tools have been released and a plethora of bioinformatic approaches are used for eDNA-based community composition analysis. Simulation studies and careful evaluation of taxonomic classifiers are essential to establish reliable benchmarks to improve the accuracy and reproducibility of eDNA-based findings. Here we present a comprehensive evaluation of nine taxonomic classifiers exploring three widely used mitochondrial markers (12S rDNA, 16S rDNA and COI) in Australian marine vertebrates. Curated reference databases and exclusion database tests were used to simulate diverse species compositions, including three positive control and two negative control datasets. Using these simulated datasets ranging from 36 to 302 marker genes, we were able to identify between 19% and 89% of marine vertebrate species using mitochondrial markers. We show that MMSeqs2 and Metabuli generally outperform BLAST with 10% and 11% higher F1 scores for 12S and 16S rDNA markers, respectively, and that Naive Bayes Classifiers such as Mothur outperform sequence-based classifiers except MMSeqs2 for COI markers by 11%. Database exclusion tests reveal that MMSeqs2 and BLAST are less susceptible to false positives compared to Kraken2 with default parameters. Based on these findings, we recommend that MMSeqs2 is used for taxonomic classification of marine vertebrates given its ability to improve species-level assignments while reducing the number of false positives. Our work contributes to the establishment of best practices in eDNA-based biodiversity analysis to ultimately increase the reliability of this monitoring tool in the context of marine vertebrate conservation.
Affiliation(s)
- Philipp E Bayer
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
- Adam Bennett
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
- Georgia Nester
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
  - Minderoo-UWA Deep-Sea Research Centre, School of Biological Sciences and Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
- Shannon Corrigan
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
- Eric J Raes
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
- Madalyn Cooper
  - Minderoo Foundation, Perth, Western Australia, Australia
- Marcelle E Ayad
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
- Philip McVey
  - Minderoo Foundation, Perth, Western Australia, Australia
- Anya Kardailsky
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
- Jessica Pearce
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
- Matthew W Fraser
  - Minderoo Foundation, Perth, Western Australia, Australia
  - School of Biological Sciences, The University of Western Australia, Crawley, Western Australia, Australia
- Priscila Goncalves
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
- Stephen Burnell
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
- Sebastian Rauschert
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
2
Li Z, Zhao W, Jiang Y, Wen Y, Li M, Liu L, Zou K. New insights into biologic interpretation of bioinformatic pipelines for fish eDNA metabarcoding: A case study in Pearl River estuary. Journal of Environmental Management 2024; 368:122136. [PMID: 39128344 DOI: 10.1016/j.jenvman.2024.122136] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Revised: 05/31/2024] [Accepted: 08/06/2024] [Indexed: 08/13/2024]
Abstract
Environmental DNA (eDNA) metabarcoding is an emerging tool for monitoring biological communities in aquatic ecosystems. The selection of bioinformatic pipelines significantly impacts the results of biodiversity assessments. However, there is currently no consensus on the appropriate bioinformatic pipelines for fish community analysis in eDNA metabarcoding. In this study, we compared three bioinformatic pipelines (Uparse, DADA2, and UNOISE3) using real and mock (constructed with 15/30 known fish) communities to investigate the differences in biological interpretation during the data analysis process in eDNA metabarcoding. Performance evaluation and diversity analyses revealed that the choice of bioinformatic pipeline could impact the biological results of metabarcoding experiments. Among the three pipelines, the operational taxonomic unit (OTU)-based pipeline (Uparse) showed the best performance (sensitivity: 0.6250 ± 0.0166; compositional similarity: 0.4000 ± 0.0571), the highest richness (25-102) and minimal inter-group differences in alpha diversity. This suggests that the OTU-based pipeline is better suited to fish diversity monitoring than the ASV/ZOTU-based pipelines. Additionally, the Bray-Curtis distance matrix showed the greatest discriminative power in the PCoA (43.3%-53.89%) and inter-group analysis (P < 0.01), indicating that it was better than other distance matrices at distinguishing compositional differences or specific genera in the fish community at different sampling sites. These findings provide new insights into fish community monitoring through eDNA metabarcoding in estuarine environments.
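For readers unfamiliar with the Bray-Curtis distance named in this abstract, a minimal Python sketch of the standard definition follows: the sum of absolute differences in per-taxon counts divided by the total count across both samples. The function name, the dict-based input and the example counts are illustrative assumptions and are not taken from the study.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two samples.

    Each sample is a dict mapping taxon name -> read count (hypothetical
    input format). Returns a value between 0 (identical composition)
    and 1 (no shared taxa).
    """
    taxa = set(sample_a) | set(sample_b)
    # Taxa missing from a sample are treated as having zero reads.
    numerator = sum(abs(sample_a.get(t, 0) - sample_b.get(t, 0)) for t in taxa)
    denominator = sum(sample_a.get(t, 0) + sample_b.get(t, 0) for t in taxa)
    if denominator == 0:
        # Assumption: two empty samples are reported as identical.
        return 0.0
    return numerator / denominator


# Made-up counts for two sampling sites.
site_1 = {"A": 10, "C": 5}
site_2 = {"A": 5, "B": 5, "C": 5}
print(bray_curtis(site_1, site_2))
```

Computing this value for every pair of sites gives the kind of distance matrix that an ordination such as PCoA takes as input.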
Affiliation(s)
- Zhuoying Li
  - Joint Laboratory of Guangdong Province and Hong Kong Region on Marine Bioresource Conservation and Exploitation, College of Marine Sciences, South China Agricultural University, 510642, Guangzhou, China
- Wencheng Zhao
  - Joint Laboratory of Guangdong Province and Hong Kong Region on Marine Bioresource Conservation and Exploitation, College of Marine Sciences, South China Agricultural University, 510642, Guangzhou, China
- Yun Jiang
  - Joint Laboratory of Guangdong Province and Hong Kong Region on Marine Bioresource Conservation and Exploitation, College of Marine Sciences, South China Agricultural University, 510642, Guangzhou, China
- Yongjing Wen
  - Joint Laboratory of Guangdong Province and Hong Kong Region on Marine Bioresource Conservation and Exploitation, College of Marine Sciences, South China Agricultural University, 510642, Guangzhou, China
- Min Li
  - Key Laboratory for Sustainable Utilization of Open-sea Fishery, Ministry of Agriculture and Rural Affairs, South China Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Guangzhou, 510300, China
- Li Liu
  - Joint Laboratory of Guangdong Province and Hong Kong Region on Marine Bioresource Conservation and Exploitation, College of Marine Sciences, South China Agricultural University, 510642, Guangzhou, China
- Keshu Zou
  - Joint Laboratory of Guangdong Province and Hong Kong Region on Marine Bioresource Conservation and Exploitation, College of Marine Sciences, South China Agricultural University, 510642, Guangzhou, China
3
Pipes L, Nielsen R. A rapid phylogeny-based method for accurate community profiling of large-scale metabarcoding datasets. eLife 2024; 13:e85794. [PMID: 39145536 PMCID: PMC11377034 DOI: 10.7554/elife.85794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Accepted: 08/14/2024] [Indexed: 08/16/2024] Open
Abstract
Environmental DNA (eDNA) is becoming an increasingly important tool in diverse scientific fields from ecological biomonitoring to wastewater surveillance of viruses. The fundamental challenge in eDNA analyses has been the bioinformatical assignment of reads to taxonomic groups. It has long been known that full probabilistic methods for phylogenetic assignment are preferable, but unfortunately, such methods are computationally intensive and are typically inapplicable to modern next-generation sequencing data. We present a fast approximate likelihood method for phylogenetic assignment of DNA sequences. Applying the new method to several mock communities and simulated datasets, we show that it identifies more reads at both high and low taxonomic levels more accurately than other leading methods. The advantage of the method is particularly apparent in the presence of polymorphisms and/or sequencing errors and when the true species is not represented in the reference database.
Affiliation(s)
- Lenore Pipes
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, United States
- Rasmus Nielsen
  - Department of Integrative Biology, University of California, Berkeley, Berkeley, United States
  - GLOBE Institute, University of Copenhagen, Copenhagen, Denmark
4
Rumore J, Walker M, Pagotto F, Forbes JD, Peterson CL, Tyler AD, Graham M, Van Domselaar G, Nadon C, Reimer A, Knox N. Use of a taxon-specific reference database for accurate metagenomics-based pathogen detection of Listeria monocytogenes in turkey deli meat and spinach. BMC Genomics 2023; 24:361. [PMID: 37370007 PMCID: PMC10303765 DOI: 10.1186/s12864-023-09338-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/02/2022] [Accepted: 04/26/2023] [Indexed: 06/29/2023] Open
Abstract
BACKGROUND: The reliability of culture-independent pathogen detection in foods using metagenomics is contingent on the quality and composition of the reference database. The inclusion of microbial sequences from a diverse representation of taxonomies in universal reference databases is recommended to maximize classification precision for pathogen detection. However, these sizable databases have high memory requirements that may be out of reach for some users. In this study, we aimed to assess the performance of a foodborne pathogen (FBP)-specific reference database (taxon-specific) relative to a universal reference database (taxon-agnostic). We tested our FBP-specific reference database's performance for detecting Listeria monocytogenes in two complex food matrices, ready-to-eat (RTE) turkey deli meat and prepackaged spinach, using three popular read-based DNA-to-DNA metagenomic classifiers: Centrifuge, Kraken 2 and KrakenUniq.
RESULTS: In silico host sequence removal led to substantially fewer false positive (FP) classifications and higher classification precision in RTE turkey deli meat datasets using the FBP-specific reference database. No considerable improvement in classification precision was observed following host filtering for prepackaged spinach datasets and was likely a consequence of a higher microbe-to-host sequence ratio. All datasets classified with Centrifuge using the FBP-specific reference database had the lowest classification precision compared to Kraken 2 or KrakenUniq. When a confidence-scoring threshold was applied, a nearly equivalent precision to the universal reference database was achieved for Kraken 2 and KrakenUniq. Recall was high for both reference databases across all datasets and classifiers. Substantially fewer computational resources were required for metagenomics-based detection of L. monocytogenes using the FBP-specific reference database, especially when combined with Kraken 2.
CONCLUSIONS: A universal (taxon-agnostic) reference database is not essential for accurate and reliable metagenomics-based pathogen detection of L. monocytogenes in complex food matrices. Equivalent classification performance can be achieved using a taxon-specific reference database when the appropriate quality control measures, classification software, and analysis parameters are applied. This approach is less computationally demanding and more attainable for the broader scientific and food safety communities.
Affiliation(s)
- Jillian Rumore
  - Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB, Canada
  - Public Health Agency of Canada, National Microbiology Laboratory, Winnipeg, MB, Canada
- Matthew Walker
  - Public Health Agency of Canada, National Microbiology Laboratory, Winnipeg, MB, Canada
- Franco Pagotto
  - Food Directorate, Health Canada, Bureau of Microbial Hazards, Ottawa, ON, Canada
- Jessica D Forbes
  - Eastern Ontario Regional Laboratory Association, Ottawa, ON, Canada
- Christy-Lynn Peterson
  - Public Health Agency of Canada, National Microbiology Laboratory, Winnipeg, MB, Canada
- Andrea D Tyler
  - Public Health Agency of Canada, National Microbiology Laboratory, Winnipeg, MB, Canada
- Morag Graham
  - Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB, Canada
  - Public Health Agency of Canada, National Microbiology Laboratory, Winnipeg, MB, Canada
- Gary Van Domselaar
  - Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB, Canada
  - Public Health Agency of Canada, National Microbiology Laboratory, Winnipeg, MB, Canada
- Celine Nadon
  - Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB, Canada
  - Public Health Agency of Canada, National Microbiology Laboratory, Winnipeg, MB, Canada
- Aleisha Reimer
  - Public Health Agency of Canada, National Microbiology Laboratory, Winnipeg, MB, Canada
- Natalie Knox
  - Department of Medical Microbiology and Infectious Diseases, University of Manitoba, Winnipeg, MB, Canada
  - Public Health Agency of Canada, National Microbiology Laboratory, Winnipeg, MB, Canada
5
Garrido-Sanz L, Àngel Senar M, Piñol J. Drastic reduction of false positive species in samples of insects by intersecting the default output of two popular metagenomic classifiers. PLoS One 2022; 17:e0275790. [PMID: 36282811 PMCID: PMC9595558 DOI: 10.1371/journal.pone.0275790] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/10/2022] [Accepted: 09/15/2022] [Indexed: 11/19/2022] Open
Abstract
The use of high-throughput sequencing to recover short DNA reads of many species has been widely applied in biodiversity studies, either as amplicon metabarcoding or shotgun metagenomics. These reads are assigned to taxa using classifiers. However, for different reasons, the results often contain many false positives. Here we focus on the reduction of false positive species attributable to the classifiers. We benchmarked two popular classifiers, BLASTn followed by MEGAN6 (BM) and Kraken2 (K2), to analyse shotgun sequenced artificial single-species samples of insects. To reduce the number of misclassified reads, we combined the output of the two classifiers in two different ways: (1) by keeping only the reads that were attributed to the same species by both classifiers (intersection approach); and (2) by keeping the reads assigned to some species by any classifier (union approach). In addition, we applied an analytical detection limit to further reduce the number of false positive species. As expected, both metagenomic classifiers used with default parameters generated an unacceptably high number of misidentified species (tens with BM, hundreds with K2). The false positive species were not necessarily phylogenetically close, as some of them belonged to different orders of insects. The union approach failed to reduce the number of false positives, but the intersection approach eliminated most of them. The addition of an analytical detection limit of 0.001 further reduced the number to ca. 0.5 false positive species per sample. The misidentification of species by most classifiers hampers the confidence of the DNA-based methods for assessing the biodiversity of biological samples. Our approach to alleviate the problem is straightforward and significantly reduced the number of reported false positive species.
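The intersection and union approaches described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-read dict inputs, the function names, the choice to count a read with conflicting calls under both species in the union, and the reading of the detection limit as a minimum fraction of retained reads are all hypothetical.

```python
from collections import Counter


def species_counts(bm_calls, k2_calls, mode="intersection"):
    """Combine per-read species calls from two classifiers.

    bm_calls, k2_calls: dict mapping read id -> species name, or None
    when the classifier left the read unassigned (hypothetical format).
    Returns a Counter of reads per species.
    """
    bm_pairs = {(r, s) for r, s in bm_calls.items() if s is not None}
    k2_pairs = {(r, s) for r, s in k2_calls.items() if s is not None}
    if mode == "intersection":
        # Keep a read only if both classifiers name the same species.
        pairs = bm_pairs & k2_pairs
    elif mode == "union":
        # Keep every (read, species) call made by either classifier.
        # Assumption: a read with two conflicting calls counts for both.
        pairs = bm_pairs | k2_pairs
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return Counter(species for _, species in pairs)


def apply_detection_limit(counts, limit=0.001):
    """Drop species whose share of the retained reads is below `limit`."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {s: n for s, n in counts.items() if n / total >= limit}


# Made-up calls for four reads.
bm = {"r1": "A", "r2": "A", "r3": "B", "r4": None}
k2 = {"r1": "A", "r2": "C", "r3": "B", "r4": "D"}
print(dict(species_counts(bm, k2, "intersection")))
print(dict(species_counts(bm, k2, "union")))
```

In this toy input the union reports four species while the intersection reports only the two on which both classifiers agree, which mirrors the direction of the effect reported in the abstract.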
Affiliation(s)
- Lidia Garrido-Sanz
  - Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- Josep Piñol
  - Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
  - CREAF, Cerdanyola del Vallès, Spain
6
Crowdsourced benchmarking of taxonomic metagenome profilers: lessons learned from the sbv IMPROVER Microbiomics challenge. BMC Genomics 2022; 23:624. [PMID: 36042406 PMCID: PMC9429340 DOI: 10.1186/s12864-022-08803-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/16/2022] [Accepted: 07/25/2022] [Indexed: 11/10/2022] Open
Abstract
Background: Selection of optimal computational strategies for analyzing metagenomics data is a decisive step in determining the microbial composition of a sample, and this procedure is complex because of the numerous tools currently available. The aim of this research was to summarize the results of the crowdsourced sbv IMPROVER Microbiomics Challenge, designed to evaluate the performance of off-the-shelf metagenomics software, as well as to investigate the robustness of these results through an extended post-challenge analysis. In total, 21 off-the-shelf taxonomic metagenome profiling pipelines were benchmarked for their capacity to identify the microbiome composition at various taxon levels across 104 shotgun metagenomics datasets of bacterial genomes (representative of various microbiome samples) from public databases. Performance was determined by comparing predicted taxonomy profiles with the gold standard.
Results: Most taxonomic profilers performed homogeneously well at the phylum level but generated intermediate and heterogeneous scores at the genus and species levels, respectively. kmer-based pipelines using Kraken with and without Bracken or using CLARK-S performed best overall, but they exhibited lower precision than the two marker-gene-based methods MetaPhlAn and mOTU. Filtering out the 1% least abundant species, which were not reliably predicted, helped increase the performance of most profilers by increasing precision but at the cost of recall. However, the use of adaptive filtering thresholds determined from the sample's Shannon index increased the performance of most kmer-based profilers while mitigating the tradeoff between precision and recall.
Conclusions: kmer-based metagenomic pipelines using Kraken/Bracken or CLARK-S performed most robustly across a large variety of microbiome datasets. Removing non-reliably predicted low-abundance species by using diversity-dependent adaptive filtering thresholds further enhanced the performance of these tools. This work demonstrates the applicability of computational pipelines for accurately determining taxonomic profiles in clinical and environmental contexts and exemplifies the power of crowdsourcing for unbiased evaluation.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12864-022-08803-2.
7
Chiarello M, McCauley M, Villéger S, Jackson CR. Ranking the biases: The choice of OTUs vs. ASVs in 16S rRNA amplicon data analysis has stronger effects on diversity measures than rarefaction and OTU identity threshold. PLoS One 2022; 17:e0264443. [PMID: 35202411 PMCID: PMC8870492 DOI: 10.1371/journal.pone.0264443] [Citation(s) in RCA: 75] [Impact Index Per Article: 25.0] [Received: 10/04/2021] [Accepted: 02/10/2022] [Indexed: 02/07/2023] Open
Abstract
Advances in the analysis of amplicon sequence datasets have introduced a methodological shift in how research teams investigate microbial biodiversity, away from sequence identity-based clustering (producing Operational Taxonomic Units, OTUs) to denoising methods (producing amplicon sequence variants, ASVs). While denoising methods have several inherent properties that make them desirable compared to clustering-based methods, questions remain as to the influence that these pipelines have on the ecological patterns being assessed, especially when compared to other methodological choices made when processing data (e.g. rarefaction) and computing diversity indices. We compared the respective influences of two widely used methods, namely DADA2 (a denoising method) vs. Mothur (a clustering method) on 16S rRNA gene amplicon datasets (hypervariable region v4), and compared such effects to the rarefaction of the community table and OTU identity threshold (97% vs. 99%) on the ecological signals detected. We used a dataset comprising freshwater invertebrate (three Unionidae species) gut and environmental (sediment, seston) communities sampled in six rivers in the southeastern USA. We ranked the respective effects of each methodological choice on alpha and beta diversity, and taxonomic composition. The choice of the pipeline significantly influenced alpha and beta diversities and changed the ecological signal detected, especially on presence/absence indices such as the richness index and unweighted Unifrac. Interestingly, the discrepancy between OTU and ASV-based diversity metrics could be attenuated by the use of rarefaction. The identification of major classes and genera also revealed significant discrepancies across pipelines. Compared to the pipeline's effect, OTU threshold and rarefaction had a minimal impact on all measurements.
Affiliation(s)
- Marlène Chiarello
  - Department of Biology, University of Mississippi, University, MS, United States of America
- Mark McCauley
  - Department of Biology, University of Mississippi, University, MS, United States of America
- Sébastien Villéger
  - MARBEC, University of Montpellier, CNRS, Ifremer, IRD, Montpellier, France
- Colin R. Jackson
  - Department of Biology, University of Mississippi, University, MS, United States of America
8
Ziemski M, Wisanwanichthan T, Bokulich NA, Kaehler BD. Beating Naive Bayes at Taxonomic Classification of 16S rRNA Gene Sequences. Front Microbiol 2021; 12:644487. [PMID: 34220738 PMCID: PMC8249850 DOI: 10.3389/fmicb.2021.644487] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/21/2020] [Accepted: 05/31/2021] [Indexed: 12/28/2022] Open
Abstract
Naive Bayes classifiers (NBC) have dominated the field of taxonomic classification of amplicon sequences for over a decade. Apart from having runtime requirements that allow them to be trained and used on modest laptops, they have persistently provided class-topping classification accuracy. In this work we compare NBC with random forest classifiers, neural network classifiers, and a perfect classifier that can only fail when different species have identical sequences, and find that in some practical scenarios there is little scope for improving on NBC for taxonomic classification of 16S rRNA gene sequences. Further improvements in taxonomy classification are unlikely to come from novel algorithms alone, and will need to leverage other technological innovations, such as ecological frequency information.
Affiliation(s)
- Michal Ziemski
  - Laboratory of Food Systems Biotechnology, Institute of Food, Nutrition, and Health, ETH Zürich, Zurich, Switzerland
- Nicholas A. Bokulich
  - Laboratory of Food Systems Biotechnology, Institute of Food, Nutrition, and Health, ETH Zürich, Zurich, Switzerland
9
Mathon L, Valentini A, Guérin PE, Normandeau E, Noel C, Lionnet C, Boulanger E, Thuiller W, Bernatchez L, Mouillot D, Dejean T, Manel S. Benchmarking bioinformatic tools for fast and accurate eDNA metabarcoding species identification. Mol Ecol Resour 2021; 21:2565-2579. [PMID: 34002951 DOI: 10.1111/1755-0998.13430] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 01/22/2021] [Revised: 05/10/2021] [Accepted: 05/11/2021] [Indexed: 12/01/2022]
Abstract
Bioinformatic analysis of eDNA metabarcoding data is a crucial step toward rigorously assessing biodiversity. Many programs are now available for each step of the required analyses, but their relative abilities at providing fast and accurate species lists have seldom been evaluated. We used simulated mock communities and real fish eDNA metabarcoding data to evaluate the performance of 13 bioinformatic programs and pipelines to retrieve fish occurrence and read abundance using the 12S mt rRNA gene marker. We used four indices to compare the outputs of each program with the simulated samples: sensitivity, F-measure, root-mean-square error (RMSE) on read relative abundances, and execution time. We found marked differences among programs only for the taxonomic assignment step, both in terms of sensitivity, F-measure and RMSE. Running time was highly different between programs for each step. The fastest programs with best indices for each step were assembled into a pipeline. We compared this pipeline to pipelines constructed from existing toolboxes (OBITools, Barque, and QIIME 2). Our pipeline and Barque obtained the best performance for all indices and appear to be better alternatives to highly used pipelines for analysing fish eDNA metabarcoding data when a complete reference database is available. Analysis on real eDNA metabarcoding data also indicated differences for taxonomic assignment and execution time only. This study reveals major differences between programs during the taxonomic assignment step. The choice of algorithm for the taxonomic assignment can have a significant impact on diversity estimates and should be made according to the objectives of the study.
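Three of the four indices named in this abstract (sensitivity, F-measure and RMSE on relative read abundances) can be illustrated with a short Python sketch using their textbook definitions. The function names, input formats and example values are assumptions made for illustration; the study's own implementation may differ, for example in how taxa absent from one list are handled.

```python
import math


def detection_metrics(expected_species, observed_species):
    """Return (sensitivity, precision, F-measure) for a species list."""
    expected, observed = set(expected_species), set(observed_species)
    tp = len(expected & observed)   # species correctly recovered
    fp = len(observed - expected)   # species reported but not in the mock
    fn = len(expected - observed)   # mock species that were missed
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f_measure = 2 * precision * sensitivity / denom if denom else 0.0
    return sensitivity, precision, f_measure


def relative_abundance_rmse(expected_reads, observed_reads):
    """RMSE between expected and observed relative read abundances.

    Inputs are dicts of taxon -> read count (hypothetical format).
    Assumption: a taxon missing from one dict has zero reads there.
    """
    exp_total = sum(expected_reads.values())
    obs_total = sum(observed_reads.values())
    if exp_total == 0 or obs_total == 0:
        raise ValueError("both samples need at least one read")
    taxa = set(expected_reads) | set(observed_reads)
    squared = [
        (expected_reads.get(t, 0) / exp_total
         - observed_reads.get(t, 0) / obs_total) ** 2
        for t in taxa
    ]
    return math.sqrt(sum(squared) / len(squared))


# Made-up mock community and pipeline output.
print(detection_metrics({"A", "B", "C"}, {"A", "B", "D"}))
print(relative_abundance_rmse({"A": 50, "B": 30, "C": 20}, {"A": 60, "B": 40}))
```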
Affiliation(s)
- Laetitia Mathon
  - CEFE, Univ. Montpellier, CNRS, EPHE-PSL University, IRD, Montpellier, France
  - SPYGEN, Savoie Technolac, Le Bourget du Lac, France
- Eric Normandeau
  - Université Laval, IBIS (Institut de Biologie Intégrative et des Systèmes), Québec, QC, Canada
- Cyril Noel
  - IFREMER - IRSI - Service de Bioinformatique (SeBiMER), Plouzané, France
- Clément Lionnet
  - Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LECA, Laboratoire d'Ecologie Alpine, Grenoble, France
- Emilie Boulanger
  - CEFE, Univ. Montpellier, CNRS, EPHE-PSL University, IRD, Montpellier, France
  - MARBEC, Univ. Montpellier, CNRS, IRD, Ifremer, Montpellier, France
- Wilfried Thuiller
  - Univ. Grenoble Alpes, Univ. Savoie Mont Blanc, CNRS, LECA, Laboratoire d'Ecologie Alpine, Grenoble, France
- Louis Bernatchez
  - Université Laval, IBIS (Institut de Biologie Intégrative et des Systèmes), Québec, QC, Canada
- David Mouillot
  - MARBEC, Univ. Montpellier, CNRS, IRD, Ifremer, Montpellier, France
  - Institut Universitaire de France, IUF, Paris, France
- Tony Dejean
  - SPYGEN, Savoie Technolac, Le Bourget du Lac, France
- Stéphanie Manel
  - CEFE, Univ. Montpellier, CNRS, EPHE-PSL University, IRD, Montpellier, France
10
Buchka S, Hapfelmeier A, Gardner PP, Wilson R, Boulesteix AL. On the optimistic performance evaluation of newly introduced bioinformatic methods. Genome Biol 2021; 22:152. [PMID: 33975646 PMCID: PMC8111726 DOI: 10.1186/s13059-021-02365-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 12/20/2020] [Accepted: 04/23/2021] [Indexed: 12/03/2022] Open
Abstract
Most research articles presenting new data analysis methods claim that "the new method performs better than existing methods," but the veracity of such statements is questionable. Our manuscript discusses and illustrates consequences of the optimistic bias occurring during the evaluation of novel data analysis methods, that is, all biases resulting from, for example, selection of datasets or competing methods, better ability to fix bugs in a preferred method, and selective reporting of method variants. We quantitatively investigate this bias using an example from epigenetic analysis: normalization methods for data generated by the Illumina HumanMethylation450K BeadChip microarray.
Affiliation(s)
- Stefan Buchka
  - Institute for Medical Information Processing, Biometry and Epidemiology, LMU, Munich, Germany
- Alexander Hapfelmeier
  - Institute of Medical Informatics, Statistics and Epidemiology, School of Medicine, TUM, Munich, Germany
  - Institute of General Practice and Health Services Research, School of Medicine, TUM, Munich, Germany
- Paul P. Gardner
  - Department of Biochemistry, University of Otago, Otago, New Zealand
- Rory Wilson
  - Research Unit Molecular Epidemiology, Institute of Epidemiology, Helmholtz Zentrum München, German Research Center for Environmental Health, Neuherberg, Germany
- Anne-Laure Boulesteix
  - Institute for Medical Information Processing, Biometry and Epidemiology, LMU, Munich, Germany
11
Mlaga KD, Mathieu A, Beauparlant CJ, Ott A, Khodr A, Perin O, Droit A. HCK and ABAA: A Newly Designed Pipeline to Improve Fungi Metabarcoding Analysis. Front Microbiol 2021; 12:640693. [PMID: 34025601 PMCID: PMC8134036 DOI: 10.3389/fmicb.2021.640693] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2020] [Accepted: 04/08/2021] [Indexed: 11/13/2022] Open
Abstract
Introduction: Length dissimilarity among fungal ITS sequences and non-specific amplicons, including chimaeras formed during polymerase chain reaction (PCR), combined with sequencing errors, create bias during similarity clustering and abundance estimation in downstream analysis. To overcome these challenges, we present a novel approach, Hierarchical Clustering with Kraken (HCK), to classify ITS1 amplicons, and the Abundance-Base Alternative Approach (ABAA) pipeline to detect and filter non-specific amplicons in fungal metabarcoding sequencing datasets. Materials and Methods: We compared the performance of both pipelines against QIIME, KRAKEN, and DADA2 using publicly available fungal ITS mock community datasets, with BLASTn as a reference. We calculated precision, recall, and F-score from the estimated numbers of true positives, false positives, and false negatives. Alpha diversity (Chao1 and Shannon metrics) was also used to evaluate the diversity estimation of our method. Results: The analysis shows that ABAA reduced the number of false positives with all metabarcoding methods tested, and that HCK increased precision and recall. HCK coupled with ABAA improves the F-score and brings alpha diversity metric values closer to the BLASTn values than QIIME, KRAKEN, and DADA2 do. Conclusion: The HCK-ABAA approach allows better identification of fungal community structures while avoiding the use of a reference database for filtering non-specific amplicons. It results in a more robust and stable methodology over time. The software can be downloaded from https://bitbucket.org/GottySG36/hck/src/master/.
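The abstract above scores each pipeline by precision, recall, and F-score derived from true-positive, false-positive, and false-negative counts. As a minimal sketch of that arithmetic (not the authors' code), the following assumes taxa are compared as plain sets of names against a known mock community; the function name `score_taxa` and the example taxon labels are hypothetical.

```python
from typing import Iterable, NamedTuple


class Scores(NamedTuple):
    tp: int
    fp: int
    fn: int
    precision: float
    recall: float
    f_score: float


def score_taxa(expected: Iterable[str], observed: Iterable[str]) -> Scores:
    """Compare reported taxa with a known mock community (illustrative only)."""
    expected_set, observed_set = set(expected), set(observed)
    tp = len(expected_set & observed_set)   # taxa present and reported
    fp = len(observed_set - expected_set)   # taxa reported but not present
    fn = len(expected_set - observed_set)   # taxa present but missed
    # Guard against division by zero when nothing is reported or expected.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    # F-score here is the harmonic mean of precision and recall (F1).
    f_score = 2 * precision * recall / denom if denom else 0.0
    return Scores(tp, fp, fn, precision, recall, f_score)


if __name__ == "__main__":
    mock = ["taxon_A", "taxon_B", "taxon_C", "taxon_D"]   # hypothetical labels
    reported = ["taxon_A", "taxon_B", "taxon_E"]
    print(score_taxa(mock, reported))
```

Set-based comparison ignores read abundance, so it only reflects presence or absence of each taxon; abundance-weighted variants would need per-taxon counts that the abstract does not describe.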
Affiliation(s)
- Kodjovi D Mlaga
  - Department of Molecular Medicine, Laval University, Quebec, QC, Canada
- Alban Mathieu
  - Department of Molecular Medicine, Laval University, Quebec, QC, Canada
  - Centre de Recherche du CHU de Québec, Quebec, QC, Canada
- Charles Joly Beauparlant
  - Department of Molecular Medicine, Laval University, Quebec, QC, Canada
  - Centre de Recherche du CHU de Québec, Quebec, QC, Canada
- Alban Ott
  - Research and Innovation, L'Oreal, Paris, France
- Ahmad Khodr
  - Research and Innovation, L'Oreal, Paris, France
- Arnaud Droit
  - Department of Molecular Medicine, Laval University, Quebec, QC, Canada
  - Centre de Recherche du CHU de Québec, Quebec, QC, Canada

12
Hleap JS, Littlefair JE, Steinke D, Hebert PDN, Cristescu ME. Assessment of current taxonomic assignment strategies for metabarcoding eukaryotes. Mol Ecol Resour 2021; 21:2190-2203. [PMID: 33905615 DOI: 10.1111/1755-0998.13407] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 08/14/2020] [Revised: 03/08/2021] [Accepted: 04/19/2021] [Indexed: 01/04/2023]
Abstract
The effective use of metabarcoding in biodiversity science has brought important analytical challenges due to the need to generate accurate taxonomic assignments. The assignment of sequences to genus or species level is critical for biodiversity surveys and biomonitoring, but it is particularly challenging as researchers must select the approach that best recovers information on species composition. This study evaluates the performance and accuracy of seven methods in recovering the species composition of mock communities by using COI barcode fragments. The mock communities varied in species number and specimen abundance, while upstream molecular and bioinformatic variables were held constant, and using a set of COI fragments. We evaluated the impact of parameter optimization on the quality of the predictions. Our results indicate that BLAST top hit competes well with more complex approaches if optimized for the mock community under study. For example, the two machine learning methods that were benchmarked proved more sensitive to reference database heterogeneity and completeness than methods based on sequence similarity. The accuracy of assignments was impacted by both species and specimen counts (query compositional heterogeneity) which ultimately influence the selection of appropriate software. We urge researchers to: (i) use realistic mock communities to allow optimization of parameters, regardless of the taxonomic assignment method employed; (ii) carefully choose and curate the reference databases including completeness; and (iii) use QIIME, BLAST or LCA methods, in conjunction with parameter tuning to better assign taxonomy to diverse communities, especially when information on species diversity is lacking for the area under study.
Affiliation(s)
- Jose S Hleap
  - Department of Biology, McGill University, Montreal, QC, Canada
  - SHARCNET, University of Guelph, Guelph, ON, Canada
  - Fundacion SQUALUS, Cali, Colombia
- Joanne E Littlefair
  - Department of Biology, McGill University, Montreal, QC, Canada
  - Queen Mary University of London, London, UK
- Dirk Steinke
  - Centre for Biodiversity Genomics & Department of Integrative Biology, University of Guelph, Guelph, ON, Canada
- Paul D N Hebert
  - Centre for Biodiversity Genomics & Department of Integrative Biology, University of Guelph, Guelph, ON, Canada

13
Straub D, Blackwell N, Langarica-Fuentes A, Peltzer A, Nahnsen S, Kleindienst S. Interpretations of Environmental Microbial Community Studies Are Biased by the Selected 16S rRNA (Gene) Amplicon Sequencing Pipeline. Front Microbiol 2020; 11:550420. [PMID: 33193131 PMCID: PMC7645116 DOI: 10.3389/fmicb.2020.550420] [Citation(s) in RCA: 125] [Impact Index Per Article: 25.0] [Received: 04/09/2020] [Accepted: 10/02/2020] [Indexed: 12/11/2022] Open
Abstract
One of the major methods to identify microbial community composition, to unravel microbial population dynamics, and to explore microbial diversity in environmental samples is high-throughput DNA- or RNA-based 16S rRNA (gene) amplicon sequencing in combination with bioinformatics analyses. However, focusing on environmental samples from contrasting habitats, it was not systematically evaluated (i) which analysis methods provide results that reflect reality most accurately, (ii) how the interpretations of microbial community studies are biased by different analysis methods and (iii) if the most optimal analysis workflow can be implemented in an easy-to-use pipeline. Here, we compared the performance of 16S rRNA (gene) amplicon sequencing analysis tools (i.e., Mothur, QIIME1, QIIME2, and MEGAN) using three mock datasets with known microbial community composition that differed in sequencing quality, species number and abundance distribution (i.e., even or uneven), and phylogenetic diversity (i.e., closely related or well-separated amplicon sequences). Our results showed that QIIME2 outcompeted all other investigated tools in sequence recovery (>10 times fewer false positives), taxonomic assignments (>22% better F-score) and diversity estimates (>5% better assessment), suggesting that this approach is able to reflect the in situ microbial community most accurately. Further analysis of 24 environmental datasets obtained from four contrasting terrestrial and freshwater sites revealed dramatic differences in the resulting microbial community composition for all pipelines at genus level. For instance, at the investigated river water sites Sphaerotilus was only reported when using QIIME1 (8% abundance) and Agitococcus with QIIME1 or QIIME2 (2 or 3% abundance, respectively), but both genera remained undetected when analyzed with Mothur or MEGAN. 
Since these abundant taxa probably have implications for important biogeochemical cycles (e.g., nitrate and sulfate reduction) at these sites, their detection and semi-quantitative enumeration is crucial for valid interpretations. A high-performance computing conformant workflow was constructed to allow FAIR (Findable, Accessible, Interoperable, and Re-usable) 16S rRNA (gene) amplicon sequence analysis starting from raw sequence files, using the most optimal methods identified in our study. Our presented workflow should be considered for future studies, thereby facilitating the analysis of high-throughput 16S rRNA (gene) sequencing data substantially, while maximizing reliability and confidence in microbial community data analysis.
Affiliation(s)
- Daniel Straub
  - Microbial Ecology, Center for Applied Geoscience, Department of Geosciences, University of Tübingen, Tübingen, Germany
  - Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany
- Nia Blackwell
  - Microbial Ecology, Center for Applied Geoscience, Department of Geosciences, University of Tübingen, Tübingen, Germany
- Adrian Langarica-Fuentes
  - Microbial Ecology, Center for Applied Geoscience, Department of Geosciences, University of Tübingen, Tübingen, Germany
- Alexander Peltzer
  - Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany
- Sven Nahnsen
  - Quantitative Biology Center (QBiC), University of Tübingen, Tübingen, Germany
- Sara Kleindienst
  - Microbial Ecology, Center for Applied Geoscience, Department of Geosciences, University of Tübingen, Tübingen, Germany

14
O'Rourke DR, Bokulich NA, Jusino MA, MacManes MD, Foster JT. A total crapshoot? Evaluating bioinformatic decisions in animal diet metabarcoding analyses. Ecol Evol 2020; 10:9721-9739. [PMID: 33005342 PMCID: PMC7520210 DOI: 10.1002/ece3.6594] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 06/15/2020] [Revised: 06/29/2020] [Accepted: 07/01/2020] [Indexed: 01/01/2023] Open
Abstract
Metabarcoding studies provide a powerful approach to estimate the diversity and abundance of organisms in mixed communities in nature. While strategies exist for optimizing sample and sequence library preparation, best practices for bioinformatic processing of amplicon sequence data are lacking in animal diet studies. Here we evaluate how decisions made in core bioinformatic processes, including sequence filtering, database design, and classification, can influence animal metabarcoding results. We show that denoising methods have lower error rates compared to traditional clustering methods, although these differences are largely mitigated by removing low-abundance sequence variants. We also found that available reference datasets from GenBank and BOLD for the animal marker gene cytochrome oxidase I (COI) can be complementary, and we discuss methods to improve existing databases to include versioned releases. Taxonomic classification methods can dramatically affect results. For example, the commonly used Barcode of Life Database (BOLD) Classification API assigned fewer names to samples from order through species levels using both a mock community and bat guano samples compared to all other classifiers (vsearch-SINTAX and q2-feature-classifier's BLAST + LCA, VSEARCH + LCA, and Naive Bayes classifiers). The lack of consensus on bioinformatics best practices limits comparisons among studies and may introduce biases. Our work suggests that biological mock communities offer a useful standard to evaluate the myriad computational decisions impacting animal metabarcoding accuracy. Further, these comparisons highlight the need for continual evaluations as new tools are adopted to ensure that the inferences drawn reflect meaningful biology instead of digital artifacts.
Affiliation(s)
- Devon R. O'Rourke
  - Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, NH, USA
  - Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Nicholas A. Bokulich
  - Laboratory of Food Systems Biotechnology, Institute of Food, Nutrition, and Health, ETH Zurich, Zurich, Switzerland
- Michelle A. Jusino
  - Biology Department, William & Mary, Williamsburg, VA, USA
  - Center for Forest Mycology Research, USDA Forest Service, Northern Research Station, Madison, USA
- Matthew D. MacManes
  - Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, NH, USA
- Jeffrey T. Foster
  - Department of Molecular, Cellular, and Biomedical Sciences, University of New Hampshire, Durham, NH, USA
  - Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
  - Department of Biological Sciences, Northern Arizona University, Flagstaff, AZ, USA

15
Abstract
Many biological contaminants are disseminated through water, and their occurrence has potential detrimental impacts on public and environmental health. Conventional monitoring tools rely on cultivation and are not robust in addressing modern water quality concerns. This review proposes metagenomics as a means to provide a rapid, nontargeted assessment of biological contaminants in water. When further coupled with appropriate methods (e.g., quantitative PCR and flow cytometry) and bioinformatic tools, metagenomics can provide information concerning both the abundance and diversity of biological contaminants in reclaimed waters. Further correlation between the metagenomic-derived data of selected contaminants and the measurable parameters of water quality can also aid in devising strategies to alleviate undesirable water quality. Here, we review metagenomic approaches (i.e., both sequencing platforms and bioinformatic tools) and studies that demonstrated their use for reclaimed-water quality monitoring. We also provide recommendations on areas of improvement that will allow metagenomics to significantly impact how the water industry performs reclaimed-water quality monitoring in the future.
Affiliation(s)
- Pei-Ying Hong
  - Water Desalination and Reuse Center, Division of Biological and Environmental Science and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- David Mantilla-Calderon
  - Water Desalination and Reuse Center, Division of Biological and Environmental Science and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Changzhi Wang
  - Water Desalination and Reuse Center, Division of Biological and Environmental Science and Engineering, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia

16
Seppey M, Manni M, Zdobnov EM. LEMMI: a continuous benchmarking platform for metagenomics classifiers. Genome Res 2020; 30:1208-1216. [PMID: 32616517 PMCID: PMC7462069 DOI: 10.1101/gr.260398.119] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 01/16/2020] [Accepted: 06/25/2020] [Indexed: 11/24/2022]
Abstract
Studies of microbiomes are booming, along with the diversity of computational approaches to make sense out of the sequencing data and the volumes of accumulated microbial genotypes. A swift evaluation of newly published methods and their improvements against established tools is necessary to reduce the time between the methods' release and their adoption in microbiome analyses. The LEMMI platform offers a novel approach for benchmarking software dedicated to metagenome composition assessments based on read classification. It enables the integration of newly published methods in an independent and centralized benchmark designed to be continuously open to new submissions. This allows developers to be proactive regarding comparative evaluations and guarantees that any promising methods can be assessed side by side with established tools quickly after their release. Moreover, LEMMI enforces an effective distribution through software containers to ensure long-term availability of all methods. Here, we detail the LEMMI workflow and discuss the performances of some previously unevaluated tools. We see this platform eventually as a community-driven effort in which method developers can showcase novel approaches and get unbiased benchmarks for publications, and users can make informed choices and obtain standardized and easy-to-use tools.
Affiliation(s)
- Mathieu Seppey
  - Department of Genetic Medicine and Development, University of Geneva Medical School and Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland
- Mosè Manni
  - Department of Genetic Medicine and Development, University of Geneva Medical School and Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland
- Evgeny M Zdobnov
  - Department of Genetic Medicine and Development, University of Geneva Medical School and Swiss Institute of Bioinformatics, 1211 Geneva, Switzerland

17
Hughes KW, Case A, Matheny PB, Kivlin S, Petersen RH, Miller AN, Iturriaga T. Secret lifestyles of pyrophilous fungi in the genus Sphaerosporella. Am J Bot 2020; 107:876-885. [PMID: 32496601 PMCID: PMC7384086 DOI: 10.1002/ajb2.1482] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/27/2019] [Accepted: 03/03/2020] [Indexed: 05/07/2023]
Abstract
PREMISE Pyrophilous fungi form aboveground fruiting structures (ascocarps) following wildfires, but their ecology, natural history, and life cycles in the absence of wildfires are largely unknown. Sphaerosporella is considered to be pyrophilous. This study explores Sphaerosporella ascocarp appearance following a rare 2016 wildfire in the Great Smoky Mountains National Park (GSMNP), compares the timing of ascocarp formation with recovery of Sphaerosporella DNA sequences in soils, and explores the association of Sphaerosporella with post-fire Table Mountain pine (Pinus pungens) seedlings. METHODS Burned sites in the GSMNP were surveyed for pyrophilous fungal ascocarps over 2 years. Ascocarps, mycorrhizae, and endophyte cultures were evaluated morphologically and by Sanger sequencing of the nuclear ribosomal ITS gene region (fungal barcode; Schoch et al., 2012). DNA from soil cores was subjected to Illumina sequencing. RESULTS The timing and location of post-fire Sphaerosporella ascocarp formation was correlated with recovery of Sphaerosporella DNA sequences in soils. Genetic markers (fungal barcode) of Sphaerosporella were also recovered from mycorrhizal root tips and endophyte cultures from seedlings of Pinus pungens. CONCLUSIONS This study demonstrates that Sphaerosporella species, in the absence of fire, are biotrophic, forming both mycorrhizal and endophytic associations with developing Pinus pungens seedlings and may persist in nature in the absence of wildfire as a conifer symbiont. We speculate that Sphaerosporella may fruit only after the host plant is damaged or destroyed and that after wildfires, deep roots, needle endophytes, or heat-resistant spores could serve as a source of soil mycelium.
Affiliation(s)
- Karen W. Hughes
  - Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, TN 37996, USA
- Alexis Case
  - Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, TN 37996, USA
- P. Brandon Matheny
  - Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, TN 37996, USA
- Stephanie Kivlin
  - Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, TN 37996, USA
- Ronald H. Petersen
  - Department of Ecology and Evolutionary Biology, University of Tennessee, Knoxville, TN 37996, USA
- Andrew N. Miller
  - Illinois Natural History Survey, University of Illinois Urbana‐Champaign, 1816 South Oak Street, Champaign, IL 61820, USA
- Teresa Iturriaga
  - School of Integrated Plant Science, Cornell University, 334 Plant Science Building, Ithaca, NY 14853‐5904, USA

18
Sim M, Lee J, Lee D, Kwon D, Kim J. TAMA: improved metagenomic sequence classification through meta-analysis. BMC Bioinformatics 2020; 21:185. [PMID: 32397982 PMCID: PMC7218625 DOI: 10.1186/s12859-020-3533-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/16/2020] [Accepted: 05/05/2020] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Microorganisms are important occupants of many different environments. Identifying the composition of microbes and estimating their abundance promote understanding of interactions of microbes in environmental samples. To understand their environments more deeply, the composition of microorganisms in environmental samples has been studied using metagenomes, which are the collections of genomes of the microorganisms. Although many tools have been developed for taxonomy analysis based on different algorithms, variability of analysis outputs of existing tools from the same input metagenome datasets is the main obstacle for many researchers in this field. RESULTS Here, we present a novel meta-analysis tool for metagenome taxonomy analysis, called TAMA, by intelligently integrating outputs from three different taxonomy analysis tools. Using an integrated reference database, TAMA performs taxonomy assignment for input metagenome reads based on a meta-score by integrating scores of taxonomy assignment from different taxonomy classification tools. TAMA outperformed existing tools when evaluated using various benchmark datasets. It was also successfully applied to obtain relative species abundance profiles and difference in composition of microorganisms in two types of cheese metagenome and human gut metagenome. CONCLUSION TAMA can be easily installed and used for metagenome read classification and the prediction of relative species abundance from multiple numbers and types of metagenome read samples. TAMA can be used to more accurately uncover the composition of microorganisms in metagenome samples collected from various environments, especially when the use of a single taxonomy analysis tool is unreliable. TAMA is an open source tool, and can be downloaded at https://github.com/jkimlab/TAMA.
Affiliation(s)
- Mikang Sim
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Jongin Lee
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Daehwan Lee
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Daehong Kwon
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea
- Jaebum Kim
  - Department of Biomedical Science and Engineering, Konkuk University, Seoul, 05029, Republic of Korea

19
Kivlin SN, Hawkes CV. Spatial and temporal turnover of soil microbial communities is not linked to function in a primary tropical forest. Ecology 2020; 101:e02985. [PMID: 31958139 DOI: 10.1002/ecy.2985] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 10/24/2019] [Revised: 11/21/2019] [Accepted: 12/20/2019] [Indexed: 11/06/2022]
Abstract
The spatial and temporal linkages between turnover of soil microbial communities and their associated functions remain largely unexplored in terrestrial ecosystems. Yet defining these relationships and how they vary across ecosystems and microbial lineages is key to incorporating microbial communities into ecological forecasts and ecosystem models. To define linkages between turnover of soil bacterial and fungal communities and their function we sampled fungal and bacterial composition, abundance, and enzyme activities across a 3-ha area of wet tropical primary forest over 2 yr. We show that fungal and bacterial communities both exhibited temporal turnover, but turnover of both groups was much lower than in temperate ecosystems. Turnover over time was driven by gain and loss of microbial taxa and not changes in abundance of individual species present in multiple samples. Only fungi varied over space with idiosyncratic variation that did not increase linearly with distance among sampling locations. Only phosphorus-acquiring enzyme activities were linked to shifts in septate, decomposer fungal abundance; no enzymes were affected by composition or diversity of fungi or bacteria. Although temporal and spatial variation in composition was appreciable, because turnover of microbial communities did not alter the functional repertoire of decomposing enzymes, functional redundancy among taxa may be high in this ecosystem. Slow temporal turnover of tropical soil microbial communities and large functional redundancy suggests that shifts in abundance of particular functional groups may capture ecosystem function more accurately than composition in these heterogeneous ecosystems.
Affiliation(s)
- Stephanie N Kivlin
  - Department of Integrative Biology, University of Texas at Austin, Austin, Texas, 78712, USA
- Christine V Hawkes
  - Department of Integrative Biology, University of Texas at Austin, Austin, Texas, 78712, USA

20
Doster E, Rovira P, Noyes NR, Burgess BA, Yang X, Weinroth MD, Linke L, Magnuson R, Boucher C, Belk KE, Morley PS. A Cautionary Report for Pathogen Identification Using Shotgun Metagenomics; A Comparison to Aerobic Culture and Polymerase Chain Reaction for Salmonella enterica Identification. Front Microbiol 2019; 10:2499. [PMID: 31736924 PMCID: PMC6838018 DOI: 10.3389/fmicb.2019.02499] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 07/08/2019] [Accepted: 10/16/2019] [Indexed: 12/19/2022] Open
Abstract
This study was conducted to compare aerobic culture, polymerase chain reaction (PCR), lateral flow immunoassay (LFI), and shotgun metagenomics for identification of Salmonella enterica in feces collected from feedlot cattle. Samples were analyzed in parallel using all four tests. Results from aerobic culture and PCR were 100% concordant and indicated low S. enterica prevalence (3/60 samples positive). Although low S. enterica prevalence restricted formal statistical comparisons, LFI and deep metagenomic sequencing results were discordant with these results. Specifically, metagenomic analysis using k-mer-based classification against the RefSeq database indicated that 11/60 of samples contained sequence reads that matched to the S. enterica genome and uniquely identified this species of bacteria within the sample. However, further examination revealed that plasmid sequences were often included with bacterial genomic sequence data submitted to NCBI, which can lead to incorrect taxonomic classification. To circumvent this classification problem, we separated all plasmid sequences included in bacterial RefSeq genomes and reassigned them to a unique taxon so that they would not be uniquely associated with specific bacterial species such as S. enterica. Using this revised database and taxonomic structure, we found that only 6/60 samples contained sequences specific for S. enterica, suggesting increased relative specificity. Reads identified as S. enterica in these six samples were further evaluated using BLAST and NCBI's nr/nt database, which identified that only 2/60 samples contained reads exclusive to S. enterica chromosomal genomes. These two samples were culture- and PCR-negative, suggesting that even deep metagenomic sequencing suffers from lower sensitivity and specificity in comparison to more traditional pathogen detection methods. Additionally, no sample reads were taxonomically classified as S. enterica with two other metagenomic tools, Metagenomic Intra-species Diversity Analysis System (MIDAS) and Metagenomic Phylogenetic Analysis 2 (MetaPhlAn2). This study re-affirmed that the traditional techniques of aerobic culture and PCR provide similar results for S. enterica identification in cattle feces. On the other hand, metagenomic results are highly influenced by the classification method and reference database employed. These results highlight the nuances of computational detection of species-level sequences within short-read metagenomic sequence data, and emphasize the need for cautious interpretation of such results.
Affiliation(s)
- Enrique Doster
  - Department of Microbiology, Immunology and Pathology, Colorado State University, Fort Collins, CO, United States
- Pablo Rovira
  - Instituto Nacional de Investigacion Agropecuaria, Treinta y Tres, Uruguay
- Noelle R. Noyes
  - Department of Veterinary Population Medicine, University of Minnesota, St. Paul, MN, United States
- Brandy A. Burgess
  - Department of Population Health, University of Georgia, Athens, GA, United States
- Xiang Yang
  - Department of Animal Science, University of California, Davis, Davis, CA, United States
- Margaret D. Weinroth
  - Department of Animal Sciences, Colorado State University, Fort Collins, CO, United States
- Lyndsey Linke
  - Department of Clinical Sciences, Colorado State University, Fort Collins, CO, United States
- Roberta Magnuson
  - Department of Clinical Sciences, Colorado State University, Fort Collins, CO, United States
- Christina Boucher
  - Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL, United States
- Keith E. Belk
  - Department of Animal Sciences, Colorado State University, Fort Collins, CO, United States
- Paul S. Morley
  - Veterinary Education, Research, and Outreach Center, West Texas A&M University, Canyon, TX, United States

21
Piper AM, Batovska J, Cogan NOI, Weiss J, Cunningham JP, Rodoni BC, Blacket MJ. Prospects and challenges of implementing DNA metabarcoding for high-throughput insect surveillance. Gigascience 2019; 8:giz092. [PMID: 31363753 PMCID: PMC6667344 DOI: 10.1093/gigascience/giz092] [Citation(s) in RCA: 94] [Impact Index Per Article: 15.7] [Received: 01/11/2019] [Revised: 06/25/2019] [Accepted: 07/09/2019] [Indexed: 12/21/2022] Open
Abstract
Trap-based surveillance strategies are widely used for monitoring of invasive insect species, aiming to detect newly arrived exotic taxa as well as track the population levels of established or endemic pests. Where these surveillance traps have low specificity and capture non-target endemic species in excess of the target pests, the need for extensive specimen sorting and identification creates a major diagnostic bottleneck. While the recent development of standardized molecular diagnostics has partly alleviated this requirement, the single specimen per reaction nature of these methods does not readily scale to the sheer number of insects trapped in surveillance programmes. Consequently, target lists are often restricted to a few high-priority pests, allowing unanticipated species to avoid detection and potentially establish populations. DNA metabarcoding has recently emerged as a method for conducting simultaneous, multi-species identification of complex mixed communities and may lend itself ideally to rapid diagnostics of bulk insect trap samples. Moreover, the high-throughput nature of recent sequencing platforms could enable the multiplexing of hundreds of diverse trap samples on a single flow cell, thereby providing the means to dramatically scale up insect surveillance in terms of both the quantity of traps that can be processed concurrently and number of pest species that can be targeted. In this review of the metabarcoding literature, we explore how DNA metabarcoding could be tailored to the detection of invasive insects in a surveillance context and highlight the unique technical and regulatory challenges that must be considered when implementing high-throughput sequencing technologies into sensitive diagnostic applications.
Affiliation(s)
- Alexander M Piper
  - Agriculture Victoria Research, AgriBio Centre, 5 Ring Road, Bundoora 3083, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora 3083, VIC, Australia
- Jana Batovska
  - Agriculture Victoria Research, AgriBio Centre, 5 Ring Road, Bundoora 3083, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora 3083, VIC, Australia
- Noel O I Cogan
  - Agriculture Victoria Research, AgriBio Centre, 5 Ring Road, Bundoora 3083, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora 3083, VIC, Australia
- John Weiss
  - Agriculture Victoria Research, AgriBio Centre, 5 Ring Road, Bundoora 3083, VIC, Australia
- John Paul Cunningham
  - Agriculture Victoria Research, AgriBio Centre, 5 Ring Road, Bundoora 3083, VIC, Australia
- Brendan C Rodoni
  - Agriculture Victoria Research, AgriBio Centre, 5 Ring Road, Bundoora 3083, VIC, Australia
  - School of Applied Systems Biology, La Trobe University, Bundoora 3083, VIC, Australia
- Mark J Blacket
  - Agriculture Victoria Research, AgriBio Centre, 5 Ring Road, Bundoora 3083, VIC, Australia
|
22
|
Weber LM, Saelens W, Cannoodt R, Soneson C, Hapfelmeier A, Gardner PP, Boulesteix AL, Saeys Y, Robinson MD. Essential guidelines for computational method benchmarking. Genome Biol 2019; 20:125. [PMID: 31221194 PMCID: PMC6584985 DOI: 10.1186/s13059-019-1738-8] [Citation(s) in RCA: 90] [Impact Index Per Article: 15.0] [Indexed: 12/14/2022]
Abstract
In computational biology and other sciences, researchers are frequently faced with a choice between several computational methods for performing data analyses. Benchmarking studies aim to rigorously compare the performance of different methods using well-characterized benchmark datasets, to determine the strengths of each method or to provide recommendations regarding suitable choices of methods for an analysis. However, benchmarking studies must be carefully designed and implemented to provide accurate, unbiased, and informative results. Here, we summarize key practical guidelines and recommendations for performing high-quality benchmarking analyses, based on our experiences in computational biology.
Affiliation(s)
- Lukas M Weber
  - Institute of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Zurich, 8057, Zurich, Switzerland
- Wouter Saelens
  - Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, 9052, Ghent, Belgium
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000, Ghent, Belgium
- Robrecht Cannoodt
  - Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, 9052, Ghent, Belgium
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000, Ghent, Belgium
- Charlotte Soneson
  - Institute of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Zurich, 8057, Zurich, Switzerland
  - Present address: Friedrich Miescher Institute for Biomedical Research and SIB Swiss Institute of Bioinformatics, 4058, Basel, Switzerland
- Alexander Hapfelmeier
  - Institute of Medical Informatics, Statistics and Epidemiology, Technical University of Munich, 81675, Munich, Germany
- Paul P Gardner
  - Department of Biochemistry, University of Otago, Dunedin, 9016, New Zealand
- Anne-Laure Boulesteix
  - Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig-Maximilians-University, 81377, Munich, Germany
- Yvan Saeys
  - Data Mining and Modelling for Biomedicine, VIB Center for Inflammation Research, 9052, Ghent, Belgium
  - Department of Applied Mathematics, Computer Science and Statistics, Ghent University, 9000, Ghent, Belgium
- Mark D Robinson
  - Institute of Molecular Life Sciences, University of Zurich, 8057, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Zurich, 8057, Zurich, Switzerland
|