1
Varrella S, Livi S, Corinaldesi C, Castriota L, Maggio T, Vivona P, Pindo M, Fava S, Danovaro R, Dell'Anno A. A comprehensive assessment of non-indigenous species requires the combination of multi-marker eDNA metabarcoding with classical taxonomic identification. Environment International 2025; 199:109489. [PMID: 40288285 DOI: 10.1016/j.envint.2025.109489] [Received: 09/05/2024] [Revised: 04/16/2025] [Accepted: 04/18/2025] [Indexed: 04/29/2025]
Abstract
In the marine environment, non-indigenous species (NIS) can alter natural habitats and cause biodiversity loss, with important consequences for ecosystems and socio-economic activities. With more than 1000 NIS introduced over the last century, the Mediterranean Sea is one of the most threatened regions worldwide, requiring early identification of newly arrived alien species for proper environmental management. Here, we carried out environmental-DNA (eDNA) metabarcoding analyses, using multiple molecular markers (i.e., 18S rRNA, COI, and rbcL) and different genetic databases (i.e., NCBI, PR2, SILVA, MIDORI2, MGZDB, and BOLD), on seawater and sediment samples collected on a seasonal basis in three Mediterranean ports located in the North Adriatic, Ionian and Tyrrhenian Sea to identify marine species, and particularly NIS. The use of multi-marker eDNA metabarcoding allowed the identification of a higher number of species compared to the morphological analyses (1484 vs. 752 species), with a minor portion of species shared by both approaches. Overall, only 4 NIS were consistently identified by both morphological and molecular approaches, whereas 27 and 17 NIS were exclusively detected by eDNA metabarcoding and classical taxonomic analyses, respectively. eDNA metabarcoding also allowed the identification of the genetic signatures of 5 NIS never reported in Italian waters. We conclude that eDNA metabarcoding can represent a highly sensitive tool for the early identification of NIS, but a comprehensive census of NIS requires the combination of molecular and morphological approaches.
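The comparison between the two detection approaches reduces to set arithmetic. The sketch below is illustrative only: the function name `compare_detections` and the species labels are hypothetical placeholders, and the only figures taken from the abstract are the three NIS counts (4 shared, 27 eDNA-only, 17 morphology-only).

```python
def compare_detections(edna, morphology):
    """Split two species lists into shared and method-exclusive sets."""
    edna, morphology = set(edna), set(morphology)
    return {
        "shared": edna & morphology,
        "edna_only": edna - morphology,
        "morphology_only": morphology - edna,
    }


# Hypothetical placeholder labels, not species from the study.
edna_nis = {"species_A", "species_B", "species_C"}
morphology_nis = {"species_B", "species_D"}

overlap = compare_detections(edna_nis, morphology_nis)
counts = {name: len(members) for name, members in overlap.items()}
print(counts)

# Counts reported in the abstract: 4 shared, 27 eDNA-only, 17 morphology-only.
reported = {"shared": 4, "edna_only": 27, "morphology_only": 17}
total_nis = sum(reported.values())                                   # 48
edna_total = reported["shared"] + reported["edna_only"]              # 31
morphology_total = reported["shared"] + reported["morphology_only"]  # 21
print(total_nis, edna_total, morphology_total)
```

Taken at face value, these counts imply 48 NIS overall, of which eDNA metabarcoding recovered 31 and morphology 21, which is consistent with the conclusion that neither approach alone gives a complete census.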
Affiliation(s)
- Stefano Varrella
  - Department of Life and Environmental Sciences, Polytechnic University of Marche, Via Brecce Bianche, 60131 Ancona, Italy; National Biodiversity Future Centre, 90133 Palermo, Italy.
- Silvia Livi
  - Italian Institute for Environmental Protection and Research (ISPRA), Department for the Monitoring and Protection of the Environment and for the Conservation of Biodiversity, Via Brancati 48, 00144 Rome, Italy
- Cinzia Corinaldesi
  - National Biodiversity Future Centre, 90133 Palermo, Italy; Department of Materials, Environmental Sciences and Urban Planning, Polytechnic University of Marche, Via Brecce Bianche, 60131 Ancona, Italy
- Luca Castriota
  - Italian Institute for Environmental Protection and Research (ISPRA), Department for the Monitoring and Protection of the Environment and for the Conservation of Biodiversity, Unit for Conservation Management and Sustainable Use of Fish and Marine Resources, 90149 Palermo, Italy
- Teresa Maggio
  - Italian Institute for Environmental Protection and Research (ISPRA), Department for the Monitoring and Protection of the Environment and for the Conservation of Biodiversity, Unit for Conservation Management and Sustainable Use of Fish and Marine Resources, 90149 Palermo, Italy
- Pietro Vivona
  - Italian Institute for Environmental Protection and Research (ISPRA), Department for the Monitoring and Protection of the Environment and for the Conservation of Biodiversity, Unit for Conservation Management and Sustainable Use of Fish and Marine Resources, 90149 Palermo, Italy
- Massimo Pindo
  - Research and Innovation Centre, Fondazione Edmund Mach, San Michele all'Adige, Italy
- Sebastiano Fava
  - Department of Life and Environmental Sciences, Polytechnic University of Marche, Via Brecce Bianche, 60131 Ancona, Italy
- Roberto Danovaro
  - Department of Life and Environmental Sciences, Polytechnic University of Marche, Via Brecce Bianche, 60131 Ancona, Italy; National Biodiversity Future Centre, 90133 Palermo, Italy
- Antonio Dell'Anno
  - Department of Life and Environmental Sciences, Polytechnic University of Marche, Via Brecce Bianche, 60131 Ancona, Italy; National Biodiversity Future Centre, 90133 Palermo, Italy
2
Hu H, Wei XY, Liu L, Wang YB, Bu LK, Jia HJ, Pei DS. Biogeographic patterns of meio- and micro-eukaryotic communities in dam-induced river-reservoir systems. Appl Microbiol Biotechnol 2024; 108:130. [PMID: 38229334 DOI: 10.1007/s00253-023-12993-4] [Received: 03/06/2023] [Revised: 10/30/2023] [Accepted: 12/28/2023] [Indexed: 01/18/2024]
Abstract
Although the Three Gorges Dam (TGD) is the world's largest hydroelectric dam, little is known about the spatial-temporal patterns and community assembly mechanisms of meio- and micro-eukaryotes and their two subtaxa (zooplankton and zoobenthos). This knowledge gap is particularly evident across various habitats and during different water-level periods, primarily arising from the annual regular dam regulation. To address this gap, we employed mitochondrial cytochrome c oxidase I (COI) gene-based environmental DNA (eDNA) metabarcoding technology to systematically analyze the biogeographic pattern of the three communities within the Three Gorges Reservoir (TGR). Our findings reveal distinct spatiotemporal characteristics and complementary patterns in the distribution of meio- and micro-eukaryotes. The three communities showed similar biogeographic patterns and assembly processes. Notably, the diversity of these three taxa gradually decreased along the river. Their communities were less shaped by stochastic processes, which gradually decreased along the longitudinal riverine-transition-lacustrine gradient. Hence, deterministic factors, such as seasonality, environmental, and spatial variables, along with species interactions, likely play a pivotal role in shaping these communities. Environmental factors primarily drive seasonal variations in these communities, while hydrological conditions, represented as spatial distance, predominantly influence spatial variations. These three communities followed the distance-decay pattern. In winter, compared to summer, both the decay and species interrelationships are more pronounced. Taken together, this study offers fresh insights into the composition and diversity patterns of meio- and micro-eukaryotes at the spatial-temporal level. It also uncovers the mechanisms behind community assembly in various environmental niches within the dam-induced river-reservoir systems.
KEY POINTS: • Distribution and diversity of meio- and micro-eukaryotes exhibit distinct spatiotemporal patterns in the TGR. • Contribution of stochastic processes in community assembly gradually decreases along the river. • Deterministic factors and species interactions shape meio- and micro-eukaryotic community.
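A distance-decay pattern means that community similarity falls as the distance between sites grows. The sketch below shows one simple way to quantify it, assuming Jaccard similarity on presence/absence data and an ordinary least-squares slope; the abstract does not state which similarity index or model the authors used, and the site positions and taxon labels are invented for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of taxa."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def distance_decay_slope(sites):
    """OLS slope of pairwise similarity against pairwise distance.

    sites maps a site name to (position_km, set_of_taxa).
    A negative slope indicates distance decay.
    """
    xs, ys = [], []
    for (_, (pos_a, taxa_a)), (_, (pos_b, taxa_b)) in combinations(sites.items(), 2):
        xs.append(abs(pos_a - pos_b))
        ys.append(jaccard(taxa_a, taxa_b))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical sites along a river (positions in km, made-up taxa).
toy_sites = {
    "upstream": (0.0, {"a", "b", "c", "d"}),
    "middle": (10.0, {"b", "c", "d", "e"}),
    "downstream": (20.0, {"d", "e", "f", "g"}),
}
slope = distance_decay_slope(toy_sites)
print(f"similarity change per km: {slope:.4f}")
```

Fitting the slope separately for each season would be one way to compare the strength of the decay between winter and summer, as the abstract describes.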
Affiliation(s)
- Huan Hu
  - Chongqing Jiaotong University, Chongqing, 400074, China
  - Chongqing Institute of Green and Intelligent Technology, Chongqing School of University of Chinese Academy of Sciences, Chinese Academy of Sciences, Chongqing, 400714, China
- Xing-Yi Wei
  - Chongqing Jiaotong University, Chongqing, 400074, China
  - Chongqing Institute of Green and Intelligent Technology, Chongqing School of University of Chinese Academy of Sciences, Chinese Academy of Sciences, Chongqing, 400714, China
- Li Liu
  - Chongqing Institute of Green and Intelligent Technology, Chongqing School of University of Chinese Academy of Sciences, Chinese Academy of Sciences, Chongqing, 400714, China
- Yuan-Bo Wang
  - Chongqing Jiaotong University, Chongqing, 400074, China
  - Chongqing Institute of Green and Intelligent Technology, Chongqing School of University of Chinese Academy of Sciences, Chinese Academy of Sciences, Chongqing, 400714, China
- Ling-Kang Bu
  - Chongqing Institute of Green and Intelligent Technology, Chongqing School of University of Chinese Academy of Sciences, Chinese Academy of Sciences, Chongqing, 400714, China
  - School of Public Health, Chongqing Medical University, Chongqing, 400016, China
- Huang-Jie Jia
  - Chongqing Institute of Green and Intelligent Technology, Chongqing School of University of Chinese Academy of Sciences, Chinese Academy of Sciences, Chongqing, 400714, China
- De-Sheng Pei
  - School of Public Health, Chongqing Medical University, Chongqing, 400016, China.
3
Perry WB, Seymour M, Orsini L, Jâms IB, Milner N, Edwards F, Harvey R, de Bruyn M, Bista I, Walsh K, Emmett B, Blackman R, Altermatt F, Lawson Handley L, Mächler E, Deiner K, Bik HM, Carvalho G, Colbourne J, Cosby BJ, Durance I, Creer S. An integrated spatio-temporal view of riverine biodiversity using environmental DNA metabarcoding. Nat Commun 2024; 15:4372. [PMID: 38782932 PMCID: PMC11116482 DOI: 10.1038/s41467-024-48640-3] [Received: 09/13/2023] [Accepted: 05/09/2024] [Indexed: 05/25/2024]
Abstract
Anthropogenically forced changes in global freshwater biodiversity demand more efficient monitoring approaches. Consequently, environmental DNA (eDNA) analysis is enabling ecosystem-scale biodiversity assessment, yet the appropriate spatio-temporal resolution of robust biodiversity assessment remains ambiguous. Here, using intensive, spatio-temporal eDNA sampling across space (five rivers in Europe and North America, with an upper range of 20-35 km between samples), time (19 timepoints between 2017 and 2018) and environmental conditions (river flow, pH, conductivity, temperature and rainfall), we characterise the resolution at which information on diversity across the animal kingdom can be gathered from rivers using eDNA. In space, beta diversity was mainly dictated by turnover, on a scale of tens of kilometres, highlighting that diversity measures are not confounded by eDNA from upstream. Fish communities showed nested assemblages along some rivers, coinciding with habitat use. Across time, seasonal life history events, including salmon and eel migration, were detected. Finally, effects of environmental conditions were taxon-specific, reflecting habitat filtering of communities rather than effects on DNA molecules. We conclude that riverine eDNA metabarcoding can measure biodiversity at spatio-temporal scales relevant to species and community ecology, demonstrating its utility in delivering insights into river community ecology during a time of environmental change.
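The distinction between turnover and nestedness can be made concrete with a pairwise partition of beta diversity. The sketch below assumes the Sørensen-based partition (total dissimilarity split into a Simpson turnover component and a nestedness-resultant component); the abstract does not say which framework the authors applied, and the taxon labels are placeholders.

```python
def partition_beta(site1, site2):
    """Partition pairwise Sorensen dissimilarity into turnover and nestedness.

    a = shared taxa, b and c = taxa unique to each site.
    sorensen   = (b + c) / (2a + b + c)
    turnover   = min(b, c) / (a + min(b, c))   (Simpson dissimilarity)
    nestedness = sorensen - turnover
    """
    s1, s2 = set(site1), set(site2)
    if not s1 or not s2:
        raise ValueError("both sites need at least one taxon")
    a = len(s1 & s2)
    b = len(s1 - s2)
    c = len(s2 - s1)
    sorensen = (b + c) / (2 * a + b + c)
    turnover = min(b, c) / (a + min(b, c))
    return {"sorensen": sorensen, "turnover": turnover,
            "nestedness": sorensen - turnover}


# One site is a strict subset of the other: all dissimilarity is nestedness.
nested = partition_beta({"a", "b", "c", "d"}, {"a", "b"})
# Equal richness with replaced taxa: all dissimilarity is turnover.
replaced = partition_beta({"a", "b", "c"}, {"a", "d", "e"})
print(nested)
print(replaced)
```

Under this partition, a river where turnover dominates has sites that replace each other's taxa, whereas nested fish assemblages show up as a large nestedness component.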
Affiliation(s)
- William Bernard Perry
  - Molecular Ecology and Evolution at Bangor (MEEB), School of Biological Sciences, Bangor University, Bangor, Gwynedd, LL57 2UW, UK.
  - Water Research Institute, Cardiff University, Cardiff, CF10 3AX, UK.
- Luisa Orsini
  - Environmental Genomics Group, School of Biosciences, University of Birmingham, Birmingham, B15 2TT, UK
- Ifan Bryn Jâms
  - Water Research Institute, Cardiff University, Cardiff, CF10 3AX, UK
- Nigel Milner
  - Molecular Ecology and Evolution at Bangor (MEEB), School of Biological Sciences, Bangor University, Bangor, Gwynedd, LL57 2UW, UK
- François Edwards
  - APEM Ltd, A17 Embankment Business Park, Heaton Mersey, Manchester, SK4 3GN, UK
- Rachel Harvey
  - Centre for Ecology & Hydrology, Environment Centre Wales, Bangor, LL57 2UW, UK
- Mark de Bruyn
  - Australian Research Centre for Human Evolution, School of Environment and Science, Griffith University, Queensland, 4111, Australia
- Iliana Bista
  - LOEWE Centre for Translational Biodiversity Genomics, 60325, Frankfurt, Germany
  - Senckenberg Research Institute, 60325, Frankfurt, Germany
  - Naturalis Biodiversity Center, Darwinweg 2, 2333, Leiden, Netherlands
  - Wellcome Sanger Institute, Tree of Life, Wellcome Genome Campus, Hinxton, CB10 1SA, UK
- Kerry Walsh
  - Environment Agency, Horizon House, Deanery Road, Bristol, BS1 5AH, UK
- Bridget Emmett
  - Centre for Ecology & Hydrology, Environment Centre Wales, Bangor, LL57 2UW, UK
- Rosetta Blackman
  - Department of Aquatic Ecology, Eawag: Swiss Federal Institute of Aquatic Science and Technology, Überlandstrasse 133, CH-8600, Dübendorf, Switzerland
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse 190, 8057, Zürich, Switzerland
  - Evolutionary Biology Group (@EvoHull), Department of Biological and Marine Sciences, University of Hull (UoH), Cottingham Road, Hull, HU6 7RX, UK
- Florian Altermatt
  - Department of Aquatic Ecology, Eawag: Swiss Federal Institute of Aquatic Science and Technology, Überlandstrasse 133, CH-8600, Dübendorf, Switzerland
  - Department of Evolutionary Biology and Environmental Studies, University of Zurich, Winterthurerstrasse 190, 8057, Zürich, Switzerland
- Lori Lawson Handley
  - Evolutionary Biology Group (@EvoHull), Department of Biological and Marine Sciences, University of Hull (UoH), Cottingham Road, Hull, HU6 7RX, UK
- Elvira Mächler
  - Department of Aquatic Ecology, Eawag: Swiss Federal Institute of Aquatic Science and Technology, Überlandstrasse 133, CH-8600, Dübendorf, Switzerland
- Kristy Deiner
  - Institute of Biogeochemistry and Pollutant Dynamics (IBP), ETH Zurich, Zurich, Switzerland
- Holly M Bik
  - Department of Marine Sciences and Institute of Bioinformatics, University of Georgia, Georgia, USA
- Gary Carvalho
  - Molecular Ecology and Evolution at Bangor (MEEB), School of Biological Sciences, Bangor University, Bangor, Gwynedd, LL57 2UW, UK
- John Colbourne
  - Environmental Genomics Group, School of Biosciences, University of Birmingham, Birmingham, B15 2TT, UK
- Bernard Jack Cosby
  - Centre for Ecology & Hydrology, Environment Centre Wales, Bangor, LL57 2UW, UK
- Isabelle Durance
  - Water Research Institute, Cardiff University, Cardiff, CF10 3AX, UK
- Simon Creer
  - Molecular Ecology and Evolution at Bangor (MEEB), School of Biological Sciences, Bangor University, Bangor, Gwynedd, LL57 2UW, UK.
4
Vuataz L, Reding JP, Reding A, Roesti C, Stoffel C, Vinçon G, Gattolliat JL. A comprehensive DNA barcoding reference database for Plecoptera of Switzerland. Sci Rep 2024; 14:6322. [PMID: 38491157 PMCID: PMC10943188 DOI: 10.1038/s41598-024-56930-5] [Received: 12/30/2023] [Accepted: 03/12/2024] [Indexed: 03/18/2024]
Abstract
DNA barcoding is an essential tool in modern biodiversity sciences. Despite considerable work to barcode the tree of life, many groups, including insects, remain partially or totally unreferenced, preventing barcoding from reaching its full potential. Aquatic insects, especially the three orders Ephemeroptera, Plecoptera, and Trichoptera (EPT), are key freshwater quality indicators worldwide. Among them, Plecoptera (stoneflies), which are among the most sensitive aquatic insects to habitat modification, play a central role in river monitoring surveys. Here, we present an update of the Plecoptera reference database for (meta)barcoding in Switzerland, now covering all 118 species known from this country. Fresh specimens, mostly from rare or localized species, were collected, and 151 new CO1 barcodes were generated. These were merged with the 422 previously published sequences, resulting in a dataset of 573 barcoded specimens. Our CO1 dataset was delimited into 115 CO1 clusters based on a priori morphological identifications, of which 17% are newly reported for Switzerland, and 4% are newly reported globally. Among the 115 CO1 clusters, 85% showed complete congruence with morphology. Distance-based analysis indicated local barcoding gaps in 97% of the CO1 clusters. This study significantly improves the Swiss reference database for stoneflies, enhancing future species identification accuracy and biodiversity monitoring. Additionally, this work reveals cryptic diversity and incongruence between morphology and barcodes, both presenting valuable opportunities for future integrative taxonomic studies. Voucher specimens, DNA extractions and reference barcodes are available for future developments, including metabarcoding and environmental DNA surveys.
Affiliation(s)
- Laurent Vuataz
  - Département de zoologie, Palais de Rumine, Muséum cantonal des sciences naturelles, Place Riponne 6, 1005, Lausanne, Switzerland.
  - Department of Ecology and Evolution, University of Lausanne (UNIL), 1015, Lausanne, Switzerland.
- Céline Stoffel
  - Département de zoologie, Palais de Rumine, Muséum cantonal des sciences naturelles, Place Riponne 6, 1005, Lausanne, Switzerland
  - Department of Ecology and Evolution, University of Lausanne (UNIL), 1015, Lausanne, Switzerland
- Jean-Luc Gattolliat
  - Département de zoologie, Palais de Rumine, Muséum cantonal des sciences naturelles, Place Riponne 6, 1005, Lausanne, Switzerland
  - Department of Ecology and Evolution, University of Lausanne (UNIL), 1015, Lausanne, Switzerland
5
Barrenechea Angeles I, Nguyen NL, Greco M, Tan KS, Pawlowski J. Assigning the unassigned: A signature-based classification of rDNA metabarcodes reveals new deep-sea diversity. PLoS One 2024; 19:e0298440. [PMID: 38422100 PMCID: PMC10903905 DOI: 10.1371/journal.pone.0298440] [Received: 08/02/2023] [Accepted: 01/23/2024] [Indexed: 03/02/2024]
Abstract
Environmental DNA metabarcoding reveals a vast genetic diversity of marine eukaryotes. Yet, most of the metabarcoding data remain unassigned due to the paucity of reference databases. This is particularly true for the deep-sea meiofauna and eukaryotic microbiota, whose hidden diversity is largely unexplored. Here, we tackle this issue by using unique DNA signatures to classify unknown metabarcodes assigned to deep-sea foraminifera. We analyzed metabarcoding data obtained from 311 deep-sea sediment samples collected in the Clarion-Clipperton Fracture Zone, an area of potential polymetallic nodule exploitation in the Eastern Pacific Ocean. Using the signatures designed in the 37F hypervariable region of the 18S rRNA gene, we were able to classify 802 unassigned metabarcodes into 61 novel lineages, which have been placed in 27 phylogenetic clades. The comparison of new lineages with other foraminiferal datasets shows that most novel lineages are widely distributed in the deep sea. Five lineages are also present in the shallow-water datasets; however, phylogenetic analysis of these lineages separates deep-sea and shallow-water metabarcodes except in one case. While the signature-based classification does not solve the problem of gaps in reference databases, this taxonomy-free approach provides insight into the distribution and ecology of deep-sea species represented by unassigned metabarcodes, which could be useful in future applications of metabarcoding for environmental monitoring.
Affiliation(s)
- Inès Barrenechea Angeles
  - Department of Earth Sciences, University of Geneva, Geneva, Switzerland
  - Department of Genetics and Evolution, University of Geneva, Geneva, Switzerland
  - Department of Geosciences, UiT-The Arctic University of Norway, Tromsø, Norway
- Ngoc-Loi Nguyen
  - Institute of Oceanology, Polish Academy of Sciences, Sopot, Poland
- Mattia Greco
  - Institute of Oceanology, Polish Academy of Sciences, Sopot, Poland
  - Institute of Marine Sciences, Spanish National Research Council, Barcelona, Spain
- Koh Siang Tan
  - Tropical Marine Science Institute, National University of Singapore, Singapore, Singapore
- Jan Pawlowski
  - Institute of Oceanology, Polish Academy of Sciences, Sopot, Poland
  - ID-Gene Ecodiagnostics Ltd., Plan-les-Ouates, Switzerland
6
San Martin G, Hautier L, Mingeot D, Dubois B. How reliable is metabarcoding for pollen identification? An evaluation of different taxonomic assignment strategies by cross-validation. PeerJ 2024; 12:e16567. [PMID: 38313030 PMCID: PMC10838070 DOI: 10.7717/peerj.16567] [Received: 07/24/2023] [Accepted: 11/12/2023] [Indexed: 02/06/2024]
Abstract
Metabarcoding is a powerful tool, increasingly used in many disciplines of environmental sciences. However, to assign a taxon to a DNA sequence, bioinformaticians need to choose between different strategies or parameter values and these choices sometimes seem rather arbitrary. In this work, we present a case study on ITS2 and rbcL databases used to identify pollen collected by bees in Belgium. We blasted a random sample of sequences from the reference database against the remainder of the database using different strategies and compared the known taxonomy with the predicted one. This in silico cross-validation (CV) approach proved to be an easy yet powerful way to (1) assess the relative accuracy of taxonomic predictions, (2) define rules to discard dubious taxonomic assignments and (3) provide a more objective basis to choose the best strategy. We obtained the best results with the best blast hit (best bit score) rather than by selecting the majority taxon from the top 10 hits. The predictions were further improved by favouring the most frequent taxon among those with tied best bit scores. We obtained better results with databases containing the full sequences available on NCBI rather than restricting the sequences to the region amplified by the primers chosen in our study. Leaked CV showed that when the true sequence is present in the database, blast might still struggle to match the right taxon at the species level, particularly with rbcL. Classical 10-fold CV (where the true sequence is removed from the database) offers a different yet more realistic view of the true error rates. Taxonomic predictions with this approach worked well up to the genus level, particularly for ITS2 (5-7% of errors). Using a database containing only the local flora of Belgium did not improve the predictions up to the genus level for local species and made them worse for foreign species.
At the species level, using a database containing exclusively local species improved the predictions for local species by ∼12% but the error rate remained rather high: 25% for ITS2 and 42% for rbcL. Foreign species performed worse even when using a world database (59-79% of errors). We used classification trees and GLMs to model the % of errors vs. identity and consensus scores and determine appropriate thresholds below which the taxonomic assignment should be discarded. This resulted in a significant reduction in prediction errors, but at the cost of a much higher proportion of unassigned sequences. Despite this stringent filtering, at least 1/5 sequences deemed suitable for species-level identification ultimately proved to be misidentified. An examination of the variability in prediction accuracy between plant families showed that rbcL outperformed ITS2 for only two of the 27 families examined, and that the % correct species-level assignments were much better for some families (e.g. 95% for Sapindaceae) than for others (e.g. 35% for Salicaceae).
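The assignment rule that performed best in this study (take the best bit score, and break ties by the most frequent taxon among the tied hits) can be sketched in a few lines. This is an illustration of the rule as the abstract describes it, not the authors' code: the function name `assign_taxon`, the `(taxon, bit_score)` tuple layout and the example hits are assumptions.

```python
from collections import Counter


def assign_taxon(hits):
    """Pick a taxon from BLAST-style hits given as (taxon, bit_score) pairs.

    Only hits tied for the highest bit score are considered; among them the
    most frequent taxon wins. Returns None when there are no hits.
    """
    if not hits:
        return None
    best_score = max(score for _, score in hits)
    tied_taxa = [taxon for taxon, score in hits if score == best_score]
    # If frequencies are also tied, the taxon seen first is returned.
    return Counter(tied_taxa).most_common(1)[0][0]


# Hypothetical hits for one query sequence.
example_hits = [
    ("Salix alba", 350.0),
    ("Salix caprea", 350.0),
    ("Salix caprea", 350.0),
    ("Populus nigra", 300.0),
]
assigned = assign_taxon(example_hits)
print(assigned)
```

In practice the identity and consensus thresholds discussed in the abstract would be applied on top of this rule to discard dubious assignments; they are left out here because the abstract does not give their values.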
Affiliation(s)
- Gilles San Martin
  - Life Sciences Department, Plant and Forest Health Unit, Walloon Agricultural Research Centre, Gembloux, Belgium
- Louis Hautier
  - Life Sciences Department, Plant and Forest Health Unit, Walloon Agricultural Research Centre, Gembloux, Belgium
- Dominique Mingeot
  - Life Sciences Department, Bioengineering Unit, Walloon Agricultural Research Centre, Gembloux, Belgium
- Benjamin Dubois
  - Life Sciences Department, Bioengineering Unit, Walloon Agricultural Research Centre, Gembloux, Belgium
7
Curd EE, Gal L, Gallego R, Silliman K, Nielsen S, Gold Z. rCRUX: A Rapid and Versatile Tool for Generating Metabarcoding Reference libraries in R. Environmental DNA (Hoboken, N.J.) 2024; 6:e489. [PMID: 38370872 PMCID: PMC10871694 DOI: 10.1002/edn3.489] [Received: 05/31/2023] [Accepted: 10/19/2023] [Indexed: 02/20/2024]
Abstract
The sequencing revolution requires accurate taxonomic classification of DNA sequences. Key to making accurate taxonomic assignments are curated, comprehensive reference barcode databases. However, the generation and curation of such databases has remained challenging given the large and continuously growing volumes of both DNA sequence data and novel reference barcode targets. Monitoring and research applications require a greater diversity of specialized gene regions and targeted taxa than are currently curated by professional staff. Thus there is a growing need for an easy-to-implement computational tool that can generate comprehensive metabarcoding reference libraries for any bespoke locus. We address this need by reimagining CRUX from the Anacapa Toolkit and present the rCRUX package in R which, like its predecessor, relies on sequence homology and PCR primer compatibility instead of keyword searches to avoid limitations of user-defined metadata. The typical workflow involves searching for plausible seed amplicons (get_seeds_local() or get_seeds_remote()) by simulating in silico PCR to acquire a set of sequences analogous to PCR products containing a user-defined set of primer sequences. Next, these seeds are used to iteratively blast search seed sequences against a local copy of the National Center for Biotechnology Information (NCBI) formatted nt database using a taxonomic-rank based stratified random sampling approach (blast_seeds()). This results in a comprehensive set of sequence matches. This database is dereplicated and cleaned (derep_and_clean_db()) by identifying identical reference sequences and collapsing the taxonomic path to the lowest taxonomic agreement across all matching reads. This results in a curated, comprehensive database of primer-specific reference barcode sequences from NCBI. Databases can then be compared (compare_db()) to determine read and taxonomic overlap.
We demonstrate that rCRUX provides more comprehensive reference databases for the MiFish Universal Teleost 12S, Taberlet trnL, fungal ITS, and Leray CO1 loci than CRABS, MetaCurator, RESCRIPt, and ecoPCR reference databases. We then further demonstrate the utility of rCRUX by generating 24 reference databases for 20 metabarcoding loci, many of which lack dedicated reference database curation efforts. The rCRUX package provides a simple-to-use tool for the generation of curated, comprehensive reference databases for user-defined loci, facilitating accurate and effective taxonomic classification of metabarcoding and DNA sequence efforts broadly.
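The dereplication step described above (identical sequences are merged and their taxonomy collapsed to the lowest rank on which all copies agree) can be illustrated generically. The Python sketch below is not rCRUX code and does not reproduce the behaviour of derep_and_clean_db(); the function names, the record layout and the example taxonomy are assumptions made for illustration.

```python
from collections import defaultdict


def lowest_common_path(paths):
    """Return the leading ranks shared by every taxonomic path."""
    common = []
    for ranks in zip(*paths):
        if len(set(ranks)) != 1:
            break  # first rank where the paths disagree
        common.append(ranks[0])
    return tuple(common)


def dereplicate(records):
    """Merge identical sequences and collapse their taxonomy.

    records is an iterable of (sequence, taxonomic_path) pairs, where each
    path is a tuple ordered from the highest rank to the lowest.
    """
    paths_by_sequence = defaultdict(list)
    for sequence, path in records:
        paths_by_sequence[sequence].append(tuple(path))
    return {seq: lowest_common_path(paths)
            for seq, paths in paths_by_sequence.items()}


# Hypothetical records: two species share one identical barcode.
toy_records = [
    ("ACGTACGT", ("Chordata", "Salmonidae", "Salmo", "Salmo salar")),
    ("ACGTACGT", ("Chordata", "Salmonidae", "Salmo", "Salmo trutta")),
    ("TTGGCCAA", ("Chordata", "Anguillidae", "Anguilla", "Anguilla anguilla")),
]
collapsed = dereplicate(toy_records)
for seq, path in collapsed.items():
    print(seq, " > ".join(path))
```

Here the shared barcode is resolved only to genus, because the two source records disagree at species level, while the unique barcode keeps its full path.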
Affiliation(s)
- Emily E. Curd
  - Vermont Biomedical Research Network, University of Vermont, VT, USA
- Luna Gal
  - Landmark College, VT, USA
  - California Cooperative Oceanic Fisheries Investigations (CalCOFI), Scripps Institution of Oceanography, University of California San Diego (UCSD), La Jolla, CA, USA
- Ramon Gallego
  - Departamento de Biología, Universidad Autónoma de Madrid, Cantoblanco, Madrid, Spain
- Katherine Silliman
  - Northern Gulf Institute, Mississippi State University, Starkville, MS, USA
  - NOAA Atlantic Oceanographic and Meteorological Laboratory, Miami, FL, USA
- Zachary Gold
  - California Cooperative Oceanic Fisheries Investigations (CalCOFI), Scripps Institution of Oceanography, University of California San Diego (UCSD), La Jolla, CA, USA
  - NOAA Pacific Marine Environmental Laboratory, Seattle, WA, USA
8
Diaz-Suarez A, Noreikiene K, Kahar S, Ozerov MY, Gross R, Kisand V, Vasemägi A. DNA metabarcoding reveals spatial and temporal variation of fish eye fluke communities in lake ecosystems. Int J Parasitol 2024; 54:33-46. [PMID: 37633409 DOI: 10.1016/j.ijpara.2023.07.005] [Received: 04/10/2023] [Revised: 07/09/2023] [Accepted: 07/11/2023] [Indexed: 08/28/2023]
Abstract
Eye flukes (Diplostomidae) are diverse and abundant trematode parasites that form multi-species communities in fish with negative effects on host fitness and survival. However, the environmental factors and host-related characteristics that determine species diversity, composition, and coexistence in such communities remain poorly understood. Here, we developed a cost-effective cox1 region-specific DNA metabarcoding approach to characterize parasitic diplostomid communities in two common fish species (Eurasian perch and common roach) collected from seven temperate lakes in Estonia. We found considerable inter- and intra-lake, as well as inter-host species, variation in diplostomid communities. Sympatric host species characterization revealed that parasite communities were typically more diverse in roach than perch. Additionally, we detected five positive and two negative diplostomid species associations in roach, whereas only a single negative association was observed in perch. These results indicate that diplostomid communities in temperate lakes are complex and dynamic systems exhibiting both spatial and temporal heterogeneity. They are influenced by various environmental factors and by host-parasite and inter-parasite interactions. We expect that the described methodology facilitates ecological and biodiversity research of diplostomid parasites. It is also adaptable to other parasite groups where it could serve to improve current understanding of diversity, distribution, and interspecies interactions of other understudied taxa.
Affiliation(s)
- Alfonso Diaz-Suarez
  - Chair of Aquaculture, Institute of Veterinary Medicine and Animal Sciences, Estonian University of Life Sciences, Kreutzwaldi 46, 51006 Tartu, Estonia.
- Kristina Noreikiene
  - Chair of Aquaculture, Institute of Veterinary Medicine and Animal Sciences, Estonian University of Life Sciences, Kreutzwaldi 46, 51006 Tartu, Estonia. https://twitter.com/snaudale
- Siim Kahar
  - Chair of Aquaculture, Institute of Veterinary Medicine and Animal Sciences, Estonian University of Life Sciences, Kreutzwaldi 46, 51006 Tartu, Estonia
- Mikhail Y Ozerov
  - Biodiversity Unit, University of Turku, 20014 Turku, Finland; Department of Biology, University of Turku, 20014 Turku, Finland; Department of Aquatic Resources, Swedish University of Agricultural Sciences, Stångholmsvägen 2, 17893 Drottningholm, Sweden
- Riho Gross
  - Chair of Aquaculture, Institute of Veterinary Medicine and Animal Sciences, Estonian University of Life Sciences, Kreutzwaldi 46, 51006 Tartu, Estonia
- Veljo Kisand
  - Institute of Technology, University of Tartu, 50411 Tartu, Estonia
- Anti Vasemägi
  - Chair of Aquaculture, Institute of Veterinary Medicine and Animal Sciences, Estonian University of Life Sciences, Kreutzwaldi 46, 51006 Tartu, Estonia; Department of Aquatic Resources, Swedish University of Agricultural Sciences, Stångholmsvägen 2, 17893 Drottningholm, Sweden
| |
|
9
|
Hubert N, Phillips JD, Hanner RH. Delimiting Species with Single-Locus DNA Sequences. Methods Mol Biol 2024; 2744:53-76. [PMID: 38683311 DOI: 10.1007/978-1-0716-3581-0_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/01/2024]
Abstract
DNA sequences are increasingly used for large-scale biodiversity inventories. Because these genetic data avoid the time-consuming initial sorting of specimens based on their phenotypic attributes, they have been recently incorporated into taxonomic workflows for overlooked and diverse taxa. Major statistical developments have accompanied this new practice, and several models have been proposed to delimit species with single-locus DNA sequences. However, proposed approaches to date make different assumptions regarding taxon lineage history, leading to strong discordance whenever comparisons are made among methods. Distance-based methods, such as Automatic Barcode Gap Discovery (ABGD) and Assemble Species by Automatic Partitioning (ASAP), rely on the detection of a barcode gap (i.e., the lack of overlap in the distributions of intraspecific and interspecific genetic distances) and the associated threshold in genetic distances. Network-based methods, as exemplified by the REfined Single Linkage (RESL) algorithm for the generation of Barcode Index Numbers (BINs), use connectivity statistics to hierarchically cluster related haplotypes into molecular operational taxonomic units (MOTUs) which serve as species proxies. Tree-based methods, including Poisson Tree Processes (PTP) and the General Mixed Yule Coalescent (GMYC), fit statistical models to phylogenetic trees by maximum likelihood or Bayesian frameworks. Multiple webservers and stand-alone versions of these methods are now available, complicating decision-making regarding the most appropriate approach to use for a given taxon of interest. For instance, tree-based methods require an initial phylogenetic reconstruction, and multiple options are now available for this purpose such as RAxML and BEAST. Across all examined species delimitation methods, judicious parameter setting is paramount, as different model parameterizations can lead to differing conclusions.
The objective of this chapter is to guide users step-by-step through all the procedures involved for each of these methods, while aggregating all necessary information required to conduct these analyses. The "Materials" section details how to prepare and format input files, including options to align sequences and conduct tree reconstruction with Maximum Likelihood and Bayesian inference. The Methods section presents the procedure and options available to conduct species delimitation analyses, including distance-, network-, and tree-based models. Finally, limits and future developments are discussed in the Notes section. Most importantly, species delimitation methods discussed herein are categorized based on five indicators: reliability, availability, scalability, understandability, and usability, all of which are fundamental properties needed for any approach to gain unanimous adoption within the DNA barcoding community moving forward.
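The barcode gap described in this abstract can be illustrated with a minimal Python sketch. It assumes sequences are already aligned and already carry species labels, and it uses a simple uncorrected p-distance; the function names and the toy records are hypothetical. ABGD and ASAP infer partitions without prior labels, so this only shows the definition of the gap, not either algorithm.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(records):
    """records: list of (species_label, aligned_sequence) pairs.

    Returns (max intraspecific distance, min interspecific distance, gap).
    A positive gap means the two distance distributions do not overlap.
    """
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(records, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    if not intra or not inter:
        raise ValueError("need both within-species and between-species pairs")
    max_intra, min_inter = max(intra), min(inter)
    return max_intra, min_inter, min_inter - max_intra


# Hypothetical toy alignment: two labelled species, two sequences each.
RECORDS = [
    ("sp_A", "ACGTACGTAC"),
    ("sp_A", "ACGTACGTAT"),
    ("sp_B", "ACGAACCTTC"),
    ("sp_B", "ACGAACCTTT"),
]

if __name__ == "__main__":
    print(barcode_gap(RECORDS))
```

In this toy data the largest within-species distance is smaller than the smallest between-species distance, so a distance threshold placed inside the gap would separate the two labels.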
Affiliation(s)
- Nicolas Hubert
- UMR ISEM (IRD, UM, CNRS), Université de Montpellier, Montpellier, France.
| | - Jarrett D Phillips
- School of Computer Science, University of Guelph, Guelph, ON, Canada
- Department of Integrative Biology, University of Guelph, Guelph, ON, Canada
| | - Robert H Hanner
- Department of Integrative Biology, University of Guelph, Guelph, ON, Canada
| |
|
10
|
Noguerales V, Meramveliotakis E, Castro-Insua A, Andújar C, Arribas P, Creedy TJ, Overcast I, Morlon H, Emerson BC, Vogler AP, Papadopoulou A. Community metabarcoding reveals the relative role of environmental filtering and spatial processes in metacommunity dynamics of soil microarthropods across a mosaic of montane forests. Mol Ecol 2023; 32:6110-6128. [PMID: 34775647 DOI: 10.1111/mec.16275] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 06/16/2021] [Revised: 10/25/2021] [Accepted: 11/05/2021] [Indexed: 01/04/2023]
Abstract
Disentangling the relative role of environmental filtering and spatial processes in driving metacommunity structure across mountainous regions remains challenging, as the way we quantify spatial connectivity in topographically and environmentally heterogeneous landscapes can influence our perception of which process predominates. More empirical data sets are required to account for taxon- and context-dependency, but relevant research in understudied areas is often compromised by the taxonomic impediment. Here we used haplotype-level community DNA metabarcoding, enabled by stringent filtering of amplicon sequence variants (ASVs), to characterize metacommunity structure of soil microarthropod assemblages across a mosaic of five forest habitats on the Troodos mountain range in Cyprus. We found similar β diversity patterns at ASV and species (OTU, operational taxonomic unit) levels, which pointed to a primary role of habitat filtering resulting in the existence of largely distinct metacommunities linked to different forest types. Within-habitat turnover was correlated to topoclimatic heterogeneity, again emphasizing the role of environmental filtering. However, when integrating landscape matrix information for the highly fragmented Quercus alnifolia habitat, we also detected a major role of spatial isolation determined by patch connectivity, indicating that stochastic and niche-based processes synergistically govern community assembly. Alpha diversity patterns varied between ASV and OTU levels, with OTU richness decreasing with elevation and ASV richness following a longitudinal gradient, potentially reflecting a decline of genetic diversity eastwards due to historical pressures. Our study demonstrates the utility of haplotype-level community metabarcoding for characterizing metacommunity structure of complex assemblages and improving our understanding of biodiversity dynamics across mountainous landscapes worldwide.
Affiliation(s)
- Víctor Noguerales
- Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
- Instituto de Productos Naturales y Agrobiología (IPNA-CSIC), San Cristóbal de La Laguna, Tenerife, Canary Islands, Spain
| | | | | | - Carmelo Andújar
- Instituto de Productos Naturales y Agrobiología (IPNA-CSIC), San Cristóbal de La Laguna, Tenerife, Canary Islands, Spain
| | - Paula Arribas
- Instituto de Productos Naturales y Agrobiología (IPNA-CSIC), San Cristóbal de La Laguna, Tenerife, Canary Islands, Spain
| | - Thomas J Creedy
- Department of Life Sciences, Natural History Museum, London, UK
| | - Isaac Overcast
- Institut de Biologie de l'ENS (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
| | - Hélène Morlon
- Institut de Biologie de l'ENS (IBENS), Ecole Normale Supérieure, CNRS, INSERM, Université PSL, Paris, France
| | - Brent C Emerson
- Instituto de Productos Naturales y Agrobiología (IPNA-CSIC), San Cristóbal de La Laguna, Tenerife, Canary Islands, Spain
| | - Alfried P Vogler
- Department of Life Sciences, Natural History Museum, London, UK
- Department of Life Sciences, Silwood Park Campus, Imperial College London, Ascot, UK
| | - Anna Papadopoulou
- Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
| |
|
11
|
Meglécz E. mkLTG: a command-line tool for taxonomic assignment of metabarcoding sequences using variable identity thresholds. Biol Futur 2023; 74:369-375. [PMID: 38300415 DOI: 10.1007/s42977-024-00201-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/24/2023] [Accepted: 01/04/2024] [Indexed: 02/02/2024]
Abstract
Metabarcoding is now a widely used method for biodiversity studies. Taxonomic assignment of environmental sequences is one of the key steps of metabarcoding. Assignments based on the lowest common ancestor (LCA) method generally rely on fixed arbitrary thresholds, which are not well adapted to taxonomically diverse groups with variable coverage in reference databases. mkLTG is an LCA-based method that uses a series of percentage-of-identity thresholds, starting from stringent parameters and relaxing them if necessary. All parameters can be set separately for each percentage-of-identity threshold, which makes this tool adaptable to different databases, genetic markers and diverse taxonomic groups. The optimization step was included using the COI marker and a comprehensive, non-redundant database. The mkLTG tool is a command-line application with few dependencies that runs on all operating systems; it is therefore easy to include in complex pipelines. All scripts, including the benchmarking, are freely available at https://github.com/meglecz/mkLTG.
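The idea of an LCA assignment that starts with a stringent identity threshold and relaxes it step by step can be sketched in Python as follows. This is an illustration only, not mkLTG's code or parameter set: the function names, the `(min_identity, min_hits)` step format, and the example thresholds and hits are all assumptions made for the sketch.

```python
def lowest_common_ancestor(lineages):
    """Return the longest shared prefix of root-to-tip lineage tuples."""
    shared = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return tuple(shared)


def assign_with_variable_thresholds(hits, steps):
    """hits: list of (percent_identity, lineage) for one query sequence.
    steps: list of (min_identity, min_hits), tried from most to least stringent.

    Returns (threshold_used, lca_lineage), or (None, ()) if no step qualifies.
    """
    for min_identity, min_hits in sorted(steps, reverse=True):
        selected = [lineage for identity, lineage in hits if identity >= min_identity]
        if len(selected) >= min_hits:
            return min_identity, lowest_common_ancestor(selected)
    return None, ()


# Hypothetical settings: each identity step carries its own minimum hit count.
STEPS = [(100, 1), (97, 1), (95, 2), (90, 3)]

# Hypothetical hits for one query (identity values are made up).
HITS = [
    (98.5, ("Arthropoda", "Insecta", "Diptera", "Chironomidae", "Chironomus", "Chironomus riparius")),
    (97.2, ("Arthropoda", "Insecta", "Diptera", "Chironomidae", "Chironomus", "Chironomus piger")),
    (93.0, ("Arthropoda", "Insecta", "Diptera", "Culicidae", "Culex", "Culex pipiens")),
]

if __name__ == "__main__":
    print(assign_with_variable_thresholds(HITS, STEPS))
```

With these made-up hits, no hit reaches 100% identity, so the 97% step is used and the two remaining hits agree only down to the genus, which becomes the assignment.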
Affiliation(s)
- Emese Meglécz
- IMBE, CNRS, IRD, Aix Marseille University, Avignon University, Marseille, France.
| |
|
12
|
Ewers I, Rajter L, Czech L, Mahé F, Stamatakis A, Dunthorn M. Interpreting phylogenetic placements for taxonomic assignment of environmental DNA. J Eukaryot Microbiol 2023; 70:e12990. [PMID: 37448139 DOI: 10.1111/jeu.12990] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 05/29/2023] [Accepted: 06/17/2023] [Indexed: 07/15/2023]
Abstract
Taxonomic assignment of operational taxonomic units (OTUs) is an important bioinformatics step in analyzing environmental sequencing data. Pairwise alignment and phylogenetic-placement methods represent two alternative approaches to taxonomic assignments, but their results can differ. Here we used available colpodean ciliate OTUs from forest soils to compare the taxonomic assignments of VSEARCH (which performs pairwise alignments) and EPA-ng (which performs phylogenetic placements). We showed that when there are differences in taxonomic assignments between pairwise alignments and phylogenetic placements at the subtaxon level, there is a low pairwise similarity of the OTUs to the reference database. We then showcase how the output of EPA-ng can be further evaluated using GAPPA to assess the taxonomic assignments when there exist multiple equally likely placements of an OTU, by taking into account the sum over the likelihood weights of the OTU placements within a subtaxon, and the branch distances between equally likely placement locations. We also inferred the evolutionary and ecological characteristics of the colpodean OTUs using their placements within subtaxa. This study demonstrates how to fully analyze the output of EPA-ng, by using GAPPA in conjunction with knowledge of the taxonomic diversity of the clade of interest.
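Summing likelihood weights within a subtaxon, as described in this abstract, can be sketched in plain Python. The sketch does not call EPA-ng or GAPPA and does not parse their file formats; the input layout, the function names, the `min_total` cut-off and the clade labels are all hypothetical.

```python
from collections import defaultdict


def sum_weights_by_subtaxon(placements):
    """placements: list of (subtaxon_label, likelihood_weight_ratio) for one OTU."""
    totals = defaultdict(float)
    for subtaxon, weight in placements:
        totals[subtaxon] += weight
    return dict(totals)


def best_subtaxon(placements, min_total=0.5):
    """Return (subtaxon, summed_weight) for the best-supported subtaxon.

    The subtaxon is reported as None when its summed weight is below
    min_total (an arbitrary cut-off chosen for this sketch).
    """
    totals = sum_weights_by_subtaxon(placements)
    subtaxon, total = max(totals.items(), key=lambda item: item[1])
    return (subtaxon, total) if total >= min_total else (None, total)


# Hypothetical OTU with several placements, none individually dominant.
PLACEMENTS = [
    ("clade_A", 0.30),
    ("clade_A", 0.30),
    ("clade_B", 0.25),
    ("clade_C", 0.15),
]

if __name__ == "__main__":
    print(best_subtaxon(PLACEMENTS))
```

The point of the sketch is that two placements of similar weight on different branches can still support one assignment when both branches fall inside the same subtaxon.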
Affiliation(s)
- Isabelle Ewers
- Eukaryotic Microbiology, Faculty of Biology, University of Duisburg-Essen, Essen, Germany
| | - Lubomír Rajter
- Eukaryotic Microbiology, Faculty of Biology, University of Duisburg-Essen, Essen, Germany
- Phycology, Faculty of Biology, University of Duisburg-Essen, Essen, Germany
| | - Lucas Czech
- Department of Plant Biology, Carnegie Institution for Science, Stanford, California, USA
| | - Frédéric Mahé
- CIRAD, UMR PHIM, Montpellier, France
- PHIM Plant Health Institute, CIRAD, INRAE, Institut Agro, IRD, University of Montpellier, Montpellier, France
| | - Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
| | - Micah Dunthorn
- Natural History Museum, University of Oslo, Oslo, Norway
| |
|
13
|
Curd EE, Gal L, Gallego R, Nielsen S, Gold Z. rCRUX: A Rapid and Versatile Tool for Generating Metabarcoding Reference libraries in R. bioRxiv 2023:2023.05.31.543005. [PMID: 37397980 PMCID: PMC10312559 DOI: 10.1101/2023.05.31.543005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 07/04/2023]
Abstract
Key to making accurate taxonomic assignments are curated, comprehensive reference barcode databases. However, the generation and curation of such databases has remained challenging given the large and continuously growing volumes of DNA sequence data and novel reference barcode targets. Monitoring and research applications require a greater diversity of specialized gene regions and targeted taxa to meet taxonomic classification goals than are currently curated by professional staff. Thus, there is a growing need for an easy-to-implement tool that can generate comprehensive metabarcoding reference libraries for any bespoke locus. We address this need by reimagining CRUX from the Anacapa Toolkit and present the rCRUX package in R. The typical workflow involves searching for plausible seed amplicons (get_seeds_local() or get_seeds_remote()) by simulating in silico PCR to acquire seed sequences containing a user-defined primer set. Next, these seeds are used to iteratively blast search seed sequences against a local NCBI formatted database using a taxonomic rank based stratified random sampling approach (blast_seeds()) that results in a comprehensive set of sequence matches. This database is dereplicated and cleaned (derep_and_clean_db()) by identifying identical reference sequences and collapsing the taxonomic path to the lowest taxonomic agreement across all matching reads. This results in a curated, comprehensive database of primer specific reference barcode sequences from NCBI. We demonstrate that rCRUX provides more comprehensive reference databases for the MiFish Universal Teleost 12S, Taberlet trnL, and fungal ITS locus than CRABS, METACURATOR, RESCRIPt, and ECOPCR reference databases. We then further demonstrate the utility of rCRUX by generating 16 reference databases for metabarcoding loci that lack dedicated reference database curation efforts.
The rCRUX package provides a simple-to-use tool for the generation of curated, comprehensive reference databases for user-defined loci, facilitating accurate and effective taxonomic classification of metabarcoding and DNA sequence efforts broadly.
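The in silico PCR step mentioned in this abstract can be pictured with a short Python sketch. It is not rCRUX code (rCRUX is an R package) and it makes simplifying assumptions that real tools do not: primers must match exactly, degenerate IUPAC bases and mismatches are not handled, and only the first forward-primer site is considered. The function names, the `max_len` parameter and the toy sequences are hypothetical.

```python
def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence (A, C, G, T, N)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def in_silico_pcr(template, fwd_primer, rev_primer, max_len=1000):
    """Return the region between the primer sites, or None if no product.

    The forward primer is searched as written; the reverse primer is searched
    as its reverse complement, downstream of the forward primer site.
    """
    template = template.upper()
    fwd_start = template.find(fwd_primer.upper())
    if fwd_start == -1:
        return None
    insert_start = fwd_start + len(fwd_primer)
    rev_start = template.find(reverse_complement(rev_primer), insert_start)
    if rev_start == -1:
        return None
    amplicon = template[insert_start:rev_start]
    return amplicon if len(amplicon) <= max_len else None


# Hypothetical toy template: flank + forward site + insert + reverse site + flank.
TEMPLATE = "GGG" + "ACGTAC" + "TTTTCCCCAAAA" + "GATTCA" + "GGG"
FWD = "ACGTAC"
REV = "TGAATC"  # reverse complement of the downstream site GATTCA

if __name__ == "__main__":
    print(in_silico_pcr(TEMPLATE, FWD, REV))
```

Returning the primer-trimmed insert rather than the full product is a design choice of this sketch; reference barcodes are usually stored without primer sequences, but that is an assumption here rather than something the abstract states.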
Affiliation(s)
- Emily E. Curd
- Vermont Biomedical Research Network, University of Vermont, VT, USA
| | - Luna Gal
- Landmark College, VT, USA
- California Cooperative Oceanic Fisheries Investigations (CalCOFI), Scripps Institution of Oceanography, University of California San Diego (UCSD), La Jolla, CA, USA
| | - Ramon Gallego
- Universidad Autónoma de Madrid - Unidad de Genética, Spain
| | | | - Zachary Gold
- California Cooperative Oceanic Fisheries Investigations (CalCOFI), Scripps Institution of Oceanography, University of California San Diego (UCSD), La Jolla, CA, USA
- NOAA Pacific Marine Environmental Laboratory, Seattle, WA, USA
| |
|
14
|
Jeunen GJ, Dowle E, Edgecombe J, von Ammon U, Gemmell NJ, Cross H. crabs-A software program to generate curated reference databases for metabarcoding sequencing data. Mol Ecol Resour 2023; 23:725-738. [PMID: 36437603 DOI: 10.1111/1755-0998.13741] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 05/24/2022] [Revised: 10/30/2022] [Accepted: 11/15/2022] [Indexed: 11/29/2022]
Abstract
The measurement of biodiversity is an integral aspect of life science research. With the establishment of second- and third-generation sequencing technologies, an increasing amount of metabarcoding data is being generated as we seek to describe the extent and patterns of biodiversity in multiple contexts. The reliability and accuracy of taxonomically assigning metabarcoding sequencing data have been shown to be critically influenced by the quality and completeness of reference databases. Custom, curated, eukaryotic reference databases, however, are scarce, as are the software programs for generating them. Here, we present crabs (Creating Reference databases for Amplicon-Based Sequencing), a software package to create custom reference databases for metabarcoding studies. crabs includes tools to download sequences from multiple online repositories (i.e., NCBI, BOLD, EMBL, MitoFish), retrieve amplicon regions through in silico PCR analysis and pairwise global alignments, curate the database through multiple filtering parameters (e.g., dereplication, sequence length, sequence quality, unresolved taxonomy, inclusion/exclusion filter), export the reference database in multiple formats for immediate use in taxonomy assignment software, and investigate the reference database through implemented visualizations for diversity, primer efficiency, reference sequence length, database completeness and taxonomic resolution. crabs is a versatile tool for generating curated reference databases of user-specified genetic markers to aid taxonomy assignment from metabarcoding sequencing data. crabs can be installed via docker and is available for download as a conda package and via GitHub (https://github.com/gjeunen/reference_database_creator).
Affiliation(s)
- Gert-Jan Jeunen
- Department of Anatomy, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
| | - Eddy Dowle
- Department of Anatomy, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
| | - Jonika Edgecombe
- Department of Anatomy, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
| | - Ulla von Ammon
- Coastal and Freshwater Group, Cawthron Institute, Nelson, New Zealand
| | - Neil J Gemmell
- Department of Anatomy, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand
| | - Hugh Cross
- Department of Anatomy, School of Biomedical Sciences, University of Otago, Dunedin, New Zealand; National Ecological Observatory Network, Boulder, Colorado, USA
| |
|
15
|
Bourret A, Nozères C, Parent E, Parent GJ. Maximizing the reliability and the number of species assignments in metabarcoding studies using a curated regional library and a public repository. Metabarcoding and Metagenomics 2023. [DOI: 10.3897/mbmg.7.98539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/25/2023] Open
Abstract
Biodiversity assessments relying on DNA have increased rapidly over the last decade. However, the reliability of taxonomic assignments in metabarcoding studies is variable and affected by the reference databases and the assignment methods used. Species-level assignments are usually considered reliable when using regional libraries but unreliable when using public repositories. In this study, we aimed to test this assumption for metazoan species detected in the Gulf of St. Lawrence in the Northwest Atlantic. We first created a regional library (GSL-rl) by data mining COI barcode sequences from BOLD, and included a reliability ranking system for species assignments. We then 1) estimated the accuracy and precision of the public repository NCBI-nt for species assignments using sequences from the regional library and 2) compared the detection and reliability of species assignments of a metabarcoding dataset using either NCBI-nt or the regional library and popular assignment methods. With NCBI-nt and sequences from the regional library, the BLAST-LCA (least common ancestor) method was the most precise method for species assignments, but the accuracy was higher with the BLAST-TopHit method (>80% over all taxa, between 70% and 90% amongst taxonomic groups). With the metabarcoding dataset, the reliability of species assignments was greater using GSL-rl compared to NCBI-nt. However, we also observed that the total number of reliable species assignments could be maximized using both GSL-rl and NCBI-nt with different optimized assignment methods. The use of a two-step approach for species assignments, i.e., using a regional library and a public repository, could improve the reliability and the number of detected species in metabarcoding studies.
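A two-step assignment of the kind proposed in this abstract can be sketched as a simple fallback: try the regional library first and query the public repository only for sequences that remain unassigned. The Python below is a hypothetical illustration. The two assigner callables stand in for whatever optimized method is used with each database, and the lookup tables and query identifiers are made up.

```python
def two_step_assign(queries, regional_assign, public_assign):
    """Assign each query with the regional library first, then the public one.

    regional_assign / public_assign: callables returning a taxon name,
    or None when no reliable assignment is available.
    Returns {query_id: (taxon or None, source or None)}.
    """
    results = {}
    for query in queries:
        taxon = regional_assign(query)
        source = "regional" if taxon is not None else None
        if taxon is None:
            taxon = public_assign(query)
            source = "public" if taxon is not None else None
        results[query] = (taxon, source)
    return results


# Hypothetical precomputed assignments standing in for real database searches.
REGIONAL = {"asv_1": "Gadus morhua"}
PUBLIC = {"asv_1": "Gadus sp.", "asv_2": "Mytilus edulis"}

if __name__ == "__main__":
    print(two_step_assign(["asv_1", "asv_2", "asv_3"], REGIONAL.get, PUBLIC.get))
```

Recording the source alongside each taxon keeps the two levels of reliability distinguishable in downstream tables.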
|
16
|
Han W, Tang H, Wei L, Zhang E. The first DNA barcode library of Chironomidae from the Tibetan Plateau with an evaluation of the status of the public databases. Ecol Evol 2023; 13:e9849. [PMID: 36861023 PMCID: PMC9969238 DOI: 10.1002/ece3.9849] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/11/2022] [Revised: 01/26/2023] [Accepted: 02/02/2023] [Indexed: 03/03/2023] Open
Abstract
The main aim of this study was to curate a COI barcode library of Chironomidae from the Tibetan Plateau (TP) as an essential supplement to the public database. Another aim was to evaluate the current status of the public database of Chironomidae in terms of taxonomic coverage, geographic representation, barcode quality, and efficiency for molecular identification on the Tibetan Plateau, China. In this study, 512 individuals of Chironomidae from the TP were identified based on morphological taxonomy and barcode analysis. The metadata of public records of Chironomidae were downloaded from BOLD, and the quality of the public barcodes was ranked using the BAGS program. The reliability of the public library for molecular identification was evaluated with the newly curated library using the BLAST method. The newly curated library comprised 159 barcode species of 54 genera, of which 58.4% of species were likely new to science. There were great gaps in the taxonomic coverage and geographic representation in the public database, and only 29.18% of barcodes were identified at the species level. The quality of the public database was of concern, with only 20% of species being determined as concordant between BINs and morphological species. The accuracy of molecular identification using the public database was poor, and about 50% of matched barcodes could be correctly identified at the species level at the identity threshold of 97%. Based on these data, some recommendations are included here for improving barcoding studies on Chironomidae. The species richness of Chironomidae from the TP is much higher than ever recorded. Barcodes from more taxonomic groups and geographic regions are urgently needed to fill the great gap in the current public database of Chironomidae. Users should exercise caution when public databases are adopted as reference libraries for taxonomic assignment.
Affiliation(s)
- Wu Han
- State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Science, Nanjing, China
- University of Chinese Academy of Sciences, Beijing 100039, China
| | - Hongqu Tang
- Life Science and Technology College, Jinan University, Guangzhou, China
| | - Lili Wei
- Life Science and Technology College, Jinan University, Guangzhou, China
| | - Enlou Zhang
- State Key Laboratory of Lake Science and Environment, Nanjing Institute of Geography and Limnology, Chinese Academy of Science, Nanjing, China
| |
|
17
|
Abstract
In order to survey noroviruses in our environment, it is essential that both wet-lab and computational methods are fit for purpose. Using a simulated sequencing data set, denoising-based (DADA2, Deblur and USEARCH-UNOISE3) and clustering-based pipelines (VSEARCH and FROGS) were compared with respect to their ability to represent composition and sequence information. Open source classifiers (Ribosomal Database Project [RDP], BLASTn, IDTAXA, QIIME2 naive Bayes, and SINTAX) were trained using three different databases: a custom database, the NoroNet database, and the Human calicivirus database. Each classifier and database combination was compared from the perspective of their classification accuracy. VSEARCH provides a robust option for analyzing viral amplicons based on composition analysis; however, all pipelines could return OTUs with high similarity to the expected sequences. Importantly, pipeline choice could lead to more false positives (DADA2) or underclassification (FROGS), a key aspect when considering pipeline application for source attribution. Classification was more strongly impacted by the classifier than the database, although disagreement increased with norovirus GII.4 capsid variant designation. We recommend the use of the RDP classifier in conjunction with VSEARCH; however, maintenance of the underlying database is essential for optimal use. IMPORTANCE In benchmarking bioinformatic pipelines for analyzing high-throughput sequencing (HTS) data sets, we provide method standardization for bioinformatics broadly and specifically for norovirus in situations for which no officially endorsed methods exist at present. This study provides recommendations for the appropriate analysis and classification of norovirus amplicon HTS data and will be widely applicable during outbreak investigations.
|
18
|
Courtot É, Boisseau M, Dhorne-Pollet S, Serreau D, Gesbert A, Reigner F, Basiaga M, Kuzmina T, Lluch J, Annonay G, Kuchly C, Diekmann I, Krücken J, von Samson-Himmelstjerna G, Mach N, Sallé G. Comparison of two molecular barcodes for the study of equine strongylid communities with amplicon sequencing. PeerJ 2023; 11:e15124. [PMID: 37070089 PMCID: PMC10105562 DOI: 10.7717/peerj.15124] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/16/2022] [Accepted: 03/03/2023] [Indexed: 04/19/2023] Open
Abstract
Basic knowledge on the biology and epidemiology of equine strongylid species still needs to be improved to contribute to the design of better parasite control strategies. Nemabiome metabarcoding is a convenient tool to quantify and identify species in bulk samples that could overcome the hurdle that cyathostomin morphological identification represents. To date, this approach has relied on the internal transcribed spacer 2 (ITS-2) of the ribosomal RNA gene, with a limited investigation of its predictive performance for cyathostomin communities. Using DNA pools of single cyathostomin worms, this study aimed to provide the first elements to compare performances of the ITS-2 and a cytochrome c oxidase subunit I (COI) barcode newly developed in this study. Barcode predictive abilities were compared across various mock community compositions of two, five and 11 individuals from distinct species. The amplification bias of each barcode was estimated. Results were also compared between various types of biological samples, i.e., eggs, infective larvae or adults. Bioinformatic parameters were chosen to yield the closest representation of the cyathostomin community for each barcode, underscoring the need for communities of known composition for metabarcoding purposes. Overall, the proposed COI barcode was suboptimal relative to the ITS-2 rDNA region, because of PCR amplification biases, reduced sensitivity and higher divergence from the expected community composition. Metabarcoding yielded consistent community composition across the three sample types. However, imperfect correlations were found between relative abundances from infective larvae and other life-stages for Cylicostephanus species using the ITS-2 barcode. While the results remain limited by the considered biological material, they suggest that additional improvements are needed for both the ITS-2 and COI barcodes.
Affiliation(s)
- Élise Courtot
- Animal Health, UMR1282 Infectiologie et Santé Publique, INRAE, Nouzilly, France
| | - Michel Boisseau
- Animal Health, UMR1282 Infectiologie et Santé Publique, INRAE, Nouzilly, France
- Animal Health, UMR1225 IHAP, Institut National de la Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Toulouse, France
| | | | - Delphine Serreau
- Animal Health, UMR1282 Infectiologie et Santé Publique, INRAE, Nouzilly, France
| | - Amandine Gesbert
- Animal Physiology, UEPAO, Institut National de la Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Nouzilly, France
| | - Fabrice Reigner
- Animal Physiology, UEPAO, Institut National de la Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Nouzilly, France
| | | | - Tetiana Kuzmina
- Schmalhausen Institute of Zoology NAS of Ukraine, Kyiv, Ukraine
- Institute of Parasitology, Slovak Academy of Sciences, Košice, Slovak Republic
| | - Jérôme Lluch
- GeT-PlaGe, Institut National de la Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Toulouse, France
| | - Gwenolah Annonay
- GeT-PlaGe, Institut National de la Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Toulouse, France
| | - Claire Kuchly
- GeT-PlaGe, Institut National de la Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Toulouse, France
| | - Irina Diekmann
- Institute for Parasitology and Tropical Veterinary Medicine, Freie Universität Berlin, Berlin, Germany
| | - Jürgen Krücken
- Institute for Parasitology and Tropical Veterinary Medicine, Freie Universität Berlin, Berlin, Germany
| | | | - Nuria Mach
- Animal Health, UMR1225 IHAP, Institut National de la Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Toulouse, France
| | - Guillaume Sallé
- Animal Health, UMR1282 Infectiologie et Santé Publique, INRAE, Nouzilly, France
| |
|
19
|
Chen X, Han M, Liang Y, Zhao W, Wu Y, Sun Y, Shao H, McMinn A, Zhu L, Wang M. Progress in 'taxonomic sufficiency' in aquatic biological investigations. Marine Pollution Bulletin 2022; 185:114192. [PMID: 36356341 DOI: 10.1016/j.marpolbul.2022.114192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Revised: 09/24/2022] [Accepted: 09/27/2022] [Indexed: 06/16/2023]
Abstract
The 'taxonomic sufficiency' (TS) approach has been applied to algae, protists, invertebrates, and vertebrates, generally by aggregating species-level abundance data to a higher taxonomic level, where genus-level data are often highly correlated with species-level data and are a valid proxy level. The TS approach offers the possibility of a comparison of data from different geographical areas and highlights the effects of contaminants. The TS approach is stable in the face of different researchers and in the comparison of long-term biological survey data. The effectiveness of the TS approach may increase with increasing environmental gradients or spatial area. The TS approach should be avoided when the spatial area is small and small differences in species-level data are considered important, so as not to cancel out the distribution patterns specific to the local environment of the biological taxa.
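The aggregation step at the heart of the TS approach, pooling species-level abundances at a higher taxonomic level, can be sketched in a few lines of Python. The sketch assumes species names are binomials whose first word is the genus, and the function name and counts are hypothetical; it does not cover the correlation analysis used to judge whether the genus level is a valid proxy.

```python
from collections import Counter


def aggregate_to_genus(species_counts):
    """Sum species-level abundances per genus.

    species_counts: {"Genus species": abundance}. The genus is taken as the
    first word of each name, which assumes well-formed binomials.
    """
    genus_counts = Counter()
    for species, abundance in species_counts.items():
        genus = species.split()[0]
        genus_counts[genus] += abundance
    return dict(genus_counts)


# Hypothetical abundances for one sample.
SAMPLE = {
    "Chironomus riparius": 12,
    "Chironomus plumosus": 5,
    "Baetis rhodani": 7,
}

if __name__ == "__main__":
    print(aggregate_to_genus(SAMPLE))
```

The example also shows what is lost: the two Chironomus species become indistinguishable after aggregation, which is the situation the abstract warns about when small species-level differences matter.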
Affiliation(s)
- Xuechao Chen
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, Frontiers Science Center for Deep Ocean Multispheres and Earth System, Ocean University of China, Qingdao 266003, China
- Meiaoxue Han
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, Frontiers Science Center for Deep Ocean Multispheres and Earth System, Ocean University of China, Qingdao 266003, China
- Yantao Liang
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, Frontiers Science Center for Deep Ocean Multispheres and Earth System, Ocean University of China, Qingdao 266003, China; UMT-OUC Joint Centre for Marine Studies, Qingdao 266003, China
- Wanting Zhao
- College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
- Yuejiao Wu
- College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
- Ying Sun
- College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
- Hongbing Shao
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, Frontiers Science Center for Deep Ocean Multispheres and Earth System, Ocean University of China, Qingdao 266003, China; UMT-OUC Joint Centre for Marine Studies, Qingdao 266003, China
- Andrew McMinn
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, Frontiers Science Center for Deep Ocean Multispheres and Earth System, Ocean University of China, Qingdao 266003, China; Institute for Marine and Antarctic Studies, University of Tasmania, Hobart, TAS 7001, Australia
- Liyan Zhu
- College of Marine Life Sciences, Ocean University of China, Qingdao 266003, China
- Min Wang
- College of Marine Life Sciences, Institute of Evolution and Marine Biodiversity, Frontiers Science Center for Deep Ocean Multispheres and Earth System, Ocean University of China, Qingdao 266003, China; The affiliated hospital of Qingdao University, Qingdao 266000, China; UMT-OUC Joint Centre for Marine Studies, Qingdao 266003, China
|
20
|
Comparative environmental RNA and DNA metabarcoding analysis of river algae and arthropods for ecological surveys and water quality assessment. Sci Rep 2022; 12:19828. [PMID: 36400924 PMCID: PMC9674700 DOI: 10.1038/s41598-022-23888-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/28/2022] [Accepted: 11/07/2022] [Indexed: 11/19/2022] Open
Abstract
Environmental DNA (eDNA) metabarcoding is widely used for species analysis, while the use of environmental RNA (eRNA) metabarcoding is more limited. We conducted comparative eDNA/eRNA metabarcoding of the algae and arthropods (aquatic insects) in water samples from Naka River, Japan, to evaluate their potential for biological monitoring and water quality assessment. Both methods detected various algae and arthropod species; however, their compositions were remarkably different from those in traditional field surveys (TFSs), indicating low sensitivity. For algae, the species composition derived from eDNA and eRNA metabarcoding was equivalent. While TFSs focus on attached algae, metabarcoding analysis theoretically detects both planktonic and attached algae. A recently expanded genomic database for aquatic insects significantly contributed to the sensitivity and positive predictivity for arthropods. While the sensitivity of eRNA was lower than that of eDNA, the positive predictivity of eRNA was higher. The eRNA of terrestrial arthropods indicated extremely high or low read numbers when compared with eDNA, suggesting that eRNA could be an effective indicator of false positives. Arthropod and algae eDNA/eRNA metabarcoding analysis enabled water quality estimates from TFSs. The eRNA of algae and arthropods could thus be used to evaluate biodiversity and water quality and provide insights from ecological surveys.
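The sensitivity and positive predictivity mentioned in this abstract can be computed as in the sketch below. It assumes the usual definitions with the traditional field survey (TFS) treated as the reference list: sensitivity is the share of TFS taxa also found by metabarcoding, and positive predictivity is the share of metabarcoding detections confirmed by the TFS. The abstract does not spell out its formulas, and the taxon lists are hypothetical.

```python
def detection_metrics(metabarcoding, reference):
    """Return (sensitivity, positive predictivity) against a reference taxon list."""
    detected, truth = set(metabarcoding), set(reference)
    true_pos = len(detected & truth)  # taxa found by both approaches
    sensitivity = true_pos / len(truth) if truth else float("nan")
    positive_predictivity = true_pos / len(detected) if detected else float("nan")
    return sensitivity, positive_predictivity

# Hypothetical genus lists, for illustration only
tfs = {"Baetis", "Epeorus", "Hydropsyche", "Stenopsyche"}
edna = {"Baetis", "Epeorus", "Hydropsyche", "Chironomus", "Drosophila"}
erna = {"Baetis", "Epeorus", "Chironomus"}

for label, detections in (("eDNA", edna), ("eRNA", erna)):
    sens, ppv = detection_metrics(detections, tfs)
    print(f"{label}: sensitivity={sens:.2f}, positive predictivity={ppv:.2f}")
```

The invented lists were chosen so that the eRNA set has lower sensitivity but higher positive predictivity than the eDNA set, which is the pattern the abstract reports for arthropods.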
|
21
|
Garrido-Sanz L, Àngel Senar M, Piñol J. Drastic reduction of false positive species in samples of insects by intersecting the default output of two popular metagenomic classifiers. PLoS One 2022; 17:e0275790. [PMID: 36282811 PMCID: PMC9595558 DOI: 10.1371/journal.pone.0275790] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/10/2022] [Accepted: 09/15/2022] [Indexed: 11/19/2022] Open
Abstract
The use of high-throughput sequencing to recover short DNA reads of many species has been widely applied in biodiversity studies, either as amplicon metabarcoding or shotgun metagenomics. These reads are assigned to taxa using classifiers. However, for different reasons, the results often contain many false positives. Here we focus on the reduction of false positive species attributable to the classifiers. We benchmarked two popular classifiers, BLASTn followed by MEGAN6 (BM) and Kraken2 (K2), to analyse shotgun sequenced artificial single-species samples of insects. To reduce the number of misclassified reads, we combined the output of the two classifiers in two different ways: (1) by keeping only the reads that were attributed to the same species by both classifiers (intersection approach); and (2) by keeping the reads assigned to some species by any classifier (union approach). In addition, we applied an analytical detection limit to further reduce the number of false positive species. As expected, both metagenomic classifiers used with default parameters generated an unacceptably high number of misidentified species (tens with BM, hundreds with K2). The false positive species were not necessarily phylogenetically close, as some of them belonged to different orders of insects. The union approach failed to reduce the number of false positives, but the intersection approach got rid of most of them. The addition of an analytical detection limit of 0.001 further reduced the number to ca. 0.5 false positive species per sample. The misidentification of species by most classifiers hampers the confidence of the DNA-based methods for assessing the biodiversity of biological samples. Our approach to alleviate the problem is straightforward and significantly reduced the number of reported false positive species.
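The intersection and union approaches, followed by a detection limit, can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: each classifier's output is represented as a hypothetical dictionary mapping read IDs to species names, the union counts a read for every species that either classifier assigns to it, and the detection limit is applied as a minimum fraction of the retained reads. The read IDs and species are invented.

```python
from collections import Counter

def intersect(bm, k2):
    """Species counts over reads that both classifiers assign to the same species."""
    return Counter(sp for read, sp in bm.items() if k2.get(read) == sp)

def union(bm, k2):
    """Species counts over (read, species) assignments made by at least one classifier."""
    pairs = set(bm.items()) | set(k2.items())
    return Counter(sp for _, sp in pairs)

def apply_detection_limit(counts, limit=0.001):
    """Drop species whose share of the retained reads is below the limit."""
    total = sum(counts.values())
    return {sp: n for sp, n in counts.items() if total and n / total >= limit}

# Hypothetical single-species sample: 2000 reads both classifiers assign correctly
bm = {f"r{i}": "Apis mellifera" for i in range(2000)}
k2 = dict(bm)
# One read that both classifiers misassign to the same wrong species
bm["x1"] = k2["x1"] = "Bombus terrestris"
# One read on which the classifiers disagree, and one read only K2 classifies
bm["x2"], k2["x2"] = "Vespa crabro", "Musca domestica"
k2["x3"] = "Drosophila melanogaster"

both = intersect(bm, k2)
either = union(bm, k2)
filtered = apply_detection_limit(both, limit=0.001)
print(len(either), len(both), len(filtered))
```

In this toy case the union keeps every spurious species, the intersection removes the disagreements, and the detection limit removes the one shared misassignment because it accounts for fewer than 0.1% of the retained reads.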
Affiliation(s)
- Lidia Garrido-Sanz
- Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- * E-mail:
- Josep Piñol
- Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Spain
- CREAF, Cerdanyola del Vallès, Spain
|
22
|
Hempel CA, Wright N, Harvie J, Hleap JS, Adamowicz S, Steinke D. Metagenomics versus total RNA sequencing: most accurate data-processing tools, microbial identification accuracy and perspectives for ecological assessments. Nucleic Acids Res 2022; 50:9279-9293. [PMID: 35979944 PMCID: PMC9458450 DOI: 10.1093/nar/gkac689] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 06/03/2022] [Revised: 07/05/2022] [Accepted: 07/29/2022] [Indexed: 12/24/2022] Open
Abstract
Metagenomics and total RNA sequencing (total RNA-Seq) have the potential to improve the taxonomic identification of diverse microbial communities, which could allow for the incorporation of microbes into routine ecological assessments. However, these target-PCR-free techniques require more testing and optimization. In this study, we processed metagenomics and total RNA-Seq data from a commercially available microbial mock community using 672 data-processing workflows, identified the most accurate data-processing tools, and compared their microbial identification accuracy at equal and increasing sequencing depths. The accuracy of data-processing tools substantially varied among replicates. Total RNA-Seq was more accurate than metagenomics at equal sequencing depths and even at sequencing depths almost one order of magnitude lower than those of metagenomics. We show that while data-processing tools require further exploration, total RNA-Seq might be a favorable alternative to metagenomics for target-PCR-free taxonomic identifications of microbial communities and might enable a substantial reduction in sequencing costs while maintaining accuracy. This could be particularly an advantage for routine ecological assessments, which require cost-effective yet accurate methods, and might allow for the incorporation of microbes into ecological assessments.
Affiliation(s)
- Christopher A Hempel
- To whom correspondence should be addressed. Tel: +1 519 824 4120; Fax: +1 519 824 5703;
- Natalie Wright
- Department of Integrative Biology, University of Guelph, Guelph, ON N1G 2W1, Canada
- Julia Harvie
- Department of Integrative Biology, University of Guelph, Guelph, ON N1G 2W1, Canada
- Jose S Hleap
- SHARCNET, University of Guelph, Guelph, ON N1G 2W1, Canada
- Sarah J Adamowicz
- Department of Integrative Biology, University of Guelph, Guelph, ON N1G 2W1, Canada
- Dirk Steinke
- Department of Integrative Biology, University of Guelph, Guelph, ON N1G 2W1, Canada; Centre for Biodiversity Genomics, University of Guelph, Guelph, ON N1G 2W1, Canada
|
23
|
Czech L, Stamatakis A, Dunthorn M, Barbera P. Metagenomic Analysis Using Phylogenetic Placement-A Review of the First Decade. FRONTIERS IN BIOINFORMATICS 2022; 2:871393. [PMID: 36304302 PMCID: PMC9580882 DOI: 10.3389/fbinf.2022.871393] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 02/08/2022] [Accepted: 04/11/2022] [Indexed: 12/20/2022] Open
Abstract
Phylogenetic placement refers to a family of tools and methods to analyze, visualize, and interpret the tsunami of metagenomic sequencing data generated by high-throughput sequencing. Compared to alternative (e.g., similarity-based) methods, it puts metabarcoding sequences into a phylogenetic context using a set of known reference sequences and taking evolutionary history into account. Thereby, one can increase the accuracy of metagenomic surveys and eliminate the requirement for having exact or close matches with existing sequence databases. Phylogenetic placement constitutes a valuable analysis tool per se, but also entails a plethora of downstream tools to interpret its results. A common use case is to analyze species communities obtained from metagenomic sequencing, for example via taxonomic assignment, diversity quantification, sample comparison, and identification of correlations with environmental variables. In this review, we provide an overview of the methods developed during the first 10 years. In particular, the goals of this review are 1) to motivate the usage of phylogenetic placement and illustrate some of its use cases, 2) to outline the full workflow, from raw sequences to publishable figures, including best practices, 3) to introduce the most common tools and methods and their capabilities, 4) to point out common placement pitfalls and misconceptions, 5) to showcase typical placement-based analyses, and how they can help to analyze, visualize, and interpret phylogenetic placement data.
Affiliation(s)
- Lucas Czech
- Department of Plant Biology, Carnegie Institution for Science, Stanford, CA, United States
- Alexandros Stamatakis
- Computational Molecular Evolution Group, Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Institute for Theoretical Informatics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Micah Dunthorn
- Natural History Museum, University of Oslo, Oslo, Norway
|
24
|
Leite BR, Vieira PE, Troncoso JS, Costa FO. Comparing species detection success between molecular markers in DNA metabarcoding of coastal macroinvertebrates. METABARCODING AND METAGENOMICS 2021. [DOI: 10.3897/mbmg.5.70063] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/11/2022] Open
Abstract
DNA metabarcoding has great potential to improve marine biomonitoring programs by providing a rapid and accurate assessment of species composition in zoobenthic communities. However, some methodological improvements are still required, especially regarding failed detections, primer efficiency and incompleteness of databases. Here we assessed the efficiency of two different marker loci (COI and 18S) and three primer pairs in marine species detection through DNA metabarcoding of the macrozoobenthic communities colonizing three types of artificial substrates (slate, PVC and granite), sampled between 3 and 15 months of deployment. To accurately compare detection success between markers, we also compared the representativeness of the detected species in public databases and revised the reliability of the taxonomic assignments. Globally, we recorded extensive complementarity in the species detected by each marker, with 69% of the species exclusively detected by either 18S or COI. Individually, each of the three primer pairs recovered, at most, 52% of all species detected in the samples, showing also different abilities to amplify specific taxonomic groups. Most of the detected species have reliable reference sequences in their respective databases (82% for COI and 72% for 18S), meaning that when a species was detected by one marker and not by the other, it was most likely due to faulty amplification, and not to a lack of matching sequences in the database. Overall, results showed the impact of marker and primer applied on species detection ability and indicated that, currently, if only a single marker or primer pair is employed in marine zoobenthos metabarcoding, a fair portion of the diversity may be overlooked.
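The complementarity figures quoted in this abstract (share of species detected by only one marker, and share of all species recovered by each primer pair) can be derived from species lists as in the sketch below. The species labels and primer-pair names are hypothetical placeholders, and the formulas are an assumed reading of how such percentages are obtained, not taken from the paper.

```python
def exclusive_fraction(a, b):
    """Share of all detected species found by exactly one of the two markers."""
    a, b = set(a), set(b)
    all_species = a | b
    return len(a ^ b) / len(all_species) if all_species else 0.0

def recovery_by_primer(detections):
    """Share of the pooled species list recovered by each primer pair."""
    pooled = set().union(*detections.values())
    return {name: len(set(found)) / len(pooled) for name, found in detections.items()}

# Hypothetical detections per marker (illustrative labels only)
coi = {"sp1", "sp2", "sp3", "sp4"}
r18s = {"sp3", "sp4", "sp5"}
print(f"exclusive to one marker: {exclusive_fraction(coi, r18s):.0%}")

# Hypothetical detections per primer pair
primers = {
    "COI_pair_A": {"sp1", "sp2", "sp3"},
    "COI_pair_B": {"sp3", "sp4"},
    "18S_pair": {"sp3", "sp4", "sp5"},
}
print(recovery_by_primer(primers))
```

Here three of the five pooled species are found by only one marker, so the exclusive fraction is 60%, and no single primer pair recovers more than 60% of the pooled list.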
|
25
|
Narum S, News JK, Fountain-Jones N, Hooper Junior R, Ortiz-Barrientos D, O'Boyle B, Sibbett B. Editorial 2022. Mol Ecol Resour 2021; 22:1-8. [PMID: 34919782 DOI: 10.1111/1755-0998.13572] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
|
26
|
Creedy TJ, Andújar C, Meramveliotakis E, Noguerales V, Overcast I, Papadopoulou A, Morlon H, Vogler AP, Emerson BC, Arribas P. Coming of age for COI metabarcoding of whole organism community DNA: Towards bioinformatic harmonisation. Mol Ecol Resour 2021; 22:847-861. [PMID: 34496132 PMCID: PMC9292290 DOI: 10.1111/1755-0998.13502] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 05/03/2021] [Revised: 07/28/2021] [Accepted: 08/23/2021] [Indexed: 11/26/2022]
Abstract
Metabarcoding of DNA extracted from community samples of whole organisms (whole organism community DNA, wocDNA) is increasingly being applied to terrestrial, marine and freshwater metazoan communities to provide rapid, accurate and high resolution data for novel molecular ecology research. The growth of this field has been accompanied by considerable development that builds on microbial metabarcoding methods to develop appropriate and efficient sampling and laboratory protocols for whole organism metazoan communities. However, considerably less attention has focused on ensuring bioinformatic methods are adapted and applied comprehensively in wocDNA metabarcoding. In this study we examined over 600 papers and identified 111 studies that performed COI metabarcoding of wocDNA. We then systematically reviewed the bioinformatic methods employed by these papers to identify the state‐of‐the‐art. Our results show that the increasing use of wocDNA COI metabarcoding for metazoan diversity is characterised by a clear absence of bioinformatic harmonisation, and the temporal trends show little change in this situation. The reviewed literature showed (i) high heterogeneity across pipelines, tasks and tools used, (ii) limited or no adaptation of bioinformatic procedures to the nature of the COI fragment, and (iii) a worrying underreporting of tasks, software and parameters. Based upon these findings we propose a set of recommendations that we think the metabarcoding community should consider to ensure that bioinformatic methods are appropriate, comprehensive and comparable. We believe that adhering to these recommendations will improve the long‐term integrative potential of wocDNA COI metabarcoding for biodiversity science.
Affiliation(s)
- Thomas J Creedy
- Department of Life Sciences, Natural History Museum, London, UK
- Carmelo Andújar
- Instituto de Productos Naturales y Agrobiología (IPNA-CSIC), S.C. La Laguna, Spain
- Victor Noguerales
- Instituto de Productos Naturales y Agrobiología (IPNA-CSIC), S.C. La Laguna, Spain; Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
- Isaac Overcast
- Institut de Biologie de l'ENS (IBENS), Département de Biologie, École Normale Supérieur, CNRS, INSERM, Université PSL, Paris, France
- Anna Papadopoulou
- Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus
- Hélène Morlon
- Institut de Biologie de l'ENS (IBENS), Département de Biologie, École Normale Supérieur, CNRS, INSERM, Université PSL, Paris, France
- Alfried P Vogler
- Department of Life Sciences, Natural History Museum, London, UK; Department of Life Sciences, Imperial College London Silwood Park Campus, Ascot, UK
- Brent C Emerson
- Instituto de Productos Naturales y Agrobiología (IPNA-CSIC), S.C. La Laguna, Spain
- Paula Arribas
- Instituto de Productos Naturales y Agrobiología (IPNA-CSIC), S.C. La Laguna, Spain
|
27
|
Bik HM. Just keep it simple? Benchmarking the accuracy of taxonomy assignment software in metabarcoding studies. Mol Ecol Resour 2021; 21:2187-2189. [PMID: 34268901 DOI: 10.1111/1755-0998.13473] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 05/17/2021] [Revised: 07/12/2021] [Accepted: 07/13/2021] [Indexed: 11/24/2022]
Abstract
How do you put a name on an unknown piece of DNA? From microbes to mammals, high-throughput metabarcoding studies provide a more objective view of natural communities, overcoming many of the inherent limitations of traditional field surveys and microscopy-based observations (Deiner et al., 2017). Taxonomy assignment is one of the most critical aspects of any metabarcoding study, yet this important bioinformatics task is routinely overlooked. Biodiversity surveys and conservation efforts often depend on formal species inventories: the presence (or absence) of species, and the number of individuals reported across space and time. However, computational workflows applied in eukaryotic metabarcoding studies were originally developed for use with bacterial/archaeal data sets, where microbial researchers rely on one conserved locus (nuclear 16S rRNA) and have access to vast databases with good coverage across most prokaryotic lineages - a situation not mirrored in most multicellular taxa. In this issue of Molecular Ecology Resources, Hleap et al. (2021) carry out an extensive benchmarking exercise focused on taxonomy assignment strategies for eukaryotic metabarcoding studies utilizing the mitochondrial Cytochrome C oxidase I marker gene (COI). They assess the performance and accuracy of software tools representing diverse methodological approaches: from "simple" strategies based on sequence similarity and composition, to model-based phylogenetic and probabilistic classification tools. Contrary to popular assumptions, less complex approaches (BLAST and the QIIME2 feature classifier) consistently outperformed more sophisticated mathematical algorithms and were highly accurate for assigning taxonomy at higher levels (e.g. family). Lower-level assignments at the genus and species level still pose significant challenge for most existing algorithms, and sparse eukaryotic reference databases further limit software performance. 
This study illuminates current best practices for metabarcoding taxonomy assignments, and underscores the need for community-driven efforts to expand taxonomic and geographic representation in reference DNA barcode databases.
Affiliation(s)
- Holly M Bik
- Department of Marine Sciences and Institute of Bioinformatics, University of Georgia, Athens, Georgia, USA
|