1
Hakimzadeh A, Asbun AA, Albanese D, Bernard M, Buchner D, Callahan B, Caporaso JG, Curd E, Djemiel C, Durling MB, Elbrecht V, Gold Z, Gweon HS, Hajibabaei M, Hildebrand F, Mikryukov V, Normandeau E, Özkurt E, Palmer JM, Pascal G, Porter TM, Straub D, Vasar M, Větrovský T, Zafeiropoulos H, Anslan S. A pile of pipelines: An overview of the bioinformatics software for metabarcoding data analyses. Mol Ecol Resour 2024; 24:e13847. [PMID: 37548515 PMCID: PMC10847385 DOI: 10.1111/1755-0998.13847] [Received: 02/10/2023] [Revised: 06/05/2023] [Accepted: 07/06/2023] [Indexed: 08/08/2023]
Abstract
Environmental DNA (eDNA) metabarcoding has gained growing attention as a strategy for monitoring biodiversity in ecology. However, taxa identifications produced through metabarcoding require sophisticated processing of high-throughput sequencing data from taxonomically informative DNA barcodes. Various sets of universal and taxon-specific primers have been developed, extending the usability of metabarcoding across archaea, bacteria and eukaryotes. Accordingly, a multitude of metabarcoding data analysis tools and pipelines have also been developed. Often, several developed workflows are designed to process the same amplicon sequencing data, making it somewhat puzzling to choose one among the plethora of existing pipelines. However, each pipeline has its own specific philosophy, strengths and limitations, which should be considered depending on the aims of any specific study, as well as the bioinformatics expertise of the user. In this review, we outline the input data requirements, supported operating systems and particular attributes of thirty-two amplicon processing pipelines with the goal of helping users to select a pipeline for their metabarcoding projects.
Affiliation(s)
- Ali Hakimzadeh
  - Institute of Ecology and Earth Sciences, University of Tartu, Estonia
- Alejandro Abdala Asbun
  - Department of Marine Microbiology and Biogeochemistry, NIOZ Royal Netherlands Institute for Sea Research, Texel, Netherlands
- Davide Albanese
  - Unit of Computational Biology, Research and Innovation Centre, Fondazione Edmund Mach, Italy
- Maria Bernard
  - Université Paris-Saclay, INRAE, AgroParisTech, GABI, 78350, Jouy-en-Josas, France
  - INRAE, SIGENAE, 78350, Jouy-en-Josas, France
- Dominik Buchner
  - Aquatic Ecosystem Research, University of Duisburg-Essen, Universitätsstraße 5, 45141 Essen, Germany
- Benjamin Callahan
  - Department of Population Health and Pathobiology, College of Veterinary Medicine and Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina, USA
- J. Gregory Caporaso
  - Center for Applied Microbiome Science, Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
- Emily Curd
  - Vermont Biomedical Research Network, University of Vermont, Burlington, VT, USA
- Christophe Djemiel
  - Agroécologie, INRAE, Institut Agro, Univ. Bourgogne Franche-Comté, F-21000 Dijon, France
- Mikael B. Durling
  - Department of Forest Mycology and Plant Pathology, Swedish University of Agricultural Sciences, Box 7026, 75007 Uppsala, Sweden
- Vasco Elbrecht
  - Aquatic Ecosystem Research, University of Duisburg-Essen, Universitaetsstrasse 5, 45141, Essen, Germany
- Zachary Gold
  - Scripps Institution of Oceanography, University of California San Diego, La Jolla, CA, USA
- Hyun S. Gweon
  - UK Centre for Ecology & Hydrology, Wallingford, Oxfordshire, OX10 8BB, UK
  - School of Biological Sciences, University of Reading, Reading, RG6 6EX, UK
- Mehrdad Hajibabaei
  - Department of Integrative Biology and Centre for Biodiversity Genomics, University of Guelph, Canada
- Falk Hildebrand
  - Gut Microbes & Health, Quadram Institute Bioscience, Norwich Research Park, Norwich, Norfolk, NR4 7UQ, UK
  - Earlham Institute, Norwich Research Park, Norwich, Norfolk, NR4 7UZ, UK
- Eric Normandeau
  - Institut de Biologie Intégrative et des Systèmes, Université Laval, Québec, QC, Canada
- Ezgi Özkurt
  - Gut Microbes & Health, Quadram Institute Bioscience, Norwich Research Park, Norwich, Norfolk, NR4 7UQ, UK
  - Earlham Institute, Norwich Research Park, Norwich, Norfolk, NR4 7UZ, UK
- Jonathan M. Palmer
  - Center for Forest Mycology Research, Northern Research Station, US Forest Service, Madison, WI USA (current address: Genencor Technology Center, IFF, Palo Alto, CA USA)
- Géraldine Pascal
  - GenPhySE, Université de Toulouse, INRAE, ENVT, F-31326, Castanet Tolosan, France
- Teresita M. Porter
  - Department of Integrative Biology and Centre for Biodiversity Genomics, University of Guelph, Canada
- Daniel Straub
  - Quantitative Biology Center (QBiC), University of Tübingen, Tübingen D-72076, Germany
- Martti Vasar
  - Institute of Ecology and Earth Sciences, University of Tartu, Estonia
- Tomáš Větrovský
  - Laboratory of Environmental Microbiology, Institute of Microbiology of the Czech Academy of Sciences, Vídeňská 1083, 14220 Praha 4, Czech Republic
- Haris Zafeiropoulos
  - KU Leuven, Department of Microbiology, Immunology and Transplantation, Rega Institute for Medical Research, Laboratory of Molecular Bacteriology, 3000 Leuven, Belgium
- Sten Anslan
  - Institute of Ecology and Earth Sciences, University of Tartu, Estonia
2
Meglécz E. mkLTG: a command-line tool for taxonomic assignment of metabarcoding sequences using variable identity thresholds. Biol Futur 2023; 74:369-375. [PMID: 38300415 DOI: 10.1007/s42977-024-00201-x] [Received: 02/24/2023] [Accepted: 01/04/2024] [Indexed: 02/02/2024]
Abstract
Metabarcoding is now a widely used method for biodiversity studies. Taxonomic assignment of environmental sequences is one of the key steps of metabarcoding. Assignments based on the lowest common ancestor (LCA) method generally rely on fixed, arbitrary thresholds, which are not well adapted to taxonomically diverse groups with variable coverage in reference databases. mkLTG is an LCA-based method that uses a series of percentage-of-identity thresholds, starting from stringent parameters and relaxing them if necessary. All parameters can be set separately for each percentage-of-identity threshold, which makes this tool adaptable to different databases, genetic markers and diverse taxonomic groups. The optimization step was included using the COI marker and a comprehensive, non-redundant database. The mkLTG tool is a command-line application with few dependencies that runs on all operating systems; it is therefore easy to include in complex pipelines. All scripts, including the benchmarking, are freely available at https://github.com/meglecz/mkLTG .
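The idea of an LCA assignment with a descending series of identity thresholds can be illustrated with a minimal Python sketch. This is not the mkLTG implementation: the function names, the input format (a list of percent-identity and lineage pairs) and the threshold values below are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Lineage = Tuple[str, ...]          # taxon names ordered from root to tip
Hit = Tuple[float, Lineage]        # (percent identity, lineage of the reference)

# Hypothetical threshold ladder: (minimum % identity, minimum number of hits).
# Illustrative values only; they are not mkLTG defaults.
THRESHOLDS: List[Tuple[float, int]] = [
    (100.0, 1), (97.0, 1), (95.0, 2), (90.0, 3), (85.0, 3),
]


def lowest_common_ancestor(lineages: Sequence[Lineage]) -> Lineage:
    """Return the longest root-first prefix shared by all lineages."""
    shared = []
    for names_at_rank in zip(*lineages):
        if len(set(names_at_rank)) != 1:
            break
        shared.append(names_at_rank[0])
    return tuple(shared)


def assign(hits: Sequence[Hit],
           thresholds: Sequence[Tuple[float, int]] = THRESHOLDS
           ) -> Optional[Tuple[float, Lineage]]:
    """Try thresholds from most to least stringent.

    At the first threshold with enough hits, return that threshold and the
    LCA of the retained hits. Return None if no threshold is satisfied.
    """
    for min_identity, min_hits in thresholds:
        kept = [lineage for identity, lineage in hits if identity >= min_identity]
        if len(kept) >= min_hits:
            return min_identity, lowest_common_ancestor(kept)
    return None


if __name__ == "__main__":
    example_hits = [
        (98.5, ("Eukaryota", "Arthropoda", "Insecta", "Diptera", "Chironomidae", "Chironomus")),
        (96.0, ("Eukaryota", "Arthropoda", "Insecta", "Diptera", "Chironomidae", "Polypedilum")),
        (91.0, ("Eukaryota", "Arthropoda", "Insecta", "Coleoptera", "Carabidae", "Carabus")),
    ]
    print(assign(example_hits))
```

In this sketch a query with a close match is resolved at a stringent threshold, while a query with only distant matches falls through to looser thresholds that require more hits and therefore tend to yield a higher-rank assignment.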
Affiliation(s)
- Emese Meglécz
  - IMBE, CNRS, IRD, Aix Marseille University, Avignon University, Marseille, France.