Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

Total Articles: 19 (from Reference Citation Analysis)

Searched Name: Taxonomic assignment

Meglécz E. mkLTG: a command-line tool for taxonomic assignment of metabarcoding sequences using variable identity thresholds. Biol Futur 2023;74:369-375. [PMID: 38300415 DOI: 10.1007/s42977-024-00201-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/24/2023] [Accepted: 01/04/2024] [Indexed: 02/02/2024]

Mugnai F, Costantini F, Chenuil A, Leduc M, Gutiérrez Ortega JM, Meglécz E. Be positive: customized reference databases and new, local barcodes balance false taxonomic assignments in metabarcoding studies. PeerJ 2023;11:e14616. [PMID: 36643652 PMCID: PMC9835706 DOI: 10.7717/peerj.14616] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 07/08/2022] [Accepted: 12/01/2022] [Indexed: 01/11/2023]

Abstract

Background

In metabarcoding analyses, the taxonomic assignment is crucial to place sequencing data in biological and ecological contexts. This fundamental step depends on a reference database, which should have a good taxonomic coverage to avoid unassigned sequences. However, this goal is rarely achieved in many geographic regions and for several taxonomic groups. On the other hand, more is not necessarily better, as sequences in reference databases belonging to taxonomic groups out of the studied region/environment context might lead to false assignments.

Methods

We investigated the effect of using several subsets of a cytochrome c oxidase subunit I (COI) reference database on taxonomic assignment. Published metabarcoding sequences from the Mediterranean Sea were assigned to taxa using COInr, which is a comprehensive, non-redundant and recent database of COI sequences obtained both from BOLD and NCBI, and two of its subsets: (i) all sequences except insects (COInr-WO-Insecta), which represent the overwhelming majority of COInr database, but are irrelevant for marine samples, and (ii) all sequences from taxonomic families present in the Mediterranean Sea (COInr-Med). Four different algorithms for taxonomic assignment were employed in parallel to evaluate differences in their output and data consistency.
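The database subsetting described in this Methods paragraph can be illustrated with a minimal sketch. It assumes a hypothetical in-memory list of reference records, each holding an identifier and a lineage dictionary; the record layout, the function names, and the toy entries are illustrative assumptions and do not reflect the COInr file format or the authors' tooling.

```python
from typing import Any, Dict, Iterable, List, Set

# Hypothetical record layout: {"id": str, "lineage": {"class": str, "family": str}}
Record = Dict[str, Any]


def exclude_taxon(records: Iterable[Record], rank: str, name: str) -> List[Record]:
    """Keep every record whose lineage does not match `name` at `rank`."""
    return [r for r in records if r["lineage"].get(rank) != name]


def keep_families(records: Iterable[Record], families: Set[str]) -> List[Record]:
    """Keep only records whose family belongs to the allowed set."""
    return [r for r in records if r["lineage"].get("family") in families]


# Toy reference set: identifiers are made up for illustration.
reference = [
    {"id": "seq1", "lineage": {"class": "Insecta", "family": "Apidae"}},
    {"id": "seq2", "lineage": {"class": "Anthozoa", "family": "Gorgoniidae"}},
    {"id": "seq3", "lineage": {"class": "Gastropoda", "family": "Helicidae"}},
]

# Subset (i): everything except insects.
without_insects = exclude_taxon(reference, "class", "Insecta")
# Subset (ii): only families on a regional checklist (a one-family toy list here).
regional_only = keep_families(reference, {"Gorgoniidae"})

print([r["id"] for r in without_insects])
print([r["id"] for r in regional_only])
```

Both filters only remove records, so any sequence whose best match was in a removed group can end up unassigned, which is consistent with the increase in unassigned sequences reported in the Results.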

Results

The reduction of the database to more specific custom subsets increased the number of unassigned sequences. Nevertheless, since most of them were incorrectly assigned by the less specific databases, this is a positive outcome. Moreover, the taxonomic resolution (the lowest taxonomic level to which a sequence is attributed) of several sequences tended to increase when using customized databases. These findings clearly indicated the need for customized databases adapted to each study. However, the very high proportion of unassigned sequences points to the need to enrich the local database with new barcodes specifically obtained from the studied region and/or taxonomic group. Including novel local barcodes to the COI database proved to be very profitable: by adding only 116 new barcodes sequenced in our laboratory, thus increasing the reference database by only 0.04%, we were able to improve the resolution for ca. 0.6-1% of the Amplicon Sequence Variants (ASVs).

Garfias-Gallegos D, Zirión-Martínez C, Bustos-Díaz ED, Arellano-Fernández TV, Lovaco-Flores JA, Espinosa-Jaime A, Avelar-Rivas JA, Sélem-Mójica N. Metagenomics Bioinformatic Pipeline. Methods Mol Biol 2022;2512:153-179. [PMID: 35818005 DOI: 10.1007/978-1-0716-2429-6_10] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]

Dacey DP, Chain FJJ. Concatenation of paired-end reads improves taxonomic classification of amplicons for profiling microbial communities. BMC Bioinformatics 2021;22:493. [PMID: 34641782 PMCID: PMC8507205 DOI: 10.1186/s12859-021-04410-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 07/14/2021] [Accepted: 09/29/2021] [Indexed: 01/04/2023]

Abstract

Background

Taxonomic classification of genetic markers for microbiome analysis is affected by the numerous choices made from sample preparation to bioinformatics analysis. Paired-end read merging is routinely used to capture the entire amplicon sequence when the read ends overlap. However, the exclusion of unmerged reads from further analysis can result in underestimating the diversity in the sequenced microbial community and is influenced by bioinformatic processes such as read trimming and the choice of reference database. A potential solution to overcome this is to concatenate (join) reads that do not overlap and keep them for taxonomic classification. The use of concatenated reads can outperform taxonomic recovery from single-end reads, but it remains unclear how their performance compares to merged reads. Using various sequenced mock communities with different amplicons, read length, read depth, taxonomic composition, and sequence quality, we tested how merging and concatenating reads performed for genus recall and precision in bioinformatic pipelines combining different parameters for read trimming and taxonomic classification using different reference databases.
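To make the idea of concatenating non-overlapping read pairs concrete, here is a minimal sketch. It assumes the reverse read is reverse-complemented and joined to the forward read with a spacer of N characters; the spacer, its length, and the function names are assumptions for illustration and are not taken from the article or from any specific tool.

```python
# Translation table for the four canonical bases plus N (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def concatenate_pair(forward: str, reverse: str, spacer: str = "N" * 10) -> str:
    """Join a non-overlapping read pair into a single sequence.

    The reverse read is reverse-complemented so that both parts lie on the
    same strand; the spacer marks the unsequenced gap (its length is an
    assumption, not a value from the article).
    """
    return forward + spacer + reverse_complement(reverse)


joined = concatenate_pair("ACGTAC", "TTGCA", spacer="NNN")
print(joined)  # ACGTACNNNTGCAA
```

A merged read, by contrast, requires the two ends to overlap, which is why pairs without overlap are normally dropped unless they are joined in some way like this.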

Results

The addition of concatenated reads to merged reads always increased pipeline performance. The top two performing pipelines both included read concatenation, with variable strengths depending on the mock community. The pipeline that combined merged and concatenated reads that were quality-trimmed performed best for mock communities with larger amplicons and higher average quality sequences. The pipeline that used length-trimmed concatenated reads outperformed quality trimming in mock communities with lower quality sequences but lost a significant amount of input sequences for taxonomic classification during processing. Genus level classification was more accurate using the SILVA reference database compared to Greengenes.

Conclusions

Merged sequences with the addition of concatenated sequences that were unable to be merged increased performance of taxonomic classifications. This was especially beneficial in mock communities with larger amplicons. We have shown for the first time, using an in-depth comparison of pipelines containing merged vs concatenated reads combined with different trimming parameters and reference databases, the potential advantages of concatenating sequences in improving resolution in microbiome investigations.

Supplementary Information

The online version contains supplementary material available at 10.1186/s12859-021-04410-2.

Mukherjee C, Leys EJ. Strain-Level Profiling of Oral Microbiota with Targeted Sequencing. Methods Mol Biol 2021;2327:239-52. [PMID: 34410649 DOI: 10.1007/978-1-0716-1518-8_14] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]

Catlett D, Son K, Liang C. ensembleTax: an R package for determinations of ensemble taxonomic assignments of phylogenetically-informative marker gene sequences. PeerJ 2021;9:e11865. [PMID: 34395092 PMCID: PMC8320524 DOI: 10.7717/peerj.11865] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2021] [Accepted: 07/05/2021] [Indexed: 11/20/2022]

Abstract

BACKGROUND

High-throughput sequencing of phylogenetically informative marker genes is a widely used method to assess the diversity and composition of microbial communities. Taxonomic assignment of sampled marker gene sequences (referred to as amplicon sequence variants, or ASVs) imparts ecological significance to these genetic data. To assign taxonomy to an ASV, a taxonomic assignment algorithm compares the ASV to a collection of reference sequences (a reference database) with known taxonomic affiliations. However, many taxonomic assignment algorithms and reference databases are available, and the optimal algorithm and database for a particular scientific question is often unclear. Here, we present the ensembleTax R package, which provides an efficient framework for integrating taxonomic assignments predicted with any number of taxonomic assignment algorithms and reference databases to determine ensemble taxonomic assignments for ASVs.

METHODS

The ensembleTax R package relies on two core algorithms: taxmapper and assign.ensembleTax. The taxmapper algorithm maps taxonomic assignments derived from one reference database onto the taxonomic nomenclature (a set of taxonomic naming and ranking conventions) of another reference database. The assign.ensembleTax algorithm computes ensemble taxonomic assignments for each ASV in a data set based on any number of taxonomic assignments determined with independent methods. Various parameters allow analysts to prioritize obtaining either more ASVs with more predicted clade names or more robust clade name predictions supported by multiple independent methods in ensemble taxonomic assignments.
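The trade-off between more assigned ranks and more robust assignments can be sketched with a simple rank-by-rank consensus rule. This is not the ensembleTax API (the package is written in R) and not its actual algorithm; the function name, the `min_support` parameter, and the rule of stopping at the first unresolved rank are assumptions chosen only to illustrate the idea.

```python
from collections import Counter
from typing import List, Optional, Sequence

# One name (or None for "unassigned") per rank, highest rank first.
Assignment = Sequence[Optional[str]]


def ensemble_assignment(
    assignments: List[Assignment], min_support: int = 2
) -> List[Optional[str]]:
    """Rank-by-rank consensus across independent taxonomic assignments.

    A name is accepted at a rank only if it is the single most frequent name
    and at least `min_support` methods agree on it. Once a rank fails, all
    lower ranks are left unassigned.
    """
    n_ranks = len(assignments[0])
    consensus: List[Optional[str]] = []
    resolved = True
    for rank in range(n_ranks):
        names = [a[rank] for a in assignments if a[rank] is not None]
        counts = Counter(names).most_common(2)
        ok = (
            resolved
            and bool(counts)
            and counts[0][1] >= min_support
            and (len(counts) == 1 or counts[0][1] > counts[1][1])
        )
        consensus.append(counts[0][0] if ok else None)
        resolved = ok
    return consensus


# Three hypothetical methods classifying the same ASV.
method_a = ["Eukaryota", "Alveolata", "Dinophyceae"]
method_b = ["Eukaryota", "Alveolata", None]
method_c = ["Eukaryota", "Stramenopiles", None]
methods = [method_a, method_b, method_c]

conservative = ensemble_assignment(methods, min_support=2)
permissive = ensemble_assignment(methods, min_support=1)
print(conservative)
print(permissive)
```

With `min_support=2` the lowest rank stays unassigned because only one method names it; with `min_support=1` that single prediction is accepted, giving more ranks at the cost of less support.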

RESULTS

The ensembleTax R package is used to compute two sets of ensemble taxonomic assignments for a collection of protistan ASVs sampled from the coastal ocean. Comparisons of taxonomic assignments predicted by individual methods with those predicted by ensemble methods show that conservative implementations of the ensembleTax package minimize disagreements between taxonomic assignments predicted by individual and ensemble methods, but result in ASVs with fewer ranks assigned taxonomy. Less conservative implementations of the ensembleTax package result in an increased fraction of ASVs classified at all taxonomic ranks, but increase the number of ASVs for which ensemble assignments disagree with those predicted by individual methods.

DISCUSSION

We discuss how implementation of the ensembleTax R package may be optimized to address specific scientific objectives based on the results of the application of the ensembleTax package to marine protist communities. While further work is required to evaluate the accuracy of ensemble taxonomic assignments relative to taxonomic assignments predicted by individual methods, we also discuss scenarios where ensemble methods are expected to improve the accuracy of taxonomy prediction for ASVs.

Ma H, Tan TW, Ban KHK. A multi-task CNN learning model for taxonomic assignment of human viruses. BMC Bioinformatics 2021;22:194. [PMID: 34078269 PMCID: PMC8170063 DOI: 10.1186/s12859-021-04084-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/14/2021] [Accepted: 03/16/2021] [Indexed: 01/09/2023]

Stefani F, Bencherif K, Sabourin S, Hadj-Sahraoui AL, Banchini C, Séguin S, Dalpé Y. Taxonomic assignment of arbuscular mycorrhizal fungi in an 18S metagenomic dataset: a case study with saltcedar (Tamarix aphylla). Mycorrhiza 2020;30:243-255. [PMID: 32180012 DOI: 10.1007/s00572-020-00946-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 12/16/2019] [Accepted: 03/06/2020] [Indexed: 06/10/2023]

Wylezich C, Belka A, Hanke D, Beer M, Blome S, Höper D. Metagenomics for broad and improved parasite detection: a proof-of-concept study using swine faecal samples. Int J Parasitol 2019;49:769-777. [PMID: 31361998 DOI: 10.1016/j.ijpara.2019.04.007] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 02/26/2019] [Revised: 04/18/2019] [Accepted: 04/24/2019] [Indexed: 01/10/2023]

Abstract

Efficient and reliable identification of emerging pathogens is crucial for the design and implementation of timely and proportionate control strategies. This is difficult if the pathogen is so far unknown or only distantly related to known pathogens. Diagnostic metagenomics - an undirected, broad and sensitive method for the efficient identification of pathogens - was frequently used for virus and bacteria detection, but seldom applied to parasite identification. Here, metagenomics datasets prepared from swine faeces using an unbiased sample processing approach with RNA serving as starting material were re-analysed with respect to parasite detection. The taxonomic identification tool RIEMS, used for initial detection, provided basic hints on potential pathogens contained in the datasets. The suspected parasites/intestinal protists (Blastocystis, Entamoeba, Iodamoeba, Neobalantidium, Tetratrichomonas) were verified using subsequently applied reference mapping analyses on the basis of rRNA sequences. Nearly full-length gene sequences could be extracted from the RNA-derived datasets. In the case of Blastocystis, subtyping was possible with subtype (ST)15 discovered for the first known time in swine faeces. Using RIEMS, some of the suspected candidates turned out to be false-positives caused by the poor status of sequences in publicly available databases. Altogether, 11 different species/STs of parasites/intestinal protists were detected in 34 out of 41 datasets extracted from metagenomics data. The approach operates without any primer bias that typically hampers the analysis of amplicon-based approaches, and allows the detection and taxonomic classification including subtyping of protist and metazoan endobionts (parasites, commensals or mutualists) based on an abundant biomarker, the 18S rRNA. The generic nature of the approach also allows evaluation of interdependencies that induce mutualistic or pathogenic effects that are often not clear for many intestinal protists and perhaps other parasites. Thus, metagenomics has the potential for generic pathogen identification beyond the characterisation of viruses and bacteria when starting from RNA instead of DNA.

Henderson G, Yilmaz P, Kumar S, Forster RJ, Kelly WJ, Leahy SC, Guan LL, Janssen PH. Improved taxonomic assignment of rumen bacterial 16S rRNA sequences using a revised SILVA taxonomic framework. PeerJ 2019;7:e6496. [PMID: 30863673 PMCID: PMC6407505 DOI: 10.7717/peerj.6496] [Citation(s) in RCA: 46] [Impact Index Per Article: 9.2] [Received: 07/30/2018] [Accepted: 01/21/2019] [Indexed: 11/20/2022]

Abstract

The taxonomy and associated nomenclature of many taxa of rumen bacteria are poorly defined within databases of 16S rRNA genes. This lack of resolution results in inadequate definition of microbial community structures, with large parts of the community designated as incertae sedis, unclassified, or uncultured within families, orders, or even classes. We have begun resolving these poorly-defined groups of rumen bacteria, based on our desire to name these for use in microbial community profiling. We used the previously-reported global rumen census (GRC) dataset consisting of >4.5 million partial bacterial 16S rRNA gene sequences amplified from 684 rumen samples and representing a wide range of animal hosts and diets. Representative sequences from the 8,985 largest operational units (groups of sequences sharing >97% sequence similarity, and covering 97.8% of all sequences in the GRC dataset) were used to identify 241 pre-defined clusters (mainly at genus or family level) of abundant rumen bacteria in the ARB SILVA 119 framework. A total of 99 of these clusters (containing 63.8% of all GRC sequences) had no unique or had inadequate taxonomic identifiers, and each was given a unique nomenclature. We assessed this improved framework by comparing taxonomic assignments of bacterial 16S rRNA gene sequence data in the GRC dataset with those made using the original SILVA 119 framework, and three other frameworks. The two SILVA frameworks performed best at assigning sequences to genus-level taxa. The SILVA 119 framework allowed 55.4% of the sequence data to be assigned to 751 uniquely identifiable genus-level groups. The improved framework increased this to 87.1% of all sequences being assigned to one of 871 uniquely identifiable genus-level groups. The new designations were included in the SILVA 123 release (https://www.arb-silva.de/documentation/release-123/) and will be perpetuated in future releases.

Yao Y, Jin Z, Lee JH. An improved statistical model for taxonomic assignment of metagenomics. BMC Genet 2018;19:98. [PMID: 30373533 PMCID: PMC6206629 DOI: 10.1186/s12863-018-0680-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 02/18/2018] [Accepted: 10/02/2018] [Indexed: 01/01/2023]

Murali A, Bhargava A, Wright ES. IDTAXA: a novel approach for accurate taxonomic classification of microbiome sequences. Microbiome 2018;6:140. [PMID: 30092815 PMCID: PMC6085705 DOI: 10.1186/s40168-018-0521-5] [Citation(s) in RCA: 239] [Impact Index Per Article: 39.8] [Received: 03/21/2018] [Accepted: 07/25/2018] [Indexed: 05/11/2023]

Abstract

BACKGROUND

Microbiome studies often involve sequencing a marker gene to identify the microorganisms in samples of interest. Sequence classification is a critical component of this process, whereby sequences are assigned to a reference taxonomy containing known sequence representatives of many microbial groups. Previous studies have shown that existing classification programs often assign sequences to reference groups even if they belong to novel taxonomic groups that are absent from the reference taxonomy. This high rate of "over classification" is particularly detrimental in microbiome studies because reference taxonomies are far from comprehensive.

RESULTS

Here, we introduce IDTAXA, a novel approach to taxonomic classification that employs principles from machine learning to reduce over classification errors. Using multiple reference taxonomies, we demonstrate that IDTAXA has higher accuracy than popular classifiers such as BLAST, MAPSeq, QIIME, SINTAX, SPINGO, and the RDP Classifier. Similarly, IDTAXA yields far fewer over classifications on Illumina mock microbial community data when the expected taxa are absent from the training set. Furthermore, IDTAXA offers many practical advantages over other classifiers, such as maintaining low error rates across varying input sequence lengths and withholding classifications from input sequences composed of random nucleotides or repeats.
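One general way to limit over classification is to withhold ranks whose confidence falls below a cutoff. The sketch below illustrates only that general idea; it is not IDTAXA's algorithm, and the function name, the default cutoff of 60, and the toy confidence values are assumptions made for illustration.

```python
from typing import List, Tuple


def truncate_by_confidence(
    lineage: List[Tuple[str, float]], threshold: float = 60.0
) -> List[str]:
    """Keep ranks from the root down until confidence first drops below `threshold`.

    `lineage` is a list of (taxon name, confidence in percent) pairs ordered
    from the highest rank to the lowest. The default threshold is an assumed
    value for this sketch.
    """
    kept: List[str] = []
    for name, confidence in lineage:
        if confidence < threshold:
            break  # withhold this rank and everything below it
        kept.append(name)
    return kept


# Toy prediction with invented confidences (intermediate ranks omitted).
prediction = [
    ("Bacteria", 99.0),
    ("Firmicutes", 91.5),
    ("Clostridia", 72.0),
    ("Ruminococcus", 41.0),
]

accepted = truncate_by_confidence(prediction)
print(accepted)  # ['Bacteria', 'Firmicutes', 'Clostridia']
```

Raising the threshold trades resolution for fewer spurious low-rank assignments; a sequence from a group missing in the reference would ideally be cut off at a higher rank rather than forced into the nearest known genus.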

CONCLUSIONS

IDTAXA's classifications may lead to different conclusions in microbiome studies because of the substantially reduced number of taxa that are incorrectly identified through over classification. Although misclassification error is relatively minor, we believe that many remaining misclassifications are likely caused by errors in the reference taxonomy. We describe how IDTAXA is able to identify many putative mislabeling errors in reference taxonomies, enabling training sets to be automatically corrected by eliminating spurious sequences. IDTAXA is part of the DECIPHER package for the R programming language, available through the Bioconductor repository or accessible online (http://DECIPHER.codes).

Zheng Q, Bartow-McKenney C, Meisel JS, Grice EA. HmmUFOtu: An HMM and phylogenetic placement based ultra-fast taxonomic assignment and OTU picking tool for microbiome amplicon sequencing studies. Genome Biol 2018;19:82. [PMID: 29950165 PMCID: PMC6020470 DOI: 10.1186/s13059-018-1450-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 11/27/2017] [Accepted: 05/09/2018] [Indexed: 02/01/2023]

Balech B, Sandionigi A, Manzari C, Trucchi E, Tullo A, Licciulli F, Grillo G, Sbisà E, De Felici S, Saccone C, D'Erchia AM, Cesaroni D, Casiraghi M, Vicario S. Tackling critical parameters in metazoan meta-barcoding experiments: a preliminary study based on coxI DNA barcode. PeerJ 2018;6:e4845. [PMID: 29915686 PMCID: PMC6004112 DOI: 10.7717/peerj.4845] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 09/07/2017] [Accepted: 05/04/2018] [Indexed: 11/21/2022]

Abstract

DNA meta-barcoding is now a powerful instrument capable of quickly discovering the biodiversity of an environmental sample by integrating the DNA barcoding approach with High Throughput Sequencing technologies. It mainly consists of the parallel reading of informative genomic fragment(s) able to discriminate living entities. Although this approach has been widely studied, some of the steps required for its advanced application still need optimization. A fundamental element concerns the standardization of bioinformatic analysis pipelines. The aim of the present study was to highlight a number of critical parameters of laboratory material preparation and taxonomic assignment pipelines in DNA meta-barcoding experiments using the cytochrome oxidase subunit-I (coxI) barcode region, known as a suitable molecular marker for animal species identification. We compared nine taxonomic assignment pipelines, including a custom in-house method, based on Hidden Markov Models. Moreover, we evaluated the potential influence of universal primer amplification bias in qPCR, as well as the correlation between GC content and taxonomic assignment results. The pipelines were tested on a community of known terrestrial invertebrates collected by pitfall traps from a chestnut forest in Italy. Although the present analysis was not exhaustive and needs additional investigation, our results suggest some potential improvements in laboratory material preparation and the introduction of additional parameters in taxonomic assignment pipelines. These include the correct setup of the OTU clustering threshold, the calibration of GC content affecting sequencing quality and taxonomic classification, as well as the evaluation of PCR primer amplification bias on the final biodiversity pattern. Thus, careful attention and further validation/optimization of the above-mentioned variables would be required in a DNA meta-barcoding experimental routine.
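The GC-content comparison mentioned in this abstract can be illustrated with a short sketch that computes GC content per sequence and its Pearson correlation with a per-sequence score. The sequences, the scores, and the function names are invented for illustration; the abstract does not state which statistic or data the authors used.

```python
from math import sqrt
from typing import Sequence


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous A/C/G/T bases of a sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumes both inputs have non-zero variance)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Invented toy data: four short sequences and a made-up assignment score for each.
sequences = ["ATATAT", "ATGCAT", "GGCCAT", "GGCCGC"]
scores = [0.99, 0.97, 0.93, 0.90]

gc_values = [gc_content(s) for s in sequences]
r = pearson(gc_values, scores)
print(gc_values)
print(round(r, 3))
```

In this toy data the score falls as GC content rises, so the coefficient is negative; real data would need many more sequences and a significance test before drawing any conclusion.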

Affiliation(s)

Bachir Balech: Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari-Consiglio Nazionale delle Ricerche, Bari, Italy; Dipartimento di Biologia, Università degli studi di Bari 'Aldo Moro', Bari, Italy
Anna Sandionigi: Dipartimento di Biotecnologie e Bioscienze-Zooplantlab, Università degli studi di Milano Bicocca, Milan, Italy
Caterina Manzari: Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari-Consiglio Nazionale delle Ricerche, Bari, Italy
Emiliano Trucchi: Dipartimento di Biologia, Università di Roma Tor Vergata, Rome, Italy
Apollonia Tullo: Istituto di Biomembrane, Bioenergetica e Biotecnologie Molecolari-Consiglio Nazionale delle Ricerche, Bari, Italy; Istituto di Tecnologie Biomediche-Consiglio Nazionale delle Ricerche, Bari, Italy
Flavio Licciulli: Istituto di Tecnologie Biomediche-Consiglio Nazionale delle Ricerche, Bari, Italy
Giorgio Grillo: Istituto di Tecnologie Biomediche-Consiglio Nazionale delle Ricerche, Bari, Italy
Elisabetta Sbisà: Istituto di Tecnologie Biomediche-Consiglio Nazionale delle Ricerche, Bari, Italy
Stefano De Felici: Dipartimento di Biologia, Università di Roma Tor Vergata, Rome, Italy; Istituto di Biologia Agroambientale e Forestale-Consiglio Nazionale delle Ricerche, Rome, Italy
Cecilia Saccone: Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari 'Aldo Moro', Bari, Italy
Anna Maria D'Erchia: Dipartimento di Bioscienze, Biotecnologie e Biofarmaceutica, Università degli Studi di Bari 'Aldo Moro', Bari, Italy
Donatella Cesaroni: Dipartimento di Biologia, Università di Roma Tor Vergata, Rome, Italy
Maurizio Casiraghi: Dipartimento di Biotecnologie e Bioscienze-Zooplantlab, Università degli studi di Milano Bicocca, Milan, Italy
Saverio Vicario: Istituto sull'Inquinamento Atmosferico-Consiglio Nazionale delle Ricerche, Bari, Italy

Bazinet AL, Ondov BD, Sommer DD, Ratnayake S. BLAST-based validation of metagenomic sequence assignments. PeerJ 2018;6:e4892. [PMID: 29868286 PMCID: PMC5978398 DOI: 10.7717/peerj.4892] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 01/22/2018] [Accepted: 05/13/2018] [Indexed: 12/29/2022]

Popovic A, Parkinson J. Characterization of Eukaryotic Microbiome Using 18S Amplicon Sequencing. Methods Mol Biol 2018;1849:29-48. [PMID: 30298246 DOI: 10.1007/978-1-4939-8728-3_3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/12/2022]

Beisser D, Graupner N, Grossmann L, Timm H, Boenigk J, Rahmann S. TaxMapper: an analysis tool, reference database and workflow for metatranscriptome analysis of eukaryotic microorganisms. BMC Genomics 2017;18:787. [PMID: 29037173 PMCID: PMC5644092 DOI: 10.1186/s12864-017-4168-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 07/13/2017] [Accepted: 10/05/2017] [Indexed: 12/17/2022]

Abstract

Background

High-throughput sequencing (HTS) technologies are increasingly applied to analyse complex microbial ecosystems by mRNA sequencing of whole communities, also known as metatranscriptome sequencing. This approach is at the moment largely limited to prokaryotic communities and communities of few eukaryotic species with sequenced genomes. For eukaryotes the analysis is hindered mainly by a low and fragmented coverage of the reference databases to infer the community composition, but also by lack of automated workflows for the task.

Results

From the databases of the National Center for Biotechnology Information and Marine Microbial Eukaryote Transcriptome Sequencing Project, 142 references were selected in such a way that the taxa represent the main lineages within each of the seven supergroups of eukaryotes and possess predominantly complete transcriptomes or genomes. From these references, we created an annotated microeukaryotic reference database. We developed a tool called TaxMapper for reliable mapping of sequencing reads against this database and filtering of unreliable assignments. For filtering, a classifier was trained and tested on each of the following: sequences of taxa in the database, sequences of taxa related to those in the database, and random sequences. Additionally, TaxMapper is part of a metatranscriptomic Snakemake workflow developed to perform quality assessment, functional and taxonomic annotation and (multivariate) statistical analysis including environmental data. The workflow is provided and described in detail to empower researchers to apply it for metatranscriptome analysis of any environmental sample.

Conclusions

TaxMapper shows superior performance compared to standard approaches, resulting in a higher number of true positive taxonomic assignments. Both the TaxMapper tool and the workflow are available as open-source code at Bitbucket under the MIT license: https://bitbucket.org/dbeisser/taxmapper and as a Bioconda package: https://bioconda.github.io/recipes/taxmapper/README.html.

Electronic supplementary material

The online version of this article (doi:10.1186/s12864-017-4168-6) contains supplementary material, which is available to authorized users.

Kobus R, Hundt C, Müller A, Schmidt B. Accelerating metagenomic read classification on CUDA-enabled GPUs. BMC Bioinformatics 2017;18:11. [PMID: 28049411 PMCID: PMC5209836 DOI: 10.1186/s12859-016-1434-6] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 09/09/2016] [Accepted: 12/16/2016] [Indexed: 11/10/2022]

Luo C, Rodriguez-R LM, Konstantinidis KT. A user's guide to quantitative and comparative analysis of metagenomic datasets. Methods Enzymol 2013;531:525-47. [PMID: 24060135 DOI: 10.1016/b978-0-12-407863-5.00023-x] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Indexed: 12/13/2022]