Reference Citation Analysis

For: Ziemski M, Wisanwanichthan T, Bokulich NA, Kaehler BD. Beating Naive Bayes at Taxonomic Classification of 16S rRNA Gene Sequences. Front Microbiol 2021;12:644487. [PMID: 34220738 PMCID: PMC8249850 DOI: 10.3389/fmicb.2021.644487] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/21/2020] [Accepted: 05/31/2021] [Indexed: 12/28/2022]

Cited by Other Article(s)

Fautt C, Couradeau E, Hockett KL. Naïve Bayes Classifiers and accompanying dataset for Pseudomonas syringae isolate characterization. Sci Data 2024;11:178. [PMID: 38326362 PMCID: PMC10850129 DOI: 10.1038/s41597-024-03003-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 01/26/2024] [Indexed: 02/09/2024]

Xu CCY, Lemoine J, Albert A, Whirter ÉM, Barrett RDH. Community assembly of the human piercing microbiome. Proc Biol Sci 2023;290:20231174. [PMID: 38018103 PMCID: PMC10685111 DOI: 10.1098/rspb.2023.1174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 11/03/2023] [Indexed: 11/30/2023]

Liu G, Li T, Zhu X, Zhang X, Wang J. An independent evaluation in a CRC patient cohort of microbiome 16S rRNA sequence analysis methods: OTU clustering, DADA2, and Deblur. Front Microbiol 2023;14:1178744. [PMID: 37560524 PMCID: PMC10408458 DOI: 10.3389/fmicb.2023.1178744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2023] [Accepted: 06/14/2023] [Indexed: 08/11/2023]

Abstract

The 16S rRNA gene is a universal marker gene in microbes and is often used as a target to profile microbial communities via next-generation sequencing (NGS). Traditionally, sequences are clustered into operational taxonomic units (OTUs) at a 97% identity threshold, a conventional 16S rRNA-based taxonomic standard, and no dedicated step is applied to reduce sequencing errors, which may produce spurious classification units. Denoising algorithms such as DADA2 and Deblur were developed to address this problem; they correct sequencing errors at single-nucleotide resolution by resolving amplicon sequence variants (ASVs). As high-resolution ASVs are becoming more popular than OTUs and only one analysis method is usually selected in a particular study, a thorough comparison of OTU clustering and denoising pipelines is needed. In this study, three of the most widely used 16S rRNA analysis methods (two denoising algorithms, DADA2 and Deblur, along with de novo OTU clustering) were compared using 16S rRNA amplicon sequencing data from 358 clinical stool samples in the Colorectal Cancer (CRC) Screening Cohort. Our findings indicated that all approaches led to similar taxonomic profiles (P > 0.05 in PERMANOVA and P < 0.001 in the Mantel test), although the number of ASVs/OTUs and the alpha-diversity indices varied considerably. Although the disease-related markers identified differed considerably, all methods led to similar conclusions in the disease-related analysis. Fusobacterium, Streptococcus, Peptostreptococcus, Parvimonas, Gemella, and Haemophilus were identified by all three methods as enriched in the CRC group, while Roseburia, Faecalibacterium, Butyricicoccus, and Blautia were identified by all three methods as enriched in the healthy group.
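The 97% threshold described above can be illustrated with a minimal greedy, abundance-sorted clustering sketch. It is not the pipeline used in the study: production tools (for example VSEARCH or USEARCH) use gapped alignment, dereplication, and chimera filtering, whereas this sketch assumes equal-length, pre-aligned reads and a simple position-wise identity. The function names and toy reads are hypothetical.

```python
from collections import Counter

def identity(a: str, b: str) -> float:
    """Fraction of identical positions. Sketch assumption: reads are already
    trimmed to equal length and aligned, so no gapped alignment is done."""
    if len(a) != len(b):
        raise ValueError("sketch assumes equal-length, pre-aligned sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otu_cluster(reads, threshold=0.97):
    """Abundance-sorted greedy de novo clustering (hypothetical helper).

    Each unique sequence joins the first existing centroid it matches at
    >= threshold identity; otherwise it becomes the centroid of a new OTU.
    Returns (OTU sizes keyed by centroid, mapping unique sequence -> centroid).
    """
    counts = Counter(reads)
    # Most abundant unique sequences are visited first, so they seed the OTUs.
    uniques = sorted(counts, key=lambda s: (-counts[s], s))
    centroids, otu_sizes, assignment = [], {}, {}
    for seq in uniques:
        for c in centroids:
            if identity(seq, c) >= threshold:
                assignment[seq] = c
                otu_sizes[c] += counts[seq]
                break
        else:  # no centroid within the threshold: start a new OTU
            centroids.append(seq)
            assignment[seq] = seq
            otu_sizes[seq] = counts[seq]
    return otu_sizes, assignment

# Toy 100-bp reads (hypothetical): one_off differs from base at 1 position
# (99% identity), far differs at 5 positions (95% identity).
base = "A" * 100
one_off = "C" + "A" * 99
far = "C" * 5 + "A" * 95
reads = [base] * 10 + [one_off] * 2 + [far] * 3

otu_sizes, assignment = greedy_otu_cluster(reads)
print(len(otu_sizes), otu_sizes[base], otu_sizes[far])  # prints: 2 12 3
```

The 1-mismatch variant is absorbed into the dominant OTU, which is the behavior that denoising methods such as DADA2 and Deblur aim to replace with explicit error correction.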
In addition, disease-diagnostic models built with machine learning algorithms on the data from each method all achieved good diagnostic performance (AUC: 0.87-0.89), with the DADA2-based model producing the highest AUC (0.8944 and 0.8907 in the training and test sets, respectively). However, the differences in performance between the models were not significant (P > 0.05). In conclusion, this study demonstrates that DADA2, Deblur, and de novo OTU clustering perform similarly in taxonomic assignment and lead to similar conclusions in the CRC cohort.
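The AUC values reported above have a rank-based (Mann-Whitney) interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below is a minimal O(n·m) illustration of that definition with hypothetical scores, not the authors' evaluation code.

```python
def roc_auc(scores, labels):
    """AUC as P(score_pos > score_neg) + 0.5 * P(tie), over all pos/neg pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores (1 = CRC, 0 = healthy in this toy example).
scores = [0.9, 0.35, 0.4, 0.3, 0.6]
labels = [1, 1, 0, 0, 1]
auc = roc_auc(scores, labels)
print(round(auc, 4))  # prints: 0.8333 (5 of 6 pairs correctly ordered)
```

Comparing two such AUCs for significance, as the abstract does, needs an additional test (for example DeLong's method), which this sketch does not cover.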

Parente E, Zotta T, Giavalisco M, Ricciardi A. Metataxonomic insights in the distribution of Lactobacillaceae in foods and food environments. Int J Food Microbiol 2023;391-393:110124. [PMID: 36841075 DOI: 10.1016/j.ijfoodmicro.2023.110124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2022] [Revised: 01/09/2023] [Accepted: 02/05/2023] [Indexed: 02/23/2023]

Abstract

Members of the family Lactobacillaceae, which now includes species formerly belonging to the genera Lactobacillus and Pediococcus as well as members of the former family Leuconostocaceae, are of foremost importance in food fermentation and spoilage, as components of animal and human microbiota, and as potentially pathogenic microorganisms. Knowledge of the ecological distribution of a given species and genus is important, among other things, for inclusion in lists of microorganisms with Qualified Presumption of Safety status or of beneficial use. The objective of this work is to use the data in the FoodMicrobionet database to obtain quantitative insights (in terms of both abundance and prevalence) into the distribution of these bacteria in foods and food environments. We first explored the reliability of taxonomic assignments using the SILVA v138.1 reference database with full-length and partial 16S rRNA gene sequences of type strains. Full-length 16S rRNA gene sequences allow reasonably good classification at the genus and species levels in phylogenetic trees, but shorter sequences (V1-V3, V3-V4, V4) perform much worse, with type strains of many species sharing identical V4 and V3-V4 sequences. Genus-level taxonomic assignment of 16S rRNA gene sequences with the SILVA v138.1 reference database can be done for almost all genera of the family Lactobacillaceae with a high degree of confidence for full-length sequences, and with a satisfactory level of accuracy for the V1-V3 regions. Results for the V3-V4 and V4 regions are still acceptable but significantly worse. Species-level taxonomic assignment of V1-V3, V3-V4, and V4 sequences of members of the family Lactobacillaceae is hardly possible, and even for full-length sequences only 49.9% of the type strain sequences can be unambiguously assigned to species.
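The loss of species-level resolution in shorter regions described above can be illustrated by counting how many type strains have a sequence shared with no other species. This is a minimal sketch under stated assumptions: the species labels, sequences, and region coordinates are hypothetical, "unambiguous" is taken to mean "sequence unique to one species", and the study's own procedure (classification against SILVA v138.1) may define it differently.

```python
from collections import defaultdict

def unambiguous_fraction(type_strains):
    """type_strains: species -> sequence of its type strain (hypothetical input).

    A species counts as unambiguous if no other species has an identical
    sequence. Returns (fraction unambiguous, sorted unambiguous species).
    """
    species_by_seq = defaultdict(set)
    for species, seq in type_strains.items():
        species_by_seq[seq].add(species)
    unique = sorted(sp for sp, seq in type_strains.items()
                    if len(species_by_seq[seq]) == 1)
    return len(unique) / len(type_strains), unique

# Hypothetical "full-length" sequences: all four are distinct.
full = {
    "sp_A": "AAAACCCCGGGG",
    "sp_B": "TAAACCCCGGGT",
    "sp_C": "AAAACCCTGGGG",
    "sp_D": "AAAACCCCGGGA",
}
# A hypothetical sub-region (positions 1..10) drops the differing end bases.
region = {sp: seq[1:11] for sp, seq in full.items()}

frac_full, unique_full = unambiguous_fraction(full)
frac_region, unique_region = unambiguous_fraction(region)
print(frac_full, frac_region, unique_region)  # prints: 1.0 0.25 ['sp_C']
```

Species that differ only outside the amplified region collapse onto one identical sequence, so no classifier working on that region alone can separate them.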
We then used the FoodMicrobionet database to evaluate the prevalence and abundance of Lactobacillaceae in food samples and food-related environments. Generalist and specialist genera were clearly evident. The ecological distribution of several genera was confirmed, and insights into the distribution and potential origin of rare genera (Dellaglioa, Holzapfelia, Schleiferilactobacillus) were obtained. We also found that combining amplicon sequence variants (ASVs) from different studies is indeed possible but provides little additional information, even when strict criteria are used to filter sequences.
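Because ASVs are exact sequences, tables from different studies can in principle be pooled by using the sequence itself as the key, after which prevalence and abundance can be computed per ASV. The sketch below assumes prevalence is the fraction of samples with a nonzero count and abundance is the mean relative abundance over all samples (zeros included); FoodMicrobionet's actual definitions and filtering criteria may differ. The helper names, sample IDs, and sequences are hypothetical.

```python
def merge_studies(*studies):
    """Pool per-study tables of the form sample_id -> {asv_sequence: count}.

    Exact ASV sequences act as shared keys across studies; sample IDs are
    assumed to be globally unique (a duplicate raises an error).
    """
    merged = {}
    for study in studies:
        for sample, counts in study.items():
            if sample in merged:
                raise ValueError(f"duplicate sample id: {sample}")
            merged[sample] = dict(counts)
    return merged

def prevalence_and_abundance(table):
    """Return {asv: (prevalence, mean relative abundance)} (assumed definitions)."""
    totals = {s: sum(c.values()) for s, c in table.items()}
    features = sorted({f for counts in table.values() for f in counts})
    n = len(table)
    stats = {}
    for f in features:
        present = sum(1 for counts in table.values() if counts.get(f, 0) > 0)
        rel = sum(counts.get(f, 0) / totals[s]
                  for s, counts in table.items() if totals[s] > 0)
        stats[f] = (present / n, rel / n)
    return stats

# Two hypothetical studies with toy ASV "sequences".
study1 = {"s1": {"AAC": 8, "GGT": 2}, "s2": {"AAC": 5, "GGT": 5}}
study2 = {"s3": {"AAC": 10}, "s4": {"TTA": 4}}

stats = prevalence_and_abundance(merge_studies(study1, study2))
for asv, (prev, abund) in stats.items():
    print(asv, prev, round(abund, 3))
```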

Rajput D, Wang WJ, Chen CC. Evaluation of a decided sample size in machine learning applications. BMC Bioinformatics 2023;24:48. [PMID: 36788550 PMCID: PMC9926644 DOI: 10.1186/s12859-023-05156-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 10/13/2022] [Accepted: 01/23/2023] [Indexed: 02/16/2023]

Abstract

BACKGROUND

An appropriate sample size is essential for obtaining precise and reliable study outcomes. In machine learning (ML), studies with inadequate samples suffer from overfitting and have a lower probability of detecting true effects, while increasing the sample size improves prediction accuracy, although beyond a certain size the improvement may no longer be significant. Existing statistical approaches that determine sample size from the standardized mean difference, effect size, and statistical power are potentially biased owing to miscalculations or missing experimental details. This study aims to design criteria for evaluating sample size in ML studies. We examined the average and grand effect sizes and the performance of five ML methods on simulated datasets and three real datasets to derive the criteria. We systematically increased the sample size by random sampling, starting from 16, and examined its effect on classifier performance and on both effect sizes. Tenfold cross-validation was used to quantify accuracy.
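The procedure described above (random subsamples of increasing size starting at 16, an effect-size estimate, and tenfold cross-validated accuracy) can be sketched with the standard library alone. Several pieces are assumptions rather than the authors' method: the data are simulated Gaussian features, a single nearest-centroid classifier stands in for the five ML methods, the "average effect size" is taken as the mean absolute Cohen's d across features, the "grand effect size" is omitted because the abstract does not define it, and sizes double at each step. All function names are hypothetical.

```python
import random
import statistics

def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * statistics.variance(x)
              + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / pooled ** 0.5

def average_effect_size(X, y):
    """Assumed definition: mean |Cohen's d| over all features."""
    ds = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 1]
        b = [row[j] for row, lab in zip(X, y) if lab == 0]
        ds.append(abs(cohens_d(a, b)))
    return statistics.mean(ds)

def fit_centroids(X, y):
    """Nearest-centroid 'training': per-class mean of each feature."""
    return {lab: [statistics.mean(col) for col in zip(*[r for r, l in zip(X, y) if l == lab])]
            for lab in set(y)}

def predict(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])))

def kfold_accuracy(X, y, k=10, seed=0):
    """k-fold cross-validated accuracy of the nearest-centroid classifier."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in fold)
    return correct / len(X)

def simulate(n, shift, n_features, rng):
    """Two balanced classes; class 1 is shifted by `shift` in every feature."""
    X = [[rng.gauss(shift * (i % 2), 1.0) for _ in range(n_features)] for i in range(n)]
    return X, [i % 2 for i in range(n)]

def subsample(X, y, n, rng):
    """Class-balanced random subset of size n (n assumed even)."""
    pos = [i for i, lab in enumerate(y) if lab == 1]
    neg = [i for i, lab in enumerate(y) if lab == 0]
    chosen = rng.sample(pos, n // 2) + rng.sample(neg, n // 2)
    return [X[i] for i in chosen], [y[i] for i in chosen]

rng = random.Random(42)
pool_X, pool_y = simulate(1024, shift=1.0, n_features=5, rng=rng)
results = []
n = 16
while n <= 512:
    Xs, ys = subsample(pool_X, pool_y, n, rng)
    results.append((n, average_effect_size(Xs, ys), kfold_accuracy(Xs, ys)))
    n *= 2
for n, es, acc in results:
    print(f"n={n:4d}  avg|d|={es:.2f}  10-fold accuracy={acc:.2f}")
```

With a well-separated simulated dataset, the effect-size estimates should stabilize and the accuracy should level off as n grows, which is the pattern the study uses to judge whether a sample size is adequate.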

RESULTS

The results demonstrate that when a dataset discriminates well between two classes, the effect sizes and classification accuracies increase, and the variances in effect sizes shrink, as the sample size grows. By contrast, indeterminate datasets had poor effect sizes and classification accuracies that did not improve with larger sample sizes, in both simulated and real datasets. A good dataset exhibited a significant difference in average and grand effect sizes. Based on these findings, we derived two criteria for assessing a decided sample size by combining the effect size and the ML accuracy. A sample size is considered suitable when it yields an appropriate effect size (≥ 0.5) and ML accuracy (≥ 80%). Beyond an appropriate sample size, additional samples bring little benefit because they do not significantly change the effect size or accuracy, so stopping at that size gives a good cost-benefit ratio.
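The two criteria stated above (effect size ≥ 0.5 and ML accuracy ≥ 80%) can be expressed as a simple check applied to a series of increasing sample sizes. The function names and the example numbers below are hypothetical illustrations, not results from the study, and the check ignores the study's additional observation about diminishing returns beyond the adequate size.

```python
def sample_size_adequate(effect_size, accuracy, min_effect=0.5, min_accuracy=0.80):
    """Both criteria from the abstract: effect size >= 0.5 and accuracy >= 80%."""
    return effect_size >= min_effect and accuracy >= min_accuracy

def first_adequate_size(results, min_effect=0.5, min_accuracy=0.80):
    """results: (n, effect_size, accuracy) tuples sorted by n (hypothetical format).

    Returns the smallest n meeting both criteria, or None if none does.
    """
    for n, es, acc in results:
        if sample_size_adequate(es, acc, min_effect, min_accuracy):
            return n
    return None

# Made-up illustration values, not data from the study.
curve = [(16, 0.42, 0.69), (32, 0.55, 0.78), (64, 0.61, 0.83), (128, 0.63, 0.84)]
print(first_adequate_size(curve))  # prints: 64
```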

CONCLUSION

We believe that these practical criteria can be used as a reference for both the authors and editors to evaluate whether the selected sample size is adequate for a study.

Ultsch A, Lötsch J. Robust Classification Using Posterior Probability Threshold Computation Followed by Voronoi Cell Based Class Assignment Circumventing Pitfalls of Bayesian Analysis of Biomedical Data. Int J Mol Sci 2022;23:ijms232214081. [PMID: 36430580 PMCID: PMC9693220 DOI: 10.3390/ijms232214081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Revised: 11/09/2022] [Accepted: 11/11/2022] [Indexed: 11/17/2022]

Sorbie A, Delgado Jiménez R, Benakis C. Increasing transparency and reproducibility in stroke-microbiota research: A toolbox for microbiota analysis. iScience 2022;25:103998. [PMID: 35310944 PMCID: PMC8931359 DOI: 10.1016/j.isci.2022.103998] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/11/2021] [Revised: 01/18/2022] [Accepted: 02/24/2022] [Indexed: 12/29/2022]

Busa J, Polaka I. Variability of Classification Results in Data with High Dimensionality and Small Sample Size. ITMS 2021. [DOI: 10.7250/itms-2021-0007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022]