1. Yasmin A, Rahman MS, Kador SM, Ahmed MM, Moon MEK, Akhter H, Sultana M, Begum A. Metagenomic insights into microbial diversity and potential pathogenic transmission in poultry farm environments of Bangladesh. BMC Microbiol 2025; 25:318. [PMID: 40405096 PMCID: PMC12096644 DOI: 10.1186/s12866-025-03970-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2024] [Accepted: 04/16/2025] [Indexed: 05/24/2025] Open
Abstract
The microbiome plays a critical role in poultry health and productivity, influencing growth, immunity, and the overall farm ecosystem. This study investigated microbial diversity, antibiotic resistance pathways, and functional potential across various components of poultry ecosystems (cloacal swabs, droppings, feed, hand swabs, soil, and water) in different districts of Bangladesh. Using 16S rRNA gene amplicon sequencing, we identified 2,745 Operational Taxonomic Units (OTUs) and analyzed microbial richness, community structure, and functional pathways. Alpha diversity metrics revealed that droppings exhibited the highest microbial richness (726 OTUs in Noakhali), while feed samples showed the lowest diversity (211 OTUs). Beta diversity analysis indicated significant differences in microbial composition across sample sources, with PERMANOVA confirming that sample origin accounted for 51.45% of the variability (p < 0.001). Proteobacteria dominated the microbial communities (48.36%), followed by Firmicutes (19.83%) and Cyanobacteria (12.02%). Key genera of concern, such as Enterobacter (26.62% in hand swabs), Acinetobacter (30.87% in cloacal swabs), and Shigella (22.89% in cloacal swabs), were identified, highlighting potential contamination and zoonotic risks. Conversely, beneficial genera like Lactobacillus (36.89% in feed) and Enterococcus (10.78% in droppings) were prevalent, suggesting roles in gut health and nutrient cycling. Functional pathway analysis (KEGG) revealed that carbohydrate and amino acid metabolism were highly active in droppings and feed, reflecting nutrient utilization. Antimicrobial resistance (AMR) pathways, such as 23S rRNA-methyltransferase and multidrug efflux pumps, were widespread, with pathogenic genera (Enterobacter, Acinetobacter, Shigella, Pseudomonas) showing strong positive correlations with AMR pathways.
These findings underscore the influence of environmental factors on microbial diversity and functional potential in poultry farming. The study highlights the need for improved management practices and biosecurity measures to mitigate risks associated with microbial pathogens and antimicrobial resistance, ultimately supporting healthier and more sustainable poultry production in Bangladesh.
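The richness and relative-abundance figures quoted in this abstract can be illustrated with a minimal sketch. The abstract does not say which tools or formulas the authors used, so the function names, the toy counts, and the inclusion of the Shannon index (a commonly reported alpha diversity metric) are assumptions for illustration only.

```python
import math

def observed_richness(counts):
    """Observed richness: number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)

def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def relative_abundance(counts_by_taxon):
    """Convert raw read counts per taxon into percentages of the sample total."""
    total = sum(counts_by_taxon.values())
    return {taxon: 100.0 * c / total for taxon, c in counts_by_taxon.items()}

# Hypothetical read counts for one sample (not data from the study).
sample = {"Lactobacillus": 30, "Enterococcus": 10, "Acinetobacter": 60}
print(observed_richness(sample.values()))
print(round(shannon_index(list(sample.values())), 3))
print(relative_abundance(sample))
```

Richness counts taxa only, whereas the Shannon index also reflects how evenly reads are spread across them, which is why the two can rank samples differently.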
Affiliation(s)
- Afroja Yasmin
  - Department of Microbiology, University of Dhaka, Dhaka, 1000, Bangladesh
  - Present Department of Pathobiology, Gazipur Agricultural University, Gazipur, 1706, Bangladesh
- M Shaminur Rahman
  - Department of Microbiology, Jashore University of Science and Technology, Jashore, Bangladesh
- S M Kador
  - Department of Microbiology, Jashore University of Science and Technology, Jashore, Bangladesh
- Md Mustak Ahmed
  - Department of Microbiology, Jashore University of Science and Technology, Jashore, Bangladesh
- Md Eashanul Karim Moon
  - Department of Microbiology, Jashore University of Science and Technology, Jashore, Bangladesh
- Humaira Akhter
  - Department of Microbiology, University of Dhaka, Dhaka, 1000, Bangladesh
- Munawar Sultana
  - Department of Microbiology, University of Dhaka, Dhaka, 1000, Bangladesh
- Anowara Begum
  - Department of Microbiology, University of Dhaka, Dhaka, 1000, Bangladesh
2. Shibata T, Ohno A, Murakami I, Takakura M, Sasagawa T, Imanishi T, Mikami M. Effect of chemical peeling therapy for treatment of cervical intraepithelial neoplasia on cervicovaginal microbiota. J Appl Microbiol 2025; 136:lxaf080. [PMID: 40312781 DOI: 10.1093/jambio/lxaf080] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/22/2024] [Revised: 03/07/2025] [Accepted: 04/30/2025] [Indexed: 05/03/2025]
Abstract
AIM: The cervicovaginal microbiome is associated with progression and regression of cervical intraepithelial neoplasia (CIN). Chemical peeling, an investigational treatment that shows promise as a non-invasive treatment for CIN, exfoliates the human papillomavirus (HPV)-infected cervical epithelium; subsequent alterations to the cervicovaginal microbiome may be a key mechanism of its effect. METHODS AND RESULTS: Using a retrospective paired-sample analysis, we investigated the cervicovaginal microbiota of 28 CIN patients, comparing pre- and post-treatment samples from the same individuals who achieved high-risk HPV clearance. We used 16S ribosomal RNA gene sequencing to detect microbial markers in liquid-based cytology solution from cervical scrapings. Differential abundance analysis showed significant enrichment of Lactobacillus hominis after chemical peeling. Alterations in cervicovaginal bacteria after chemical peeling predicted multiple biochemical changes such as increased selenocompound and thiamine metabolism. CONCLUSIONS: Chemical peeling may modulate microbiota and bacteria-derived metabolites, thereby contributing to an additional therapeutic mechanism against CIN.
Affiliation(s)
- Takeo Shibata
  - Department of Obstetrics and Gynecology, Kanazawa Medical University, 1-1 Daigaku, Uchinada, Ishikawa 920-0293, Japan
- Ayumu Ohno
  - Department of Molecular Life Science, Tokai University School of Medicine, 143, Shimokasuya, Isehara, Kanagawa 259-1193, Japan
  - Graduate School of Medicine, Dentistry and Pharmaceutical Sciences, Okayama University, 1-1-1 Tsushima-naka, Kita-ku, Okayama, Okayama 700-8530, Japan
  - Collaborative Research Centre of Okayama University for Infectious Diseases in India at, ICMR-NICED, 57 Dr. SC Banerjee Road, Beliaghata, Kolkata 700010, India
- Isao Murakami
  - Department of Obstetrics and Gynecology, Toho University Ohashi Medical Center, 2-22-36, Ohashi, Meguro, Tokyo 153-8515, Japan
- Masahiro Takakura
  - Department of Obstetrics and Gynecology, Kanazawa Medical University, 1-1 Daigaku, Uchinada, Ishikawa 920-0293, Japan
- Toshiyuki Sasagawa
  - Department of Obstetrics and Gynecology, Kanazawa Medical University, 1-1 Daigaku, Uchinada, Ishikawa 920-0293, Japan
- Tadashi Imanishi
  - Department of Molecular Life Science, Tokai University School of Medicine, 143, Shimokasuya, Isehara, Kanagawa 259-1193, Japan
- Mikio Mikami
  - Department of Obstetrics and Gynecology, Tokai University School of Medicine, 143, Shimokasuya, Isehara, Kanagawa 259-1193, Japan
3. Bayer PE, Bennett A, Nester G, Corrigan S, Raes EJ, Cooper M, Ayad ME, McVey P, Kardailsky A, Pearce J, Fraser MW, Goncalves P, Burnell S, Rauschert S. A Comprehensive Evaluation of Taxonomic Classifiers in Marine Vertebrate eDNA Studies. Mol Ecol Resour 2025:e14107. [PMID: 40243260 DOI: 10.1111/1755-0998.14107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/20/2024] [Revised: 12/05/2024] [Accepted: 03/11/2025] [Indexed: 04/18/2025]
Abstract
Environmental DNA (eDNA) metabarcoding is a widely used tool for surveying marine vertebrate biodiversity. To this end, many computational tools have been released and a plethora of bioinformatic approaches are used for eDNA-based community composition analysis. Simulation studies and careful evaluation of taxonomic classifiers are essential to establish reliable benchmarks to improve the accuracy and reproducibility of eDNA-based findings. Here we present a comprehensive evaluation of nine taxonomic classifiers exploring three widely used mitochondrial markers (12S rDNA, 16S rDNA and COI) in Australian marine vertebrates. Curated reference databases and exclusion database tests were used to simulate diverse species compositions, including three positive control and two negative control datasets. Using these simulated datasets ranging from 36 to 302 marker genes, we were able to identify between 19% and 89% of marine vertebrate species using mitochondrial markers. We show that MMSeqs2 and Metabuli generally outperform BLAST with 10% and 11% higher F1 scores for 12S and 16S rDNA markers, respectively, and that Naive Bayes Classifiers such as Mothur outperform sequence-based classifiers except MMSeqs2 for COI markers by 11%. Database exclusion tests reveal that MMSeqs2 and BLAST are less susceptible to false positives compared to Kraken2 with default parameters. Based on these findings, we recommend that MMSeqs2 is used for taxonomic classification of marine vertebrates given its ability to improve species-level assignments while reducing the number of false positives. Our work contributes to the establishment of best practices in eDNA-based biodiversity analysis to ultimately increase the reliability of this monitoring tool in the context of marine vertebrate conservation.
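As a rough illustration of the F1 scores used to compare classifiers, the sketch below scores one classifier's species list against a known (simulated) community. It assumes a species-level presence/absence definition of true and false positives; the abstract does not state the exact scoring unit the authors used, and the species names are placeholders.

```python
def precision_recall_f1(predicted, truth):
    """Score a set of predicted species against the set of species truly present."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)   # species correctly reported
    fp = len(predicted - truth)   # species reported but absent (false positives)
    fn = len(truth - predicted)   # species present but missed (false negatives)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Placeholder species labels, not taken from the study's datasets.
truth = {"species_a", "species_b", "species_d", "species_e"}
predicted = {"species_a", "species_b", "species_c"}
print(precision_recall_f1(predicted, truth))
```

Because F1 is the harmonic mean of precision and recall, a classifier that reports many false positives is penalised even if it recovers most of the true species, which matches the abstract's emphasis on false-positive susceptibility.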
Affiliation(s)
- Philipp E Bayer
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
- Adam Bennett
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
- Georgia Nester
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
  - Minderoo-UWA Deep-Sea Research Centre, School of Biological Sciences and Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
- Shannon Corrigan
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
- Eric J Raes
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
- Madalyn Cooper
  - Minderoo Foundation, Perth, Western Australia, Australia
- Marcelle E Ayad
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
- Philip McVey
  - Minderoo Foundation, Perth, Western Australia, Australia
- Anya Kardailsky
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
- Jessica Pearce
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
- Matthew W Fraser
  - Minderoo Foundation, Perth, Western Australia, Australia
  - School of Biological Sciences, The University of Western Australia, Crawley, Western Australia, Australia
- Priscila Goncalves
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
- Stephen Burnell
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
- Sebastian Rauschert
  - Minderoo Foundation, Perth, Western Australia, Australia
  - Minderoo OceanOmics Centre at UWA, Oceans Institute, The University of Western Australia, Crawley, Western Australia, Australia
4. Bokulich NA. Integrating sequence composition information into microbial diversity analyses with k-mer frequency counting. mSystems 2025; 10:e0155024. [PMID: 39976436 PMCID: PMC11915819 DOI: 10.1128/msystems.01550-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2024] [Accepted: 01/23/2025] [Indexed: 02/21/2025] Open
Abstract
k-mer frequency information in biological sequences is used for a wide range of applications, including taxonomy classification, sequence similarity estimation, and supervised learning. However, in spite of its widespread utility, k-mer counting has been largely neglected for diversity estimation. This work examines the application of k-mer counting for alpha and beta diversity as well as supervised classification from microbiome marker-gene sequencing data sets (16S rRNA gene and full-length fungal internal transcribed spacer [ITS] sequences). Results demonstrate a close correspondence with phylogenetically aware diversity metrics, and advantages for using k-mer-based metrics for measuring microbial biodiversity in microbiome sequencing surveys. k-mer counting appears to be a suitable and efficient strategy for feature processing prior to diversity estimation as well as supervised learning in microbiome surveys. This allows the incorporation of subsequence-level information into diversity estimation without the computational cost of pairwise sequence alignment. k-mer counting is proposed as a complementary approach for feature processing prior to diversity estimation and supervised learning analyses, enabling large-scale reference-free profiling of microbiomes in biogeography, ecology, and biomedical data. A method for k-mer counting from marker-gene sequence data is implemented in the QIIME 2 plugin q2-kmerizer (https://github.com/bokulich-lab/q2-kmerizer). IMPORTANCE k-mers are all of the subsequences of length k that comprise a sequence. Comparing the frequency of k-mers in DNA sequences yields valuable information about the composition of these sequences and their similarity. This work demonstrates that k-mer frequencies from marker-gene sequence surveys can be used to inform diversity estimates and machine learning predictions that incorporate sequence composition information. 
Alpha and beta diversity estimates based on k-mer frequencies closely correspond to phylogenetically aware diversity metrics, suggesting that k-mer-based diversity estimates are useful proxy measurements especially when reliable phylogenies are not available, as is often the case for some DNA sequence targets such as for internal transcribed spacer sequences.
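The definition of a k-mer given above can be made concrete with a short sketch that counts k-mers in two sequences and compares their profiles. This is not the q2-kmerizer implementation; the function names are hypothetical, and Bray-Curtis dissimilarity is used here only as one familiar example of a beta diversity measure that can be applied to k-mer counts.

```python
from collections import Counter

def kmer_counts(seq, k):
    """Count every overlapping subsequence of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two k-mer count profiles (0 = identical)."""
    keys = set(a) | set(b)
    # Counter returns 0 for k-mers absent from a profile.
    numerator = sum(abs(a[key] - b[key]) for key in keys)
    denominator = sum(a.values()) + sum(b.values())
    return numerator / denominator if denominator else 0.0

# Toy sequences for illustration only.
profile_1 = kmer_counts("ACGTACGT", 4)
profile_2 = kmer_counts("ACGTTCGT", 4)
print(profile_1)
print(bray_curtis(profile_1, profile_2))
```

No alignment or phylogeny is needed to compute these profiles, which is the property the abstract highlights for targets where reliable phylogenies are unavailable.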
5. Xu CCY, Fugère V, Barbosa da Costa N, Beisner BE, Bell G, Cristescu ME, Fussmann GF, Gonzalez A, Shapiro BJ, Barrett RDH. Pre-exposure to stress reduces loss of community and genetic diversity following severe environmental disturbance. Curr Biol 2025; 35:1061-1073.e4. [PMID: 39933522 DOI: 10.1016/j.cub.2025.01.037] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/04/2024] [Revised: 12/10/2024] [Accepted: 01/17/2025] [Indexed: 02/13/2025]
Abstract
Environmental stress caused by anthropogenic impacts is increasing worldwide. Understanding the ecological and evolutionary consequences for biodiversity will be crucial for our ability to respond effectively. Historical exposure to environmental stress is expected to select for resistant species, shifting community composition toward more stress-tolerant taxa. Concurrent with this species sorting process, genotypes within resistant taxa that have the highest relative fitness under severe stress are expected to increase in frequency, leading to evolutionary adaptation. However, empirical demonstrations of these dual ecological and evolutionary processes in natural communities are rare. Here, we provide evidence for simultaneous species sorting and evolutionary adaptation across multiple species within a natural freshwater bacterial community. Using a two-phase stressor experimental design (acidification pre-exposure followed by severe acidification) in aquatic mesocosms, we show that pre-exposed communities were more resistant than naive communities to taxonomic loss when faced with severe acid stress. However, after sustained severe acidification, taxonomic richness of both pre-exposed and naive communities eventually converged. All communities experiencing severe acidification became dominated by an acidophilic bacterium, Acidiphilium rubrum, but this species retained greater genetic diversity and followed distinct evolutionary trajectories in pre-exposed relative to naive communities. These patterns were shared across other acidophilic species, providing repeated evidence for the impact of pre-exposure on evolutionary outcomes despite the convergence of community profiles. Our results underscore the need to consider both ecological and evolutionary processes to accurately predict the responses of natural communities to environmental change.
Affiliation(s)
- Charles C Y Xu
  - Department of Biology, McGill University Montreal, Montreal, QC H3A 1B1, Canada
- Vincent Fugère
  - Department of Biology, McGill University Montreal, Montreal, QC H3A 1B1, Canada; Groupe de Recherche Interuniversitaire en Limnologie (GRIL), Montreal, QC H3C 3J7, Canada; Department of Biological Sciences, University of Québec at Montreal, Montreal, QC H2V 0B3, Canada; Département des sciences de l'environnement, Université du Québec à Trois-Rivières, Trois-Rivières, QC G8Z 4M3, Canada
- Naíla Barbosa da Costa
  - Groupe de Recherche Interuniversitaire en Limnologie (GRIL), Montreal, QC H3C 3J7, Canada; Département des Sciences Biologiques, Université de Montréal, Montreal, QC H2V 0B3, Canada
- Beatrix E Beisner
  - Groupe de Recherche Interuniversitaire en Limnologie (GRIL), Montreal, QC H3C 3J7, Canada; Department of Biological Sciences, University of Québec at Montreal, Montreal, QC H2V 0B3, Canada
- Graham Bell
  - Department of Biology, McGill University Montreal, Montreal, QC H3A 1B1, Canada
- Melania E Cristescu
  - Department of Biology, McGill University Montreal, Montreal, QC H3A 1B1, Canada; Groupe de Recherche Interuniversitaire en Limnologie (GRIL), Montreal, QC H3C 3J7, Canada
- Gregor F Fussmann
  - Department of Biology, McGill University Montreal, Montreal, QC H3A 1B1, Canada; Groupe de Recherche Interuniversitaire en Limnologie (GRIL), Montreal, QC H3C 3J7, Canada
- Andrew Gonzalez
  - Department of Biology, McGill University Montreal, Montreal, QC H3A 1B1, Canada
- B Jesse Shapiro
  - Groupe de Recherche Interuniversitaire en Limnologie (GRIL), Montreal, QC H3C 3J7, Canada; Department of Microbiology and Immunology, McGill University Montreal, Montreal, QC H3A 2B4, Canada; McGill Genome Centre, McGill University Montreal, Montreal, QC H3A 0G1, Canada
- Rowan D H Barrett
  - Department of Biology, McGill University Montreal, Montreal, QC H3A 1B1, Canada
6. Asim MN, Ibrahim MA, Asif T, Dengel A. RNA sequence analysis landscape: A comprehensive review of task types, databases, datasets, word embedding methods, and language models. Heliyon 2025; 11:e41488. [PMID: 39897847 PMCID: PMC11783440 DOI: 10.1016/j.heliyon.2024.e41488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Revised: 12/23/2024] [Accepted: 12/24/2024] [Indexed: 02/04/2025] Open
Abstract
Deciphering information of RNA sequences reveals their diverse roles in living organisms, including gene regulation and protein synthesis. Aberrations in RNA sequence such as dysregulation and mutations can drive a diverse spectrum of diseases including cancers, genetic disorders, and neurodegenerative conditions. Furthermore, researchers are harnessing RNA's therapeutic potential for transforming traditional treatment paradigms into personalized therapies through the development of RNA-based drugs and gene therapies. To gain insights into biological functions, detect diseases at early stages, and develop potent therapeutics, researchers are performing diverse types of RNA sequence analysis tasks. RNA sequence analysis through conventional wet-lab methods is expensive, time-consuming and error-prone. To enable large-scale RNA sequence analysis, empowerment of wet-lab experimental methods with Artificial Intelligence (AI) applications requires scientists to have a comprehensive knowledge of both DNA and AI fields. While molecular biologists encounter challenges in understanding AI methods, computer scientists often lack basic foundations of RNA sequence analysis tasks. Considering the absence of a comprehensive literature that bridges this research gap and promotes the development of AI-driven RNA sequence analysis applications, the contributions of this manuscript are manifold: It equips AI researchers with biological foundations of 47 distinct RNA sequence analysis tasks. It sets a stage for development of benchmark datasets related to 47 distinct RNA sequence analysis tasks by facilitating cruxes of 64 different biological databases. It presents word embedding and language model applications across 47 distinct RNA sequence analysis tasks.
It streamlines the development of new predictors by providing a comprehensive survey of 58 word embeddings and 70 language models based predictive pipelines performance values as well as top performing traditional sequence encoding based predictors and their performances across 47 RNA sequence analysis tasks.
Affiliation(s)
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Muhammad Ali Ibrahim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Tayyaba Asif
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Andreas Dengel
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
7. Duan H(N), Hearne G, Polikar R, Rosen GL. The Naïve Bayes classifier++ for metagenomic taxonomic classification-query evaluation. Bioinformatics 2024; 41:btae743. [PMID: 39700412 PMCID: PMC11729721 DOI: 10.1093/bioinformatics/btae743] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 06/14/2024] [Revised: 11/26/2024] [Accepted: 12/16/2024] [Indexed: 12/21/2024] Open
Abstract
MOTIVATION: This study examines the query performance of the NBC++ (Incremental Naive Bayes Classifier) program for variations in canonicality, k-mer size, databases, and input sample data size. We demonstrate that both NBC++ and Kraken2 are influenced by database depth, with macro measures improving as depth increases. However, fully capturing the diversity of life, especially viruses, remains a challenge. RESULTS: NBC++ can competitively profile the superkingdom content of metagenomic samples using a small training database. NBC++ spends less time training and can use a fraction of the memory that Kraken2 requires, but at the cost of a long querying time. Major NBC++ enhancements include accommodating canonical k-mer storage (leading to significant storage savings) and adaptable and optimized memory allocation that accelerates query analysis and enables the software to be run on nearly any system. Additionally, the output now includes log-likelihood values for each training genome, providing users with valuable confidence information. AVAILABILITY AND IMPLEMENTATION: Source code and Dockerfile are available at http://github.com/EESI/Naive_Bayes.
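To show why canonical k-mer storage saves space and what a per-genome log-likelihood looks like, here is a minimal sketch. It is not the NBC++ code: it assumes the common convention that a canonical k-mer is the lexicographically smaller of a k-mer and its reverse complement, and it uses a simple multinomial model with add-one smoothing; all names and sequences are hypothetical.

```python
import math
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)

def canonical_counts(seq, k):
    """Count canonical k-mers, so both strands share a single table entry."""
    seq = seq.upper()
    return Counter(canonical(seq[i:i + k]) for i in range(len(seq) - k + 1))

def log_likelihood(query, genome, k, pseudocount=1.0):
    """Log-likelihood of the query's k-mers under a smoothed multinomial model of one genome."""
    genome_counts = canonical_counts(genome, k)
    query_counts = canonical_counts(query, k)
    # Number of distinct canonical k-mers over ACGT (even k allows self-complementary k-mers).
    vocabulary = (4 ** k + (4 ** (k // 2) if k % 2 == 0 else 0)) // 2
    total = sum(genome_counts.values()) + pseudocount * vocabulary
    return sum(
        n * math.log((genome_counts[kmer] + pseudocount) / total)
        for kmer, n in query_counts.items()
    )

# Toy training "genomes"; the query should score higher against the one it resembles.
genome_a = "ACGTACGTACGTACGT"
genome_b = "GGGGGGGGGGGGGGGG"
query = "ACGTACGT"
print(log_likelihood(query, genome_a, 4), log_likelihood(query, genome_b, 4))
```

Merging each k-mer with its reverse complement roughly halves the number of table entries, and comparing the log-likelihoods across training genomes gives the kind of relative confidence information the abstract describes.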
Affiliation(s)
- Haozhe (Neil) Duan
  - Ecological and Evolutionary Signal Processing and Informatics (EESI) Laboratory, Drexel University, Philadelphia, PA 19104, United States
- Gavin Hearne
  - Ecological and Evolutionary Signal Processing and Informatics (EESI) Laboratory, Drexel University, Philadelphia, PA 19104, United States
- Robi Polikar
  - Signal Processing and Pattern Recognition Laboratory, Electrical and Computer Engineering, Rowan University, Glassboro, NJ 08018, United States
- Gail L Rosen
  - Ecological and Evolutionary Signal Processing and Informatics (EESI) Laboratory, Drexel University, Philadelphia, PA 19104, United States
8. Guo F, Hu H, Peng H, Liu J, Tang C, Zhang H. Research progress on machine algorithm prediction of liver cancer prognosis after intervention therapy. Am J Cancer Res 2024; 14:4580-4596. [PMID: 39417194 PMCID: PMC11477842 DOI: 10.62347/beao1926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2024] [Accepted: 09/13/2024] [Indexed: 10/19/2024] Open
Abstract
The treatment for liver cancer has transitioned from traditional surgical resection to interventional therapies, which have become increasingly popular among patients due to their minimally invasive nature and significant local efficacy. However, with advancements in treatment technologies, accurately assessing patient response and predicting long-term survival has become a crucial research topic. Over the past decade, machine algorithms have made remarkable progress in the medical field, particularly in hepatology and prognosis studies of hepatocellular carcinoma (HCC). Machine algorithms, including deep learning and machine learning, can identify prognostic patterns and trends by analyzing vast amounts of clinical data. Despite significant advancements, several issues remain unresolved in the prognosis prediction of liver cancer using machine algorithms. Key challenges and main controversies include effectively integrating multi-source clinical data to improve prediction accuracy, addressing data privacy and ethical concerns, and enhancing the transparency and interpretability of machine algorithm decision-making processes. This paper aims to systematically review and analyze the current applications and potential of machine algorithms in predicting the prognosis of patients undergoing interventional therapy for liver cancer, providing theoretical and empirical support for future research and clinical practice.
Affiliation(s)
- Feng Guo
  - Department of Interventional Diagnosis and Treatment, Yongzhou Central Hospital, Yongzhou Clinical College, University of South China, Yongzhou 425000, Hunan, China
- Hao Hu
  - Department of Gynecologic Oncology, Hubei Cancer Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430079, Hubei, China
- Hao Peng
  - Department of Abdominal Oncology, The Central Hospital of Enshi Tujia and Miao Autonomous Prefecture, Enshi 445000, Hubei, China
- Jia Liu
  - Department of Oncology, The First People’s Hospital of Changde City, Changde 415003, Hunan, China
- Chengbo Tang
  - Department of Interventional Diagnosis and Treatment, Yongzhou Central Hospital, Yongzhou Clinical College, University of South China, Yongzhou 425000, Hunan, China
- Hao Zhang
  - Department of Interventional Vascular Surgery, First Affiliated Hospital of Hunan Normal University (Hunan Provincial People’s Hospital), Changsha 410000, Hunan, China
9. Fautt C, Couradeau E, Hockett KL. Naïve Bayes Classifiers and accompanying dataset for Pseudomonas syringae isolate characterization. Sci Data 2024; 11:178. [PMID: 38326362 PMCID: PMC10850129 DOI: 10.1038/s41597-024-03003-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Accepted: 01/26/2024] [Indexed: 02/09/2024] Open
Abstract
The Pseudomonas syringae species complex (PSSC) is a diverse group of plant pathogens with a collective host range encompassing almost every food crop grown today. As a threat to global food security, rapid detection and characterization of epidemic and emerging pathogenic lineages is essential. However, phylogenetic identification is often complicated by an unclarified and ever-changing taxonomy, making practical use of available databases and the proper training of classifiers difficult. As such, while amplicon sequencing is a common method for routine identification of PSSC isolates, there is no efficient method for accurate classification based on this data. Here we present a suite of five Naïve bayes classifiers for PCR primer sets widely used for PSSC identification, trained on in-silico amplicon data from 2,161 published PSSC genomes using the life identification number (LIN) hierarchical clustering algorithm in place of traditional Linnaean taxonomy. Additionally, we include a dataset for translating classification results back into traditional taxonomic nomenclature (i.e. species, phylogroup, pathovar), and for predicting virulence factor repertoires.
Affiliation(s)
- Chad Fautt
  - Department of Plant Pathology and Environmental Microbiology, Pennsylvania State University, University Park, Pennsylvania, USA.
  - Department of Ecosystem Science and Management, Pennsylvania State University, University Park, Pennsylvania, USA.
  - Intercollege Graduate Degree Program in Ecology, Pennsylvania State University, University Park, Pennsylvania, USA.
- Estelle Couradeau
  - Department of Ecosystem Science and Management, Pennsylvania State University, University Park, Pennsylvania, USA.
  - Intercollege Graduate Degree Program in Ecology, Pennsylvania State University, University Park, Pennsylvania, USA.
- Kevin L Hockett
  - Department of Plant Pathology and Environmental Microbiology, Pennsylvania State University, University Park, Pennsylvania, USA.
  - Intercollege Graduate Degree Program in Ecology, Pennsylvania State University, University Park, Pennsylvania, USA.
10. Xu CCY, Lemoine J, Albert A, Whirter ÉM, Barrett RDH. Community assembly of the human piercing microbiome. Proc Biol Sci 2023; 290:20231174. [PMID: 38018103 PMCID: PMC10685111 DOI: 10.1098/rspb.2023.1174] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2023] [Accepted: 11/03/2023] [Indexed: 11/30/2023] Open
Abstract
Predicting how biological communities respond to disturbance requires understanding the forces that govern their assembly. We propose using human skin piercings as a model system for studying community assembly after rapid environmental change. Local skin sterilization provides a 'clean slate' within the novel ecological niche created by the piercing. Stochastic assembly processes can dominate skin microbiomes due to the influence of environmental exposure on local dispersal, but deterministic processes might play a greater role within occluded skin piercings if piercing habitats impose strong selection pressures on colonizing species. Here we explore the human ear-piercing microbiome and demonstrate that community assembly is predominantly stochastic but becomes significantly more deterministic with time, producing increasingly diverse and ecologically complex communities. We also observed changes in two dominant and medically relevant antagonists (Cutibacterium acnes and Staphylococcus epidermidis), consistent with competitive exclusion induced by a transition from sebaceous to moist environments. By exploiting this common yet uniquely human practice, we show that skin piercings are not just culturally significant but also represent ecosystem engineering on the human body. The novel habitats and communities that skin piercings produce may provide general insights into biological responses to environmental disturbances with implications for both ecosystem and human health.
Affiliation(s)
- Charles C. Y. Xu
  - Redpath Museum, McGill University, 859 Sherbrooke Street West, Montreal, Quebec, Canada H3A 0C4
  - Department of Biology, McGill University, Montreal, Quebec, Canada H3A 1B1
- Juliette Lemoine
  - Redpath Museum, McGill University, 859 Sherbrooke Street West, Montreal, Quebec, Canada H3A 0C4
  - Department of Biology, McGill University, Montreal, Quebec, Canada H3A 1B1
  - Department of Ecology and Evolution, University of Lausanne, Lausanne 1015, Switzerland
- Avery Albert
  - Redpath Museum, McGill University, 859 Sherbrooke Street West, Montreal, Quebec, Canada H3A 0C4
  - Department of Natural Resource Sciences, McGill University, Sainte-Anne-de-Bellevue, Quebec, Canada H9X 3V9
  - Trottier Space Institute, McGill University, Montreal, Quebec, Canada H3A 2A7
- Rowan D. H. Barrett
  - Redpath Museum, McGill University, 859 Sherbrooke Street West, Montreal, Quebec, Canada H3A 0C4
  - Department of Biology, McGill University, Montreal, Quebec, Canada H3A 1B1
|
11
|
Liu G, Li T, Zhu X, Zhang X, Wang J. An independent evaluation in a CRC patient cohort of microbiome 16S rRNA sequence analysis methods: OTU clustering, DADA2, and Deblur. Front Microbiol 2023; 14:1178744. [PMID: 37560524 PMCID: PMC10408458 DOI: 10.3389/fmicb.2023.1178744] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 03/03/2023] [Accepted: 06/14/2023] [Indexed: 08/11/2023] Open
Abstract
16S rRNA is the universal gene of microbes, and it is often used as a target gene to obtain profiles of microbial communities via next-generation sequencing (NGS) technology. Traditionally, sequences are clustered into operational taxonomic units (OTUs) at a 97% threshold based on the taxonomic standard using 16S rRNA, and methods for the reduction of sequencing errors are bypassed, which may lead to false classification units. Several denoising algorithms have been published to solve this problem, such as DADA2 and Deblur, which can correct sequencing errors at single-nucleotide resolution by generating amplicon sequence variants (ASVs). As high-resolution ASVs are becoming more popular than OTUs and only one analysis method is usually selected in a particular study, there is a need for a thorough comparison of OTU clustering and denoising pipelines. In this study, three of the most widely used 16S rRNA methods (two denoising algorithms, DADA2 and Deblur, along with de novo OTU clustering) were thoroughly compared using 16S rRNA amplicon sequencing data generated from 358 clinical stool samples from the Colorectal Cancer (CRC) Screening Cohort. Our findings indicated that all approaches led to similar taxonomic profiles (with P > 0.05 in PERMANOVA and P < 0.001 in the Mantel test), although the number of ASVs/OTUs and the alpha-diversity indices varied considerably. Despite considerable differences in the disease-related markers identified, disease-related analysis showed that all methods could lead to similar conclusions. Fusobacterium, Streptococcus, Peptostreptococcus, Parvimonas, Gemella, and Haemophilus were identified by all three methods as enriched in the CRC group, while Roseburia, Faecalibacterium, Butyricicoccus, and Blautia were identified by all three methods as enriched in the healthy group.
In addition, disease-diagnostic models generated using machine learning algorithms based on the data from these different methods all achieved good diagnostic efficiency (AUC: 0.87-0.89), with the model based on DADA2 producing the highest AUC (0.8944 and 0.8907 in the training set and test set, respectively). However, there was no significant difference in performance between the models (P > 0.05). In conclusion, this study demonstrates that DADA2, Deblur, and de novo OTU clustering display similar power levels in taxa assignment and can produce similar conclusions in the case of the CRC cohort.
Affiliation(s)
- Guang Liu
  - School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
  - Guangdong Hongyuan Pukong Medical Technology Co., Ltd., Guangzhou, China
- Tong Li
  - School of Bioscience and Bioengineering, South China University of Technology, Guangzhou, China
- Xiaoyan Zhu
  - School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Xuanping Zhang
  - School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
- Jiayin Wang
  - School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
|
12
|
Metataxonomic insights in the distribution of Lactobacillaceae in foods and food environments. Int J Food Microbiol 2023; 391-393:110124. [PMID: 36841075 DOI: 10.1016/j.ijfoodmicro.2023.110124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2022] [Revised: 01/09/2023] [Accepted: 02/05/2023] [Indexed: 02/23/2023]
Abstract
Members of the family Lactobacillaceae, which now includes species formerly belonging to the genera Lactobacillus and Pediococcus, but also Leuconostocaceae, are of foremost importance in food fermentations and spoilage, but also as components of animal and human microbiota and as potentially pathogenic microorganisms. Knowledge of the ecological distribution of a given species and genus is important, among other things, for inclusion in lists of microorganisms with a Qualified Presumption of Safety or with beneficial use. The objective of this work is to use the data in the FoodMicrobionet database to obtain quantitative insights (in terms of both abundance and prevalence) on the distribution of these bacteria in foods and food environments. We first explored the reliability of taxonomic assignments using the SILVA v138.1 reference database with full-length and partial sequences of the 16S rRNA gene for type strain sequences. Full-length 16S rRNA gene sequences allow a reasonably good classification at the genus and species level in phylogenetic trees, but shorter sequences (V1-V3, V3-V4, V4) perform much worse, with type strains of many species sharing identical V4 and V3-V4 sequences. Taxonomic assignment at the genus level of 16S rRNA gene sequences using the SILVA v138.1 reference database can be done for almost all genera of the family Lactobacillaceae with a high degree of confidence for full-length sequences, and with a satisfactory level of accuracy for the V1-V3 regions. Results for the V3-V4 and V4 regions are still acceptable but significantly worse. Taxonomic assignment at the species level for sequences of the V1-V3, V3-V4, and V4 regions of the 16S rRNA gene of members of the family Lactobacillaceae is hardly possible and, even for full-length sequences, only 49.9% of the type strain sequences can be unambiguously assigned to species.
We then used the FoodMicrobionet database to evaluate the prevalence and abundance of Lactobacillaceae in food samples and in food related environments. Generalist and specialist genera were clearly evident. The ecological distribution of several genera was confirmed and insights on the distribution and potential origin of rare genera (Dellaglioa, Holzapfelia, Schleiferilactobacillus) were obtained. We also found that combining Amplicon Sequence Variants from different studies is indeed possible, but provides little additional information, even when strict criteria are used for the filtering of sequences.
|
13
|
Rajput D, Wang WJ, Chen CC. Evaluation of a decided sample size in machine learning applications. BMC Bioinformatics 2023; 24:48. [PMID: 36788550 PMCID: PMC9926644 DOI: 10.1186/s12859-023-05156-9] [Citation(s) in RCA: 96] [Impact Index Per Article: 48.0] [Received: 10/13/2022] [Accepted: 01/23/2023] [Indexed: 02/16/2023] Open
Abstract
BACKGROUND An appropriate sample size is essential for obtaining a precise and reliable outcome of a study. In machine learning (ML), studies with inadequate samples suffer from overfitting of data and have a lower probability of producing true effects, while increasing the sample size increases the accuracy of prediction but may not cause a significant change beyond a certain sample size. Existing statistical approaches using standardized mean difference, effect size, and statistical power for determining sample size are potentially biased due to miscalculations or lack of experimental details. This study aims to design criteria for evaluating sample size in ML studies. We examined the average and grand effect sizes and the performance of five ML methods using simulated datasets and three real datasets to derive the criteria for sample size. We systematically increased the sample size, starting from 16, by random sampling and examined the impact of sample size on the classifiers' performance and on both effect sizes. Tenfold cross-validation was used to quantify the accuracy.
RESULTS The results demonstrate that the effect sizes and the classification accuracies increase, while the variances in effect sizes shrink, as samples are added when the datasets have good discriminative power between two classes. By contrast, indeterminate datasets had poor effect sizes and classification accuracies, which did not improve with increasing sample size in either simulated or real datasets. A good dataset exhibited a significant difference in average and grand effect sizes. We derived two criteria based on the above findings to assess a decided sample size by combining the effect size and the ML accuracy. The sample size is considered suitable when it has appropriate effect sizes (≥ 0.5) and ML accuracy (≥ 80%). Beyond an appropriate sample size, adding samples brings little benefit because it does not significantly change the effect size or accuracy, so stopping at that size gives a good cost-benefit ratio.
CONCLUSION We believe that these practical criteria can be used as a reference for both authors and editors to evaluate whether the selected sample size is adequate for a study.
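The two thresholds stated in this abstract (effect size ≥ 0.5 and ML accuracy ≥ 80%) can be illustrated with a minimal sketch. The abstract does not define how the "average" and "grand" effect sizes are computed, so the version below substitutes the mean absolute Cohen's d across features as a stand-in; that substitution, the function names, and the input layout are all assumptions made for illustration. The cross-validated accuracy is taken as a number computed elsewhere.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Absolute standardized mean difference using the pooled sample variance.

    Assumes each group has at least two values and non-zero pooled variance.
    """
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return abs(mean(a) - mean(b)) / sqrt(pooled)


def sample_size_is_suitable(class_a, class_b, cv_accuracy,
                            min_effect=0.5, min_accuracy=0.80):
    """Apply both thresholds from the abstract.

    class_a, class_b: lists of feature columns, one list of values per feature
    (hypothetical layout). cv_accuracy: cross-validated accuracy in [0, 1].
    Returns (suitable, average_effect_size).
    """
    # Stand-in for the paper's effect-size summary: mean |d| over features.
    effects = [cohens_d(fa, fb) for fa, fb in zip(class_a, class_b)]
    avg_effect = mean(effects)
    suitable = avg_effect >= min_effect and cv_accuracy >= min_accuracy
    return suitable, avg_effect


# One feature, four samples per class, with a cross-validated accuracy of 85%.
suitable, effect = sample_size_is_suitable(
    [[1.0, 2.0, 3.0, 4.0]], [[3.0, 4.0, 5.0, 6.0]], cv_accuracy=0.85
)
print(suitable, round(effect, 3))
```

Both conditions must hold at once: a large effect size with poor accuracy, or the reverse, is not treated as a suitable sample size in this sketch.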
Affiliation(s)
- Daniyal Rajput
  - Institute of Cognitive Neuroscience, National Central University, Zhongda Rd, No. 300, Zhongli District, Taoyuan City, 320317, Taiwan, ROC
  - Taiwan International Graduate Program in Interdisciplinary Neuroscience, National Central University and Academia Sinica, Taipei, Taiwan, ROC
- Wei-Jen Wang
  - Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan, ROC
- Chun-Chuan Chen
  - Institute of Cognitive Neuroscience, National Central University, Zhongda Rd, No. 300, Zhongli District, Taoyuan City, 320317, Taiwan, ROC
  - Department of Biomedical Sciences and Engineering, National Central University, Taoyuan, Taiwan, ROC
|
14
|
Ultsch A, Lötsch J. Robust Classification Using Posterior Probability Threshold Computation Followed by Voronoi Cell Based Class Assignment Circumventing Pitfalls of Bayesian Analysis of Biomedical Data. Int J Mol Sci 2022; 23:ijms232214081. [PMID: 36430580 PMCID: PMC9693220 DOI: 10.3390/ijms232214081] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/24/2022] [Revised: 11/09/2022] [Accepted: 11/11/2022] [Indexed: 11/17/2022] Open
Abstract
Bayesian inference is ubiquitous in science and widely used in biomedical research such as cell sorting or “omics” approaches, as well as in machine learning (ML), artificial neural networks, and “big data” applications. However, the calculation is not robust in regions of low evidence. In cases where one group has a lower mean but a higher variance than another group, new cases with larger values are implausibly assigned to the group with typically smaller values. An approach for a robust extension of Bayesian inference is proposed that proceeds in two main steps starting from the Bayesian posterior probabilities. First, cases with low evidence are labeled as “uncertain” class membership. The boundary for low probabilities of class assignment (threshold ε) is calculated using a computed ABC analysis as a data-based technique for item categorization. This leaves a number of cases with uncertain classification (p < ε). Second, cases with uncertain class membership are relabeled based on the distance to neighboring classified cases based on Voronoi cells. The approach is demonstrated on biomedical data typically analyzed with Bayesian statistics, such as flow cytometric data sets or biomarkers used in medical diagnostics, where it increased the class assignment accuracy by 1−10% depending on the data set. The proposed extension of the Bayesian inference of class membership can be used to obtain robust and plausible class assignments even for data at the extremes of the distribution and/or for which evidence is weak.
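The two-step idea in this abstract can be sketched in one dimension. Several parts of the sketch are assumptions rather than the authors' method: the classes are modelled as Gaussians, the threshold ε is passed in as a fixed number instead of being derived by computed ABC analysis, and the thresholded quantity is taken to be the prior-weighted likelihood of the winning class. Assigning an uncertain case to the Voronoi cell it falls in is equivalent to giving it the label of its nearest confidently classified case, which is what the second loop does.

```python
from math import exp, pi, sqrt


def gauss_pdf(x, mu, sd):
    """Density of a normal distribution with mean mu and standard deviation sd."""
    return exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * sqrt(2 * pi))


def robust_classify(xs, classes, eps):
    """classes maps label -> (prior, mean, sd); returns one label per case."""
    labels = []
    confident = []  # (value, label) pairs that passed the threshold
    for x in xs:
        # Prior-weighted likelihoods; their argmax is the plain Bayes decision.
        scores = {c: p * gauss_pdf(x, mu, sd) for c, (p, mu, sd) in classes.items()}
        best = max(scores, key=scores.get)
        if scores[best] >= eps:
            labels.append(best)
            confident.append((x, best))
        else:
            labels.append(None)  # step 1: mark as uncertain
    # Step 2: relabel uncertain cases from the nearest confident case.
    for i, x in enumerate(xs):
        if labels[i] is None and confident:
            labels[i] = min(confident, key=lambda vc: abs(vc[0] - x))[1]
    return labels


# Class A has the lower mean but the higher variance, as in the abstract.
classes = {"A": (0.5, 0.0, 3.0), "B": (0.5, 5.0, 1.0)}
cases = [-1.0, 0.0, 1.0, 4.5, 5.0, 5.5, 10.0]
print(robust_classify(cases, classes, eps=0.01))
```

With these illustrative numbers, a plain Bayes decision would assign the case at 10.0 to class A because of A's wider tail, whereas the sketch flags it as uncertain and hands it to class B, the class of its nearest confident neighbour at 5.5.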
Affiliation(s)
- Alfred Ultsch
  - DataBionics Research Group, University of Marburg, Hans-Meerwein-Straße 22, 35032 Marburg, Germany
- Jörn Lötsch
  - Institute of Clinical Pharmacology, Goethe-University, Theodor-Stern-Kai 7, 60590 Frankfurt am Main, Germany
  - Fraunhofer Institute for Translational Medicine and Pharmacology ITMP, Theodor-Stern-Kai 7, 60596 Frankfurt am Main, Germany
|
15
|
Sorbie A, Delgado Jiménez R, Benakis C. Increasing transparency and reproducibility in stroke-microbiota research: A toolbox for microbiota analysis. iScience 2022; 25:103998. [PMID: 35310944 PMCID: PMC8931359 DOI: 10.1016/j.isci.2022.103998] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/11/2021] [Revised: 01/18/2022] [Accepted: 02/24/2022] [Indexed: 12/29/2022] Open
Abstract
Homeostasis of gut microbiota is crucial in maintaining human health. Alterations, or "dysbiosis," are increasingly implicated in human diseases, such as cancer, inflammatory bowel diseases, and, more recently, neurological disorders. In ischemic stroke patients, gut microbial profiles are markedly different compared to healthy controls, whereas manipulation of microbiota in animal models of stroke modulates outcome, further implicating microbiota in stroke pathobiology. Despite this, evidence for the involvement of specific microbes or microbial products, as well as microbial signatures, has yet to be identified, likely owing to differences in methodology, data analysis, and confounding variables between different studies. Here, we provide a set of guidelines to enable researchers to conduct high-quality, reproducible, and transparent microbiota studies, focusing on 16S rRNA sequencing in the emerging subfield of the stroke-microbiota. In doing so, we aim to facilitate novel and reproducible associations between the microbiota and brain diseases, including stroke, and translation into clinical practice.
Affiliation(s)
- Adam Sorbie
  - Institute for Stroke and Dementia Research (ISD), Ludwig-Maximilians-Universität, Feodor-Lynen-Straße 81377, Munich, Germany
- Rosa Delgado Jiménez
  - Institute for Stroke and Dementia Research (ISD), Ludwig-Maximilians-Universität, Feodor-Lynen-Straße 81377, Munich, Germany
- Corinne Benakis
  - Institute for Stroke and Dementia Research (ISD), Ludwig-Maximilians-Universität, Feodor-Lynen-Straße 81377, Munich, Germany
|
16
|
Busa J, Polaka I. Variability of Classification Results in Data with High Dimensionality and Small Sample Size. INFORMATION TECHNOLOGY AND MANAGEMENT SCIENCE 2021. [DOI: 10.7250/itms-2021-0007] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/20/2022] Open
Abstract
The study focuses on the analysis of biological data containing information on the number of genome sequences of intestinal microbiome bacteria before and after antibiotic use. The data have high dimensionality (bacterial taxa) and a small number of records, which is typical of bioinformatics data. Classification models induced on data sets like this usually are not stable and the accuracy metrics have high variance. The aim of the study is to create a preprocessing workflow and a classification model that can perform the most accurate classification of the microbiome into groups before and after the use of antibiotics and lessen the variability of accuracy measures of the classifier. To evaluate the accuracy of the model, measures of the area under the ROC curve and the overall accuracy of the classifier were used. In the experiments, the authors examined how classification results were affected by feature selection and increased size of the data set.
Affiliation(s)
- Jana Busa
  - Riga Technical University, Riga, Latvia
|