1
Marini S, Barquero A, Wadhwani AA, Bian J, Ruiz J, Boucher C, Prosperi M. OCTOPUS: Disk-based, Multiplatform, Mobile-friendly Metagenomics Classifier. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2025; 2024:798-807. [PMID: 40417475 PMCID: PMC12099329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2025]
Abstract
Portable genomic sequencers such as Oxford Nanopore's MinION enable real-time applications in clinical and environmental health. However, downstream analytics become a bottleneck when bioinformatics pipelines are unavailable, e.g., when cloud processing is unreachable because there is no Internet connection, or when only low-end computing devices can be carried on site. Here we present platform-friendly software for portable metagenomic analysis of Nanopore data, the Oligomer-based Classifier of Taxonomic Operational and Pan-genome Units via Singletons (OCTOPUS). Written in Java, OCTOPUS reimplements several features of the popular Kraken2 and KrakenUniq software and adds original components for improving metagenomics classification on incomplete/sampled reference databases, making it well suited to running on smartphones or tablets. OCTOPUS obtains sensitivity and precision comparable to Kraken2, while decreasing the false positive rate 4- to 16-fold and yielding high correlation on real-world data. OCTOPUS is available along with customized databases at https://github.com/DataIntellSystLab/OCTOPUS and https://github.com/Ruiz-HCI-Lab/OctopusMobile.
Affiliation(s)
- Simone Marini
  - Department of Epidemiology, University of Florida, Gainesville, USA
  - Emerging Pathogens Institute, University of Florida, Gainesville, USA
- Alexander Barquero
  - Department of Computer and Information Science and Engineering, University of Florida, USA
- Anisha Ashok Wadhwani
  - Department of Computer and Information Science and Engineering, University of Florida, USA
- Jiang Bian
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, USA
- Jaime Ruiz
  - Department of Computer and Information Science and Engineering, University of Florida, USA
- Christina Boucher
  - Department of Computer and Information Science and Engineering, University of Florida, USA
- Mattia Prosperi
  - Department of Epidemiology, University of Florida, Gainesville, USA
2
Piteková B, Hric I, Zieg J, Baranovičová E, Konopásek P, Gécz J, Planet PJ, Bielik V. The gut microbiome and metabolome in children with a first febrile urinary tract infection: a pilot study. Pediatr Nephrol 2025:10.1007/s00467-025-06782-6. [PMID: 40369126 DOI: 10.1007/s00467-025-06782-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2025] [Revised: 03/31/2025] [Accepted: 03/31/2025] [Indexed: 05/16/2025]
Abstract
BACKGROUND Urinary tract infection (UTI) is a common bacterial infection in the pediatric population. Febrile urinary tract infection (fUTI) can lead to severe complications such as urosepsis as well as kidney scarring, chronic kidney disease, and systemic hypertension. Recent research supports the hypothesis that dysbiosis of the microbiome may play a role in the pathogenesis and development of fUTI in infants. Our main aim was to compare gut microbiota composition between children with a first fUTI and controls. METHODS We conducted an observational study of 17 children with a first fUTI compared to 18 healthy controls. We analyzed the gastrointestinal microbiome and measured metabolites in stool and urine. RESULTS In the gut microbiome, we found significantly lower α-diversity (the Shannon index) and significantly lower relative abundance of probiogenic bacteria (short-chain fatty acid (SCFA) producers) in children with a first episode of fUTI before the start of antibiotic therapy. Furthermore, our findings confirm that the length of breastfeeding has a significant influence on gut microbiota composition, reducing pathogenic bacteria and enhancing beneficial taxa. Shannon diversity, duration of breastfeeding, and specific taxa, particularly Faecalibacterium and Escherichia, emerged as strong predictors linked to the development of fUTI. CONCLUSIONS This study demonstrates that gut microbiome changes are associated with the onset of fUTI in children. Machine learning models identified Shannon index, specific bacterial taxa, and breastfeeding as strong predictors of fUTI. The study highlights the potential role of the gut microbiome in preventing fUTI.
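The Shannon index used above as the α-diversity measure can be illustrated with a minimal Python sketch. The function name and the toy count vectors are hypothetical and are not taken from the study; the sketch only shows the standard formula H' = -Σ p_i ln p_i applied to per-taxon read counts.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    # Taxa with zero counts contribute nothing to the sum and are skipped.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

# Hypothetical per-taxon read counts for two samples (illustration only).
even_sample = [25, 25, 25, 25]    # four equally abundant taxa
skewed_sample = [97, 1, 1, 1]     # one dominant taxon

h_even = shannon_index(even_sample)
h_skewed = shannon_index(skewed_sample)
```

With equal abundances the index reaches its maximum, ln(number of taxa); a sample dominated by a single taxon scores lower, which is the direction of change the abstract reports for children with fUTI.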
Affiliation(s)
- Barbora Piteková
  - Department of Pediatric Emergency Medicine, National Institute of Children's Diseases, Bratislava, Slovakia
  - Department of Pediatric Urology, Faculty of Medicine, Comenius University and National Institute of Children's Diseases, Bratislava, Slovakia
  - Department of Pediatrics, Slovak Medical University in Bratislava, Bratislava, Slovakia
- Ivan Hric
  - Biomedical Center, Institute of Clinical and Translational Research, Slovak Academy of Sciences, Bratislava, Slovakia
  - Department of Biological and Medical Sciences, Faculty of Physical Education and Sport, Comenius University, Bratislava, Slovakia
- Jakub Zieg
  - Department of Pediatrics, Second Faculty of Medicine, Charles University, Prague, Czech Republic
- Eva Baranovičová
  - Biomedical Center Martin, Jessenius Faculty of Medicine in Martin, Comenius University in Bratislava, Martin, Slovakia
- Patrik Konopásek
  - Department of Children and Adolescents, Third Faculty of Medicine, Charles University and University Hospital Kralovske Vinohrady, Prague, Czech Republic
- Jakub Gécz
  - Department of Pediatric Emergency Medicine, National Institute of Children's Diseases, Bratislava, Slovakia
- Paul J Planet
  - Children's Hospital of Philadelphia, Philadelphia, PA, USA
  - Department of Pediatrics, University of Pennsylvania, Philadelphia, PA, USA
- Viktor Bielik
  - Department of Biological and Medical Sciences, Faculty of Physical Education and Sport, Comenius University, Bratislava, Slovakia
3
Song X, Wang Y, Wang Y, Zhao K, Tong D, Gao R, Lv X, Kong D, Ruan Y, Wang M, Tang X, Li F, Luo Y, Zhu Y, Xu J, Ma B. Rhizosphere-triggered viral lysogeny mediates microbial metabolic reprogramming to enhance arsenic oxidation. Nat Commun 2025; 16:4048. [PMID: 40307209 PMCID: PMC12044158 DOI: 10.1038/s41467-025-58695-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2024] [Accepted: 03/26/2025] [Indexed: 05/02/2025]
Abstract
The rhizosphere is a critical hotspot for metabolic activities involving arsenic (As). While recent studies indicate many functions for soil viruses, much remains overlooked regarding their quantitative impact on rhizosphere processes. Here, we analyze time-series metagenomes of rice (Oryza sativa L.) rhizosphere and bulk soil to explore how viruses mediate rhizosphere As biogeochemistry. We observe that the rhizosphere favors lysogeny in viruses associated with As-oxidizing microbes, with a positive correlation between As oxidation and the prevalence of these microbial hosts. Moreover, results demonstrate that these lysogenic viruses enrich both As oxidation and phosphorus co-metabolism genes and mediate horizontal gene transfers (HGTs) of As oxidases. In silico simulation with genome-scale metabolic models (GEMs) and in vitro experimental validation estimate that rhizosphere lysogenic viruses contribute up to 25% of microbial As oxidation. These findings enhance our comprehension of the plant-microbiome-virome interplay and highlight the potential of rhizosphere viruses for improving soil health in sustainable agriculture.
Affiliation(s)
- Xinwei Song
  - State Key Laboratory of Soil Pollution Control and Safety, Zhejiang University, Hangzhou, 310058, China
  - Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, 311200, China
  - Zhejiang Provincial Key Laboratory of Agricultural, Resources and Environment, College of Environmental and Resource Science, Zhejiang University, Hangzhou, 310058, China
- Yiling Wang
  - State Key Laboratory of Soil Pollution Control and Safety, Zhejiang University, Hangzhou, 310058, China
  - Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, 311200, China
  - Zhejiang Provincial Key Laboratory of Agricultural, Resources and Environment, College of Environmental and Resource Science, Zhejiang University, Hangzhou, 310058, China
- Youjing Wang
  - Zhejiang Provincial Key Laboratory of Agricultural, Resources and Environment, College of Environmental and Resource Science, Zhejiang University, Hangzhou, 310058, China
  - Key Laboratory of Environment Remediation and Ecological Health, Ministry of Education, Hangzhou, 310058, China
- Kankan Zhao
  - State Key Laboratory of Soil Pollution Control and Safety, Zhejiang University, Hangzhou, 310058, China
  - Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, 311200, China
  - Zhejiang Provincial Key Laboratory of Agricultural, Resources and Environment, College of Environmental and Resource Science, Zhejiang University, Hangzhou, 310058, China
- Di Tong
  - Zhejiang Provincial Key Laboratory of Agricultural, Resources and Environment, College of Environmental and Resource Science, Zhejiang University, Hangzhou, 310058, China
  - Key Laboratory of Environment Remediation and Ecological Health, Ministry of Education, Hangzhou, 310058, China
- Ruichuan Gao
  - Guangdong Key Laboratory of Integrated Agro-Environmental Pollution Control and Management, Institute of Eco-environmental and Soil Sciences, Guangdong Academy of Sciences, Guangzhou, 510650, China
- Xiaofei Lv
  - Department of Environmental Engineering, China Jiliang University, Hangzhou, 310018, China
- Dedong Kong
  - Institute of Digital Agriculture, Zhejiang Academy of Agricultural Sciences, Hangzhou, 310021, China
- Yunjie Ruan
  - Institute of Agricultural Bio-Environmental Engineering, College of Bio-Systems Engineering and Food Science, Zhejiang University, Hangzhou, 310058, China
  - The Rural Development Academy, Zhejiang University, Hangzhou, 310058, China
- Mengcen Wang
  - State Key Laboratory of Rice Biology and Breeding, Ministry of Agricultural and Rural Affairs Laboratory of Molecular Biology of Crop Pathogens and Insects, Zhejiang University, Hangzhou, 310058, China
- Xianjin Tang
  - State Key Laboratory of Soil Pollution Control and Safety, Zhejiang University, Hangzhou, 310058, China
  - Zhejiang Provincial Key Laboratory of Agricultural, Resources and Environment, College of Environmental and Resource Science, Zhejiang University, Hangzhou, 310058, China
- Fangbai Li
  - Guangdong Key Laboratory of Integrated Agro-Environmental Pollution Control and Management, Institute of Eco-environmental and Soil Sciences, Guangdong Academy of Sciences, Guangzhou, 510650, China
- Yongming Luo
  - Key Laboratory of Soil Environment and Pollution Remediation, Institute of Soil Science, Chinese Academy of Sciences, 210000, Nanjing, China
- Yongguan Zhu
  - State Key Laboratory of Urban and Regional Ecology, Research Center for Eco-environmental Sciences, Chinese Academy of Sciences, 100085, Beijing, China
- Jianming Xu
  - State Key Laboratory of Soil Pollution Control and Safety, Zhejiang University, Hangzhou, 310058, China
  - Zhejiang Provincial Key Laboratory of Agricultural, Resources and Environment, College of Environmental and Resource Science, Zhejiang University, Hangzhou, 310058, China
- Bin Ma
  - State Key Laboratory of Soil Pollution Control and Safety, Zhejiang University, Hangzhou, 310058, China
  - Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, 311200, China
  - Zhejiang Provincial Key Laboratory of Agricultural, Resources and Environment, College of Environmental and Resource Science, Zhejiang University, Hangzhou, 310058, China
4
Santos AFB, Nunes M, Filipa-Silva A, Pimentel V, Pingarilho M, Abrantes P, Miranda MNS, Crespo MTB, Abecasis AB, Parreira R, Seabra SG. Wastewater Metavirome Diversity: Exploring Replicate Inconsistencies and Bioinformatic Tool Disparities. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2025; 22:707. [PMID: 40427823 DOI: 10.3390/ijerph22050707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2025] [Revised: 04/11/2025] [Accepted: 04/16/2025] [Indexed: 05/29/2025]
Abstract
This study investigates viral composition in wastewater through metagenomic analysis, evaluating the performance of four bioinformatic tools (Genome Detective, CZ.ID, INSaFLU-TELEVIR and Trimmomatic + Kraken2) on samples collected from four sites in each of two wastewater treatment plants (WWTPs) in Lisbon, Portugal in April 2019. From each site, we collected and separately processed three replicates and one pool of nucleic acids extracted from the replicates. A total of 32 samples were processed using sequence-independent single-primer amplification (SISPA) and sequenced on an Illumina MiSeq platform. Across the 128 sample-tool combinations, viral read counts varied widely, from 3 to 288,464. Replicates and their pools were inconsistent in viral abundance and diversity, revealing the heterogeneity of the wastewater matrix and the variability in sequencing effort. Results also differed between software tools, highlighting the impact of tool selection on community profiling. A positive correlation between crAssphage and human pathogens was found, supporting crAssphage as a proxy for public health surveillance. A custom Python pipeline automated viral identification report processing, taxonomic assignments and diversity calculations, streamlining analysis and ensuring reproducibility. These findings emphasize the importance of sequencing depth, software tool selection and standardized pipelines in advancing wastewater-based epidemiology.
Affiliation(s)
- André F B Santos
  - Global Health and Tropical Medicine (GHTM), Associate Laboratory in Translation and Innovation Towards Global Health (LA-REAL), Instituto de Higiene e Medicina Tropical (IHMT), Universidade NOVA de Lisboa, 1349-008 Lisboa, Portugal
- Mónica Nunes
  - cE3c-Centre for Ecology, Evolution and Environmental Changes & CHANGE-Global Change and Sustainability Institute, Faculdade de Ciências, Universidade de Lisboa, Campo Grande, 1749-016 Lisbon, Portugal
- Andreia Filipa-Silva
  - CIIMAR/CIMAR-LA, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos, s/n, 4450-208 Matosinhos, Portugal
- Victor Pimentel
  - Global Health and Tropical Medicine (GHTM), Associate Laboratory in Translation and Innovation Towards Global Health (LA-REAL), Instituto de Higiene e Medicina Tropical (IHMT), Universidade NOVA de Lisboa, 1349-008 Lisboa, Portugal
- Marta Pingarilho
  - Global Health and Tropical Medicine (GHTM), Associate Laboratory in Translation and Innovation Towards Global Health (LA-REAL), Instituto de Higiene e Medicina Tropical (IHMT), Universidade NOVA de Lisboa, 1349-008 Lisboa, Portugal
- Patrícia Abrantes
  - Global Health and Tropical Medicine (GHTM), Associate Laboratory in Translation and Innovation Towards Global Health (LA-REAL), Instituto de Higiene e Medicina Tropical (IHMT), Universidade NOVA de Lisboa, 1349-008 Lisboa, Portugal
- Mafalda N S Miranda
  - Global Health and Tropical Medicine (GHTM), Associate Laboratory in Translation and Innovation Towards Global Health (LA-REAL), Instituto de Higiene e Medicina Tropical (IHMT), Universidade NOVA de Lisboa, 1349-008 Lisboa, Portugal
- Maria Teresa Barreto Crespo
  - iBET, Instituto de Biologia Experimental e Tecnológica, Apartado 12, 2781-901 Oeiras, Portugal
  - ITQB, Instituto de Tecnologia Química e Biológica António Xavier, Universidade Nova de Lisboa, Av. da República, 2780-157 Oeiras, Portugal
- Ana B Abecasis
  - Global Health and Tropical Medicine (GHTM), Associate Laboratory in Translation and Innovation Towards Global Health (LA-REAL), Instituto de Higiene e Medicina Tropical (IHMT), Universidade NOVA de Lisboa, 1349-008 Lisboa, Portugal
- Ricardo Parreira
  - Global Health and Tropical Medicine (GHTM), Associate Laboratory in Translation and Innovation Towards Global Health (LA-REAL), Instituto de Higiene e Medicina Tropical (IHMT), Universidade NOVA de Lisboa, 1349-008 Lisboa, Portugal
- Sofia G Seabra
  - Global Health and Tropical Medicine (GHTM), Associate Laboratory in Translation and Innovation Towards Global Health (LA-REAL), Instituto de Higiene e Medicina Tropical (IHMT), Universidade NOVA de Lisboa, 1349-008 Lisboa, Portugal
5
Martí JM, Kok CR, Thissen JB, Mulakken NJ, Avila-Herrera A, Jaing CJ, Allen JE, Be NA. Addressing the dynamic nature of reference data: a new nucleotide database for robust metagenomic classification. mSystems 2025; 10:e0123924. [PMID: 40111052 PMCID: PMC12013259 DOI: 10.1128/msystems.01239-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2024] [Accepted: 02/14/2025] [Indexed: 03/22/2025]
Abstract
Accurate metagenomic classification relies on comprehensive, up-to-date, and validated reference databases. While the NCBI BLAST Nucleotide (nt) database, encompassing a vast collection of sequences from all domains of life, represents an invaluable resource, its massive size (currently exceeding 10^12 nucleotides) and exponential growth pose significant challenges for researchers seeking to maintain current nt-based indices for metagenomic classification. Recognizing that no current nt-based indices exist for the widely used Centrifuge classifier, and the last public version currently available was released in 2018, we addressed this critical gap by leveraging advanced high-performance computing resources. We present new Centrifuge-compatible nt databases, meticulously constructed using a novel pipeline incorporating different quality control measures, including reference decontamination and filtering. These measures demonstrably reduce spurious classifications, as shown through our reanalysis of published metagenomic data where Plasmodium annotations were dramatically reduced using our decontaminated database, highlighting how database quality can significantly impact research conclusions. Through temporal comparisons, we also reveal how our approach minimizes inconsistencies in taxonomic assignments stemming from asynchronous updates between public sequence and taxonomy databases. These discrepancies are particularly evident in taxa such as Listeria monocytogenes and Naegleria fowleri, where classification accuracy varied significantly across database versions. These new databases, made available as pre-built Centrifuge indexes, respond to the need for an open, robust, nt-based pipeline for taxonomic classification in metagenomics. Applications such as environmental metagenomics, forensics, and clinical metagenomics, which require comprehensive taxonomic coverage, will benefit from this resource.
Our work highlights the importance of treating reference databases as dynamic entities, subject to ongoing quality control and validation akin to software development best practices. This approach is crucial for ensuring accuracy and reliability of metagenomic analysis, especially as databases continue to expand in size and complexity. IMPORTANCE Accurately identifying the diverse microbes present in a sample, whether from the human gut, a soil sample, or a crime scene, is crucial for fields ranging from medicine to environmental science. Researchers rely on comprehensive DNA databases to match sequenced DNA fragments to known microbial species. However, the widely used NCBI nt database, while vast, poses significant challenges. Its massive size makes it difficult for many researchers to use effectively with taxonomic classifiers, and inconsistencies and contamination within the database can impact the accuracy of microbial identification. This work addresses these challenges by providing cleaned, updated, and validated nt-based databases specifically optimized for the widely used Centrifuge classification tool. This new resource demonstrably reduces errors and improves the reliability of microbial identification across diverse taxonomic groups. Moreover, by providing readily usable indexes, we overcome the size barrier, enabling researchers to leverage the full potential of the nt database for metagenomic analysis. Our findings underscore the need to treat reference databases as dynamic entities, emphasizing continuous quality control and versioning as essential practices for robust and reproducible metagenomics research.
Affiliation(s)
- Jose Manuel Martí
  - Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, California, USA
- Car Reen Kok
  - Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, California, USA
- James B. Thissen
  - Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, California, USA
- Nisha J. Mulakken
  - Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, California, USA
- Aram Avila-Herrera
  - Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, California, USA
- Crystal J. Jaing
  - Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, California, USA
- Jonathan E. Allen
  - Global Security Computing Applications Division, Lawrence Livermore National Laboratory, Livermore, California, USA
- Nicholas A. Be
  - Biosciences and Biotechnology Division, Lawrence Livermore National Laboratory, Livermore, California, USA
6
Refahi M, Sokhansanj BA, Mell JC, Brown JR, Yoo H, Hearne G, Rosen GL. Enhancing nucleotide sequence representations in genomic analysis with contrastive optimization. Commun Biol 2025; 8:517. [PMID: 40155693 PMCID: PMC11953366 DOI: 10.1038/s42003-025-07902-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Accepted: 03/07/2025] [Indexed: 04/01/2025]
Abstract
Analysis of genomic and metagenomic sequences is inherently more challenging than that of amino acid sequences due to the higher divergence among evolutionarily related nucleotide sequences, variable k-mer and codon usage within and among genomes of diverse species, and poorly understood selective constraints. We introduce Scorpio (Sequence Contrastive Optimization for Representation and Predictive Inference on DNA), a versatile framework designed for nucleotide sequences that employs contrastive learning to improve embeddings. By leveraging pre-trained genomic language models and k-mer frequency embeddings, Scorpio demonstrates competitive performance in diverse applications, including taxonomic and gene classification, antimicrobial resistance (AMR) gene identification, and promoter detection. A key strength of Scorpio is its ability to generalize to novel DNA sequences and taxa, addressing a significant limitation of alignment-based methods. Scorpio has been tested on multiple datasets with DNA sequences of varying lengths (long and short) and shows robust inference capabilities. Additionally, we provide an analysis of the biological information underlying this representation, including correlations between codon adaptation index as a gene expression factor, sequence similarity, and taxonomy, as well as the functional and structural information of genes.
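As a rough illustration of the "k-mer frequency embeddings" mentioned above, the following Python sketch turns a DNA sequence into a fixed-length vector of normalized k-mer frequencies. It is not Scorpio's implementation: the function name, the choice of k, and the decision to skip k-mers containing non-ACGT characters are all assumptions made for this example, and no contrastive learning step is shown.

```python
from collections import Counter
from itertools import product

def kmer_frequencies(seq, k=3):
    """Return normalized frequencies of all 4**k DNA k-mers, in lexicographic order."""
    seq = seq.upper()
    # Fixed vocabulary so every sequence maps to a vector of the same length.
    vocabulary = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers in the vocabulary are counted; windows with N or other
    # ambiguous bases are ignored (an assumption of this sketch).
    total = sum(counts[kmer] for kmer in vocabulary)
    if total == 0:
        return [0.0] * len(vocabulary)
    return [counts[kmer] / total for kmer in vocabulary]

# Hypothetical toy sequence: 2-mers give a 16-dimensional vector.
vector = kmer_frequencies("ACGTACGT", k=2)
```

Vectors of this kind can be fed to a downstream model; in a contrastive setup they would typically be passed through an encoder trained so that related sequences end up close together.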
Affiliation(s)
- Bahrad A Sokhansanj
  - Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
- Joshua C Mell
  - College of Medicine, Drexel University, Philadelphia, PA, USA
- James R Brown
  - Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
- Hyunwoo Yoo
  - Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
- Gavin Hearne
  - Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
- Gail L Rosen
  - Electrical and Computer Engineering, Drexel University, Philadelphia, PA, USA
7
de Campos GM, Clemente LG, Lima ARJ, Cella E, Fonseca V, Ximenez JPB, Nishiyama MY, de Carvalho E, Sampaio SC, Giovanetti M, Elias MC, Slavov SN. Anellovirus abundance as an indicator for viral metagenomic classifier utility in plasma samples. Virol J 2025; 22:88. [PMID: 40148934 PMCID: PMC11951539 DOI: 10.1186/s12985-025-02708-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2025] [Accepted: 03/13/2025] [Indexed: 03/29/2025]
Abstract
BACKGROUND Viral metagenomics has expanded significantly in recent years due to advancements in next-generation sequencing, establishing it as the leading method for identifying emerging viruses. A crucial step in metagenomics is taxonomic classification, where sequence data is assigned to specific taxa, thereby enabling the characterization of species composition within a sample. Various taxonomic classifiers have been developed in recent years, each employing distinct classification approaches that produce varying results and abundance profiles, even when analyzing the same sample. METHODS In this study, we propose using the identification of Torque Teno Viruses (TTVs), from the Anelloviridae family, as indicators to evaluate the performance of four short-read-based metagenomic classifiers: Kraken2, Kaiju, CLARK and DIAMOND, when evaluating human plasma samples. RESULTS Our results show that each classifier assigns TTV species at different abundance levels, potentially influencing the interpretation of diversity within samples. Specifically, nucleotide-based classifiers tend to detect a broader range of TTV species, indicating higher sensitivity, while amino acid-based classifiers like DIAMOND and CLARK display lower abundance indices. Interestingly, despite employing different algorithms and data types (protein-based vs. nucleotide-based), Kaiju and Kraken2 performed similarly. CONCLUSION Our study underscores the critical impact of classifier selection on diversity indices in metagenomic analyses. Kaiju effectively assigned a wide variety of TTV species, demonstrating it did not require a high volume of reads to capture diversity. Nucleotide-based classifiers like CLARK and Kraken2 showed superior sensitivity, which is valuable for detecting emerging or rare viruses. At the same time, protein-based approaches such as DIAMOND and Kaiju proved robust for identifying known species with low variability.
Affiliation(s)
- Gabriel Montenegro de Campos
  - Programa de Pós-graduação em Oncologia Clínica, Células-Tronco e Terapia Celular, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Prêto, Brazil
- Luan Gaspar Clemente
  - Escola Superior de Agricultura Luiz de Queiroz, Departamento de Zootecnia, Universidade de São Paulo, Piracicaba, Brazil
- Eleonora Cella
  - Burnett School of Medical Sciences, College of Medicine, University of Central Florida, Orlando, FL, USA
- Vagner Fonseca
  - Departamento de Ciências Exatas e Terra, Universidade Estadual da Bahia, Salvador, Brazil
  - Centre for Epidemic Response and Innovation (CERI), School of Data Science and Computational Thinking, Stellenbosch University, Stellenbosch, South Africa
- João Paulo Bianchi Ximenez
  - Departamento de Análises Clínicas, Toxicológicas e Bromatológicas, Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, Ribeirão Prêto, Brazil
- Sandra Coccuzzo Sampaio
  - Centro de Vigilância Viral e Avaliação Sorológica- CeVIVas, Instituto Butantan, São Paulo, Brazil
- Marta Giovanetti
  - Department of Science and Technologies for Sustainable Development and One Health, Università Campus Bio-Medico di Roma, Rome, Italy
  - Instituto Rene Rachou, Fundação Oswaldo Cruz-FIOCRUZ, Belo Horizonte, Brazil
- Maria Carolina Elias
  - Centro de Vigilância Viral e Avaliação Sorológica- CeVIVas, Instituto Butantan, São Paulo, Brazil
- Svetoslav Nanev Slavov
  - Centro de Vigilância Viral e Avaliação Sorológica- CeVIVas, Instituto Butantan, São Paulo, Brazil
8
Herazo-Álvarez J, Mora M, Cuadros-Orellana S, Vilches-Ponce K, Hernández-García R. A review of neural networks for metagenomic binning. Brief Bioinform 2025; 26:bbaf065. [PMID: 40131312 PMCID: PMC11934572 DOI: 10.1093/bib/bbaf065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/17/2024] [Revised: 01/02/2025] [Accepted: 03/07/2025] [Indexed: 03/26/2025]
Abstract
One of the main goals of metagenomic studies is to describe the taxonomic diversity of microbial communities. A crucial step in metagenomic analysis is metagenomic binning, which involves the (supervised) classification or (unsupervised) clustering of metagenomic sequences. Various machine learning models have been applied to address this task. In this review, the contributions of artificial neural networks (ANN) in the context of metagenomic binning are detailed, covering supervised, unsupervised, and semi-supervised approaches. Thirty-four ANN-based binning tools are systematically compared, detailing their architectures, input features, datasets, advantages, disadvantages, and other relevant aspects. The findings reveal that deep learning approaches, such as convolutional neural networks and autoencoders, achieve higher accuracy and scalability than traditional methods. Gaps in benchmarking practices are highlighted, and future directions are proposed, including standardized datasets and optimization of architectures for third-generation sequencing. This review provides support to researchers in identifying trends and selecting suitable tools for the metagenomic binning problem.
Affiliation(s)
- Jair Herazo-Álvarez
  - Doctorado en Modelamiento Matemático Aplicado, Universidad Católica del Maule, Talca, Maule 3480564, Chile
  - Laboratory of Technological Research in Pattern Recognition (LITRP), Universidad Católica del Maule, Talca, Maule 3480564, Chile
- Marco Mora
  - Laboratory of Technological Research in Pattern Recognition (LITRP), Universidad Católica del Maule, Talca, Maule 3480564, Chile
  - Departamento de Computación e Industrias, Facultad de Ciencias de la Ingeniería, Universidad Católica del Maule, Talca, Maule 3480564, Chile
- Sara Cuadros-Orellana
  - Laboratory of Technological Research in Pattern Recognition (LITRP), Universidad Católica del Maule, Talca, Maule 3480564, Chile
  - Centro de Biotecnología de los Recursos Naturales (CENBio), Universidad Católica del Maule, Talca, Maule 3480564, Chile
- Karina Vilches-Ponce
  - Laboratory of Technological Research in Pattern Recognition (LITRP), Universidad Católica del Maule, Talca, Maule 3480564, Chile
- Ruber Hernández-García
  - Laboratory of Technological Research in Pattern Recognition (LITRP), Universidad Católica del Maule, Talca, Maule 3480564, Chile
  - Departamento de Computación e Industrias, Facultad de Ciencias de la Ingeniería, Universidad Católica del Maule, Talca, Maule 3480564, Chile
|
9
|
Nowicki M, Mroczek M, Mukhedkar D, Bała P, Nikolai Pimenoff V, Arroyo Mühr LS. HPV-KITE: sequence analysis software for rapid HPV genotype detection. Brief Bioinform 2025; 26:bbaf155. [PMID: 40205852 PMCID: PMC11982018 DOI: 10.1093/bib/bbaf155] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2024] [Revised: 03/04/2025] [Accepted: 03/19/2025] [Indexed: 04/11/2025] Open
Abstract
Human papillomaviruses (HPVs) are among the most diverse viral families that infect humans. Fortunately, only a small number of closely related HPV types affect human health, most notably by causing nearly all cervical cancers, as well as some oral and other anogenital cancers, particularly when infections with high-risk HPV types become persistent. Numerous viral polymerase chain reaction-based diagnostic methods as well as sequencing protocols have been developed for accurate, rapid, and efficient HPV genotyping. However, due to the large number of closely related HPV genotypes and the abundance of nonviral DNA in human derived biological samples, it can be challenging to correctly detect HPV genotypes using high throughput deep sequencing. Here, we introduce a novel HPV detection algorithm, HPV-KITE (HPV K-mer Index Tversky Estimator), which leverages k-mer data analysis and utilizes Tversky indexing for DNA and RNA sequence data. This method offers a rapid and sensitive alternative for detecting HPV from both metagenomic and transcriptomic datasets. We assessed HPV-KITE using three previously analyzed HPV infection-related datasets, comprising a total of 1430 sequenced human samples. For benchmarking, we compared our method's performance with standard HPV sequencing analysis algorithms, including general sequence-based mapping, and k-mer-based classification methods. Parallelization demonstrated fast processing times achieved through shingling, and scalability analysis revealed optimal performance when employing multiple nodes. Our results showed that HPV-KITE is one of the fastest, most accurate, and easiest ways to detect HPV genotypes from virtually any next-generation sequencing data. Moreover, the method is also highly scalable and available to be optimized for any microorganism other than HPV.
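The abstract above names two ingredients, k-mer shingling and the Tversky index, without showing how they combine. The following is a minimal sketch of that general idea only, not HPV-KITE's actual implementation: the function names, the toy sequences, and the choice of `k`, `alpha`, and `beta` are all illustrative assumptions.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of overlapping k-mers (shingles) in a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def tversky_index(x: set[str], y: set[str],
                  alpha: float = 1.0, beta: float = 0.0) -> float:
    """Tversky index: |X & Y| / (|X & Y| + alpha*|X - Y| + beta*|Y - X|).

    alpha = beta = 1 gives the Jaccard index; alpha = beta = 0.5 gives Dice.
    """
    shared = len(x & y)
    denom = shared + alpha * len(x - y) + beta * len(y - x)
    return shared / denom if denom else 0.0


# Toy sequences (hypothetical, not real HPV data).
reference = kmers("ACGTACGTGG", k=4)
read = kmers("TTACGTACAA", k=4)

# Asymmetric weighting: penalize read k-mers missing from the reference,
# but not reference k-mers missing from the (much shorter) read.
score = tversky_index(read, reference, alpha=1.0, beta=0.0)
print(f"Tversky score: {score:.3f}")
```

The asymmetric weights are the reason one might prefer Tversky over Jaccard here: a short read can never cover a whole reference genome, so penalizing the uncovered part of the reference would depress every score.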
Affiliation(s)
- Marek Nowicki
- Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, ul. Tyniecka 15/17, PL-02-630 Warsaw, Poland
- Faculty of Mathematics and Computer Science, Nicolaus Copernicus University in Toruń, ul. Chopina 12/18, PL-87-100 Toruń, Poland
| | - Magdalena Mroczek
- Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, ul. Tyniecka 15/17, PL-02-630 Warsaw, Poland
- Department of Biomedicine, University Hospital Basel, University of Basel, Hebelstrasse 20, CH-4031 Basel, Switzerland
| | - Dhananjay Mukhedkar
- Department of Clinical Science, Intervention and Technology, Forskningsgatan 56, Karolinska University Hospital, Karolinska Institutet, SE-14186 Stockholm, Sweden
- Hopsworks AB, Åsögatan 119, SE-116 24 Stockholm, Sweden
| | - Piotr Bała
- Interdisciplinary Centre for Mathematical and Computational Modelling, University of Warsaw, ul. Tyniecka 15/17, PL-02-630 Warsaw, Poland
| | - Ville Nikolai Pimenoff
- Department of Clinical Science, Intervention and Technology, Forskningsgatan 56, Karolinska University Hospital, Karolinska Institutet, SE-14186 Stockholm, Sweden
- Research Unit of Population Health and Borealis Biobank, Faculty of Medicine, University of Oulu, Aapistie 5 B, FI-90014 University of Oulu, Finland
| | - Laila Sara Arroyo Mühr
- Department of Clinical Science, Intervention and Technology, Forskningsgatan 56, Karolinska University Hospital, Karolinska Institutet, SE-14186 Stockholm, Sweden
|
10
|
Diener C, Holscher HD, Filek K, Corbin KD, Moissl-Eichinger C, Gibbons SM. Metagenomic estimation of dietary intake from human stool. Nat Metab 2025; 7:617-630. [PMID: 39966520 PMCID: PMC11949708 DOI: 10.1038/s42255-025-01220-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2024] [Accepted: 01/16/2025] [Indexed: 02/20/2025]
Abstract
Dietary intake is tightly coupled to gut microbiota composition, human metabolism and the incidence of virtually all major chronic diseases. Dietary and nutrient intake are usually assessed using self-reporting methods, including dietary questionnaires and food records, which suffer from reporting biases and require strong compliance from study participants. Here, we present Metagenomic Estimation of Dietary Intake (MEDI): a method for quantifying food-derived DNA in human faecal metagenomes. We show that DNA-containing food components can be reliably detected in stool-derived metagenomic data, even when present at low abundances (more than ten reads). We show how MEDI dietary intake profiles can be converted into detailed metabolic representations of nutrient intake. MEDI identifies the onset of solid food consumption in infants, shows significant agreement with food frequency questionnaire responses in an adult population and shows agreement with food and nutrient intake in two controlled-feeding studies. Finally, we identify specific dietary features associated with metabolic syndrome in a large clinical cohort without dietary records, providing a proof-of-concept for detailed tracking of individual-specific, health-relevant dietary patterns without the need for questionnaires.
Affiliation(s)
- Christian Diener
- Diagnostic and Research Institute of Hygiene, Microbiology and Environmental Medicine, Medical University of Graz, Graz, Austria.
- Institute for Systems Biology, Seattle, WA, USA.
| | - Hannah D Holscher
- Department of Food Science and Human Nutrition, University of Illinois Urbana-Champaign, Urbana, IL, USA
| | - Klara Filek
- Diagnostic and Research Institute of Hygiene, Microbiology and Environmental Medicine, Medical University of Graz, Graz, Austria
| | - Karen D Corbin
- AdventHealth Translational Research Institute, Orlando, FL, USA
| | - Christine Moissl-Eichinger
- Diagnostic and Research Institute of Hygiene, Microbiology and Environmental Medicine, Medical University of Graz, Graz, Austria
- BioTechMed Graz, Graz, Austria
| | - Sean M Gibbons
- Institute for Systems Biology, Seattle, WA, USA.
- Department of Bioengineering, University of Washington, Seattle, WA, USA.
- Department of Genome Sciences, University of Washington, Seattle, WA, USA.
- eScience Institute, University of Washington, Seattle, WA, USA.
|
11
|
Puller V, Plaza Oñate F, Prifti E, de Lahondès R. Impact of simulation and reference catalogues on the evaluation of taxonomic profiling pipelines. Microb Genom 2025; 11:001330. [PMID: 39804694 PMCID: PMC11728698 DOI: 10.1099/mgen.0.001330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Accepted: 11/06/2024] [Indexed: 01/16/2025] Open
Abstract
Microbiome profiling tools rely on reference catalogues, which significantly affect their performance. Comparing them is, however, challenging, mainly due to differences in their native catalogues. In this study, we present a novel standardized benchmarking framework that makes such comparisons more accurate. We decided not to customize databases but to translate results to a common reference to use the tools with their native environment. Specifically, we conducted two realistic simulations of gut microbiome samples, each based on a specific taxonomic profiler, and used two different taxonomic references to project their results, namely the Genome Taxonomy Database and the Unified Human Gastrointestinal Genome. To demonstrate the importance of using such a framework, we evaluated four established profilers as well as the impact of the simulations and that of the common taxonomic references on the perceived performance of these profilers. Finally, we provide guidelines to enhance future profiler comparisons for human microbiome ecosystems: (i) use or create realistic simulations tailored to your biological context (BC), (ii) identify a common feature space suited to your BC and independent of the catalogues used by the profilers and (iii) apply a comprehensive set of metrics covering accuracy (sensitivity/precision), overall representativity (richness/Shannon) and quantification (UniFrac and/or Aitchison distance).
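Guideline (iii) above lists richness, Shannon diversity, and Aitchison distance among the recommended metrics. As a rough illustration of what those quantities compute on a taxonomic profile, here is a minimal sketch using only the standard library. The function names and the pseudocount used to handle zero abundances are assumptions made for this example, not the authors' framework.

```python
import math


def richness(abundances: list[float]) -> int:
    """Number of taxa with non-zero abundance."""
    return sum(1 for a in abundances if a > 0)


def shannon(abundances: list[float]) -> float:
    """Shannon index H = -sum(p * ln p) over non-zero relative abundances."""
    total = sum(abundances)
    return -sum((a / total) * math.log(a / total) for a in abundances if a > 0)


def clr(abundances: list[float], pseudocount: float = 1e-6) -> list[float]:
    """Centered log-ratio transform; the pseudocount (an assumption) avoids log(0)."""
    logs = [math.log(a + pseudocount) for a in abundances]
    mean = sum(logs) / len(logs)
    return [v - mean for v in logs]


def aitchison(a: list[float], b: list[float], pseudocount: float = 1e-6) -> float:
    """Aitchison distance: Euclidean distance between CLR-transformed profiles."""
    ca, cb = clr(a, pseudocount), clr(b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))


# Hypothetical true and predicted profiles over the same four taxa.
truth = [0.50, 0.30, 0.20, 0.00]
predicted = [0.45, 0.35, 0.15, 0.05]
print(richness(truth), round(shannon(truth), 3), round(aitchison(truth, predicted), 3))
```

Because the CLR transform subtracts the mean log abundance, the Aitchison distance is unchanged when a profile is rescaled, which is why it suits relative-abundance data; the result does depend on the pseudocount whenever zeros are present.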
Affiliation(s)
- Vadim Puller
- GMT Science 75 route de Lyons-La-Foret, Rouen F-76000, France
| | | | - Edi Prifti
- IRD, Sorbonne Université, Unité de Modélisation Mathématique et Informatique des Systèmes Complexes, UMMISCO, 32 Avenue Henri Varagnat, Bondy F-93143, France
- Sorbonne Université, INSERM, Nutrition et Obesities; Systemic Approaches, NutriOmique, AP-HP, Hôpital Pitié-Salpêtrière, 91 Boulevard de l’Hôpital, Paris F-75013, France
|
12
|
Agashe R, George J, Pathak A, Fasakin O, Seaman J, Chauhan A. Shotgun metagenomics analysis indicates Bradyrhizobium spp. as the predominant genera for heavy metal resistance and bioremediation in a long-term heavy metal-contaminated ecosystem. Microbiol Resour Announc 2024; 13:e0024524. [PMID: 39499072 PMCID: PMC11636340 DOI: 10.1128/mra.00245-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2024] [Accepted: 08/28/2024] [Indexed: 11/07/2024] Open
Abstract
Ten soil cores were collected from the long-term heavy metal-contaminated Savannah River Site (SRS) and studied using shotgun metagenomics. In line with our previous reports, Bradyrhizobium spp. dominated the SRS soils, and thus we recommend that SRS bioremediation studies target the Bradyrhizobium genus.
Affiliation(s)
- Rohan Agashe
- School of the Environment, Florida A&M University, Tallahassee, Florida, USA
| | - Jonathan George
- School of the Environment, Florida A&M University, Tallahassee, Florida, USA
| | - Ashish Pathak
- School of the Environment, Florida A&M University, Tallahassee, Florida, USA
| | - Olasunkanmi Fasakin
- School of the Environment, Florida A&M University, Tallahassee, Florida, USA
| | - John Seaman
- Savannah River Ecology Laboratory, University of Georgia, Aiken, South Carolina, USA
| | - Ashvini Chauhan
- School of the Environment, Florida A&M University, Tallahassee, Florida, USA
|
13
|
Nechalová L, Bielik V, Hric I, Babicová M, Baranovičová E, Grendár M, Koška J, Penesová A. Gut microbiota and metabolic responses to a 12-week caloric restriction combined with strength and HIIT training in patients with obesity: a randomized trial. BMC Sports Sci Med Rehabil 2024; 16:239. [PMID: 39639405 PMCID: PMC11619444 DOI: 10.1186/s13102-024-01029-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2024] [Accepted: 11/27/2024] [Indexed: 12/07/2024]
Abstract
BACKGROUND Nowadays, obesity has become a major health issue. In addition to negatively affecting body composition and metabolic health, recent evidence shows unfavorable shifts in gut microbiota in individuals with obesity. However, the effects of weight loss on gut microbes and metabolites remain controversial. Therefore, the purpose of this study was to investigate the effects of a 12-week program on gut microbiota and metabolic health in patients with obesity. METHODS We conducted a controlled trial in 23 male and female patients with obesity. Twelve participants completed a 12-week program of caloric restriction combined with strength and HIIT training (INT, pre-BMI 37.33 ± 6.57 kg/m²), and eleven participants were designated as non-intervention controls (pre-BMI 38.65 ± 8.07 kg/m²). Metagenomic sequencing of the V3-V4 region of the 16S rDNA gene from fecal samples allowed for gut microbiota classification. Nuclear magnetic resonance spectroscopy characterized selected serum and fecal metabolite concentrations. RESULTS Within INT, we observed a significant improvement in body composition; a significant decrease in liver enzymes (AST, ALT, and GMT); a significant increase in the relative abundance of the commensal bacteria (e.g., Akkermansia muciniphila, Parabacteroides merdae, and Phocaeicola vulgatus); and a significant decrease in the relative abundance of SCFA-producing bacteria (e.g., the genera Butyrivibrio, Coprococcus, and Blautia). In addition, significant correlations were found between gut microbes, body composition, metabolic health biomarkers, and SCFAs. Notably, the Random Forest Machine Learning analysis identified predictors (Butyrivibrio fibrisolvens, Blautia caecimuris, Coprococcus comes, and waist circumference) with a moderate ability to discriminate between INT subjects pre- and post-intervention.
CONCLUSIONS Our results indicate that a 12-week caloric restriction combined with strength and HIIT training positively influences body composition, metabolic health biomarkers, gut microbiota, and microbial metabolites, demonstrating significant correlations among these variables. We observed a significant increase in the relative abundance of bacteria linked to obesity, e.g., Akkermansia muciniphila. Additionally, our study contributes to the ongoing debate about the role of SCFAs in obesity, as we observed a significant decrease in SCFA producers after a 12-week program. TRIAL REGISTRATION The trial was registered on [05/12/2014] with ClinicalTrials.gov (No: NCT02325804).
Affiliation(s)
- Libuša Nechalová
- Department of Biological and Medical Science, Faculty of Physical Education and Sport, Comenius University in Bratislava, Bratislava, 814 69, Slovakia
- Biomedical Center, Institute of Clinical and Translational Research, Slovak Academy of Sciences, Bratislava, 845 05, Slovakia
| | - Viktor Bielik
- Department of Biological and Medical Science, Faculty of Physical Education and Sport, Comenius University in Bratislava, Bratislava, 814 69, Slovakia.
| | - Ivan Hric
- Department of Biological and Medical Science, Faculty of Physical Education and Sport, Comenius University in Bratislava, Bratislava, 814 69, Slovakia
- Biomedical Center, Institute of Clinical and Translational Research, Slovak Academy of Sciences, Bratislava, 845 05, Slovakia
| | - Miriam Babicová
- Department of Biological and Medical Science, Faculty of Physical Education and Sport, Comenius University in Bratislava, Bratislava, 814 69, Slovakia
| | - Eva Baranovičová
- Biomedical Center Martin, Jessenius Faculty of Medicine in Martin, Comenius University in Bratislava, Martin, 036 01, Slovakia
| | - Marián Grendár
- Biomedical Center Martin, Jessenius Faculty of Medicine in Martin, Comenius University in Bratislava, Martin, 036 01, Slovakia
| | - Juraj Koška
- Phoenix VA Health Care System, Phoenix, AZ, USA
| | - Adela Penesová
- Department of Biological and Medical Science, Faculty of Physical Education and Sport, Comenius University in Bratislava, Bratislava, 814 69, Slovakia
- Biomedical Center, Institute of Clinical and Translational Research, Slovak Academy of Sciences, Bratislava, 845 05, Slovakia
|
14
|
Bradford LM, Carrillo C, Wong A. Managing false positives during detection of pathogen sequences in shotgun metagenomics datasets. BMC Bioinformatics 2024; 25:372. [PMID: 39627685 PMCID: PMC11613480 DOI: 10.1186/s12859-024-05952-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/31/2023] [Accepted: 10/07/2024] [Indexed: 12/08/2024] Open
Abstract
BACKGROUND Culture-independent diagnostic tests are gaining popularity as tools for detecting pathogens in food. Shotgun sequencing holds substantial promise for food testing as it provides abundant information on microbial communities, but the challenge is in analyzing large and complex sequencing datasets with a high degree of both sensitivity and specificity. Falsely classifying sequencing reads as originating from pathogens can lead to unnecessary food recalls or production shutdowns, while low sensitivity resulting in false negatives could lead to preventable illness. RESULTS We used simulated and published shotgun sequencing datasets containing Salmonella-derived reads to explore the appearance and mitigation of false positive results using the popular taxonomic annotation tools Kraken2 and Metaphlan4. Using default parameters, Kraken2 is sensitive but prone to false positives, while Metaphlan4 is more specific but unable to detect Salmonella at low abundance. We then developed a bioinformatic pipeline for identifying and removing reads falsely identified as Salmonella by Kraken2 while retaining high sensitivity. Carefully considering software parameters and database choices is essential to avoiding false positive sample calls. With well-chosen parameters plus additional steps to confirm the taxonomic origin of reads, it is possible to detect pathogens with very high specificity and sensitivity.
Affiliation(s)
| | - Catherine Carrillo
- Ottawa Laboratory (Carling), Canadian Food Inspection Agency, Ottawa, Canada
| | - Alex Wong
- Department of Biology, Carleton University, Ottawa, Canada.
- Institute for Advancing Health Through Agriculture, Texas A & M University, College Station, USA.
|
15
|
Moeckel C, Mareboina M, Konnaris MA, Chan CS, Mouratidis I, Montgomery A, Chantzi N, Pavlopoulos GA, Georgakopoulos-Soares I. A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J 2024; 23:2289-2303. [PMID: 38840832 PMCID: PMC11152613 DOI: 10.1016/j.csbj.2024.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 06/07/2024] Open
Abstract
The rapid progression of genomics and proteomics has been driven by the advent of advanced sequencing technologies, large, diverse, and readily available omics datasets, and the evolution of computational data processing capabilities. The vast amount of data generated by these advancements necessitates efficient algorithms to extract meaningful information. K-mers serve as a valuable tool when working with large sequencing datasets, offering several advantages in computational speed and memory efficiency and carrying the potential for intrinsic biological functionality. This review provides an overview of the methods, applications, and significance of k-mers in genomic and proteomic data analyses, as well as the utility of absent sequences, including nullomers and nullpeptides, in disease detection, vaccine development, therapeutics, and forensic science. Therefore, the review highlights the pivotal role of k-mers in addressing current genomic and proteomic problems and underscores their potential for future breakthroughs in research.
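The abstract above refers to "absent sequences" such as nullomers. To make the notion concrete, the sketch below enumerates the k-mers of a fixed length that never occur in a set of sequences, which is the basic idea behind absent-sequence analysis. It is an illustrative toy, not a method from the review: the function names are hypothetical, and the brute-force enumeration of all 4^k k-mers is only practical for small `k`.

```python
from itertools import product

DNA = "ACGT"


def observed_kmers(sequences: list[str], k: int) -> set[str]:
    """Collect every k-mer made only of A/C/G/T that occurs in the sequences."""
    seen = set()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set(DNA):  # skip k-mers containing N or other symbols
                seen.add(kmer)
    return seen


def absent_kmers(sequences: list[str], k: int) -> list[str]:
    """Return, in sorted order, all k-mers of length k absent from the sequences."""
    universe = {"".join(p) for p in product(DNA, repeat=k)}
    return sorted(universe - observed_kmers(sequences, k))


# Toy input (hypothetical): only AA, AC, and CC occur as 2-mers.
missing = absent_kmers(["AACC"], k=2)
print(len(missing), missing[:3])
```

A production tool would typically count k-mers with a dedicated counter and consider both strands; this version ignores reverse complements to keep the definition visible.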
Affiliation(s)
- Camille Moeckel
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
| | - Manvita Mareboina
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
| | - Maxwell A. Konnaris
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
| | - Candace S.Y. Chan
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
| | - Ioannis Mouratidis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
| | - Austin Montgomery
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
| | - Nikol Chantzi
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
| | | | - Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
|
16
|
Fulke AB, Eranezhath S, Raut S, Jadhav HS. Recent toolset of metagenomics for taxonomical and functional annotation of marine associated viruses: A review. REGIONAL STUDIES IN MARINE SCIENCE 2024; 77:103728. [DOI: 10.1016/j.rsma.2024.103728] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/02/2025]
|
17
|
Zhou J, Zhang X, Wang Y, Liang H, Yang Y, Huang X, Deng J. Contamination Survey of Insect Genomic and Transcriptomic Data. Animals (Basel) 2024; 14:3432. [PMID: 39682398 DOI: 10.3390/ani14233432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2024] [Revised: 11/05/2024] [Accepted: 11/25/2024] [Indexed: 12/18/2024] Open
Abstract
The rapid advancement of high-throughput sequencing has led to a great increase in sequencing data, resulting in a significant accumulation of contamination; for example, sequences from non-target species may be present in the target species' sequencing data. Insecta, the most diverse group within Arthropoda, still lacks a comprehensive evaluation of contamination prevalence in public databases and an analysis of potential contamination causes. In this study, COI barcodes were used to investigate contamination from insects and mammals in GenBank's genomic and transcriptomic data across four insect orders. Among the 2796 WGS and 1382 TSA assemblies analyzed, contamination was detected in 32 (1.14%) WGS and 152 (11.0%) TSA assemblies. Key findings from this study include the following: (1) TSA data exhibited more severe contamination than WGS data; (2) contamination levels varied significantly among the four orders, with Hemiptera showing 9.22%, Coleoptera 3.48%, Hymenoptera 7.66%, and Diptera 1.89% contamination rates; (3) possible causes of contamination, such as food, parasitism, sample collection, and cross-contamination, were analyzed. Overall, this study proposes a workflow for checking the existence of contamination in WGS and TSA data and some suggestions to mitigate it.
Affiliation(s)
- Jiali Zhou
- State Key Laboratory of Ecological Pest Control for Fujian and Taiwan Crops, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Xinrui Zhang
- State Key Laboratory of Ecological Pest Control for Fujian and Taiwan Crops, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Yujie Wang
- State Key Laboratory of Ecological Pest Control for Fujian and Taiwan Crops, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Haoxian Liang
- State Key Laboratory of Ecological Pest Control for Fujian and Taiwan Crops, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Yuhao Yang
- State Key Laboratory of Ecological Pest Control for Fujian and Taiwan Crops, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Xiaolei Huang
- State Key Laboratory of Ecological Pest Control for Fujian and Taiwan Crops, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, China
| | - Jun Deng
- State Key Laboratory of Ecological Pest Control for Fujian and Taiwan Crops, College of Plant Protection, Fujian Agriculture and Forestry University, Fuzhou 350002, China
|
18
|
He Y, Zhou F, Bai J, Gao Y, Huang X, Wang Y. ViTax: adaptive hierarchical viral taxonomy classification with a taxonomy belief tree on a foundation model. Brief Bioinform 2024; 26:bbaf041. [PMID: 39921398 PMCID: PMC11805961 DOI: 10.1093/bib/bbaf041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2024] [Revised: 12/18/2024] [Accepted: 01/20/2025] [Indexed: 02/10/2025] Open
Abstract
Viruses exert a profound influence on both human health and the global ecosystem, yet they remain largely unexplored. Precise taxonomic classification of viral sequences is essential for discovering novel viruses, elucidating their functions, and assessing their implications for public health and environmental monitoring. Traditional taxonomy methods based on genome references are limited by the vast number of unexplored viruses, rapid mutation rates, and high genetic diversity. Additionally, highly imbalanced species distribution and significant variances in inter-species genomic distances across taxonomic units pose challenges to classifier training. Conceptualizing genomic sequences as sentences in a natural language, large language models provide novel approaches for extracting intrinsic viral genome characteristics. In this study, we introduce ViTax, a virus taxonomy classification tool powered by HyenaDNA, a large language foundation model for long-range genomic sequences at single nucleotide resolution. ViTax integrates supervised prototypical contrastive learning to address the highly imbalanced distributions across various taxonomic clades and demonstrates superior performance to current leading methods in virus taxonomy, particularly for long sequences. Moreover, ViTax designs a belief mapping tree using the Lowest Common Ancestor algorithm to adaptively assign a sequence to the lowest taxonomy clade with confidence. For the open-set problem, where sequences belong to novel and unexplored genera, ViTax can adaptively assign them to a higher level of known taxonomy with outstanding performance. These capabilities make ViTax a robust tool for advancing the accuracy and reliability of viral taxonomy classification. The code is available at https://github.com/Ying-Lab/ViTax.
Affiliation(s)
- YuShuang He
- Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
| | - Feng Zhou
- Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361005, China
| | - JiaXing Bai
- Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
| | - YiChun Gao
- Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
| | - Xiaobing Huang
- Department of Medical Oncology, Fuzhou First Hospital Affiliated with Fujian Medical University, Fuzhou, Fujian 350108, China
| | - Ying Wang
- Department of Automation, Xiamen University, Xiamen, Fujian 361005, China
- National Institute for Data Science in Health and Medicine, Xiamen University, Xiamen, Fujian 361005, China
- State Key Laboratory of Mariculture Breeding, Xiamen Key Laboratory of Big Data Intelligent Analysis and Decision, Xiamen University, Xiamen, Fujian 350108, China
|
19
|
Lin L, Li L, Yang X, Hou L, Wu D, Wang B, Ma B, Liao X, Yan X, Gad M, Su J, Liu Y, Liu K, Hu A. Unnoticed antimicrobial resistance risk in Tibetan cities unveiled by sewage metagenomic surveillance: Compared to the eastern Chinese cities. JOURNAL OF HAZARDOUS MATERIALS 2024; 479:135730. [PMID: 39243538 DOI: 10.1016/j.jhazmat.2024.135730] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2024] [Revised: 08/26/2024] [Accepted: 08/31/2024] [Indexed: 09/09/2024]
Abstract
Sewage surveillance is a cost-effective tool for assessing antimicrobial resistance (AMR) in urban populations. However, research on sewage AMR in remote areas is still limited. Here, we used shotgun metagenomic sequencing to profile antibiotic resistance genes (ARGs) and ARG-carrying pathogens (APs) across 15 cities on the Tibetan Plateau (TP) and the major cities in eastern China. Notable regional disparities in sewage ARG composition were found, with a significantly higher ARG abundance in TP (2.97 copies/cell). A total of 542 and 545 APs were identified in sewage from TP and the East, respectively, of which more than 40% carried mobile genetic elements (MGEs). Moreover, 65 MGE-carrying APs were identified as World Health Organization (WHO) priority-like bacterial and fungal pathogens. Notably, a fungal zoonotic pathogen, Enterocytozoon bieneusi, was found for the first time to carry a nitroimidazole resistance gene (nimJ). Although distinct in AP compositions, the relative abundances of APs were comparable in these two regions. Furthermore, sewage in TP was found to be comparable to the cities in eastern China in terms of ARG mobility and AMR risks. These findings provide insights into ARGs and APs distribution in Chinese sewage and stress the importance of AMR surveillance and management strategies in remote regions.
Affiliation(s)
- Laichang Lin
- CAS Key Laboratory of Urban Pollutant Conversion, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen 361021, China
| | - Laiyi Li
- CAS Key Laboratory of Urban Pollutant Conversion, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen 361021, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Xiaoyong Yang
- School of Environmental and Material Engineering, Yantai University, Yantai 264005, China
| | - Liyuan Hou
- Department of Civil and Environmental Engineering, Utah State University, Logan, UT 84322, United States; Utah Water Research Laboratory, 1600 Canyon Road, Logan, UT 84321, United States
| | - Dong Wu
- Key Laboratory for Urban Ecological Processes and Eco-Restoration, School of Ecological and Environmental Science, East China Normal University, Shanghai 200241, China
| | - Binhao Wang
- Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, China
| | - Bin Ma
- Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, China
| | - Xin Liao
- CAS Key Laboratory of Urban Pollutant Conversion, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen 361021, China; University of Chinese Academy of Sciences, Beijing 100049, China
| | - Xiuhang Yan
- CAS Key Laboratory of Urban Pollutant Conversion, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen 361021, China; College of Life Sciences, Hebei University, Baoding 071002, China
| | - Mahmoud Gad
- Water Pollution Research Department, National Research Centre, Cairo 12622, Egypt
| | - Jianqiang Su
- Key Laboratory of Urban Environment and Health, Ningbo Observation and Research Station, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen 361021, China
| | - Yongqin Liu
- Center for the Pan-Third Pole Environment, Lanzhou University, Lanzhou 730000, China; State Key Laboratory of Tibetan Plateau Earth System, Resources and Environment (TPESRE), Institute of Tibetan Plateau Research, Chinese Academy of Sciences, Beijing 100101, China
| | - Keshao Liu
- State Key Laboratory of Tibetan Plateau Earth System, Resources and Environment (TPESRE), Institute of Tibetan Plateau Research, Chinese Academy of Sciences, Beijing 100101, China.
| | - Anyi Hu
- CAS Key Laboratory of Urban Pollutant Conversion, Institute of Urban Environment, Chinese Academy of Sciences, Xiamen 361021, China; University of Chinese Academy of Sciences, Beijing 100049, China.
| |
|
20
|
Dindhoria K, Manyapu V, Ali A, Kumar R. Unveiling the role of emerging metagenomics for the examination of hypersaline environments. Biotechnol Genet Eng Rev 2024; 40:2090-2128. [PMID: 37017219 DOI: 10.1080/02648725.2023.2197717] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/07/2022] [Accepted: 03/28/2023] [Indexed: 04/06/2023]
Abstract
Hypersaline ecosystems are distributed all over the globe. They are subjected to poly-extreme stresses and are inhabited by halophilic microorganisms possessing multiple adaptations. Halophiles have many biotechnological applications, such as nutrient supplements, antioxidant synthesis, salt-tolerant enzyme production, osmolyte synthesis, biofuel production, and electricity generation. However, halophiles are still underexplored in terms of complex ecological interactions and functions compared with microorganisms from other niches. The advent of metagenomics and recent advances in next-generation sequencing tools have made it feasible to investigate the microflora of an ecosystem, its interactions, and its functions. Both target gene and shotgun metagenomic approaches are commonly employed for the taxonomic, phylogenetic, and functional analyses of hypersaline microbial communities. This review discusses different types of hypersaline niches, their resident microflora, and the metagenomic approaches used to investigate them. It also covers the applications, hurdles, and recent advances in metagenomic approaches, to support their better understanding and use in the study of the hypersaline microbiome.
Affiliation(s)
- Kiran Dindhoria
- Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology Palampur, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
| | - Vivek Manyapu
- Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology Palampur, Palampur, Himachal Pradesh, India
| | - Ashif Ali
- Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology Palampur, Palampur, Himachal Pradesh, India
| | - Rakshak Kumar
- Biotechnology Division, CSIR-Institute of Himalayan Bioresource Technology Palampur, Palampur, Himachal Pradesh, India
- Academy of Scientific and Innovative Research (AcSIR), Ghaziabad, India
| |
|
21
|
Mian E, Petrucci E, Pizzi C, Comin M. MISSH: Fast Hashing of Multiple Spaced Seeds. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2024; 21:2330-2339. [PMID: 39320990 DOI: 10.1109/tcbb.2024.3467368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/27/2024]
Abstract
Alignment-free analysis of sequences has revolutionized the high-throughput processing of sequencing data within numerous bioinformatics pipelines. Hashing k-mers represents a common function across various alignment-free applications, serving as a crucial tool for indexing, querying, and rapid similarity searching. More recently, spaced seeds, a specialized pattern that accommodates errors or mutations, have become a standard choice over traditional k-mers. Spaced seeds offer enhanced sensitivity in many applications when compared to k-mers. However, it's important to note that hashing spaced seeds significantly increases computational time. Furthermore, if multiple spaced seeds are employed, accuracy can be further improved, albeit at the expense of longer processing times. This paper addresses the challenge of efficiently hashing multiple spaced seeds. The proposed algorithms leverage the similarity of adjacent spaced seed hash values within an input sequence, allowing for the swift computation of subsequent hashes. Our experimental results, conducted across various tests, demonstrate a remarkable performance improvement over previously suggested algorithms, with potential speedups of up to 20 times. Additionally, we apply these efficient spaced seed hashing algorithms to a metagenomic application, specifically the classification of reads using Clark-S (Ounit and Lonardi, 2016). Our findings reveal a substantial speedup, effectively mitigating the slowdown caused by the utilization of multiple spaced seeds.
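To make the notion of a spaced seed concrete, the following is a minimal sketch of the naive baseline that fast spaced-seed hashing methods aim to speed up. It is not the MISSH algorithm. The 2-bit nucleotide encoding, the function name `spaced_seed_hashes`, and the example seed `101` are illustrative assumptions, not details taken from the paper.

```python
# Illustrative 2-bit nucleotide encoding (an assumption, not from the paper).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def spaced_seed_hashes(seq, seed):
    """Naively hash every window of `seq` under the spaced seed `seed`.

    `seed` is a string of '1' (position contributes to the hash) and
    '0' (position is ignored, so a mismatch there does not change the hash).
    Every window is hashed from scratch, which is the cost that faster
    methods try to avoid by reusing work between adjacent windows.
    """
    care = [i for i, c in enumerate(seed) if c == "1"]
    hashes = []
    for start in range(len(seq) - len(seed) + 1):
        h = 0
        for i in care:
            h = (h << 2) | ENCODE[seq[start + i]]
        hashes.append(h)
    return hashes


if __name__ == "__main__":
    print(spaced_seed_hashes("ACGTAC", "101"))
    # "ACG" and "ATG" differ only at the ignored middle position.
    print(spaced_seed_hashes("ACG", "101") == spaced_seed_hashes("ATG", "101"))
```

With several seeds, this naive approach repeats the full loop once per seed, which is why hashing multiple spaced seeds becomes a bottleneck.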
|
22
|
Ma J, Yang X, He J. Comprehensive gut microbiota composition and microbial interactions among the three age groups. PLoS One 2024; 19:e0305583. [PMID: 39423213 PMCID: PMC11488730 DOI: 10.1371/journal.pone.0305583] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Accepted: 06/02/2024] [Indexed: 10/21/2024] Open
Abstract
There is a growing interest in studying the microbiota associated with aging by integrating multiple longevity studies while minimizing the influence of confounding factors. Here, we reprocessed metagenomic sequencing data from four different aging research studies and evaluated potential confounding factors in order to minimize the batch effect. Subsequently, we detected the diversity and abundance of the gut microbiome in three different age cohorts. Out of 1053 different bacterial species, only four showed substantial depletion across different age groups: Ligilactobacillus ruminis, Turicibacter sp. H121, Blautia massiliensis, and Anaerostipes hadrus. Archaea accumulated more in young individuals compared to elderly and centenarians. Candida albicans was more prevalent in centenarians, but Nakaseomyces glabratus (also known as Candida glabrata) was more common in elderly adults. Shuimuvirus IME207 showed a significant increase in centenarians compared to both control groups. In addition, we utilized a Fisher's exact test to investigate topological properties of differentially abundant microbiota in the co-occurrence network of each age group. Microbial signatures specific to different age stages were identified based on the condition that the reads showing differential abundance were higher than in the other age groups. Lastly, we selected Methanosarcina sp. Kolksee for the Y group, Prevotella copri for the E group and Shuimuvirus IME207 for the C group as representatives of age-related characteristics to study how their interactions change during the aging process. Our results provide crucial insights into the gut microbiome's ecological dynamics in relation to the aging process.
Affiliation(s)
- Jun Ma
- School of Food and Biological Engineering, Shaanxi University of Science and Technology, Xi’an, Shaanxi, China
- Guangdong Provincial Key Laboratory of Microbial Safety and Health, State Key Laboratory of Applied Microbiology Southern China, Institute of Microbiology, Guangdong Academy of Sciences, Guangzhou, Guangdong, China
| | - Xiaohua Yang
- Pulmonary and Critical Care Medicine, Tongchuan People’s Hospital, Tongchuan, Shaanxi, China
| | - Jianwu He
- Pulmonary and Critical Care Medicine, Tongchuan People’s Hospital, Tongchuan, Shaanxi, China
| |
|
23
|
Şapcı AOB, Mirarab S. Memory-bound k-mer selection for large and evolutionarily diverse reference libraries. Genome Res 2024; 34:1455-1467. [PMID: 39209553 PMCID: PMC11529837 DOI: 10.1101/gr.279339.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2024] [Accepted: 08/06/2024] [Indexed: 09/04/2024]
Abstract
Using k-mers to find sequence matches is increasingly used in many bioinformatic applications, including metagenomic sequence classification. The accuracy of these downstream applications relies on the density of the reference databases, which are rapidly growing. Although the increased density provides hope for improvements in accuracy, scalability is a concern. Reference k-mers are kept in the memory during the query time, and saving all k-mers of these ever-expanding databases is fast becoming impractical. Several strategies for subsampling have been proposed, including minimizers and finding taxon-specific k-mers. However, we contend that these strategies are inadequate, especially when reference sets are taxonomically imbalanced, as are most microbial libraries. In this paper, we explore approaches for selecting a fixed-size subset of k-mers present in an ultra-large data set to include in a library such that the classification of reads suffers the least. Our experiments demonstrate the limitations of existing approaches, especially for novel and poorly sampled groups. We propose a library construction algorithm called k-mer RANKer (KRANK) that combines several components, including a hierarchical selection strategy with adaptive size restrictions and an equitable coverage strategy. We implement KRANK in highly optimized code and combine it with the locality-sensitive hashing classifier CONSULT-II to build a taxonomic classification and profiling method. On several benchmarks, KRANK k-mer selection significantly reduces memory consumption with minimal loss in classification accuracy. We show in extensive analyses based on CAMI benchmarks that KRANK outperforms k-mer-based alternatives in terms of taxonomic profiling and comes close to the best marker-based methods in terms of accuracy.
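As an illustration of one of the existing subsampling strategies the abstract mentions, here is a minimal sketch of minimizer selection. It is not KRANK, which the abstract presents as an alternative to this kind of approach. The function name `minimizers`, the lexicographic ordering, and the parameters `k` and `w` are assumptions chosen for brevity; practical tools usually order k-mers by a hash value instead.

```python
def minimizers(seq, k, w):
    """Return the set of (position, k-mer) minimizers of `seq`.

    Each window of `w` consecutive k-mers contributes its lexicographically
    smallest k-mer (the leftmost one on ties). Adjacent windows often agree
    on the same minimizer, so far fewer k-mers are kept than exist in `seq`.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = kmers[start:start + w]
        offset = min(range(w), key=lambda j: window[j])
        selected.add((start + offset, window[offset]))
    return selected


if __name__ == "__main__":
    kept = minimizers("ACGTACGT", k=3, w=3)
    print(sorted(kept))  # a subset of the 6 k-mers in the sequence
```

The selection depends only on the local sequence and takes no account of how well each taxon is represented in the library, which is the kind of imbalance the abstract raises as a concern.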
Affiliation(s)
- Ali Osman Berk Şapcı
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, California 92093, USA
| | - Siavash Mirarab
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, California 92093, USA;
- Department of Electrical and Computer Engineering, University of California, San Diego, California 92093, USA
| |
|
24
|
Lu R, Dumonceaux T, Anzar M, Zovoilis A, Antonation K, Barker D, Corbett C, Nadon C, Robertson J, Eagle SHC, Lung O, Rudar J, Surujballi O, Laing C. MNBC: a multithreaded Minimizer-based Naïve Bayes Classifier for improved metagenomic sequence classification. Bioinformatics 2024; 40:btae601. [PMID: 39388213 PMCID: PMC11522871 DOI: 10.1093/bioinformatics/btae601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Revised: 10/03/2024] [Accepted: 10/08/2024] [Indexed: 10/15/2024] Open
Abstract
MOTIVATION State-of-the-art tools for classifying metagenomic sequencing reads provide both rapid and accurate options, although the combination of both in a single tool is a constantly improving area of research. The machine learning-based Naïve Bayes Classifier (NBC) approach provides a theoretical basis for accurate classification of all reads in a sample. RESULTS We developed the multithreaded Minimizer-based Naïve Bayes Classifier (MNBC) tool to improve the NBC approach by applying minimizers, as well as plurality voting for closely related classification scores. A standard reference- and test-sequence framework using simulated variable-length reads benchmarked MNBC with six other state-of-the-art tools: MetaMaps, Ganon, Kraken2, KrakenUniq, CLARK, and Centrifuge. We also applied MNBC to the "marine" and "strain-madness" short-read metagenomic datasets in the Critical Assessment of Metagenome Interpretation (CAMI) II challenge using a corresponding database from the time. MNBC efficiently identified reads from unknown microorganisms, and exhibited the highest species- and genus-level precision and recall on short reads, as well as the highest species-level precision on long reads. It also achieved the highest accuracy on the "strain-madness" dataset. AVAILABILITY AND IMPLEMENTATION MNBC is freely available at: https://github.com/ComputationalPathogens/MNBC.
Affiliation(s)
- Ruipeng Lu
- National Centre for Animal Disease, Canadian Food Inspection Agency, Lethbridge County, AB, T1J 5R7, Canada
| | - Tim Dumonceaux
- Saskatoon Research and Development Centre, Agriculture and Agri-Food Canada, Saskatoon, SK, S7N 0X2, Canada
| | - Muhammad Anzar
- Saskatoon Research and Development Centre, Agriculture and Agri-Food Canada, Saskatoon, SK, S7N 0X2, Canada
| | - Athanasios Zovoilis
- Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, R3E 0J9, Canada
| | - Kym Antonation
- National Microbiology Laboratory at Winnipeg, Public Health Agency of Canada, Winnipeg, MB, R3E 3M4, Canada
| | - Dillon Barker
- National Microbiology Laboratory at Winnipeg, Public Health Agency of Canada, Winnipeg, MB, R3E 3M4, Canada
| | - Cindi Corbett
- National Microbiology Laboratory at Winnipeg, Public Health Agency of Canada, Winnipeg, MB, R3E 3M4, Canada
| | - Celine Nadon
- National Microbiology Laboratory at Winnipeg, Public Health Agency of Canada, Winnipeg, MB, R3E 3M4, Canada
| | - James Robertson
- National Microbiology Laboratory at Guelph, Public Health Agency of Canada, Guelph, ON, N1G 3W4, Canada
| | - Shannon H C Eagle
- National Microbiology Laboratory at Guelph, Public Health Agency of Canada, Guelph, ON, N1G 3W4, Canada
| | - Oliver Lung
- National Centre for Foreign Animal Disease, Canadian Food Inspection Agency, Winnipeg, MB, R3E 3M4, Canada
| | - Josip Rudar
- National Centre for Foreign Animal Disease, Canadian Food Inspection Agency, Winnipeg, MB, R3E 3M4, Canada
| | - Om Surujballi
- Ottawa Animal Health Laboratory, Canadian Food Inspection Agency, Ottawa, ON, K2J 4S1, Canada
| | - Chad Laing
- National Centre for Animal Disease, Canadian Food Inspection Agency, Lethbridge County, AB, T1J 5R7, Canada
| |
|
25
|
Do VH, Nguyen VS, Nguyen SH, Le DQ, Nguyen TT, Nguyen CH, Ho TH, Vo NS, Nguyen T, Nguyen HA, Cao MD. PanKA: Leveraging population pangenome to predict antibiotic resistance. iScience 2024; 27:110623. [PMID: 39228791 PMCID: PMC11369404 DOI: 10.1016/j.isci.2024.110623] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 04/14/2024] [Accepted: 07/29/2024] [Indexed: 09/05/2024] Open
Abstract
Machine learning has the potential to be a powerful tool in the fight against antimicrobial resistance (AMR), a critical global health issue. Machine learning can identify resistance mechanisms from DNA sequence data without prior knowledge. The first step in building a machine learning model is a feature extraction from sequencing data. Traditional methods like single nucleotide polymorphism (SNP) calling and k-mer counting yield numerous, often redundant features, complicating prediction and analysis. In this paper, we propose PanKA, a method using the pangenome to extract a concise set of relevant features for predicting AMR. PanKA not only enables fast model training and prediction but also improves accuracy. Applied to the Escherichia coli and Klebsiella pneumoniae bacterial species, our model is more accurate than conventional and state-of-the-art methods in predicting AMR.
Affiliation(s)
- Van Hoan Do
- Center for Applied Mathematics and Informatics, Le Quy Don Technical University, Hanoi, Vietnam
| | - Van Sang Nguyen
- Center for Biomedical Informatics, Vingroup Big Data Institute, Hanoi, Vietnam
| | | | - Duc Quang Le
- Faculty of IT, Hanoi University of Civil Engineering, Hanoi, Vietnam
| | - Tam Thi Nguyen
- Oxford University Clinical Research Unit, Hanoi, Vietnam
| | - Canh Hao Nguyen
- Bioinformatics Center, Institute for Chemical Research, Kyoto University, Kyoto, Japan
| | - Tho Huu Ho
- Department of Medical Microbiology, The 103 Military Hospital, Vietnam Military Medical University, Hanoi, Vietnam
- Department of Genomics & Cytogenetics, Institute of Biomedicine & Pharmacy, Vietnam Military Medical University, Hanoi, Vietnam
| | - Nam S. Vo
- Center for Biomedical Informatics, Vingroup Big Data Institute, Hanoi, Vietnam
| | | | | | | |
|
26
|
Marini S, Barquero A, Wadhwani AA, Bian J, Ruiz J, Boucher C, Prosperi M. OCTOPUS: Disk-based, Multiplatform, Mobile-friendly Metagenomics Classifier. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.15.585215. [PMID: 38559026 PMCID: PMC10979967 DOI: 10.1101/2024.03.15.585215] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Portable genomic sequencers such as Oxford Nanopore's MinION enable real-time applications in clinical and environmental health. However, there is a bottleneck in the downstream analytics when bioinformatics pipelines are unavailable, e.g., when cloud processing is unreachable due to absence of Internet connection, or only low-end computing devices can be carried on site. Here we present a platform-friendly software for portable metagenomic analysis of Nanopore data, the Oligomer-based Classifier of Taxonomic Operational and Pan-genome Units via Singletons (OCTOPUS). OCTOPUS is written in Java, reimplements several features of the popular Kraken2 and KrakenUniq software, with original components for improving metagenomics classification on incomplete/sampled reference databases, making it ideal for running on smartphones or tablets. OCTOPUS obtains sensitivity and precision comparable to Kraken2, while dramatically decreasing (4- to 16-fold) the false positive rate, and yielding high correlation on real-world data. OCTOPUS is available along with customized databases at https://github.com/DataIntellSystLab/OCTOPUS and https://github.com/Ruiz-HCI-Lab/OctopusMobile.
Affiliation(s)
- Simone Marini
- Department of Epidemiology, University of Florida, Gainesville, USA
- Emerging Pathogens Institute, University of Florida, Gainesville, USA
| | - Alexander Barquero
- Department of Computer and Information Science and Engineering, University of Florida, USA
| | - Anisha Ashok Wadhwani
- Department of Computer and Information Science and Engineering, University of Florida, USA
| | - Jiang Bian
- Department of Health Outcomes and Biomedical Informatics, University of Florida, USA
| | - Jaime Ruiz
- Department of Computer and Information Science and Engineering, University of Florida, USA
| | - Christina Boucher
- Department of Computer and Information Science and Engineering, University of Florida, USA
| | - Mattia Prosperi
- Department of Epidemiology, University of Florida, Gainesville, USA
| |
|
27
|
Phumiphanjarphak W, Aiewsakun P. Entourage: all-in-one sequence analysis software for genome assembly, virus detection, virus discovery, and intrasample variation profiling. BMC Bioinformatics 2024; 25:222. [PMID: 38914932 PMCID: PMC11197340 DOI: 10.1186/s12859-024-05846-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Accepted: 06/14/2024] [Indexed: 06/26/2024] Open
Abstract
BACKGROUND Pan-virus detection, and virome investigation in general, can be challenging, mainly due to the lack of universally conserved genetic elements in viruses. Metagenomic next-generation sequencing can offer a promising solution to this problem by providing an unbiased overview of the microbial community, enabling detection of any viruses without prior target selection. However, a major challenge in utilising metagenomic next-generation sequencing for virome investigation is that data analysis can be highly complex, involving numerous data processing steps. RESULTS Here, we present Entourage to address this challenge. Entourage enables short-read sequence assembly, viral sequence search with or without reference virus targets using contig-based approaches, and intrasample sequence variation quantification. Several workflows are implemented in Entourage to facilitate end-to-end virus sequence detection analysis through a single command line, from read cleaning, sequence assembly, to virus sequence searching. The results generated are comprehensive, allowing for thorough quality control, reliability assessment, and interpretation. We illustrate Entourage's utility as a streamlined workflow for virus detection by employing it to comprehensively search for target virus sequences and beyond in raw sequence read data generated from HeLa cell culture samples spiked with viruses. Furthermore, we showcase its flexibility and performance on a real-world dataset by analysing a preassembled Tara Oceans dataset. Overall, our results show that Entourage performs well even with low virus sequencing depth in single digits, and it can be used to discover novel viruses effectively. Additionally, by using sequence data generated from a patient with chronic SARS-CoV-2 infection, we demonstrate Entourage's capability to quantify virus intrasample genetic variations, and generate publication-quality figures illustrating the results. 
CONCLUSIONS Entourage is an all-in-one, versatile, and streamlined bioinformatics software for virome investigation, developed with a focus on ease of use. Entourage is available at https://codeberg.org/CENMIG/Entourage under the MIT license.
Affiliation(s)
- Worakorn Phumiphanjarphak
- Department of Microbiology, Faculty of Science, Mahidol University, Ratchathewi District, 272 Rama VI Road, Bangkok, 10400, Thailand
- Pornchai Matangkasombut Center for Microbial Genomics, Department of Microbiology, Faculty of Science, Mahidol University, Bangkok, Thailand
| | - Pakorn Aiewsakun
- Department of Microbiology, Faculty of Science, Mahidol University, Ratchathewi District, 272 Rama VI Road, Bangkok, 10400, Thailand.
- Pornchai Matangkasombut Center for Microbial Genomics, Department of Microbiology, Faculty of Science, Mahidol University, Bangkok, Thailand.
| |
|
28
|
Schmid MW, Moradi A, Leigh DM, Schuman MC, van Moorsel SJ. Covering the bases: Population genomic structure of Lemna minor and the cryptic species L. japonica in Switzerland. Ecol Evol 2024; 14:e11599. [PMID: 38882534 PMCID: PMC11178436 DOI: 10.1002/ece3.11599] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 05/27/2024] [Accepted: 06/03/2024] [Indexed: 06/18/2024] Open
Abstract
Duckweeds, including the common duckweed Lemna minor, are increasingly used to test eco-evolutionary theories. Yet, despite its popularity and near-global distribution, the understanding of its population structure (and genetic variation therein) is still limited. It is essential that this is resolved, because of the impact genetic diversity has on experimental responses and scientific understanding. Through whole-genome sequencing, we assessed the genetic diversity and population genomic structure of 23 natural Lemna spp. populations from their natural range in Switzerland. We used two distinct analytical approaches, a reference-free k-mer approach and the classical reference-based one. Two genetic clusters were identified across the described species distribution of L. minor, surprisingly corresponding to species-level divisions. The first cluster contained the targeted L. minor individuals and the second contained individuals from a cryptic species: Lemna japonica. Within the L. minor cluster, we identified a well-defined population structure with little intra-population genetic diversity (i.e., within ponds) but high inter-population diversity (i.e., between ponds). In L. japonica, the population structure was significantly weaker and genetic variation between a subset of populations was as low as within populations. This study revealed that L. japonica is more widespread than previously thought. Our findings signify that thorough genotype-to-phenotype analyses are needed in duckweed experimental ecology and evolution.
Affiliation(s)
| | - Aboubakr Moradi
- Department of Geography University of Zurich Zurich Switzerland
- Department of Chemistry University of Zurich Zurich Switzerland
| | - Deborah M Leigh
- Swiss Federal Research Institute WSL Birmensdorf Switzerland
| | - Meredith C Schuman
- Department of Geography University of Zurich Zurich Switzerland
- Department of Chemistry University of Zurich Zurich Switzerland
| | | |
|
29
|
Tian Q, Zhang P, Zhai Y, Wang Y, Zou Q. Application and Comparison of Machine Learning and Database-Based Methods in Taxonomic Classification of High-Throughput Sequencing Data. Genome Biol Evol 2024; 16:evae102. [PMID: 38748485 PMCID: PMC11135637 DOI: 10.1093/gbe/evae102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/12/2024] [Indexed: 05/30/2024] Open
Abstract
The advent of high-throughput sequencing technologies has not only revolutionized the field of bioinformatics but has also heightened the demand for efficient taxonomic classification. Despite technological advancements, efficiently processing and analyzing the deluge of sequencing data for precise taxonomic classification remains a formidable challenge. Existing classification approaches primarily fall into two categories, database-based methods and machine learning methods, each presenting its own set of challenges and advantages. On this basis, the aim of our study was to conduct a comparative analysis between these two methods while also investigating the merits of integrating multiple database-based methods. Through an in-depth comparative study, we evaluated the performance of both methodological categories in taxonomic classification by utilizing simulated data sets. Our analysis revealed that database-based methods excel in classification accuracy when backed by a rich and comprehensive reference database. Conversely, while machine learning methods show superior performance in scenarios where reference sequences are sparse or lacking, they generally show inferior performance compared with database methods under most conditions. Moreover, our study confirms that integrating multiple database-based methods does, in fact, enhance classification accuracy. These findings shed new light on the taxonomic classification of high-throughput sequencing data and bear substantial implications for the future development of computational biology. For those interested in further exploring our methods, the source code of this study is publicly available on https://github.com/LoadStar822/Genome-Classifier-Performance-Evaluator. Additionally, a dedicated webpage showcasing our collected database, data sets, and various classification software can be found at http://lab.malab.cn/~tqz/project/taxonomic/.
Affiliation(s)
- Qinzhong Tian
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003 China
| | - Pinglu Zhang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003 China
| | - Yixiao Zhai
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003 China
| | - Yansu Wang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003 China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003 China
| |
|
30
|
Song L, Langmead B. Centrifuger: lossless compression of microbial genomes for efficient and accurate metagenomic sequence classification. Genome Biol 2024; 25:106. [PMID: 38664753 PMCID: PMC11046777 DOI: 10.1186/s13059-024-03244-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Accepted: 04/10/2024] [Indexed: 04/28/2024] Open
Abstract
Centrifuger is an efficient taxonomic classification method that compares sequencing reads against a microbial genome database. In Centrifuger, the Burrows-Wheeler transformed genome sequences are losslessly compressed using a novel scheme called run-block compression. Run-block compression achieves sublinear space complexity and is effective at compressing diverse microbial databases like RefSeq while supporting fast rank queries. Combining this compression method with other strategies for compacting the Ferragina-Manzini (FM) index, Centrifuger reduces the memory footprint by half compared to other FM-index-based approaches. Furthermore, the lossless compression and the unconstrained match length help Centrifuger achieve greater accuracy than competing methods at lower taxonomic levels.
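For readers unfamiliar with why a Burrows-Wheeler transformed sequence compresses well, the sketch below computes the transform of a toy string and run-length encodes it. This is plain run-length encoding for illustration only. It is not Centrifuger's run-block compression scheme, and the function names `bwt` and `run_lengths` and the `$` sentinel are assumptions.

```python
from itertools import groupby


def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations.

    Quadratic in time and memory, so suitable for toy inputs only.
    `sentinel` must not occur in `text` and must sort before every
    character of `text`.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def run_lengths(s):
    """Collapse `s` into (character, run length) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)
    print(run_lengths(transformed))
```

The transform tends to place identical characters next to each other when the input is repetitive, so the output has fewer and longer runs. Compressed FM-index designs build on this property while still having to support fast rank queries.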
Affiliation(s)
- Li Song
- Department of Biomedical Data Science, Dartmouth College, Hanover, NH, USA.
- Department of Computer Science, Dartmouth College, Hanover, NH, USA.
- Department of Microbiology and Immunology, Dartmouth College, Hanover, NH, USA.
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| |
|
31
|
Zheng A, Shaw J, Yu YW. Mora: abundance aware metagenomic read re-assignment for disentangling similar strains. BMC Bioinformatics 2024; 25:161. [PMID: 38649836 PMCID: PMC11035124 DOI: 10.1186/s12859-024-05768-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Accepted: 04/05/2024] [Indexed: 04/25/2024] Open
Abstract
BACKGROUND: Taxonomic classification of reads obtained by metagenomic sequencing is often a first step for understanding a microbial community, but correctly assigning sequencing reads to the strain or sub-species level has remained a challenging computational problem. RESULTS: We introduce Mora, a MetagenOmic read Re-Assignment algorithm capable of assigning short and long metagenomic reads with high precision, even at the strain level. Mora is able to accurately re-assign reads by first estimating abundances through an expectation-maximization algorithm and then utilizing abundance information to re-assign query reads. The key idea behind Mora is to maximize read re-assignment qualities while simultaneously minimizing the difference from estimated abundance levels, allowing Mora to avoid over-assigning reads to the same genomes. On simulated diverse reads, this allows Mora to achieve F1 scores comparable to other algorithms with a shorter runtime. However, Mora significantly outshines other algorithms on very similar reads. We show that the high penalty for over-assigning reads to a common reference genome allows Mora to accurately infer correct strains for real data in the form of E. coli reads. CONCLUSIONS: Mora is a fast and accurate read re-assignment algorithm that is modularized, allowing it to be incorporated into general metagenomics and genomics workflows. It is freely available at https://github.com/AfZheng126/MORA.
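The abundance-estimation step mentioned in this abstract can be pictured with a generic expectation-maximization loop. This is a sketch under stated assumptions, not Mora's implementation: the input format (one list of per-genome compatibility scores per read) and the function name `em_abundances` are hypothetical, and the re-assignment step that uses the abundances is not shown.

```python
def em_abundances(read_scores, n_genomes, n_iter=100):
    """Estimate relative genome abundances from ambiguous read assignments.

    read_scores[r][g] is an assumed compatibility score (0 if read r cannot
    come from genome g). Returns a list of abundances that sums to 1.
    """
    theta = [1.0 / n_genomes] * n_genomes
    for _ in range(n_iter):
        expected = [0.0] * n_genomes
        # E-step: split each read across genomes in proportion to
        # current abundance times compatibility score.
        for scores in read_scores:
            weights = [theta[g] * scores[g] for g in range(n_genomes)]
            total = sum(weights)
            if total == 0:
                continue  # read matches nothing; it carries no information
            for g in range(n_genomes):
                expected[g] += weights[g] / total
        # M-step: abundances are the normalized expected read counts.
        norm = sum(expected)
        if norm == 0:
            break
        theta = [e / norm for e in expected]
    return theta


if __name__ == "__main__":
    # 6 reads unique to genome 0, 2 unique to genome 1, 4 ambiguous.
    reads = [[1, 0]] * 6 + [[0, 1]] * 2 + [[1, 1]] * 4
    print(em_abundances(reads, n_genomes=2))
```

In this toy input the ambiguous reads end up shared in the same 3:1 ratio as the unique reads, which is the behavior an abundance-aware re-assignment can then exploit.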
Affiliation(s)
- Andrew Zheng
- Mathematics, University of Toronto, 27 King's College Circle, Toronto, Ontario, M3R 0A3, Canada
| | - Jim Shaw
- Mathematics, University of Toronto, 27 King's College Circle, Toronto, Ontario, M3R 0A3, Canada.
| | - Yun William Yu
- Mathematics, University of Toronto, 27 King's College Circle, Toronto, Ontario, M3R 0A3, Canada.
- Computer and Mathematical Sciences, University of Toronto at Scarborough, 1265 Military Trail, Toronto, Ontario, M1C 1A4, Canada.
- Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, Pennsylvania, 15213, USA.
|
32
|
Pham DT, Phan V. MetaBIDx: a new computational approach to bacteria identification in microbiomes. Microbiome Research Reports 2024; 3:25. [PMID: 38841411 PMCID: PMC11149084 DOI: 10.20517/mrr.2024.01] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2024] [Revised: 03/04/2024] [Accepted: 03/25/2024] [Indexed: 06/07/2024]
Abstract
Objectives: This study introduces MetaBIDx, a computational method designed to enhance species prediction in metagenomic environments. The method addresses the challenge of accurate species identification in complex microbiomes, which is due to the large number of generated reads and the ever-expanding number of bacterial genomes. Bacterial identification is essential for disease diagnosis and tracing outbreaks associated with microbial infections. Methods: MetaBIDx utilizes a modified Bloom filter for efficient indexing of reference genomes and incorporates a novel strategy for reducing false positives by clustering species based on their genomic coverages by identified reads. The approach was evaluated and compared with several well-established tools across various datasets. Precision, recall, and F1-score were used to quantify the accuracy of species prediction. Results: MetaBIDx demonstrated superior performance compared to other tools, especially in terms of precision and F1-score. The application of clustering based on approximate coverages significantly improved precision in species identification, effectively minimizing false positives. We further demonstrated that other methods can also benefit from our approach to removing false positives by clustering species based on approximate coverages. Conclusion: With a novel approach to reducing false positives and the effective use of a modified Bloom filter to index species, MetaBIDx represents an advancement in metagenomic analysis. The findings suggest that the proposed approach could also benefit other metagenomic tools, indicating its potential for broader application in the field. The study lays the groundwork for future improvements in computational efficiency and the expansion of microbial databases.
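For readers unfamiliar with the data structure named in this abstract, the following is a minimal sketch of a standard Bloom filter over k-mers. MetaBIDx uses a modified Bloom filter whose details are not given here, so this block shows only the textbook version; the class name, the bit-array size, and the use of salted BLAKE2b as the hash family are assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Textbook Bloom filter: no false negatives, tunable false-positive rate."""

    def __init__(self, n_bits=1 << 16, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        # Derive n_hashes independent positions by salting one hash function.
        for seed in range(self.n_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.n_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item):
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item)
        )


def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


if __name__ == "__main__":
    bf = BloomFilter(n_bits=1 << 12)
    for km in kmers("ACGTACGTTAGCCGATAGCTAGCTAGGCTA", 8):
        bf.add(km)
    print("ACGTACGT" in bf)
```

A membership test can return a false positive but never a false negative, which is why the abstract pairs the filter with a separate false-positive reduction step.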
Affiliation(s)
| | - Vinhthuy Phan
- Department of Computer Science, University of Memphis, Memphis, TN 38152, USA
|
33
|
Wang B, Ma B, Zhang Y, Stirling E, Yan Q, He Z, Liu Z, Yuan X, Zhang H. Global diversity, coexistence and consequences of resistome in inland waters. Water Research 2024; 253:121253. [PMID: 38350193 DOI: 10.1016/j.watres.2024.121253] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/09/2023] [Revised: 01/04/2024] [Accepted: 02/01/2024] [Indexed: 02/15/2024]
Abstract
Human activities have long impacted the health of Earth's rivers and lakes. These inland waters, crucial for our survival and productivity, have suffered from contamination, which allows the formation and spread of antibiotic resistance genes (ARGs) and, consequently, ARG-carrying pathogens (APs). Yet, our global understanding of waterborne pathogen antibiotic resistance remains in its infancy. To shed light on this, our study examined 1240 metagenomic samples from both open and closed inland waters. We identified 22 types of ARGs, 19 types of mobile genetic elements (MGEs), and 14 types of virulence factors (VFs). Our findings showed that open waters have a higher average abundance and richness of ARGs, MGEs, and VFs, with a more robust co-occurrence network compared to closed waters. Out of the samples studied, 321 APs were detected, representing a 43% detection rate. Of these, the resistance gene 'bacA' was the most predominant. Notably, AP hotspots were identified in regions including East Asia, India, Western Europe, the eastern United States, and Brazil. Our research underscores how human activities profoundly influence the diversity and spread of the resistome. It also emphasizes that both abiotic and biotic factors play pivotal roles in the emergence of ARG-carrying pathogens.
Affiliation(s)
- Binhao Wang
- School of Engineering, Hangzhou Normal University, Hangzhou 310018, PR China
| | - Bin Ma
- Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, PR China; Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou 310058, PR China
| | - Yinan Zhang
- School of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou 310036, PR China
| | - Erinne Stirling
- Agriculture and Food, Commonwealth Scientific and Industrial Research Organisation, Adelaide 5064, Australia; School of Biological Sciences, The University of Adelaide, Adelaide 5005, Australia
| | - Qingyun Yan
- Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai 519080, PR China
| | - Zhili He
- Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai 519080, PR China
| | - Zhiquan Liu
- School of Engineering, Hangzhou Normal University, Hangzhou 310018, PR China
| | - Xia Yuan
- School of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou 310036, PR China
| | - Hangjun Zhang
- School of Engineering, Hangzhou Normal University, Hangzhou 310018, PR China; Hangzhou International Urbanology Research Center and Center for Zhejiang Urban Governance Studies, Hangzhou, 311121, PR China.
|
34
|
Şapcı AOB, Rachtman E, Mirarab S. CONSULT-II: accurate taxonomic identification and profiling using locality-sensitive hashing. Bioinformatics 2024; 40:btae150. [PMID: 38492564 PMCID: PMC10985673 DOI: 10.1093/bioinformatics/btae150] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 02/17/2024] [Accepted: 03/14/2024] [Indexed: 03/18/2024]
Abstract
MOTIVATION: Taxonomic classification of short reads and taxonomic profiling of metagenomic samples are well-studied yet challenging problems. The presence of species belonging to groups without close representation in a reference dataset is particularly challenging. While k-mer-based methods have performed well in terms of running time and accuracy, they tend to have reduced accuracy for such novel species. Thus, there is a growing need for methods that combine the scalability of k-mers with increased sensitivity. RESULTS: Here, we show that using locality-sensitive hashing (LSH) can increase the sensitivity of the k-mer-based search. Our method, which combines LSH with several heuristic techniques including soft lowest common ancestor labeling and voting, is more accurate than alternatives in both taxonomic classification of individual reads and abundance profiling. AVAILABILITY AND IMPLEMENTATION: CONSULT-II is implemented in C++, and the software, together with reference libraries, is publicly available on GitHub at https://github.com/bo1929/CONSULT-II.
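To make the idea of inexact k-mer lookup concrete, here is a small sketch of locality-sensitive hashing by bit sampling, a common LSH family for Hamming distance. The abstract does not state which hash family or parameters CONSULT-II uses, so the class `LshIndex`, its parameters, and the candidate-filtering step are illustrative assumptions; the soft lowest-common-ancestor labeling and voting heuristics are not shown.

```python
import random


def hamming(x, y):
    """Number of mismatching positions between two equal-length strings."""
    return sum(a != b for a, b in zip(x, y))


class LshIndex:
    """Bit-sampling LSH: each table keys a k-mer by a random subset of positions."""

    def __init__(self, k, n_tables=4, n_positions=8, seed=0):
        rng = random.Random(seed)
        self.masks = [sorted(rng.sample(range(k), n_positions))
                      for _ in range(n_tables)]
        self.tables = [dict() for _ in self.masks]

    def _keys(self, kmer):
        return ["".join(kmer[i] for i in mask) for mask in self.masks]

    def add(self, kmer, label):
        for table, key in zip(self.tables, self._keys(kmer)):
            table.setdefault(key, set()).add((kmer, label))

    def query(self, kmer, max_dist):
        """Labels of indexed k-mers that collide in some table and lie
        within max_dist mismatches of the query."""
        hits = set()
        for table, key in zip(self.tables, self._keys(kmer)):
            for ref, label in table.get(key, ()):
                if hamming(ref, kmer) <= max_dist:
                    hits.add(label)
        return hits


if __name__ == "__main__":
    index = LshIndex(k=16)
    index.add("ACGTACGTTAGCCGAT", "taxon_A")
    print(index.query("ACGTACGTTAGCCGAT", max_dist=2))
```

Two k-mers collide in a table whenever their mismatches fall outside that table's sampled positions, so using several tables raises the chance that a near match is found without scanning the whole reference.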
Affiliation(s)
- Ali Osman Berk Şapcı
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, CA 92093, United States
| | - Eleonora Rachtman
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, CA 92093, United States
| | - Siavash Mirarab
- Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, CA 92093, United States
- Department of Electrical and Computer Engineering, University of California, San Diego, CA 92093, United States
|
35
|
Kan CM, Tsang HF, Pei XM, Ng SSM, Yim AKY, Yu ACS, Wong SCC. Enhancing Clinical Utility: Utilization of International Standards and Guidelines for Metagenomic Sequencing in Infectious Disease Diagnosis. Int J Mol Sci 2024; 25:3333. [PMID: 38542307 PMCID: PMC10970082 DOI: 10.3390/ijms25063333] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/02/2024] [Revised: 03/11/2024] [Accepted: 03/12/2024] [Indexed: 11/11/2024]
Abstract
Metagenomic sequencing has emerged as a transformative tool in infectious disease diagnosis, offering a comprehensive and unbiased approach to pathogen detection. Leveraging international standards and guidelines is essential for ensuring the quality and reliability of metagenomic sequencing in clinical practice. This review explores the implications of international standards and guidelines for the application of metagenomic sequencing in infectious disease diagnosis. By adhering to established standards, such as those outlined by regulatory bodies and expert consensus, healthcare providers can enhance the accuracy and clinical utility of metagenomic sequencing. The integration of international standards and guidelines into metagenomic sequencing workflows can streamline diagnostic processes, improve pathogen identification, and optimize patient care. Strategies in implementing these standards for infectious disease diagnosis using metagenomic sequencing are discussed, highlighting the importance of standardized approaches in advancing precision infectious disease diagnosis initiatives.
Affiliation(s)
- Chau-Ming Kan
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China; (C.-M.K.); (H.F.T.)
| | - Hin Fung Tsang
- Department of Health Technology and Informatics, The Hong Kong Polytechnic University, Hong Kong, China; (C.-M.K.); (H.F.T.)
| | - Xiao Meng Pei
- Department of Applied Biology & Chemical Technology, The Hong Kong Polytechnic University, Hong Kong, China;
| | - Simon Siu Man Ng
- Department of Surgery, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong, China;
| | | | - Allen Chi-Shing Yu
- Codex Genetics Limited, Shatin, Hong Kong, China; (A.K.-Y.Y.); (A.C.-S.Y.)
| | - Sze Chuen Cesar Wong
- Department of Applied Biology & Chemical Technology, The Hong Kong Polytechnic University, Hong Kong, China;
|
36
|
Podda M, Bonechi S, Palladino A, Scaramuzzino M, Brozzi A, Roma G, Muzzi A, Priami C, Sîrbu A, Bodini M. Classification of Neisseria meningitidis genomes with a bag-of-words approach and machine learning. iScience 2024; 27:109257. [PMID: 38439962 PMCID: PMC10910294 DOI: 10.1016/j.isci.2024.109257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2023] [Revised: 12/13/2023] [Accepted: 02/13/2024] [Indexed: 03/06/2024]
Abstract
Whole genome sequencing of bacteria is important to enable strain classification. Using entire genomes as an input to machine learning (ML) models would allow rapid classification of strains while using information from multiple genetic elements. We developed a "bag-of-words" approach to encode entire bacterial genomes, using SentencePiece or k-mer tokenization, and analyze them with ML. Initial model selection identified SentencePiece with 8,000 and 32,000 words as the best approach for genome tokenization. We then classified, in Neisseria meningitidis genomes, the capsule B group genotype with 99.6% accuracy and the multifactor invasive phenotype with 90.2% accuracy in an independent test set. Subsequently, in silico knockouts of 2,808 genes confirmed that the ML model predictions aligned with our current understanding of the underlying biology. To our knowledge, this is the first ML method using entire bacterial genomes to classify strains and identify genes considered relevant by the classifier.
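The k-mer variant of the bag-of-words encoding described above can be sketched in a few lines. This is an illustrative assumption about the encoding (a fixed vocabulary of all 4^k nucleotide words, with raw counts as features), not the authors' pipeline; the SentencePiece variant that the abstract reports as best is not shown, and `kmer_bag` is a hypothetical name.

```python
from collections import Counter
from itertools import product


def kmer_bag(genome, k):
    """Encode a genome as a count vector over all 4**k nucleotide k-mers.

    The vector is ordered lexicographically (AA, AC, AG, AT, CA, ...), so
    every genome maps to the same fixed-length feature layout.
    """
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    vocabulary = ["".join(word) for word in product("ACGT", repeat=k)]
    return [counts.get(word, 0) for word in vocabulary]


if __name__ == "__main__":
    print(kmer_bag("ACGTAC", 2))
```

Because word order is discarded, the resulting vector can be passed to any standard classifier that accepts fixed-length numeric features.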
Affiliation(s)
- Marco Podda
- Vaccines Discovery Data Sciences, GSK Vaccines, GSK, 53100 Siena, Italy
| | - Simone Bonechi
- Vaccines Discovery Data Sciences, GSK Vaccines, GSK, 53100 Siena, Italy
- Department of Computer Science, University of Pisa, 56127 Pisa, Italy
| | - Andrea Palladino
- Vaccines Discovery Data Sciences, GSK Vaccines, GSK, 53100 Siena, Italy
| | | | - Alessandro Brozzi
- Vaccines Discovery Data Sciences, GSK Vaccines, GSK, 53100 Siena, Italy
| | - Guglielmo Roma
- Vaccines Discovery Data Sciences, GSK Vaccines, GSK, 53100 Siena, Italy
| | - Alessandro Muzzi
- Vaccines Discovery Data Sciences, GSK Vaccines, GSK, 53100 Siena, Italy
| | - Corrado Priami
- Department of Computer Science, University of Pisa, 56127 Pisa, Italy
| | - Alina Sîrbu
- Department of Computer Science, University of Pisa, 56127 Pisa, Italy
| | - Margherita Bodini
- Vaccines Discovery Data Sciences, GSK Vaccines, GSK, 53100 Siena, Italy
|
37
|
Sawada Y, Minei R, Tabata H, Ikemura T, Wada K, Wada Y, Nagata H, Iwasaki Y. Unsupervised AI reveals insect species-specific genome signatures. PeerJ 2024; 12:e17025. [PMID: 38464746 PMCID: PMC10924456 DOI: 10.7717/peerj.17025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/17/2023] [Accepted: 02/07/2024] [Indexed: 03/12/2024]
Abstract
Insects are a highly diverse phylogeny and possess a wide variety of traits, including the presence or absence of wings and metamorphosis. These diverse traits are of great interest for studying genome evolution, and numerous comparative genomic studies have examined a wide phylogenetic range of insects. Here, we analyzed 22 insects belonging to a wide phylogenetic range (Endopterygota, Paraneoptera, Polyneoptera, Palaeoptera, and other insects) by using a batch-learning self-organizing map (BLSOM) for oligonucleotide compositions in their genomic fragments (100-kb or 1-Mb sequences), which is an unsupervised machine learning algorithm that can extract species-specific characteristics of the oligonucleotide compositions (genome signatures). The genome signature is of particular interest in terms of the mechanisms and biological significance that have caused the species-specific difference, and can be used as a powerful search needle to explore the various roles of genome sequences other than protein coding, and can be used to unveil mysteries hidden in the genome sequence. Since BLSOM is an unsupervised clustering method, the clustering of sequences was performed based on the oligonucleotide composition alone, without providing information about the species from which each fragment sequence was derived. Therefore, not only the interspecies separation, but also the intraspecies separation can be achieved. Here, we have revealed the specific genomic regions with oligonucleotide compositions distinct from the usual sequences of each insect genome, e.g., Mb-level structures found for a grasshopper Schistocerca americana. One aim of this study was to compare the genome characteristics of insects with those of vertebrates, especially humans, which are phylogenetically distant from insects. Recently, humans seem to be the "model organism" for which a large amount of information has been accumulated using a variety of cutting-edge and high-throughput technologies. Therefore, it is reasonable to use the abundant information from humans to study insect lineages. The specific regions of Mb length with distinct oligonucleotide compositions have also been previously observed in the human genome. These regions were enriched by transcription factor binding motifs (TFBSs) and hypothesized to be involved in the three-dimensional arrangement of chromosomal DNA in interphase nuclei. The present study characterized the species-specific oligonucleotide compositions (i.e., genome signatures) in insect genomes and identified specific genomic regions with distinct oligonucleotide compositions.
Affiliation(s)
- Yui Sawada
- Department of Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama-shi, Tamura-cho, Japan
| | - Ryuhei Minei
- Department of Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama-shi, Tamura-cho, Japan
| | - Hiromasa Tabata
- Department of Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama-shi, Tamura-cho, Japan
| | - Toshimichi Ikemura
- Department of Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama-shi, Tamura-cho, Japan
| | - Kennosuke Wada
- Department of Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama-shi, Tamura-cho, Japan
| | - Yoshiko Wada
- Department of Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama-shi, Tamura-cho, Japan
| | - Hiroshi Nagata
- Department of Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama-shi, Tamura-cho, Japan
| | - Yuki Iwasaki
- Department of Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama-shi, Tamura-cho, Japan
|
38
|
Masenya K, Manganyi MC, Dikobe TB. Exploring Cereal Metagenomics: Unravelling Microbial Communities for Improved Food Security. Microorganisms 2024; 12:510. [PMID: 38543562 PMCID: PMC10974370 DOI: 10.3390/microorganisms12030510] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/15/2024] [Revised: 02/21/2024] [Accepted: 02/28/2024] [Indexed: 11/12/2024]
Abstract
Food security is an urgent global challenge, with cereals playing a crucial role in meeting the nutritional requirements of populations worldwide. In recent years, the field of metagenomics has emerged as a powerful tool for studying the microbial communities associated with cereal crops and their impact on plant health and growth. This chapter aims to provide a comprehensive overview of cereal metagenomics and its role in enhancing food security through the exploration of beneficial and pathogenic microbial interactions. Furthermore, we will examine how the integration of metagenomics with other tools can effectively address the adverse effects on food security. For this purpose, we discuss the integration of metagenomic data and machine learning in providing novel insights into the dynamic interactions shaping plant-microbe relationships. We also shed light on the potential applications of leveraging microbial diversity and epigenetic modifications in improving crop resilience and yield sustainability. Ultimately, cereal metagenomics has revolutionized the field of food security by harnessing the potential of beneficial interactions between cereals and their microbiota, paving the way for sustainable agricultural practices.
Affiliation(s)
- Kedibone Masenya
- National Zoological Gardens, South African National Biodiversity Institute, 32 Boom St., Pretoria 0001, South Africa
| | - Madira Coutlyne Manganyi
- Department of Biological and Environmental Sciences, Sefako Makgatho Health Sciences University, P.O. Box 139, Pretoria 0204, South Africa;
| | - Tshegofatso Bridget Dikobe
- Department of Botany, School of Biological Sciences, North-West University, Private Bag X2046, Mmabatho 2735, South Africa;
|
39
|
Diener C, Gibbons SM. Metagenomic estimation of dietary intake from human stool. bioRxiv 2024:2024.02.02.578701. [PMID: 38370672 PMCID: PMC10871216 DOI: 10.1101/2024.02.02.578701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/20/2024]
Abstract
Dietary intake is tightly coupled to gut microbiota composition, human metabolism, and to the incidence of virtually all major chronic diseases. Dietary and nutrient intake are usually quantified using dietary questionnaires, which tend to focus on broad food categories, suffer from self-reporting biases, and require strong compliance from study participants. Here, we present MEDI (Metagenomic Estimation of Dietary Intake): a method for quantifying dietary intake using food-derived DNA in stool metagenomes. We show that food items can be accurately detected in metagenomic shotgun sequencing data, even when present at low abundances (>10 reads). Furthermore, we show how dietary intake, in terms of DNA abundance from specific organisms, can be converted into a detailed metabolic representation of nutrient intake. MEDI could identify the onset of solid food consumption in infants and it accurately predicted food questionnaire responses in an adult population. Additionally, we were able to identify specific dietary features associated with metabolic syndrome in a large clinical cohort, providing a proof-of-concept for detailed quantification of individual-specific dietary patterns without the need for questionnaires.
Affiliation(s)
- Christian Diener
- Diagnostic and Research Institute of Hygiene, Microbiology and Environmental Medicine, Medical University of Graz, Graz, Austria
- Institute for Systems Biology, Seattle, WA, USA
| | - Sean M. Gibbons
- Institute for Systems Biology, Seattle, WA, USA
- Department of Bioengineering, University of Washington, Seattle, WA, USA
- Department of Genome Sciences, University of Washington, Seattle, WA, USA
- eScience Institute, University of Washington, Seattle, WA, USA
|
40
|
Bálint B, Merényi Z, Hegedüs B, Grigoriev IV, Hou Z, Földi C, Nagy LG. ContScout: sensitive detection and removal of contamination from annotated genomes. Nat Commun 2024; 15:936. [PMID: 38296951 PMCID: PMC10831095 DOI: 10.1038/s41467-024-45024-5] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 01/06/2023] [Accepted: 01/08/2024] [Indexed: 02/02/2024]
Abstract
Contamination of genomes is an increasingly recognized problem affecting several downstream applications, from comparative evolutionary genomics to metagenomics. Here we introduce ContScout, a precise tool for eliminating foreign sequences from annotated genomes. It achieves high specificity and sensitivity on synthetic benchmark data even when the contaminant is a closely related species, outperforms competing tools, and can distinguish horizontal gene transfer from contamination. A screen of 844 eukaryotic genomes for contamination identified bacteria as the most common source, followed by fungi and plants. Furthermore, we show that contaminants in ancestral genome reconstructions lead to erroneous early origins of genes and inflate gene loss rates, leading to a false notion of complex ancestral genomes. Taken together, we offer here a tool for sensitive removal of foreign proteins, identify and remove contaminants from diverse eukaryotic genomes and evaluate their impact on phylogenomic analyses.
Affiliation(s)
- Balázs Bálint
- Synthetic and Systems Biology Unit, HUN-REN Biological Research Centre, Szeged, Szeged, 6726, Hungary
| | - Zsolt Merényi
- Synthetic and Systems Biology Unit, HUN-REN Biological Research Centre, Szeged, Szeged, 6726, Hungary
| | - Botond Hegedüs
- Synthetic and Systems Biology Unit, HUN-REN Biological Research Centre, Szeged, Szeged, 6726, Hungary
| | - Igor V Grigoriev
- U.S. Department of Energy Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Department of Plant and Microbial Biology, University of California Berkeley, Berkeley, CA, 94720, USA
| | - Zhihao Hou
- Synthetic and Systems Biology Unit, HUN-REN Biological Research Centre, Szeged, Szeged, 6726, Hungary
- Doctoral School of Biology, Faculty of Science and Informatics, University of Szeged, Szeged, 6720, Hungary
| | - Csenge Földi
- Synthetic and Systems Biology Unit, HUN-REN Biological Research Centre, Szeged, Szeged, 6726, Hungary
- Doctoral School of Biology, Faculty of Science and Informatics, University of Szeged, Szeged, 6720, Hungary
| | - László G Nagy
- Synthetic and Systems Biology Unit, HUN-REN Biological Research Centre, Szeged, Szeged, 6726, Hungary.
|
41
|
Fan J, Khan J, Singh NP, Pibiri GE, Patro R. Fulgor: a fast and compact k-mer index for large-scale matching and color queries. Algorithms Mol Biol 2024; 19:3. [PMID: 38254124 PMCID: PMC10810250 DOI: 10.1186/s13015-024-00251-9] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 10/31/2023] [Accepted: 01/03/2024] [Indexed: 01/24/2024]
Abstract
The problem of sequence identification or matching (determining the subset of reference sequences from a given collection that are likely to contain a short, queried nucleotide sequence) is relevant for many important tasks in Computational Biology, such as metagenomics and pangenome analysis. Due to the complex nature of such analyses and the large scale of the reference collections, a resource-efficient solution to this problem is of utmost importance. This poses the threefold challenge of representing the reference collection with a data structure that is efficient to query, has light memory usage, and scales well to large collections. To solve this problem, we describe an efficient colored de Bruijn graph index, arising as the combination of a k-mer dictionary with a compressed inverted index. The proposed index takes full advantage of the fact that unitigs in the colored compacted de Bruijn graph are monochromatic (i.e., all k-mers in a unitig have the same set of references of origin, or color). Specifically, the unitigs are kept in the dictionary in color order, thereby allowing for the encoding of the map from k-mers to their colors in as little as 1 + o(1) bits per unitig. Hence, one color per unitig is stored in the index with almost no space/time overhead. By combining this property with simple but effective compression methods for integer lists, the index achieves very small space. We implement these methods in a tool called Fulgor, and conduct an extensive experimental analysis to demonstrate the improvement of our tool over previous solutions. For example, compared to Themisto (the strongest competitor in terms of index space vs. query time trade-off), Fulgor requires significantly less space (up to 43% less space for a collection of 150,000 Salmonella enterica genomes), is at least twice as fast for color queries, and is 2-6× faster to construct.
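The combination of a k-mer dictionary with an inverted index of colors can be pictured with a toy, uncompressed version. This sketch is not Fulgor's data structure: it uses plain Python dictionaries, stores no unitigs, and applies no integer-list compression. The function names and the full-intersection query rule are assumptions chosen for illustration; the one idea it keeps is that each distinct color (set of reference ids) is stored once and shared by every k-mer that has it.

```python
def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_index(references, k):
    """Map each k-mer to the id of its color (the set of references containing it).

    Distinct colors are stored once in color_sets; k-mers only keep an id.
    """
    kmer_to_refs = {}
    for ref_id, seq in enumerate(references):
        for km in kmers(seq, k):
            kmer_to_refs.setdefault(km, set()).add(ref_id)
    color_sets, color_ids, kmer_to_color = [], {}, {}
    for km, refs in kmer_to_refs.items():
        color = tuple(sorted(refs))
        if color not in color_ids:
            color_ids[color] = len(color_sets)
            color_sets.append(color)
        kmer_to_color[km] = color_ids[color]
    return kmer_to_color, color_sets


def query(seq, k, kmer_to_color, color_sets):
    """References that contain every k-mer of seq (intersection of colors)."""
    result = None
    for km in kmers(seq, k):
        color_id = kmer_to_color.get(km)
        if color_id is None:
            return set()
        refs = set(color_sets[color_id])
        result = refs if result is None else result & refs
    return result or set()


if __name__ == "__main__":
    refs = ["ACGTACGT", "ACGTTTTT", "GGGGACGT"]
    index, colors = build_index(refs, k=4)
    print(sorted(query("ACGTA", 4, index, colors)))
```

In the real index, consecutive k-mers along a unitig share one color, which is what allows the k-mer-to-color map to be encoded far more compactly than the per-k-mer dictionary used here.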
Affiliation(s)
- Jason Fan
- Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
| | - Jamshed Khan
- Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
| | - Noor Pratap Singh
- Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
| | | | - Rob Patro
- Department of Computer Science, University of Maryland, College Park, MD, 20742, USA.
|
42
|
Marić J, Križanović K, Riondet S, Nagarajan N, Šikić M. Comparative analysis of metagenomic classifiers for long-read sequencing datasets. BMC Bioinformatics 2024; 25:15. [PMID: 38212694 PMCID: PMC10782538 DOI: 10.1186/s12859-024-05634-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 05/21/2023] [Accepted: 01/02/2024] [Indexed: 01/13/2024]
Abstract
BACKGROUND: Long reads have gained popularity in the analysis of metagenomics data. Therefore, we comprehensively assessed metagenomics classification tools on the species taxonomic level. We analysed kmer-based tools, mapping-based tools and two general-purpose long reads mappers. We evaluated more than 20 pipelines which use either nucleotide or protein databases and selected 13 for an extensive benchmark. We prepared seven synthetic datasets to test various scenarios, including the presence of a host, unknown species and related species. Moreover, we used available sequencing data from three well-defined mock communities, including a dataset with abundance varying from 0.0001 to 20%, and six real gut microbiomes. RESULTS: General-purpose mappers Minimap2 and Ram achieved similar or better accuracy on most testing metrics than the best-performing classification tools. They were up to ten times slower than the fastest kmer-based tools, while requiring up to four times less RAM. All tested tools were prone to report organisms not present in datasets, except CLARK-S, and they underperformed in the case of the high presence of the host's genetic material. Tools which use a protein database performed worse than those based on a nucleotide database. Longer read lengths made classification easier, but due to the difference in read length distributions among species, the usage of only the longest reads reduced the accuracy. The comparison of real gut microbiome datasets shows similar abundance profiles for the same type of tools but discordance in the number of reported organisms and abundances between types. Most assessments showed the influence of database completeness on the reports. CONCLUSION: The findings indicate that kmer-based tools are well-suited for rapid analysis of long reads data. However, when heightened accuracy is essential, mappers demonstrate slightly superior performance, albeit at a considerably slower pace. Nevertheless, a combination of diverse categories of tools and databases will likely be necessary to analyse complex samples. Discrepancies observed among tools when applied to real gut datasets, as well as a reduced performance in cases where unknown species or a significant proportion of the host genome is present in the sample, highlight the need for continuous improvement of existing tools. Additionally, regular updates and curation of databases are important to ensure their effectiveness.
Affiliation(s)
- Josip Marić
- Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000, Zagreb, Croatia
| | - Krešimir Križanović
- Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000, Zagreb, Croatia
| | - Sylvain Riondet
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Republic of Singapore
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117596, Republic of Singapore
| | - Niranjan Nagarajan
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Republic of Singapore.
- Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117596, Republic of Singapore.
| | - Mile Šikić
- Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000, Zagreb, Croatia.
- Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Republic of Singapore.
|
43
|
Alvarez RV, Landsman D. GTax: improving de novo transcriptome assembly by removing foreign RNA contamination. Genome Biol 2024; 25:12. [PMID: 38191464 PMCID: PMC10773103 DOI: 10.1186/s13059-023-03141-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Accepted: 12/08/2023] [Indexed: 01/10/2024] Open
Abstract
The cost and complexity of generating a complete reference genome means that many organisms lack an annotated reference. An alternative is to use a de novo reference transcriptome. This technology is cost-effective but is susceptible to off-target RNA contamination. In this manuscript, we present GTax, a taxonomy-structured database of genomic sequences that can be used with BLAST to detect and remove foreign contamination in RNA sequencing samples before assembly. In addition, we use a de novo transcriptome assembly of Solanum lycopersicum (tomato) to demonstrate that removing foreign contamination in sequencing samples reduces the number of assembled chimeric transcripts.
Affiliation(s)
- Roberto Vera Alvarez
- Computational Biology Branch, National Center for Biotechnology Information, Intramural Research Program, National Library of Medicine, NIH, Bethesda, MD, USA
| | - David Landsman
- Computational Biology Branch, National Center for Biotechnology Information, Intramural Research Program, National Library of Medicine, NIH, Bethesda, MD, USA.
|
44
|
Zhao Y, Huang F, Wang W, Gao R, Fan L, Wang A, Gao SH. Application of high-throughput sequencing technologies and analytical tools for pathogen detection in urban water systems: Progress and future perspectives. THE SCIENCE OF THE TOTAL ENVIRONMENT 2023; 900:165867. [PMID: 37516185 DOI: 10.1016/j.scitotenv.2023.165867] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Revised: 07/25/2023] [Accepted: 07/26/2023] [Indexed: 07/31/2023]
Abstract
The ubiquitous presence of pathogenic microorganisms, such as viruses, bacteria, fungi, and protozoa, in urban water systems poses a significant risk to public health. The emergence of infectious waterborne diseases mediated by urban water systems has become one of the leading global causes of mortality. However, the detection and monitoring of these pathogenic microorganisms have been limited by the complexity and diversity in the environmental samples. Conventional methods were restricted by long assay time, high benchmarks of identification, and narrow application scenarios. Novel technologies, such as high-throughput sequencing technologies, enable potentially full-spectrum detection of trace pathogenic microorganisms in complex environmental matrices. This review discusses the current state of high-throughput sequencing technologies for identifying pathogenic microorganisms in urban water systems with a concise summary. Furthermore, future perspectives in pathogen research emphasize the need for detection methods with high accuracy and sensitivity, the establishment of precise detection standards and procedures, and the significance of bioinformatics software and platforms. We have compiled a list of pathogen analysis software/platforms/databases that boast robust engines and high accuracy for reference. We highlight the significance of analyses by combining targeted and non-targeted sequencing technologies, short and long reads technologies, sequencing technologies, and bioinformatic tools in pursuing upgraded biosafety in urban water systems.
Affiliation(s)
- Yanmei Zhao
- State Key Laboratory of Urban Water Resource and Environment, School of Civil & Environmental Engineering, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China
| | - Fang Huang
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
| | - Wenxiu Wang
- Department of Ocean Science and Engineering, Southern University of Science and Technology (SUSTech), Shenzhen, China.
| | - Rui Gao
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
| | - Lu Fan
- Department of Ocean Science and Engineering, Southern University of Science and Technology (SUSTech), Shenzhen, China; Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou, China
| | - Aijie Wang
- State Key Laboratory of Urban Water Resource and Environment, School of Civil & Environmental Engineering, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China; State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin 150090, China
| | - Shu-Hong Gao
- State Key Laboratory of Urban Water Resource and Environment, School of Civil & Environmental Engineering, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China.
|
45
|
Song L, Langmead B. Centrifuger: lossless compression of microbial genomes for efficient and accurate metagenomic sequence classification. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.15.567129. [PMID: 38014029 PMCID: PMC10680779 DOI: 10.1101/2023.11.15.567129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Centrifuger is an efficient taxonomic classification method that compares sequencing reads against a microbial genome database. In Centrifuger, the Burrows-Wheeler transformed genome sequences are losslessly compressed using a novel scheme called run-block compression. Run-block compression achieves sublinear space complexity and is effective at compressing diverse microbial databases like RefSeq while supporting fast rank queries. Combining this compression method with other strategies for compacting the Ferragina-Manzini (FM) index, Centrifuger reduces the memory footprint by half compared to other FM-index-based approaches. Furthermore, the lossless compression and the unconstrained match length help Centrifuger achieve greater accuracy than competing methods at lower taxonomic levels.
Affiliation(s)
- Li Song
- Department of Biomedical Data Science, Dartmouth College, Hanover, NH
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, MD
|
46
|
Mandal RK, Mandal A, Denny JE, Namazii R, John CC, Schmidt NW. Gut Bacteroides act in a microbial consortium to cause susceptibility to severe malaria. Nat Commun 2023; 14:6465. [PMID: 37833304 PMCID: PMC10575898 DOI: 10.1038/s41467-023-42235-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 04/18/2023] [Accepted: 10/03/2023] [Indexed: 10/15/2023] Open
Abstract
Malaria is caused by Plasmodium species and remains a significant cause of morbidity and mortality globally. Gut bacteria can influence the severity of malaria, but the contribution of specific bacteria to the risk of severe malaria is unknown. Here, multiomics approaches demonstrate that specific species of Bacteroides are causally linked to the risk of severe malaria. Plasmodium yoelii hyperparasitemia-resistant mice gavaged with murine-isolated Bacteroides fragilis develop P. yoelii hyperparasitemia. Moreover, Bacteroides are significantly more abundant in Ugandan children with severe malarial anemia than with asymptomatic P. falciparum infection. Human isolates of Bacteroides caccae, Bacteroides uniformis, and Bacteroides ovatus were able to cause susceptibility to severe malaria in mice. While monocolonization of germ-free mice with Bacteroides alone is insufficient to cause susceptibility to hyperparasitemia, meta-analysis across multiple studies supports a main role for Bacteroides in susceptibility to severe malaria. Approaches that target gut Bacteroides present an opportunity to prevent severe malaria and associated deaths.
Affiliation(s)
- Rabindra K Mandal
- Ryan White Center for Pediatric Infectious Diseases and Global Health, Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Anita Mandal
- Ryan White Center for Pediatric Infectious Diseases and Global Health, Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Joshua E Denny
- Department of Microbiology and Immunology, University of Louisville, Louisville, KY, USA
| | - Ruth Namazii
- Department of Paediatrics and Child Health, Makerere University, Kampala, Uganda
| | - Chandy C John
- Ryan White Center for Pediatric Infectious Diseases and Global Health, Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA
| | - Nathan W Schmidt
- Ryan White Center for Pediatric Infectious Diseases and Global Health, Herman B Wells Center for Pediatric Research, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, USA.
- Department of Microbiology and Immunology, University of Louisville, Louisville, KY, USA.
|
47
|
Rumbavicius I, Rounge TB, Rognes T. HoCoRT: host contamination removal tool. BMC Bioinformatics 2023; 24:371. [PMID: 37784008 PMCID: PMC10544359 DOI: 10.1186/s12859-023-05492-w] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/25/2023] [Accepted: 09/21/2023] [Indexed: 10/04/2023] Open
Abstract
BACKGROUND Shotgun metagenome sequencing data obtained from a host environment will usually be contaminated with sequences from the host organism. Host sequences should be removed before further analysis to avoid biases, reduce downstream computational load, or ensure privacy in the case of a human host. The tools we identified as designed specifically for host contamination sequence removal were either outdated, not maintained, or complicated to use. Consequently, we have developed HoCoRT, a fast and user-friendly tool that implements several methods for optimised host sequence removal. We have evaluated the speed and accuracy of these methods. RESULTS HoCoRT is an open-source command-line tool for host contamination removal. It is designed to be easy to install and use, offering a one-step option for genome indexing. HoCoRT employs a variety of well-known mapping, classification, and alignment methods to classify reads. The user can select the underlying classification method and its parameters, allowing adaptation to different scenarios. Based on our investigation of various methods and parameters using synthetic human gut and oral microbiomes, and on assessment of publicly available data, we provide recommendations for typical datasets with short and long reads. CONCLUSIONS To decontaminate a human gut microbiome with short reads using HoCoRT, we found the optimal combination of speed and accuracy with BioBloom, Bowtie2 in end-to-end mode, and HISAT2. Kraken2 consistently demonstrated the highest speed, albeit with a trade-off in accuracy. The same applies to an oral microbiome, but here Bowtie2 was notably slower than the other tools. For long reads, the detection of human host reads is more difficult. In this case, a combination of Kraken2 and Minimap2 achieved the highest accuracy and detected 59% of human reads. 
In comparison to the dedicated DeconSeq tool, HoCoRT using Bowtie2 in end-to-end mode proved considerably faster and slightly more accurate. HoCoRT is available as a Bioconda package, and the source code can be accessed at https://github.com/ignasrum/hocort along with the documentation. It is released under the MIT licence and is compatible with Linux and macOS (except for the BioBloom module).
Affiliation(s)
- Ignas Rumbavicius
- Centre for Bioinformatics, Department of Informatics, University of Oslo, PO Box 1080 Blindern, 0316, Oslo, Norway
| | - Trine B Rounge
- Centre for Bioinformatics, Department of Pharmacy, University of Oslo, PO Box 1068 Blindern, 0316, Oslo, Norway.
- Cancer Registry of Norway, PO Box 5313 Majorstuen, 0304, Oslo, Norway.
| | - Torbjørn Rognes
- Centre for Bioinformatics, Department of Informatics, University of Oslo, PO Box 1080 Blindern, 0316, Oslo, Norway.
- Department of Microbiology, Oslo University Hospital, PO Box 4950 Nydalen, 0424, Oslo, Norway.
|
48
|
Pusadkar V, Azad RK. Benchmarking Metagenomic Classifiers on Simulated Ancient and Modern Metagenomic Data. Microorganisms 2023; 11:2478. [PMID: 37894136 PMCID: PMC10609333 DOI: 10.3390/microorganisms11102478] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 09/28/2023] [Accepted: 09/29/2023] [Indexed: 10/29/2023] Open
Abstract
Taxonomic profiling of ancient metagenomic samples is challenging due to the accumulation of specific damage patterns on DNA over time. Although a number of methods for metagenome profiling have been developed, most of them have been assessed on modern metagenomes or simulated metagenomes mimicking modern metagenomes. Further, a comparative assessment of metagenome profilers on simulated metagenomes representing a spectrum of degradation depth, from the extremity of ancient (most degraded) to current or modern (not degraded) metagenomes, has not yet been performed. To understand the strengths and weaknesses of different metagenome profilers, we performed their comprehensive evaluation on simulated metagenomes representing human dental calculus microbiome, with the level of DNA damage successively raised to mimic modern to ancient metagenomes. All classes of profilers, namely, DNA-to-DNA, DNA-to-protein, and DNA-to-marker comparison-based profilers were evaluated on metagenomes with varying levels of damage simulating deamination, fragmentation, and contamination. Our results revealed that, compared to deamination and fragmentation, human and environmental contamination of ancient DNA (with modern DNA) has the most pronounced effect on the performance of each profiler. Further, the DNA-to-DNA (e.g., Kraken2, Bracken) and DNA-to-marker (e.g., MetaPhlAn4) based profiling approaches showed complementary strengths, which can be leveraged to elevate the state-of-the-art of ancient metagenome profiling.
Affiliation(s)
- Vaidehi Pusadkar
- Department of Biological Sciences, University of North Texas, Denton, TX 76203, USA
- BioDiscovery Institute, University of North Texas, Denton, TX 76203, USA
| | - Rajeev K. Azad
- Department of Biological Sciences, University of North Texas, Denton, TX 76203, USA
- BioDiscovery Institute, University of North Texas, Denton, TX 76203, USA
- Department of Mathematics, University of North Texas, Denton, TX 76203, USA
|
49
|
Wang B, Xu J, Wang Y, Stirling E, Zhao K, Lu C, Tan X, Kong D, Yan Q, He Z, Ruan Y, Ma B. Tackling Soil ARG-Carrying Pathogens with Global-Scale Metagenomics. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2023; 10:e2301980. [PMID: 37424042 PMCID: PMC10502870 DOI: 10.1002/advs.202301980] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 03/28/2023] [Revised: 06/11/2023] [Indexed: 07/11/2023]
Abstract
Antibiotic overuse and the subsequent environmental contamination of residual antibiotics pose a public health crisis via an acceleration in the spread of antibiotic resistance genes (ARGs) through horizontal gene transfer. Although the occurrence, distribution, and driving factors of ARGs in soils have been widely investigated, little is known about the antibiotic resistance of soilborne pathogens at a global scale. To explore this gap, contigs from 1643 globally sourced metagenomes are assembled, yielding 407 ARG-carrying pathogens (APs) with at least one ARG; APs are detected in 1443 samples (sample detection rate of 87.8%). The richness of APs is greater in agricultural soils (with a median of 20) than in non-agricultural ecosystems. Agricultural soils possess a high prevalence of clinical APs affiliated with Escherichia, Enterobacter, Streptococcus, and Enterococcus. The APs detected in agricultural soils tend to coexist with multidrug resistance genes and bacA. A global map of soil AP richness is generated, where anthropogenic and climatic factors explained AP hot spots in East Asia, South Asia, and the eastern United States. The results herein advance the understanding of the global distribution of soil APs and determine regions prioritized to control soilborne APs worldwide.
Affiliation(s)
- Binhao Wang
- Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, P. R. China
| | - Jianming Xu
- Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, P. R. China
| | - Yiling Wang
- Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou 310058, P. R. China
| | - Erinne Stirling
- Agriculture and Food, Commonwealth Scientific and Industrial Research Organization, Adelaide 5064, Australia
- School of Biological Sciences, The University of Adelaide, Adelaide 5005, Australia
| | - Kankan Zhao
- Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou 310058, P. R. China
| | - Caiyu Lu
- Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou 310058, P. R. China
| | - Xiangfeng Tan
- Institute of Digital Agriculture, Zhejiang Academy of Agricultural Sciences, Hangzhou 310021, P. R. China
- Xianghu Laboratory, Hangzhou, Zhejiang 311200, P. R. China
| | - Dedong Kong
- Institute of Digital Agriculture, Zhejiang Academy of Agricultural Sciences, Hangzhou 310021, P. R. China
- Xianghu Laboratory, Hangzhou, Zhejiang 311200, P. R. China
| | - Qingyun Yan
- Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai 519080, P. R. China
| | - Zhili He
- Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Zhuhai 519080, P. R. China
| | - Yunjie Ruan
- Institute of Agricultural Bio‐Environmental Engineering, College of Bio‐Systems Engineering and Food Science, Zhejiang University, Hangzhou 310058, P. R. China
- The Rural Development Academy, Zhejiang University, Hangzhou 310058, P. R. China
| | - Bin Ma
- Zhejiang Provincial Key Laboratory of Agricultural Resources and Environment, Institute of Soil and Water Resources and Environmental Science, College of Environmental and Resource Sciences, Zhejiang University, Hangzhou 310058, P. R. China
- Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou 310058, P. R. China
|
50
|
Keenum I, Player R, Kralj J, Servetas S, Sussman MD, Russell JA, Stone J, Chandrapati S, Sozhamannan S. Amplicon Sequencing Minimal Information (ASqMI): Quality and Reporting Guidelines for Actionable Calls in Biodefense Applications. J AOAC Int 2023; 106:1424-1430. [PMID: 37067472 PMCID: PMC10472743 DOI: 10.1093/jaoacint/qsad047] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2023] [Revised: 03/30/2023] [Accepted: 04/07/2023] [Indexed: 04/18/2023]
Abstract
BACKGROUND Accurate, high-confidence data is critical for assessing potential biothreat incidents. In a biothreat event, false-negative and -positive results have serious consequences. Worst case scenarios can result in unnecessary shutdowns or fatalities at an exorbitant monetary and psychological cost, respectively. Quantitative PCR assays for agents of interest have been successfully used for routine biosurveillance. Recently, there has been increased impetus for adoption of amplicon sequencing (AS) for biosurveillance because it enables discrimination of true positives from near-neighbor false positives, as well as broad, simultaneous detection of many targets in many pathogens in a high-throughput scheme. However, the high sensitivity of AS can lead to false positives. Appropriate controls and workflow reporting can help address these challenges. OBJECTIVES Data reporting standards are critical to data trustworthiness. The standards presented herein aim to provide a framework for method quality assessment in biodetection. METHODS We present a set of standards, Amplicon Sequencing Minimal Information (ASqMI), developed under the auspices of the AOAC INTERNATIONAL Stakeholder Program on Agent Detection Assays for making actionable calls in biosurveillance applications. In addition to the first minimum information guidelines for AS, we provide a controls checklist and scoring scheme to assure AS run quality and assess potential sample contamination. RESULTS Adoption of the ASqMI guidelines will improve data quality, help track workflow performance, and ultimately provide decision makers confidence to trust the results of this new and powerful technology. CONCLUSION AS workflows can provide robust, confident calls for biodetection; however, due diligence in reporting and controls is needed. The ASqMI guideline is the first AS minimum reporting guidance document that also provides the means for end users to evaluate their workflows to improve confidence. 
HIGHLIGHTS Standardized reporting guidance for actionable calls is critical to ensuring trustworthy data.
Affiliation(s)
- Ishi Keenum
- National Institute of Standards and Technology, Biosystems and Biomaterials Division, Complex Microbial Systems Group, Gaithersburg, MD 20899, USA
| | - Robert Player
- The Johns Hopkins University, Applied Physics Laboratory, Laurel, MD 20723, USA
- Datirium, LLC, Cincinnati, OH 45526, USA
| | - Jason Kralj
- National Institute of Standards and Technology, Biosystems and Biomaterials Division, Complex Microbial Systems Group, Gaithersburg, MD 20899, USA
| | - Stephanie Servetas
- National Institute of Standards and Technology, Biosystems and Biomaterials Division, Complex Microbial Systems Group, Gaithersburg, MD 20899, USA
| | - Michael D Sussman
- US Department of Agriculture, Agricultural Analytics Division, Livestock and Poultry Programs, Agricultural Marketing Service, Washington, DC 20250 USA
| | | | | | | | - Shanmuga Sozhamannan
- Joint Program Executive Office for Chemical, Biological, Radiological and Nuclear Defense (JPEO-CBRND), Joint Project Lead for CBRND Enabling Biotechnologies (JPL CBRND EB), Frederick, MD 21702, USA
- Joint Research and Development, Inc., Stafford, VA 22556, USA
|