1
Jia X, Hong L, Wang Y, Zhang Q, Wang Y, Jia M, Luo Y, Wang T, Ye J, Wang H. Effect of microbial diversity and their functions on soil nutrient cycling in the rhizosphere zone of Dahongpao mother tree and cutting Dahongpao. FRONTIERS IN PLANT SCIENCE 2025; 16:1574020. [PMID: 40406725 PMCID: PMC12095365 DOI: 10.3389/fpls.2025.1574020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2025] [Accepted: 04/08/2025] [Indexed: 05/26/2025]
Abstract
Dahongpao mother tree (Camellia sinensis) is nearly 400 years old and is the symbol of Wuyi rock tea. It is unclear whether the structure and function of the rhizosphere soil microbial community differ between the Dahongpao mother tree (MD) and cutting-propagated Dahongpao (PD) after planting. In this study, metagenomics (macrogenomics) was used to analyze the structure and function of rhizosphere soil microbial communities, as well as to explore their relationship with soil nutrient transformations in MD and PD tea trees. The results showed that pH, total nitrogen, total phosphorus, total potassium, available nitrogen, available phosphorus and available potassium were significantly higher in the rhizosphere soil of MD than in PD, by 1.22-, 3.24-, 5.38-, 1.10-, 1.52-, 4.42- and 1.17-fold, respectively. Soil urease, sucrase, protease, cellulase and catalase activities were also significantly higher in MD than in PD, by 1.25-, 2.95-, 1.14-, 1.23- and 1.30-fold, respectively. Metagenomic analysis showed that rhizosphere soil microbial richness and diversity were higher in MD than in PD. Eight characteristic microorganisms differed significantly between MD and PD rhizosphere soils, and functional analysis showed that MD rhizosphere soil microorganisms had higher carbon, nitrogen, and phosphorus biotransformation capacity, which was more conducive to the accumulation and release of soil nutrients and to tea tree growth. PLS-SEM analysis showed that characteristic microorganisms positively regulated soil microbial function (1.00**), enzyme activity (0.84*) and nutrient content (0.82*).
In summary, the abundance of characteristic microorganisms in the rhizosphere soil of MD was significantly higher than in PD, which significantly enhanced their corresponding functions. This favored soil improvement, increased soil enzyme activity and enhanced soil nutrient biotransformation, which in turn increased soil nutrient accumulation and availability and promoted tea tree growth. This study provides an important theoretical basis for microbial regulation in the management of tea tree cuttings.
Affiliation(s)
- Xiaoli Jia
  - College of Tea and Food Science, Wuyi University, Wuyishan, China
- Lei Hong
  - College of Life Science, Longyan University, Longyan, China
  - College of JunCao Science and Ecology, Fujian Agriculture and Forestry University, Fuzhou, China
- Yulin Wang
  - College of Life Science, Longyan University, Longyan, China
- Qi Zhang
  - College of Tea and Food Science, Wuyi University, Wuyishan, China
- Yuhua Wang
  - College of JunCao Science and Ecology, Fujian Agriculture and Forestry University, Fuzhou, China
- Miao Jia
  - College of Tea and Food Science, Wuyi University, Wuyishan, China
- Yangxin Luo
  - College of Life Science, Longyan University, Longyan, China
- Tingting Wang
  - College of Life Science, Longyan University, Longyan, China
- Jianghua Ye
  - College of Tea and Food Science, Wuyi University, Wuyishan, China
- Haibin Wang
  - College of Tea and Food Science, Wuyi University, Wuyishan, China
  - College of Life Science, Longyan University, Longyan, China
2
Liang H, Zou Y, Wang M, Hu T, Wang H, He W, Ju Y, Guo R, Chen J, Guo F, Zeng T, Dong Y, Zhang Y, Wang B, Liu C, Jin X, Zhang W, Xu X, Xiao L. Efficiently constructing complete genomes with CycloneSEQ to fill gaps in bacterial draft assemblies. GIGABYTE 2025; 2025:gigabyte154. [PMID: 40329937 PMCID: PMC12051259 DOI: 10.46471/gigabyte.154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2024] [Accepted: 04/22/2025] [Indexed: 05/08/2025] Open
Abstract
Current microbial sequencing relies on short-read platforms like Illumina and DNBSEQ, which are cost-effective and accurate but often produce fragmented draft genomes. Here, we used CycloneSEQ for long-read sequencing of ATCC BAA-835, producing long reads with an average length of 11.6 kbp and an average quality score of 14.4. Hybrid assembly with short-read data resulted in an error rate of only 0.04 mismatches and 0.08 indels per 100 kbp compared to the reference genome. This method, validated across nine species, successfully assembled complete circular genomes. Unlike assembly from short reads alone, hybrid assembly significantly enhances genome completeness by using long reads to fill gaps and to accurately assemble multi-copy rRNA genes. Data subsampling showed that combining over 500 Mbp of short-read data with 100 Mbp of long-read data yields high-quality circular assemblies. CycloneSEQ long reads improve the assembly of complete circular genomes from mixed microbial communities; however, their base quality needs improving. Integrating DNBSEQ short reads improved accuracy, resulting in complete and accurate assemblies.
Affiliation(s)
- Hewei Liang
  - BGI Research, Shenzhen 518083, China
  - BGI Research, Wuhan 430074, China
  - Shenzhen Engineering Laboratory of Detection and Intervention of Human Intestinal Microbiome, BGI Research, Shenzhen 518083, China
- Yuanqiang Zou
  - Shenzhen Engineering Laboratory of Detection and Intervention of Human Intestinal Microbiome, BGI Research, Shenzhen 518083, China
  - State Key Laboratory of Genome and Multi-omics Technologies, BGI Research, Shenzhen 518083, China
- Mengmeng Wang
  - BGI Research, Shenzhen 518083, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Tongyuan Hu
  - BGI Research, Shenzhen 518083, China
  - BGI Research, Wuhan 430074, China
- Haoyu Wang
  - BGI Research, Shenzhen 518083, China
  - College of Life Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
- Wenxin He
  - BGI Research, Shenzhen 518083, China
- Junyi Chen
  - BGI Research, Shenzhen 518083, China
  - BGI Hangzhou CycloneSEQ Technology Co., Ltd, Hangzhou 310030, China
- Fei Guo
  - BGI Research, Shenzhen 518083, China
  - BGI Hangzhou CycloneSEQ Technology Co., Ltd, Hangzhou 310030, China
- Tao Zeng
  - BGI Research, Shenzhen 518083, China
  - BGI Hangzhou CycloneSEQ Technology Co., Ltd, Hangzhou 310030, China
- Yuliang Dong
  - BGI Research, Shenzhen 518083, China
  - BGI Hangzhou CycloneSEQ Technology Co., Ltd, Hangzhou 310030, China
- Yuning Zhang
  - BGI Research, Shenzhen 518083, China
  - BGI Hangzhou CycloneSEQ Technology Co., Ltd, Hangzhou 310030, China
- Bo Wang
  - State Key Laboratory of Genome and Multi-omics Technologies, BGI Research, Shenzhen 518083, China
  - China National GeneBank, BGI Research, Shenzhen 518120, China
  - Shenzhen Key Laboratory of Environmental Microbial Genomics and Application, BGI Research, Shenzhen 518083, China
- Xin Jin
  - BGI Research, Shenzhen 518083, China
- Xun Xu
  - State Key Laboratory of Genome and Multi-omics Technologies, BGI Research, Shenzhen 518083, China
- Liang Xiao
  - Shenzhen Engineering Laboratory of Detection and Intervention of Human Intestinal Microbiome, BGI Research, Shenzhen 518083, China
  - State Key Laboratory of Genome and Multi-omics Technologies, BGI Research, Shenzhen 518083, China
3
Shen C, Wedell E, Pop M, Warnow T. TIPP3 and TIPP3-fast: Improved abundance profiling in metagenomics. PLoS Comput Biol 2025; 21:e1012593. [PMID: 40184383 PMCID: PMC11970662 DOI: 10.1371/journal.pcbi.1012593] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2024] [Accepted: 02/26/2025] [Indexed: 04/06/2025] Open
Abstract
We present TIPP3 and TIPP3-fast, new tools for abundance profiling in metagenomic datasets. Like its predecessor, TIPP2, the TIPP3 pipeline uses a maximum likelihood approach to place reads into labeled taxonomies using marker genes, but it achieves superior accuracy to TIPP2 by enabling the use of much larger taxonomies through improved algorithmic techniques. We show that TIPP3 is generally more accurate than leading methods for abundance profiling in two important contexts: when reads come from genomes not already in a public database (i.e., novel genomes) and when reads contain sequencing errors. We also show that TIPP3-fast has slightly lower accuracy than TIPP3, but is also generally more accurate than other leading methods and uses a small fraction of TIPP3's runtime. Additionally, we highlight the potential benefits of restricting abundance profiling methods to those reads that map to marker genes (i.e., using a filtered marker-gene based analysis), which we show typically improves accuracy. TIPP3 is freely available at https://github.com/c5shen/TIPP3.
Affiliation(s)
- Chengze Shen
  - Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign, Urbana, Illinois, United States of America
- Eleanor Wedell
  - Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign, Urbana, Illinois, United States of America
- Mihai Pop
  - Department of Computer Science, University of Maryland at College Park, College Park, Maryland, United States of America
- Tandy Warnow
  - Siebel School of Computing and Data Science, University of Illinois Urbana-Champaign, Urbana, Illinois, United States of America
4
de Campos GM, Clemente LG, Lima ARJ, Cella E, Fonseca V, Ximenez JPB, Nishiyama MY, de Carvalho E, Sampaio SC, Giovanetti M, Elias MC, Slavov SN. Anellovirus abundance as an indicator for viral metagenomic classifier utility in plasma samples. Virol J 2025; 22:88. [PMID: 40148934 PMCID: PMC11951539 DOI: 10.1186/s12985-025-02708-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2025] [Accepted: 03/13/2025] [Indexed: 03/29/2025] Open
Abstract
BACKGROUND: Viral metagenomics has expanded significantly in recent years due to advancements in next-generation sequencing, establishing it as the leading method for identifying emerging viruses. A crucial step in metagenomics is taxonomic classification, where sequence data is assigned to specific taxa, thereby enabling the characterization of species composition within a sample. Various taxonomic classifiers have been developed in recent years, each employing distinct classification approaches that produce varying results and abundance profiles, even when analyzing the same sample.
METHODS: In this study, we propose using the identification of Torque Teno Viruses (TTVs), from the Anelloviridae family, as indicators to evaluate the performance of four short-read-based metagenomic classifiers: Kraken2, Kaiju, CLARK and DIAMOND, when evaluating human plasma samples.
RESULTS: Our results show that each classifier assigns TTV species at different abundance levels, potentially influencing the interpretation of diversity within samples. Specifically, nucleotide-based classifiers tend to detect a broader range of TTV species, indicating higher sensitivity, while amino acid-based classifiers like DIAMOND and CLARK display lower abundance indices. Interestingly, despite employing different algorithms and data types (protein-based vs. nucleotide-based), Kaiju and Kraken2 performed similarly.
CONCLUSION: Our study underscores the critical impact of classifier selection on diversity indices in metagenomic analyses. Kaiju effectively assigned a wide variety of TTV species, demonstrating it did not require a high volume of reads to capture diversity. Nucleotide-based classifiers like CLARK and Kraken2 showed superior sensitivity, which is valuable for detecting emerging or rare viruses. At the same time, protein-based approaches such as DIAMOND and Kaiju proved robust for identifying known species with low variability.
Affiliation(s)
- Gabriel Montenegro de Campos
  - Programa de Pós-graduação em Oncologia Clínica, Células-Tronco e Terapia Celular, Faculdade de Medicina de Ribeirão Preto, Universidade de São Paulo, Ribeirão Prêto, Brazil
- Luan Gaspar Clemente
  - Escola Superior de Agricultura Luiz de Queiroz, Departamento de Zootecnia, Universidade de São Paulo, Piracicaba, Brazil
- Eleonora Cella
  - Burnett School of Medical Sciences, College of Medicine, University of Central Florida, Orlando, FL, USA
- Vagner Fonseca
  - Departamento de Ciências Exatas e Terra, Universidade Estadual da Bahia, Salvador, Brazil
  - Centre for Epidemic Response and Innovation (CERI), School of Data Science and Computational Thinking, Stellenbosch University, Stellenbosch, South Africa
- João Paulo Bianchi Ximenez
  - Departamento de Análises Clínicas, Toxicológicas e Bromatológicas, Faculdade de Ciências Farmacêuticas de Ribeirão Preto, Universidade de São Paulo, Ribeirão Prêto, Brazil
- Sandra Coccuzzo Sampaio
  - Centro de Vigilância Viral e Avaliação Sorológica- CeVIVas, Instituto Butantan, São Paulo, Brazil
- Marta Giovanetti
  - Department of Science and Technologies for Sustainable Development and One Health, Università Campus Bio-Medico di Roma, Rome, Italy
  - Instituto Rene Rachou, Fundação Oswaldo Cruz-FIOCRUZ, Belo Horizonte, Brazil
- Maria Carolina Elias
  - Centro de Vigilância Viral e Avaliação Sorológica- CeVIVas, Instituto Butantan, São Paulo, Brazil
- Svetoslav Nanev Slavov
  - Centro de Vigilância Viral e Avaliação Sorológica- CeVIVas, Instituto Butantan, São Paulo, Brazil
5
Cheng W, Wang Y, Wang Y, Hong L, Qiu M, Luo Y, Zhang Q, Wang T, Jia X, Wang H, Ye J. Aerospace Mutagenized Tea Tree Increases Rhizospheric Microorganisms, Enhances Nutrient Conversion Capacity and Promotes Growth. PLANTS (BASEL, SWITZERLAND) 2025; 14:981. [PMID: 40219049 PMCID: PMC11990241 DOI: 10.3390/plants14070981] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/25/2025] [Revised: 03/18/2025] [Accepted: 03/19/2025] [Indexed: 04/14/2025]
Abstract
The utilization of aerospace mutagenesis in plant breeding is a novel, efficient technology. This study investigates the effects of aerospace mutagenesis on tea tree growth, soil nutrient conversion, and soil microbial community structure and function. Aerospace-mutagenized tea trees showed increased leaf area, 100-bud weight, and yield. The rhizosphere soil of mutagenized tea trees displayed an increase in microorganisms, enhanced carbon and nitrogen cycling capacity, and significant increases in nutrient conversion and antioxidant enzyme activities. In addition, the content of available nutrients was also increased. Aerospace mutagenesis was associated with an increase in the abundance of soil-characteristic microorganisms (Solirubrobacterales bacterium, Capillimicrobium parvum, Mycobacterium colombiense, Mycobacterium rhizamassiliense, and Conexibacter woesei) and with greater intensity of the following functions of soil microorganisms: metabolic pathways, glyoxylate and dicarboxylate metabolism, biosynthesis of secondary metabolites, microbial metabolism in diverse environments, carbon metabolism, fatty acid metabolism, biosynthesis of amino acids, and biosynthesis of cofactors. Interaction network analysis and partial least squares structural equation modeling (PLS-SEM) showed that after aerospace mutagenesis, soil-characteristic microorganisms positively affected soil microbial functions, soil microbial biomass carbon and nitrogen, respiration intensity, and soil enzyme activities, which in turn improved available nutrient content and tea tree growth. This study provides an important reference for the cultivation and management of aerospace mutagenized tea trees and microbial regulation of tea tree growth.
Affiliation(s)
- Weiting Cheng
  - College of Horticulture, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - College of Tea and Food, Wuyi University, Wuyishan 354300, China
- Yulin Wang
  - College of Life Science, Longyan University, Longyan 364012, China
- Yuhua Wang
  - College of Horticulture, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Lei Hong
  - College of Horticulture, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Miaoen Qiu
  - College of Horticulture, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Yangxin Luo
  - College of Life Science, Longyan University, Longyan 364012, China
- Qi Zhang
  - College of Tea and Food, Wuyi University, Wuyishan 354300, China
- Tingting Wang
  - College of Life Science, Longyan University, Longyan 364012, China
- Xiaoli Jia
  - College of Tea and Food, Wuyi University, Wuyishan 354300, China
- Haibin Wang
  - College of Life Science, Longyan University, Longyan 364012, China
- Jianghua Ye
  - College of Horticulture, Fujian Agriculture and Forestry University, Fuzhou 350002, China
  - College of Tea and Food, Wuyi University, Wuyishan 354300, China
6
Ergunay K, Bourke BP, Linton YM. Exploring the potential of tick transcriptomes for virus screening: A data reuse approach for tick-borne virus surveillance. PLoS Negl Trop Dis 2025; 19:e0012907. [PMID: 40048471 PMCID: PMC11922208 DOI: 10.1371/journal.pntd.0012907] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/06/2024] [Revised: 03/19/2025] [Accepted: 02/11/2025] [Indexed: 03/20/2025] Open
Abstract
BACKGROUND: We set out to investigate the utility of publicly available tick transcriptomic data to identify and characterize known and recently described tick-borne viruses, using de novo assembly and subsequent protein database alignment and taxonomical binning.
METHODOLOGY/PRINCIPAL FINDINGS: A total of 127 virus contigs were recovered from 35 transcriptomes, originating from cell lines (40%), colony-reared ticks (25.7%) or field-collected ticks (34.2%). Generated virus contigs encompass DNA (n = 2) and RNA (n = 13) virus families, with 3 and 28 taxonomically distinct isolates, respectively. Known human and animal pathogens comprise 32.8% of the contigs, where Beiji nairovirus (BJNV) was the most prevalent tick-borne pathogenic virus, identified in 22.8% of the transcriptomes. Other pathogens included Nuomin virus (NUMV) (2.8%), African swine fever virus (ASFV) (5.7%), African horse sickness virus 3 (AHSV-3) (2.8%) and Alongshan virus (ALSV) (2.8%).
CONCLUSIONS: Previously generated transcriptome data can be leveraged for detecting tick-borne viruses, as exemplified by new descriptions of ALSV and BJNV in new geographic locations and other viruses previously detailed in screening reports. Monitoring pathogens using publicly available data might facilitate biosurveillance by directing efforts to regions of preliminary spillover and identifying targets for screening. Metadata availability is crucial for further assessments of detections.
Affiliation(s)
- Koray Ergunay
  - Walter Reed Biosystematics Unit (WRBU), Smithsonian Institution, Museum Support Center, Suitland, Maryland, United States of America
  - One Health Branch, Walter Reed Army Institute of Research (WRAIR), Silver Spring, Maryland, United States of America
  - Department of Entomology, Smithsonian Institution–National Museum of Natural History (NMNH), Washington, DC, United States of America
- Brian P. Bourke
  - Walter Reed Biosystematics Unit (WRBU), Smithsonian Institution, Museum Support Center, Suitland, Maryland, United States of America
  - One Health Branch, Walter Reed Army Institute of Research (WRAIR), Silver Spring, Maryland, United States of America
  - Department of Entomology, Smithsonian Institution–National Museum of Natural History (NMNH), Washington, DC, United States of America
- Yvonne-Marie Linton
  - Walter Reed Biosystematics Unit (WRBU), Smithsonian Institution, Museum Support Center, Suitland, Maryland, United States of America
  - One Health Branch, Walter Reed Army Institute of Research (WRAIR), Silver Spring, Maryland, United States of America
  - Department of Entomology, Smithsonian Institution–National Museum of Natural History (NMNH), Washington, DC, United States of America
7
Chen X, Yin X, Xu X, Zhang T. Species-resolved profiling of antibiotic resistance genes in complex metagenomes through long-read overlapping with Argo. Nat Commun 2025; 16:1744. [PMID: 39966439 PMCID: PMC11836353 DOI: 10.1038/s41467-025-57088-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2024] [Accepted: 02/11/2025] [Indexed: 02/20/2025] Open
Abstract
Environmental surveillance of antibiotic resistance genes (ARGs) is critical for understanding and mitigating the spread of antimicrobial resistance. Current short-read-based ARG profiling methods are limited in their ability to provide detailed host information, which is indispensable for tracking the transmission and assessing the risk of ARGs. Here, we present Argo, a novel approach that leverages long-read overlapping to rapidly identify and quantify ARGs in complex environmental metagenomes at the species level. Argo significantly enhances the resolution of ARG detection by assigning taxonomic labels collectively to clusters of reads, rather than to individual reads. By benchmarking the performance in host identification using simulation, we confirm the advantage of long-read overlapping over existing metagenomic profiling strategies in terms of accuracy. Using sequenced mock communities with varying quality scores and read lengths, along with a global fecal dataset comprising 329 human and non-human primate samples, we demonstrate Argo's capability to deliver comprehensive and species-resolved ARG profiles in real settings.
Affiliation(s)
- Xi Chen
  - Environmental Microbiome Engineering and Biotechnology Lab, Department of Civil Engineering, The University of Hong Kong, Hong Kong SAR, China
- Xiaole Yin
  - Environmental Microbiome Engineering and Biotechnology Lab, Department of Civil Engineering, The University of Hong Kong, Hong Kong SAR, China
- Xiaoqing Xu
  - Environmental Microbiome Engineering and Biotechnology Lab, Department of Civil Engineering, The University of Hong Kong, Hong Kong SAR, China
- Tong Zhang
  - Environmental Microbiome Engineering and Biotechnology Lab, Department of Civil Engineering, The University of Hong Kong, Hong Kong SAR, China
  - School of Public Health, The University of Hong Kong, Hong Kong SAR, China
  - Macau Institute for Applied Research in Medicine and Health, Macau University of Science and Technology, Macao SAR, China
  - State Key Laboratory of Marine Pollution, City University of Hong Kong, Hong Kong SAR, China
  - Shenzhen Institute of Research and Innovation, The University of Hong Kong, Shenzhen, China
8
Davidson IM, Nikbakht E, Haupt LM, Ashton KJ, Dunn PJ. Methodological approaches in 16S sequencing of female reproductive tract in fertility patients: a review. J Assist Reprod Genet 2025; 42:15-37. [PMID: 39433639 PMCID: PMC11805751 DOI: 10.1007/s10815-024-03292-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/30/2024] [Accepted: 10/07/2024] [Indexed: 10/23/2024] Open
Abstract
BACKGROUND: The female genital tract microbiome has become a particular area of interest in improving assisted reproductive technology (ART) outcomes with the emergence of next-generation sequencing (NGS) technology. However, NGS assessment of microbiomes currently lacks uniformity and poses significant challenges for accurate and precise bacterial population representation.
OBJECTIVE: As multiple NGS platforms and assays have been developed in recent years for microbiome investigation (including the advent of long-read sequencing technologies), this work aimed to identify current trends and practices undertaken in female genital tract microbiome investigations.
RESULTS: Areas like sample collection and transport, DNA extraction, 16S amplification vs. metagenomics, NGS library preparation, and bioinformatic analysis demonstrated a detrimental lack of uniformity. The lack of uniformity present is a significant limitation characterised by gap discrepancies in generation and interpretation of results. Minimal consistency was observed in primer design, DNA extraction techniques, sample transport, and bioinformatic analyses.
CONCLUSION: With third-generation sequencing technology highlighted as a promising tool in microbiota-based research via full-length 16S rRNA sequencing, there is a desperate need for future studies to investigate and optimise methodological approaches of the genital tract microbiome to ensure better uniformity of methods and results interpretation to improve clinical impact.
Affiliation(s)
- I M Davidson
  - Health Sciences & Medicine, Bond University, Gold Coast, Australia
- E Nikbakht
  - Health Sciences & Medicine, Bond University, Gold Coast, Australia
- L M Haupt
  - Stem Cell and Neurogenesis Group, Genomics Research Centre, Centre for Genomics and Personalised Health, School of Biomedical Sciences, Queensland University of Technology (QUT), 60 Musk Ave., Kelvin Grove, Brisbane, QLD, 4059, Australia
  - Centre for Biomedical Technologies, Queensland University of Technology (QUT), 60 Musk Ave., Kelvin Grove, Brisbane, QLD, 4059, Australia
  - ARC Training Centre for Cell and Tissue Engineering Technologies, Queensland University of Technology (QUT), Brisbane, Australia
  - Max Planck Queensland Centre for the Materials Sciences of Extracellular Matrices, Queensland University of Technology (QUT), Brisbane, Australia
- K J Ashton
  - Health Sciences & Medicine, Bond University, Gold Coast, Australia
- P J Dunn
  - Health Sciences & Medicine, Bond University, Gold Coast, Australia
9
Puller V, Plaza Oñate F, Prifti E, de Lahondès R. Impact of simulation and reference catalogues on the evaluation of taxonomic profiling pipelines. Microb Genom 2025; 11:001330. [PMID: 39804694 PMCID: PMC11728698 DOI: 10.1099/mgen.0.001330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2024] [Accepted: 11/06/2024] [Indexed: 01/16/2025] Open
Abstract
Microbiome profiling tools rely on reference catalogues, which significantly affect their performance. Comparing them is, however, challenging, mainly due to differences in their native catalogues. In this study, we present a novel standardized benchmarking framework that makes such comparisons more accurate. We decided not to customize databases but to translate results to a common reference to use the tools with their native environment. Specifically, we conducted two realistic simulations of gut microbiome samples, each based on a specific taxonomic profiler, and used two different taxonomic references to project their results, namely the Genome Taxonomy Database and the Unified Human Gastrointestinal Genome. To demonstrate the importance of using such a framework, we evaluated four established profilers as well as the impact of the simulations and that of the common taxonomic references on the perceived performance of these profilers. Finally, we provide guidelines to enhance future profiler comparisons for human microbiome ecosystems: (i) use or create realistic simulations tailored to your biological context (BC), (ii) identify a common feature space suited to your BC and independent of the catalogues used by the profilers and (iii) apply a comprehensive set of metrics covering accuracy (sensitivity/precision), overall representativity (richness/Shannon) and quantification (UniFrac and/or Aitchison distance).
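Guideline (iii) in the abstract above names several metric families. As a rough illustration only, and not the authors' benchmarking framework, the sketch below computes sensitivity/precision, richness, the Shannon index and the Aitchison distance for two relative-abundance profiles that are assumed to be already projected onto a common set of taxon labels. The taxon names, the pseudocount used to replace zeros, and all function names are assumptions made for this example. UniFrac is left out because it additionally needs a phylogenetic tree.

```python
import math


def richness(profile):
    """Number of taxa with non-zero abundance."""
    return sum(1 for v in profile.values() if v > 0)


def shannon(profile):
    """Shannon index (natural log) of an abundance profile."""
    total = sum(profile.values())
    return -sum((v / total) * math.log(v / total)
                for v in profile.values() if v > 0)


def sensitivity_precision(truth, predicted):
    """Presence/absence sensitivity and precision of predicted vs. truth."""
    true_taxa = {t for t, v in truth.items() if v > 0}
    pred_taxa = {t for t, v in predicted.items() if v > 0}
    tp = len(true_taxa & pred_taxa)
    sens = tp / len(true_taxa) if true_taxa else 0.0
    prec = tp / len(pred_taxa) if pred_taxa else 0.0
    return sens, prec


def aitchison_distance(truth, predicted, pseudocount=1e-6):
    """Euclidean distance between centred log-ratio (CLR) transforms.

    Zeros are replaced by an assumed pseudocount before taking logs.
    """
    taxa = sorted(set(truth) | set(predicted))

    def clr(profile):
        vals = [profile.get(t, 0.0) + pseudocount for t in taxa]
        total = sum(vals)
        logs = [math.log(v / total) for v in vals]
        mean = sum(logs) / len(logs)
        return [x - mean for x in logs]

    a, b = clr(truth), clr(predicted)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical profiles over a shared label space (illustrative values only).
truth = {"sp_A": 0.5, "sp_B": 0.3, "sp_C": 0.2}
predicted = {"sp_A": 0.6, "sp_B": 0.4, "sp_D": 0.0}

sens, prec = sensitivity_precision(truth, predicted)
print(f"sensitivity={sens:.2f} precision={prec:.2f}")
print(f"richness: truth={richness(truth)} predicted={richness(predicted)}")
print(f"Shannon: truth={shannon(truth):.3f} predicted={shannon(predicted):.3f}")
print(f"Aitchison distance={aitchison_distance(truth, predicted):.3f}")
```

Because the Aitchison distance depends strongly on how zeros are handled, the pseudocount here is a choice to revisit for real data rather than a recommended value.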
Affiliation(s)
- Vadim Puller
  - GMT Science 75 route de Lyons-La-Foret, Rouen F-76000, France
- Edi Prifti
  - IRD, Sorbonne Université, Unité de Modélisation Mathématique et Informatique des Systèmes Complexes, UMMISCO, 32 Avenue Henri Varagnat, Bondy F-93143, France
  - Sorbonne Université, INSERM, Nutrition et Obesities; Systemic Approaches, NutriOmique, AP-HP, Hôpital Pitié-Salpêtrière, 91 Boulevard de l’Hôpital, Paris F-75013, France
10
Duan H(N), Hearne G, Polikar R, Rosen GL. The Naïve Bayes classifier++ for metagenomic taxonomic classification-query evaluation. Bioinformatics 2024; 41:btae743. [PMID: 39700412 PMCID: PMC11729721 DOI: 10.1093/bioinformatics/btae743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2024] [Revised: 11/26/2024] [Accepted: 12/16/2024] [Indexed: 12/21/2024] Open
Abstract
MOTIVATION: This study examines the query performance of the NBC++ (Incremental Naive Bayes Classifier) program for variations in canonicality, k-mer size, databases, and input sample data size. We demonstrate that both NBC++ and Kraken2 are influenced by database depth, with macro measures improving as depth increases. However, fully capturing the diversity of life, especially viruses, remains a challenge.
RESULTS: NBC++ can competitively profile the superkingdom content of metagenomic samples using a small training database. NBC++ spends less time training and can use a fraction of the memory used by Kraken2, but at the cost of a long querying time. Major NBC++ enhancements include accommodating canonical k-mer storage (leading to significant storage savings) and adaptable and optimized memory allocation that accelerates query analysis and enables the software to be run on nearly any system. Additionally, the output now includes log-likelihood values for each training genome, providing users with valuable confidence information.
AVAILABILITY AND IMPLEMENTATION: Source code and Dockerfile are available at http://github.com/EESI/Naive_Bayes.
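The abstract above credits canonical k-mer storage with significant storage savings. The following is a minimal sketch of the general idea, assuming the common convention that a k-mer and its reverse complement are stored under whichever of the two sorts first. It is not NBC++ code; the function names and the toy sequence are hypothetical.

```python
from collections import Counter

# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def count_kmers(seq, k, use_canonical=True):
    """Count k-mers in seq, skipping windows with ambiguous bases."""
    counts = Counter()
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # window contains N or another non-ACGT symbol
        counts[canonical(kmer) if use_canonical else kmer] += 1
    return counts


# Toy example: both strands collapse onto the same keys.
plain = count_kmers("AAATTT", 3, use_canonical=False)
merged = count_kmers("AAATTT", 3, use_canonical=True)
print(len(plain), "distinct k-mers without canonical form")
print(len(merged), "distinct k-mers with canonical form")
```

Merging the two strands in this way can at most halve the number of distinct keys, which is the kind of saving the abstract refers to; the actual on-disk layout of NBC++ is not described in this listing.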
Affiliation(s)
- Haozhe (Neil) Duan
  - Ecological and Evolutionary Signal Processing and Informatics (EESI) Laboratory, Drexel University, Philadelphia, PA 19104, United States
- Gavin Hearne
  - Ecological and Evolutionary Signal Processing and Informatics (EESI) Laboratory, Drexel University, Philadelphia, PA 19104, United States
- Robi Polikar
  - Signal Processing and Pattern Recognition Laboratory, Electrical and Computer Engineering, Rowan University, Glassboro, NJ 08018, United States
- Gail L Rosen
  - Ecological and Evolutionary Signal Processing and Informatics (EESI) Laboratory, Drexel University, Philadelphia, PA 19104, United States
11
Han Y, He J, Li M, Peng Y, Jiang H, Zhao J, Li Y, Deng F. Unlocking the Potential of Metagenomics with the PacBio High-Fidelity Sequencing Technology. Microorganisms 2024; 12:2482. [PMID: 39770685 PMCID: PMC11728442 DOI: 10.3390/microorganisms12122482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2024] [Revised: 11/28/2024] [Accepted: 11/29/2024] [Indexed: 01/16/2025] Open
Abstract
Traditional methods for studying microbial communities have been limited due to difficulties in culturing and sequencing all microbial species. Recent advances in third-generation sequencing technologies, particularly PacBio's high-fidelity (HiFi) sequencing, have significantly advanced metagenomics by providing accurate long-read sequences. This review explores the role of HiFi sequencing in overcoming the limitations of previous sequencing methods, including high error rates and fragmented assemblies. We discuss the benefits and applications of HiFi sequencing across various environments, such as the human gut and soil, which provides broader context for further exploration. Key studies are discussed to highlight HiFi sequencing's ability to recover complete and coherent microbial genomes from complex microbiomes, showcasing its superior accuracy and continuity compared to other sequencing technologies. Additionally, we explore the potential applications of HiFi sequencing in quantitative microbial analysis, as well as the detection of single nucleotide variations (SNVs) and structural variations (SVs). PacBio HiFi sequencing is establishing a new benchmark in metagenomics, with the potential to significantly enhance our understanding of microbial ecology and drive forward advancements in both environmental and clinical applications.
Affiliation(s)
- Yanhua Han
- Guangdong Provincial Key Laboratory of Animal Molecular Design and Precise Breeding, College of Life Science and Engineering, Foshan University, Foshan 528225, China; (Y.H.); (J.H.); (M.L.); (H.J.); (Y.L.)
- School of Life Science and Engineering, Foshan University, Foshan 528225, China
- Jinling He
- Guangdong Provincial Key Laboratory of Animal Molecular Design and Precise Breeding, College of Life Science and Engineering, Foshan University, Foshan 528225, China; (Y.H.); (J.H.); (M.L.); (H.J.); (Y.L.)
- School of Life Science and Engineering, Foshan University, Foshan 528225, China
- Minghui Li
- Guangdong Provincial Key Laboratory of Animal Molecular Design and Precise Breeding, College of Life Science and Engineering, Foshan University, Foshan 528225, China; (Y.H.); (J.H.); (M.L.); (H.J.); (Y.L.)
- School of Life Science and Engineering, Foshan University, Foshan 528225, China
- Yunjuan Peng
- College of Animal Science, South China Agricultural University, Guangzhou 510642, China; (Y.P.); (J.Z.)
- Hui Jiang
- Guangdong Provincial Key Laboratory of Animal Molecular Design and Precise Breeding, College of Life Science and Engineering, Foshan University, Foshan 528225, China; (Y.H.); (J.H.); (M.L.); (H.J.); (Y.L.)
- School of Life Science and Engineering, Foshan University, Foshan 528225, China
- Jiangchao Zhao
- College of Animal Science, South China Agricultural University, Guangzhou 510642, China; (Y.P.); (J.Z.)
- Ying Li
- Guangdong Provincial Key Laboratory of Animal Molecular Design and Precise Breeding, College of Life Science and Engineering, Foshan University, Foshan 528225, China; (Y.H.); (J.H.); (M.L.); (H.J.); (Y.L.)
- School of Life Science and Engineering, Foshan University, Foshan 528225, China
- Feilong Deng
- Guangdong Provincial Key Laboratory of Animal Molecular Design and Precise Breeding, College of Life Science and Engineering, Foshan University, Foshan 528225, China; (Y.H.); (J.H.); (M.L.); (H.J.); (Y.L.)
- School of Life Science and Engineering, Foshan University, Foshan 528225, China

12
Moeckel C, Mareboina M, Konnaris MA, Chan CS, Mouratidis I, Montgomery A, Chantzi N, Pavlopoulos GA, Georgakopoulos-Soares I. A survey of k-mer methods and applications in bioinformatics. Comput Struct Biotechnol J 2024; 23:2289-2303. [PMID: 38840832 PMCID: PMC11152613 DOI: 10.1016/j.csbj.2024.05.025] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/13/2024] [Revised: 05/14/2024] [Accepted: 05/15/2024] [Indexed: 06/07/2024]
Abstract
The rapid progression of genomics and proteomics has been driven by the advent of advanced sequencing technologies, large, diverse, and readily available omics datasets, and the evolution of computational data processing capabilities. The vast amount of data generated by these advancements necessitates efficient algorithms to extract meaningful information. K-mers serve as a valuable tool when working with large sequencing datasets, offering several advantages in computational speed and memory efficiency and carrying the potential for intrinsic biological functionality. This review provides an overview of the methods, applications, and significance of k-mers in genomic and proteomic data analyses, as well as the utility of absent sequences, including nullomers and nullpeptides, in disease detection, vaccine development, therapeutics, and forensic science. Therefore, the review highlights the pivotal role of k-mers in addressing current genomic and proteomic problems and underscores their potential for future breakthroughs in research.
Affiliation(s)
- Camille Moeckel
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Manvita Mareboina
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Maxwell A. Konnaris
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Candace S.Y. Chan
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
- Ioannis Mouratidis
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA
- Austin Montgomery
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Nikol Chantzi
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Ilias Georgakopoulos-Soares
- Institute for Personalized Medicine, Department of Biochemistry and Molecular Biology, The Pennsylvania State University College of Medicine, Hershey, PA, USA
- Huck Institute of the Life Sciences, Penn State University, University Park, Pennsylvania, USA

13
Purushothaman S, Meola M, Roloff T, Rooney AM, Egli A. Evaluation of DNA extraction kits for long-read shotgun metagenomics using Oxford Nanopore sequencing for rapid taxonomic and antimicrobial resistance detection. Sci Rep 2024; 14:29531. [PMID: 39604411 PMCID: PMC11603047 DOI: 10.1038/s41598-024-80660-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2024] [Accepted: 11/21/2024] [Indexed: 11/29/2024]
Abstract
During a bacterial infection or colonization, the detection of antimicrobial resistance (AMR) is critical, but slow due to culture-based approaches for clinical and screening samples. Culture-based phenotypic AMR detection and confirmation require up to 72 hours (h) or even weeks for slow-growing bacteria. Direct shotgun metagenomics by long-read sequencing using Oxford Nanopore Technologies (ONT) may reduce the time for bacterial species and AMR gene identification. However, screening swabs for metagenomics is complex due to the range of Gram-negative and -positive bacteria, diverse AMR genes, and host DNA present in the samples. Therefore, DNA extraction is a critical initial step. We aimed to compare the performance of different DNA extraction protocols for ONT applications to reliably identify species and AMR genes using a shotgun long-read metagenomic approach. We included three different sample types: ZymoBIOMICS Microbial Community Standard, an in-house mock community of ESKAPE pathogens including Enterococcus faecium, Staphylococcus aureus, Klebsiella pneumoniae, Acinetobacter baumannii, Pseudomonas aeruginosa, and Escherichia coli (ESKAPE Mock), and anonymized clinical swab samples. We processed all sample types with four different DNA extraction kits utilizing different lysis (enzymatic vs. mechanical) and purification (spin-column vs. magnetic beads) methods. We used kits from Qiagen (QIAamp DNA Mini and QIAamp PowerFecal Pro DNA) and Promega (Maxwell RSC Cultured Cells and Maxwell RSC Buccal Swab DNA). After extraction, samples were subject to the Rapid Barcoding Kit (RBK004) for library preparation followed by sequencing on the GridION with R9.4.1 flow cells. The fast5 files were base called to fastq files using Guppy in High Accuracy (HAC) mode with the inbuilt MinKNOW software. Raw read quality was assessed using NanoPlot and human reads were removed using Minimap2 alignment against the Hg38 genome. 
Taxonomy identification was performed on the raw reads using Kraken2 and on assembled contigs using Minimap2. The AMR genes were identified using Minimap2 with alignment against the CARD database on both the raw reads and assembled contigs. We identified all bacterial species present in the Zymo Mock Community (8/8) and ESKAPE Mock (6/6) with the Qiagen PowerFecal Pro DNA kit (chemical and mechanical lysis) at the read and assembly levels. Enzymatic lysis retrieved fewer aligned bases for the Gram-positive species (Staphylococcus aureus and Enterococcus faecium) from the ESKAPE Mock at the assembly level than mechanical lysis did. We detected the AMR genes from Gram-negative and -positive species in the ESKAPE Mock with the QIAamp PowerFecal Pro DNA kit at the read level with a maximum median time of 1.9 h of sequencing. Long-read metagenomics with ONT may reduce the turnaround time in screening for AMR genes. Currently, the QIAamp PowerFecal Pro DNA kit (chemical and mechanical lysis) for DNA extraction along with the Rapid Barcoding Kit for ONT sequencing captured the best taxonomy and AMR identification for our specific use case.
Affiliation(s)
- Srinithi Purushothaman
- Institute of Medical Microbiology, University of Zurich, Gloriastrasse 30, Zurich, 8006, Switzerland
- Marco Meola
- Institute of Medical Microbiology, University of Zurich, Gloriastrasse 30, Zurich, 8006, Switzerland
- Tim Roloff
- Institute of Medical Microbiology, University of Zurich, Gloriastrasse 30, Zurich, 8006, Switzerland
- Ashley M Rooney
- Institute of Medical Microbiology, University of Zurich, Gloriastrasse 30, Zurich, 8006, Switzerland
- Adrian Egli
- Institute of Medical Microbiology, University of Zurich, Gloriastrasse 30, Zurich, 8006, Switzerland.

14
Barber DG, Child HT, Joslin GR, Wierzbicki L, Tennant RK. Statistical design approach enables optimised mechanical lysis for enhanced long-read soil metagenomics. Sci Rep 2024; 14:28934. [PMID: 39578630 PMCID: PMC11584900 DOI: 10.1038/s41598-024-80584-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2024] [Accepted: 11/19/2024] [Indexed: 11/24/2024]
Abstract
Metagenomic analysis has enabled insights into soil community structure and dynamics. Long-read sequencing for metagenomics can enhance microbial ecology by improving taxonomic classification, genome assembly, and functional annotation. However, protocols for purifying high-molecular weight DNA from soil are not yet optimised. We used a statistical design of experiments approach to enhance mechanical lysis of soil samples, increasing the length of purified DNA fragments. Low energy input into mechanical lysis improved DNA integrity, resulting in longer sequenced reads. Our optimised settings of 4 m s⁻¹ for 10 s increased fragment length by 70% compared to the manufacturer's recommendations. Longer reads from low intensity lysis produced longer contiguous sequences after assembly, potentially improving a range of downstream analyses. Importantly, there was minimal bias exhibited in the microbial community composition due to lysis efficiency variations. We therefore propose a framework for improving the fragment lengths of DNA purified from diverse soil types, improving soil science research with long-read sequencing.
Affiliation(s)
- Daniel G Barber
- Faculty of Environment, Science and Economy, Amory Building, Rennes Drive, Exeter, Devon, EX4 4RJ, UK
- Harry T Child
- Faculty of Environment, Science and Economy, Amory Building, Rennes Drive, Exeter, Devon, EX4 4RJ, UK
- Gabrielle R Joslin
- Faculty of Environment, Science and Economy, Amory Building, Rennes Drive, Exeter, Devon, EX4 4RJ, UK
- Lucy Wierzbicki
- Faculty of Environment, Science and Economy, Amory Building, Rennes Drive, Exeter, Devon, EX4 4RJ, UK
- Richard K Tennant
- Faculty of Environment, Science and Economy, Amory Building, Rennes Drive, Exeter, Devon, EX4 4RJ, UK.

15
Collis RM, Biggs PJ, Burgess SA, Midwinter AC, Liu J, Brightwell G, Cookson AL. Assessing antimicrobial resistance in pasture-based dairy farms: a 15-month surveillance study in New Zealand. Appl Environ Microbiol 2024; 90:e0139024. [PMID: 39440981 DOI: 10.1128/aem.01390-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/12/2024] [Accepted: 09/12/2024] [Indexed: 10/25/2024]
Abstract
Antimicrobial resistance is a global public and animal health concern. Antimicrobial resistance genes (ARGs) have been detected in dairy farm environments globally; however, few longitudinal studies have utilized shotgun metagenomics for ARG surveillance in pasture-based systems. This 15-month study aimed to undertake a baseline survey using shotgun metagenomics to assess the relative abundance and diversity of ARGs in two pasture-based dairy farm environments in New Zealand with different management practices. There was no statistically significant difference in overall ARG relative abundance between the two dairy farms (P = 0.321) during the study period. Compared with overseas data, the relative abundance of ARG copies per 16S rRNA gene in feces (0.08-0.17), effluent (0.03-0.37), soil (0.20-0.63), and bulk tank milk (0.0-0.12) samples was low. Models comparing the presence or absence of resistance classes found in >10% of all feces, effluent, and soil samples demonstrated no statistically significant associations (P > 0.05) with "season," and only multi-metal (P = 0.020) and tetracycline (P = 0.0003) resistance were significant at the "farm" level. Effluent samples harbored the most diverse ARGs, some with a recognized public health risk, whereas soil samples had the highest ARG relative abundance but without recognized health risks. This highlights the importance of considering the genomic context and risk of ARGs in metagenomic data sets. This study suggests that antimicrobial resistance on pasture-based dairy farms is low and provides essential baseline ARG surveillance data for such farming systems.
IMPORTANCE: Antimicrobial resistance is a global threat to human and animal health. Despite the detection of antimicrobial resistance genes (ARGs) in dairy farm environments globally, longitudinal surveillance in pasture-based systems remains limited. This study assessed the relative abundance and diversity of ARGs in two New Zealand dairy farms with different management practices and provided important baseline ARG surveillance data on pasture-based dairy farms. The overall ARG relative abundance on these two farms was low, which provides further evidence for consumers of the safety of New Zealand's export products. Effluent samples harbored the most diverse range of ARGs, some of which were classified with a recognized risk to public health, whereas soil samples had the highest ARG relative abundance; however, the soil ARGs were not classified with a recognized public health risk. This emphasizes the need to consider genomic context and risk as well as ARG relative abundance in resistome studies.
Affiliation(s)
- Rose M Collis
- Food System Integrity, AgResearch Ltd, Hopkirk Research Institute, Massey University, Palmerston North, New Zealand
- Molecular Epidemiology and Public Health Laboratory, School of Veterinary Science, Massey University, Palmerston North, New Zealand
- Patrick J Biggs
- Molecular Epidemiology and Public Health Laboratory, School of Veterinary Science, Massey University, Palmerston North, New Zealand
- School of Natural Sciences, Massey University, Palmerston North, New Zealand
- New Zealand Food Safety Science and Research Centre, Massey University, Palmerston North, New Zealand
- Sara A Burgess
- Molecular Epidemiology and Public Health Laboratory, School of Veterinary Science, Massey University, Palmerston North, New Zealand
- Anne C Midwinter
- Molecular Epidemiology and Public Health Laboratory, School of Veterinary Science, Massey University, Palmerston North, New Zealand
- Jinxin Liu
- Laboratory of Gastrointestinal Microbiology, College of Animal Science and Technology, Nanjing Agricultural University, Nanjing, China
- Gale Brightwell
- Food System Integrity, AgResearch Ltd, Hopkirk Research Institute, Massey University, Palmerston North, New Zealand
- New Zealand Food Safety Science and Research Centre, Massey University, Palmerston North, New Zealand
- Adrian L Cookson
- Food System Integrity, AgResearch Ltd, Hopkirk Research Institute, Massey University, Palmerston North, New Zealand
- Molecular Epidemiology and Public Health Laboratory, School of Veterinary Science, Massey University, Palmerston North, New Zealand
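The "ARG copies per 16S rRNA gene" figures quoted in the abstract above are a normalised ratio. The abstract does not say how the authors computed it, so the sketch below shows one widely used convention only: divide length-normalised ARG read counts by length-normalised 16S rRNA read counts. The function name, its parameters, and the default 16S gene length of about 1,550 bp are all assumptions made for illustration.

```python
def arg_copies_per_16s(
    arg_reads: int,
    arg_length_bp: int,
    rrna_reads: int,
    rrna_length_bp: int = 1550,  # assumed approximate full-length 16S rRNA gene
) -> float:
    """Estimate ARG copies per 16S rRNA gene copy from read counts.

    Each read count is divided by its reference gene length so that long
    genes do not appear more abundant simply because they recruit more reads.
    """
    if arg_length_bp <= 0 or rrna_length_bp <= 0:
        raise ValueError("gene lengths must be positive")
    if rrna_reads <= 0:
        raise ValueError("at least one 16S rRNA read is required")
    arg_coverage = arg_reads / arg_length_bp
    rrna_coverage = rrna_reads / rrna_length_bp
    return arg_coverage / rrna_coverage
```

With made-up inputs of 120 reads on a 1,200 bp ARG and 775 reads on the 16S rRNA gene, the ratio is about 0.2, which would sit at the lower end of the soil range reported in the abstract. These inputs are illustrative and are not data from the study.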

16
Weber CC. Disentangling cobionts and contamination in long-read genomic data using sequence composition. G3 (Bethesda, Md.) 2024; 14:jkae187. [PMID: 39148415 PMCID: PMC11540323 DOI: 10.1093/g3journal/jkae187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Revised: 08/02/2024] [Accepted: 08/02/2024] [Indexed: 08/17/2024]
Abstract
The recent acceleration in genome sequencing targeting previously unexplored parts of the tree of life presents computational challenges. Samples collected from the wild often contain sequences from several organisms, including the target, its cobionts, and contaminants. Effective methods are therefore needed to separate sequences. Though advances in sequencing technology make this task easier, it remains difficult to taxonomically assign sequences from eukaryotic taxa that are not well represented in databases. Therefore, reference-based methods alone are insufficient. Here, I examine how we can take advantage of differences in sequence composition between organisms to identify symbionts, parasites, and contaminants in samples, with minimal reliance on reference data. To this end, I explore data from the Darwin Tree of Life project, including hundreds of high-quality HiFi read sets from insects. Visualizing two-dimensional representations of read tetranucleotide composition learned by a variational autoencoder can reveal distinct components of a sample. Annotating the embeddings with additional information, such as coding density, estimated coverage, or taxonomic labels allows rapid assessment of the contents of a dataset. The approach scales to millions of sequences, making it possible to explore unassembled read sets, even for large genomes. Combined with interactive visualization tools, it allows a large fraction of cobionts reported by reference-based screening to be identified. Crucially, it also facilitates retrieving genomes for which suitable reference data are absent.
Affiliation(s)
- Claudia C Weber
- Tree of Life, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
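The read tetranucleotide composition described in the abstract above is the input feature, not the learned embedding. As a rough sketch of that first step only, the code below turns one read into a 256-dimensional vector of tetranucleotide frequencies. The names `TETRAMERS`, `INDEX`, and `tetranucleotide_frequencies` are hypothetical, the raw 256-dimensional layout is an assumption (tools often merge reverse complements or apply other reductions), and the variational autoencoder itself is not shown.

```python
from itertools import product

# All 4^4 = 256 tetranucleotides in a fixed lexicographic order.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {tetramer: i for i, tetramer in enumerate(TETRAMERS)}


def tetranucleotide_frequencies(seq: str) -> list:
    """Return the 256 tetranucleotide frequencies of a sequence.

    Windows containing characters other than A/C/G/T are skipped, and the
    remaining counts are normalised so that the vector sums to 1.
    """
    seq = seq.upper()
    counts = [0] * len(TETRAMERS)
    total = 0
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])
        if idx is not None:
            counts[idx] += 1
            total += 1
    if total == 0:
        # Sequence too short or entirely ambiguous: return an all-zero vector.
        return [0.0] * len(TETRAMERS)
    return [c / total for c in counts]
```

Vectors like these, one per read, are what a dimensionality-reduction model would take as input before the two-dimensional visualisation the abstract describes.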

17
Gulyás G, Kakuk B, Dörmő Á, Járay T, Prazsák I, Csabai Z, Henkrich MM, Boldogkői Z, Tombácz D. Cross-comparison of gut metagenomic profiling strategies. Commun Biol 2024; 7:1445. [PMID: 39505993 PMCID: PMC11541596 DOI: 10.1038/s42003-024-07158-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Accepted: 10/28/2024] [Indexed: 11/08/2024]
Abstract
The rapid advancements in sequencing technologies and bioinformatics have enabled metagenomic research of complex microbial systems, but reliable results depend on consistent laboratory and bioinformatics approaches. Current efforts to identify best practices often focus on optimizing specific steps, making it challenging to understand the influence of each stage on microbial population analysis and compare data across studies. This study evaluated DNA extraction, library construction methodologies, sequencing platforms, and computational approaches using a dog stool sample, two synthetic microbial community mixtures, and various sequencing data sources. Our work is the most comprehensive evaluation of metagenomic methods to date. We developed a software tool, termed minitax, which provides consistent results across the range of platforms and methodologies. Our findings showed that the Zymo Research Quick-DNA HMW MagBead Kit, Illumina DNA Prep library preparation method, and the minitax bioinformatics tool were the most effective for high-quality microbial diversity analysis. However, the effectiveness of pipelines or method combinations is sample-specific, making it difficult to identify a universally optimal approach. Therefore, employing multiple approaches is crucial for obtaining reliable outcomes in microbial systems.
Affiliation(s)
- Gábor Gulyás
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- MTA-SZTE Lendület GeMiNI Research Group, University of Szeged, Szeged, Hungary
- Balázs Kakuk
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- MTA-SZTE Lendület GeMiNI Research Group, University of Szeged, Szeged, Hungary
- Ákos Dörmő
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- MTA-SZTE Lendület GeMiNI Research Group, University of Szeged, Szeged, Hungary
- Tamás Járay
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- MTA-SZTE Lendület GeMiNI Research Group, University of Szeged, Szeged, Hungary
- István Prazsák
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- MTA-SZTE Lendület GeMiNI Research Group, University of Szeged, Szeged, Hungary
- Zsolt Csabai
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- MTA-SZTE Lendület GeMiNI Research Group, University of Szeged, Szeged, Hungary
- Miksa Máté Henkrich
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary
- MTA-SZTE Lendület GeMiNI Research Group, University of Szeged, Szeged, Hungary
- Zsolt Boldogkői
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary.
- MTA-SZTE Lendület GeMiNI Research Group, University of Szeged, Szeged, Hungary.
- Dóra Tombácz
- Department of Medical Biology, Faculty of Medicine, University of Szeged, Szeged, Hungary.
- MTA-SZTE Lendület GeMiNI Research Group, University of Szeged, Szeged, Hungary.

18
Ostos I, Flórez-Pardo LM, Camargo C. A metagenomic approach to demystify the anaerobic digestion black box and achieve higher biogas yield: a review. Front Microbiol 2024; 15:1437098. [PMID: 39464396 PMCID: PMC11502389 DOI: 10.3389/fmicb.2024.1437098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2024] [Accepted: 09/23/2024] [Indexed: 10/29/2024]
Abstract
The increasing reliance on fossil fuels and the growing accumulation of organic waste necessitate the exploration of sustainable energy alternatives. Anaerobic digestion (AD) presents one such solution by utilizing secondary biomass to produce biogas while reducing greenhouse gas emissions. Given the crucial role of microbial activity in anaerobic digestion, a deeper understanding of the microbial community is essential for optimizing biogas production. While metagenomics has emerged as a valuable tool for unravelling microbial composition and providing insights into the functional potential in biodigestion, it falls short of interpreting the functional and metabolic interactions, limiting a comprehensive understanding of individual roles in the community. This emphasizes the significance of expanding the scope of metagenomics through innovative tools that highlight the often-overlooked, yet crucial, role of microbiota in biomass digestion. These tools can more accurately elucidate microbial ecological fitness, shared metabolic pathways, and interspecies interactions. By addressing current limitations and integrating metagenomics with other omics approaches, more accurate predictive techniques can be developed, facilitating informed decision-making to optimize AD processes and enhance biogas yields, thereby contributing to a more sustainable future.
Affiliation(s)
- Iván Ostos
- Grupo de Investigación en Ingeniería Electrónica, Industrial, Ambiental, Metrología GIEIAM, Universidad Santiago de Cali, Cali, Colombia
- Luz Marina Flórez-Pardo
- Grupo de Investigación en Modelado, Análisis y Simulación de Procesos Ambientales e Industriales PAI+, Universidad Autónoma de Occidente, Cali, Colombia
- Carolina Camargo
- Centro de Investigación de la Caña de Azúcar, CENICAÑA, Cali, Colombia

19
Toporowska M, Żebracki K, Mazur A, Mazur-Marzec H, Šulčius S, Alzbutas G, Lukashevich V, Dziga D, Mieczan T. Biodegradation of microcystins by microbiota of duckweed Spirodela polyrhiza. Chemosphere 2024; 366:143436. [PMID: 39349071 DOI: 10.1016/j.chemosphere.2024.143436] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2024] [Revised: 09/20/2024] [Accepted: 09/27/2024] [Indexed: 10/02/2024]
Abstract
Cyanobacteria-produced allelochemicals, including hepatotoxic microcystins (MCs), exert an inhibitory effect on macrophyte growth. However, the role of macrophyte-associated bacteria and algae (macrophyte microbiota) in mitigating these immediate negative effects of cyanotoxins remains poorly understood. In this paper, we analyzed the biodegradation of microcystin-RR, MC-LR, and MC-LF by microbiota of the macrophyte Spirodela polyrhiza. The biodegradation of two MC variants was observed, and LC-MS/MS analysis allowed the identification of degradation products of MC-RR (m/z 1011, 984, 969, 877, 862, 820, and 615) and MC-LR (m/z 968 and 953), including eight previously unreported products. No degradation products of MC-LF were detected, suggesting its stability and resistance under experimental conditions. NGS-based profiling of microbial consortia revealed no major differences in bacterial community composition across experimental treatments. Taxa previously reported as capable of MC degradation were found in S. polyrhiza microbiota. Furthermore, the presence of genes encoding putative microcystinase homologues and the formation of new linear intermediates suggest a biochemical pathway that is similar, but not identical, to those previously reported. The ability of aquatic plant microbiota to biodegrade MCs holds environmental significance, and further studies in this field are required.
Affiliation(s)
- Magdalena Toporowska
- Department of Hydrobiology and Protection of Ecosystems, University of Life Sciences in Lublin, Dobrzańskiego 37, 20-262 Lublin, Poland.
- Kamil Żebracki
- Department of Genetics and Microbiology, University of Maria Curie-Skłodowska, Akademicka 19, 20-033, Lublin, Poland.
- Andrzej Mazur
- Department of Genetics and Microbiology, University of Maria Curie-Skłodowska, Akademicka 19, 20-033, Lublin, Poland.
- Hanna Mazur-Marzec
- Department of Marine Biology and Biotechnology, University of Gdańsk, Al. Marszałka Piłsudskiego 46, 81-378 Gdynia, Poland.
- Sigitas Šulčius
- Laboratory of Algology and Microbial Ecology, Nature Research Centre, Akademijos Str. 2, LT-08412 Vilnius, Lithuania; Department of Bioinformatics, Nature Research Centre, Akademijos Str. 2, LT-08412 Vilnius, Lithuania.
- Gediminas Alzbutas
- Department of Bioinformatics, Nature Research Centre, Akademijos Str. 2, LT-08412 Vilnius, Lithuania.
- Valiantsin Lukashevich
- Laboratory of Algology and Microbial Ecology, Nature Research Centre, Akademijos Str. 2, LT-08412 Vilnius, Lithuania.
- Dariusz Dziga
- Laboratory of Metabolomics, Faculty of Biochemistry, Biophysics and Biotechnology, Jagiellonian University, Gronostajowa 7, 30-387 Krakow, Poland.
- Tomasz Mieczan
- Department of Hydrobiology and Protection of Ecosystems, University of Life Sciences in Lublin, Dobrzańskiego 37, 20-262 Lublin, Poland.

20
Kutuzova S, Nielsen M, Piera P, Nissen JN, Rasmussen S. Taxometer: Improving taxonomic classification of metagenomics contigs. Nat Commun 2024; 15:8357. [PMID: 39333501 PMCID: PMC11437175 DOI: 10.1038/s41467-024-52771-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2024] [Accepted: 09/20/2024] [Indexed: 09/29/2024]
Abstract
For taxonomy-based classification of metagenomics assembled contigs, current methods use sequence similarity to identify their most likely taxonomy. However, in the related field of metagenomic binning, contigs are routinely clustered using information from both the contig sequences and their abundance. We introduce Taxometer, a neural-network-based method that improves the annotations and estimates the quality of any taxonomic classifier using contig abundance profiles and tetra-nucleotide frequencies. We apply Taxometer to five short-read CAMI2 datasets and find that it increases the average share of correct species-level contig annotations of the MMSeqs2 tool from 66.6% to 86.2%. Additionally, it reduces the share of wrong species-level annotations in the CAMI2 Rhizosphere dataset by an average of two-fold for Metabuli, Centrifuge, and Kraken2. Furthermore, we use Taxometer for benchmarking taxonomic classifiers on two complex long-read metagenomics data sets where ground truth is not known. Taxometer is available as open-source software and can enhance any taxonomic annotation of metagenomic contigs.
Affiliation(s)
- Svetlana Kutuzova
- Department of Computer Science, University of Copenhagen, Universitetsparken 1, Copenhagen, 2100, Denmark
- The Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Blegdamsvej 3A, Copenhagen, 2200, Denmark
- The Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Blegdamsvej 3A, Copenhagen, 2200, Denmark
| | - Mads Nielsen
- Department of Computer Science, University of Copenhagen, Universitetsparken 1, Copenhagen, 2100, Denmark
| | - Pau Piera
- The Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Blegdamsvej 3A, Copenhagen, 2200, Denmark
- The Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Blegdamsvej 3A, Copenhagen, 2200, Denmark
| | - Jakob Nybo Nissen
- The Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Blegdamsvej 3A, Copenhagen, 2200, Denmark.
- The Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Blegdamsvej 3A, Copenhagen, 2200, Denmark.
| | - Simon Rasmussen
- The Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, Blegdamsvej 3A, Copenhagen, 2200, Denmark.
- The Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Blegdamsvej 3A, Copenhagen, 2200, Denmark.
- The Novo Nordisk Foundation Center for Genomic Mechanisms of Disease, Broad Institute of MIT and Harvard, Cambridge, 02142, MA, USA.
| |
Collapse
|
21
|
Child HT, Wierzbicki L, Joslin GR, Tennant RK. Comparative evaluation of soil DNA extraction kits for long read metagenomic sequencing. Access Microbiol 2024; 6:000868.v3. [PMID: 39346682 PMCID: PMC11432601 DOI: 10.1099/acmi.0.000868.v3] [Received: 06/19/2024] [Accepted: 09/12/2024] [Indexed: 10/01/2024]
Abstract
Metagenomics has been transformative in our understanding of the diversity and function of soil microbial communities. Applying long read sequencing to whole genome shotgun metagenomics has the potential to revolutionise soil microbial ecology through improved taxonomic classification, functional characterisation and metagenome assembly. However, optimisation of robust methods for long read metagenomics of environmental samples remains undeveloped. In this study, Oxford Nanopore sequencing using samples from five commercially available soil DNA extraction kits was compared across four soil types, in order to optimise read length and reproducibility for comparative long read soil metagenomics. Average extracted DNA lengths varied considerably between kits, but longer DNA fragments did not translate consistently into read lengths. Highly variable decreases in the length of resulting reads from some kits were associated with poor classification rate and low reproducibility in microbial communities identified between technical repeats. Replicate samples from other kits showed more consistent conversion of extracted DNA fragment size into read length and resulted in more congruous microbial community representation. Furthermore, extraction kits showed significant differences in the community representation and structure they identified across all soil types. Overall, the QIAGEN DNeasy PowerSoil Pro Kit displayed the best suitability for reproducible long-read WGS metagenomic sequencing, although further optimisation of DNA purification and library preparation may enable translation of higher molecular weight DNA from other kits into longer read lengths. These findings provide a novel insight into the importance of optimising DNA extraction for achieving replicable results from long read metagenomic sequencing of environmental samples.
Affiliation(s)
- Harry T. Child
  - Geography, Faculty of Environment, Science and Economy, University of Exeter, Amory Building, Rennes Drive, Exeter, Devon, EX4 4RJ, UK
- Lucy Wierzbicki
  - Geography, Faculty of Environment, Science and Economy, University of Exeter, Amory Building, Rennes Drive, Exeter, Devon, EX4 4RJ, UK
- Gabrielle R. Joslin
  - Geography, Faculty of Environment, Science and Economy, University of Exeter, Amory Building, Rennes Drive, Exeter, Devon, EX4 4RJ, UK
- Richard K. Tennant
  - Geography, Faculty of Environment, Science and Economy, University of Exeter, Amory Building, Rennes Drive, Exeter, Devon, EX4 4RJ, UK

22
Yepes-García J, Falquet L. Metagenome quality metrics and taxonomical annotation visualization through the integration of MAGFlow and BIgMAG. F1000Res 2024; 13:640. [PMID: 39360247 PMCID: PMC11445639 DOI: 10.12688/f1000research.152290.2] [Accepted: 09/03/2024] [Indexed: 10/04/2024]
Abstract
Background: Building Metagenome-Assembled Genomes (MAGs) from highly complex metagenomics datasets involves a series of steps, from cleaning the sequences and assembling them to finally grouping them into bins. Along the way, multiple tools that assess the quality and integrity of each MAG are run. Nonetheless, even when incorporated within end-to-end pipelines, the outputs of these tools must be visualized and analyzed manually because they are not integrated into a complete framework. Methods: We developed a Nextflow pipeline (MAGFlow) for estimating the quality of MAGs through a wide variety of approaches (BUSCO, CheckM2, GUNC and QUAST), as well as for taxonomically annotating the metagenomes using GTDB-Tk2. MAGFlow is coupled to a Python-Dash application (BIgMAG) that displays the concatenated outcomes from the tools included in MAGFlow, highlighting the most important metrics in a single interactive environment along with a comparison/clustering of the input data. Results: By using MAGFlow/BIgMAG, the user will be able to benchmark the MAGs obtained through different workflows or establish the quality of the MAGs belonging to different samples following the divide and rule methodology. Conclusions: MAGFlow/BIgMAG represents a unique tool that integrates state-of-the-art tools to study different quality metrics and extract visually as much information as possible from a wide range of genome features.
Affiliation(s)
- Jeferyd Yepes-García
  - Swiss Institute of Bioinformatics, Lausanne, Vaud, 1015, Switzerland
  - Department of Biology, University of Fribourg, Fribourg, Canton of Fribourg, 1700, Switzerland
- Laurent Falquet
  - Swiss Institute of Bioinformatics, Lausanne, Vaud, 1015, Switzerland
  - Department of Biology, University of Fribourg, Fribourg, Canton of Fribourg, 1700, Switzerland

23
Bosilj M, Suljič A, Zakotnik S, Slunečko J, Kogoj R, Korva M. MetaAll: integrative bioinformatics workflow for analysing clinical metagenomic data. Brief Bioinform 2024; 25:bbae597. [PMID: 39550223 PMCID: PMC11568877 DOI: 10.1093/bib/bbae597] [Received: 08/19/2024] [Revised: 10/17/2024] [Accepted: 11/11/2024] [Indexed: 11/18/2024]
Abstract
Over the past decade, there have been many improvements in the field of metagenomics, including sequencing technologies, advances in bioinformatics and the development of reference databases, but a one-size-fits-all sequencing and bioinformatics pipeline does not yet seem achievable. In this study, we address the bioinformatics part of the analysis by combining three methods into a three-step workflow that increases the sensitivity and specificity of clinical metagenomics and improves pathogen detection. The individual tools are combined into MetaAll, a user-friendly workflow suitable for analysing short paired-end (PE) and long reads from metagenomics datasets. To demonstrate the applicability of the developed workflow, four complicated clinical cases with different disease presentations and multiple samples collected from different biological sites as well as the CAMI Clinical pathogen detection challenge dataset were used. MetaAll was able to identify putative pathogens in all but one case. In this case, however, traditional microbiological diagnostics were also unsuccessful. In addition, co-infection with Haemophilus influenzae and Human rhinovirus C54 was detected in case 1 and co-infection with SARS-CoV-2 and Influenza A virus (FluA) subtype H3N2 was detected in case 3. In case 2, in which conventional diagnostics could not find a pathogen, mNGS pointed to Klebsiella pneumoniae as the suspected pathogen. Finally, this study demonstrated the importance of combining read classification, contig validation and targeted reference mapping for more reliable detection of infectious agents in clinical metagenome samples.
Affiliation(s)
- Martin Bosilj
  - Institute of Microbiology and Immunology, Faculty of Medicine, University of Ljubljana, Zaloška cesta 4, 1000 Ljubljana, Slovenia
- Alen Suljič
  - Institute of Microbiology and Immunology, Faculty of Medicine, University of Ljubljana, Zaloška cesta 4, 1000 Ljubljana, Slovenia
- Samo Zakotnik
  - Institute of Microbiology and Immunology, Faculty of Medicine, University of Ljubljana, Zaloška cesta 4, 1000 Ljubljana, Slovenia
- Jan Slunečko
  - Institute of Microbiology and Immunology, Faculty of Medicine, University of Ljubljana, Zaloška cesta 4, 1000 Ljubljana, Slovenia
- Rok Kogoj
  - Institute of Microbiology and Immunology, Faculty of Medicine, University of Ljubljana, Zaloška cesta 4, 1000 Ljubljana, Slovenia
- Misa Korva
  - Institute of Microbiology and Immunology, Faculty of Medicine, University of Ljubljana, Zaloška cesta 4, 1000 Ljubljana, Slovenia

24
Buddle S, Forrest L, Akinsuyi N, Martin Bernal LM, Brooks T, Venturini C, Miller C, Brown JR, Storey N, Atkinson L, Best T, Roy S, Goldsworthy S, Castellano S, Simmonds P, Harvala H, Golubchik T, Williams R, Breuer J, Morfopoulou S, Torres Montaguth OE. Evaluating metagenomics and targeted approaches for diagnosis and surveillance of viruses. Genome Med 2024; 16:111. [PMID: 39252069 PMCID: PMC11382446 DOI: 10.1186/s13073-024-01380-x] [Received: 04/16/2024] [Accepted: 08/30/2024] [Indexed: 09/11/2024]
Abstract
BACKGROUND Metagenomics is a powerful approach for the detection of unknown and novel pathogens. Workflows based on Illumina short-read sequencing are becoming established in diagnostic laboratories. However, high sequencing depth requirements, long turnaround times, and limited sensitivity hinder broader adoption. We investigated whether we could overcome these limitations using protocols based on untargeted sequencing with Oxford Nanopore Technologies (ONT), which offers real-time data acquisition and analysis, or a targeted panel approach, which allows the selective sequencing of known pathogens and could improve sensitivity. METHODS We evaluated detection of viruses with readily available untargeted metagenomic workflows using Illumina and ONT, and an Illumina-based enrichment approach using the Twist Bioscience Comprehensive Viral Research Panel (CVRP), which targets 3153 viruses. We tested samples consisting of a dilution series of a six-virus mock community in a human DNA/RNA background, designed to resemble clinical specimens with low microbial abundance and high host content. Protocols were designed to retain the host transcriptome, since this could help confirm the absence of infectious agents. We further compared the performance of commonly used taxonomic classifiers. RESULTS Capture with the Twist CVRP increased sensitivity by at least 10-100-fold over untargeted sequencing, making it suitable for the detection of low viral loads (60 genome copies per ml (gc/ml)), but additional methods may be needed in a diagnostic setting to detect untargeted organisms. While untargeted ONT had good sensitivity at high viral loads (60,000 gc/ml), at lower viral loads (600-6000 gc/ml), longer and more costly sequencing runs would be required to achieve sensitivities comparable to the untargeted Illumina protocol. Untargeted ONT provided better specificity than untargeted Illumina sequencing. However, the application of robust thresholds standardized results between taxonomic classifiers. Host gene expression analysis is optimal with untargeted Illumina sequencing but possible with both the CVRP and ONT. CONCLUSIONS Metagenomics has the potential to become standard-of-care in diagnostics and is a powerful tool for the discovery of emerging pathogens. Untargeted Illumina and ONT metagenomics and capture with the Twist CVRP have different advantages with respect to sensitivity, specificity, turnaround time and cost, and the optimal method will depend on the clinical context.
Affiliation(s)
- Sarah Buddle
  - Infection, Immunity and Inflammation Department, Great Ormond Street Institute of Child Health, University College London, London, UK
- Leysa Forrest
  - Genetics and Genomic Medicine Department, Great Ormond Street Institute of Child Health, University College London, London, UK
- Naomi Akinsuyi
  - Infection, Immunity and Inflammation Department, Great Ormond Street Institute of Child Health, University College London, London, UK
- Luz Marina Martin Bernal
  - Genetics and Genomic Medicine Department, Great Ormond Street Institute of Child Health, University College London, London, UK
- Tony Brooks
  - Genetics and Genomic Medicine Department, Great Ormond Street Institute of Child Health, University College London, London, UK
- Cristina Venturini
  - Infection, Immunity and Inflammation Department, Great Ormond Street Institute of Child Health, University College London, London, UK
- Charles Miller
  - Department of Microbiology, Virology and Infection Prevention & Control, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
- Julianne R Brown
  - Department of Microbiology, Virology and Infection Prevention & Control, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
- Nathaniel Storey
  - Department of Microbiology, Virology and Infection Prevention & Control, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
- Laura Atkinson
  - Department of Microbiology, Virology and Infection Prevention & Control, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
- Timothy Best
  - Department of Microbiology, Virology and Infection Prevention & Control, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
- Sunando Roy
  - Genetics and Genomic Medicine Department, Great Ormond Street Institute of Child Health, University College London, London, UK
- Sian Goldsworthy
  - Genetics and Genomic Medicine Department, Great Ormond Street Institute of Child Health, University College London, London, UK
- Sergi Castellano
  - Genetics and Genomic Medicine Department, Great Ormond Street Institute of Child Health, University College London, London, UK
- Peter Simmonds
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
- Heli Harvala
  - Radcliffe Department of Medicine, University of Oxford, Oxford, UK
  - Division of Infection and Immunity, University College London, London, UK
  - Microbiology Services, NHS Blood and Transplant, Colindale, UK
- Tanya Golubchik
  - Nuffield Department of Medicine, University of Oxford, Oxford, UK
  - Sydney Infectious Diseases Institute, Faculty of Medicine and Health, University of Sydney, Sydney, Australia
- Rachel Williams
  - Genetics and Genomic Medicine Department, Great Ormond Street Institute of Child Health, University College London, London, UK
- Judith Breuer
  - Infection, Immunity and Inflammation Department, Great Ormond Street Institute of Child Health, University College London, London, UK
  - Department of Microbiology, Virology and Infection Prevention & Control, Great Ormond Street Hospital for Children NHS Foundation Trust, London, UK
- Sofia Morfopoulou
  - Infection, Immunity and Inflammation Department, Great Ormond Street Institute of Child Health, University College London, London, UK
  - Section for Paediatrics, Department of Infectious Diseases, Faculty of Medicine, Imperial College London, London, UK
- Oscar Enrique Torres Montaguth
  - Infection, Immunity and Inflammation Department, Great Ormond Street Institute of Child Health, University College London, London, UK

25
Glendinning L, Wu Z, Vervelde L, Watson M, Balic A. Infectious bronchitis virus vaccination, but not the presence of XCR1, is correlated with large differences in chicken caecal microbiota. Microb Genom 2024; 10:001289. [PMID: 39222347 PMCID: PMC11541229 DOI: 10.1099/mgen.0.001289] [Received: 05/23/2024] [Accepted: 08/16/2024] [Indexed: 09/04/2024]
Abstract
The chicken immune system and microbiota play vital roles in maintaining gut homeostasis and protecting against pathogens. In mammals, XCR1+ conventional dendritic cells (cDCs) are located in the gut-draining lymph nodes and play a major role in gut homeostasis. These cDCs sample antigens in the gut luminal contents and limit the inflammatory response to gut commensal microbes by generating appropriate regulatory and effector T-cell responses. We hypothesized that these cells play similar roles in sustaining gut homeostasis in chickens, and that chickens lacking XCR1 were likely to contain a dysbiotic caecal microbiota. Here we compare the caecal microbiota of chickens that were either heterozygous or homozygous XCR1 knockouts, and that had or had not been vaccinated for infectious bronchitis virus (IBV). We used short-read (Illumina) and long-read (PacBio HiFi) metagenomic sequencing to reconstruct 670 high-quality, strain-level metagenome assembled genomes. We found no significant differences in alpha diversity or in the abundance of specific microbial taxa between genotypes. However, IBV vaccination was found to correlate with significant differences in the richness and beta diversity of the microbiota, and in the abundance of 40 bacterial genera. In conclusion, we found that a lack of XCR1 was not correlated with significant changes in the chicken microbiota, but IBV vaccination was.
Affiliation(s)
- Zhiguang Wu
  - The Roslin Institute, University of Edinburgh, Edinburgh, UK
- Lonneke Vervelde
  - The Roslin Institute, University of Edinburgh, Edinburgh, UK
  - Royal GD Animal Health, Deventer, Netherlands
- Mick Watson
  - Centre for Digital Innovation, DSM Biotechnology Centre, Delft, Netherlands
  - Scotland’s Rural College, Peter Wilson Building, King’s Buildings, Edinburgh, UK
- Adam Balic
  - The Roslin Institute, University of Edinburgh, Edinburgh, UK
  - Department of Biochemistry and Pharmacology, Bio21 Molecular Science and Biotechnology Institute, The University of Melbourne, Parkville, VIC, Australia

26
Barber DG, Davies CA, Hartley IP, Tennant RK. Evaluation of commercial RNA extraction kits for long-read metatranscriptomics in soil. Microb Genom 2024; 10. [PMID: 39298196 PMCID: PMC11412367 DOI: 10.1099/mgen.0.001298] [Indexed: 09/21/2024]
Abstract
Metatranscriptomic analysis of the soil microbiome has the potential to reveal molecular mechanisms that drive soil processes regulated by the microbial community. Therefore, RNA samples must be of sufficient yield and quality to robustly quantify differential gene expression. While short-read sequencing technology is often favoured for metatranscriptomics, long-read sequencing has the potential to provide several benefits over short-read technologies. The ability to resolve complete transcripts on a portable sequencing platform for a relatively low capital expenditure makes Oxford Nanopore Technology an attractive prospect for addressing many of the challenges of soil metatranscriptomics. To fully enable long-read metatranscriptomic analysis of the functional molecular pathways expressed in these diverse habitats, RNA purification methods from soil must be optimised for long-read sequencing. Here we compare RNA samples purified using five commercially available extraction kits designed for use with soil. We found that the Qiagen RNeasy PowerSoil Total RNA Kit performed the best across RNA yield, quality and purity and was robust across different soil types. We found that sufficient sequencing depth can be achieved to characterise the active community for total RNA samples using Oxford Nanopore Technology, and discuss its current limitations for differential gene expression analysis in soil studies.
Affiliation(s)
- Daniel G Barber
  - Geography, Faculty of Environment, Science and Economy, Amory Building, Rennes Drive, Exeter, Devon, EX4 4RJ, UK
- Christian A Davies
  - Shell International Exploration and Production Inc., Shell Technology Centre Houston, Houston, TX, 77082, USA
- Iain P Hartley
  - Geography, Faculty of Environment, Science and Economy, Amory Building, Rennes Drive, Exeter, Devon, EX4 4RJ, UK
- Richard K Tennant
  - Geography, Faculty of Environment, Science and Economy, Amory Building, Rennes Drive, Exeter, Devon, EX4 4RJ, UK

27
Sapoval N, Liu Y, Curry KD, Kille B, Huang W, Kokroko N, Nute MG, Tyshaieva A, Dilthey A, Molloy EK, Treangen TJ. Lightweight taxonomic profiling of long-read metagenomic datasets with Lemur and Magnet. bioRxiv 2024:2024.06.01.596961. [PMID: 38895276 PMCID: PMC11185576 DOI: 10.1101/2024.06.01.596961] [Indexed: 06/21/2024]
Abstract
The advent of long-read sequencing of microbiomes necessitates the development of new taxonomic profilers tailored to long-read shotgun metagenomic datasets. Here, we introduce Lemur and Magnet, a pair of tools optimized for lightweight and accurate taxonomic profiling for long-read shotgun metagenomic datasets. Lemur is a marker-gene-based method that leverages an EM algorithm to reduce false positive calls while preserving true positives; Magnet is a whole-genome read-mapping-based method that provides detailed presence and absence calls for bacterial genomes. We demonstrate that Lemur and Magnet can run in minutes to hours on a laptop with 32 GB of RAM, even for large inputs, a crucial feature given the portability of long-read sequencing machines. Furthermore, the marker gene database used by Lemur is only 4 GB and contains information from over 300,000 RefSeq genomes. Lemur and Magnet are open-source and available at https://github.com/treangenlab/lemur and https://github.com/treangenlab/magnet.
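To make the role of an EM algorithm in suppressing false positive calls concrete, here is a minimal, generic sketch of expectation-maximization over ambiguous read assignments. It is not Lemur's implementation: the function name `em_abundances`, the input layout (one dict of taxon likelihoods per read) and the fixed iteration count are assumptions chosen for illustration.

```python
def em_abundances(read_hits, n_iter=50):
    """Estimate relative taxon abundances from ambiguous read hits.

    read_hits: list with one dict per read, mapping taxon -> likelihood
    that the read came from that taxon (hypothetical input format).
    """
    taxa = sorted({t for hits in read_hits for t in hits})
    # Start from a uniform abundance estimate.
    abundance = {t: 1.0 / len(taxa) for t in taxa}
    for _ in range(n_iter):
        totals = dict.fromkeys(taxa, 0.0)
        # E-step: split each read across its candidate taxa in
        # proportion to current abundance times likelihood.
        for hits in read_hits:
            weights = {t: abundance[t] * p for t, p in hits.items()}
            norm = sum(weights.values())
            if norm == 0:
                continue
            for t, w in weights.items():
                totals[t] += w / norm
        # M-step: abundances become the normalized fractional counts.
        assigned = sum(totals.values())
        if assigned == 0:
            break
        abundance = {t: totals[t] / assigned for t in taxa}
    return abundance


# Eight reads hit only taxon A; two reads hit A and B equally well.
reads = [{"A": 1.0}] * 8 + [{"A": 0.5, "B": 0.5}] * 2
print(em_abundances(reads))
```

In this toy input, taxon B is supported only by reads that taxon A explains equally well, so its estimated abundance shrinks toward zero over the iterations.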
Affiliation(s)
- Nicolae Sapoval
  - Department of Computer Science, Rice University, Houston, TX 77005, USA
- Yunxi Liu
  - Department of Computer Science, Rice University, Houston, TX 77005, USA
- Kristen D. Curry
  - Department of Computer Science, Rice University, Houston, TX 77005, USA
- Bryce Kille
  - Department of Computer Science, Rice University, Houston, TX 77005, USA
- Wenyu Huang
  - Department of Computer Science, Rice University, Houston, TX 77005, USA
- Natalie Kokroko
  - Department of Computer Science, Rice University, Houston, TX 77005, USA
- Michael G. Nute
  - Department of Computer Science, Rice University, Houston, TX 77005, USA
- Alona Tyshaieva
  - Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- Alexander Dilthey
  - Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- Erin K. Molloy
  - Department of Bioengineering, Rice University, Houston, TX 77005, USA
- Todd J. Treangen
  - Department of Computer Science, Rice University, Houston, TX 77005, USA
  - Department of Bioengineering, Rice University, Houston, TX 77005, USA

28
Chen X, Yin X, Shi X, Yan W, Yang Y, Liu L, Zhang T. Melon: metagenomic long-read-based taxonomic identification and quantification using marker genes. Genome Biol 2024; 25:226. [PMID: 39160564 PMCID: PMC11331721 DOI: 10.1186/s13059-024-03363-y] [Received: 12/07/2023] [Accepted: 07/30/2024] [Indexed: 08/21/2024]
Abstract
Long-read sequencing holds great potential for characterizing complex microbial communities, yet taxonomic profiling tools designed specifically for long reads remain lacking. We introduce Melon, a novel marker-based taxonomic profiler that capitalizes on the unique attributes of long reads. Melon employs a two-stage classification scheme to reduce computational time and is equipped with an expectation-maximization-based post-correction module to handle ambiguous reads. Melon achieves superior performance compared to existing tools in both mock and simulated samples. Using wastewater metagenomic samples, we demonstrate the applicability of Melon by showing it provides reliable estimates of overall genome copies, and species-level taxonomic profiles.
Affiliation(s)
- Xi Chen
  - Environmental Microbiome Engineering and Biotechnology Lab, Department of Civil Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong, China
- Xiaole Yin
  - Environmental Microbiome Engineering and Biotechnology Lab, Department of Civil Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong, China
- Xianghui Shi
  - Environmental Microbiome Engineering and Biotechnology Lab, Department of Civil Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong, China
- Weifu Yan
  - Environmental Microbiome Engineering and Biotechnology Lab, Department of Civil Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong, China
- Yu Yang
  - Environmental Microbiome Engineering and Biotechnology Lab, Department of Civil Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong, China
- Lei Liu
  - Environmental Microbiome Engineering and Biotechnology Lab, Department of Civil Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong, China
- Tong Zhang
  - Environmental Microbiome Engineering and Biotechnology Lab, Department of Civil Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong, China

29
Ulrich JU, Renard BY. Fast and space-efficient taxonomic classification of long reads with hierarchical interleaved XOR filters. Genome Res 2024; 34:914-924. [PMID: 38886068 PMCID: PMC11293544 DOI: 10.1101/gr.278623.123] [Received: 10/10/2023] [Accepted: 05/23/2024] [Indexed: 06/20/2024]
Abstract
Metagenomic long-read sequencing is gaining popularity for various applications, including pathogen detection and microbiome studies. To analyze the large data created in those studies, software tools need to taxonomically classify the sequenced molecules and estimate the relative abundances of organisms in the sequenced sample. Because of the exponential growth of reference genome databases, the current taxonomic classification methods have large computational requirements. This issue motivated us to develop a new data structure for fast and memory-efficient querying of long reads. Here, we present Taxor as a new tool for long-read metagenomic classification using a hierarchical interleaved XOR filter data structure for indexing and querying large reference genome sets. Taxor implements several k-mer-based approaches, such as syncmers, for pseudoalignment to classify reads and an expectation-maximization algorithm for metagenomic profiling. Our results show that Taxor outperforms state-of-the-art tools regarding precision while having a similar recall for long-read taxonomic classification. Most notably, Taxor reduces the memory requirements and index size by >50% and is among the fastest tools regarding query times. This enables real-time metagenomics analysis with large reference databases on a small laptop in the field.
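The syncmer idea named in this abstract can be sketched in a few lines. The following is an illustrative open-syncmer selector, not Taxor's code: the function name `open_syncmers`, the parameter values, and the use of plain lexicographic order instead of a hash to rank s-mers are all assumptions made for readability.

```python
def open_syncmers(seq, k=5, s=2, offset=0):
    """Return (position, k-mer) pairs for k-mers that are open syncmers.

    A k-mer is selected when its smallest s-mer (leftmost on ties,
    lexicographic order as a stand-in for a hash) starts at `offset`.
    """
    selected = []
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        smers = [kmer[j:j + s] for j in range(k - s + 1)]
        # list.index returns the leftmost position of the minimum.
        if smers.index(min(smers)) == offset:
            selected.append((i, kmer))
    return selected


print(open_syncmers("ACGTAC", k=4, s=2))
```

Because the decision depends only on the k-mer itself and not on its neighbours, the same k-mer is selected or skipped wherever it occurs, which is what makes syncmers useful for subsampling k-mers consistently across reads and references.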
Affiliation(s)
- Jens-Uwe Ulrich
  - Data Analytics and Computational Statistics, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Germany
  - Phylogenomics Unit, Center for Artificial Intelligence in Public Health Research, Robert Koch Institute, 15745 Wildau, Germany
  - Department of Mathematics and Computer Science, Free University of Berlin, 14195 Berlin, Germany
- Bernhard Y Renard
  - Data Analytics and Computational Statistics, Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam, 14482 Potsdam, Germany

30
Hirsch P, Molano LA, Engel A, Zentgraf J, Rahmann S, Hannig M, Müller R, Kern F, Keller A, Schmartz G. Mibianto: ultra-efficient online microbiome analysis through k-mer based metagenomics. Nucleic Acids Res 2024; 52:W407-W414. [PMID: 38716863 PMCID: PMC11223814 DOI: 10.1093/nar/gkae364] [Received: 02/19/2024] [Revised: 04/03/2024] [Accepted: 04/24/2024] [Indexed: 07/06/2024]
Abstract
Quantifying microbiome species and composition from metagenomic assays is often challenging due to its time-consuming nature and computational complexity. In bioinformatics, k-mer-based approaches have long been established to expedite the analysis of large sequencing data and are now widely used to annotate metagenomic data. We make use of k-mer counting techniques for efficient and accurate compositional analysis of microbiota from whole metagenome sequencing. Mibianto solves this problem by operating directly on read files, without manual preprocessing or complete data exchange. It handles diverse sequencing platforms, including short single-end, paired-end, and long read technologies. Our sketch-based workflow significantly reduces the data volume transferred from the user to the server (up to 99.59% size reduction) to subsequently perform taxonomic profiling with enhanced efficiency and privacy. Mibianto offers functionality beyond k-mer quantification; it supports advanced community composition estimation, including diversity, ordination, and differential abundance analysis. Our tool aids in the standardization of computational workflows, thus supporting reproducibility of scientific sequencing studies. It is adaptable to small- and large-scale experimental designs and offers a user-friendly interface, thus making it an invaluable tool for both clinical and research-oriented metagenomic studies. Mibianto is freely available without the need for a login at: https://www.ccb.uni-saarland.de/mibianto.
Affiliation(s)
- Pascal Hirsch
  - Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Annika Engel
  - Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
- Jens Zentgraf
  - Algorithmic Bioinformatics, Center for Bioinformatics Saar and Saarland University, Saarland Informatics Campus, 66123 Saarbrücken, Germany
  - Saarbrücken Graduate School of Computer Science, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- Sven Rahmann
  - Algorithmic Bioinformatics, Center for Bioinformatics Saar and Saarland University, Saarland Informatics Campus, 66123 Saarbrücken, Germany
- Matthias Hannig
  - Clinic of Operative Dentistry, Periodontology and Preventive Dentistry, Saarland University Hospital, Saarland University, Kirrberger Str. 100, Building 73, 66421 Homburg, Saar, Germany
- Rolf Müller
  - Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research, 66123 Saarbrücken, Germany
  - Deutsches Zentrum für Infektionsforschung (DZIF), Standort Hannover-Braunschweig, 38124 Braunschweig, Germany
  - PharmaScienceHub, 66123 Saarbrücken, Germany
- Fabian Kern
  - Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
  - Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research, 66123 Saarbrücken, Germany
- Andreas Keller
  - Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
  - Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research, 66123 Saarbrücken, Germany
  - PharmaScienceHub, 66123 Saarbrücken, Germany
- Georges P Schmartz
  - Chair for Clinical Bioinformatics, Saarland University, 66123 Saarbrücken, Germany

31
Schäfer L, Jehle JA, Kleespies RG, Wennmann JT. Pathogens of the oak processionary moth Thaumetopoea processionea: Developing a user-friendly bioassay system and metagenome analyses for microorganisms. J Invertebr Pathol 2024; 205:108121. [PMID: 38705355 DOI: 10.1016/j.jip.2024.108121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/24/2023] [Revised: 02/05/2024] [Accepted: 04/26/2024] [Indexed: 05/07/2024]
Abstract
The oak processionary moth (OPM) Thaumetopoea processionea is a pest of oak trees and poses health risks to humans due to the urticating setae of later instar larvae. For this reason, it is difficult to rear OPM under laboratory conditions, carry out bioassays or examine larvae for pathogens. Biological control targets the early larval instars and is based primarily on commercial preparations of Bacillus thuringiensis ssp. kurstaki (Btk). To test the entomopathogenic potential of other spore-forming bacteria, a user-friendly bioassay system was developed that (i) applies bacterial spore suspensions by oak bud dipping, (ii) targets first instar larvae through feeding exposure and (iii) takes into account their group-feeding behavior. Negligible mortality in the untreated control proved the functionality of the newly established bioassay system. Whereas the commercial Btk HD-1 strain was used as a bioassay standard and confirmed as being highly efficient, a Bacillus wiedmannii strain was ineffective in killing OPM larvae. Larvae that died during the infection experiment were subjected to Nanopore sequencing as a metagenomic approach to entomopathogen detection. This further corroborated that B. wiedmannii was not able to infect and establish in OPM, but identified potential insect pathogenic species from the genera Serratia and Pseudomonas.
Affiliation(s)
- Lea Schäfer
  - Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Biological Control, Schwabenheimer Str. 101, 69221 Dossenheim, Germany
- Johannes A Jehle
  - Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Biological Control, Schwabenheimer Str. 101, 69221 Dossenheim, Germany
- Regina G Kleespies
  - Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Biological Control, Schwabenheimer Str. 101, 69221 Dossenheim, Germany
- Jörg T Wennmann
  - Julius Kühn Institute (JKI) - Federal Research Centre for Cultivated Plants, Institute for Biological Control, Schwabenheimer Str. 101, 69221 Dossenheim, Germany

32
Agustinho DP, Fu Y, Menon VK, Metcalf GA, Treangen TJ, Sedlazeck FJ. Unveiling microbial diversity: harnessing long-read sequencing technology. Nat Methods 2024; 21:954-966. [PMID: 38689099 PMCID: PMC11955098 DOI: 10.1038/s41592-024-02262-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2022] [Accepted: 03/29/2024] [Indexed: 05/02/2024]
Abstract
Long-read sequencing has recently transformed metagenomics, enhancing strain-level pathogen characterization, enabling accurate and complete metagenome-assembled genomes, and improving microbiome taxonomic classification and profiling. These advancements are due not only to improvements in sequencing accuracy but also to rapidly evolving analysis methods. In this Review, we explore long-read sequencing's profound impact on metagenomics, focusing on computational pipelines for genome assembly, taxonomic characterization and variant detection, to summarize recent advancements in the field and provide an overview of available analytical methods to fully leverage long reads. We provide insights into the advantages and disadvantages of long reads over short reads and their evolution from the early days of long-read sequencing to their recent impact on metagenomics and clinical diagnostics. We further point out remaining challenges for the field, such as the integration of methylation signals in sub-strain analysis and the lack of benchmarks.
Affiliation(s)
- Daniel P Agustinho
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Yilei Fu
  - Department of Computer Science, Rice University, Houston, TX, USA
- Vipin K Menon
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Senior research project manager, Human Genetics, Genentech, South San Francisco, CA, USA
- Ginger A Metcalf
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Todd J Treangen
  - Department of Computer Science, Rice University, Houston, TX, USA
  - Department of Bioengineering, Rice University, Houston, TX, USA
- Fritz J Sedlazeck
  - Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
  - Department of Computer Science, Rice University, Houston, TX, USA

33
Tian Q, Zhang P, Zhai Y, Wang Y, Zou Q. Application and Comparison of Machine Learning and Database-Based Methods in Taxonomic Classification of High-Throughput Sequencing Data. Genome Biol Evol 2024; 16:evae102. [PMID: 38748485 PMCID: PMC11135637 DOI: 10.1093/gbe/evae102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/12/2024] [Indexed: 05/30/2024]
Abstract
The advent of high-throughput sequencing technologies has not only revolutionized the field of bioinformatics but has also heightened the demand for efficient taxonomic classification. Despite technological advancements, efficiently processing and analyzing the deluge of sequencing data for precise taxonomic classification remains a formidable challenge. Existing classification approaches primarily fall into two categories, database-based methods and machine learning methods, each presenting its own set of challenges and advantages. On this basis, the aim of our study was to conduct a comparative analysis between these two methods while also investigating the merits of integrating multiple database-based methods. Through an in-depth comparative study, we evaluated the performance of both methodological categories in taxonomic classification by utilizing simulated data sets. Our analysis revealed that database-based methods excel in classification accuracy when backed by a rich and comprehensive reference database. Conversely, while machine learning methods show superior performance in scenarios where reference sequences are sparse or lacking, they generally show inferior performance compared with database methods under most conditions. Moreover, our study confirms that integrating multiple database-based methods does, in fact, enhance classification accuracy. These findings shed new light on the taxonomic classification of high-throughput sequencing data and bear substantial implications for the future development of computational biology. For those interested in further exploring our methods, the source code of this study is publicly available on https://github.com/LoadStar822/Genome-Classifier-Performance-Evaluator. Additionally, a dedicated webpage showcasing our collected database, data sets, and various classification software can be found at http://lab.malab.cn/~tqz/project/taxonomic/.
Affiliation(s)
- Qinzhong Tian
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
- Pinglu Zhang
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
- Yixiao Zhai
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
- Yansu Wang
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China

34
Peres da Silva R, Suphavilai C, Nagarajan N. MetageNN: a memory-efficient neural network taxonomic classifier robust to sequencing errors and missing genomes. BMC Bioinformatics 2024; 25:153. [PMID: 38627615 PMCID: PMC11022314 DOI: 10.1186/s12859-024-05760-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Accepted: 03/22/2024] [Indexed: 04/19/2024]
Abstract
BACKGROUND With the rapid increase in throughput of long-read sequencing technologies, recent studies have explored their potential for taxonomic classification by using alignment-based approaches to reduce the impact of higher sequencing error rates. While alignment-based methods are generally slower, k-mer-based taxonomic classifiers can overcome this limitation, potentially at the expense of lower sensitivity for strains and species that are not in the database. RESULTS We present MetageNN, a memory-efficient long-read taxonomic classifier that is robust to sequencing errors and missing genomes. MetageNN is a neural network model that uses short k-mer profiles of sequences to reduce the impact of distribution shifts on error-prone long reads. Benchmarking MetageNN against other machine learning approaches for taxonomic classification (GeNet) showed substantial improvements with long-read data (20% improvement in F1 score). By utilizing nanopore sequencing data, MetageNN exhibits improved sensitivity in situations where the reference database is incomplete. It surpasses the alignment-based MetaMaps and MEGAN-LR, as well as the k-mer-based Kraken2 tools, with improvements of 100%, 36%, and 23% respectively at the read-level analysis. Notably, at the community level, MetageNN consistently demonstrated higher sensitivities than the previously mentioned tools. Furthermore, MetageNN requires < 1/4th of the database storage used by Kraken2, MEGAN-LR and MMseqs2 and is > 7× faster than MetaMaps and GeNet and > 2× faster than MEGAN-LR and MMseqs2. CONCLUSION This proof of concept work demonstrates the utility of machine-learning-based methods for taxonomic classification using long reads. MetageNN can be used on sequences not classified by conventional methods and offers an alternative approach for memory-efficient classifiers that can be optimized further.
Affiliation(s)
- Rafael Peres da Silva
  - School of Computing, National University of Singapore, Singapore, 117417, Republic of Singapore
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), Singapore, 138672, Republic of Singapore
- Chayaporn Suphavilai
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), Singapore, 138672, Republic of Singapore
- Niranjan Nagarajan
  - School of Computing, National University of Singapore, Singapore, 117417, Republic of Singapore
  - Agency for Science, Technology and Research (A*STAR), Genome Institute of Singapore (GIS), Singapore, 138672, Republic of Singapore
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 119228, Republic of Singapore

35
Gand M, Navickaite I, Bartsch LJ, Grützke J, Overballe-Petersen S, Rasmussen A, Otani S, Michelacci V, Matamoros BR, González-Zorn B, Brouwer MSM, Di Marcantonio L, Bloemen B, Vanneste K, Roosens NHCJ, AbuOun M, De Keersmaecker SCJ. Towards facilitated interpretation of shotgun metagenomics long-read sequencing data analyzed with KMA for the detection of bacterial pathogens and their antimicrobial resistance genes. Front Microbiol 2024; 15:1336532. [PMID: 38659981 PMCID: PMC11042533 DOI: 10.3389/fmicb.2024.1336532] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Accepted: 02/29/2024] [Indexed: 04/26/2024]
Abstract
Metagenomic sequencing is a promising method that has the potential to revolutionize the world of pathogen detection and antimicrobial resistance (AMR) surveillance in food-producing environments. However, the analysis of the huge amount of data obtained requires performant bioinformatics tools and databases, with intuitive and straightforward interpretation. In this study, based on long-read metagenomics data of chicken fecal samples with a spike-in mock community, we proposed confidence levels for taxonomic identification and AMR gene detection, with interpretation guidelines, to help with the analysis of the output data generated by KMA, a popular k-mer read alignment tool. Additionally, we demonstrated that the completeness and diversity of the genomes present in the reference databases are key parameters for accurate and easy interpretation of the sequencing data. Finally, we explored whether KMA, in a two-step procedure, can be used to link the detected AMR genes to their bacterial host chromosome, both detected within the same long-reads. The confidence levels were successfully tested on 28 metagenomics datasets which were obtained with sequencing of real and spiked samples from fecal (chicken, pig, and buffalo) or food (minced beef and food enzyme products) origin. The methodology proposed in this study will facilitate the analysis of metagenomics sequencing datasets for KMA users. Ultimately, this will contribute to improvements in the rapid diagnosis and surveillance of pathogens and AMR genes in food-producing environments, as prioritized by the EU.
Affiliation(s)
- Mathieu Gand
  - Transversal Activities in Applied Genomics, Sciensano, Brussels, Belgium
- Indre Navickaite
  - Department of Bacteriology, Animal and Plant Health Agency, Weybridge, United Kingdom
- Lee-Julia Bartsch
  - Department of Biological Safety, German Federal Institute for Risk Assessment, Berlin, Germany
- Josephine Grützke
  - Department of Biological Safety, German Federal Institute for Risk Assessment, Berlin, Germany
- Astrid Rasmussen
  - Bacterial Reference Center, Statens Serum Institute, Copenhagen, Denmark
- Saria Otani
  - National Food Institute, Technical University of Denmark, Kongens Lyngby, Denmark
- Valeria Michelacci
  - Department of Food Safety, Nutrition and Veterinary Public Health, Istituto Superiore di Sanità, Rome, Italy
- Bruno González-Zorn
  - Department of Animal Health, Complutense University of Madrid, Madrid, Spain
- Michael S. M. Brouwer
  - Wageningen Bioveterinary Research Part of Wageningen University and Research, Lelystad, Netherlands
- Lisa Di Marcantonio
  - Istituto Zooprofilattico Sperimentale dell’Abruzzo e del Molise “G. Caporale”, Teramo, Italy
- Bram Bloemen
  - Transversal Activities in Applied Genomics, Sciensano, Brussels, Belgium
- Kevin Vanneste
  - Transversal Activities in Applied Genomics, Sciensano, Brussels, Belgium
- Manal AbuOun
  - Department of Bacteriology, Animal and Plant Health Agency, Weybridge, United Kingdom

36
Edwin NR, Fitzpatrick AH, Brennan F, Abram F, O'Sullivan O. An in-depth evaluation of metagenomic classifiers for soil microbiomes. Environmental Microbiome 2024; 19:19. [PMID: 38549112 PMCID: PMC10979606 DOI: 10.1186/s40793-024-00561-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Accepted: 03/11/2024] [Indexed: 04/01/2024]
Abstract
BACKGROUND Recent endeavours in metagenomics, exemplified by projects such as the human microbiome project and TARA Oceans, have illuminated the complexities of microbial biomes. Robust bioinformatic pipelines and meticulous evaluation of their methodology have contributed to the success of these projects. The soil environment, however, with its unique challenges, requires a specialized methodological exploration to maximize microbial insights. A notable limitation in soil microbiome studies is the dearth of soil-specific reference databases available to classifiers that emulate the complexity of soil communities. There is also a lack of in-vitro mock communities derived from soil strains that can be assessed for taxonomic classification accuracy. RESULTS In this study, we generated a custom in-silico mock community containing microbial genomes commonly observed in the soil microbiome. Using this mock community, we simulated shotgun sequencing data to evaluate the performance of three leading metagenomic classifiers: Kraken2 (supplemented with Bracken, using a custom database derived from GTDB-TK genomes along with its own default database), Kaiju, and MetaPhlAn, utilizing their respective default databases for a robust analysis. Our results highlight the importance of optimizing taxonomic classification parameters and database selection, as well as analysing trimmed reads and contigs. Our study showed that classifiers using databases tailored to the specific taxa present in our samples led to fewer errors than broader databases including microbial eukaryotes, protozoa, or human genomes, highlighting the effectiveness of targeted taxonomic classification. Notably, optimal classifier performance was achieved when applying a relative abundance threshold of 0.001% or 0.005%. Kraken2 supplemented with Bracken and a custom database demonstrated superior precision, sensitivity, F1 score, and overall sequence classification. With the custom database, this classifier classified 99% of in-silico reads and 58% of real-world soil shotgun reads, and identified previously overlooked phyla in the latter. CONCLUSION This study underscores the potential advantages of in-silico methodological optimization in metagenomic analyses, especially when deciphering the complexities of soil microbiomes. We demonstrate that the choice of classifier and database significantly impacts microbial taxonomic profiling. Our findings suggest that employing Kraken2 with Bracken, coupled with a custom database of GTDB-TK genomes and fungal genomes at a relative abundance threshold of 0.001%, provides optimal accuracy in soil shotgun metagenome analysis.
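The relative abundance threshold mentioned in this abstract can be illustrated with a short Python sketch. This is only an assumed interpretation: the abstract does not say how the authors applied the threshold, so the function name `filter_by_relative_abundance`, the dictionary input format, and the choice to keep taxa exactly at the threshold are all hypothetical.

```python
def filter_by_relative_abundance(counts, threshold_percent=0.001):
    """Keep taxa whose share of all assigned reads is at least threshold_percent.

    counts maps a taxon name to its number of assigned reads.
    threshold_percent is expressed in percent (0.001 means 0.001%).
    """
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        taxon: n
        for taxon, n in counts.items()
        if 100.0 * n / total >= threshold_percent
    }


if __name__ == "__main__":
    # Hypothetical read counts for three made-up taxa (1,000,000 reads in total).
    example = {"taxon_A": 999_979, "taxon_B": 20, "taxon_C": 1}
    print(filter_by_relative_abundance(example, threshold_percent=0.001))
```

In this made-up example, `taxon_C` accounts for 0.0001% of reads and is dropped, while `taxon_B` at 0.002% is retained. Filtering of this kind trades a small loss of sensitivity for fewer low-abundance false positives.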
Affiliation(s)
- Niranjana Rose Edwin
  - Teagasc, Moorepark Food Research Centre, Moorepark, Fermoy, Cork, Ireland
  - Functional Environmental Microbiology, School of Biological and Chemical Sciences, Ryan Institute, University of Galway, Galway, Ireland
  - VistaMilk SFI Research Centre, Cork, Ireland
- Fiona Brennan
  - Teagasc, Soils, Environment and Landuse Department, Johnstown Castle, Wexford, Ireland
  - VistaMilk SFI Research Centre, Cork, Ireland
- Florence Abram
  - Functional Environmental Microbiology, School of Biological and Chemical Sciences, Ryan Institute, University of Galway, Galway, Ireland
- Orla O'Sullivan
  - Teagasc, Moorepark Food Research Centre, Moorepark, Fermoy, Cork, Ireland
  - VistaMilk SFI Research Centre, Cork, Ireland

37
Buytaers FE, Verhaegen B, Van Nieuwenhuysen T, Roosens NHC, Vanneste K, Marchal K, De Keersmaecker SCJ. Strain-level characterization of foodborne pathogens without culture enrichment for outbreak investigation using shotgun metagenomics facilitated with nanopore adaptive sampling. Front Microbiol 2024; 15:1330814. [PMID: 38495515 PMCID: PMC10940517 DOI: 10.3389/fmicb.2024.1330814] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/31/2023] [Accepted: 02/12/2024] [Indexed: 03/19/2024]
Abstract
Introduction Shotgun metagenomics has previously proven effective in the investigation of foodborne outbreaks by providing rapid and comprehensive insights into the microbial contaminant. However, culture enrichment of the sample has remained a prerequisite, despite the potential impact on pathogen detection resulting from growth competition. To circumvent the need for culture enrichment, we explored the use of adaptive sampling with various databases for targeted nanopore sequencing, compared with shotgun metagenomics alone. Methods The adaptive sampling method was first tested on DNA of mashed potatoes mixed with DNA of a Staphylococcus aureus strain previously associated with a foodborne outbreak. The selective sequencing was used to either deplete the potato sequencing reads or enrich for the pathogen sequencing reads, and was compared with shotgun sequencing. Then, living S. aureus were spiked at 10^5 CFU into 25 g of mashed potatoes. Three DNA extraction kits were tested, in combination with enrichment using adaptive sampling, following whole genome amplification. After data analysis, the possibility of characterizing the contaminant with the different sequencing and extraction methods, without culture enrichment, was assessed. Results Overall, adaptive sampling outperformed shotgun sequencing. While the use of a host removal DNA extraction kit and targeted sequencing using a database of foodborne pathogens allowed rapid detection of the pathogen, the most complete characterization was achieved when using solely a database of S. aureus combined with a conventional DNA extraction kit, enabling accurate placement of the strain on a phylogenetic tree alongside outbreak cases. Discussion This method shows great potential for strain-level analysis of foodborne outbreaks without the need for culture enrichment, thereby enabling faster investigations and facilitating precise pathogen characterization. The integration of adaptive sampling with metagenomics presents a valuable strategy for more efficient and targeted analysis of microbial communities in foodborne outbreaks, contributing to improved food safety and public health.
Affiliation(s)
- Florence E. Buytaers
  - Transversal Activities in Applied Genomics, Sciensano, Brussels, Belgium
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
- Bavo Verhaegen
  - National Reference Laboratory for Foodborne Outbreaks (NRL-FBO) and for Coagulase Positive Staphylococci (NRL-CPS), Foodborne Pathogens, Sciensano, Brussels, Belgium
- Tom Van Nieuwenhuysen
  - National Reference Laboratory for Foodborne Outbreaks (NRL-FBO) and for Coagulase Positive Staphylococci (NRL-CPS), Foodborne Pathogens, Sciensano, Brussels, Belgium
- Kevin Vanneste
  - Transversal Activities in Applied Genomics, Sciensano, Brussels, Belgium
- Kathleen Marchal
  - Department of Plant Biotechnology and Bioinformatics, Ghent University, Ghent, Belgium
  - Department of Information Technology, IDlab, IMEC, Ghent University, Ghent, Belgium

38
Galli BD, Nikoloudaki O, Granehäll L, Carafa I, Pozza M, De Marchi M, Gobbetti M, Di Cagno R. Comparative analysis of microbial succession and proteolysis focusing on amino acid pathways in Asiago-PDO cheese from two dairies. Int J Food Microbiol 2024; 411:110548. [PMID: 38154252 DOI: 10.1016/j.ijfoodmicro.2023.110548] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 11/18/2023] [Accepted: 12/17/2023] [Indexed: 12/30/2023]
Abstract
In this study, a comprehensive and comparative analysis was conducted on Italian Asiago-PDO cheese obtained from two different dairies, named Dairy I and Dairy II, using industrial and natural fermented milk, respectively. The analysis encompassed the evaluation of chemical composition, the succession of the microbiota during manufacture and ripening, and proteolysis, mainly focusing on free individual amino acid (FAA) profiles. A metagenomic approach was used to investigate the cheese microbiome functionality. Differences in gross chemical composition were more evident during ripening, with Dairy II showing higher variability within batches. The microbiota varied significantly between the two dairies and ripening stages. The choice of starter culture shaped the microbiota during production and affected the microbial diversity of non-starter lactic acid bacteria (NSLAB) originating from the raw milk during ripening. Peptide chromatographic profiles and FAA concentrations increased as ripening progressed, with Dairy I showing higher production of FAA. Functional analysis of the metagenomes linked species to specific amino acid metabolism/catabolism pathways. The amino acid metabolism pathways, particularly those related to aromatic amino acids, lysine, and branched-chain amino acids, were affected by the presence of specific NSLAB species, which differed between the two dairies. The results obtained in this study reveal the impact of starter culture on peculiar cheese microbiota assemblies, which selectively target amino acid pathways, providing insights into the potential flavor and aroma characteristics of Asiago-PDO cheese.
Affiliation(s)
- Bruno Domingues Galli
  - Free University of Bozen-Bolzano, Faculty of Agricultural, Environmental and Food Sciences, Piazza Università 1, 39100 Bolzano, BZ, Italy
- Olga Nikoloudaki
  - Free University of Bozen-Bolzano, Faculty of Agricultural, Environmental and Food Sciences, Piazza Università 1, 39100 Bolzano, BZ, Italy
- Lena Granehäll
  - Free University of Bozen-Bolzano, Faculty of Agricultural, Environmental and Food Sciences, Piazza Università 1, 39100 Bolzano, BZ, Italy
- Ilaria Carafa
  - Free University of Bozen-Bolzano, Faculty of Agricultural, Environmental and Food Sciences, Piazza Università 1, 39100 Bolzano, BZ, Italy
- Marta Pozza
  - University of Padova, Department of Agronomy, Food, Natural resources, Animals and Environment, Viale dell'Università 16, 35020 Legnaro, PD, Italy
- Massimo De Marchi
  - University of Padova, Department of Agronomy, Food, Natural resources, Animals and Environment, Viale dell'Università 16, 35020 Legnaro, PD, Italy
- Marco Gobbetti
  - Free University of Bozen-Bolzano, Faculty of Agricultural, Environmental and Food Sciences, Piazza Università 1, 39100 Bolzano, BZ, Italy
- Raffaella Di Cagno
  - Free University of Bozen-Bolzano, Faculty of Agricultural, Environmental and Food Sciences, Piazza Università 1, 39100 Bolzano, BZ, Italy

39
Kim C, Pongpanich M, Porntaveetus T. Unraveling metagenomics through long-read sequencing: a comprehensive review. J Transl Med 2024; 22:111. [PMID: 38282030 PMCID: PMC10823668 DOI: 10.1186/s12967-024-04917-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 19.0] [Received: 10/15/2023] [Accepted: 01/21/2024] [Indexed: 01/30/2024]
Abstract
The study of microbial communities has undergone significant advancements, starting from the initial use of 16S rRNA sequencing to the adoption of shotgun metagenomics. However, a new era has emerged with the advent of long-read sequencing (LRS), which offers substantial improvements over its predecessor, short-read sequencing (SRS). LRS produces reads that are several kilobases long, enabling researchers to obtain more complete and contiguous genomic information, characterize structural variations, and study epigenetic modifications. The current leaders in LRS technologies are Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), each offering a distinct set of advantages. This review covers the workflow of long-read metagenomics sequencing, including sample preparation (sample collection, sample extraction, and library preparation), sequencing, processing (quality control, assembly, and binning), and analysis (taxonomic annotation and functional annotation). Each section provides a concise outline of the key concept of the methodology, presenting the original concept as well as how it is challenged or modified in the context of LRS. Additionally, each section introduces a range of tools that are compatible with LRS and can be utilized to execute the LRS process. This review aims to present the workflow of metagenomics, highlight the transformative impact of LRS, and provide researchers with a selection of tools suitable for this task.
Affiliation(s)
- Chankyung Kim
  - Center of Excellence in Genomics and Precision Dentistry, Department of Physiology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
  - Graduate Program in Bioinformatics and Computational Biology, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
- Monnat Pongpanich
  - Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
  - Center of Excellence for Cancer and Inflammation, Chulalongkorn University, Bangkok, Thailand
- Thantrira Porntaveetus
  - Center of Excellence in Genomics and Precision Dentistry, Department of Physiology, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand
  - Graduate Program in Geriatric and Special Patients Care, Faculty of Dentistry, Chulalongkorn University, Bangkok, Thailand

40
Valencia EM, Maki KA, Dootz JN, Barb JJ. Mock community taxonomic classification performance of publicly available shotgun metagenomics pipelines. Sci Data 2024; 11:81. [PMID: 38233447 PMCID: PMC10794705 DOI: 10.1038/s41597-023-02877-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 12/22/2023] [Indexed: 01/19/2024]
Abstract
Shotgun metagenomic sequencing comprehensively samples the DNA of a microbial sample. Choosing the best bioinformatics processing package can be daunting due to the wide variety of tools available. Here, we assessed publicly available shotgun metagenomics processing packages/pipelines including bioBakery, Just a Microbiology System (JAMS), Whole metaGenome Sequence Assembly V2 (WGSA2), and Woltka using 19 publicly available mock community samples and a set of five constructed pathogenic gut microbiome samples. Also included is a workflow for labelling bacterial scientific names with NCBI taxonomy identifiers for better resolution in assessing results. The Aitchison distance, a sensitivity metric, and total False Positive Relative Abundance were used for accuracy assessments for all pipelines and mock samples. Overall, bioBakery4 performed the best on most of the accuracy metrics, while JAMS and WGSA2 had the highest sensitivities. Furthermore, bioBakery is commonly used and only requires a basic knowledge of command line usage. This work provides an unbiased assessment of shotgun metagenomics packages and presents results assessing the performance of the packages using mock community sequence data.
Affiliation(s)
- E Michael Valencia
  - Translational Biobehavioral and Health Disparities Branch, National Institutes of Health Clinical Center, Bethesda, MD, 20814, USA
- Katherine A Maki
  - Translational Biobehavioral and Health Disparities Branch, National Institutes of Health Clinical Center, Bethesda, MD, 20814, USA
- Jennifer N Dootz
  - Biosystems and Biomaterials Division, National Institute of Standards and Technology, Gaithersburg, MD, 20899, USA
- Jennifer J Barb
  - Translational Biobehavioral and Health Disparities Branch, National Institutes of Health Clinical Center, Bethesda, MD, 20814, USA
41
Marić J, Križanović K, Riondet S, Nagarajan N, Šikić M. Comparative analysis of metagenomic classifiers for long-read sequencing datasets. BMC Bioinformatics 2024; 25:15. [PMID: 38212694 PMCID: PMC10782538 DOI: 10.1186/s12859-024-05634-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 10.0] [Received: 05/21/2023] [Accepted: 01/02/2024] [Indexed: 01/13/2024]
Abstract
BACKGROUND Long reads have gained popularity in the analysis of metagenomics data. Therefore, we comprehensively assessed metagenomics classification tools at the species taxonomic level. We analysed kmer-based tools, mapping-based tools and two general-purpose long-read mappers. We evaluated more than 20 pipelines which use either nucleotide or protein databases and selected 13 for an extensive benchmark. We prepared seven synthetic datasets to test various scenarios, including the presence of a host, unknown species and related species. Moreover, we used available sequencing data from three well-defined mock communities, including a dataset with abundance varying from 0.0001 to 20% and six real gut microbiomes. RESULTS General-purpose mappers Minimap2 and Ram achieved similar or better accuracy on most testing metrics than the best-performing classification tools. They were up to ten times slower than the fastest kmer-based tools, while requiring up to four times less RAM. All tested tools except CLARK-S were prone to report organisms not present in the datasets, and they underperformed when a high proportion of the host's genetic material was present. Tools which use a protein database performed worse than those based on a nucleotide database. Longer read lengths made classification easier, but because read length distributions differ among species, using only the longest reads reduced the accuracy. The comparison of real gut microbiome datasets shows similar abundance profiles for the same type of tools but discordance in the number of reported organisms and abundances between types. Most assessments showed the influence of database completeness on the reports. CONCLUSION The findings indicate that kmer-based tools are well-suited for rapid analysis of long-read data. However, when heightened accuracy is essential, mappers demonstrate slightly superior performance, albeit at a considerably slower pace. Nevertheless, a combination of diverse categories of tools and databases will likely be necessary to analyse complex samples. Discrepancies observed among tools when applied to real gut datasets, as well as reduced performance in cases where unknown species or a significant proportion of the host genome is present in the sample, highlight the need for continuous improvement of existing tools. Additionally, regular updates and curation of databases are important to ensure their effectiveness.
Affiliation(s)
- Josip Marić
  - Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000, Zagreb, Croatia
- Krešimir Križanović
  - Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000, Zagreb, Croatia
- Sylvain Riondet
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Republic of Singapore
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117596, Republic of Singapore
- Niranjan Nagarajan
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Republic of Singapore
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117596, Republic of Singapore
- Mile Šikić
  - Faculty of Electrical Engineering and Computing, University of Zagreb, Unska 3, 10000, Zagreb, Croatia
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Genome, Singapore, 138672, Republic of Singapore
42
Valerio F, Twort VG, Duplouy A. Screening Host Genomic Data for Wolbachia Infections. Methods Mol Biol 2024; 2739:251-274. [PMID: 38006557 DOI: 10.1007/978-1-0716-3553-7_16] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2023]
Abstract
Less than a decade ago, the production of Wolbachia genomic assemblies was tedious, time-consuming, and expensive. The production of Wolbachia genomic DNA free of contamination from host DNA, as required for Wolbachia-targeted sequencing, was then only possible after the amplification and extraction of a large amount of clonal Wolbachia DNA. However, as an endosymbiotic bacterium, Wolbachia does not grow outside the host cell environment, and large-scale recovery of the bacteria required mass rearing of their host, preferably clones of a single individual to avoid strain genetic diversity, or amplification of cell cultures infected with a single Wolbachia strain. Bacterial DNA could be separated from host DNA based on genomic size. Nowadays, the production of full Wolbachia genomes does not require the physical isolation of the bacterial strains from their respective hosts, and the bacterium is often sequenced as a by-catch of host genomic projects. Here, we provide a step-by-step protocol to (1) identify whether host genome projects contain reads from associated Wolbachia and (2) isolate/retrieve the Wolbachia reads from the rest of the sequenced material. We hope this simple protocol will support many projects aiming at studying diverse Wolbachia genome assemblies.
Affiliation(s)
- Federica Valerio
  - Insect Symbiosis Ecology and Evolution, Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
  - Research Centre for Ecological Changes, University of Helsinki, Helsinki, Finland
- Victoria G Twort
  - The Finnish Museum of Natural History, Luomus, University of Helsinki, Helsinki, Finland
- Anne Duplouy
  - Insect Symbiosis Ecology and Evolution, Organismal and Evolutionary Biology Research Program, Faculty of Biological and Environmental Sciences, University of Helsinki, Helsinki, Finland
  - Research Centre for Ecological Changes, University of Helsinki, Helsinki, Finland
43
Gand M, Bloemen B, Vanneste K, Roosens NHC, De Keersmaecker SCJ. Comparison of 6 DNA extraction methods for isolation of high yield of high molecular weight DNA suitable for shotgun metagenomics Nanopore sequencing to detect bacteria. BMC Genomics 2023; 24:438. [PMID: 37537550 PMCID: PMC10401787 DOI: 10.1186/s12864-023-09537-5] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 04/18/2023] [Accepted: 07/27/2023] [Indexed: 08/05/2023]
Abstract
BACKGROUND Oxford Nanopore Technologies (ONT) offers an accessible platform for long-read sequencing, which improves the reconstruction of genomes and helps to resolve complex genomic contexts, especially in the case of metagenome analysis. To take the best advantage of long-read sequencing, DNA extraction methods must be able to isolate pure high molecular weight (HMW) DNA from complex metagenomics samples, without introducing any bias. New methods released on the market, and protocols developed at the research level, were specifically designed for this application and need to be assessed. RESULTS In this study, using different bacterial cocktail mixes, analyzed either pure or spiked into a synthetic fecal matrix, we evaluated the performance of 6 DNA extraction methods that use various cell lysis and purification techniques, ranging from quick and easy protocols to more time-consuming and gentle ones, and including a portable method for on-site application. In addition to comparing the quality, quantity and purity of the extracted DNA, we also tested the performance obtained with Nanopore sequencing on a MinION flow cell. Based on the results, the Quick-DNA HMW MagBead Kit (Zymo Research) was selected as producing the best yield of pure HMW DNA. Furthermore, this kit allowed accurate detection, by Nanopore sequencing, of almost all the bacterial species present in a complex mock community. CONCLUSION Amongst the 6 tested methods, the Quick-DNA HMW MagBead Kit (Zymo Research) was considered the most suitable for Nanopore sequencing and would be recommended for bacterial metagenomics studies using this technology.
Affiliation(s)
- Mathieu Gand
  - Transversal Activities in Applied Genomics, Sciensano, Rue Juliette Wytsman 14, 1050, Brussels, Belgium
- Bram Bloemen
  - Transversal Activities in Applied Genomics, Sciensano, Rue Juliette Wytsman 14, 1050, Brussels, Belgium
- Kevin Vanneste
  - Transversal Activities in Applied Genomics, Sciensano, Rue Juliette Wytsman 14, 1050, Brussels, Belgium
- Nancy H C Roosens
  - Transversal Activities in Applied Genomics, Sciensano, Rue Juliette Wytsman 14, 1050, Brussels, Belgium
- Sigrid C J De Keersmaecker
  - Transversal Activities in Applied Genomics, Sciensano, Rue Juliette Wytsman 14, 1050, Brussels, Belgium
44
Notario E, Visci G, Fosso B, Gissi C, Tanaskovic N, Rescigno M, Marzano M, Pesole G. Amplicon-Based Microbiome Profiling: From Second- to Third-Generation Sequencing for Higher Taxonomic Resolution. Genes (Basel) 2023; 14:1567. [PMID: 37628619 PMCID: PMC10454624 DOI: 10.3390/genes14081567] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 06/21/2023] [Revised: 07/25/2023] [Accepted: 07/27/2023] [Indexed: 08/27/2023]
Abstract
The 16S rRNA amplicon-based sequencing approach represents the most common and cost-effective strategy with great potential for microbiome profiling. The use of second-generation sequencing (NGS) technologies has led to protocols based on the amplification of one or a few hypervariable regions, impacting the outcome of the analysis. Nowadays, comparative studies are necessary to assess different amplicon-based approaches, including the full-locus sequencing currently feasible thanks to third-generation sequencing (TGS) technologies. This study compared three different methods to achieve the deepest microbiome taxonomic characterization: (a) the single-region approach, (b) the multiplex approach, covering several regions of the target gene/region, both based on NGS short reads, and (c) the full-length approach, which analyzes the whole length of the target gene thanks to TGS long reads. Analyses carried out on benchmark microbiome samples, with a known taxonomic composition, highlighted a different classification performance, strongly associated with the type of hypervariable regions and the coverage of the target gene. Indeed, the full-length approach showed the greatest discriminating power, up to species level, also on complex real samples. This study supports the transition from NGS to TGS for the study of the microbiome, even if experimental and bioinformatic improvements are still necessary.
Affiliation(s)
- Elisabetta Notario
  - Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, 70126 Bari, Italy
- Grazia Visci
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Consiglio Nazionale delle Ricerche, 70126 Bari, Italy
- Bruno Fosso
  - Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, 70126 Bari, Italy
- Carmela Gissi
  - Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, 70126 Bari, Italy
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Consiglio Nazionale delle Ricerche, 70126 Bari, Italy
  - CoNISMa, Consorzio Nazionale Interuniversitario per le Scienze del Mare, 00196 Roma, Italy
- Maria Rescigno
  - IRCCS Humanitas Research Hospital, 20089 Rozzano, Italy
  - Department of Biomedical Sciences, Humanitas University, 20072 Pieve Emanuele, Italy
- Marinella Marzano
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Consiglio Nazionale delle Ricerche, 70126 Bari, Italy
- Graziano Pesole
  - Department of Biosciences, Biotechnology and Environment, University of Bari Aldo Moro, 70126 Bari, Italy
  - Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, Consiglio Nazionale delle Ricerche, 70126 Bari, Italy
  - Consorzio Interuniversitario Biotecnologie, 34148 Trieste, Italy