1
|
Nori SRC, Walsh CJ, McAuliffe FM, Moore RL, Van Sinderen D, Feehily C, Cotter PD. Strain-level variation among vaginal Lactobacillus crispatus and Lactobacillus iners as identified by comparative metagenomics. NPJ Biofilms Microbiomes 2025; 11:49. [PMID: 40122890 PMCID: PMC11930926 DOI: 10.1038/s41522-025-00682-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2024] [Accepted: 03/09/2025] [Indexed: 03/25/2025] Open
Abstract
The vaginal microbiome, a relatively simple, low diversity ecosystem crucial for female health, is often dominated by Lactobacillus spp. Detailed strain-level data, facilitated by shotgun sequencing, can provide a greater understanding of the mechanisms of colonization and host-microbe interactions. We analysed 354 vaginal metagenomes from pregnant women in Ireland to investigate metagenomic community state types and strain-level variation, focusing on cell surface interfaces. Our analysis revealed multiple subspecies, with Lactobacillus crispatus and Lactobacillus iners being the most dominant. We found genes, including putative mucin-binding genes, distinct to L. crispatus subspecies. Using 337 metagenome-assembled genomes, we observed a higher number of strain-specific genes in L. crispatus related to cell wall biogenesis, carbohydrate and amino acid metabolism, many under positive selection. A cell surface glycan gene cluster was predominantly found in L. crispatus but absent in L. iners and Gardnerella vaginalis. These findings highlight strain-specific factors associated with colonisation and host-microbe interactions.
Affiliation(s)
- Sai Ravi Chandra Nori
- Teagasc Food Research Centre, Fermoy, Co. Cork, Ireland
- APC Microbiome Ireland, National University of Ireland, Cork, Ireland
- School of Microbiology, University College Cork, Cork, Ireland
- SFI Centre for Research Training in Genomics Data Science, School of Mathematics, Statistics & Applied Mathematics, University of Galway, Galway, Ireland
| | - Calum J Walsh
- The Centre for Pathogen Genomics, Department of Microbiology & Immunology, Peter Doherty Institute for Infection & Immunity, University of Melbourne, Melbourne, Australia
| | - Fionnuala M McAuliffe
- UCD Perinatal Research Centre, School of Medicine, University College Dublin, National Maternity Hospital, Dublin, Ireland
| | - Rebecca L Moore
- UCD Perinatal Research Centre, School of Medicine, University College Dublin, National Maternity Hospital, Dublin, Ireland
| | - Douwe Van Sinderen
- APC Microbiome Ireland, National University of Ireland, Cork, Ireland
- School of Microbiology, University College Cork, Cork, Ireland
| | - Conor Feehily
- School of Infection and Immunity, University of Glasgow, Glasgow, G12 8TA, United Kingdom.
| | - Paul D Cotter
- Teagasc Food Research Centre, Fermoy, Co. Cork, Ireland.
- APC Microbiome Ireland, National University of Ireland, Cork, Ireland.
- School of Microbiology, University College Cork, Cork, Ireland.
| |
|
2
|
Sirasani JP, Gardner C, Jung G, Lee H, Ahn TH. Bioinformatic approaches to blood and tissue microbiome analyses: challenges and perspectives. Brief Bioinform 2025; 26:bbaf176. [PMID: 40269515 PMCID: PMC12018304 DOI: 10.1093/bib/bbaf176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/30/2024] [Revised: 03/05/2025] [Accepted: 03/25/2025] [Indexed: 04/25/2025] Open
Abstract
Advances in next-generation sequencing have resulted in a growing understanding of the microbiome and its role in human health. Unlike traditional microbiome analysis, blood and tissue microbiome analyses focus on the detection and characterization of microbial DNA in blood and tissue, previously considered sterile environments. In this review, we discuss the challenges and methodologies associated with analyzing these samples, particularly emphasizing blood and tissue microbiome research. Key preprocessing steps, including the removal of ribosomal RNA, host DNA, and other contaminants, are critical to reducing noise and accurately capturing microbial evidence. We also explore how taxonomic profiling tools, machine learning, and advanced normalization techniques address contamination and low microbial biomass, thereby improving reliability. While it offers the potential for identifying microbial involvement in systemic diseases previously undetectable by traditional methods, this methodology also carries risks and lacks universal acceptance due to concerns over reliability and interpretation errors. This paper critically reviews these factors, highlighting both the promise and pitfalls of using blood and tissue microbiome analyses as a tool for biomarker discovery.
Affiliation(s)
- Jammi Prasanthi Sirasani
- Program of Bioinformatics and Computational Biology, Saint Louis University, St. Louis, MO, United States
| | - Cory Gardner
- Department of Computer Science, Saint Louis University, St. Louis, MO, United States
| | - Gihwan Jung
- Department of Computer Science, Saint Louis University, St. Louis, MO, United States
| | - Hyunju Lee
- AI Graduate School, Gwangju Institute of Science and Technology, Gwangju 61005, South Korea
| | - Tae-Hyuk Ahn
- Program of Bioinformatics and Computational Biology, Saint Louis University, St. Louis, MO, United States
- Department of Computer Science, Saint Louis University, St. Louis, MO, United States
| |
|
3
|
Büttner KA, Bregy V, Wegner F, Purushothaman S, Imkamp F, Roloff Handschin T, Puolakkainen MH, Hiltunen-Back E, Braun D, Kisakesen I, Schreiber A, Entrocassi AC, Gallo Vaulet ML, López Aquino D, Svidler López L, La Rosa L, Egli A, Rodríguez Fermepin M, Seth-Smith HM, on behalf of the ESCMID Study Group for Mycoplasma and Chlamydia Infections (ESGMAC). Evaluating methods for genome sequencing of Chlamydia trachomatis and other sexually transmitted bacteria directly from clinical swabs. Microb Genom 2025; 11. [PMID: 39943872 DOI: 10.1099/mgen.0.001353] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/09/2025] Open
Abstract
Rates of bacterial sexually transmitted infections (STIs) are rising, and accessing their genomes provides information on strain evolution, circulating strains and encoded antimicrobial resistance (AMR). Notable pathogens include Chlamydia trachomatis (CT), Neisseria gonorrhoeae (NG) and Treponema pallidum (TP), globally the most common bacterial STIs. Mycoplasmoides (formerly Mycoplasma) genitalium (MG) is also a bacterial STI that is of concern due to AMR development. These bacteria are also fastidious or hard to culture, and standard sampling methods lyse bacteria, completely preventing pathogen culture. Clinical samples contain large amounts of human and other microbiota DNA. These factors hinder the sequencing of bacterial STI genomes. We aimed to overcome these challenges in obtaining whole-genome sequences and evaluated four approaches using clinical samples from Argentina (39) and Switzerland (14), and cultured samples from Finland (2) and Argentina (1). First, direct genome sequencing from swab samples was attempted through Illumina deep metagenomic sequencing, showing extremely low levels of target DNA, with under 0.01% of the sequenced reads being from the target pathogens. Second, host DNA depletion followed by Illumina sequencing was not found to produce enrichment in these very low-load samples. Third, we tried a selective long-read approach with the new adaptive sequencing from Oxford Nanopore Technologies, which also did not improve enrichment sufficiently to provide genomic information. Finally, target enrichment using a novel pan-genome set of custom SureSelect probes targeting CT, NG, TP and MG followed by Illumina sequencing was successful. We produced whole genomes from 64% of CT-positive samples, from 36% of NG-positive samples and 60% of TP-positive samples. Additionally, we enriched MG DNA to gain partial genomes from 60% of samples. This is the first publication to date to utilize a pan-genome STI panel in target enrichment. Target enrichment, though costly, proved essential for obtaining genomic data from clinical samples. These data can be utilized to examine circulating strains and genotypic resistance and guide public health strategies.
Affiliation(s)
- Karina Andrea Büttner
- Universidad de Buenos Aires, Facultad de Farmacia y Bioquímica, Departamento de Bioquímica Clínica, Cátedra de Microbiología Clínica, Buenos Aires, Argentina
- Universidad de Buenos Aires, Instituto de Fisiopatología y Bioquímica Clínica (INFIBIOC), Buenos Aires, Argentina
- Member of the ESCMID Study Group on Mycoplasma and Chlamydia (ESGMAC), Basel, Switzerland
| | - Vera Bregy
- Institute of Medical Microbiology, University of Zürich, Zurich, Switzerland
| | - Fanny Wegner
- Institute of Medical Microbiology, University of Zürich, Zurich, Switzerland
| | | | - Frank Imkamp
- Institute of Medical Microbiology, University of Zürich, Zurich, Switzerland
| | | | - Mirja H Puolakkainen
- Member of the ESCMID Study Group on Mycoplasma and Chlamydia (ESGMAC), Basel, Switzerland
- Department of Virology and Helsinki University Hospital, Helsinki, Finland
- Department of Virology and Immunology, University of Helsinki, Helsinki, Finland
| | - Eija Hiltunen-Back
- Department of Dermatology and Allergology, University of Helsinki and HUS Helsinki University Hospital, Helsinki, Finland
| | - Dominique Braun
- Department of Infectious Diseases, University Hospital Zürich, University of Zurich, Zürich, Switzerland
| | - Ibrahim Kisakesen
- Life Sciences and Diagnostic Group, Agilent Technologies France, Les Ulis, France
| | - Andreas Schreiber
- Life Sciences and Diagnostic Group, Agilent Technologies France, Les Ulis, France
| | - Andrea Carolina Entrocassi
- Universidad de Buenos Aires, Facultad de Farmacia y Bioquímica, Departamento de Bioquímica Clínica, Cátedra de Microbiología Clínica, Buenos Aires, Argentina
- Universidad de Buenos Aires, Instituto de Fisiopatología y Bioquímica Clínica (INFIBIOC), Buenos Aires, Argentina
| | - María Lucía Gallo Vaulet
- Universidad de Buenos Aires, Facultad de Farmacia y Bioquímica, Departamento de Bioquímica Clínica, Cátedra de Microbiología Clínica, Buenos Aires, Argentina
- Universidad de Buenos Aires, Instituto de Fisiopatología y Bioquímica Clínica (INFIBIOC), Buenos Aires, Argentina
| | | | | | - Luciana La Rosa
- Centro Privado de Cirugía y Coloproctología, Buenos Aires, Argentina
| | - Adrian Egli
- Institute of Medical Microbiology, University of Zürich, Zurich, Switzerland
| | - Marcelo Rodríguez Fermepin
- Universidad de Buenos Aires, Facultad de Farmacia y Bioquímica, Departamento de Bioquímica Clínica, Cátedra de Microbiología Clínica, Buenos Aires, Argentina
- Universidad de Buenos Aires, Instituto de Fisiopatología y Bioquímica Clínica (INFIBIOC), Buenos Aires, Argentina
- Member of the ESCMID Study Group on Mycoplasma and Chlamydia (ESGMAC), Basel, Switzerland
| | - Helena M B Seth-Smith
- Member of the ESCMID Study Group on Mycoplasma and Chlamydia (ESGMAC), Basel, Switzerland
- Institute of Medical Microbiology, University of Zürich, Zurich, Switzerland
| | | |
|
4
|
Büttner KA, Wegner F, Bregy V, Entrocassi AC, Gallo Vaulet ML, López Aquino D, La Rosa L, Svidler López L, Puolakkainen MH, Hiltunen-Back E, Imkamp F, Egli A, Seth-Smith HMB, Rodríguez Fermepin M, on behalf of the ESCMID Study Group for Mycoplasma and Chlamydia Infections (ESGMAC). Chlamydia trachomatis genomes from rectal samples: description of a new clade comprising ompA-genotype L4 from Argentina. Microb Genom 2025; 11. [PMID: 39943870 DOI: 10.1099/mgen.0.001350] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/09/2025] Open
Abstract
Whole-genome analysis has provided insights into the evolution of Chlamydia trachomatis and, recently, into circulating strains that cause lymphogranuloma venereum (LGV). A large LGV outbreak of a new ompA-genotype, L2b, was first reported in Europe in the early 2000s, primarily affecting men who have sex with men (MSM), and then expanded globally. More recent work shows that this outbreak is diversifying into variants of described ompA-genotypes, with the same L2b genomic backbone. This study extends the investigation of LGV cases to Argentina and Finland. In 2017, an LGV outbreak was described in Argentina characterized by distinct genomic features shown by both ompA-genotyping and Multi-Locus Sequence Typing (MLST) analysis. We have obtained whole-genome sequences from cultured isolates and clinical samples via SureSelect (Agilent) target enrichment. Based on ompA and phylogenetic analyses, we describe further diversity within the ompA-genotype L2b clade, illustrating the transmission dynamics in Argentina and Finland. A key finding is that of a novel clade of Argentinian samples, characterized by a proposed new ompA-genotype L4. Additionally, we present the genome sequence of a non-LGV strain associated with anorectal proctitis. These findings contribute to the investigation of LGV evolution, particularly with the presence of the novel L4 lineage, and provide insights into genomic diversity and transmission dynamics of C. trachomatis.
Affiliation(s)
- Karina Andrea Büttner
- Universidad de Buenos Aires, Facultad de Farmacia y Bioquímica, Departamento de Bioquímica Clínica, Cátedra de Microbiología Clínica, Buenos Aires, Argentina
- Universidad de Buenos Aires, Instituto de Fisiopatología y Bioquímica Clínica (INFIBIOC), Buenos Aires, Argentina
- Member of the ESCMID Study Group on Mycoplasma and Chlamydia (ESGMAC)
| | - Fanny Wegner
- Institute of Medical Microbiology, University of Zürich, Zurich, Switzerland
| | - Vera Bregy
- Institute of Medical Microbiology, University of Zürich, Zurich, Switzerland
| | - Andrea Carolina Entrocassi
- Universidad de Buenos Aires, Facultad de Farmacia y Bioquímica, Departamento de Bioquímica Clínica, Cátedra de Microbiología Clínica, Buenos Aires, Argentina
- Universidad de Buenos Aires, Instituto de Fisiopatología y Bioquímica Clínica (INFIBIOC), Buenos Aires, Argentina
| | - María Lucía Gallo Vaulet
- Universidad de Buenos Aires, Facultad de Farmacia y Bioquímica, Departamento de Bioquímica Clínica, Cátedra de Microbiología Clínica, Buenos Aires, Argentina
- Universidad de Buenos Aires, Instituto de Fisiopatología y Bioquímica Clínica (INFIBIOC), Buenos Aires, Argentina
| | | | - Luciana La Rosa
- Centro Privado de Cirugía y Coloproctología, Buenos Aires, Argentina
| | | | - Mirja H Puolakkainen
- Member of the ESCMID Study Group on Mycoplasma and Chlamydia (ESGMAC)
- University of Helsinki, Department of Virology and Helsinki University Hospital, Department of Virology and Immunology, Helsinki, Finland
| | - Eija Hiltunen-Back
- Department of Dermatology and Allergology, University of Helsinki and HUS Helsinki University Hospital, Helsinki, Finland
| | - Frank Imkamp
- Institute of Medical Microbiology, University of Zürich, Zurich, Switzerland
| | - Adrian Egli
- Institute of Medical Microbiology, University of Zürich, Zurich, Switzerland
| | - Helena M B Seth-Smith
- Member of the ESCMID Study Group on Mycoplasma and Chlamydia (ESGMAC)
- Institute of Medical Microbiology, University of Zürich, Zurich, Switzerland
| | - Marcelo Rodríguez Fermepin
- Universidad de Buenos Aires, Facultad de Farmacia y Bioquímica, Departamento de Bioquímica Clínica, Cátedra de Microbiología Clínica, Buenos Aires, Argentina
- Universidad de Buenos Aires, Instituto de Fisiopatología y Bioquímica Clínica (INFIBIOC), Buenos Aires, Argentina
- Member of the ESCMID Study Group on Mycoplasma and Chlamydia (ESGMAC)
| | | |
|
5
|
Guccione C, Patel L, Tomofuji Y, McDonald D, Gonzalez A, Sepich-Poore GD, Sonehara K, Zakeri M, Chen Y, Dilmore AH, Damle N, Baranzini SE, Hightower G, Nakatsuji T, Gallo RL, Langmead B, Okada Y, Curtius K, Knight R. Incomplete human reference genomes can drive false sex biases and expose patient-identifying information in metagenomic data. Nat Commun 2025; 16:825. [PMID: 39827261 PMCID: PMC11742726 DOI: 10.1038/s41467-025-56077-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2024] [Accepted: 01/07/2025] [Indexed: 01/22/2025] Open
Abstract
As next-generation sequencing technologies produce deeper genome coverages at lower costs, there is a critical need for reliable computational host DNA removal in metagenomic data. We find that insufficient host filtration using prior human genome references can introduce false sex biases and inadvertently permit flow-through of host-specific DNA during bioinformatic analyses, which could be exploited for individual identification. To address these issues, we introduce and benchmark three host filtration methods of varying throughput, with concomitant applications across low biomass samples such as skin and high microbial biomass datasets including fecal samples. We find that these methods are important for obtaining accurate results in low biomass samples (e.g., tissue, skin). Overall, we demonstrate that rigorous host filtration is a key component of privacy-minded analyses of patient microbiomes and provide computationally efficient pipelines for accomplishing this task on large-scale datasets.
Affiliation(s)
- Caitlin Guccione
- Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, CA, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Lucas Patel
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA, USA
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Medical Scientist Training Program, University of California, San Diego, La Jolla, CA, USA
| | - Yoshihiko Tomofuji
- Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo, 113-8654, Japan
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, 565-0871, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan
| | - Daniel McDonald
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Antonio Gonzalez
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Gregory D Sepich-Poore
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
| | - Kyuto Sonehara
- Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo, 113-8654, Japan
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, 565-0871, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan
| | - Mohsen Zakeri
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Yang Chen
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Biomedical Sciences Graduate Program, University of California San Diego, La Jolla, CA, USA
- Department of Dermatology, University of California San Diego, La Jolla, CA, USA
| | - Amanda Hazel Dilmore
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Biomedical Sciences Graduate Program, University of California San Diego, La Jolla, CA, USA
| | - Neil Damle
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
- Department of Cognitive Science, University of California San Diego, La Jolla, CA, USA
| | - Sergio E Baranzini
- Weill Institute for Neurosciences, Department of Neurology, University of California, San Francisco (UCSF), San Francisco, CA, USA
| | - George Hightower
- Department of Dermatology, University of California San Diego, La Jolla, CA, USA
- Rady Children's Hospital, San Diego, CA, USA
| | - Teruaki Nakatsuji
- Department of Dermatology, University of California San Diego, La Jolla, CA, USA
| | - Richard L Gallo
- Department of Dermatology, University of California San Diego, La Jolla, CA, USA
- Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA
| | - Yukinori Okada
- Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo, 113-8654, Japan
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita, 565-0871, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan
- Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita, 565-0871, Japan
- Premium Research Institute for Human Metaverse Medicine (WPI-PRIMe), Osaka University, Suita, 565-0871, Japan
| | - Kit Curtius
- Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, CA, USA.
- VA San Diego Healthcare System, San Diego, CA, USA.
- Moores Cancer Center, University of California San Diego, La Jolla, CA, USA.
| | - Rob Knight
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA.
- Shu Chien-Gene Lay Department of Bioengineering, University of California San Diego, La Jolla, CA, USA.
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA.
- Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA.
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA.
| |
|
6
|
Gao Y, Luo H, Lyu H, Yang H, Yousuf S, Huang S, Liu YX. Benchmarking short-read metagenomics tools for removing host contamination. Gigascience 2025; 14:giaf004. [PMID: 40036691 PMCID: PMC11878760 DOI: 10.1093/gigascience/giaf004] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/06/2024] [Revised: 10/31/2024] [Accepted: 01/09/2025] [Indexed: 03/06/2025] Open
Abstract
BACKGROUND The rapid evolution of metagenomic sequencing technology offers remarkable opportunities to explore the intricate roles of the microbiome in host health and disease, as well as to uncover the unknown structure and functions of microbial communities. However, the swift accumulation of metagenomic data poses substantial challenges for data analysis. Contamination from host DNA can substantially compromise result accuracy and increase computational resource requirements by including nontarget sequences. RESULTS In this study, we assessed the impact of computational host DNA decontamination on downstream analyses, highlighting its importance in producing accurate results efficiently. We also evaluated the performance of conventional tools like KneadData, Bowtie2, BWA, KMCP, Kraken2, and KrakenUniq, each offering unique advantages for different applications. Furthermore, we highlighted the importance of an accurate host reference genome, noting that its absence negatively affected the decontamination performance across all tools. CONCLUSIONS Our findings underscore the need for careful selection of decontamination tools and reference genomes to enhance the accuracy of metagenomic analyses. These insights provide valuable guidance for improving the reliability and reproducibility of microbiome research.
Affiliation(s)
- Yunyun Gao
- Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
| | - Hao Luo
- Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
| | - Hujie Lyu
- Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
| | - Haifei Yang
- Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
- College of Life Sciences, Qingdao Agricultural University, Qingdao 266000, China
| | - Salsabeel Yousuf
- Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
| | - Shi Huang
- Faculty of Dentistry, The University of Hong Kong, Hong Kong SAR, China
| | - Yong-Xin Liu
- Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
| |
|
7
|
Guccione C, Patel L, Tomofuji Y, McDonald D, Gonzalez A, Sepich-Poore GD, Sonehara K, Zakeri M, Chen Y, Dilmore AH, Damle N, Baranzini SE, Nakatsuji T, Gallo RL, Langmead B, Okada Y, Curtius K, Knight R. Incomplete human reference genomes can drive false sex biases and expose patient-identifying information in metagenomic data. RESEARCH SQUARE 2024:rs.3.rs-4721159. [PMID: 39502785 PMCID: PMC11537348 DOI: 10.21203/rs.3.rs-4721159/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/13/2024]
Abstract
As next-generation sequencing technologies produce deeper genome coverages at lower costs, there is a critical need for reliable computational host DNA removal in metagenomic data. We find that insufficient host filtration using prior human genome references can introduce false sex biases and inadvertently permit flow-through of host-specific DNA during bioinformatic analyses, which could be exploited for individual identification. To address these issues, we introduce and benchmark three host filtration methods of varying throughput, with concomitant applications across low biomass samples such as skin and high microbial biomass datasets including fecal samples. We find that these methods are important for obtaining accurate results in low biomass samples (e.g., tissue, skin). Overall, we demonstrate that rigorous host filtration is a key component of privacy-minded analyses of patient microbiomes and provide computationally efficient pipelines for accomplishing this task on large-scale datasets.
Affiliation(s)
- Caitlin Guccione
- Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, California 92093, USA
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, California 92093, USA
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Lucas Patel
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, California 92093, USA
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Medical Scientist Training Program, University of California, San Diego, La Jolla, California, USA
| | - Yoshihiko Tomofuji
- Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo 113-8654, Japan
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita 565-0871, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan
| | - Daniel McDonald
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | - Antonio Gonzalez
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
| | | | - Kyuto Sonehara
- Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo 113-8654, Japan
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita 565-0871, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan
| | - Mohsen Zakeri
- Department of Computer Science, Johns Hopkins University
| | - Yang Chen
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Biomedical Sciences Graduate Program, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
| | - Amanda Hazel Dilmore
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Biomedical Sciences Graduate Program, University of California San Diego, La Jolla, CA, USA
| | - Neil Damle
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
- Department of Cognitive Science, University of California San Diego, La Jolla, CA, USA
| | - Sergio E. Baranzini
- Weill Institute for Neurosciences, Department of Neurology, University of California, San Francisco (UCSF), San Francisco, CA 94158, USA
| | - Teruaki Nakatsuji
- Department of Dermatology, University of California San Diego, La Jolla, CA, USA
| | - Richard L. Gallo
- Department of Dermatology, University of California San Diego, La Jolla, CA, USA
- Center for Microbiome Innovation, University of California San Diego, La Jolla, CA, USA
| | - Ben Langmead
- Department of Computer Science, Johns Hopkins University
| | - Yukinori Okada
- Department of Genome Informatics, Graduate School of Medicine, the University of Tokyo, Tokyo 113-8654, Japan
- Department of Statistical Genetics, Osaka University Graduate School of Medicine, Suita 565-0871, Japan
- Laboratory for Systems Genetics, RIKEN Center for Integrative Medical Sciences, Yokohama 230-0045, Japan
- Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita 565-0871, Japan
- Premium Research Institute for Human Metaverse Medicine (WPI-PRIMe), Osaka University, Suita 565-0871, Japan
| | - Kit Curtius
- Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, California 92093, USA
- VA San Diego Healthcare System, San Diego, CA, USA
| | - Rob Knight
- Department of Pediatrics, University of California San Diego, La Jolla, CA, USA
- Halıcıoğlu Data Science Institute, University of California San Diego, La Jolla, CA, USA
- Laboratory of Statistical Immunology, Immunology Frontier Research Center (WPI-IFReC), Osaka University, Suita 565-0871, Japan
- Department of Bioengineering, University of California San Diego, La Jolla, CA, USA
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA
| |
|
8
|
Sabin SJ, Beesley CA, Marston CK, Paisie TK, Gulvik CA, Sprenger GA, Gee JE, Traxler RM, Bell ME, McQuiston JR, Weiner ZP. Investigating Anthrax-Associated Virulence Genes among Archival and Contemporary Bacillus cereus Group Genomes. Pathogens 2024; 13:884. [PMID: 39452755 PMCID: PMC11510535 DOI: 10.3390/pathogens13100884] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2024] [Revised: 09/26/2024] [Accepted: 10/02/2024] [Indexed: 10/26/2024] Open
Abstract
Bacillus anthracis causes anthrax through virulence factors encoded on two plasmids. However, non-B. anthracis organisms within the closely related, environmentally ubiquitous Bacillus cereus group (BCG) may cause an anthrax-like disease in humans through the partial adoption of anthrax-associated virulence genes, challenging the definition of anthrax disease. To elucidate these phenomena and their evolutionary past, we performed whole-genome sequencing on non-anthracis BCG isolates, including 93 archival (1967-2003) and 5 contemporary isolates (2019-2023). We produced annotated genomic assemblies and performed a pan-genome analysis to identify evidence of virulence gene homology and virulence gene acquisition by linear inheritance or horizontal gene transfer. At least one anthrax-associated virulence gene was annotated in ten isolates. Most homologous sequences in archival isolates showed evidence of pseudogenization and subsequent gene loss. The presence or absence of accessory genes, including anthrax-associated virulence genes, aligned with the phylogenetic structure of the BCG core genome. These findings support the hypothesis that anthrax-associated virulence genes were inherited from a common ancestor in the BCG and were retained or lost across different lineages, and contribute to a growing body of work informing public health strategies related to anthrax surveillance and identification.
Affiliation(s)
- Susanna J. Sabin
- Laboratory Leadership Service Fellow Assigned to the National Center for Emerging and Zoonotic Infectious Diseases, CDC, Atlanta, GA 30329, USA
- Cari A. Beesley
- Centers for Disease Control and Prevention, National Center for Emerging and Zoonotic Infectious Diseases, Division of High-Consequence Pathogens and Pathology, Bacterial Special Pathogens Branch, 1600 Clifton Rd, Atlanta, GA 30329, USA
- Chung K. Marston
- Centers for Disease Control and Prevention, National Center for Emerging and Zoonotic Infectious Diseases, Division of High-Consequence Pathogens and Pathology, Bacterial Special Pathogens Branch, 1600 Clifton Rd, Atlanta, GA 30329, USA
- Taylor K. Paisie
- Oak Ridge Institute for Science and Education, Oak Ridge, TN 37830, USA
- Christopher A. Gulvik
- Centers for Disease Control and Prevention, National Center for Emerging and Zoonotic Infectious Diseases, Division of High-Consequence Pathogens and Pathology, Bacterial Special Pathogens Branch, 1600 Clifton Rd, Atlanta, GA 30329, USA
- Jay E. Gee
- Centers for Disease Control and Prevention, National Center for Emerging and Zoonotic Infectious Diseases, Division of High-Consequence Pathogens and Pathology, Bacterial Special Pathogens Branch, 1600 Clifton Rd, Atlanta, GA 30329, USA
- Rita M. Traxler
- Centers for Disease Control and Prevention, National Center for Emerging and Zoonotic Infectious Diseases, Division of High-Consequence Pathogens and Pathology, Bacterial Special Pathogens Branch, 1600 Clifton Rd, Atlanta, GA 30329, USA
- Melissa E. Bell
- Centers for Disease Control and Prevention, National Center for Emerging and Zoonotic Infectious Diseases, Division of High-Consequence Pathogens and Pathology, Bacterial Special Pathogens Branch, 1600 Clifton Rd, Atlanta, GA 30329, USA
- John R. McQuiston
- Centers for Disease Control and Prevention, National Center for Emerging and Zoonotic Infectious Diseases, Division of High-Consequence Pathogens and Pathology, Bacterial Special Pathogens Branch, 1600 Clifton Rd, Atlanta, GA 30329, USA
- Zachary P. Weiner
- Centers for Disease Control and Prevention, National Center for Emerging and Zoonotic Infectious Diseases, Division of High-Consequence Pathogens and Pathology, Bacterial Special Pathogens Branch, 1600 Clifton Rd, Atlanta, GA 30329, USA
9
Spohr P, Ried M, Kühle L, Dilthey A. SWGTS-a platform for stream-based host DNA depletion. Bioinformatics 2024; 40:btae332. [PMID: 38788219 PMCID: PMC11167210 DOI: 10.1093/bioinformatics/btae332] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 05/07/2024] [Accepted: 05/23/2024] [Indexed: 05/26/2024]
Abstract
MOTIVATION: Microbial sequencing data from clinical samples is often contaminated with human sequences, which have to be removed prior to sharing. Existing methods for human read removal, however, are applicable only after the target dataset has been retrieved in its entirety, putting the recipient at least temporarily in control of a potentially identifiable genetic dataset with potential implications under regulatory frameworks such as the GDPR. In some instances, the ability to carry out stream-based host depletion as part of the data transfer process may be preferable.
RESULTS: We present SWGTS, a client-server application for the transfer and stream-based host depletion of sequencing reads. SWGTS enforces a robust upper bound on the maximum amount of human genetic data from any one client held in memory at any point in time by storing all incoming sequencing data in a limited-size, client-specific intermediate processing buffer, and by throttling the rate of incoming data if it exceeds the speed of host depletion carried out on the SWGTS server in the background. SWGTS exposes an HTTP-REST interface, is implemented using docker-compose, Redis and traefik, and requires less than 8 Gb of RAM for deployment. We demonstrate high filtering accuracy of SWGTS; incoming data transfer rates of up to 1.65 megabases per second in a conservative configuration; and mitigation of re-identification risks by the ability to limit the number of SNPs present on a popular population-scale genotyping array covered by reads in the SWGTS buffer to a low user-defined number, such as 10 or 100.
AVAILABILITY AND IMPLEMENTATION: SWGTS is available on GitHub: https://github.com/AlBi-HHU/swgts (https://doi.org/10.5281/zenodo.10891052). The repository also contains a jupyter notebook that can be used to reproduce all the benchmarks used in this article. All datasets used for benchmarking are publicly available.
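The buffering and throttling idea described in this abstract can be illustrated with a minimal sketch. This is not the SWGTS implementation or its API: the function name `run_bounded_transfer`, the `is_host` predicate, and the choice to bound the buffer by read count (rather than by bases or SNP coverage) are all assumptions made for illustration. A fixed-size queue holds incoming reads, and the sender blocks whenever the filter falls behind, which caps how much unfiltered data is held at once.

```python
import queue
import threading


def run_bounded_transfer(reads, is_host, buffer_size=4):
    """Stream reads through a fixed-size buffer and drop host reads.

    Hypothetical illustration only: `is_host` stands in for a real
    host-read classifier, and the bound is expressed in reads.
    """
    buf = queue.Queue(maxsize=buffer_size)  # hard cap on buffered reads
    kept = []

    def depletion_worker():
        # Background filter: consume reads until the sentinel arrives.
        while True:
            read = buf.get()
            if read is None:
                break
            if not is_host(read):
                kept.append(read)

    worker = threading.Thread(target=depletion_worker)
    worker.start()

    peak = 0
    for read in reads:
        buf.put(read)  # blocks while the buffer is full (throttling)
        peak = max(peak, buf.qsize())
    buf.put(None)  # sentinel: no more reads
    worker.join()
    return kept, peak


if __name__ == "__main__":
    demo = ["ACGT", "HUMAN_1", "TTGA", "HUMAN_2", "GGCC"]
    kept, peak = run_bounded_transfer(
        demo, lambda r: r.startswith("HUMAN"), buffer_size=2
    )
    print(kept, peak)
```

Because `put` blocks on a full queue, the sender's rate is tied to the filter's rate without any explicit rate calculation; a production system would enforce the bound per client and over the network rather than in-process.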
Affiliation(s)
- Philipp Spohr
- Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Düsseldorf, 40225, Germany
- Center for Digital Medicine, Düsseldorf, 40225, Germany
- Max Ried
- Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Düsseldorf, 40225, Germany
- Center for Digital Medicine, Düsseldorf, 40225, Germany
- Laura Kühle
- Algorithmic Bioinformatics, Heinrich Heine University Düsseldorf, Düsseldorf, 40225, Germany
- Center for Digital Medicine, Düsseldorf, 40225, Germany
- Alexander Dilthey
- Center for Digital Medicine, Düsseldorf, 40225, Germany
- Institute of Medical Microbiology and Hospital Hygiene, University Hospital Düsseldorf, Heinrich Heine University Düsseldorf, Düsseldorf, 40225, Germany
10
Anantharam R, Duchen D, Cox AL, Timp W, Thomas DL, Clipman SJ, Kandathil AJ. Long-Read Nanopore-Based Sequencing of Anelloviruses. Viruses 2024; 16:723. [PMID: 38793605 PMCID: PMC11125752 DOI: 10.3390/v16050723] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 04/27/2024] [Accepted: 04/30/2024] [Indexed: 05/26/2024]
Abstract
Routinely used metagenomic next-generation sequencing (mNGS) techniques often fail to detect low-level viremia (<10^4 copies/mL) and appear biased towards viruses with linear genomes. These limitations hinder the capacity to comprehensively characterize viral infections, such as those attributed to the Anelloviridae family. These near-ubiquitous non-pathogenic components of the human virome have circular single-stranded DNA genomes that vary in size from 2.0 to 3.9 kb and exhibit high genetic diversity. Hence, species identification using short reads can be challenging. Here, we introduce a rolling circle amplification (RCA)-based metagenomic sequencing protocol tailored for circular single-stranded DNA genomes, utilizing the long-read Oxford Nanopore platform. The approach was assessed by sequencing anelloviruses in plasma drawn from people who inject drugs (PWID) in two geographically distinct cohorts. We detail the methodological adjustments implemented to overcome difficulties inherent in sequencing circular genomes and describe a computational pipeline focused on anellovirus detection. We assessed our protocol across various sample dilutions and successfully differentiated anellovirus sequences in conditions simulating mixed infections. This method provides a robust framework for the comprehensive characterization of circular viruses within the human virome using the Oxford Nanopore.
Affiliation(s)
- Raghavendran Anantharam
- Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Dylan Duchen
- Center for Biomedical Data Science, Yale University School of Medicine, New Haven, CT 06511, USA
- Department of Pathology, Yale University School of Medicine, New Haven, CT 06519, USA
- Andrea L. Cox
- Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Winston Timp
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21218, USA
- David L. Thomas
- Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Steven J. Clipman
- Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Abraham J. Kandathil
- Division of Infectious Diseases, Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
11
Hall MB, Coin LJM. Pangenome databases improve host removal and mycobacteria classification from clinical metagenomic data. Gigascience 2024; 13:giae010. [PMID: 38573185 PMCID: PMC10993716 DOI: 10.1093/gigascience/giae010] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/18/2023] [Revised: 01/10/2024] [Accepted: 02/27/2024] [Indexed: 04/05/2024]
Abstract
BACKGROUND: Culture-free real-time sequencing of clinical metagenomic samples promises both rapid pathogen detection and antimicrobial resistance profiling. However, this approach introduces the risk of patient DNA leakage. To mitigate this risk, we need near-comprehensive removal of human DNA sequences at the point of sequencing, typically involving the use of resource-constrained devices. Existing benchmarks have largely focused on the use of standardized databases and largely ignored the computational requirements of depletion pipelines as well as the impact of human genome diversity.
RESULTS: We benchmarked host removal pipelines on simulated and artificial real Illumina and Nanopore metagenomic samples. We found that construction of a custom kraken database containing diverse human genomes results in the best balance of accuracy and computational resource usage. In addition, we benchmarked pipelines using kraken and minimap2 for taxonomic classification of Mycobacterium reads using standard and custom databases. With a database representative of the Mycobacterium genus, both tools obtained improved specificity and sensitivity, compared to the standard databases for classification of Mycobacterium tuberculosis. Computational efficiency of these custom databases was superior to most standard approaches, allowing them to be executed on a laptop device.
CONCLUSIONS: Customized pangenome databases provide the best balance of accuracy and computational efficiency when compared to standard databases for the task of human read removal and M. tuberculosis read classification from metagenomic samples. Such databases allow for execution on a laptop, without sacrificing accuracy, an especially important consideration in low-resource settings. We make all customized databases and pipelines freely available.
Affiliation(s)
- Michael B Hall
- Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, 3000 Victoria, Australia
- Lachlan J M Coin
- Department of Microbiology and Immunology, Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, 3000 Victoria, Australia