1
Smith BJ, Zhao C, Dubinkina V, Jin X, Zahavi L, Shoer S, Moltzau-Anderson J, Segal E, Pollard KS. Accurate estimation of intraspecific microbial gene content variation in metagenomic data with MIDAS v3 and StrainPGC. Genome Res 2025; 35:1247-1260. [PMID: 40210439 PMCID: PMC12047655 DOI: 10.1101/gr.279543.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/03/2024] [Accepted: 03/06/2025] [Indexed: 04/12/2025]
Abstract
Metagenomics has greatly expanded our understanding of the human gut microbiome by revealing a vast diversity of bacterial species within and across individuals. Even within a single species, different strains can have highly divergent gene content, affecting traits such as antibiotic resistance, metabolism, and virulence. Methods that harness metagenomic data to resolve strain-level differences in functional potential are crucial for understanding the causes and consequences of this intraspecific diversity. The enormous size of pangenome references, strain mixing within samples, and inconsistent sequencing depth present challenges for existing tools that analyze samples one at a time. To address this gap, we updated the MIDAS pangenome profiler, now released as version 3, and developed StrainPGC, an approach to strain-specific gene content estimation that combines strain tracking and correlations across multiple samples. We validate our integrated analysis using a complex synthetic community of strains from the human gut and find that StrainPGC outperforms existing approaches. Analyzing a large, publicly available metagenome collection from inflammatory bowel disease patients and healthy controls, we catalog the functional repertoires of thousands of strains across hundreds of species, capturing extensive diversity missing from reference databases. Finally, we apply StrainPGC to metagenomes from a clinical trial of fecal microbiota transplantation for the treatment of ulcerative colitis. We identify two Escherichia coli strains, from two different donors, that are both frequently transmitted to patients but have notable differences in functional potential. StrainPGC and MIDAS v3 together enable precise, intraspecific pangenomic investigations using large collections of metagenomic data without microbial isolation or de novo assembly.
Affiliation(s)
- Byron J Smith
  - The Gladstone Institute of Data Science and Biotechnology, San Francisco, California 94158, USA
- Chunyu Zhao
  - Chan Zuckerberg Biohub San Francisco, San Francisco, California 94158, USA
- Veronika Dubinkina
  - The Gladstone Institute of Data Science and Biotechnology, San Francisco, California 94158, USA
- Xiaofan Jin
  - The Gladstone Institute of Data Science and Biotechnology, San Francisco, California 94158, USA
  - Department of Biomedical Engineering, University of Calgary, Calgary, Alberta T2N 1N4, Canada
- Liron Zahavi
  - Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Saar Shoer
  - Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Jacqueline Moltzau-Anderson
  - Department of Gastroenterology, University of California, San Francisco, California 94115, USA
  - Benioff Center for Microbiome Medicine, Department of Medicine, University of California San Francisco, San Francisco, California 94143, USA
- Eran Segal
  - Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot 7610001, Israel
  - Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot 7610001, Israel
- Katherine S Pollard
  - The Gladstone Institute of Data Science and Biotechnology, San Francisco, California 94158, USA
  - Chan Zuckerberg Biohub San Francisco, San Francisco, California 94158, USA
  - Department of Epidemiology and Biostatistics, University of California, San Francisco, California 94158, USA
2
Yaffe E, Dethlefsen L, Patankar AV, Gui C, Holmes S, Relman DA. Brief antibiotic use drives human gut bacteria towards low-cost resistance. Nature 2025; 641:182-191. [PMID: 40269166 DOI: 10.1038/s41586-025-08781-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Accepted: 02/12/2025] [Indexed: 04/25/2025]
Abstract
Understanding the relationship between antibiotic use and the evolution of antimicrobial resistance is vital for effective antibiotic stewardship. Yet, animal models and in vitro experiments poorly replicate real-world conditions. To explain how resistance evolves in vivo, we exposed 60 human participants to ciprofloxacin and used longitudinal stool samples and a new computational method to assemble the genomes of 5,665 populations of commensal bacterial species within participants. Analysis of 2.3 million polymorphic sequence variants revealed 513 populations that underwent selective sweeps. We found convergent evolution focused on DNA gyrase and evidence of dispersed selective pressure at other genomic loci. Roughly 10% of susceptible bacterial populations evolved towards resistance through sweeps that involved substitutions at a specific amino acid in gyrase. The evolution of gyrase was associated with large populations that decreased in relative abundance during exposure. Sweeps persisted for more than 10 weeks in most cases and were not projected to revert within a year. Targeted amplification showed that gyrase mutations arose de novo within the participants and exhibited no measurable fitness cost. These findings revealed that brief ciprofloxacin exposure drives the evolution of resistance in gut commensals, with mutations persisting long after exposure. This study underscores the capacity of the human gut to promote the evolution of resistance and identifies key genomic and ecological factors that shape bacterial adaptation in vivo.
Affiliation(s)
- Eitan Yaffe
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
  - Infectious Diseases Section, Veterans Affairs Palo Alto Health Care System, Palo Alto, CA, USA
- Les Dethlefsen
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
- Arati V Patankar
  - Infectious Diseases Section, Veterans Affairs Palo Alto Health Care System, Palo Alto, CA, USA
- Chen Gui
  - Infectious Diseases Section, Veterans Affairs Palo Alto Health Care System, Palo Alto, CA, USA
- Susan Holmes
  - Department of Statistics, Stanford University, Stanford, CA, USA
- David A Relman
  - Department of Medicine, Stanford University School of Medicine, Stanford, CA, USA
  - Infectious Diseases Section, Veterans Affairs Palo Alto Health Care System, Palo Alto, CA, USA
  - Department of Microbiology and Immunology, Stanford University School of Medicine, Stanford, CA, USA
3
Kim Y, Worby CJ, Acharya S, van Dijk LR, Alfonsetti D, Gromko Z, Azimzadeh PN, Dodson KW, Gerber GK, Hultgren SJ, Earl AM, Berger B, Gibson TE. Longitudinal profiling of low-abundance strains in microbiomes with ChronoStrain. Nat Microbiol 2025; 10:1184-1197. [PMID: 40328944 PMCID: PMC12122369 DOI: 10.1038/s41564-025-01983-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2023] [Accepted: 03/13/2025] [Indexed: 05/08/2025]
Abstract
The ability to detect and quantify microbiota over time from shotgun metagenomic data has a plethora of clinical, basic science and public health applications. Given these applications, and the observation that pathogens and other taxa of interest can reside at low relative abundance, there is a critical need for algorithms that accurately profile low-abundance microbial taxa with strain-level resolution. Here we present ChronoStrain: a sequence quality- and time-aware Bayesian model for profiling strains in longitudinal samples. ChronoStrain explicitly models the presence or absence of each strain and produces a probability distribution over abundance trajectories for each strain. Using synthetic and semi-synthetic data, we demonstrate how ChronoStrain outperforms existing methods in abundance estimation and presence/absence prediction. Applying ChronoStrain to two human microbiome datasets demonstrated its improved interpretability for profiling Escherichia coli strain blooms in longitudinal faecal samples from adult women with recurring urinary tract infections, and its improved accuracy for detecting Enterococcus faecalis strains in infant faecal samples. Compared with state-of-the-art methods, ChronoStrain's ability to detect low-abundance taxa is particularly stark.
Affiliation(s)
- Younhun Kim
  - Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Colin J Worby
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Sawal Acharya
  - Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA
- Lucas R van Dijk
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Delft Bioinformatics Lab, Delft University of Technology, Delft, the Netherlands
- Daniel Alfonsetti
  - Computer Science and AI Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
- Zackary Gromko
  - Computer Science and AI Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
- Philippe N Azimzadeh
  - Department of Molecular Microbiology and Center for Women's Infectious Disease Research, Washington University School of Medicine, St Louis, MO, USA
- Karen W Dodson
  - Department of Molecular Microbiology and Center for Women's Infectious Disease Research, Washington University School of Medicine, St Louis, MO, USA
- Georg K Gerber
  - Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
  - Harvard-MIT Health Sciences and Technology, Cambridge, MA, USA
- Scott J Hultgren
  - Department of Molecular Microbiology and Center for Women's Infectious Disease Research, Washington University School of Medicine, St Louis, MO, USA
- Ashlee M Earl
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Bonnie Berger
  - Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Computer Science and AI Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
  - Harvard-MIT Health Sciences and Technology, Cambridge, MA, USA
- Travis E Gibson
  - Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Harvard Medical School, Boston, MA, USA
  - Computer Science and AI Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
4
O'Reilly KM, Wade MJ, Farkas K, Amman F, Lison A, Munday JD, Bingham J, Mthombothi ZE, Fang Z, Brown CS, Kao RR, Danon L. Analysis insights to support the use of wastewater and environmental surveillance data for infectious diseases and pandemic preparedness. Epidemics 2025; 51:100825. [PMID: 40174494 DOI: 10.1016/j.epidem.2025.100825] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2024] [Revised: 02/06/2025] [Accepted: 03/26/2025] [Indexed: 04/04/2025] Open
Abstract
Wastewater-based epidemiology is the detection of pathogens from sewage systems and the interpretation of these data to improve public health. Its use has increased in scope since 2020, when it was demonstrated that SARS-CoV-2 RNA could be successfully extracted from the wastewater of affected populations. In this Perspective we provide an overview of recent advances in pathogen detection within wastewater, propose a framework for identifying the utility of wastewater sampling for pathogen detection and suggest areas where analytics require development. Ensuring that both data collection and analysis are tailored towards key questions at different stages of an epidemic will improve the inference made. For analyses to be useful we require methods to determine the absence of infection, early detection of infection, reliably estimate epidemic trajectories and prevalence, and detect novel variants without reliance on consensus sequences. This research area has included many innovations that have improved the interpretation of collected data and we are optimistic that innovation will continue in the future.
Affiliation(s)
- K M O'Reilly
  - Centre for Mathematical Modelling of Infectious Diseases & Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT, UK
- M J Wade
  - Data, Analytics & Surveillance Group, UK Health Security Agency, 10 South Colonnade, Canary Wharf, London E14 4PU, UK
- K Farkas
  - School of Environmental and Natural Sciences, Bangor University, Bangor, Gwynedd LL57 2UW, UK
- F Amman
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- A Lison
  - Department of Biosystems Science and Engineering, ETH Zürich, Schanzenstrasse 44, Basel 4056, Switzerland
- J D Munday
  - Department of Biosystems Science and Engineering, ETH Zürich, Schanzenstrasse 44, Basel 4056, Switzerland
- J Bingham
  - South African Center for Epidemiological Modelling and Analysis (SACEMA), Stellenbosch University, Stellenbosch, South Africa
- Z E Mthombothi
  - South African Center for Epidemiological Modelling and Analysis (SACEMA), Stellenbosch University, Stellenbosch, South Africa
- Z Fang
  - Biomathematics and Statistics Scotland, James Clerk Maxwell Building, King's Buildings, Peter Guthrie Tait Road, Edinburgh EH9 3FD, UK
- C S Brown
  - Clinical & Emerging Infection Directorate, UK Health Security Agency, 61 Colindale Avenue, London NW9 5EQ, UK; NIHR HPRU in Healthcare Associated Infections and Antimicrobial Resistance, Imperial College London, Faculty of Medicine, Imperial College London, South Kensington Campus, London SW7 2AZ, UK
- R R Kao
  - Roslin Institute and School of Physics and Astronomy, University of Edinburgh, EH25 9RG, UK
- L Danon
  - Department of Engineering Mathematics, Ada Lovelace Building, University Walk, Bristol BS8 1TW, UK
5
Heinken A, Hulshof TO, Nap B, Martinelli F, Basile A, O'Brolchain A, O'Sullivan NF, Gallagher C, Magee E, McDonagh F, Lalor I, Bergin M, Evans P, Daly R, Farrell R, Delaney RM, Hill S, McAuliffe SR, Kilgannon T, Fleming RMT, Thinnes CC, Thiele I. A genome-scale metabolic reconstruction resource of 247,092 diverse human microbes spanning multiple continents, age groups, and body sites. Cell Syst 2025; 16:101196. [PMID: 39947184 DOI: 10.1016/j.cels.2025.101196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2024] [Revised: 10/04/2024] [Accepted: 01/15/2025] [Indexed: 02/19/2025]
Abstract
Genome-scale modeling of microbiome metabolism enables the simulation of diet-host-microbiome-disease interactions. However, current genome-scale reconstruction resources are limited in scope by computational challenges. We developed an optimized and highly parallelized reconstruction and analysis pipeline to build a resource of 247,092 microbial genome-scale metabolic reconstructions, deemed APOLLO. APOLLO spans 19 phyla, contains >60% of uncharacterized strains, and accounts for strains from 34 countries, all age groups, and multiple body sites. Using machine learning, we predicted with high accuracy the taxonomic assignment of strains based on the computed metabolic features. We then built 14,451 metagenomic sample-specific microbiome community models to systematically interrogate their community-level metabolic capabilities. We show that sample-specific metabolic pathways accurately stratify microbiomes by body site, age, and disease state. APOLLO is freely available, enables the systematic interrogation of the metabolic capabilities of largely still uncultured and unclassified species, and provides unprecedented opportunities for systems-level modeling of personalized host-microbiome co-metabolism.
Affiliation(s)
- Almut Heinken
  - School of Medicine, University of Galway, Galway, Ireland; Ryan Institute, University of Galway, Galway, Ireland; Inserm UMRS 1256 NGERE, University of Lorraine, Nancy, France
- Timothy Otto Hulshof
  - School of Medicine, University of Galway, Galway, Ireland; Ryan Institute, University of Galway, Galway, Ireland
- Bram Nap
  - School of Medicine, University of Galway, Galway, Ireland; Ryan Institute, University of Galway, Galway, Ireland
- Filippo Martinelli
  - School of Medicine, University of Galway, Galway, Ireland; Ryan Institute, University of Galway, Galway, Ireland
- Arianna Basile
  - School of Medicine, University of Galway, Galway, Ireland; Department of Biology, University of Padova, Padova, Italy
- Ian Lalor
  - University of Galway, Galway, Ireland
- Cyrille C Thinnes
  - School of Medicine, University of Galway, Galway, Ireland; Ryan Institute, University of Galway, Galway, Ireland
- Ines Thiele
  - School of Medicine, University of Galway, Galway, Ireland; Ryan Institute, University of Galway, Galway, Ireland; Division of Microbiology, University of Galway, Galway, Ireland; APC Microbiome Ireland, Cork, Ireland
6
Qu EB, Baker JS, Markey L, Khadka V, Mancuso C, Tripp D, Lieberman TD. Intraspecies associations from strain-rich metagenome samples. bioRxiv 2025:2025.02.07.636498. [PMID: 39974997 PMCID: PMC11839054 DOI: 10.1101/2025.02.07.636498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2025]
Abstract
Genetically distinct strains of a species can vary widely in phenotype, reducing the utility of species-resolved microbiome measurements for detecting associations with health or disease. While metagenomics theoretically provides information on all strains in a sample, current strain-resolved analysis methods face a tradeoff: de novo genotyping approaches can detect novel strains but struggle when applied to strain-rich or low-coverage samples, while reference database methods work robustly across sample types but are insensitive to novel diversity. We present PHLAME, a method that bridges this divide by combining the advantages of reference-based approaches with novelty awareness. PHLAME explicitly defines clades at multiple phylogenetic levels and introduces a probabilistic, mutation-based, framework to accurately quantify novelty from the nearest reference. By applying PHLAME to publicly available human skin and vaginal metagenomes, we uncover previously undetected clade associations with coexisting species, geography, and host age. The ability to characterize intraspecies associations and dynamics in previously inaccessible environments will propel new mechanistic insights from accumulating metagenomic data.
Affiliation(s)
- Evan B. Qu
  - Institute for Medical Engineering and Sciences, Massachusetts Institute of Technology; Cambridge, MA 02139, USA
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology; Cambridge, MA 02139, USA
- Jacob S. Baker
  - Institute for Medical Engineering and Sciences, Massachusetts Institute of Technology; Cambridge, MA 02139, USA
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology; Cambridge, MA 02139, USA
- Laura Markey
  - Institute for Medical Engineering and Sciences, Massachusetts Institute of Technology; Cambridge, MA 02139, USA
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology; Cambridge, MA 02139, USA
- Veda Khadka
  - Institute for Medical Engineering and Sciences, Massachusetts Institute of Technology; Cambridge, MA 02139, USA
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology; Cambridge, MA 02139, USA
- Chris Mancuso
  - Institute for Medical Engineering and Sciences, Massachusetts Institute of Technology; Cambridge, MA 02139, USA
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology; Cambridge, MA 02139, USA
- Delphine Tripp
  - Institute for Medical Engineering and Sciences, Massachusetts Institute of Technology; Cambridge, MA 02139, USA
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology; Cambridge, MA 02139, USA
  - Department of Systems Biology, Harvard University, Cambridge, MA 02138, USA
- Tami D. Lieberman
  - Institute for Medical Engineering and Sciences, Massachusetts Institute of Technology; Cambridge, MA 02139, USA
  - Department of Civil and Environmental Engineering, Massachusetts Institute of Technology; Cambridge, MA 02139, USA
  - Broad Institute of MIT and Harvard; Cambridge, MA 02139, USA
  - Ragon Institute of MGH, MIT, and Harvard; Cambridge, MA 02139, USA
7
Pinto Y, Bhatt AS. Sequencing-based analysis of microbiomes. Nat Rev Genet 2024; 25:829-845. [PMID: 38918544 DOI: 10.1038/s41576-024-00746-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 05/15/2024] [Indexed: 06/27/2024]
Abstract
Microbiomes occupy a range of niches and, in addition to having diverse compositions, they have varied functional roles that have an impact on agriculture, environmental sciences, and human health and disease. The study of microbiomes has been facilitated by recent technological and analytical advances, such as cheaper and higher-throughput DNA and RNA sequencing, improved long-read sequencing and innovative computational analysis methods. These advances are providing a deeper understanding of microbiomes at the genomic, transcriptional and translational level, generating insights into their function and composition at resolutions beyond the species level.
Affiliation(s)
- Yishay Pinto
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Department of Medicine, Divisions of Hematology and Blood & Marrow Transplantation, Stanford University, Stanford, CA, USA
- Ami S Bhatt
  - Department of Genetics, Stanford University, Stanford, CA, USA
  - Department of Medicine, Divisions of Hematology and Blood & Marrow Transplantation, Stanford University, Stanford, CA, USA
8
Zhou B, Wang C, Putzel G, Hu J, Liu M, Wu F, Chen Y, Pironti A, Li H. An integrated strain-level analytic pipeline utilizing longitudinal metagenomic data. Microbiol Spectr 2024; 12:e0143124. [PMID: 39311770 PMCID: PMC11542597 DOI: 10.1128/spectrum.01431-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2024] [Accepted: 08/28/2024] [Indexed: 11/08/2024] Open
Abstract
With the development of sequencing technology and analytic tools, studying within-species variations enhances the understanding of microbial biological processes. Nevertheless, most existing methods designed for strain-level analysis lack the capability to concurrently assess both strain proportions and genome-wide single nucleotide variants (SNVs) across longitudinal metagenomic samples. In this study, we introduce LongStrain, an integrated pipeline for the analysis of large-scale metagenomic data from individuals with longitudinal or repeated samples. In LongStrain, we first utilize two efficient tools, Kraken2 and Bowtie2, for the taxonomic classification and alignment of sequencing reads, respectively. Subsequently, we propose to jointly model strain proportions and shared haplotypes across samples within individuals. This approach specifically targets tracking a primary strain and a secondary strain for each subject, providing their respective proportions and SNVs as output. With extensive simulation studies of a microbial community and single species, our results demonstrate that LongStrain is superior to two genotyping methods and two deconvolution methods across a majority of scenarios. Furthermore, we illustrate the potential applications of LongStrain in the real data analysis of The Environmental Determinants of Diabetes in the Young study and a gastric intestinal metaplasia microbiome study. In summary, the proposed analytic pipeline demonstrates marked statistical efficiency over the same type of methods and has great potential in understanding the genomic variants and dynamic changes at strain level. LongStrain and its tutorial are freely available online at https://github.com/BoyanZhou/LongStrain.
IMPORTANCE The advancement in DNA-sequencing technology has enabled the high-resolution identification of microorganisms in microbial communities. Since different microbial strains within species may contain extreme phenotypic variability (e.g., nutrition metabolism, antibiotic resistance, and pathogen virulence), investigating within-species variations holds great scientific promise in understanding the underlying mechanism of microbial biological processes. To fully utilize the shared genomic variants across longitudinal metagenomics samples collected in microbiome studies, we develop an integrated analytic pipeline (LongStrain) for longitudinal metagenomics data. It concurrently leverages the information on proportions of mapped reads for individual strains and genome-wide SNVs to enhance the efficiency and accuracy of strain identification. Our method helps to understand strains' dynamic changes and their association with genome-wide variants. Given the fast-growing longitudinal studies of microbial communities, LongStrain which streamlines analyses of large-scale raw sequencing data should be of great value in microbiome research communities.
Affiliation(s)
- Boyan Zhou
  - Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, New York, USA
- Chan Wang
  - Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, New York, USA
- Gregory Putzel
  - Department of Microbiology, New York University School of Medicine, New York, New York, USA
- Jiyuan Hu
  - Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, New York, USA
- Menghan Liu
  - Department of Biological Sciences, Columbia University in the City of New York, New York, New York, USA
- Fen Wu
  - Division of Epidemiology, Department of Population Health, New York University School of Medicine, New York, New York, USA
- Yu Chen
  - Division of Epidemiology, Department of Population Health, New York University School of Medicine, New York, New York, USA
- Alejandro Pironti
  - Department of Microbiology, New York University School of Medicine, New York, New York, USA
- Huilin Li
  - Division of Biostatistics, Department of Population Health, New York University School of Medicine, New York, New York, USA
9
Kang X, Zhang W, Li Y, Luo X, Schönhuth A. HyLight: Strain aware assembly of low coverage metagenomes. Nat Commun 2024; 15:8665. [PMID: 39375348 PMCID: PMC11458758 DOI: 10.1038/s41467-024-52907-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/21/2023] [Accepted: 09/23/2024] [Indexed: 10/09/2024] Open
Abstract
Different strains of identical species can vary substantially in terms of their spectrum of biomedically relevant phenotypes. Reconstructing the genomes of microbial communities at the level of their strains poses significant challenges, because sequencing errors can obscure strain-specific variants. Next-generation sequencing (NGS) reads are too short to resolve complex genomic regions. Third-generation sequencing (TGS) reads, although longer, are prone to higher error rates or substantially more expensive. Limiting TGS coverage to reduce costs compromises the accuracy of the assemblies. This explains why prior approaches agree on losses in strain awareness, accuracy, tendentially excessive costs, or combinations thereof. We introduce HyLight, a metagenome assembly approach that addresses these challenges by implementing the complementary strengths of TGS and NGS data. HyLight employs strain-resolved overlap graphs (OG) to accurately reconstruct individual strains within microbial communities. Our experiments demonstrate that HyLight produces strain-aware and contiguous assemblies at minimal error content, while significantly reducing costs because utilizing low-coverage TGS data. HyLight achieves an average improvement of 19.05% in preserving strain identity and demonstrates near-complete strain awareness across diverse datasets. In summary, HyLight offers considerable advances in metagenome assembly, insofar as it delivers significantly enhanced strain awareness, contiguity, and accuracy without the typical compromises observed in existing approaches.
Affiliation(s)
- Xiongbin Kang
  - College of Biology, Hunan University, Changsha, China
  - Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Wenhai Zhang
  - College of Biology, Hunan University, Changsha, China
- Yichen Li
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Xiao Luo
  - College of Biology, Hunan University, Changsha, China
- Alexander Schönhuth
  - Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
10
Kim Y, Worby CJ, Acharya S, van Dijk LR, Alfonsetti D, Gromko Z, Azimzadeh P, Dodson K, Gerber G, Hultgren S, Earl AM, Berger B, Gibson TE. Strain tracking with uncertainty quantification. bioRxiv 2024:2023.01.25.525531. [PMID: 36747646 PMCID: PMC9900846 DOI: 10.1101/2023.01.25.525531] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2023]
Abstract
The ability to detect and quantify microbiota over time has a plethora of clinical, basic science, and public health applications. One of the primary means of tracking microbiota is through sequencing technologies. When the microorganism of interest is well characterized or known a priori, targeted sequencing is often used. In many applications, however, untargeted bulk (shotgun) sequencing is more appropriate; for instance, the tracking of infection transmission events and nucleotide variants across multiple genomic loci, or studying the role of multiple genes in a particular phenotype. Given these applications, and the observation that pathogens (e.g. Clostridioides difficile, Escherichia coli, Salmonella enterica) and other taxa of interest can reside at low relative abundance in the gastrointestinal tract, there is a critical need for algorithms that accurately track low-abundance taxa with strain level resolution. Here we present a sequence quality- and time-aware model, ChronoStrain, that introduces uncertainty quantification to gauge low-abundance species and significantly outperforms the current state-of-the-art on both real and synthetic data. ChronoStrain leverages sequences' quality scores and the samples' temporal information to produce a probability distribution over abundance trajectories for each strain tracked in the model. We demonstrate ChronoStrain's improved performance in capturing post-antibiotic Escherichia coli strain blooms among women with recurrent urinary tract infections (UTIs) from the UTI Microbiome (UMB) Project. Other strain tracking models on the same data either show inconsistent temporal colonization or can only track consistently using very coarse groupings. In contrast, our probabilistic outputs can reveal the relationship between low-confidence strains present in the sample that cannot be reliably assigned a single reference label (either due to poor coverage or novelty) while simultaneously calling high-confidence strains that can be unambiguously assigned a label. We also analyze samples from the Early Life Microbiota Colonisation (ELMC) Study demonstrating the algorithm's ability to correctly identify Enterococcus faecalis strains using paired sample isolates as validation.
|
11
|
Shaw J, Gounot JS, Chen H, Nagarajan N, Yu YW. Floria: fast and accurate strain haplotyping in metagenomes. Bioinformatics 2024; 40:i30-i38. [PMID: 38940183 PMCID: PMC11211831 DOI: 10.1093/bioinformatics/btae252] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/29/2024] Open
Abstract
SUMMARY: Shotgun metagenomics allows for direct analysis of microbial community genetics, but scalable computational methods for the recovery of bacterial strain genomes from microbiomes remain a key challenge. We introduce Floria, a novel method designed for rapid and accurate recovery of strain haplotypes from short- and long-read metagenome sequencing data, based on minimum error correction (MEC) read clustering and a strain-preserving network flow model. Floria can function as a standalone haplotyping method, outputting alleles and reads that co-occur on the same strain, as well as an end-to-end read-to-assembly pipeline (Floria-PL) for strain-level assembly. Benchmarking evaluations on synthetic metagenomes show that Floria is >3× faster and recovers 21% more strain content than base-level assembly methods (Strainberry) while being over an order of magnitude faster when only phasing is required. Applying Floria to a set of 109 deeply sequenced nanopore metagenomes took <20 min on average per sample and identified several species that have consistent strain heterogeneity. Applying Floria's short-read haplotyping to a longitudinal gut metagenomics dataset revealed a dynamic multi-strain Anaerostipes hadrus community with frequent strain loss and emergence events over 636 days. With Floria, accurate haplotyping of metagenomic datasets takes mere minutes on standard workstations, paving the way for extensive strain-level metagenomic analyses. AVAILABILITY AND IMPLEMENTATION: Floria is available at https://github.com/bluenote-1577/floria, and the Floria-PL pipeline is available at https://github.com/jsgounot/Floria_analysis_workflow along with code for reproducing the benchmarks.
Affiliation(s)
- Jim Shaw
  - Department of Mathematics, University of Toronto, Toronto, Ontario, M5S 2E4, Canada
- Jean-Sebastien Gounot
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
- Hanrong Chen
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
- Niranjan Nagarajan
  - Genome Institute of Singapore (GIS), Agency for Science, Technology and Research (A*STAR), 60 Biopolis Street, Singapore, 138672, Republic of Singapore
  - Yong Loo Lin School of Medicine, National University of Singapore, Singapore, 117597, Republic of Singapore
- Yun William Yu
  - Department of Mathematics, University of Toronto, Toronto, Ontario, M5S 2E4, Canada
  - Ray and Stephanie Lane Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA, 15213, United States
|
12
|
Ju N, Liu J, He Q. SNP-slice resolves mixed infections: simultaneously unveiling strain haplotypes and linking them to hosts. Bioinformatics 2024; 40:btae344. [PMID: 38885409 PMCID: PMC11187496 DOI: 10.1093/bioinformatics/btae344] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 05/09/2024] [Accepted: 06/14/2024] [Indexed: 06/20/2024] Open
Abstract
MOTIVATION: Multi-strain infection is a common yet under-investigated phenomenon of many pathogens. Currently, biologists analyzing SNP information sometimes have to discard mixed infection samples as many downstream analyses require monogenomic inputs. Such a protocol impedes our understanding of the underlying genetic diversity, co-infection patterns, and genomic relatedness of pathogens. A scalable tool to learn and resolve the SNP-haplotypes from polygenomic data is an urgent need in molecular epidemiology. RESULTS: We develop a slice sampling Markov Chain Monte Carlo algorithm, named SNP-Slice, to learn not only the SNP-haplotypes of all strains in the populations but also which strains infect which hosts. Our method reconstructs SNP-haplotypes and individual heterozygosities accurately without reference panels and outperforms the state-of-the-art methods at estimating the multiplicity of infections and allele frequencies. Thus, SNP-Slice introduces a novel approach to address polygenomic data and opens a new avenue for resolving complex infection patterns in molecular surveillance. We illustrate the performance of SNP-Slice on empirical malaria and HIV datasets and provide recommendations for using our method on empirical datasets. AVAILABILITY AND IMPLEMENTATION: The implementation of the SNP-Slice algorithm and scripts to analyze SNP-Slice outputs are available at https://github.com/nianqiaoju/snp-slice.
Affiliation(s)
- Nianqiao Ju
  - Department of Statistics, Purdue University, West Lafayette, IN 47907, United States
- Jiawei Liu
  - Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, United States
- Qixin He
  - Department of Biological Sciences, Purdue University, West Lafayette, IN 47907, United States
|
13
|
Young MG, Straub TJ, Worby CJ, Metsky HC, Gnirke A, Bronson RA, van Dijk LR, Desjardins CA, Matranga C, Qu J, Villicana JB, Azimzadeh P, Kau A, Dodson KW, Schreiber HL, Manson AL, Hultgren SJ, Earl AM. Distinct Escherichia coli transcriptional profiles in the guts of recurrent UTI sufferers revealed by pangenome hybrid selection. bioRxiv 2024:2024.02.29.582780. [PMID: 38463963 PMCID: PMC10925322 DOI: 10.1101/2024.02.29.582780] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/12/2024]
Abstract
Low-abundance members of microbial communities are difficult to study in their native habitats. This includes Escherichia coli, a minor, but common inhabitant of the gastrointestinal tract and opportunistic pathogen, including of the urinary tract, where it is the primary pathogen. While multi-omic analyses have detailed critical interactions between uropathogenic Escherichia coli (UPEC) and the bladder that mediate UTI outcome, comparatively little is known about UPEC in its pre-infection reservoir, partly due to its low abundance there (<1% relative abundance). To accurately and sensitively explore the genomes and transcriptomes of diverse E. coli in gastrointestinal communities, we developed E. coli PanSelect which uses a set of probes designed to specifically recognize and capture E. coli's broad pangenome from sequencing libraries. We demonstrated the ability of E. coli PanSelect to enrich, by orders of magnitude, sequencing data from diverse E. coli using a mock community and a set of human stool samples collected as part of a cohort study investigating drivers of recurrent urinary tract infections (rUTI). Comparisons of genomes and transcriptomes between E. coli residing in the gastrointestinal tracts of women with and without a history of rUTI suggest that rUTI gut E. coli are responding to increased levels of oxygen and nitrate, suggestive of mucosal inflammation, which may have implications for recurrent disease. E. coli PanSelect is well suited for investigations of native in vivo biology of E. coli in other environments where it is at low relative abundance, and the framework described here has broad applicability to other highly diverse, low abundance organisms.
Affiliation(s)
- Mark G Young
  - Infectious Disease & Microbiome Program, Broad Institute, Cambridge, MA 02142, USA
- Timothy J Straub
  - Infectious Disease & Microbiome Program, Broad Institute, Cambridge, MA 02142, USA
- Colin J Worby
  - Infectious Disease & Microbiome Program, Broad Institute, Cambridge, MA 02142, USA
- Hayden C Metsky
  - Infectious Disease & Microbiome Program, Broad Institute, Cambridge, MA 02142, USA
- Andreas Gnirke
  - Infectious Disease & Microbiome Program, Broad Institute, Cambridge, MA 02142, USA
- Ryan A Bronson
  - Infectious Disease & Microbiome Program, Broad Institute, Cambridge, MA 02142, USA
- Lucas R van Dijk
  - Infectious Disease & Microbiome Program, Broad Institute, Cambridge, MA 02142, USA
  - Delft Bioinformatics Lab, Delft University of Technology, Van Mourik Broekmanweg 6, Delft, 2628 XE, The Netherlands
- Christian Matranga
  - Infectious Disease & Microbiome Program, Broad Institute, Cambridge, MA 02142, USA
- James Qu
  - Infectious Disease & Microbiome Program, Broad Institute, Cambridge, MA 02142, USA
- Jesús Bazan Villicana
  - Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, MO, USA
- Philippe Azimzadeh
  - Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, MO, USA
- Andrew Kau
  - Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, MO, USA
  - Center for Women's Infectious Disease Research, Washington University School of Medicine, St. Louis, MO, USA
  - Division of Allergy and Immunology, Department of Medicine, Washington University School of Medicine, St. Louis, MO, USA
- Karen W Dodson
  - Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, MO, USA
  - Center for Women's Infectious Disease Research, Washington University School of Medicine, St. Louis, MO, USA
- Henry L Schreiber
  - Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, MO, USA
  - Center for Women's Infectious Disease Research, Washington University School of Medicine, St. Louis, MO, USA
- Abigail L Manson
  - Infectious Disease & Microbiome Program, Broad Institute, Cambridge, MA 02142, USA
- Scott J Hultgren
  - Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, MO, USA
  - Center for Women's Infectious Disease Research, Washington University School of Medicine, St. Louis, MO, USA
- Ashlee M Earl
  - Infectious Disease & Microbiome Program, Broad Institute, Cambridge, MA 02142, USA
|
14
|
Goussarov G, Mysara M, Cleenwerck I, Claesen J, Leys N, Vandamme P, Van Houdt R. Benchmarking short-, long- and hybrid-read assemblers for metagenome sequencing of complex microbial communities. Microbiology (Reading, England) 2024; 170:001469. [PMID: 38916949 PMCID: PMC11261854 DOI: 10.1099/mic.0.001469] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2024] [Accepted: 05/23/2024] [Indexed: 06/26/2024]
Abstract
Metagenome community analysis, driven by the continued development in sequencing technology, is rapidly providing insights into many aspects of microbiology and becoming a cornerstone tool. Illumina, Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) are the leading technologies, each with its own advantages and drawbacks. Illumina provides accurate reads at a low cost, but their length is too short to close bacterial genomes. Long reads overcome this limitation, but these technologies produce reads with lower accuracy (ONT) or with lower throughput (PacBio high-fidelity reads). In a critical first analysis step, reads are assembled to reconstruct genomes or individual genes within the community. However, to date, the performance of existing assemblers has never been challenged with a complex mock metagenome. Here, we evaluate the performance of current assemblers that use short, long or both read types on a complex mock metagenome consisting of 227 bacterial strains with varying degrees of relatedness. We show that many of the current assemblers are not suited to handle such a complex metagenome. In addition, hybrid assemblies do not fulfil their potential. We conclude that ONT reads assembled with CANU and Illumina reads assembled with SPAdes offer the best value for reconstructing genomes and individual genes of complex metagenomes, respectively.
Affiliation(s)
- Gleb Goussarov
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Mohamed Mysara
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
  - Bioinformatics group, Information Technology & Computer Science, Nile University, Giza, Egypt
- Ilse Cleenwerck
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Jürgen Claesen
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
- Natalie Leys
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
- Peter Vandamme
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Rob Van Houdt
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
|
15
|
Trecarten S, Fongang B, Liss M. Current Trends and Challenges of Microbiome Research in Prostate Cancer. Curr Oncol Rep 2024; 26:477-487. [PMID: 38573440 DOI: 10.1007/s11912-024-01520-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/18/2024] [Indexed: 04/05/2024]
Abstract
PURPOSE OF REVIEW: The role of the gut microbiome in prostate cancer is an emerging area of research interest. However, no single causative organism has yet been identified. The goal of this paper is to examine the role of the microbiome in prostate cancer and summarize the challenges relating to methodology in specimen collection, sequencing technology, and interpretation of results. RECENT FINDINGS: Significant heterogeneity still exists in methodology for stool sampling/storage, preservative options, DNA extraction, and sequencing database selection/in silico processing. Debate persists over primer choice in amplicon sequencing as well as optimal methods for data normalization. Statistical methods for longitudinal microbiome analysis continue to undergo refinement. While standardization of methodology may help yield more consistent results for organism identification in prostate cancer, this is a difficult task due to considerable procedural variation at each step in the process. Further reproducibility and methodology research is required.
Affiliation(s)
- Shaun Trecarten
  - Department of Urology, UT Health San Antonio, 7703 Floyd Curl Dr, San Antonio, TX, 78229, USA
- Bernard Fongang
  - Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, UT Health San Antonio, San Antonio, TX, USA
  - Department of Biochemistry and Structural Biology, UT Health San Antonio, San Antonio, TX, USA
  - Department of Population Health Sciences, UT Health San Antonio, San Antonio, TX, USA
- Michael Liss
  - Department of Urology, UT Health San Antonio, 7703 Floyd Curl Dr, San Antonio, TX, 78229, USA
|
16
|
Logares R. Decoding populations in the ocean microbiome. Microbiome 2024; 12:67. [PMID: 38561814 PMCID: PMC10983722 DOI: 10.1186/s40168-024-01778-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/27/2023] [Accepted: 02/12/2024] [Indexed: 04/04/2024]
Abstract
Understanding the characteristics and structure of populations is fundamental to comprehending ecosystem processes and evolutionary adaptations. While the study of animal and plant populations has spanned a few centuries, microbial populations have been under scientific scrutiny for a considerably shorter period. In the ocean, analyzing the genetic composition of microbial populations and their adaptations to multiple niches can yield important insights into ecosystem function and the microbiome's response to global change. However, microbial populations have remained elusive to the scientific community due to the challenges associated with isolating microorganisms in the laboratory. Today, advancements in large-scale metagenomics and metatranscriptomics facilitate the investigation of populations from many uncultured microbial species directly from their habitats. The knowledge acquired thus far reveals substantial genetic diversity among various microbial species, showcasing distinct patterns of population differentiation and adaptations, and highlighting the significant role of selection in structuring populations. In the coming years, population genomics is expected to significantly increase our understanding of the architecture and functioning of the ocean microbiome, providing insights into its vulnerability or resilience in the face of ongoing global change.
Affiliation(s)
- Ramiro Logares
  - Institute of Marine Sciences (ICM), CSIC, Barcelona, Catalonia, 08003, Spain
|
17
|
Cerk K, Ugalde-Salas P, Nedjad CG, Lecomte M, Muller C, Sherman DJ, Hildebrand F, Labarthe S, Frioux C. Community-scale models of microbiomes: Articulating metabolic modelling and metagenome sequencing. Microb Biotechnol 2024; 17:e14396. [PMID: 38243750 PMCID: PMC10832553 DOI: 10.1111/1751-7915.14396] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 01/09/2023] [Revised: 11/27/2023] [Accepted: 12/20/2023] [Indexed: 01/21/2024] Open
Abstract
Building models is essential for understanding the functions and dynamics of microbial communities. Metabolic models built on genome-scale metabolic network reconstructions (GENREs) are especially relevant as a means to decipher the complex interactions occurring among species. Model reconstruction increasingly relies on metagenomics, which permits direct characterisation of naturally occurring communities that may contain organisms that cannot be isolated or cultured. In this review, we provide an overview of the field of metabolic modelling and its increasing reliance on and synergy with metagenomics and bioinformatics. We survey the means of assigning functions and reconstructing metabolic networks from (meta-)genomes, and present the variety and mathematical fundamentals of metabolic models that foster the understanding of microbial dynamics. We emphasise the characterisation of interactions and the scaling of model construction to large communities, two important bottlenecks in the applicability of these models. We give an overview of the current state of the art in metagenome sequencing and bioinformatics analysis, focusing on the reconstruction of genomes in microbial communities. Metagenomics benefits tremendously from third-generation sequencing, and we discuss the opportunities of long-read sequencing, strain-level characterisation and eukaryotic metagenomics. We aim at providing algorithmic and mathematical support, together with tool and application resources, that permit bridging the gap between metagenomics and metabolic modelling.
Affiliation(s)
- Klara Cerk
  - Quadram Institute Bioscience, Norwich, UK
  - Earlham Institute, Norwich, UK
- Chabname Ghassemi Nedjad
  - Inria, University of Bordeaux, INRAE, Talence, France
  - University of Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800, Talence, France
- Maxime Lecomte
  - Inria, University of Bordeaux, INRAE, Talence, France
  - INRAE STLO, University of Rennes, Rennes, France
- Falk Hildebrand
  - Quadram Institute Bioscience, Norwich, UK
  - Earlham Institute, Norwich, UK
- Simon Labarthe
  - Inria, University of Bordeaux, INRAE, Talence, France
  - INRAE, University of Bordeaux, BIOGECO, UMR 1202, Cestas, France
|
18
|
Kang X, Xu J, Luo X, Schönhuth A. Hybrid-hybrid correction of errors in long reads with HERO. Genome Biol 2023; 24:275. [PMID: 38041098 PMCID: PMC10690975 DOI: 10.1186/s13059-023-03112-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 11/16/2023] [Indexed: 12/03/2023] Open
Abstract
Although generally superior, hybrid approaches for correcting errors in third-generation sequencing (TGS) reads, using next-generation sequencing (NGS) reads, mistake haplotype-specific variants for errors in polyploid and mixed samples. We propose HERO, the first "hybrid-hybrid" approach, which uses both de Bruijn graphs and overlap graphs to cater optimally to the particular strengths of NGS and TGS reads. Extensive benchmarking experiments demonstrate that HERO improves indel and mismatch error rates by an average of 65% (27 to 95%) and 20% (4 to 61%), respectively. Using HERO prior to genome assembly significantly improves the assemblies in the majority of the relevant categories.
Affiliation(s)
- Xiongbin Kang
  - College of Biology, Hunan University, Changsha, China
  - Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Jialu Xu
  - College of Biology, Hunan University, Changsha, China
- Xiao Luo
  - College of Biology, Hunan University, Changsha, China
- Alexander Schönhuth
  - Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
|
19
|
Heinken A, Hulshof TO, Nap B, Martinelli F, Basile A, O'Brolchain A, O'Sullivan NF, Gallagher C, Magee E, McDonagh F, Lalor I, Bergin M, Evans P, Daly R, Farrell R, Delaney RM, Hill S, McAuliffe SR, Kilgannon T, Fleming RM, Thinnes CC, Thiele I. APOLLO: A genome-scale metabolic reconstruction resource of 247,092 diverse human microbes spanning multiple continents, age groups, and body sites. bioRxiv 2023:2023.10.02.560573. [PMID: 37873072 PMCID: PMC10592896 DOI: 10.1101/2023.10.02.560573] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 10/25/2023]
Abstract
Computational modelling of microbiome metabolism has proved instrumental in catalysing our understanding of diet-host-microbiome-disease interactions through the interrogation of mechanistic, strain- and molecule-resolved metabolic models. We present APOLLO, a resource of 247,092 human microbial genome-scale metabolic reconstructions spanning 19 phyla and accounting for microbial genomes from 34 countries, all age groups, and five body sites. We explored the metabolic potential of the reconstructed strains and developed a machine learning classifier able to predict with high accuracy the taxonomic strain assignments. We also built 14,451 sample-specific microbial community models, which could be stratified by body site, age, and disease states. Finally, we predicted faecal metabolites enriched or depleted in gut microbiomes of people with Crohn's disease, Parkinson disease, and undernourished children. APOLLO is compatible with the human whole-body models and thus provides unprecedented opportunities for systems-level modelling of personalised host-microbiome co-metabolism. APOLLO will be freely available under https://www.vmh.life/.
Affiliation(s)
- Almut Heinken
  - School of Medicine, University of Galway, Galway, Ireland
  - Ryan Institute, University of Galway, Galway, Ireland
  - Inserm UMRS 1256 NGERE, University of Lorraine, Nancy, France
- Timothy Otto Hulshof
  - School of Medicine, University of Galway, Galway, Ireland
  - Ryan Institute, University of Galway, Galway, Ireland
- Bram Nap
  - School of Medicine, University of Galway, Galway, Ireland
  - Ryan Institute, University of Galway, Galway, Ireland
- Filippo Martinelli
  - School of Medicine, University of Galway, Galway, Ireland
  - Ryan Institute, University of Galway, Galway, Ireland
- Arianna Basile
  - School of Medicine, University of Galway, Galway, Ireland
  - Department of Biology, University of Padova, Padova, Italy
- Ian Lalor
  - University of Galway, Galway, Ireland
- Cyrille C. Thinnes
  - School of Medicine, University of Galway, Galway, Ireland
  - Ryan Institute, University of Galway, Galway, Ireland
- Ines Thiele
  - School of Medicine, University of Galway, Galway, Ireland
  - Ryan Institute, University of Galway, Galway, Ireland
  - Division of Microbiology, University of Galway, Galway, Ireland
  - APC Microbiome Ireland, Cork, Ireland
|
20
|
Breusing C, Xiao Y, Russell SL, Corbett-Detig RB, Li S, Sun J, Chen C, Lan Y, Qian PY, Beinart RA. Ecological differences among hydrothermal vent symbioses may drive contrasting patterns of symbiont population differentiation. mSystems 2023; 8:e0028423. [PMID: 37493648 PMCID: PMC10469979 DOI: 10.1128/msystems.00284-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Accepted: 06/13/2023] [Indexed: 07/27/2023] Open
Abstract
The intra-host composition of horizontally transmitted microbial symbionts can vary across host populations due to interactive effects of host genetics, environmental, and geographic factors. While adaptation to local habitat conditions can drive geographic subdivision of symbiont strains, it is unknown how differences in ecological characteristics among host-symbiont associations influence the genomic structure of symbiont populations. To address this question, we sequenced metagenomes of different populations of the deep-sea mussel Bathymodiolus septemdierum, which are common at Western Pacific deep-sea hydrothermal vents and show characteristic patterns of niche partitioning with sympatric gastropod symbioses. Bathymodiolus septemdierum lives in close symbiotic relationship with sulfur-oxidizing chemosynthetic bacteria but supplements its symbiotrophic diet through filter-feeding, enabling it to occupy ecological niches with little exposure to geochemical reductants. Our analyses indicate that symbiont populations associated with B. septemdierum show structuring by geographic location, but that the dominant symbiont strain is uncorrelated with vent site. These patterns are in contrast to co-occurring Alviniconcha and Ifremeria gastropod symbioses that exhibit greater symbiont nutritional dependence and occupy habitats with higher spatial variability in environmental conditions. Our results suggest that relative habitat homogeneity combined with sufficient symbiont dispersal and genomic mixing might promote persistence of similar symbiont strains across geographic locations, while mixotrophy might decrease selective pressures on the host to affiliate with locally adapted symbiont strains. Overall, these data contribute to our understanding of the potential mechanisms influencing symbiont population structure across a spectrum of marine microbial symbioses that occupy contrasting ecological niches. 
IMPORTANCE: Beneficial relationships between animals and microbial organisms (symbionts) are ubiquitous in nature. In the ocean, microbial symbionts are typically acquired from the environment and their composition across geographic locations is often shaped by adaptation to local habitat conditions. However, it is currently unknown how generalizable these patterns are across symbiotic systems that have contrasting ecological characteristics. To address this question, we compared symbiont population structure between deep-sea hydrothermal vent mussels and co-occurring but ecologically distinct snail species. Our analyses show that mussel symbiont populations are less partitioned by geography and do not demonstrate evidence for environmental adaptation. We posit that the mussel's mixotrophic feeding mode may lower its need to affiliate with locally adapted symbiont strains, while microhabitat stability and symbiont genomic mixing likely favor persistence of symbiont strains across geographic locations. Altogether, these findings further our understanding of the mechanisms shaping symbiont population structure in marine environmentally transmitted symbioses.
Affiliation(s)
- Corinna Breusing
  - Graduate School of Oceanography, University of Rhode Island, Narragansett, Rhode Island, USA
- Yao Xiao
  - Department of Ocean Science, Division of Life Science and Hong Kong Branch of the Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), The Hong Kong University of Science and Technology, Hong Kong, China
  - The Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Nansha, Guangzhou, China
- Shelbi L. Russell
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, USA
- Russell B. Corbett-Detig
  - Department of Biomolecular Engineering, University of California Santa Cruz, Santa Cruz, California, USA
- Sixuan Li
  - Graduate School of Oceanography, University of Rhode Island, Narragansett, Rhode Island, USA
- Jin Sun
  - Institute of Evolution & Marine Biodiversity, Ocean University of China, Qingdao, China
- Chong Chen
  - X-STAR, Japan Agency for Marine-Earth Science and Technology (JAMSTEC), Yokosuka, Japan
- Yi Lan
  - Department of Ocean Science, Division of Life Science and Hong Kong Branch of the Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), The Hong Kong University of Science and Technology, Hong Kong, China
  - The Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Nansha, Guangzhou, China
- Pei-Yuan Qian
  - Department of Ocean Science, Division of Life Science and Hong Kong Branch of the Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), The Hong Kong University of Science and Technology, Hong Kong, China
  - The Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Nansha, Guangzhou, China
- Roxanne A. Beinart
  - Graduate School of Oceanography, University of Rhode Island, Narragansett, Rhode Island, USA
|
21
|
Liao H, Ji Y, Sun Y. High-resolution strain-level microbiome composition analysis from short reads. Microbiome 2023; 11:183. [PMID: 37587527 PMCID: PMC10433603 DOI: 10.1186/s40168-023-01615-w] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 08/24/2022] [Accepted: 07/07/2023] [Indexed: 08/18/2023]
Abstract
BACKGROUND: Bacterial strains under the same species can exhibit different biological properties, making strain-level composition analysis an important step in understanding the dynamics of microbial communities. Metagenomic sequencing has become the major means for probing the microbial composition in host-associated or environmental samples. Although there is a plethora of composition analysis tools, they are not optimized to address the challenges in strain-level analysis: highly similar strain genomes and the presence of multiple strains under one species in a sample. Thus, this work aims to provide a high-resolution and more accurate strain-level analysis tool for short reads. RESULTS: In this work, we present a new strain-level composition analysis tool named StrainScan that employs a novel tree-based k-mer indexing structure to strike a balance between the strain identification accuracy and the computational complexity. We tested StrainScan extensively on a large number of simulated and real sequencing data and benchmarked StrainScan against popular strain-level analysis tools including Krakenuniq, StrainSeeker, Pathoscope2, Sigma, StrainGE, and StrainEst. The results show that StrainScan has higher accuracy and resolution than the state-of-the-art tools on strain-level composition analysis. It improves the F1 score by 20% in identifying multiple strains at the strain level. CONCLUSIONS: By using a novel k-mer indexing structure, StrainScan is able to provide strain-level analysis with higher resolution than existing tools, enabling it to return more informative strain composition analysis in one sample or across multiple samples. StrainScan takes short reads and a set of reference strains as input, and its source code is freely available at https://github.com/liaoherui/StrainScan.
Affiliation(s)
- Herui Liao
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, China
- Yongxin Ji
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, China
- Yanni Sun
  - Department of Electrical Engineering, City University of Hong Kong, Kowloon, China

22
Abstract
Techniques by which to genetically manipulate members of the microbiota enable both the evaluation of host-microbe interactions and an avenue by which to monitor and modulate human physiology. Genetic engineering applications have traditionally focused on model gut residents, such as Escherichia coli and lactic acid bacteria. However, emerging efforts by which to develop synthetic biology toolsets for "nonmodel" resident gut microbes could provide an improved foundation for microbiome engineering. As genome engineering tools come online, so too have novel applications for engineered gut microbes. Engineered resident gut bacteria facilitate investigations of the roles of microbes and their metabolites on host health and allow for potential live microbial biotherapeutics. Due to the rapid pace of discovery in this burgeoning field, this minireview highlights advancements in the genetic engineering of all resident gut microbes.
Affiliation(s)
- Jack Arnold
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois, USA
- Joshua Glazier
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois, USA
- Mark Mimee
  - Pritzker School of Molecular Engineering, University of Chicago, Chicago, Illinois, USA
  - Department of Microbiology, University of Chicago, Chicago, Illinois, USA

23
Feehan B, Ran Q, Dorman V, Rumback K, Pogranichniy S, Ward K, Goodband R, Niederwerder MC, Lee STM. Novel complete methanogenic pathways in longitudinal genomic study of monogastric age-associated archaea. Anim Microbiome 2023; 5:35. [PMID: 37461084 DOI: 10.1186/s42523-023-00256-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/16/2023] [Accepted: 07/11/2023] [Indexed: 07/20/2023] Open
Abstract
BACKGROUND Archaea perform critical roles in the microbiome system, including utilizing hydrogen to allow for enhanced microbiome member growth and influencing overall host health. With the majority of microbiome research focusing on bacteria, the functions of archaea are largely still under investigation. Understanding methanogenic functions during the host lifetime will add to the limited knowledge on archaeal influence on gut and host health. In our study, we determined lifelong archaea dynamics, including detection and methanogenic functions, while assessing global, temporal and host distribution of our novel archaeal metagenome-assembled genomes (MAGs). We followed 7 monogastric swine throughout their life, from birth to adult (1-156 days of age), and collected feces at 22 time points. The samples underwent gDNA extraction, Illumina sequencing, bioinformatic quality and assembly processes, MAG taxonomic assignment and functional annotation. MAGs were utilized in downstream phylogenetic analysis for global, temporal and host distribution in addition to methanogenic functional potential determination. RESULTS We generated 1130 non-redundant MAGs, representing 588 unique taxa at the species level, with 8 classified as methanogenic archaea. The taxonomic classifications were as follows: orders Methanomassiliicoccales (5) and Methanobacteriales (3); genera UBA71 (3), Methanomethylophilus (1), MX-02 (1), and Methanobrevibacter (3). We recovered the first US swine Methanobrevibacter UBA71 sp006954425 and Methanobrevibacter gottschalkii MAGs. The Methanobacteriales MAGs were identified primarily during the young, preweaned host whereas Methanomassiliicoccales primarily in the adult host. Moreover, we identified our methanogens in metagenomic sequences from Chinese swine, US adult humans, Mexican adult humans, Swedish adult humans, and paleontological humans, indicating that methanogens span different hosts, geography and time. 
We determined complete metabolic pathways for all three methanogenic pathways: hydrogenotrophic, methylotrophic, and acetoclastic. This study provided the first evidence of acetoclastic methanogenesis in archaea of monogastric hosts which indicated a previously unknown capability for acetate utilization in methanogenesis for monogastric methanogens. Overall, we hypothesized that the age-associated detection patterns were due to differential substrate availability via the host diet and microbial metabolism, and that these methanogenic functions are likely crucial to methanogens across hosts. This study provided a comprehensive, genome-centric investigation of monogastric-associated methanogens which will further improve our understanding of microbiome development and functions.
Affiliation(s)
- Brandi Feehan
  - Division of Biology, College of Arts and Sciences, Kansas State University, Manhattan, KS, 66506, USA
- Qinghong Ran
  - Division of Biology, College of Arts and Sciences, Kansas State University, Manhattan, KS, 66506, USA
- Victoria Dorman
  - Division of Biology, College of Arts and Sciences, Kansas State University, Manhattan, KS, 66506, USA
- Kourtney Rumback
  - Division of Biology, College of Arts and Sciences, Kansas State University, Manhattan, KS, 66506, USA
- Sophia Pogranichniy
  - Division of Biology, College of Arts and Sciences, Kansas State University, Manhattan, KS, 66506, USA
- Kaitlyn Ward
  - Division of Biology, College of Arts and Sciences, Kansas State University, Manhattan, KS, 66506, USA
- Robert Goodband
  - Department of Animal Sciences and Industry, College of Agriculture, Kansas State University, Manhattan, KS, 66506, USA
- Sonny T M Lee
  - Division of Biology, College of Arts and Sciences, Kansas State University, Manhattan, KS, 66506, USA

24
Watson AR, Füssel J, Veseli I, DeLongchamp JZ, Silva M, Trigodet F, Lolans K, Shaiber A, Fogarty E, Runde JM, Quince C, Yu MK, Söylev A, Morrison HG, Lee STM, Kao D, Rubin DT, Jabri B, Louie T, Eren AM. Metabolic independence drives gut microbial colonization and resilience in health and disease. Genome Biol 2023; 24:78. [PMID: 37069665 PMCID: PMC10108530 DOI: 10.1186/s13059-023-02924-x] [Citation(s) in RCA: 35] [Impact Index Per Article: 17.5] [Received: 07/03/2022] [Accepted: 04/07/2023] [Indexed: 04/19/2023] Open
Abstract
BACKGROUND Changes in microbial community composition as a function of human health and disease states have sparked remarkable interest in the human gut microbiome. However, establishing reproducible insights into the determinants of microbial succession in disease has been a formidable challenge. RESULTS Here we use fecal microbiota transplantation (FMT) as an in natura experimental model to investigate the association between metabolic independence and resilience in stressed gut environments. Our genome-resolved metagenomics survey suggests that FMT serves as an environmental filter that favors populations with higher metabolic independence, the genomes of which encode complete metabolic modules to synthesize critical metabolites, including amino acids, nucleotides, and vitamins. Interestingly, we observe higher completion of the same biosynthetic pathways in microbes enriched in IBD patients. CONCLUSIONS These observations suggest a general mechanism that underlies changes in diversity in perturbed gut environments and reveal taxon-independent markers of "dysbiosis" that may explain why widespread yet typically low-abundance members of healthy gut microbiomes can dominate under inflammatory conditions without any causal association with disease.
Affiliation(s)
- Andrea R Watson
  - Department of Medicine, The University of Chicago, Chicago, IL, 60637, USA
  - Committee On Microbiology, The University of Chicago, Chicago, IL, 60637, USA
- Jessika Füssel
  - Department of Medicine, The University of Chicago, Chicago, IL, 60637, USA
  - Institute for Chemistry and Biology of the Marine Environment, University of Oldenburg, 26129, Oldenburg, Germany
- Iva Veseli
  - Biophysical Sciences Program, The University of Chicago, Chicago, IL, 60637, USA
- Marisela Silva
  - Department of Medicine, The University of Calgary, Calgary, AB, T2N 1N4, Canada
- Florian Trigodet
  - Department of Medicine, The University of Chicago, Chicago, IL, 60637, USA
- Karen Lolans
  - Department of Medicine, The University of Chicago, Chicago, IL, 60637, USA
- Alon Shaiber
  - Biophysical Sciences Program, The University of Chicago, Chicago, IL, 60637, USA
- Emily Fogarty
  - Department of Medicine, The University of Chicago, Chicago, IL, 60637, USA
  - Committee On Microbiology, The University of Chicago, Chicago, IL, 60637, USA
- Joseph M Runde
  - Department of Pediatrics, Lurie Children's Hospital of Chicago, Chicago, IL, 60611, USA
- Christopher Quince
  - Organisms and Ecosystems, Earlham Institute, Norwich, NR4 7UZ, UK
  - Gut Microbes and Health, Quadram Institute, Norwich, NR4 7UQ, UK
- Michael K Yu
  - Toyota Technological Institute at Chicago, Chicago, IL, 60637, USA
- Arda Söylev
  - Department of Computer Engineering, Konya Food and Agriculture University, Konya, Turkey
- Hilary G Morrison
  - Marine Biological Laboratory, Josephine Bay Paul Center, Woods Hole, Falmouth, MA, 02543, USA
- Sonny T M Lee
  - Department of Medicine, The University of Chicago, Chicago, IL, 60637, USA
- Dina Kao
  - Department of Medicine, University of Alberta, Edmonton, AB, T6G 2G3, Canada
- David T Rubin
  - Department of Medicine, The University of Chicago, Chicago, IL, 60637, USA
- Bana Jabri
  - Department of Medicine, The University of Chicago, Chicago, IL, 60637, USA
- Thomas Louie
  - Department of Medicine, The University of Calgary, Calgary, AB, T2N 1N4, Canada
- A Murat Eren
  - Department of Medicine, The University of Chicago, Chicago, IL, 60637, USA
  - Committee On Microbiology, The University of Chicago, Chicago, IL, 60637, USA
  - Institute for Chemistry and Biology of the Marine Environment, University of Oldenburg, 26129, Oldenburg, Germany
  - Marine Biological Laboratory, Josephine Bay Paul Center, Woods Hole, Falmouth, MA, 02543, USA
  - Helmholtz Institute for Functional Marine Biodiversity, 26129, Oldenburg, Germany

25
Herold M, Hock L, Penny C, Walczak C, Djabi F, Cauchie HM, Ragimbeau C. Metagenomic Strain-Typing Combined with Isolate Sequencing Provides Increased Resolution of the Genetic Diversity of Campylobacter jejuni Carriage in Wild Birds. Microorganisms 2023; 11:121. [PMID: 36677413 PMCID: PMC9860660 DOI: 10.3390/microorganisms11010121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2022] [Revised: 12/28/2022] [Accepted: 12/29/2022] [Indexed: 01/05/2023] Open
Abstract
As the world's leading cause of human gastro-enteritis, the food- and waterborne pathogen Campylobacter needs to be intensively monitored through a One Health approach. Particularly, wild birds have been hypothesized to contribute to the spread of human clinical recurring C. jejuni genotypes across several countries. A major concern in studying epidemiological dynamics is resolving the large genomic diversity of strains circulating in the environment and various reservoirs, challenging to achieve with isolation techniques. Here, we applied a passive-filtration method to obtain isolates and in parallel recovered genotypes from metagenomic sequencing data from associated filter sweeps. For genotyping mixed strains, a reference-based computational workflow to predict allelic profiles of nine extended-MLST loci was utilized. We validated the pipeline by sequencing artificial mixtures of C. jejuni strains and observed the highest prediction accuracy when including obtained isolates as references. By analyzing metagenomic samples, we were able to detect over 20% additional genetic diversity and observed an over 50% increase in the potential to connect genotypes across wild-bird samples. With an optimized filtration method and a computational approach for genotyping strain mixtures, we provide the foundation for future studies assessing C. jejuni diversity in environmental and clinical settings at improved throughput and resolution.
Affiliation(s)
- Malte Herold
  - Environmental Research and Innovation (ERIN) Department, Luxembourg Institute of Science and Technology (LIST), 41 rue du Brill, L-4422 Belvaux, Luxembourg
  - Epidemiology and Microbial Genomics, Laboratoire National de Santé (LNS), 1 rue Louis Rech, L-3555 Dudelange, Luxembourg
- Louise Hock
  - Environmental Research and Innovation (ERIN) Department, Luxembourg Institute of Science and Technology (LIST), 41 rue du Brill, L-4422 Belvaux, Luxembourg
- Christian Penny
  - Environmental Research and Innovation (ERIN) Department, Luxembourg Institute of Science and Technology (LIST), 41 rue du Brill, L-4422 Belvaux, Luxembourg
- Cécile Walczak
  - Environmental Research and Innovation (ERIN) Department, Luxembourg Institute of Science and Technology (LIST), 41 rue du Brill, L-4422 Belvaux, Luxembourg
- Fatu Djabi
  - Epidemiology and Microbial Genomics, Laboratoire National de Santé (LNS), 1 rue Louis Rech, L-3555 Dudelange, Luxembourg
- Henry-Michel Cauchie
  - Environmental Research and Innovation (ERIN) Department, Luxembourg Institute of Science and Technology (LIST), 41 rue du Brill, L-4422 Belvaux, Luxembourg
- Catherine Ragimbeau
  - Epidemiology and Microbial Genomics, Laboratoire National de Santé (LNS), 1 rue Louis Rech, L-3555 Dudelange, Luxembourg

26
Ma S, Li H. Statistical and Computational Methods for Microbial Strain Analysis. Methods Mol Biol 2023; 2629:231-245. [PMID: 36929080 DOI: 10.1007/978-1-0716-2986-4_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/17/2023]
Abstract
Microbial strains are interpreted as lineages derived from a recent ancestor that have not experienced "too many" recombination events and can be successfully retrieved with culture-independent techniques using metagenomic sequencing. Such strain variability has been increasingly shown to display additional phenotypic heterogeneity that affects host health, such as virulence, transmissibility, and antibiotic resistance. New statistical and computational methods have recently been developed to track strains in samples from shotgun metagenomics data, based either on reference genome sequences or on metagenome-assembled genomes (MAGs). In this paper, we review some recent statistical methods for strain identification based on frequency counts at a set of single nucleotide variants (SNVs) within a set of single-copy marker genes. These methods differ in whether reference genome sequences are needed, how SNVs are called, what deconvolution methods are used, and whether the methods can be applied to multiple samples. We conclude our review with areas that require further research.
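To make the idea of deconvolving strains from SNV frequency counts concrete, here is a minimal Python sketch. It is not taken from any of the reviewed methods. It assumes exactly two strains whose genotypes at each SNV are already known (1 = alternate allele, 0 = reference allele), a binomial read-count model with a fixed error rate, and a simple grid search for the proportion of the first strain. The function names, the `error` parameter, and the toy read counts are hypothetical.

```python
import math


def expected_alt_freq(genotypes, proportions, error=0.01):
    """Expected alternate-allele frequency at one site for a strain mixture."""
    f = sum(g * p for g, p in zip(genotypes, proportions))
    # Pull the frequency slightly away from 0 and 1 to model sequencing
    # error and to keep the logarithms below finite.
    return f * (1 - 2 * error) + error


def log_likelihood(p_first, sites, error=0.01):
    """Binomial log-likelihood (up to a constant) of all sites."""
    proportions = (p_first, 1.0 - p_first)
    total = 0.0
    for genotypes, alt, depth in sites:
        f = expected_alt_freq(genotypes, proportions, error)
        total += alt * math.log(f) + (depth - alt) * math.log(1.0 - f)
    return total


def estimate_proportion(sites, steps=100):
    """Grid-search maximum-likelihood estimate of the first strain's share."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda p: log_likelihood(p, sites))


# Toy data (made up): (genotypes of strain A and B, alt reads, total depth).
sites = [
    ((1, 0), 70, 100),  # only strain A carries the alternate allele
    ((0, 1), 31, 100),  # only strain B carries it
    ((1, 1), 99, 100),  # both carry it: uninformative about the mixture
    ((1, 0), 36, 50),
    ((0, 0), 1, 80),    # neither carries it: alt reads are errors
]
estimate = estimate_proportion(sites)
print(f"estimated share of strain A: {estimate:.2f}")
```

Only sites where the two genotypes differ carry information about the mixing proportion. Real methods additionally have to infer the genotypes themselves, handle more than two strains, and pool information across samples, none of which this sketch attempts.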
Affiliation(s)
- Siyuan Ma
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA
- Hongzhe Li
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA

27
Genome-Centric Dynamics Shape the Diversity of Oral Bacterial Populations. mBio 2022; 13:e0241422. [PMID: 36214570 PMCID: PMC9765137 DOI: 10.1128/mbio.02414-22] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
Two major viewpoints have been put forward for how microbial populations change, differing in whether adaptation is driven principally by gene-centric or genome-centric processes. Longitudinal sampling at microbially relevant timescales, i.e., days to weeks, is critical for distinguishing these mechanisms. Because of its significance for both microbial ecology and human health and its accessibility and high level of curation, we used the oral microbiota to study bacterial intrapopulation genome dynamics. Metagenomes were generated by shotgun sequencing of total community DNA from the healthy tongues of 17 volunteers at four to seven time points obtained over intervals of days to weeks. We obtained 390 high-quality metagenome-assembled genomes (MAGs) defining population genomes from 55 genera. The vast majority of genes in each MAG were tightly linked over the 2-week sampling window, indicating that the majority of the population's genomes were temporally stable at the MAG level. MAG-defined populations were composed of up to 5 strains, as determined by single-nucleotide-variant frequencies. Although most were stable over time, individual strains carrying over 100 distinct genes that rose from low abundance to dominance in a population over a period of days were detected. These results indicate a genome-wide as opposed to a gene-level process of population change. We infer that genome-wide selection of ecotypes is the dominant mode of adaptation in the oral populations over short timescales. IMPORTANCE The oral microbiome represents a microbial community of critical relevance to human health. Recent studies have documented the diversity and dynamics of different bacteria to reveal a rich, stable ecosystem characterized by strain-level dynamics. However, bacterial populations and their genomes are neither monolithic nor static; their genomes are constantly evolving to lose, gain, or alter their functional potential. 
To better understand how microbial genomes change in complex communities, we used culture-independent approaches to reconstruct the genomes (MAGs) for bacterial populations that approximated different species, in 17 healthy donors' mouths over a 2-week window. Our results underscored the importance of strain-level dynamics, which agrees with and expands on the conclusions of previous research. Altogether, these observations reveal patterns of genomic dynamics among strains of oral bacteria occurring over a matter of days.
28
Ide K, Nishikawa Y, Maruyama T, Tsukada Y, Kogawa M, Takeda H, Ito H, Wagatsuma R, Miyaoka R, Nakano Y, Kinjo K, Ito M, Hosokawa M, Yura K, Suda S, Takeyama H. Targeted single-cell genomics reveals novel host adaptation strategies of the symbiotic bacteria Endozoicomonas in Acropora tenuis coral. Microbiome 2022; 10:220. [PMID: 36503599 PMCID: PMC9743535 DOI: 10.1186/s40168-022-01395-9] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 05/11/2022] [Accepted: 10/13/2022] [Indexed: 06/17/2023]
Abstract
BACKGROUND The symbiosis of Endozoicomonas bacteria with various marine organisms is hypothesized to be a potential indicator of health in corals. Although many amplicon analyses using the 16S rRNA gene have suggested the diversity of Endozoicomonas species, genome analysis has been limited due to contamination by host-derived sequences and difficulties in culture and metagenomic analysis. Therefore, the evolutionary and functional potential of individual Endozoicomonas species symbiotic with the same coral species remains unresolved. RESULTS In this study, we applied a novel single-cell genomics technique using droplet microfluidics to obtain single-cell amplified genomes (SAGs) for uncultured coral-associated Endozoicomonas spp. We obtained seven novel Endozoicomonas genomes and quantitative bacterial composition from Acropora tenuis corals at four sites in Japan. Our quantitative 16S rRNA gene and comparative genomic analysis revealed that these Endozoicomonas spp. belong to different lineages (Clade A and Clade B), with widely varying abundance among individual corals. Furthermore, each Endozoicomonas species possessed various eukaryotic-like genes among its clade-specific genes. These eukaryotic-like genes might confer different functions in each clade, such as infection of the host coral or suppression of host immune pathways. These Endozoicomonas species may have adopted different host adaptation strategies despite living symbiotically on the same coral. CONCLUSIONS This study suggests that coral-associated Endozoicomonas spp. on the same species of coral have different evolutionary strategies and functional potentials in each species and emphasizes the need to analyze the genome of each uncultured strain in future studies of coral-Endozoicomonas relationships.
Affiliation(s)
- Keigo Ide
  - Department of Life Science and Medical Bioscience, Waseda University, Tokyo, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
  - Research Organization for Nano and Life Innovation, Waseda University, Tokyo, Japan
- Yohei Nishikawa
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
  - Research Organization for Nano and Life Innovation, Waseda University, Tokyo, Japan
- Toru Maruyama
  - Department of Life Science and Medical Bioscience, Waseda University, Tokyo, Japan
- Yuko Tsukada
  - Department of Life Science and Medical Bioscience, Waseda University, Tokyo, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Masato Kogawa
  - Department of Life Science and Medical Bioscience, Waseda University, Tokyo, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
  - Research Organization for Nano and Life Innovation, Waseda University, Tokyo, Japan
- Hiroki Takeda
  - Department of Life Science and Medical Bioscience, Waseda University, Tokyo, Japan
- Haruka Ito
  - Department of Life Science and Medical Bioscience, Waseda University, Tokyo, Japan
- Ryota Wagatsuma
  - Department of Life Science and Medical Bioscience, Waseda University, Tokyo, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
- Rimi Miyaoka
  - Department of Life Science and Medical Bioscience, Waseda University, Tokyo, Japan
- Yoshikatsu Nakano
  - Tropical Biosphere Research Center, University of the Ryukyus, Okinawa, Japan
  - Marine Science Section, Research Support Division, Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan
- Michihiro Ito
  - Tropical Biosphere Research Center, University of the Ryukyus, Okinawa, Japan
- Masahito Hosokawa
  - Department of Life Science and Medical Bioscience, Waseda University, Tokyo, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
  - Research Organization for Nano and Life Innovation, Waseda University, Tokyo, Japan
  - Institute for Advanced Research of Biosystem Dynamics, Waseda Research Institute for Science and Engineering, Tokyo, Japan
- Kei Yura
  - Department of Life Science and Medical Bioscience, Waseda University, Tokyo, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
  - Research Organization for Nano and Life Innovation, Waseda University, Tokyo, Japan
  - Graduate School of Humanities and Sciences, Ochanomizu University, Tokyo, Japan
- Shoichiro Suda
  - Faculty of Science, University of the Ryukyus, Okinawa, Japan
- Haruko Takeyama
  - Department of Life Science and Medical Bioscience, Waseda University, Tokyo, Japan
  - Computational Bio Big-Data Open Innovation Laboratory (CBBD-OIL), National Institute of Advanced Industrial Science and Technology, Tokyo, Japan
  - Research Organization for Nano and Life Innovation, Waseda University, Tokyo, Japan
  - Institute for Advanced Research of Biosystem Dynamics, Waseda Research Institute for Science and Engineering, Tokyo, Japan

29
Caldwell R, Zhou W, Oh J. Strains to go: interactions of the skin microbiome beyond its species. Curr Opin Microbiol 2022; 70:102222. [PMID: 36242896 PMCID: PMC9701184 DOI: 10.1016/j.mib.2022.102222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2022] [Revised: 09/07/2022] [Accepted: 09/14/2022] [Indexed: 01/25/2023]
Abstract
An extraordinary biodiversity of bacteria, fungi, viruses, and even small multicellular eukaryota inhabit the human skin. Genomic innovations have accelerated characterization of this biodiversity both at a species as well as the subspecies, or strain level, which further imparts a tremendous genetic diversity to an individual's skin microbiome. In turn, these advances portend significant species- and strain-specificity in the skin microbiome's functional impact on cutaneous immunity, barrier integrity, aging, and other skin physiologic processes. Future advances in defining strain diversity, spatial distribution, and metabolic diversity for major skin species will be foundational for understanding the microbiome's essentiality to the skin ecosystem and for designing topical therapeutics that leverage or target the skin microbiome.
Affiliation(s)
- Ryan Caldwell
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States
- Wei Zhou
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States
- Julia Oh
  - The Jackson Laboratory for Genomic Medicine, Farmington, CT, United States

30
VeChat: correcting errors in long reads using variation graphs. Nat Commun 2022; 13:6657. [PMID: 36333324 PMCID: PMC9636371 DOI: 10.1038/s41467-022-34381-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/04/2022] [Accepted: 10/24/2022] [Indexed: 11/06/2022] Open
Abstract
Error correction is the canonical first step in long-read sequencing data analysis. Current self-correction methods, however, are affected by consensus-sequence-induced biases that mask true variants in lower-frequency haplotypes in mixed samples. Unlike consensus sequence templates, graph-based reference systems are not affected by such biases, so they do not mistakenly mask true variants as errors. We present VeChat as an approach to implement this idea: VeChat is based on variation graphs, a popular type of data structure for pangenome reference systems. Extensive benchmarking experiments demonstrate that long reads corrected by VeChat contain 4 to 15 times (Pacific Biosciences) and 1 to 10 times (Oxford Nanopore Technologies) fewer errors than reads corrected by state-of-the-art approaches. Further, using VeChat prior to long-read assembly significantly improves the haplotype awareness of the assemblies. VeChat is an easy-to-use open-source tool and publicly available at https://github.com/HaploKit/vechat.
31
Hu H, Tan Y, Li C, Chen J, Kou Y, Xu ZZ, Liu Y, Tan Y, Dai L. StrainPanDA: Linked reconstruction of strain composition and gene content profiles via pangenome-based decomposition of metagenomic data. iMeta 2022; 1:e41. [PMID: 38868710 PMCID: PMC10989911 DOI: 10.1002/imt2.41] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/15/2022] [Revised: 05/20/2022] [Accepted: 06/28/2022] [Indexed: 06/14/2024]
Abstract
Microbial strains of variable functional capacities coexist in microbiomes. Current bioinformatics methods of strain analysis cannot provide the direct linkage between strain composition and their gene contents from metagenomic data. Here we present Strain-level Pangenome Decomposition Analysis (StrainPanDA), a novel method that uses the pangenome coverage profile of multiple metagenomic samples to simultaneously reconstruct the composition and gene content variation of coexisting strains in microbial communities. We systematically validate the accuracy and robustness of StrainPanDA using synthetic data sets. To demonstrate the power of gene-centric strain profiling, we then apply StrainPanDA to analyze the gut microbiome samples of infants, as well as patients treated with fecal microbiota transplantation. We show that the linked reconstruction of strain composition and gene content profiles is critical for understanding the relationship between microbial adaptation and strain-specific functions (e.g., nutrient utilization and pathogenicity). Finally, StrainPanDA has minimal requirements for computing resources and can be scaled to process multiple species in a community in parallel. In short, StrainPanDA can be applied to metagenomic data sets to detect the association between molecular functions and microbial/host phenotypes to formulate testable hypotheses and gain novel biological insights at the strain or subspecies level.
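The abstract does not spell out the decomposition algorithm. Purely as an illustration, one generic way to split a genes-by-samples coverage matrix into a genes-by-strains content profile and a strains-by-samples abundance matrix is non-negative matrix factorization (NMF) with multiplicative updates. The sketch below assumes that approach, uses made-up toy data, and every function name in it is hypothetical; it should not be read as StrainPanDA's implementation.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def squared_error(v, w, h):
    """Sum of squared differences between V and the product W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Factor V (genes x samples) into W (genes x strains) and H (strains x samples)."""
    rng = random.Random(seed)
    n_genes, n_samples = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_genes)]
    h = [[rng.random() + 0.1 for _ in range(n_samples)] for _ in range(rank)]
    errors = []
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps)
              for j in range(n_samples)] for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps)
              for k in range(rank)] for i in range(n_genes)]
        errors.append(squared_error(v, w, h))
    return w, h, errors


# Toy data (made up): 5 genes, 2 strains, 4 samples.
gene_content = [[1, 1], [1, 0], [0, 1], [1, 0], [0, 1]]  # gene present per strain
abundance = [[8, 2, 5, 0], [1, 6, 5, 9]]                 # strain depth per sample
coverage = matmul(gene_content, abundance)               # observed gene coverage

w, h, errors = nmf(coverage, rank=2)
print(f"squared error after first and last iteration: {errors[0]:.3f}, {errors[-1]:.6f}")
```

The factors are only determined up to scaling and ordering of the strain columns, so a practical method needs extra steps (for example normalizing against core genes and thresholding `w` into presence or absence) before the result can be read as gene content per strain.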
Affiliation(s)
- Han Hu
  - CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  - Bioinformatics Department, Xbiome, Scientific Research Building, Tsinghua High-Tech Park, Shenzhen, China
- Yuxiang Tan
  - CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Chenhao Li
  - Center for Computational and Integrative Biology, Massachusetts General Hospital and Harvard Medical School, Richard B. Simches Research Center, Boston, Massachusetts, USA
- Junyu Chen
  - CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yan Kou
  - Bioinformatics Department, Xbiome, Scientific Research Building, Tsinghua High-Tech Park, Shenzhen, China
- Zhenjiang Zech Xu
  - Department of Food Science and Technology, State Key Laboratory of Food Science and Technology, Nanchang University, Nanchang, China
- Yang-Yu Liu
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts, USA
- Yan Tan
  - Bioinformatics Department, Xbiome, Scientific Research Building, Tsinghua High-Tech Park, Shenzhen, China
- Lei Dai
  - CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China

32
A revisit to universal single-copy genes in bacterial genomes. Sci Rep 2022; 12:14550. [PMID: 36008577 PMCID: PMC9411617 DOI: 10.1038/s41598-022-18762-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/28/2022] [Accepted: 08/18/2022] [Indexed: 11/08/2022] Open
Abstract
Universal single-copy genes (USCGs) are widely used for species classification and taxonomic profiling. Despite many studies on USCGs, our understanding of USCGs in bacterial genomes might be out of date, especially how different the USCGs are in different studies, how well a set of USCGs can distinguish two bacterial species, whether USCGs can separate different strains of a bacterial species, to name a few. To fill the void, we studied USCGs in the most updated complete bacterial genomes. We showed that different USCG sets are quite different while coming from highly similar functional categories. We also found that although USCGs occur once in almost all bacterial genomes, each USCG does occur multiple times in certain genomes. We demonstrated that USCGs are reliable markers to distinguish different species while they cannot distinguish different strains of most bacterial species. Our study sheds new light on the usage and limitations of USCGs, which will facilitate their applications in evolutionary, phylogenomic, and metagenomic studies.
|
33
|
Abstract
The subseafloor is a vast habitat that supports microorganisms that have a global scale impact on geochemical cycles. Many of the endemic microbial communities inhabiting the subseafloor consist of small populations under growth-limited conditions. For small populations, stochastic evolutionary events can have large impacts on intraspecific population dynamics and allele frequencies. These conditions are fundamentally different from those experienced by most microorganisms in surface environments, and it is unknown how small population sizes and growth-limiting conditions influence evolution and population structure in the subsurface. Using a 2-year, high-resolution environmental time series, we examine the dynamics of microbial populations from cold, oxic crustal fluids collected from the subseafloor site North Pond, located near the mid-Atlantic ridge. Our results reveal rapid shifts in overall abundance, allele frequency, and strain abundance across the time points observed, with evidence for homologous recombination between coexisting lineages. We show that the subseafloor aquifer is a dynamic habitat that hosts microbial metapopulations that disperse frequently through the crustal fluids, enabling gene flow and recombination between microbial populations. The dynamism and stochasticity of microbial population dynamics in North Pond suggest that these forces are important drivers in the evolution of microbial populations in the vast subseafloor habitat.
|
34
|
Romero Picazo D, Werner A, Dagan T, Kupczok A. Pangenome Evolution in Environmentally Transmitted Symbionts of Deep-Sea Mussels Is Governed by Vertical Inheritance. Genome Biol Evol 2022; 14:evac098. [PMID: 35731940 PMCID: PMC9260185 DOI: 10.1093/gbe/evac098] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Accepted: 06/18/2022] [Indexed: 11/13/2022] Open
Abstract
Microbial pangenomes vary across species; their size and structure are determined by genetic diversity within the population and by gene loss and horizontal gene transfer (HGT). Many bacteria are associated with eukaryotic hosts where the host colonization dynamics may impact bacterial genome evolution. Host-associated lifestyle has been recognized as a barrier to HGT in parentally transmitted bacteria. However, pangenome evolution of environmentally acquired symbionts remains understudied, often due to limitations in symbiont cultivation. Using high-resolution metagenomics, here we study pangenome evolution of two co-occurring endosymbionts inhabiting Bathymodiolus brooksi mussels from a single cold seep. The symbionts, sulfur-oxidizing (SOX) and methane-oxidizing (MOX) gamma-proteobacteria, are environmentally acquired at an early developmental stage and individual mussels may harbor multiple strains of each symbiont species. We found differences in the accessory gene content of both symbionts across individual mussels, which are reflected by differences in symbiont strain composition. Compared with core genes, accessory genes are enriched in genome plasticity functions. We found no evidence for recent HGT between both symbionts. A comparison between the symbiont pangenomes revealed that the MOX population is less diverged and contains fewer accessory genes, supporting that the MOX association with B. brooksi is more recent in comparison to that of SOX. Our results show that the pangenomes of both symbionts evolved mainly by vertical inheritance. We conclude that genome evolution of environmentally transmitted symbionts that associate with individual hosts over their lifetime is affected by a narrow symbiosis where the frequency of HGT is constrained.
Affiliation(s)
- Devani Romero Picazo
- Genomic Microbiology Group, Institute of General Microbiology, Christian-Albrechts University, 24118 Kiel, Germany
| | - Almut Werner
- Genomic Microbiology Group, Institute of General Microbiology, Christian-Albrechts University, 24118 Kiel, Germany
| | - Tal Dagan
- Genomic Microbiology Group, Institute of General Microbiology, Christian-Albrechts University, 24118 Kiel, Germany
| | - Anne Kupczok
- Genomic Microbiology Group, Institute of General Microbiology, Christian-Albrechts University, 24118 Kiel, Germany
- Max Planck Institute for Marine Microbiology, 28359 Bremen, Germany
- Bioinformatics Group, Wageningen University & Research, 6708PB Wageningen, The Netherlands
| |
|
35
|
Kang X, Luo X, Schönhuth A. StrainXpress: strain aware metagenome assembly from short reads. Nucleic Acids Res 2022; 50:e101. [PMID: 35776122 PMCID: PMC9508831 DOI: 10.1093/nar/gkac543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/20/2021] [Revised: 05/27/2022] [Accepted: 06/30/2022] [Indexed: 12/05/2022] Open
Abstract
Next-generation sequencing–based metagenomics has made it possible to identify microorganisms in characteristic habitats without the need for lengthy cultivation. Importantly, clinically relevant phenomena such as resistance to medication, virulence or interactions with the environment can already vary within species. Therefore, a major current challenge is to reconstruct individual genomes from the sequencing reads at the level of strains, and not just the level of species. However, strains of one species may differ by only a small number of variants, which makes it difficult to distinguish them. Despite considerable recent progress, related approaches have remained fragmentary so far. Here, we present StrainXpress, as a comprehensive solution to the problem of strain aware metagenome assembly from next-generation sequencing reads. In experiments, StrainXpress reconstructs strain-specific genomes from metagenomes that involve up to >1000 strains and proves to successfully deal with poorly covered strains. The amount of reconstructed strain-specific sequence exceeds that of the current state-of-the-art approaches by on average 26.75% across all data sets (first quartile: 18.51%, median: 26.60%, third quartile: 35.05%).
Affiliation(s)
- Xiongbin Kang
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, 33615, Germany
| | - Xiao Luo
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, 33615, Germany
| | - Alexander Schönhuth
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, 33615, Germany
| |
|
36
|
Haryono MAS, Law YY, Arumugam K, Liew LCW, Nguyen TQN, Drautz-Moses DI, Schuster SC, Wuertz S, Williams RBH. Recovery of High Quality Metagenome-Assembled Genomes From Full-Scale Activated Sludge Microbial Communities in a Tropical Climate Using Longitudinal Metagenome Sampling. Front Microbiol 2022; 13:869135. [PMID: 35756038 PMCID: PMC9230771 DOI: 10.3389/fmicb.2022.869135] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/03/2022] [Accepted: 05/05/2022] [Indexed: 01/23/2023] Open
Abstract
The analysis of metagenome data based on the recovery of draft genomes (so-called metagenome-assembled genomes, or MAG) has assumed an increasingly central role in microbiome research in recent years. Microbial communities underpinning the operation of wastewater treatment plants are particularly challenging targets for MAG analysis due to their high ecological complexity, and remain important, albeit understudied, microbial communities that play a key role in mediating interactions between human and natural ecosystems. Here we consider strategies for recovery of MAG sequence from time series metagenome surveys of full-scale activated sludge microbial communities. We generate MAG catalogs from this set of data using several different strategies, including the use of multiple individual sample assemblies, two variations on multi-sample co-assembly and a recently published MAG recovery workflow using deep learning. We obtain a total of just under 9,100 draft genomes, which collapse to around 3,100 non-redundant genomic clusters. We examine the strengths and weaknesses of these approaches in relation to MAG yield and quality, showing that co-assembly may offer advantages over single-sample assembly in the case of metagenome data obtained from closely sampled longitudinal study designs. Around 1,000 MAGs were candidates for being considered high quality, based on single-copy marker gene occurrence statistics; however, only 58 MAGs formally meet the MIMAG criteria for being high quality draft genomes. These findings carry broader implications for performing genome-resolved metagenomics on highly complex communities, the design and implementation of genome recoverability strategies, MAG decontamination and the search for better binning methodology.
Affiliation(s)
- Mindia A S Haryono
- Singapore Centre for Environmental Life Sciences Engineering, National University of Singapore, Singapore, Singapore
| | - Ying Yu Law
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
| | - Krithika Arumugam
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
| | - Larry C-W Liew
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
| | - Thi Quynh Ngoc Nguyen
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
| | - Daniela I Drautz-Moses
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
| | - Stephan C Schuster
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
- School of Biological Sciences, Nanyang Technological University, Singapore, Singapore
| | - Stefan Wuertz
- Singapore Centre for Environmental Life Sciences Engineering, Nanyang Technological University, Singapore, Singapore
- School of Civil and Environmental Engineering, Nanyang Technological University, Singapore, Singapore
| | - Rohan B H Williams
- Singapore Centre for Environmental Life Sciences Engineering, National University of Singapore, Singapore, Singapore
| |
|
37
|
Goussarov G, Mysara M, Vandamme P, Van Houdt R. Introduction to the principles and methods underlying the recovery of metagenome-assembled genomes from metagenomic data. Microbiologyopen 2022; 11:e1298. [PMID: 35765182 PMCID: PMC9179125 DOI: 10.1002/mbo3.1298] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/23/2022] [Revised: 05/19/2022] [Accepted: 05/19/2022] [Indexed: 11/18/2022] Open
Abstract
The rise of metagenomics offers a leap forward for understanding the genetic diversity of microorganisms in many different complex environments by providing a platform that can identify potentially unlimited numbers of known and novel microorganisms. As such, it is impossible to imagine new major initiatives without metagenomics. Nevertheless, it represents a relatively new discipline with various levels of complexity and demands on bioinformatics. The underlying principles and methods used in metagenomics are often seen as common knowledge and often not detailed or fragmented. Therefore, we reviewed these to guide microbiologists in taking the first steps into metagenomics. We specifically focus on a workflow aimed at reconstructing individual genomes, that is, metagenome-assembled genomes, integrating DNA sequencing, assembly, binning, identification and annotation.
Affiliation(s)
- Gleb Goussarov
- Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
- Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
| | - Mohamed Mysara
- Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
| | - Peter Vandamme
- Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
| | - Rob Van Houdt
- Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
| |
|
38
|
Smith BJ, Li X, Shi ZJ, Abate A, Pollard KS. Scalable Microbial Strain Inference in Metagenomic Data Using StrainFacts. Frontiers in Bioinformatics 2022; 2:867386. [PMID: 36304283 PMCID: PMC9580935 DOI: 10.3389/fbinf.2022.867386] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Accepted: 04/14/2022] [Indexed: 11/25/2022] Open
Abstract
While genome databases are nearing a complete catalog of species commonly inhabiting the human gut, their representation of intraspecific diversity is lacking for all but the most abundant and frequently studied taxa. Statistical deconvolution of allele frequencies from shotgun metagenomic data into strain genotypes and relative abundances is a promising approach, but existing methods are limited by computational scalability. Here we introduce StrainFacts, a method for strain deconvolution that enables inference across tens of thousands of metagenomes. We harness a “fuzzy” genotype approximation that makes the underlying graphical model fully differentiable, unlike existing methods. This allows parameter estimates to be optimized with gradient-based methods, speeding up model fitting by two orders of magnitude. A GPU implementation provides additional scalability. Extensive simulations show that StrainFacts can perform strain inference on thousands of metagenomes and has comparable accuracy to more computationally intensive tools. We further validate our strain inferences using single-cell genomic sequencing from a human stool sample. Applying StrainFacts to a collection of more than 10,000 publicly available human stool metagenomes, we quantify patterns of strain diversity, biogeography, and linkage-disequilibrium that agree with and expand on what is known based on existing reference genomes. StrainFacts paves the way for large-scale biogeography and population genetic studies of microbiomes using metagenomic data.
Affiliation(s)
- Byron J. Smith
- The Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, United States
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA, United States
| | - Xiangpeng Li
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, United States
| | - Zhou Jason Shi
- The Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, United States
- Chan-Zuckerberg Biohub, San Francisco, CA, United States
| | - Adam Abate
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, San Francisco, CA, United States
- Chan-Zuckerberg Biohub, San Francisco, CA, United States
| | - Katherine S. Pollard
- The Gladstone Institute of Data Science and Biotechnology, San Francisco, CA, United States
- Department of Epidemiology and Biostatistics, University of California, San Francisco, San Francisco, CA, United States
- Chan-Zuckerberg Biohub, San Francisco, CA, United States
- *Correspondence: Katherine S. Pollard
| |
|
39
|
Luo X, Kang X, Schönhuth A. Enhancing Long-Read-Based Strain-Aware Metagenome Assembly. Front Genet 2022; 13:868280. [PMID: 35646097 PMCID: PMC9136235 DOI: 10.3389/fgene.2022.868280] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/02/2022] [Accepted: 04/01/2022] [Indexed: 11/18/2022] Open
Abstract
Microbial communities are usually highly diverse and often involve multiple strains from the participating species due to the rapid evolution of microorganisms. In such a complex microecosystem, different strains may show different biological functions. While reconstruction of individual genomes at the strain level is vital for accurately deciphering the composition of microbial communities, the problem has largely remained unresolved so far. Next-generation sequencing has been routinely used in metagenome assembly but there have been struggles to generate strain-specific genome sequences due to the short-read length. This explains why long-read sequencing technologies have recently provided unprecedented opportunities to carry out haplotype- or strain-resolved genome assembly. Here, we propose MetaBooster and MetaBooster-HiFi, as two pipelines for strain-aware metagenome assembly from PacBio CLR and Oxford Nanopore long-read sequencing data. Benchmarking experiments on both simulated and real sequencing data demonstrate that either the MetaBooster or the MetaBooster-HiFi pipeline drastically outperforms the state-of-the-art de novo metagenome assemblers, in terms of all relevant metagenome assembly criteria, involving genome fraction, contig length, and error rates.
Affiliation(s)
- Xiao Luo
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Life Science and Health, Centrum Wiskunde and Informatica, Amsterdam, Netherlands
| | - Xiongbin Kang
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
| | - Alexander Schönhuth
- Genome Data Science, Faculty of Technology, Bielefeld University, Bielefeld, Germany
- Life Science and Health, Centrum Wiskunde and Informatica, Amsterdam, Netherlands
| |
|
40
|
Kong HH, Oh J. State of Residency: Microbial Strain Diversity in the Skin. J Invest Dermatol 2022; 142:1260-1264. [PMID: 34688614 PMCID: PMC9021319 DOI: 10.1016/j.jid.2021.10.005] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/03/2021] [Revised: 10/04/2021] [Accepted: 10/09/2021] [Indexed: 12/31/2022]
Abstract
Human skin hosts a diversity of microbiota. Advances in sequencing and analytical methods have increasingly illuminated the importance of the finest resolution in understanding the genetic diversity of the skin microbiota, highlighting strain-level differences and their functional implications. Such genetic diversity, which exists within an individual and is strongly individual specific, underscores the difficulty in elucidating functionality. Integrated investigations of the microbial strain diversity through sequencing and culture-based approaches with host immunology and physiology will be critical in expanding our understanding of the roles of the skin microbiome.
Affiliation(s)
- Heidi H Kong
- National Institute of Arthritis and Musculoskeletal and Skin Diseases, National Institutes of Health, Bethesda, Maryland, USA
| | - Julia Oh
- The Jackson Laboratory for Genomic Medicine, Farmington, Connecticut, USA.
| |
|
41
|
Jin S, Wetzel D, Schirmer M. Deciphering mechanisms and implications of bacterial translocation in human health and disease. Curr Opin Microbiol 2022; 67:102147. [PMID: 35461008 DOI: 10.1016/j.mib.2022.102147] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Received: 11/23/2021] [Revised: 02/28/2022] [Accepted: 03/03/2022] [Indexed: 12/12/2022]
Abstract
Significant increases in potential microbial translocation, especially along the oral-gut axis, have been identified in many immune-related and inflammatory diseases, such as inflammatory bowel disease, colorectal cancer, rheumatoid arthritis, and liver cirrhosis, for which we currently have no cure or long-term treatment options. Recent advances in computational and experimental omics approaches now enable strain tracking, functional profiling, and strain isolation in unprecedented detail, which has the potential to elucidate the causes and consequences of microbial translocation. In this review, we discuss current evidence for the detection of bacterial translocation, examine different translocation axes with a primary focus on the oral-gut axis, and outline currently known translocation mechanisms and how they adversely affect the host in disease. Finally, we conclude with an overview of state-of-the-art computational and experimental tools for strain tracking and highlight the required next steps to elucidate the role of bacterial translocation in human health.
Affiliation(s)
- Shen Jin
- ZIEL - Institute for Food and Health, Technical University of Munich, Gregor-Mendel-Str. 2, 85354 Freising, Germany
| | - Daniela Wetzel
- ZIEL - Institute for Food and Health, Technical University of Munich, Gregor-Mendel-Str. 2, 85354 Freising, Germany
| | - Melanie Schirmer
- ZIEL - Institute for Food and Health, Technical University of Munich, Gregor-Mendel-Str. 2, 85354 Freising, Germany.
| |
|
42
|
Altermann E, Tegetmeyer HE, Chanyi RM. The evolution of bacterial genome assemblies - where do we need to go next? Microbiome Research Reports 2022; 1:15. [PMID: 38046358 PMCID: PMC10688829 DOI: 10.20517/mrr.2022.02] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/05/2022] [Revised: 03/08/2022] [Accepted: 03/24/2022] [Indexed: 12/05/2023]
Abstract
Genome sequencing has fundamentally changed our ability to decipher and understand the genetic blueprint of life and how it changes over time in response to environmental and evolutionary pressures. The pace of sequencing is still increasing in response to advances in technologies, paving the way from sequenced genes to genomes to metagenomes to metagenome-assembled genomes (MAGs). Our ability to interrogate increasingly complex microbial communities through metagenomes and MAGs is opening up a tantalizing future where we may be able to delve deeper into the mechanisms and genetic responses emerging over time. In the near future, we will be able to detect MAG assembly variations within strains originating from diverging sub-populations, and one of the emerging challenges will be to capture these variations in a biologically relevant way. Here, we present a brief overview of sequencing technologies and the current state of metagenome assemblies to suggest the need to develop new data formats that can capture the genetic variations within strains and communities, which previously remained invisible due to sequencing technology limitations.
Affiliation(s)
- Eric Altermann
- AgResearch Ltd., Private Bag 11008, Palmerston North 4410, New Zealand
- Riddet Institute, Massey University, Private Bag 11222, Palmerston North 4442, New Zealand
- Massey University, School of Veterinary Science, Palmerston North 4100, New Zealand
| | - Halina E. Tegetmeyer
- AgResearch Ltd., Private Bag 11008, Palmerston North 4410, New Zealand
- Center for Biotechnology, Bielefeld University, Universitaetsstrasse 27, Bielefeld 33615, Germany
| | - Ryan M. Chanyi
- AgResearch Ltd., Private Bag 11008, Palmerston North 4410, New Zealand
- Riddet Institute, Massey University, Private Bag 11222, Palmerston North 4442, New Zealand
| |
|
43
|
Strain identification and quantitative analysis in microbial communities. J Mol Biol 2022; 434:167582. [DOI: 10.1016/j.jmb.2022.167582] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/01/2022] [Revised: 03/31/2022] [Accepted: 04/03/2022] [Indexed: 12/14/2022]
|
44
|
Shi ZJ, Dimitrov B, Zhao C, Nayfach S, Pollard KS. Fast and accurate metagenotyping of the human gut microbiome with GT-Pro. Nat Biotechnol 2022; 40:507-516. [PMID: 34949778 DOI: 10.1038/s41587-021-01102-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 06/29/2020] [Accepted: 09/20/2021] [Indexed: 02/07/2023]
Abstract
Single nucleotide polymorphisms (SNPs) in metagenomics are used to quantify population structure, track strains and identify genetic determinants of microbial phenotypes. However, existing alignment-based approaches for metagenomic SNP detection require high-performance computing and enough read coverage to distinguish SNPs from sequencing errors. To address these issues, we developed the GenoTyper for Prokaryotes (GT-Pro), a suite of methods to catalog SNPs from genomes and use unique k-mers to rapidly genotype these SNPs from metagenomes. Compared to methods that use read alignment, GT-Pro is more accurate and two orders of magnitude faster. Using high-quality genomes, we constructed a catalog of 104 million SNPs in 909 human gut species and used unique k-mers targeting this catalog to characterize the global population structure of gut microbes from 7,459 samples. GT-Pro enables fast and memory-efficient metagenotyping of millions of SNPs on a personal computer.
Affiliation(s)
- Zhou Jason Shi
- Data Science, Chan Zuckerberg Biohub, San Francisco, CA, USA
- Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA
| | | | - Chunyu Zhao
- Data Science, Chan Zuckerberg Biohub, San Francisco, CA, USA
| | - Stephen Nayfach
- Department of Energy, Joint Genome Institute, Walnut Creek, CA, USA.
- Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.
| | - Katherine S Pollard
- Data Science, Chan Zuckerberg Biohub, San Francisco, CA, USA.
- Data Science and Biotechnology, Gladstone Institutes, San Francisco, CA, USA.
- Epidemiology and Biostatistics, University of California, San Francisco, CA, USA.
| |
|
45
|
Gregory AC, Gerhardt K, Zhong ZP, Bolduc B, Temperton B, Konstantinidis KT, Sullivan MB. MetaPop: a pipeline for macro- and microdiversity analyses and visualization of microbial and viral metagenome-derived populations. Microbiome 2022; 10:49. [PMID: 35287721 PMCID: PMC8922842 DOI: 10.1186/s40168-022-01231-0] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 11/14/2021] [Accepted: 11/29/2021] [Indexed: 05/08/2023]
Abstract
BACKGROUND: Microbes and their viruses are hidden engines driving Earth's ecosystems from the oceans and soils to humans and bioreactors. Though gene marker approaches can now be complemented by genome-resolved studies of inter-(macrodiversity) and intra-(microdiversity) population variation, analytical tools to do so remain scattered or under-developed. RESULTS: Here, we introduce MetaPop, an open-source bioinformatic pipeline that provides a single interface to analyze and visualize microbial and viral community metagenomes at both the macro- and microdiversity levels. Macrodiversity estimates include population abundances and α- and β-diversity. Microdiversity calculations include identification of single nucleotide polymorphisms, novel codon-constrained linkage of SNPs, nucleotide diversity (π and θ), and selective pressures (pN/pS and Tajima's D) within and fixation indices (FST) between populations. MetaPop will also identify genes with distinct codon usage. Following rigorous validation, we applied MetaPop to the gut viromes of autistic children that underwent fecal microbiota transfers and their neurotypical peers. The macrodiversity results confirmed our prior findings for viral populations (microbial shotgun metagenomes were not available) that diversity did not significantly differ between autistic and neurotypical children. However, by also quantifying microdiversity, MetaPop revealed lower average viral nucleotide diversity (π) in autistic children. Analysis of the percentage of genomes detected under positive selection was also lower among autistic children, suggesting that higher viral π in neurotypical children may be beneficial because it allows populations to better "bet hedge" in changing environments. Further, comparisons of microdiversity pre- and post-FMT in autistic children revealed that the delivery FMT method (oral versus rectal) may influence viral activity and engraftment of microdiverse viral populations, with children who received their FMT rectally having higher microdiversity post-FMT. Overall, these results show that analyses at the macro level alone can miss important biological differences. CONCLUSIONS: These findings suggest that standardized population and genetic variation analyses will be invaluable for maximizing biological inference, and MetaPop provides a convenient tool package to explore the dual impact of macro- and microdiversity across microbial communities.
Affiliation(s)
- Ann C Gregory
- Department of Microbiology, Ohio State University, Columbus, OH, 43210, USA
- Present Address: Department of Microbiology and Immunology, Rega Institute for Medical Research, VIB-KU Leuven Center for Microbiology, Leuven, Belgium
| | - Kenji Gerhardt
- Department of Microbiology, Ohio State University, Columbus, OH, 43210, USA
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
| | - Zhi-Ping Zhong
- Department of Microbiology, Ohio State University, Columbus, OH, 43210, USA
- Byrd Polar and Climate Research Center, Ohio State University, Columbus, OH, 43210, USA
| | - Benjamin Bolduc
- Department of Microbiology, Ohio State University, Columbus, OH, 43210, USA
| | - Ben Temperton
- School of Biosciences, University of Exeter, Exeter, UK
| | - Konstantinos T Konstantinidis
- School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia, USA
- School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA
| | - Matthew B Sullivan
- Department of Microbiology, Ohio State University, Columbus, OH, 43210, USA.
- Center of Microbiome Science, Ohio State University, Columbus, OH, 43210, USA.
- Department of Civil, Environmental and Geodetic Engineering, Ohio State University, Columbus, OH, 43210, USA.
| |
Collapse
|
46
|
van Dijk LR, Walker BJ, Straub TJ, Worby CJ, Grote A, Schreiber HL, Anyansi C, Pickering AJ, Hultgren SJ, Manson AL, Abeel T, Earl AM. StrainGE: a toolkit to track and characterize low-abundance strains in complex microbial communities. Genome Biol 2022; 23:74. [PMID: 35255937 PMCID: PMC8900328 DOI: 10.1186/s13059-022-02630-0] [Citation(s) in RCA: 51] [Impact Index Per Article: 17.0] [Received: 05/28/2021] [Accepted: 02/09/2022] [Indexed: 01/21/2023]
Abstract
Human-associated microbial communities comprise not only complex mixtures of bacterial species, but also mixtures of conspecific strains, the implications of which are mostly unknown since strain-level dynamics are underexplored due to the difficulties of studying them. We introduce the Strain Genome Explorer (StrainGE) toolkit, which deconvolves strain mixtures and characterizes component strains at the nucleotide level from short-read metagenomic sequencing with higher sensitivity and resolution than other tools. StrainGE is able to identify strains at 0.1x coverage and detect variants for multiple conspecific strains within a sample from coverages as low as 0.5x.
Affiliation(s)
- Lucas R. van Dijk
  - Infectious Disease & Microbiome Program, Broad Institute, 415 Main Street, Cambridge, MA 02142 USA
  - Delft Bioinformatics Lab, Delft University of Technology, Van Mourik Broekmanweg 6, Delft, 2628 XE The Netherlands
- Bruce J. Walker
  - Infectious Disease & Microbiome Program, Broad Institute, 415 Main Street, Cambridge, MA 02142 USA
  - Applied Invention, Cambridge, MA USA
- Timothy J. Straub
  - Infectious Disease & Microbiome Program, Broad Institute, 415 Main Street, Cambridge, MA 02142 USA
  - Department of Immunology and Infectious Diseases, Harvard T.H. Chan School of Public Health, Boston, MA 02115 USA
- Colin J. Worby
  - Infectious Disease & Microbiome Program, Broad Institute, 415 Main Street, Cambridge, MA 02142 USA
- Alexandra Grote
  - Infectious Disease & Microbiome Program, Broad Institute, 415 Main Street, Cambridge, MA 02142 USA
- Henry L. Schreiber
  - Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, MO 63110 USA
  - Center for Women’s Infectious Disease Research (CWIDR), Washington University School of Medicine, St. Louis, MO 63110 USA
- Christine Anyansi
  - Delft Bioinformatics Lab, Delft University of Technology, Van Mourik Broekmanweg 6, Delft, 2628 XE The Netherlands
- Amy J. Pickering
  - Department of Civil and Environmental Engineering, University of California, Berkeley, Berkeley, CA 94720 USA
  - Stuart B. Levy Center for Integrated Management of Antimicrobial Resistance (Levy CIMAR), Tufts University, Boston, MA USA
- Scott J. Hultgren
  - Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, MO 63110 USA
  - Center for Women’s Infectious Disease Research (CWIDR), Washington University School of Medicine, St. Louis, MO 63110 USA
- Abigail L. Manson
  - Infectious Disease & Microbiome Program, Broad Institute, 415 Main Street, Cambridge, MA 02142 USA
- Thomas Abeel
  - Infectious Disease & Microbiome Program, Broad Institute, 415 Main Street, Cambridge, MA 02142 USA
  - Delft Bioinformatics Lab, Delft University of Technology, Van Mourik Broekmanweg 6, Delft, 2628 XE The Netherlands
- Ashlee M. Earl
  - Infectious Disease & Microbiome Program, Broad Institute, 415 Main Street, Cambridge, MA 02142 USA
|
47
|
Goussarov G, Claesen J, Mysara M, Cleenwerck I, Leys N, Vandamme P, Van Houdt R. Accurate prediction of metagenome-assembled genome completeness by MAGISTA, a random forest model built on alignment-free intra-bin statistics. Environmental Microbiome 2022; 17:9. [PMID: 35248155 PMCID: PMC8898458 DOI: 10.1186/s40793-022-00403-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/13/2022] [Accepted: 02/17/2022] [Indexed: 05/03/2023]
Abstract
BACKGROUND Although the total number of microbial taxa on Earth is under debate, it is clear that only a small fraction of these has been cultivated and validly named. Evidently, the inability to culture most bacteria outside of very specific conditions severely limits their characterization and further studies. In the last decade, a major part of the solution to this problem has been the use of metagenome sequencing, whereby the DNA of an entire microbial community is sequenced, followed by the in silico reconstruction of genomes of its novel component species. The large discrepancy between the number of sequenced type strain genomes (around 12,000) and total microbial diversity (10^6 to 10^12 species) directs these efforts to de novo assembly and binning. Unfortunately, these steps are error-prone and as such, the results have to be intensely scrutinized to avoid publishing incomplete and low-quality genomes. RESULTS We developed MAGISTA (metagenome-assembled genome intra-bin statistics assessment), a novel approach to assess metagenome-assembled genome quality that tackles some of the often-neglected drawbacks of current reference gene-based methods. MAGISTA is based on alignment-free distance distributions between contig fragments within metagenomic bins, rather than a set of reference genes. For proper training, a highly complex genomic DNA mock community was needed and constructed by pooling genomic DNA of 227 bacterial strains, specifically selected to obtain a wide variety representing the major phylogenetic lineages of cultivable bacteria. CONCLUSIONS MAGISTA achieved a 20% reduction in root-mean-square error in comparison to the marker gene approach when tested on publicly available mock metagenomes. Furthermore, our highly complex genomic DNA mock community is a very valuable tool for benchmarking (new) metagenome analysis methods.
Affiliation(s)
- Gleb Goussarov
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Jürgen Claesen
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
  - Department of Epidemiology & Biostatistics, Amsterdam UMC, VU University, Amsterdam, The Netherlands
- Mohamed Mysara
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
- Ilse Cleenwerck
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Natalie Leys
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
- Peter Vandamme
  - Laboratory of Microbiology and BCCM/LMG Bacteria Collection, Faculty of Sciences, Ghent University, Ghent, Belgium
- Rob Van Houdt
  - Microbiology Unit, Belgian Nuclear Research Centre (SCK CEN), Mol, Belgium
|
48
|
Ventolero MF, Wang S, Hu H, Li X. Computational analyses of bacterial strains from shotgun reads. Brief Bioinform 2022; 23:6524011. [PMID: 35136954 DOI: 10.1093/bib/bbac013] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/04/2021] [Revised: 01/10/2022] [Accepted: 01/11/2022] [Indexed: 12/21/2022]
Abstract
Shotgun sequencing is routinely employed to study bacteria in microbial communities. With the vast amount of shotgun sequencing reads generated in a metagenomic project, it is crucial to determine the microbial composition at the strain level. This study investigated 20 computational tools that attempt to infer bacterial strain genomes from shotgun reads. For the first time, we discussed the methodology behind these tools. We also systematically evaluated six novel-strain-targeting tools on the same datasets and found that BHap, mixtureS and StrainFinder performed better than other tools. Because the performance of the best tools is still suboptimal, we discussed future directions that may address the limitations.
Affiliation(s)
- Saidi Wang
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
- Haiyan Hu
  - Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA
  - Genomics and Bioinformatics Cluster, University of Central Florida, Orlando, FL 32816, USA
- Xiaoman Li
  - Burnett School of Biomedical Science, University of Central Florida, Orlando, FL 32816, USA
|
49
|
Ruan Z, Zou S, Wang Z, Zhang L, Chen H, Wu Y, Jia H, Draz MS, Feng Y. Toward accurate diagnosis and surveillance of bacterial infections using enhanced strain-level metagenomic next-generation sequencing of infected body fluids. Brief Bioinform 2022; 23:6519793. [PMID: 35108376 DOI: 10.1093/bib/bbac004] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 10/28/2021] [Revised: 12/17/2021] [Accepted: 01/04/2022] [Indexed: 12/12/2022]
Abstract
Metagenomic next-generation sequencing (mNGS) enables comprehensive pathogen detection and has become increasingly popular in clinical diagnosis. The distinct pathogenic traits between strains require mNGS to achieve a strain-level resolution, but an equivocal concept of 'strain' as well as the low pathogen loads in most clinical specimens hinder such strain awareness. Here we introduce a metagenomic intra-species typing (MIST) tool (https://github.com/pandafengye/MIST), which hierarchically organizes reference genomes based on average nucleotide identity (ANI) and performs maximum likelihood estimation to infer the strain-level compositional abundance. In silico analysis using synthetic datasets showed that MIST accurately predicted the strain composition at a 99.9% ANI resolution with a sequencing depth of merely 0.001×. When applying MIST to 359 culture-positive and 359 culture-negative real-world specimens of infected body fluids, we found that the presence of multiple strains reached considerable frequencies (30.39%-93.22%), which were otherwise underestimated by current diagnostic techniques due to their limited resolution. Several high-risk clones were identified to be prevalent across samples, including Acinetobacter baumannii sequence type (ST)208/ST195, Staphylococcus aureus ST22/ST398 and Klebsiella pneumoniae ST11/ST15, indicating potential outbreak events occurring in the clinical settings. Interestingly, contaminations caused by the engineered Escherichia coli strains K-12 and BL21 throughout the mNGS datasets were also identified by MIST instead of the statistical decontamination approach. Our study systematically characterized the infected body fluids at the strain level for the first time. Extension of mNGS testing to the strain level can greatly benefit clinical diagnosis of bacterial infections, including the identification of multi-strain infection, decontamination and infection control surveillance.
Affiliation(s)
- Zhi Ruan
  - Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Shengmei Zou
  - Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, China
- Zeyu Wang
  - Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, China
- Luhan Zhang
  - Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, China
- Hangfei Chen
  - Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Yuye Wu
  - Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Huiqiong Jia
  - Department of Laboratory Medicine, the First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China
- Mohamed S Draz
  - Department of Medicine, Case Western Reserve University School of Medicine, Cleveland, OH, USA
- Ye Feng
  - Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, Hangzhou, China
  - Institute of Translational Medicine, Zhejiang University School of Medicine, Hangzhou, China
|
50
|
Somerville V, Berthoud H, Schmidt RS, Bachmann HP, Meng YH, Fuchsmann P, von Ah U, Engel P. Functional strain redundancy and persistent phage infection in Swiss hard cheese starter cultures. The ISME Journal 2022; 16:388-399. [PMID: 34363005 PMCID: PMC8776748 DOI: 10.1038/s41396-021-01071-0] [Citation(s) in RCA: 26] [Impact Index Per Article: 8.7] [Received: 04/28/2021] [Revised: 07/14/2021] [Accepted: 07/15/2021] [Indexed: 02/07/2023]
Abstract
Undefined starter cultures are poorly characterized bacterial communities of environmental origin used in cheese making. They are phenotypically stable and have evolved through domestication by repeated propagation in closed and highly controlled environments over centuries. This makes them interesting for understanding eco-evolutionary dynamics governing microbial communities. While cheese starter cultures are known to be dominated by a few bacterial species, little is known about the composition, functional relevance, and temporal dynamics of strain-level diversity. Here, we applied shotgun metagenomics to an important Swiss cheese starter culture and analyzed historical and experimental samples reflecting 82 years of starter culture propagation. We found that the bacterial community is highly stable and dominated by only a few coexisting strains of Streptococcus thermophilus and Lactobacillus delbrueckii subsp. lactis. Genome sequencing, metabolomics analysis, and co-culturing experiments of 43 isolates show that these strains are functionally redundant, but differ tremendously in their phage resistance potential. Moreover, we identified two highly abundant Streptococcus phages that seem to stably coexist in the community without any negative impact on bacterial growth or strain persistence, and despite the presence of a large and diverse repertoire of matching CRISPR spacers. Our findings show that functionally equivalent strains can coexist in domesticated microbial communities and highlight an important role of bacteria-phage interactions that are different from kill-the-winner dynamics.
Affiliation(s)
- Vincent Somerville
  - Department of Fundamental Microbiology, University of Lausanne, Lausanne, Switzerland
  - Agroscope, Bern, Switzerland
- Philipp Engel
  - Department of Fundamental Microbiology, University of Lausanne, Lausanne, Switzerland
|