1. Burgaya J, Damaris BF, Fiebig J, Galardini M. microGWAS: a computational pipeline to perform large-scale bacterial genome-wide association studies. Microb Genom 2025; 11. [PMID: 39932497 DOI: 10.1099/mgen.0.001349] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/08/2025]
Abstract
Identifying genetic variants associated with bacterial phenotypes, such as virulence, host preference and antimicrobial resistance, has great potential for a better understanding of the mechanisms involved in these traits. The availability of large collections of bacterial genomes has made genome-wide association studies (GWAS) a common approach for this purpose. The need to employ multiple software tools for data pre- and postprocessing limits the application of these methods to experienced bioinformaticians. To address this issue, we have developed a pipeline to perform bacterial GWAS from a set of assemblies and annotations, with multiple phenotypes as targets. The associations are run using five sets of genetic variants: unitigs, gene presence/absence, rare variants (i.e. gene burden test), gene-cluster-specific k-mers and all unitigs jointly. All variants passing the association threshold are further annotated to identify overrepresented biological processes and pathways. The results can be further augmented by generating a phylogenetic tree and predicting the presence of antimicrobial resistance and virulence-associated genes. We tested the microGWAS pipeline on a previously reported dataset on Escherichia coli virulence, successfully identifying the causal variants and providing further interpretation of the association results. The microGWAS pipeline integrates state-of-the-art tools to perform bacterial GWAS into a single, user-friendly and reproducible pipeline, allowing for the democratization of these analyses. The pipeline, together with its documentation, can be accessed at https://github.com/microbial-pangenomes-lab/microGWAS.
Affiliation(s)
- Judit Burgaya
  - Institute for Molecular Bacteriology, TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Hannover Medical School (MHH) and the Helmholtz Centre for Infection Research (HZI), Hannover, Germany
  - Cluster of Excellence RESIST (EXC 2155), Hannover Medical School (MHH), Hannover, Germany
- Bamu F Damaris
  - Institute for Molecular Bacteriology, TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Hannover Medical School (MHH) and the Helmholtz Centre for Infection Research (HZI), Hannover, Germany
  - Cluster of Excellence RESIST (EXC 2155), Hannover Medical School (MHH), Hannover, Germany
- Jenny Fiebig
  - Institute for Molecular Bacteriology, TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Hannover Medical School (MHH) and the Helmholtz Centre for Infection Research (HZI), Hannover, Germany
- Marco Galardini
  - Institute for Molecular Bacteriology, TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Hannover Medical School (MHH) and the Helmholtz Centre for Infection Research (HZI), Hannover, Germany
  - Cluster of Excellence RESIST (EXC 2155), Hannover Medical School (MHH), Hannover, Germany
2. Beeloo R, Zomer A, Deorowicz S, Dutilh B. Graphite: painting genomes using a colored de Bruijn graph. NAR Genom Bioinform 2024; 6:lqae142. [PMID: 39445080 PMCID: PMC11497850 DOI: 10.1093/nargab/lqae142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 08/02/2024] [Accepted: 10/05/2024] [Indexed: 10/25/2024]
Abstract
The recent growth of microbial sequence data allows comparisons at unprecedented scales, enabling the tracking of strains, mobile genetic elements, or genes. Querying a genome against a large reference database can easily yield thousands of matches that are tedious to interpret and pose computational challenges. We developed Graphite, which uses a colored de Bruijn graph (cDBG) to paint query genomes, selecting the local best matches along the full query length. By focusing on the best genomic match of each query region, Graphite reduces the number of matches while providing the most promising leads for sequence tracking or genomic forensics. Applying Graphite to hundreds of Campylobacter genomes, we found extensive gene sharing, including a previously undetected C. coli plasmid that matched a C. jejuni chromosome. Together, genome painting using cDBGs, as enabled by Graphite, can reveal new biological phenomena by mitigating computational hurdles.
Affiliation(s)
- Rick Beeloo
  - Theoretical Biology and Bioinformatics, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
- Aldert L Zomer
  - Department of Infectious Diseases and Immunology, Faculty of Veterinary Medicine, Utrecht University, 3584 Utrecht, The Netherlands
- Sebastian Deorowicz
  - Department of Algorithmics and Software, Silesian University of Technology, Akademicka 16, Gliwice PL-44100, Poland
- Bas E Dutilh
  - Theoretical Biology and Bioinformatics, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
  - Institute of Biodiversity, Faculty of Biological Sciences, Cluster of Excellence Balance of the Microverse, Friedrich Schiller University Jena, 07743 Jena, Germany
3. Tam YL, Cameron S, Preston A, Cowley L. GWarrange: a pre- and post- genome-wide association studies pipeline for detecting phenotype-associated genome rearrangement events. Microb Genom 2024; 10:001268. [PMID: 38980151 PMCID: PMC11316554 DOI: 10.1099/mgen.0.001268] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/18/2024] [Accepted: 06/17/2024] [Indexed: 07/10/2024]
Abstract
The use of k-mers to capture genetic variation in bacterial genome-wide association studies (bGWAS) has demonstrated its effectiveness in overcoming the plasticity of bacterial genomes by providing a comprehensive array of genetic variants in a genome set that is not confined to a single reference genome. However, little attempt has been made to interpret k-mers in the context of genome rearrangements, partly due to challenges in the exhaustive and high-throughput identification of genome structure and individual rearrangement events. Here, we present GWarrange, a pre- and post-bGWAS processing methodology that leverages the unique properties of k-mers to facilitate bGWAS for genome rearrangements. Repeat sequences are common instigators of genome rearrangements through intragenomic homologous recombination, and they are commonly found at rearrangement boundaries. Using whole-genome sequences, repeat sequences are replaced by short placeholder sequences, allowing the regions flanking repeats to be incorporated into relatively short k-mers. Then, locations of flanking regions in significant k-mers are mapped back to complete genome sequences to visualise genome rearrangements. Four case studies based on two bacterial species (Bordetella pertussis and Enterococcus faecium) and a simulated genome set are presented to demonstrate the ability to identify phenotype-associated rearrangements. GWarrange is available at https://github.com/DorothyTamYiLing/GWarrange.
Affiliation(s)
- Yi Ling Tam
  - The Milner Centre for Evolution and Department of Life Sciences, University of Bath, Claverton Down, Bath, BA2 7AY, UK
- Sarah Cameron
  - The Milner Centre for Evolution and Department of Life Sciences, University of Bath, Claverton Down, Bath, BA2 7AY, UK
- Andrew Preston
  - The Milner Centre for Evolution and Department of Life Sciences, University of Bath, Claverton Down, Bath, BA2 7AY, UK
- Lauren Cowley
  - The Milner Centre for Evolution and Department of Life Sciences, University of Bath, Claverton Down, Bath, BA2 7AY, UK
4. Mosquera-Rendón J, Moreno-Herrera CX, Robledo J, Hurtado-Páez U. Genome-Wide Association Studies (GWAS) Approaches for the Detection of Genetic Variants Associated with Antibiotic Resistance: A Systematic Review. Microorganisms 2023; 11:2866. [PMID: 38138010 PMCID: PMC10745584 DOI: 10.3390/microorganisms11122866] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/05/2023] [Revised: 10/20/2023] [Accepted: 10/25/2023] [Indexed: 12/24/2023]
Abstract
Antibiotic resistance is a significant threat to public health worldwide. Genome-wide association studies (GWAS) have emerged as a powerful tool to identify genetic variants associated with antibiotic resistance. By analyzing large datasets of bacterial genomes, GWAS can provide valuable insights into resistance mechanisms and facilitate the discovery of new drug targets. The present study aimed to undertake a systematic review of the different GWAS approaches used for detecting genetic variants associated with antibiotic resistance. We comprehensively searched the PubMed and Scopus databases to identify relevant studies published from 2013 to February 2023. A total of 40 studies met our inclusion criteria. These studies explored a wide range of bacterial species, antibiotics, and study designs. Notably, most of the studies were centered around human pathogens such as Mycobacterium tuberculosis, Escherichia coli, Neisseria gonorrhoeae, and Staphylococcus aureus. The review explores the GWAS approaches used to investigate the genetic mechanisms associated with antibiotic resistance. Furthermore, it examines the contributions of GWAS approaches in identifying resistance-associated genetic variants through binary and continuous phenotypes. Overall, GWAS holds great potential to enhance our understanding of bacterial resistance and improve strategies to combat infectious diseases.
Affiliation(s)
- Jeanneth Mosquera-Rendón
  - Bacteriology and Mycobacteria Unit, Corporation for Biological Research (CIB), Medellín 050034, Colombia
  - Microbiodiversity and Bioprospecting Group (Microbiop), Department of Biosciences, Faculty of Sciences, Universidad Nacional de Colombia, Medellín 050034, Colombia
- Claudia Ximena Moreno-Herrera
  - Microbiodiversity and Bioprospecting Group (Microbiop), Department of Biosciences, Faculty of Sciences, Universidad Nacional de Colombia, Medellín 050034, Colombia
- Jaime Robledo
  - Bacteriology and Mycobacteria Unit, Corporation for Biological Research (CIB), Medellín 050034, Colombia
- Uriel Hurtado-Páez
  - Bacteriology and Mycobacteria Unit, Corporation for Biological Research (CIB), Medellín 050034, Colombia
5. Sommer H, Djamalova D, Galardini M. Reduced ambiguity and improved interpretability of bacterial genome-wide associations using gene-cluster-centric k-mers. Microb Genom 2023; 9. [PMID: 37934071 DOI: 10.1099/mgen.0.001129] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2023]
Abstract
The wide adoption of bacterial genome sequencing and encoding both core and accessory genome variation using k-mers has allowed bacterial genome-wide association studies (GWAS) to identify genetic variants associated with relevant phenotypes such as those linked to infection. Significant limitations remain, both because k-mers are duplicated across gene clusters and because association results are difficult to interpret, which affects the wider adoption of GWAS methods on microbial data sets. We have developed a simple computational method (panfeed) that explicitly links each k-mer to its gene cluster at base-resolution level, which allows us to avoid biases introduced by a global de Bruijn graph as well as more easily map and annotate associated variants. We tested panfeed on two independent data sets, correctly identifying previously characterized causal variants, which demonstrates the precision of the method, as well as its scalable performance. panfeed is a command line tool written in the Python programming language and is available at https://github.com/microbial-pangenomes-lab/panfeed.
Affiliation(s)
- Hannes Sommer
  - Institute for Molecular Bacteriology, TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Hannover Medical School (MHH) and the Helmholtz Centre for Infection Research (HZI), Hannover, Germany
  - Cluster of Excellence RESIST (EXC 2155), Hannover Medical School (MHH), Hannover, Germany
- Dilfuza Djamalova
  - Institute for Molecular Bacteriology, TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Hannover Medical School (MHH) and the Helmholtz Centre for Infection Research (HZI), Hannover, Germany
  - Cluster of Excellence RESIST (EXC 2155), Hannover Medical School (MHH), Hannover, Germany
- Marco Galardini
  - Institute for Molecular Bacteriology, TWINCORE Centre for Experimental and Clinical Infection Research, a joint venture between the Hannover Medical School (MHH) and the Helmholtz Centre for Infection Research (HZI), Hannover, Germany
  - Cluster of Excellence RESIST (EXC 2155), Hannover Medical School (MHH), Hannover, Germany