1
Waters EV, Cameron SK, Langridge GC, Preston A. Bacterial genome structural variation: prevalence, mechanisms, and consequences. Trends Microbiol 2025:S0966-842X(25)00115-5. [PMID: 40300989 DOI: 10.1016/j.tim.2025.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2025] [Revised: 03/28/2025] [Accepted: 04/01/2025] [Indexed: 05/01/2025]
Abstract
A vast number of bacterial genome sequences are publicly available. However, the majority were generated using short-read sequencing, which produces fragmented assemblies. Long-read sequencing can generate closed assemblies, which reveal that bacterial genome structure (the order and orientation of genes on the chromosome) is highly variable in many species. Growing evidence suggests that genome structure is a determinant of genome-wide gene expression levels and thus phenotype. We review this developing picture of genome structure variation among bacteria, the challenges for the study of this phenomenon, and its impact on adaptation and evolution, including virulence and infection.
Affiliation(s)
- Emma V Waters
  - Microbes and Food Safety, Quadram Institute Bioscience, Norwich, UK; Centre for Microbial Interactions, Norwich Research Park, Norwich, UK
- Sarah K Cameron
  - The Milner Centre for Evolution and Department of Life Sciences, University of Bath, Bath, UK
- Gemma C Langridge
  - Microbes and Food Safety, Quadram Institute Bioscience, Norwich, UK; Centre for Microbial Interactions, Norwich Research Park, Norwich, UK
- Andrew Preston
  - The Milner Centre for Evolution and Department of Life Sciences, University of Bath, Bath, UK
2
Yurtseven A, Buyanova S, Agrawal AA, Bochkareva OO, Kalinina OV. Machine learning and phylogenetic analysis allow for predicting antibiotic resistance in M. tuberculosis. BMC Microbiol 2023; 23:404. [PMID: 38124060 PMCID: PMC10731705 DOI: 10.1186/s12866-023-03147-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 12/07/2023] [Indexed: 12/23/2023]
Abstract
BACKGROUND: Antimicrobial resistance (AMR) poses a significant global health threat, and accurate prediction of bacterial resistance patterns is critical for effective treatment and control strategies. In recent years, machine learning (ML) approaches have emerged as powerful tools for analyzing large-scale bacterial AMR data. However, ML methods often ignore evolutionary relationships among bacterial strains, which can greatly affect their performance, especially when the goal is to detect resistance-associated features. Genome-wide association study (GWAS) methods such as linear mixed models account for the evolutionary relationships in bacteria, but they uncover only highly significant variants that have already been reported in the literature. RESULTS: In this work, we introduce a novel phylogeny-related parallelism score (PRPS), which measures whether a certain feature is correlated with the population structure of a set of samples. We demonstrate that PRPS can be used, in combination with SVM- and random forest-based models, to reduce the number of features in the analysis while simultaneously increasing the models' performance. We applied our pipeline to publicly available AMR data from the PATRIC database for Mycobacterium tuberculosis against six common antibiotics. CONCLUSIONS: Using our pipeline, we re-discovered known resistance-associated mutations as well as new candidate mutations that may be related to resistance and have not previously been reported in the literature. We demonstrated that taking phylogenetic relationships into account not only improves model performance but also makes the top-contributing predicted resistance markers more biologically relevant.
Affiliation(s)
- Alper Yurtseven
  - Department of Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Campus E8.1, Saarbrücken, 66123, Saarland, Germany
  - Graduate School of Computer Science, Saarland University, Saarbrücken, 66123, Saarland, Germany
- Sofia Buyanova
  - Institute of Science and Technology Austria (ISTA), Am Campus 1, Klosterneuburg, 3400, Austria
- Amay Ajaykumar Agrawal
  - Department of Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Campus E8.1, Saarbrücken, 66123, Saarland, Germany
  - Graduate School of Computer Science, Saarland University, Saarbrücken, 66123, Saarland, Germany
- Olga O Bochkareva
  - Institute of Science and Technology Austria (ISTA), Am Campus 1, Klosterneuburg, 3400, Austria
  - Centre for Microbiology and Environmental Systems Science, Division of Computational System Biology, University of Vienna, Djerassiplatz 1 A, Wien, 1030, Austria
- Olga V Kalinina
  - Department of Drug Bioinformatics, Helmholtz Institute for Pharmaceutical Research Saarland (HIPS), Helmholtz Centre for Infection Research (HZI), Campus E8.1, Saarbrücken, 66123, Saarland, Germany
  - Graduate School of Computer Science, Saarland University, Saarbrücken, 66123, Saarland, Germany
  - Faculty of Medicine, Saarland University, Homburg, 66421, Saarland, Germany
3
Welgemoed T, Duong TA, Barnes I, Stukenbrock EH, Berger DK. Population genomic analyses suggest recent dispersal events of the pathogen Cercospora zeina into East and Southern African maize cropping systems. G3 (Bethesda) 2023; 13:jkad214. [PMID: 37738420 PMCID: PMC10627275 DOI: 10.1093/g3journal/jkad214] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 08/03/2023] [Accepted: 09/06/2023] [Indexed: 09/24/2023]
Abstract
A serious factor hampering global maize production is gray leaf spot disease. Cercospora zeina is one of the causative pathogens, but population genomics analysis of C. zeina is lacking. We conducted whole-genome Illumina sequencing of a representative set of 30 C. zeina isolates from Kenya and Uganda (East Africa) and Zambia, Zimbabwe, and South Africa (Southern Africa). Selection of the diverse set was based on microsatellite data from a larger collection of the pathogen. Pangenome analysis of the C. zeina isolates was done by (1) de novo assembly of the reads with SPAdes, (2) annotation with BRAKER, and (3) protein clustering with OrthoFinder. A published long-read assembly of C. zeina (CMW25467) from Zambia was included and annotated using the same pipeline. This analysis revealed 790 non-shared accessory and 10,677 shared core orthogroups (genes) among the 31 isolates. Accessory gene content was largely shared between isolates from all countries, with a few genes unique to populations from Southern Africa (32) or East Africa (6). There was a significantly higher proportion of effector genes in the accessory secretome (44%) compared to the core secretome (24%). PCA, ADMIXTURE, and phylogenetic analysis using a neighbor-net network indicated a population structure with a geographical subdivision between the East African isolates and the Southern African isolates, although gene flow was also evident. The small pangenome and partial population differentiation indicated recent dispersal of C. zeina into Africa, possibly from two regional founder populations, followed by recurrent gene flow owing to widespread maize production across sub-Saharan Africa.
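The core/accessory split reported in this abstract can be illustrated with a minimal Python sketch. It assumes that a "core" orthogroup is one present in every isolate and an "accessory" orthogroup is one present in at least one but not all isolates; the abstract does not state the exact threshold the authors used. The function name, isolate labels, and orthogroup IDs are hypothetical toy values, not data from the study.

```python
from typing import Dict, Set, Tuple


def split_core_accessory(
    presence: Dict[str, Set[str]], isolates: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Split orthogroups into core and accessory sets.

    presence maps an orthogroup ID to the set of isolates that carry it.
    Assumption: core = present in all isolates; accessory = present in
    some but not all isolates.
    """
    core = {og for og, members in presence.items() if isolates <= members}
    accessory = {
        og
        for og, members in presence.items()
        if members and not isolates <= members
    }
    return core, accessory


# Hypothetical toy data: three isolates, four orthogroups.
isolates = {"iso_A", "iso_B", "iso_C"}
presence = {
    "OG0001": {"iso_A", "iso_B", "iso_C"},
    "OG0002": {"iso_A", "iso_B"},
    "OG0003": {"iso_C"},
    "OG0004": {"iso_A", "iso_B", "iso_C"},
}

core, accessory = split_core_accessory(presence, isolates)
print(sorted(core))       # ['OG0001', 'OG0004']
print(sorted(accessory))  # ['OG0002', 'OG0003']
```

In practice the presence map would be derived from the clustering output (for example, an orthogroup-by-isolate gene count table), which is outside the scope of this sketch.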
Affiliation(s)
- Tanya Welgemoed
  - Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa
- Tuan A Duong
  - Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa
- Irene Barnes
  - Department of Biochemistry, Genetics and Microbiology, Forestry and Agricultural Biotechnology Institute, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa
- Eva H Stukenbrock
  - Environmental Genomics, Christian-Albrechts University of Kiel, Am Botanischen Garten 1-11, Kiel 24118, Germany
  - Max Planck Institute for Evolutionary Biology, August-Thienemann-Str. 2, Plön 24306, Germany
- Dave K Berger
  - Department of Plant and Soil Sciences, Forestry and Agricultural Biotechnology Institute, University of Pretoria, Private Bag X20, Hatfield 0028, South Africa
4
Milman O, Yelin I, Kishony R. Systematic identification of gene-altering programmed inversions across the bacterial domain. Nucleic Acids Res 2023; 51:553-573. [PMID: 36617974 PMCID: PMC9881135 DOI: 10.1093/nar/gkac1166] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 08/01/2022] [Revised: 10/22/2022] [Accepted: 01/05/2023] [Indexed: 01/10/2023]
Abstract
Programmed chromosomal inversions allow bacteria to generate intra-population genotypic and functional heterogeneity, a bet-hedging strategy important in changing environments. Some programmed inversions modify coding sequences, producing different alleles in several gene families, most notably in specificity-determining genes such as Type I restriction-modification systems, where systematic searches revealed cross-phylum abundance. Yet a broad, gene-independent, systematic search for gene-altering programmed inversions has been absent, and little is known about their genomic sequence attributes and prevalence across gene families. Here, identifying intra-species variation in genomes of over 35,000 species, we develop a predictive model of gene-altering inversions, revealing key attributes of their genomic sequences, including gene-pseudogene size asymmetry and orientation bias. The model predicted over 11,000 gene-altering loci covering known targeted gene families, as well as novel targeted families including Type II restriction-modification systems, a protein of unknown function, and a fusion protein containing conjugative-pilus and phage tail domains. Publicly available long-read sequencing datasets validated representatives of these newly predicted inversion-targeted gene families, confirming intra-population genetic heterogeneity. Together, these results reveal gene-altering programmed inversions as a key strategy adopted across the bacterial domain, and highlight programmed inversions that modify Type II restriction-modification systems as a possible new mechanism for maintaining intra-population heterogeneity.
Affiliation(s)
- Oren Milman
  - Faculty of Biology, Technion–Israel Institute of Technology, Haifa, Israel
- Idan Yelin
  - Faculty of Biology, Technion–Israel Institute of Technology, Haifa, Israel
- Roy Kishony
  - To whom correspondence should be addressed. Tel: +972 4 8293737;