1.
Krisna MA, Jolley KA, Monteith W, Boubour A, Hamers RL, Brueggemann AB, Harrison OB, Maiden MCJ. Development and implementation of a core genome multilocus sequence typing scheme for Haemophilus influenzae. Microb Genom 2024; 10:001281. [PMID: 39120932 PMCID: PMC11315579 DOI: 10.1099/mgen.0.001281] [Received: 04/22/2024] [Accepted: 07/18/2024] [Indexed: 08/10/2024]
Abstract
Haemophilus influenzae is part of the human nasopharyngeal microbiota and a pathogen causing invasive disease. The extensive genetic diversity observed in H. influenzae necessitates discriminatory analytical approaches to evaluate its population structure. This study developed a core genome multilocus sequence typing (cgMLST) scheme for H. influenzae using pangenome analysis tools and validated the scheme using datasets consisting of complete reference genomes (N = 14) and high-quality draft H. influenzae genomes (N = 2297). The draft genome dataset was divided into a development dataset (N = 921) and a validation dataset (N = 1376). The development dataset was used to identify potential core genes, and the validation dataset was used to refine the final core gene list to ensure the reliability of the proposed cgMLST scheme. Functional classifications were made for all the resulting core genes. Phylogenetic analyses were performed using both allelic profiles and nucleotide sequence alignments of the core genome to test congruence, as assessed by Spearman's correlation and ordinary least squares linear regression. Preliminary analyses using the development dataset identified 1067 core genes, which were refined to 1037 with the validation dataset. More than 70% of core genes were predicted to encode proteins essential for metabolism or genetic information processing. Phylogenetic and statistical analyses indicated that the core genome allelic profile accurately represented phylogenetic relatedness among the isolates (R² = 0.945). We used this cgMLST scheme to define a high-resolution population structure for H. influenzae, which enhances the genomic analysis of this clinically relevant human pathogen.
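The congruence test described in this abstract (Spearman's correlation and ordinary least squares regression between allelic-profile distances and alignment-based distances) can be illustrated with a minimal Python sketch. It assumes that the allelic distance between two isolates is the number of core loci carrying different allele IDs and that pairwise SNP distances have already been computed from a core genome alignment. All function names and the toy profiles and distances are hypothetical; they do not reproduce the authors' data or pipeline.

```python
from itertools import combinations
from statistics import mean


def allelic_distance(p, q):
    """Number of loci at which two allelic profiles carry different allele IDs.

    Loci marked None (missing) in either profile are skipped.
    """
    return sum(1 for a, b in zip(p, q) if a is not None and b is not None and a != b)


def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def ols_fit(x, y):
    """Ordinary least squares fit y = intercept + slope * x; returns (slope, intercept, R^2)."""
    mx, my = mean(x), mean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical toy data: four isolates typed at four core loci (allele IDs),
# plus made-up pairwise SNP distances from a core genome alignment.
profiles = {"A": [1, 1, 1, 1], "B": [1, 2, 1, 1], "C": [2, 2, 1, 3], "D": [2, 2, 2, 3]}
snp_dist = {("A", "B"): 2, ("A", "C"): 7, ("A", "D"): 9,
            ("B", "C"): 5, ("B", "D"): 7, ("C", "D"): 3}

pairs = list(combinations(sorted(profiles), 2))
allelic = [allelic_distance(profiles[i], profiles[j]) for i, j in pairs]
snps = [snp_dist[(i, j)] for i, j in pairs]

rho = spearman(allelic, snps)
slope, intercept, r2 = ols_fit(allelic, snps)
print(f"Spearman rho = {rho:.3f}, OLS R^2 = {r2:.3f}")
```

Spearman's rho is computed from average ranks so that tied distances, which are common for integer allele-difference counts, are handled; the R² printed here refers only to the toy data.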
Affiliation(s)
- Made Ananda Krisna
- Nuffield Department of Medicine, Centre for Tropical Medicine and Global Health, University of Oxford, Oxford, UK
- Department of Biology, University of Oxford, Oxford, UK
- Oxford University Clinical Research Unit Indonesia, Faculty of Medicine Universitas Indonesia, Jakarta, Indonesia
- William Monteith
- Department of Biology, University of Oxford, Oxford, UK
- Department of Biology and Biochemistry, University of Bath, Bath, UK
- Alexandra Boubour
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Raph L. Hamers
- Nuffield Department of Medicine, Centre for Tropical Medicine and Global Health, University of Oxford, Oxford, UK
- Oxford University Clinical Research Unit Indonesia, Faculty of Medicine Universitas Indonesia, Jakarta, Indonesia
- Odile B. Harrison
- Department of Biology, University of Oxford, Oxford, UK
- Nuffield Department of Population Health, University of Oxford, Oxford, UK
2.
Shikov AE, Malovichko YV, Nizhnikov AA, Antonets KS. Current Methods for Recombination Detection in Bacteria. Int J Mol Sci 2022; 23:ijms23116257. [PMID: 35682936 PMCID: PMC9181119 DOI: 10.3390/ijms23116257] [Received: 03/04/2022] [Revised: 05/30/2022] [Accepted: 05/30/2022] [Indexed: 02/05/2023]
Abstract
The role of genetic exchanges, i.e., homologous recombination (HR) and horizontal gene transfer (HGT), in bacteria cannot be overestimated, as they are pivotal mechanisms driving bacterial evolution and adaptation; thus, tracking the signs of recombination and HGT events is important for both fundamental and applied science. To date, dozens of bioinformatics tools for revealing recombination signals are available; however, their pros and cons, as well as the range of tasks they can solve, have not yet been systematically reviewed. Moreover, the available software falls into two major groups: one aims to infer evidence of HR, while the other deals only with HGT. Despite these seemingly different goals, all the methods use similar algorithmic approaches, and the two processes are interconnected in genomic evolution, influencing each other. In this review, we propose a classification of novel instruments for both HR and HGT detection based on the genomic consequences of recombination. In this context, we summarize available methodologies, paying particular attention to the type of traceable events for which each program has been designed.
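The review classifies tools by the genomic traces that recombination leaves. One such trace is a local excess of SNPs where a divergent segment has been imported. The Python sketch below flags sliding windows whose SNP count exceeds a multiple of the genome-wide expectation and merges overlapping hits into candidate regions. It is a simplified illustration of the SNP-density idea only, not the algorithm of any specific program covered by the review; the window size, step, threshold factor, and example positions are arbitrary assumptions.

```python
from bisect import bisect_left


def dense_windows(snp_positions, genome_length, window, step, factor=3.0):
    """Slide a window along the genome and return (start, end, count) for windows
    whose SNP count exceeds `factor` times the count expected under a uniform
    SNP density. Windows are half-open intervals [start, end)."""
    positions = sorted(snp_positions)
    expected = len(positions) * window / genome_length
    hits = []
    start = 0
    while start + window <= genome_length:
        end = start + window
        count = bisect_left(positions, end) - bisect_left(positions, start)
        if count > factor * expected:
            hits.append((start, end, count))
        start += step
    return hits


def merge_windows(hits):
    """Merge overlapping or adjacent flagged windows into candidate regions."""
    regions = []
    for start, end, _ in hits:
        if regions and start <= regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], end)
        else:
            regions.append([start, end])
    return [tuple(r) for r in regions]


# Hypothetical example: one background SNP every 1 kb on a 10 kb sequence,
# plus a tight cluster of 10 SNPs starting at position 5,000.
background = list(range(500, 10_000, 1_000))
cluster = list(range(5_000, 5_100, 10))
hits = dense_windows(background + cluster, genome_length=10_000, window=500, step=250)
print(merge_windows(hits))  # prints [(4750, 5500)]
```

Real detectors typically add a proper statistical test and account for the phylogeny, so a flagged window here is only a candidate, not evidence of recombination.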
Affiliation(s)
- Anton E. Shikov
- Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 196608 St. Petersburg, Russia
- Faculty of Biology, St. Petersburg State University (SPbSU), 199034 St. Petersburg, Russia
- Yury V. Malovichko
- Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 196608 St. Petersburg, Russia
- Faculty of Biology, St. Petersburg State University (SPbSU), 199034 St. Petersburg, Russia
- Anton A. Nizhnikov
- Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 196608 St. Petersburg, Russia
- Faculty of Biology, St. Petersburg State University (SPbSU), 199034 St. Petersburg, Russia
- Kirill S. Antonets
- Laboratory for Proteomics of Supra-Organismal Systems, All-Russia Research Institute for Agricultural Microbiology (ARRIAM), 196608 St. Petersburg, Russia
- Faculty of Biology, St. Petersburg State University (SPbSU), 199034 St. Petersburg, Russia
- Correspondence:
3.
Li Y, Jiang Y, Li Z, Yu Y, Chen J, Jia W, Kaow Ng Y, Ye F, Cheng Li S, Shen B. Both simulation and sequencing data reveal coinfections with multiple SARS-CoV-2 variants in the COVID-19 pandemic. Comput Struct Biotechnol J 2022; 20:1389-1401. [PMID: 35342534 PMCID: PMC8930779 DOI: 10.1016/j.csbj.2022.03.011] [Received: 01/27/2022] [Revised: 03/13/2022] [Accepted: 03/13/2022] [Indexed: 01/16/2023]
Abstract
SARS-CoV-2 is a single-stranded RNA betacoronavirus with a high mutation rate. Rapidly emerging SARS-CoV-2 variants could increase transmissibility and diminish vaccine protection. However, whether coinfection with multiple SARS-CoV-2 variants occurs remains controversial. This study collected 12,986 and 4,113 SARS-CoV-2 genomes from the GISAID database on May 11, 2020 (GISAID20May11), and Apr 1, 2021 (GISAID21Apr1), respectively. Using single-nucleotide variant (SNV) and network clique analyses, we constructed single-nucleotide polymorphism (SNP) coexistence networks and discovered maximal SNP cliques of sizes 16 and 34 in the GISAID20May11 and GISAID21Apr1 datasets, respectively. By simulating transmission routes and SNV accumulation, we found a linear relationship between the size of the maximal clique and the number of coinfected variants. We deduced that the COVID-19 cases in GISAID20May11 and GISAID21Apr1 were coinfections with 3.20 and 3.42 variants on average, respectively. Additionally, we performed Nanopore sequencing on 42 COVID-19 patients and discovered recurrent heterozygous SNPs in 20 of the patients, including loci 8,782 and 28,144, which were crucial for SARS-CoV-2 lineage divergence. In conclusion, our findings indicate coinfection with multiple SARS-CoV-2 variants in COVID-19 patients and an increase in the average number of coinfected variants between the two datasets.
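Finding the maximal SNP clique in a coexistence network amounts to maximal-clique enumeration, which can be sketched in Python with the Bron-Kerbosch algorithm (with pivoting). The edge rule used here (two SNPs are linked if they co-occur in at least one genome), the function names, and the toy genomes are assumptions for illustration; the abstract does not specify how the authors built their networks, and the linear mapping from clique size to the number of coinfected variants is not reproduced because its coefficients are not given.

```python
from itertools import combinations


def coexistence_graph(genomes, min_shared=1):
    """Adjacency sets for a SNP coexistence network.

    genomes: dict genome_id -> set of SNP identifiers.
    Hypothetical edge rule: two SNPs are linked if they co-occur in at least
    `min_shared` genomes.
    """
    pair_counts = {}
    adj = {}
    for snps in genomes.values():
        for s in snps:
            adj.setdefault(s, set())
        for a, b in combinations(sorted(snps), 2):
            pair_counts[(a, b)] = pair_counts.get((a, b), 0) + 1
    for (a, b), n in pair_counts.items():
        if n >= min_shared:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques with the Bron-Kerbosch algorithm (with pivoting)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is a maximal clique
            return
        pivot = max(p | x, key=lambda v: len(adj[v] & p))
        for v in list(p - adj[pivot]):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical toy data: SNP identifiers observed in three genomes.
genomes = {
    "g1": {"s1", "s2", "s3"},
    "g2": {"s1", "s2", "s3", "s4"},
    "g3": {"s5", "s6"},
}
cliques = maximal_cliques(coexistence_graph(genomes))
largest = max(cliques, key=len)
print(sorted(largest))  # prints ['s1', 's2', 's3', 's4']
```

Maximal-clique enumeration is exponential in the worst case, so for large networks a dedicated graph library would usually be preferred over this hand-written version.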
Affiliation(s)
- Yinhu Li
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu 610212, China
- Department of Computer Science, City University of Hong Kong, Hong Kong 999077, China
- Yiqi Jiang
- Department of Computer Science, City University of Hong Kong, Hong Kong 999077, China
- Zhengtu Li
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou 510120, China
- Yonghan Yu
- Department of Computer Science, City University of Hong Kong, Hong Kong 999077, China
- Jiaxing Chen
- Department of Computer Science, City University of Hong Kong, Hong Kong 999077, China
- Department of Computer Science, Hong Kong Baptist University, Hong Kong 999077, China
- Wenlong Jia
- Department of Computer Science, City University of Hong Kong, Hong Kong 999077, China
- Yen Kaow Ng
- Kotai Biotechnologies, Inc., Osaka 565-0871, Japan
- Feng Ye
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou 510120, China
- Shuai Cheng Li
- Department of Computer Science, City University of Hong Kong, Hong Kong 999077, China
- Bairong Shen
- Institutes for Systems Genetics, Frontiers Science Center for Disease-related Molecular Network, West China Hospital, Sichuan University, Chengdu 610212, China
4.
Sakoparnig T, Field C, van Nimwegen E. Whole genome phylogenies reflect the distributions of recombination rates for many bacterial species. eLife 2021; 10:e65366. [PMID: 33416498 PMCID: PMC7884076 DOI: 10.7554/elife.65366] [Received: 12/02/2020] [Accepted: 01/07/2021] [Indexed: 12/26/2022]
Abstract
Although recombination is accepted to be common in bacteria, for many species robust phylogenies with well-resolved branches can be reconstructed from whole genome alignments of strains, and these are generally interpreted to reflect clonal relationships. Using new methods based on the statistics of single-nucleotide polymorphism (SNP) splits, we show that this interpretation is incorrect. For many species, each locus has recombined many times along its line of descent, and instead of many loci supporting a common phylogeny, the phylogeny changes many thousands of times along the genome alignment. Analysis of the patterns of allele sharing among strains shows that bacterial populations cannot be approximated as either clonal or freely recombining but are structured such that recombination rates between lineages vary over several orders of magnitude, with a unique pattern of rates for each lineage. Thus, rather than reflecting clonal ancestry, whole genome phylogenies reflect distributions of recombination rates.
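A basic statistic on SNP splits is pairwise compatibility: each biallelic site partitions the strains into two groups, and two such splits can both arise on a single tree without homoplasy only if one of the four allele combinations is absent (the four-gamete test). The Python sketch below computes the fraction of incompatible split pairs in an alignment. It uses this standard criterion as an illustration under stated assumptions; it is not the new methods introduced by the authors, and the strain names and alignment columns are hypothetical.

```python
from itertools import combinations


def split_from_column(column, strains):
    """Bipartition induced by a biallelic site: the set of strains sharing the first strain's allele."""
    return frozenset(s for s, allele in zip(strains, column) if allele == column[0])


def compatible(a, b, universe):
    """Four-gamete test: two splits fit on one tree iff one of the four intersections is empty."""
    a_c, b_c = universe - a, universe - b
    return not (a & b) or not (a & b_c) or not (a_c & b) or not (a_c & b_c)


def incompatible_fraction(columns, strains):
    """Fraction of pairs of biallelic sites whose splits are incompatible."""
    universe = frozenset(strains)
    splits = [split_from_column(c, strains) for c in columns if len(set(c)) == 2]
    pairs = list(combinations(splits, 2))
    bad = sum(1 for a, b in pairs if not compatible(a, b, universe))
    return bad / len(pairs) if pairs else 0.0


strains = ["S1", "S2", "S3", "S4"]
# Hypothetical alignment columns, one character per strain (in the order above).
columns = ["AACC", "AAAC", "ACAC", "GGGG"]  # the last column is invariant and ignored
print(incompatible_fraction(columns, strains))
```

Under strictly clonal descent without homoplasy this fraction would be zero; a high fraction is consistent with, though not proof of, frequent recombination along the genome.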
Affiliation(s)
- Thomas Sakoparnig
- Biozentrum, University of Basel, and Swiss Institute of Bioinformatics, Basel, Switzerland
- Chris Field
- Biozentrum, University of Basel, and Swiss Institute of Bioinformatics, Basel, Switzerland
- Erik van Nimwegen
- Biozentrum, University of Basel, and Swiss Institute of Bioinformatics, Basel, Switzerland
5.
Stott CM, Bobay LM. Impact of homologous recombination on core genome phylogenies. BMC Genomics 2020; 21:829. [PMID: 33238876 PMCID: PMC7691112 DOI: 10.1186/s12864-020-07262-x] [Received: 05/18/2020] [Accepted: 11/19/2020] [Indexed: 01/14/2023]
Abstract
BACKGROUND Core genome phylogenies are widely used to reconstruct the evolutionary history of individual prokaryote species. By using hundreds or thousands of shared genes, these approaches are the gold standard for reconstructing the relationships of large sets of strains. However, there is growing evidence that bacterial strains exchange DNA through homologous recombination at rates that vary widely across prokaryote species, indicating that core genome phylogenies might fail to reconstruct the true phylogeny when the recombination rate is high. Few attempts have been made to evaluate the robustness of core genome phylogenies to recombination, but some analyses suggest that reconstructed trees are not always accurate.
RESULTS In this study, we tested the robustness of core genome phylogenies to various recombination rates. By analyzing simulated and empirical data, we observed that core genome phylogenies are relatively robust to recombination; nevertheless, our results suggest that many reconstructed trees are not completely accurate even when bootstrap support is high. We found that some core genome phylogenies are highly robust to recombination whereas others are strongly affected by it, and that the robustness of core genome phylogenies to recombination is closely linked to the strength of the selective pressures acting on a species. Stronger selective pressures lead to less accurate tree reconstructions, presumably because they more strongly bias the routes of DNA transfer, thereby causing phylogenetic artifacts.
CONCLUSIONS Overall, these results have important implications for the application of core genome phylogenies in prokaryotes.
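Judging whether a reconstructed core genome tree is completely accurate requires a distance between trees. One common choice, assumed here because the abstract does not name the metric used, is the Robinson-Foulds (RF) distance: the number of bipartitions present in one tree but not the other. In this minimal Python sketch, trees are written as nested tuples of leaf names over the same leaf set, rooting is ignored, and both trees are hypothetical.

```python
def _clusters(tree, found):
    """Return the leaf set below `tree` and record the leaf set of every internal node."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(_clusters(child, found) for child in tree))
    found.append(leaves)
    return leaves


def splits(tree):
    """Non-trivial bipartitions of a tree given as nested tuples (rooting is ignored)."""
    found = []
    leaves = _clusters(tree, found)
    ref = min(leaves)  # each split is stored as the side NOT containing this leaf
    result = set()
    for c in found:
        if 1 < len(c) < len(leaves) - 1:  # both sides need at least two leaves
            result.add(c if ref not in c else leaves - c)
    return result


def robinson_foulds(t1, t2):
    """Number of bipartitions present in exactly one of the two trees."""
    return len(splits(t1) ^ splits(t2))


true_tree = ((("A", "B"), "C"), ("D", "E"))
inferred = ((("A", "C"), "B"), ("D", "E"))
rf = robinson_foulds(true_tree, inferred)
n = 5  # number of leaves
print(rf, rf / (2 * (n - 3)))  # raw RF and RF divided by its maximum for binary unrooted trees
```

In a simulation study of the kind described, the true tree is known, so the RF distance directly counts how many clades the core genome phylogeny got wrong, independently of bootstrap support.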
Affiliation(s)
- Caroline M Stott
- Department of Biology, University of North Carolina Greensboro, 321 McIver Street, PO Box 26170, Greensboro, NC, 27402, USA
- Louis-Marie Bobay
- Department of Biology, University of North Carolina Greensboro, 321 McIver Street, PO Box 26170, Greensboro, NC, 27402, USA
6.
Lai YP, Ioerger TR. Exploiting Homoplasy in Genome-Wide Association Studies to Enhance Identification of Antibiotic-Resistance Mutations in Bacterial Genomes. Evol Bioinform Online 2020; 16:1176934320944932. [PMID: 32782426 PMCID: PMC7385850 DOI: 10.1177/1176934320944932] [Received: 04/03/2020] [Accepted: 06/30/2020] [Indexed: 12/23/2022]
Abstract
Many antibacterial drugs have multiple mechanisms of resistance, which are often represented simultaneously by a mixture of resistance mutations (some more frequent than others) in a clinical population. This presents a challenge for Genome-Wide Association Studies (GWAS) methods, making it difficult to detect less prevalent resistance mechanisms purely through (weak) statistical associations. Homoplasy, or the occurrence of multiple independent mutations at the same site, is often observed with drug resistance mutations and can be a strong indicator of positive selection. However, traditional GWAS methods, such as those based on allele counting or linear regression, are not designed to take homoplasy into account. In this article, we present a new method, called ECAT (for Evolutionary Cluster-based Association Test), that extends traditional regression-based GWAS methods with the ability to take advantage of homoplasy. This is achieved through a preprocessing step that identifies hypervariable regions in the genome exhibiting statistically significant clusters of distinct evolutionary changes, to which association testing by a linear mixed model (LMM) is applied using GEMMA (a well-established LMM-based GWAS tool). Thus, the approach can be viewed as extending GEMMA from the usual site- or gene-level analysis to focusing on clustered regions of mutations. This approach was evaluated on a large collection of more than 600 clinical isolates of multidrug-resistant (MDR) Mycobacterium tuberculosis from Lima, Peru. We show that ECAT does a better job of detecting known resistance mutations for several antitubercular drugs (including less prevalent mutations with weaker associations) than site- or gene-based GEMMA, used as a representative of existing GWAS methods. The power of the multiphase approach in ECAT comes from focusing association testing on the hypervariable regions of the genome, which reduces complexity in the model and increases statistical power.
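ECAT's preprocessing step looks for regions with clusters of distinct evolutionary changes. A simplified version of one ingredient, counting independent changes per site on a tree, can be sketched in Python with Fitch parsimony: a site that needs more changes than (number of alleles minus 1) is homoplastic. This sketch is an assumption-based illustration, not ECAT's implementation; the clustering of such sites into hypervariable regions and the LMM association test in GEMMA are not shown, and the tree, leaf names, and alignment are hypothetical.

```python
def fitch_changes(tree, states):
    """Minimum number of state changes for one site on a rooted binary tree (Fitch parsimony).

    tree: nested 2-tuples of leaf names; states: dict leaf -> allele.
    """
    def visit(node):
        if isinstance(node, str):
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        common = left_set & right_set
        if common:
            return common, left_cost + right_cost
        # No shared state: take the union and count one change.
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def homoplastic_sites(tree, alignment):
    """(site index, change count) for sites whose parsimony change count exceeds
    (number of alleles - 1), i.e. sites that require repeated independent changes."""
    length = len(next(iter(alignment.values())))
    hits = []
    for i in range(length):
        states = {leaf: seq[i] for leaf, seq in alignment.items()}
        changes = fitch_changes(tree, states)
        if changes > len(set(states.values())) - 1:
            hits.append((i, changes))
    return hits


tree = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
alignment = {"A": "TTA", "B": "CTA", "C": "CCA", "D": "CCA",
             "E": "TCA", "F": "CCA", "G": "CCA", "H": "CCA"}
print(homoplastic_sites(tree, alignment))  # prints [(0, 2)]
```

Site 0 carries the T allele in two unrelated leaves (A and E), so it requires two independent changes, whereas site 1 (T shared by the sister leaves A and B) is explained by a single change.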
Affiliation(s)
- Yi-Pin Lai
- Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA
- Thomas R Ioerger
- Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA