1
Sutton G, Fogel GB, Abramson B, Brinkac L, Michael T, Liu ES, Thomas S. Horizontal transfer and evolution of wall teichoic acid gene cassettes in Bacillus subtilis. F1000Res 2022; 10:354. [PMID: 35035886 PMCID: PMC8753576 DOI: 10.12688/f1000research.51874.1] [Accepted: 04/06/2021] [Indexed: 12/31/2022]
Abstract
Background: Wall teichoic acid (WTA) genes are essential for production of cell walls in Gram-positive bacteria and necessary for survival, and variability in the cassette has led to recent antibiotic resistance acquisition in pathogenic bacteria. Methods: Using a pan-genome approach, we examined the evolutionary history of WTA genes in Bacillus subtilis ssp. subtilis. Results: Our analysis reveals an interesting pattern of evolution from the type-strain WTA gene cassette possibly resulting from horizontal acquisition from organisms with similar gene sequences. The WTA cassettes have a high level of variation which may be due to one or more independent horizontal transfer events during the evolution of Bacillus subtilis ssp. subtilis. This swapping of entire WTA cassettes and smaller regions within the WTA cassettes is an unusual feature in the evolution of the Bacillus subtilis genome and highlights the importance of horizontal transfer of gene cassettes through homologous recombination within B. subtilis or other bacterial species. Conclusions: Reduced sequence conservation of these WTA cassettes may indicate a modified function like the previously documented WTA ribitol/glycerol variation. An improved understanding of high-frequency recombination of gene cassettes has ramifications for synthetic biology and the use of B. subtilis in industry.
Affiliation(s)
- Granger Sutton
  - J. Craig Venter Institute, Rockville, Maryland, 20850, USA
- Gary B Fogel
  - Natural Selection, Inc., San Diego, CA, 92121, USA
- Bradley Abramson
  - The Salk Institute for Biological Studies, La Jolla, CA, 92037, USA
- Todd Michael
  - The Salk Institute for Biological Studies, La Jolla, CA, 92037, USA
- Enoch S Liu
  - Natural Selection, Inc., San Diego, CA, 92121, USA
2
Sutton G, Fogel GB, Abramson B, Brinkac L, Michael T, Liu ES, Thomas S. A pan-genome method to determine core regions of the Bacillus subtilis and Escherichia coli genomes. F1000Res 2021; 10:286. [PMID: 34113437 PMCID: PMC8156514 DOI: 10.12688/f1000research.51873.2] [Accepted: 08/17/2021] [Indexed: 11/22/2022]
Abstract
Background: Synthetic engineering of bacteria to produce industrial products is a burgeoning field of research and application. In order to optimize genome design, designers need to understand which genes are essential, which are optimal for growth, and locations in the genome that will be tolerated by the organism when inserting engineered cassettes. Methods: We present a pan-genome based method for the identification of core regions in a genome that are strongly conserved at the species level. Results: We show that the core regions determined by our method contain all or almost all essential genes. This demonstrates the accuracy of our method as essential genes should be core genes. We show that we outperform previous methods by this measure. We also explain why there are exceptions to this rule for our method. Conclusions: We assert that synthetic engineers should avoid deleting or inserting into these core regions unless they understand and are manipulating the function of the genes in that region. Similarly, if the designer wishes to streamline the genome, non-core regions and in particular low penetrance genes would be good targets for deletion. Care should be taken to remove entire cassettes with similar penetrance of the genes within cassettes as they may harbor toxin/antitoxin genes which need to be removed in tandem. The bioinformatic approach introduced here saves considerable time and effort relative to knockout studies on single isolates of a given species and captures a broad understanding of the conservation of genes that are core to a species.
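As a rough illustration of the idea of penetrance in this abstract, the sketch below computes, for a toy set of genomes, the fraction of genomes carrying each gene and splits genes into core and non-core sets. It is not the authors' method, which works on conserved genome regions rather than a plain presence/absence table; the gene labels, strain names, and the `core_threshold` parameter are all hypothetical.

```python
from collections import Counter


def gene_penetrance(genomes):
    """Return {gene: fraction of genomes that carry the gene}."""
    # Count each gene once per genome, even if it is listed twice.
    counts = Counter(gene for genes in genomes.values() for gene in set(genes))
    n_genomes = len(genomes)
    return {gene: count / n_genomes for gene, count in counts.items()}


def split_core(genomes, core_threshold=0.95):
    """Split genes into (core, non_core) by a penetrance cutoff (assumed value)."""
    penetrance = gene_penetrance(genomes)
    core = {gene for gene, p in penetrance.items() if p >= core_threshold}
    return core, set(penetrance) - core


# Toy presence lists; g1..g5 are placeholder gene labels, not real loci.
genomes = {
    "strain_A": ["g1", "g2", "g3", "g4"],
    "strain_B": ["g1", "g2", "g3"],
    "strain_C": ["g1", "g2", "g4", "g5"],
}

core, non_core = split_core(genomes, core_threshold=1.0)
print("core:", sorted(core))
print("non-core:", sorted(non_core))
```

With a cutoff of 1.0 only genes present in every genome count as core; a lower cutoff tolerates occasional assembly or annotation gaps, at the cost of admitting genes that are truly absent from some strains.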
Affiliation(s)
- Granger Sutton
  - J. Craig Venter Institute, Rockville, Maryland, 20850, USA
- Gary B Fogel
  - Natural Selection, Inc., San Diego, CA, 92121, USA
- Todd Michael
  - J. Craig Venter Institute, Rockville, Maryland, 20850, USA
- Enoch S Liu
  - Natural Selection, Inc., San Diego, CA, 92121, USA
3
Sutton G, Fogel GB, Abramson B, Brinkac L, Michael T, Liu ES, Thomas S. A pan-genome method to determine core regions of the Bacillus subtilis and Escherichia coli genomes. F1000Res 2021; 10:286. [DOI: 10.12688/f1000research.51873.1] [Accepted: 03/31/2021] [Indexed: 11/20/2022]
4
Perrin A, Rocha EPC. PanACoTA: a modular tool for massive microbial comparative genomics. NAR Genom Bioinform 2021; 3:lqaa106. [PMID: 33575648 PMCID: PMC7803007 DOI: 10.1093/nargab/lqaa106] [Received: 09/16/2020] [Revised: 11/10/2020] [Accepted: 12/01/2020] [Indexed: 02/06/2023]
Abstract
The study of the gene repertoires of microbial species, their pangenomes, has become a key part of microbial evolution and functional genomics. Yet, the increasing number of genomes available complicates the establishment of the basic building blocks of comparative genomics. Here, we present PanACoTA (https://github.com/gem-pasteur/PanACoTA), a tool that allows users to download all genomes of a species, build a database with those passing quality and redundancy controls, uniformly annotate them, and then build their pangenome, several variants of core genomes, their alignments and a rapid but accurate phylogenetic tree. While many programs building pangenomes have become available in the last few years, we have focused on a modular method that tackles all the key steps of the process, from download to phylogenetic inference. While all steps are integrated, they can also be run separately and multiple times to allow rapid and extensive exploration of the parameters of interest. PanACoTA is built in Python3, includes a singularity container and features to facilitate its future development. We believe PanACoTA is an interesting addition to the current set of comparative genomics tools, since it will accelerate and standardize the more routine parts of the work, allowing microbial genomicists to more quickly tackle their specific questions.
Affiliation(s)
- Amandine Perrin
  - Microbial Evolutionary Genomics, CNRS, UMR3525, Institut Pasteur, 28, rue Dr Roux, Paris 75015, France
- Eduardo P C Rocha
  - Microbial Evolutionary Genomics, CNRS, UMR3525, Institut Pasteur, 28, rue Dr Roux, Paris 75015, France
5
Wesevich A, Sutton G, Ruffin F, Park LP, Fouts DE, Fowler VG Jr, Thaden JT. Newly Named Klebsiella aerogenes (formerly Enterobacter aerogenes) Is Associated with Poor Clinical Outcomes Relative to Other Enterobacter Species in Patients with Bloodstream Infection. J Clin Microbiol 2020; 58:e00582-20. [PMID: 32493786 DOI: 10.1128/JCM.00582-20] [Received: 03/27/2020] [Accepted: 05/26/2020] [Indexed: 12/17/2022]
Abstract
Enterobacter aerogenes was recently renamed Klebsiella aerogenes. This study aimed to identify differences in clinical characteristics, outcomes, and bacterial genetics among patients with K. aerogenes versus Enterobacter species bloodstream infections (BSI). We prospectively enrolled patients with K. aerogenes or Enterobacter cloacae complex (Ecc) BSI from 2002 to 2015. We performed whole-genome sequencing (WGS) and pan-genome analysis on all bacteria. Overall, 150 patients with K. aerogenes (46/150 [31%]) or Ecc (104/150 [69%]) BSI were enrolled. The two groups had similar baseline characteristics. Neither total in-hospital mortality (13/46 [28%] versus 22/104 [21%]; P = 0.3) nor attributable in-hospital mortality (9/46 [20%] versus 13/104 [12%]; P = 0.3) differed between patients with K. aerogenes versus Ecc BSI, respectively. However, poor clinical outcome (death before discharge, recurrent BSI, and/or BSI complication) was higher for K. aerogenes than Ecc BSI (32/46 [70%] versus 42/104 [40%]; P = 0.001). In a multivariable regression model, K. aerogenes BSI, relative to Ecc BSI, was predictive of poor clinical outcome (odds ratio 3.3; 95% confidence interval 1.4 to 8.1; P = 0.008). Pan-genome analysis revealed 983 genes in 323 genomic islands unique to K. aerogenes isolates, including putative virulence genes involved in iron acquisition (n = 67), fimbriae/pili/flagella production (n = 117), and metal homeostasis (n = 34). Antibiotic resistance was largely found in Ecc lineage 1, which had a higher rate of multidrug resistant phenotype (23/54 [43%]) relative to all other bacterial isolates (23/96 [24%]; P = 0.03). K. aerogenes BSI was associated with poor clinical outcomes relative to Ecc BSI. Putative virulence factors in K. aerogenes may account for these differences.
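The poor-outcome counts in this abstract (32/46 versus 42/104) are enough to work out a crude, unadjusted odds ratio by hand. The sketch below does that arithmetic with a Wald confidence interval on the log scale. It is only an illustration: the abstract's odds ratio of 3.3 comes from a multivariable regression model whose covariates are not given here, so the crude value need not match it. The function name and the `z` parameter are assumptions made for this example.

```python
import math


def crude_odds_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Unadjusted odds ratio of group A versus group B with a Wald interval."""
    a, b = events_a, total_a - events_a   # group A: events, non-events
    c, d = events_b, total_b - events_b   # group B: events, non-events
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts (Woolf method).
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log)
    high = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, low, high


# Poor clinical outcome: K. aerogenes 32/46 versus Ecc 42/104 (from the abstract).
crude_or, ci_low, ci_high = crude_odds_ratio(32, 46, 42, 104)
print(f"crude OR = {crude_or:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

The crude estimate is (32 × 62) / (14 × 42), about 3.37, which is close to the adjusted value reported in the abstract; the agreement suggests, but does not prove, that the measured covariates confounded the association only weakly.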
6
Bannantine JP, Conde C, Bayles DO, Branger M, Biet F. Genetic Diversity Among Mycobacterium avium Subspecies Revealed by Analysis of Complete Genome Sequences. Front Microbiol 2020; 11:1701. [PMID: 32849358 PMCID: PMC7426613 DOI: 10.3389/fmicb.2020.01701] [Received: 03/13/2020] [Accepted: 06/29/2020] [Indexed: 11/13/2022]
Abstract
Mycobacterium avium comprises four subspecies that contain both human and veterinary pathogens. At the inception of this study, twenty-eight M. avium genomes had been annotated as RefSeq genomes, facilitating direct comparisons. These genomes represent strains from around the world and provided a unique opportunity to examine genome dynamics in this species. Each genome was confirmed to be classified correctly based on SNP genotyping, nucleotide identity and presence/absence of repetitive elements or other typing methods. The Mycobacterium avium subspecies paratuberculosis (Map) genome size and organization were remarkably consistent, averaging 4.8 Mb with a variance of only 29.6 kb among the 13 strains. Comparing recombination events along with the larger genome size and variance observed among Mycobacterium avium subspecies avium (Maa) and Mycobacterium avium subspecies hominissuis (Mah) strains (collectively termed non-Map) suggests horizontal gene transfer occurs in non-Map, but not in Map strains. Overall, M. avium subspecies could be divided into two major sub-divisions, with the Map type II (bovine strains) clustering tightly on one end of a phylogenetic spectrum and Mah strains clustering more loosely together on the other end. The most evolutionarily distinct Map strain was an ovine strain, designated Telford, which had >1,000 SNPs and showed large rearrangements compared to the bovine type II strains. The Telford strain clustered with Maa strains as an intermediate between Map type II and Mah. SNP analysis and genome organization analyses repeatedly demonstrated the conserved nature of Map versus the mosaic nature of non-Map M. avium strains. Finally, core and pangenomes were developed for Map and non-Map strains. A total of 80% of Map genes belonged to the Map core genome, while only 40% of non-Map genes belonged to the non-Map core genome. These genomes provide a more complete and detailed comparison of these subspecies strains as well as a blueprint for how genetic diversity originated.
Affiliation(s)
- John P Bannantine
  - USDA-Agricultural Research Service, National Animal Disease Center, Ames, IA, United States
- Cyril Conde
  - INRAE, Université de Tours, ISP, Nouzilly, France
- Darrell O Bayles
  - USDA-Agricultural Research Service, National Animal Disease Center, Ames, IA, United States
- Franck Biet
  - INRAE, Université de Tours, ISP, Nouzilly, France
7
Wu H, Wang D, Gao F. Toward a high-quality pan-genome landscape of Bacillus subtilis by removal of confounding strains. Brief Bioinform 2020; 22:1951-1971. [PMID: 32065216 DOI: 10.1093/bib/bbaa013] [Received: 11/14/2019] [Revised: 01/17/2020] [Accepted: 01/22/2020] [Indexed: 02/05/2023]
Abstract
Pan-genome analysis is widely used to study the evolution and genetic diversity of species, particularly in bacteria. However, the impact of strain selection on the outcome of pan-genome analysis is poorly understood. Furthermore, a standard protocol to ensure high-quality pan-genome results is lacking. In this study, we carried out a series of pan-genome analyses of different strain sets of Bacillus subtilis to understand the impact of various strains on the performance and output quality of pan-genome analyses. Consequently, we found that the results obtained by pan-genome analyses of B. subtilis can be influenced by the inclusion of incorrectly classified Bacillus subspecies strains, phylogenetically distinct strains, engineered genome-reduced strains, chimeric strains, strains with a large number of unique genes or a large proportion of pseudogenes, and multiple clonal strains. Since the presence of these confounding strains can seriously affect the quality and true landscape of the pan-genome, we should remove these deviations in the process of pan-genome analyses. Our study provides new insights into the removal of biases from confounding strains in pan-genome analyses at the beginning of data processing, which enables the achievement of a closer representation of a high-quality pan-genome landscape of B. subtilis that better reflects the performance and credibility of the B. subtilis pan-genome. This procedure could be added as an important quality control step in pan-genome analyses for improving the efficiency of analyses, and ultimately contributing to a better understanding of genome function, evolution and genome-reduction strategies for B. subtilis in the future.
Affiliation(s)
- Hao Wu
  - Department of Physics, School of Science, Tianjin University
- Dan Wang
  - Department of Physics, School of Science, Tianjin University
- Feng Gao
  - Department of Physics, School of Science, and the Frontier Science Center of Synthetic Biology (MOE), Key Laboratory of Systems Bioengineering (MOE), Tianjin University
8
Huang W, Wang G, Yin C, Chen D, Dhand A, Chanza M, Dimitrova N, Fallon JT. Optimizing a Whole-Genome Sequencing Data Processing Pipeline for Precision Surveillance of Health Care-Associated Infections. Microorganisms 2019; 7:microorganisms7100388. [PMID: 31554234 PMCID: PMC6843764 DOI: 10.3390/microorganisms7100388] [Received: 08/14/2019] [Revised: 09/04/2019] [Accepted: 09/20/2019] [Indexed: 11/16/2022]
Abstract
The surveillance of health care-associated infection (HAI) is an essential element of the infection control program. While whole-genome sequencing (WGS) has widely been adopted for genomic surveillance, its data processing remains to be improved. Here, we propose a three-level data processing pipeline for the precision genomic surveillance of microorganisms without prior knowledge: species identification, multi-locus sequence typing (MLST), and sub-MLST clustering. The former two are closely connected to what has widely been used in current clinical microbiology laboratories, whereas the last provides significantly improved resolution and accuracy in genomic surveillance. In comparison with a broadly used reference-dependent alignment/mapping method and an annotation-dependent pan-/core-genome analysis, we applied our reference- and annotation-independent, k-mer-based, simplified workflow to a collection of Acinetobacter and Enterococcus clinical isolates for testing. By taking both single nucleotide variants and genomic structural changes into account, the optimized k-mer-based pipeline demonstrated a global view of bacterial population structure in a rapid manner and discriminated the relatedness between bacterial isolates in more detail and precision. The newly developed WGS data processing pipeline would facilitate WGS application to the precision genomic surveillance of HAI. In addition, the results from such a WGS-based analysis would be useful for the precision laboratory diagnosis of infectious microorganisms.
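To make the notion of a reference- and annotation-independent, k-mer-based comparison concrete, the sketch below computes a Jaccard distance between the k-mer sets of two sequences. This is a generic textbook technique chosen for illustration and is not the pipeline described in the abstract; the function names, the toy sequences, and the default `k` are assumptions.

```python
def kmer_set(sequence, k):
    """Return the set of all overlapping k-mers in a sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard_distance(seq_a, seq_b, k=5):
    """1 - |A ∩ B| / |A ∪ B| over the two k-mer sets (0 = identical sets)."""
    kmers_a, kmers_b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = kmers_a | kmers_b
    if not union:
        # Both sequences are shorter than k; treat them as indistinguishable.
        return 0.0
    return 1.0 - len(kmers_a & kmers_b) / len(union)


# Toy sequences: isolate_2 differs from isolate_1 by a single substitution.
isolate_1 = "ATGGCGTACGTTAGCCTAGGATCCA"
isolate_2 = "ATGGCGTACGTTTGCCTAGGATCCA"
isolate_3 = "CCCCCCCCCCCCCCCCCCCCCCCCC"

print(jaccard_distance(isolate_1, isolate_1))
print(jaccard_distance(isolate_1, isolate_2))
print(jaccard_distance(isolate_1, isolate_3))
```

Because no reference genome or gene annotation is needed, the same comparison applies to any pair of assemblies or read sets. Real tools typically work on both strands and subsample k-mers for speed, which this sketch leaves out.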
Affiliation(s)
- Weihua Huang
  - Department of Pathology, New York Medical College, Valhalla, NY 10595, USA.
- Guiqing Wang
  - Department of Pathology, New York Medical College, Valhalla, NY 10595, USA.
  - Department of Pathology and Clinical Laboratories, Westchester Medical Center, Valhalla, NY 10595, USA.
- Changhong Yin
  - Department of Pathology, New York Medical College, Valhalla, NY 10595, USA.
- Donald Chen
  - Department of Medicine, New York Medical College, Valhalla, NY 10595, USA.
  - Department of Infection Prevention and Control, Westchester Medical Center, Valhalla, NY 10595, USA.
- Abhay Dhand
  - Department of Medicine, New York Medical College, Valhalla, NY 10595, USA.
- Melissa Chanza
  - Department of Pathology, New York Medical College, Valhalla, NY 10595, USA.
- John T Fallon
  - Department of Pathology, New York Medical College, Valhalla, NY 10595, USA.
  - Department of Pathology and Clinical Laboratories, Westchester Medical Center, Valhalla, NY 10595, USA.