1
Vakirlis N, Kupczok A. Large-scale investigation of species-specific orphan genes in the human gut microbiome elucidates their evolutionary origins. Genome Res 2024; 34:888-903. [PMID: 38977308 PMCID: PMC11293555 DOI: 10.1101/gr.278977.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Accepted: 06/12/2024] [Indexed: 07/10/2024]
Abstract
Species-specific genes, also known as orphans, are ubiquitous across life's domains. In prokaryotes, species-specific orphan genes (SSOGs) are mostly thought to originate in external elements such as viruses followed by horizontal gene transfer, whereas the scenario of native origination, through rapid divergence or de novo, is mostly dismissed. However, quantitative evidence supporting either scenario is lacking. Here, we systematically analyzed genomes from 4644 human gut microbiome species and identified more than 600,000 unique SSOGs, representing an average of 2.6% of a given species' pangenome. These sequences are mostly rare within each species yet show signs of purifying selection. Overall, SSOGs use optimal codons less frequently, and their proteins are more disordered than those of conserved genes (i.e., non-SSOGs). Importantly, across species, the GC content of SSOGs closely matches that of conserved ones. In contrast, the ∼5% of SSOGs that share similarity to known viral sequences have distinct characteristics, including lower GC content. Thus, SSOGs with similarity to viruses differ from the remaining SSOGs, contrasting an external origination scenario for most of them. By examining the orthologous genomic region in closely related species, we show that a small subset of SSOGs likely evolved natively de novo and find that these genes also differ in their properties from the remaining SSOGs. Our results challenge the notion that external elements are the dominant source of prokaryotic genetic novelty and will enable future studies into the biological role and relevance of species-specific genes in the human gut.
Affiliation(s)
- Nikolaos Vakirlis
- Institute For Fundamental Biomedical Research, B.S.R.C. "Alexander Fleming," Vari 166 72, Greece;
- Institute for General Microbiology, Kiel University, 24118 Kiel, Germany
- Anne Kupczok
- Bioinformatics Group, Wageningen University, 6700 PB Wageningen, The Netherlands
2
Song Z, Ge Y, Yu X, Liu R, Liu C, Cheng K, Guo L, Yao S. Development of a single nucleotide polymorphism-based strain-identified method for Streptococcus thermophilus CICC 6038 and Lactobacillus delbrueckii ssp. bulgaricus CICC 6047 using pan-genomics analysis. J Dairy Sci 2024; 107:4248-4258. [PMID: 38246550 DOI: 10.3168/jds.2023-23655] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/23/2023] [Accepted: 12/14/2023] [Indexed: 01/23/2024]
Abstract
The health benefits conferred by probiotics are specific to individual probiotic strains, highlighting the importance of identifying specific strains for research and production purposes. Streptococcus thermophilus CICC 6038 and Lactobacillus delbrueckii ssp. bulgaricus CICC 6047 are exceedingly valuable for commercial use with an excellent mixed-culture fermentation. To differentiate these 2 strains from other S. thermophilus and L. delbrueckii ssp. bulgaricus, a specific, sensitive, accurate, rapid, convenient, and cost-effective method is required. In this study, we conducted a pan-genome analysis of S. thermophilus and L. delbrueckii ssp. bulgaricus to identify species-specific core genes, along with strain-specific SNPs. These genes were used to develop suitable PCR primers, and the conformity of sequence length and unique SNPs was confirmed by sequencing for qualitative identification at the strain level. The results demonstrated that SNP analysis of PCR products derived from these primers could distinguish CICC 6038 and CICC 6047 accurately and reproducibly from the other strains of S. thermophilus and L. delbrueckii ssp. bulgaricus, respectively. The strain-specific PCR method based on SNPs herein is universally applicable for probiotics identification. It offers valuable insights into identifying probiotics at the strain level in a way that is fit-for-purpose in quality control and compliance assessment of commercial dairy products.
Affiliation(s)
- Zhiquan Song
- China National Research Institute of Food and Fermentation Industries Co. Ltd., China Center of Industrial Culture Collection, Beijing, 100015, China
- Yuanyuan Ge
- China National Research Institute of Food and Fermentation Industries Co. Ltd., China Center of Industrial Culture Collection, Beijing, 100015, China; Beijing Forestry University, College of Biological Sciences and Biotechnology, Beijing, 100083, China
- Xuejian Yu
- China National Research Institute of Food and Fermentation Industries Co. Ltd., China Center of Industrial Culture Collection, Beijing, 100015, China
- Rui Liu
- China National Research Institute of Food and Fermentation Industries Co. Ltd., China Center of Industrial Culture Collection, Beijing, 100015, China
- Chong Liu
- China National Research Institute of Food and Fermentation Industries Co. Ltd., China Center of Industrial Culture Collection, Beijing, 100015, China
- Kun Cheng
- China National Research Institute of Food and Fermentation Industries Co. Ltd., China Center of Industrial Culture Collection, Beijing, 100015, China
- Lizheng Guo
- China National Research Institute of Food and Fermentation Industries Co. Ltd., China Center of Industrial Culture Collection, Beijing, 100015, China
- Su Yao
- China National Research Institute of Food and Fermentation Industries Co. Ltd., China Center of Industrial Culture Collection, Beijing, 100015, China.
3
Gunasekera RS, Raja KKB, Hewapathirana S, Tundrea E, Gunasekera V, Galbadage T, Nelson PA. ORFanID: A web-based search engine for the discovery and identification of orphan and taxonomically restricted genes. PLoS One 2023; 18:e0291260. [PMID: 37879070 PMCID: PMC10599687 DOI: 10.1371/journal.pone.0291260] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 08/24/2023] [Indexed: 10/27/2023]
Abstract
With the numerous genomes sequenced today, it has been revealed that a noteworthy percentage of genes in a given taxon of organisms in the phylogenetic tree of life do not have orthologous sequences in other taxa. These sequences are commonly referred to as "orphans" or "ORFans" if found as single occurrences in a single species or as "taxonomically restricted genes" (TRGs) when found at higher taxonomic levels. Quantitative and collective studies of these genes are necessary for understanding their biological origins. However, the current software for identifying orphan genes is limited in its functionality and database search range, and is algorithmically very complex. Thus, researchers studying orphan genes must harvest their data from many disparate sources. ORFanID is a graphical web-based search engine that facilitates the efficient identification of both orphan genes and TRGs at all taxonomic levels, from DNA or amino acid sequences in the NCBI database cluster and other large bioinformatics repositories. The software allows users to identify genes that are unique to any taxonomic rank, from species to domain, using NCBI systematic classifiers. It provides control over NCBI database search parameters, and the results are presented in a spreadsheet as well as a graphical display. The tables in the software are sortable, and results can be filtered using the fuzzy search functionality. The visual presentation can be expanded and collapsed by the taxonomic tree to its various branches. Example results from searches on five species and gene expression data from specific orphan genes are provided in the Supplementary Information.
Affiliation(s)
- Richard S. Gunasekera
- Department of Chemistry, Physics and Engineering, School of Science, Technology & Health, Biola University, La Mirada, CA, United States of America
- Komal K. B. Raja
- Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, United States of America
- Suresh Hewapathirana
- European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Emanuel Tundrea
- Griffiths School of Management and IT, Emanuel University of Oradea, Oradea, Romania
- Vinodh Gunasekera
- Bioinformatics, Chesalon USA, Inc., Houston, TX, United States of America
- Thushara Galbadage
- Department of Kinesiology and Public Health, School of Science, Technology & Health, Biola University, La Mirada, CA, United States of America
- Paul A. Nelson
- Biola University, La Mirada, CA, United States of America
4
Karlowski WM, Varshney D, Zielezinski A. Taxonomically Restricted Genes in Bacillus may Form Clusters of Homologs and Can be Traced to a Large Reservoir of Noncoding Sequences. Genome Biol Evol 2023; 15:7039703. [PMID: 36790099 PMCID: PMC10003748 DOI: 10.1093/gbe/evad023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/14/2022] [Revised: 01/09/2023] [Accepted: 02/08/2023] [Indexed: 02/16/2023]
Abstract
Taxonomically restricted genes (TRGs) are unique for a defined group of organisms and may act as potential genetic determinants of lineage-specific, biological properties. Here, we explore the TRGs of highly diverse and economically important Bacillus bacteria by examining commonly used TRG identification parameters and data sources. We show the significant effects of sequence similarity thresholds, composition, and the size of the reference database in the identification process. Subsequently, we applied stringent TRG search parameters and expanded the identification procedure by incorporating an analysis of noncoding and non-syntenic regions of non-Bacillus genomes. A multiplex annotation procedure minimized the number of false-positive TRG predictions and showed nearly one-third of the alleged TRGs could be mapped to genes missed in genome annotations. We traced the putative origin of TRGs by identifying homologous, noncoding genomic regions in non-Bacillus species and detected sequence changes that could transform these regions into protein-coding genes. In addition, our analysis indicated that Bacillus TRGs represent a specific group of genes mostly showing intermediate sequence properties between genes that are conserved across multiple taxa and nonannotated peptides encoded by open reading frames.
Affiliation(s)
- Wojciech M Karlowski
- Department of Computational Biology, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 6, Poznan, Poland
- Deepti Varshney
- Department of Computational Biology, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 6, Poznan, Poland
- Andrzej Zielezinski
- Department of Computational Biology, Adam Mickiewicz University in Poznan, Uniwersytetu Poznanskiego 6, Poznan, Poland
5
Ferrandis-Vila M, Tiwari SK, Mamerow S, Semmler T, Menge C, Berens C. Using unique ORFan genes as strain-specific identifiers for Escherichia coli. BMC Microbiol 2022; 22:135. [PMID: 35585491 PMCID: PMC9118744 DOI: 10.1186/s12866-022-02508-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/14/2021] [Accepted: 03/30/2022] [Indexed: 11/10/2022]
Abstract
BACKGROUND Bacterial identification at the strain level is a much-needed, but arduous and challenging task. This study aimed to develop a method for identifying and differentiating individual strains among multiple strains of the same bacterial species. The set used for testing the method consisted of 17 Escherichia coli strains picked from a collection of strains isolated in Germany, Spain, the United Kingdom and Vietnam from humans, cattle, swine, wild boars, and chickens. We targeted unique or rare ORFan genes to address the problem of selective and specific strain identification. These ORFan genes, exclusive to each strain, served as templates for developing strain-specific primers. RESULTS Most of the experimental strains (14 out of 17) possessed unique ORFan genes that were used to develop strain-specific primers. The remaining three strains were identified by combining a PCR for a rare gene with a selection step for isolating the experimental strains. Multiplex PCR allowed the successful identification of the strains both in vitro in spiked faecal material and in vivo after experimental infections of pigs and recovery of bacteria from faecal material. In addition, primers for qPCR were developed, making quantitative readout from faecal samples after experimental infection possible. CONCLUSIONS The method described in this manuscript, which uses strain-specific unique genes to identify single strains in a mixture of strains, proved efficient and reliable in detecting and following individual strains both in vitro and in vivo, representing a fast and inexpensive alternative to more costly methods.
Affiliation(s)
- Marta Ferrandis-Vila
- Friedrich-Loeffler-Institut - Federal Research Institute for Animal Health, Institute of Molecular Pathogenesis, Naumburger Straße 96a, 07743, Jena, Germany
- Svenja Mamerow
- Friedrich-Loeffler-Institut - Federal Research Institute for Animal Health, Institute of Molecular Pathogenesis, Naumburger Straße 96a, 07743, Jena, Germany
- Christian Menge
- Friedrich-Loeffler-Institut - Federal Research Institute for Animal Health, Institute of Molecular Pathogenesis, Naumburger Straße 96a, 07743, Jena, Germany
- Christian Berens
- Friedrich-Loeffler-Institut - Federal Research Institute for Animal Health, Institute of Molecular Pathogenesis, Naumburger Straße 96a, 07743, Jena, Germany.
6
Buongermino Pereira M, Österlund T, Eriksson KM, Backhaus T, Axelson-Fisk M, Kristiansson E. A comprehensive survey of integron-associated genes present in metagenomes. BMC Genomics 2020; 21:495. [PMID: 32689930 PMCID: PMC7370490 DOI: 10.1186/s12864-020-06830-5] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 10/03/2019] [Accepted: 06/15/2020] [Indexed: 12/19/2022]
Abstract
Background Integrons are genomic elements that mediate horizontal gene transfer by inserting and removing genetic material using site-specific recombination. Integrons are commonly found in bacterial genomes, where they maintain a large and diverse set of genes that plays an important role in adaptation and evolution. Previous studies have started to characterize the wide range of biological functions present in integrons. However, the efforts have so far mainly been limited to genomes from cultivable bacteria and amplicons generated by PCR, thus targeting only a small part of the total integron diversity. Metagenomic data, generated by direct sequencing of environmental and clinical samples, provides a more holistic and unbiased analysis of integron-associated genes. However, the fragmented nature of metagenomic data has previously made such analysis highly challenging. Results Here, we present a systematic survey of integron-associated genes in metagenomic data. The analysis was based on a newly developed computational method where integron-associated genes were identified by detecting their associated recombination sites. By processing contiguous sequences assembled from more than 10 terabases of metagenomic data, we were able to identify 13,397 unique integron-associated genes. Metagenomes from marine microbial communities had the highest occurrence of integron-associated genes with levels more than 100-fold higher than in the human microbiome. The identified genes had a large functional diversity spanning over several functional classes. Genes associated with defense mechanisms and mobility facilitators were most overrepresented and more than five times as common in integrons compared to other bacterial genes. As many as two thirds of the genes were found to encode proteins of unknown function. Less than 1% of the genes were associated with antibiotic resistance, of which several were novel, previously undescribed, resistance gene variants. 
Conclusions Our results highlight the large functional diversity maintained by integrons present in unculturable bacteria and significantly expand the number of described integron-associated genes.
Affiliation(s)
- Mariana Buongermino Pereira
- Department of Mathematical Sciences, Chalmers University of Technology, Gothenburg, Sweden; Centre for Antibiotic Resistance Research (CARe) at University of Gothenburg, Gothenburg, Sweden
- Tobias Österlund
- Department of Mathematical Sciences, Chalmers University of Technology, Gothenburg, Sweden; Centre for Antibiotic Resistance Research (CARe) at University of Gothenburg, Gothenburg, Sweden
- K Martin Eriksson
- Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden; Gothenburg Centre for Sustainable Development, Chalmers University of Technology, Gothenburg, Sweden
- Thomas Backhaus
- Centre for Antibiotic Resistance Research (CARe) at University of Gothenburg, Gothenburg, Sweden; Department of Biological and Environmental Sciences, University of Gothenburg, Gothenburg, Sweden
- Marina Axelson-Fisk
- Department of Mathematical Sciences, Chalmers University of Technology, Gothenburg, Sweden
- Erik Kristiansson
- Department of Mathematical Sciences, Chalmers University of Technology, Gothenburg, Sweden; Centre for Antibiotic Resistance Research (CARe) at University of Gothenburg, Gothenburg, Sweden.
7
Orphan Genes Shared by Pathogenic Genomes Are More Associated with Bacterial Pathogenicity. mSystems 2019; 4:mSystems00290-18. [PMID: 30801025 PMCID: PMC6372840 DOI: 10.1128/msystems.00290-18] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 11/15/2018] [Accepted: 01/08/2019] [Indexed: 11/20/2022]
Abstract
Recent pangenome analyses of numerous bacterial species have suggested that each genome of a single species may have a significant fraction of its gene content unique or shared by a very few genomes (i.e., ORFans). We selected nine bacterial genera, each containing at least five pathogenic and five nonpathogenic genomes, to compare their ORFans in relation to pathogenicity-related genes. Pathogens in these genera are known to cause a number of common and devastating human diseases such as pneumonia, diphtheria, melioidosis, and tuberculosis. Thus, they are worthy of in-depth systems microbiology investigations, including the comparative study of ORFans between pathogens and nonpathogens. We provide direct evidence to suggest that ORFans shared by more pathogens are more associated with pathogenicity-related genes and thus are more important targets for development of new diagnostic markers or therapeutic drugs for bacterial infectious diseases. Orphan genes (also known as ORFans [i.e., orphan open reading frames]) are new genes that enable an organism to adapt to its specific living environment. Our focus in this study is to compare ORFans between pathogens (P) and nonpathogens (NP) of the same genus. Using the pangenome idea, we have identified 130,169 ORFans in nine bacterial genera (505 genomes) and classified these ORFans into four groups: (i) SS-ORFans (P), which are only found in a single pathogenic genome; (ii) SS-ORFans (NP), which are only found in a single nonpathogenic genome; (iii) PS-ORFans (P), which are found in multiple pathogenic genomes; and (iv) NS-ORFans (NP), which are found in multiple nonpathogenic genomes. Within the same genus, pathogens do not always have more genes, more ORFans, or more pathogenicity-related genes (PRGs)—including prophages, pathogenicity islands (PAIs), virulence factors (VFs), and horizontal gene transfers (HGTs)—than nonpathogens. 
Interestingly, in pathogens of the nine genera, the percentages of PS-ORFans are consistently higher than those of SS-ORFans, which is not true in nonpathogens. Similarly, in pathogens of the nine genera, the percentages of PS-ORFans matching the four types of PRGs are also always higher than those of SS-ORFans, but this is not true in nonpathogens. All of these findings suggest the greater importance of PS-ORFans for bacterial pathogenicity.
8
Berry D, Loy A. Stable-Isotope Probing of Human and Animal Microbiome Function. Trends Microbiol 2018; 26:999-1007. [PMID: 30001854 PMCID: PMC6249988 DOI: 10.1016/j.tim.2018.06.004] [Citation(s) in RCA: 59] [Impact Index Per Article: 8.4] [Received: 02/14/2018] [Revised: 06/10/2018] [Accepted: 06/20/2018] [Indexed: 12/30/2022]
Abstract
Humans and animals host diverse communities of microorganisms important to their physiology and health. Despite extensive sequencing-based characterization of host-associated microbiomes, there remains a dramatic lack of understanding of microbial functions. Stable-isotope probing (SIP) is a powerful strategy to elucidate the ecophysiology of microorganisms in complex host-associated microbiotas. Here, we suggest that SIP methodologies should be more frequently exploited as part of a holistic functional microbiomics approach. We provide examples of how SIP has been used to study host-associated microbes in vivo and ex vivo. We highlight recent developments in SIP technologies and discuss future directions that will facilitate deeper insights into the function of human and animal microbiomes.
Affiliation(s)
- David Berry
- Division of Microbial Ecology, Department of Microbiology and Ecosystem Science, Research Network Chemistry Meets Microbiology, University of Vienna, Althanstrasse 14, Vienna, Austria.
- Alexander Loy
- Division of Microbial Ecology, Department of Microbiology and Ecosystem Science, Research Network Chemistry Meets Microbiology, University of Vienna, Althanstrasse 14, Vienna, Austria
9
Bolotin E, Hershberg R. Horizontally Acquired Genes Are Often Shared between Closely Related Bacterial Species. Front Microbiol 2017; 8:1536. [PMID: 28890711 PMCID: PMC5575156 DOI: 10.3389/fmicb.2017.01536] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 06/21/2017] [Accepted: 07/28/2017] [Indexed: 01/11/2023]
Abstract
Horizontal gene transfer (HGT) serves as an important source of innovation for bacterial species. We used a pangenome-based approach to identify genes that were horizontally acquired by four closely related bacterial species, belonging to the Enterobacteriaceae family. This enabled us to examine the extent to which such closely related species tend to share horizontally acquired genes. We find that a high percent of horizontally acquired genes are shared among these closely related species. Furthermore, we demonstrate that the extent of sharing of horizontally acquired genes among these four closely related species is predictive of the extent to which these genes will be found in additional bacterial species. Finally, we show that acquired genes shared by more species tend to be better optimized for expression within the genomes of their new hosts. Combined, our results demonstrate the existence of a large pool of frequently horizontally acquired genes that have distinct characteristics from horizontally acquired genes that are less frequently shared between species.
Affiliation(s)
- Evgeni Bolotin
- Rachel and Menachem Mendelovitch Evolutionary Processes of Mutation and Natural Selection Research Laboratory, The Rappaport Family Institute for Research in the Medical Sciences, Department of Genetics and Developmental Biology, Technion-Israel Institute of Technology, Haifa, Israel
- Ruth Hershberg
- Rachel and Menachem Mendelovitch Evolutionary Processes of Mutation and Natural Selection Research Laboratory, The Rappaport Family Institute for Research in the Medical Sciences, Department of Genetics and Developmental Biology, Technion-Israel Institute of Technology, Haifa, Israel
10
Disentangling the effects of selection and loss bias on gene dynamics. Proc Natl Acad Sci U S A 2017; 114:E5616-E5624. [PMID: 28652353 DOI: 10.1073/pnas.1704925114] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.9] [Indexed: 11/18/2022]
Abstract
We combine mathematical modeling of genome evolution with comparative analysis of prokaryotic genomes to estimate the relative contributions of selection and intrinsic loss bias to the evolution of different functional classes of genes and mobile genetic elements (MGE). An exact solution for the dynamics of gene family size was obtained under a linear duplication-transfer-loss model with selection. With the exception of genes involved in information processing, particularly translation, which are maintained by strong selection, the average selection coefficient for most nonparasitic genes is low albeit positive, compatible with observed positive correlation between genome size and effective population size. Free-living microbes evolve under stronger selection for gene retention than parasites. Different classes of MGE show a broad range of fitness effects, from the nearly neutral transposons to prophages, which are actively eliminated by selection. Genes involved in antiparasite defense, on average, incur a fitness cost to the host that is at least as high as the cost of plasmids. This cost is probably due to the adverse effects of autoimmunity and curtailment of horizontal gene transfer caused by the defense systems and selfish behavior of some of these systems, such as toxin-antitoxin and restriction modification modules. Transposons follow a biphasic dynamics, with bursts of gene proliferation followed by decay in the copy number that is quantitatively captured by the model. The horizontal gene transfer to loss ratio, but not duplication to loss ratio, correlates with genome size, potentially explaining increased abundance of neutral and costly elements in larger genomes.
11
Omer S, Harlow TJ, Gogarten JP. Does Sequence Conservation Provide Evidence for Biological Function? Trends Microbiol 2017; 25:11-18. [DOI: 10.1016/j.tim.2016.09.010] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 06/29/2016] [Revised: 09/16/2016] [Accepted: 09/22/2016] [Indexed: 01/14/2023]
12
Two fundamentally different classes of microbial genes. Nat Microbiol 2016; 2:16208. [PMID: 27819663 DOI: 10.1038/nmicrobiol.2016.208] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 06/02/2016] [Accepted: 09/20/2016] [Indexed: 01/15/2023]
Abstract
The evolution of bacterial and archaeal genomes is highly dynamic and involves extensive horizontal gene transfer and gene loss [1-4]. Furthermore, many microbial species appear to have open pangenomes, where each newly sequenced genome contains more than 10% ORFans, that is, genes without detectable homologues in other species [5,6]. Here, we report a quantitative analysis of microbial genome evolution by fitting the parameters of a simple, steady-state evolutionary model to the comparative genomic data on the gene content and gene order similarity between archaeal genomes. The results reveal two sharply distinct classes of microbial genes, one of which is characterized by effectively instantaneous gene replacement, and the other consists of genes with finite, distributed replacement rates. These findings imply a conservative estimate of the size of the prokaryotic genomic universe, which appears to consist of at least a billion distinct genes. Furthermore, the same distribution of constraints is shown to govern the evolution of gene complement and gene order, without the need to invoke long-range conservation or the selfish operon concept [7].
13
Abstract
Bacteria and archaea typically possess small genomes that are tightly packed with protein-coding genes. The compactness of prokaryotic genomes is commonly perceived as evidence of adaptive genome streamlining caused by strong purifying selection in large microbial populations. In such populations, even the small cost incurred by nonfunctional DNA because of extra energy and time expenditure is thought to be sufficient for this extra genetic material to be eliminated by selection. However, contrary to the predictions of this model, there exists a consistent, positive correlation between the strength of selection at the protein sequence level, measured as the ratio of nonsynonymous to synonymous substitution rates, and microbial genome size. Here, by fitting the genome size distributions in multiple groups of prokaryotes to predictions of mathematical models of population evolution, we show that only models in which acquisition of additional genes is, on average, slightly beneficial yield a good fit to genomic data. These results suggest that the number of genes in prokaryotic genomes reflects the equilibrium between the benefit of additional genes that diminishes as the genome grows and deletion bias (i.e., the rate of deletion of genetic material being slightly greater than the rate of acquisition). Thus, new genes acquired by microbial genomes, on average, appear to be adaptive. The tight spacing of protein-coding genes likely results from a combination of the deletion bias and purifying selection that efficiently eliminates nonfunctional, noncoding sequences.
|
14
|
Ekstrom A, Yin Y. ORFanFinder: automated identification of taxonomically restricted orphan genes. Bioinformatics 2016; 32:2053-5. [PMID: 27153690 DOI: 10.1093/bioinformatics/btw122] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 01/20/2016] [Accepted: 02/26/2016] [Indexed: 01/01/2023] Open
Abstract
MOTIVATION: Orphan genes, also known as ORFans, are newly evolved genes in a genome that enable the organism to adapt to a specific living environment. The gene content of every sequenced genome can be classified into different age groups, based on how widely or narrowly a gene's homologs are distributed in the context of species taxonomy. Those having homologs restricted to organisms of particular taxonomic ranks are classified as taxonomically restricted ORFans. RESULTS: Implementing this idea, we have developed an open source program named ORFanFinder and a free web server to allow automated classification of a genome's gene content and identification of ORFans at different taxonomic ranks. ORFanFinder and its web server will contribute to the comparative genomics field by facilitating the study of the origin of new genes and the emergence of lineage-specific traits in both prokaryotes and eukaryotes. AVAILABILITY AND IMPLEMENTATION: http://cys.bios.niu.edu/orfanfinder CONTACT: yyin@niu.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yanbin Yin
- Department of Biological Sciences, Montgomery Hall 325A, Northern Illinois University, DeKalb, IL, USA
|
15
|
Espinoza-Valles I, Vora GJ, Lin B, Leekitcharoenphon P, González-Castillo A, Ussery D, Høj L, Gomez-Gil B. Unique and conserved genome regions in Vibrio harveyi and related species in comparison with the shrimp pathogen Vibrio harveyi CAIM 1792. MICROBIOLOGY-SGM 2015. [PMID: 26198743 DOI: 10.1099/mic.0.000141] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 12/30/2022]
Abstract
Vibrio harveyi CAIM 1792 is a marine bacterial strain that causes mortality in farmed shrimp in north-west Mexico, and the identification of virulence genes in this strain is important for understanding its pathogenicity. The aim of this work was to compare the V. harveyi CAIM 1792 genome with related genome sequences to determine their phylogenetic relationships and explore unique regions in silico that differentiate this strain from other V. harveyi strains. Twenty-one newly sequenced genomes were compared in silico against the CAIM 1792 genome at the nucleotide and predicted proteome levels. The proteome of CAIM 1792 had higher similarity to those of other V. harveyi strains (78%) than to those of the other closely related species Vibrio owensii (67%), Vibrio rotiferianus (63%) and Vibrio campbellii (59%). Pan-genome ORFans trees showed the best fit with the accepted phylogeny based on DNA-DNA hybridization and multi-locus sequence analysis of 11 concatenated housekeeping genes. SNP analysis clustered 34/38 genomes within their accepted species. The pangenomic and SNP trees showed that V. harveyi is the most conserved of the four species studied and V. campbellii may be divided into at least three subspecies, supported by intergenomic distance analysis. BLASTP atlases were created to identify unique regions among the genomes most related to V. harveyi CAIM 1792; these regions included genes encoding glycosyltransferases, specific type restriction modification systems and a transcriptional regulator, LysR, reported to be involved in virulence, metabolism, quorum sensing and motility.
Affiliation(s)
- Gary J Vora
- Center for Bio/Molecular Science & Engineering, Naval Research Laboratory, Washington, DC, USA
- Baochuan Lin
- Center for Bio/Molecular Science & Engineering, Naval Research Laboratory, Washington, DC, USA
- Pimlapas Leekitcharoenphon
- National Food Institute, Division for Epidemiology and Microbial Genomics, Technical University of Denmark, Kongens Lyngby, Denmark; Department of Systems Biology, Center for Biological Sequence Analysis, Technical University of Denmark, Kongens Lyngby, Denmark
- Dave Ussery
- Department of Systems Biology, Center for Biological Sequence Analysis, Technical University of Denmark, Kongens Lyngby, Denmark; Comparative Genomics group, Biosciences Division, Oak Ridge National Labs, Oak Ridge, Tennessee, USA
- Lone Høj
- Australian Institute of Marine Science, Townsville, Queensland, Australia
- Bruno Gomez-Gil
- CIAD A.C., Mazatlán Unit for Aquaculture, Mazatlán, Sinaloa, Mexico
|
16
|
Molina F, López-Acedo E, Tabla R, Roa I, Gómez A, Rebollo JE. Improved detection of Escherichia coli and coliform bacteria by multiplex PCR. BMC Biotechnol 2015; 15:48. [PMID: 26040540 PMCID: PMC4453288 DOI: 10.1186/s12896-015-0168-2] [Citation(s) in RCA: 86] [Impact Index Per Article: 8.6] [Received: 03/02/2015] [Accepted: 05/17/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The presence of coliform bacteria is routinely assessed to establish the microbiological safety of water supplies and raw or processed foods. Coliforms are a group of lactose-fermenting Enterobacteriaceae, which most likely acquired the lacZ gene by horizontal transfer and therefore constitute a polyphyletic group. Among this group of bacteria is Escherichia coli, the pathogen that is most frequently associated with foodborne disease outbreaks and is often identified by β-glucuronidase enzymatic activity or by the redundant detection of uidA by PCR. Because a significant fraction of essential E. coli genes are preserved throughout the bacterial kingdom, alternative oligonucleotide primers for specific E. coli detection are not easily identified. RESULTS: In this manuscript, two strategies were used to design oligonucleotide primers with differing levels of specificity for the simultaneous detection of total coliforms and E. coli by multiplex PCR. A consensus sequence of lacZ and the orphan gene yaiO were chosen as targets for amplification, yielding 234 bp and 115 bp PCR products, respectively. CONCLUSIONS: The assay designed in this work demonstrated superior detection ability when tested with lab-collection and dairy-isolated lactose-fermenting strains. While lacZ amplicons were found in a wide range of coliforms, yaiO amplification was highly specific for E. coli. Additionally, yaiO detection is non-redundant with enzymatic methods.
Affiliation(s)
- Felipe Molina
- Área de Genética, Departamento de Bioquímica y Biologia Molecular y Genética, Universidad de Extremadura, Badajoz, Spain.
- Elena López-Acedo
- Área de Genética, Departamento de Bioquímica y Biologia Molecular y Genética, Universidad de Extremadura, Badajoz, Spain.
- Rafael Tabla
- Dairy products, Technological institute of Food and Agriculture, Badajoz, Spain.
- Isidro Roa
- Dairy products, Technological institute of Food and Agriculture, Badajoz, Spain.
- Antonia Gómez
- Dairy products, Technological institute of Food and Agriculture, Badajoz, Spain.
- José E Rebollo
- Área de Genética, Departamento de Bioquímica y Biologia Molecular y Genética, Universidad de Extremadura, Badajoz, Spain.
|
17
|
Andersson DI, Jerlström-Hultqvist J, Näsvall J. Evolution of new functions de novo and from preexisting genes. Cold Spring Harb Perspect Biol 2015; 7:a017996. [PMID: 26032716 DOI: 10.1101/cshperspect.a017996] [Citation(s) in RCA: 109] [Impact Index Per Article: 10.9] [Indexed: 11/25/2022]
Abstract
How the enormous structural and functional diversity of new genes and proteins was generated (estimated to be 10^10 to 10^12 different proteins in all organisms on earth [Choi I-G, Kim S-H. 2006. Evolution of protein structural classes and protein sequence families. Proc Natl Acad Sci 103: 14056-14061]) is a central biological question that has a long and rich history. Extensive work during the last 80 years has shown that new genes that play important roles in lineage-specific phenotypes and adaptation can originate through a multitude of different mechanisms, including duplication, lateral gene transfer, gene fusion/fission, and de novo origination. In this review, we focus on two main processes as generators of new functions: evolution of new genes by duplication and divergence of pre-existing genes and de novo gene origination in which a whole protein-coding gene evolves from a noncoding sequence.
Affiliation(s)
- Dan I Andersson
- Department of Medical Biochemistry and Microbiology, Uppsala University, SE-75123 Uppsala, Sweden
- Jon Jerlström-Hultqvist
- Department of Medical Biochemistry and Microbiology, Uppsala University, SE-75123 Uppsala, Sweden
- Joakim Näsvall
- Department of Medical Biochemistry and Microbiology, Uppsala University, SE-75123 Uppsala, Sweden
|
18
|
Álvarez-Canales G, Arellano-Álvarez G, González-Domenech CM, de la Cruz F, Moya A, Delaye L. Identification of Xenologs and Their Characteristic Low Expression Levels in the Cyanobacterium Synechococcus elongatus. J Mol Evol 2015; 80:292-304. [PMID: 26040248 DOI: 10.1007/s00239-015-9684-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/03/2014] [Accepted: 05/28/2015] [Indexed: 02/07/2023]
Abstract
Horizontal gene transfer (HGT) is a central process in prokaryotic evolution. Once a gene is introduced into a genome by HGT, its contribution to the fitness of the recipient cell depends in part on its expression level. Here we show that in Synechococcus elongatus PCC 7942, xenologs derived from non-cyanobacterial sources exhibited lower expression levels than native genes in the genome. In accord with our observation, xenolog codon adaptation indexes also displayed relatively low expression values. These results are in agreement with previous reports that suggested the relative neutrality of most xenologs. However, we also demonstrated that some of the xenologs detected participated in cellular functions, including iron starvation acclimation and nitrate reduction, which corroborate the role of HGT in bacterial adaptation. For example, the expression levels of some of the xenologs detected are known to increase under iron-limiting conditions. We interpreted the overall pattern as an indication that there is a selection pressure against high expression levels of xenologs. However, when a xenolog protein product confers a selective advantage, natural selection can further modulate its expression level to meet the requirements of the recipient cell. In addition, we show that ORFans did not exhibit significantly lower expression levels than native genes in the genome, which suggested an origin other than xenology.
Affiliation(s)
- Gilberto Álvarez-Canales
- Departamento de Ingeniería Genética, CINVESTAV-Irapuato, Km. 9.6 Libramiento Norte, Carretera Irapuato-León, 36821, Irapuato, Guanajuato, Mexico
|
19
|
Yang YS, Fernandez B, Lagorce A, Aloin V, De Guillen KM, Boyer JB, Dedieu A, Confalonieri F, Armengaud J, Roumestand C. Prioritizing targets for structural biology through the lens of proteomics: the archaeal protein TGAM_1934 from Thermococcus gammatolerans. Proteomics 2015; 15:114-23. [PMID: 25359407 DOI: 10.1002/pmic.201300535] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 12/03/2013] [Revised: 10/01/2014] [Accepted: 10/24/2014] [Indexed: 11/09/2022]
Abstract
ORFans are hypothetical proteins lacking any significant sequence similarity with other proteins. Here, using quantitative proteomics, we highlight the TGAM_1934 ORFan from the hyperradioresistant archaeon Thermococcus gammatolerans as one of its most abundant hypothetical proteins. This protein was selected as a priority target for structure determination on the basis of its abundance in three cellular conditions. Its solution structure was determined using multidimensional heteronuclear NMR spectroscopy. TGAM_1934 displays an original fold, although it shares some similarities with the 3D structure of CyaY, the bacterial ortholog of frataxin, a protein conserved in bacteria and eukaryotes and involved in iron-sulfur cluster biogenesis. These results highlight the potential of structural proteomics in prioritizing ORFan targets for structure determination based on quantitative proteomics data. The proteomic data and structure coordinates have been deposited to the ProteomeXchange with identifier PXD000402 (http://proteomecentral.proteomexchange.org/dataset/PXD000402) and to the Protein Data Bank under the accession number 2MCF, respectively.
Affiliation(s)
- Yin-Shan Yang
- Centre de Biochimie Structurale, Universités de Montpellier, Montpellier, France
|
20
|
Milani L, Ghiselli F, Guerra D, Breton S, Passamonti M. A comparative analysis of mitochondrial ORFans: new clues on their origin and role in species with doubly uniparental inheritance of mitochondria. Genome Biol Evol 2013; 5:1408-34. [PMID: 23824218 PMCID: PMC3730352 DOI: 10.1093/gbe/evt101] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.4] [Indexed: 12/14/2022] Open
Abstract
Despite numerous comparative mitochondrial genomics studies revealing that animal mitochondrial genomes are highly conserved in terms of gene content, supplementary genes are sometimes found, often arising from gene duplication. Mitochondrial ORFans (ORFs having no detectable homology and unknown function) were found in bivalve molluscs with Doubly Uniparental Inheritance (DUI) of mitochondria. In DUI animals, two mitochondrial lineages are present: one transmitted through females (F-type) and the other through males (M-type), each showing a specific and conserved ORF. The analysis of 34 mitochondrial major Unassigned Regions of Musculista senhousia F- and M-mtDNA allowed us to verify the presence of novel mitochondrial ORFs in this species and to compare them with ORFs from other species with ascertained DUI, with other bivalves and with animals showing new mitochondrial elements. Overall, 17 ORFans from nine species were analyzed for structure and function. Many clues suggest that the analyzed ORFans arose from endogenization of viral genes. The co-option of such novel genes by viral hosts may have determined some evolutionary aspects of host life cycle, possibly involving mitochondria. The structure similarity of DUI ORFans within evolutionary lineages may also indicate that they originated from independent events. If these novel ORFs are in some way linked to DUI establishment, a multiple origin of DUI has to be considered. These putative proteins may have a role in the maintenance of sperm mitochondria during embryo development, possibly masking them from the degradation processes that normally affect sperm mitochondria in species with strictly maternal inheritance.
Affiliation(s)
- Liliana Milani
- Dipartimento di Scienze Biologiche, Geologiche ed Ambientali, University of Bologna, Bologna, Italy.
|
21
|
Narrow-host-range bacteriophages that infect Rhizobium etli associate with distinct genomic types. Appl Environ Microbiol 2013; 80:446-54. [PMID: 24185856 DOI: 10.1128/aem.02256-13] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.6] [Indexed: 11/20/2022] Open
Abstract
In this work, we isolated and characterized 14 bacteriophages that infect Rhizobium etli. They were obtained from rhizosphere soil of bean plants from agricultural lands in Mexico using an enrichment method. The host range of these phages was narrow but variable within a collection of 48 R. etli strains. We obtained the complete genome sequence of nine phages. Four phages were resistant to several restriction enzymes and in vivo cloning, probably due to nucleotide modifications. The genome size of the sequenced phages varied from 43 kb to 115 kb, with a median size of ≈ 45 to 50 kb. A large proportion of open reading frames of these phage genomes (65 to 70%) consisted of hypothetical and orphan genes. The remainder encoded proteins needed for phage morphogenesis and DNA synthesis and processing, among other functions, and a minor percentage represented genes of bacterial origin. We classified these phages into four genomic types on the basis of their genomic similarity, gene content, and host range. Since there are no reports of similar sequences, we propose that these bacteriophages correspond to novel species.
|