1
Orphan Genes Shared by Pathogenic Genomes Are More Associated with Bacterial Pathogenicity. mSystems 2019; 4:mSystems00290-18. [PMID: 30801025 PMCID: PMC6372840 DOI: 10.1128/msystems.00290-18] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 11/15/2018] [Accepted: 01/08/2019] [Indexed: 11/20/2022]
Abstract
Orphan genes (also known as ORFans [i.e., orphan open reading frames]) are new genes that enable an organism to adapt to its specific living environment. Our focus in this study is to compare ORFans between pathogens (P) and nonpathogens (NP) of the same genus. Using the pangenome idea, we have identified 130,169 ORFans in nine bacterial genera (505 genomes) and classified these ORFans into four groups: (i) SS-ORFans (P), which are only found in a single pathogenic genome; (ii) SS-ORFans (NP), which are only found in a single nonpathogenic genome; (iii) PS-ORFans (P), which are found in multiple pathogenic genomes; and (iv) NS-ORFans (NP), which are found in multiple nonpathogenic genomes. Within the same genus, pathogens do not always have more genes, more ORFans, or more pathogenicity-related genes (PRGs) than nonpathogens; the PRGs considered include prophages, pathogenicity islands (PAIs), virulence factors (VFs), and horizontal gene transfers (HGTs).
Interestingly, in pathogens of the nine genera, the percentages of PS-ORFans are consistently higher than those of SS-ORFans, which is not true in nonpathogens. Similarly, in pathogens of the nine genera, the percentages of PS-ORFans matching the four types of PRGs are also always higher than those of SS-ORFans, but this is not true in nonpathogens. All of these findings suggest the greater importance of PS-ORFans for bacterial pathogenicity.
IMPORTANCE Recent pangenome analyses of numerous bacterial species have suggested that each genome of a single species may have a significant fraction of its gene content unique or shared by a very few genomes (i.e., ORFans). We selected nine bacterial genera, each containing at least five pathogenic and five nonpathogenic genomes, to compare their ORFans in relation to pathogenicity-related genes. Pathogens in these genera are known to cause a number of common and devastating human diseases such as pneumonia, diphtheria, melioidosis, and tuberculosis. Thus, they are worthy of in-depth systems microbiology investigations, including the comparative study of ORFans between pathogens and nonpathogens. We provide direct evidence to suggest that ORFans shared by more pathogens are more associated with pathogenicity-related genes and thus are more important targets for development of new diagnostic markers or therapeutic drugs for bacterial infectious diseases.
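The four-way ORFan classification described in the abstract above can be illustrated with a short sketch. This is not the authors' pipeline: it assumes ORFan families (homology clusters with no detectable homologs outside the genus) have already been identified, and that each family is given as the set of genome IDs carrying it. The function names `classify_orfan` and `tally` and the genome IDs are hypothetical. Families found in both pathogenic and nonpathogenic genomes fall outside the four groups named in the abstract, so the sketch returns `None` for them (an assumption).

```python
from collections import Counter
from typing import Dict, Optional, Set


def classify_orfan(genomes: Set[str], pathogenic: Set[str]) -> Optional[str]:
    """Assign one ORFan family to one of the four groups.

    genomes    -- IDs of the genomes (within one genus) carrying the family
    pathogenic -- IDs of all pathogenic genomes in that genus
    """
    if not genomes:
        raise ValueError("an ORFan family must occur in at least one genome")
    n_p = len(genomes & pathogenic)    # number of pathogenic genomes carrying it
    n_np = len(genomes - pathogenic)   # number of nonpathogenic genomes carrying it
    if n_p and n_np:
        # Shared by pathogens and nonpathogens: outside the four groups (assumption).
        return None
    if n_p == 1:
        return "SS-ORFan (P)"    # single pathogenic genome
    if n_np == 1:
        return "SS-ORFan (NP)"   # single nonpathogenic genome
    # Two or more genomes, all on one side of the pathogen/nonpathogen split.
    return "PS-ORFan (P)" if n_p > 1 else "NS-ORFan (NP)"


def tally(families: Dict[str, Set[str]], pathogenic: Set[str]) -> Counter:
    """Count how many ORFan families fall into each group."""
    return Counter(classify_orfan(g, pathogenic) for g in families.values())


if __name__ == "__main__":
    pathogens = {"p1", "p2", "p3"}  # hypothetical pathogenic genome IDs
    families = {                    # hypothetical family -> carrier genomes
        "fam_a": {"p1"},
        "fam_b": {"n1"},
        "fam_c": {"p1", "p2"},
        "fam_d": {"n1", "n2"},
        "fam_e": {"p1", "n1"},
    }
    for group, count in tally(families, pathogens).items():
        print(group, count)
```

Per-group counts like these would be the starting point for the percentage comparisons reported in the abstract (for example, PS-ORFans versus SS-ORFans in pathogens).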
2
Lennon CW, Thamsen M, Friman ET, Cacciaglia A, Sachsenhauser V, Sorgenfrei FA, Wasik MA, Bardwell JCA. Folding Optimization In Vivo Uncovers New Chaperones. J Mol Biol 2015; 427:2983-94. [PMID: 26003922 PMCID: PMC4569523 DOI: 10.1016/j.jmb.2015.05.013] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 02/10/2015] [Revised: 04/22/2015] [Accepted: 05/10/2015] [Indexed: 01/08/2023]
Abstract
By employing a genetic selection that forces the cell to fold an unstable, aggregation-prone test protein in order to survive, we have generated bacterial strains with enhanced periplasmic folding capacity. These strains enhance the soluble steady-state level of the test protein. Most of the bacterial variants we isolated were found to overexpress one or more periplasmic proteins, including OsmY, Ivy, DppA, OppA, and HdeB. Of these proteins, only HdeB has previously been convincingly shown to function as a chaperone in vivo. By giving bacteria the stark choice between death and stabilizing a poorly folded protein, we have now generated designer bacteria selected for their ability to stabilize specific proteins.
Affiliation(s)
- Christopher W Lennon
- Howard Hughes Medical Institute, Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI 48109, USA
- Maike Thamsen
- Howard Hughes Medical Institute, Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI 48109, USA
- Elias T Friman
- Howard Hughes Medical Institute, Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI 48109, USA
- Austin Cacciaglia
- Howard Hughes Medical Institute, Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI 48109, USA
- Veronika Sachsenhauser
- Howard Hughes Medical Institute, Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI 48109, USA
- Frieda A Sorgenfrei
- Howard Hughes Medical Institute, Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI 48109, USA
- Milena A Wasik
- Howard Hughes Medical Institute, Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI 48109, USA
- James C A Bardwell
- Howard Hughes Medical Institute, Department of Molecular, Cellular, and Developmental Biology, University of Michigan, Ann Arbor, MI 48109, USA.
3
Molina F, López-Acedo E, Tabla R, Roa I, Gómez A, Rebollo JE. Improved detection of Escherichia coli and coliform bacteria by multiplex PCR. BMC Biotechnol 2015; 15:48. [PMID: 26040540 PMCID: PMC4453288 DOI: 10.1186/s12896-015-0168-2] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.9] [Received: 03/02/2015] [Accepted: 05/17/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND The presence of coliform bacteria is routinely assessed to establish the microbiological safety of water supplies and raw or processed foods. Coliforms are a group of lactose-fermenting Enterobacteriaceae, which most likely acquired the lacZ gene by horizontal transfer and therefore constitute a polyphyletic group. Among this group of bacteria is Escherichia coli, the pathogen that is most frequently associated with foodborne disease outbreaks and is often identified by β-glucuronidase enzymatic activity or, redundantly, by PCR detection of uidA, the gene encoding that enzyme. Because a significant fraction of essential E. coli genes are preserved throughout the bacterial kingdom, alternative oligonucleotide primers for specific E. coli detection are not easily identified. RESULTS In this manuscript, two strategies were used to design oligonucleotide primers with differing levels of specificity for the simultaneous detection of total coliforms and E. coli by multiplex PCR. A consensus sequence of lacZ and the orphan gene yaiO were chosen as targets for amplification, yielding 234 bp and 115 bp PCR products, respectively. CONCLUSIONS The assay designed in this work demonstrated superior detection ability when tested with lactose-fermenting strains from a laboratory collection and from dairy isolates. While lacZ amplicons were found in a wide range of coliforms, yaiO amplification was highly specific for E. coli. Additionally, yaiO detection is non-redundant with enzymatic methods.
Affiliation(s)
- Felipe Molina
- Área de Genética, Departamento de Bioquímica y Biología Molecular y Genética, Universidad de Extremadura, Badajoz, Spain.
- Elena López-Acedo
- Área de Genética, Departamento de Bioquímica y Biología Molecular y Genética, Universidad de Extremadura, Badajoz, Spain.
- Rafael Tabla
- Dairy Products, Technological Institute of Food and Agriculture, Badajoz, Spain.
- Isidro Roa
- Dairy Products, Technological Institute of Food and Agriculture, Badajoz, Spain.
- Antonia Gómez
- Dairy Products, Technological Institute of Food and Agriculture, Badajoz, Spain.
- José E Rebollo
- Área de Genética, Departamento de Bioquímica y Biología Molecular y Genética, Universidad de Extremadura, Badajoz, Spain.
4
Gibson AK, Smith Z, Fuqua C, Clay K, Colbourne JK. Why so many unknown genes? Partitioning orphans from a representative transcriptome of the lone star tick Amblyomma americanum. BMC Genomics 2013; 14:135. [PMID: 23445305 PMCID: PMC3616916 DOI: 10.1186/1471-2164-14-135] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.2] [Received: 09/24/2012] [Accepted: 02/21/2013] [Indexed: 11/10/2022]
Abstract
Background Genomic resources within the phylum Arthropoda are largely limited to the true insects but are beginning to include unexplored subphyla, such as the Crustacea and Chelicerata. Investigations of these understudied taxa uncover high frequencies of orphan genes, which lack detectable sequence homology to genes in pre-existing databases. The ticks (Acari: Chelicerata) are one such understudied taxon for which genomic resources are urgently needed. Ticks are obligate blood-feeders that vector major diseases of humans, domesticated animals, and wildlife. In analyzing a transcriptome of the lone star tick Amblyomma americanum, one of the most abundant disease vectors in the United States, we find a high representation of unannotated sequences. We apply a general framework for quantifying the origin and true representation of unannotated sequences in a dataset and for evaluating the biological significance of orphan genes. Results Expressed sequence tags (ESTs) were derived from different life stages and populations of A. americanum and combined with ESTs available from GenBank to produce 14,310 ESTs, over twice the number previously available. The vast majority (71%) has no sequence homology to proteins archived in UniProtKB. We show that poor sequence or assembly quality is not a major contributor to this high representation by orphan genes. Moreover, most unannotated sequences are functional: a microarray experiment demonstrates that 59% of functional ESTs are unannotated. Lastly, we attempt to further annotate our EST dataset using genomic datasets from other members of the Acari, including Ixodes scapularis, four other tick species and the mite Tetranychus urticae. We find low homology with these species, consistent with significant divergence within this subclass. Conclusions We conclude that the abundance of orphan genes in A. americanum likely results from 1) taxonomic isolation stemming from divergence within the tick lineage and limited genomic resources for ticks and 2) lineage-specific genes needing functional genomic studies to evaluate their association with the unique biology of ticks. The EST sequences described here will contribute substantially to the development of tick genomics. Moreover, the framework provided for the evaluation of orphan genes can guide analyses of future transcriptome sequencing projects.
Affiliation(s)
- Amanda K Gibson
- Department of Biology, Indiana University, Bloomington, IN 47405, USA.
5
Siew N, Fischer D. Unravelling the ORFan Puzzle. Comp Funct Genomics 2010; 4:432-41. [PMID: 18629076 PMCID: PMC2447361 DOI: 10.1002/cfg.311] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Received: 05/07/2003] [Revised: 06/05/2003] [Accepted: 06/05/2003] [Indexed: 12/27/2022]
Abstract
ORFans are open reading frames (ORFs) with no detectable sequence similarity to any other sequence in the databases. Each newly sequenced genome contains a significant number of ORFans, so ORFans pose interesting evolutionary puzzles. However, little can be learned about them using bioinformatics tools, and their study seems to have been underemphasized. Here we present some of the questions that the existence of so many ORFans has raised and review some of the studies aimed at understanding ORFans, their functions and their origins. These works have demonstrated that ORFans are an untapped source of research, requiring further computational and experimental studies.
Affiliation(s)
- Naomi Siew
- Department of Chemistry, Ben Gurion University, Beer-Sheva 84105, Israel
6
Abstract
ORFan genes can constitute a large fraction of a bacterial genome, but due to their lack of homologs, their functions have remained largely unexplored. To determine if particular features of ORFan-encoded proteins promote their presence in a genome, we analyzed properties of ORFans that originated over a broad evolutionary timescale. We also compared ORFan genes to another class of acquired genes, those of heterogeneous occurrence in prokaryotes (HOPs), which have homologs in other bacteria. A total of 54 ORFan and HOP genes selected from different phylogenetic depths in the Escherichia coli lineage were cloned, expressed, purified, and subjected to circular dichroism (CD) spectroscopy. A majority of genes could be expressed, but only 18 yielded sufficient soluble protein for spectral analysis. Of these, half were significantly alpha-helical, three were predominantly beta-sheet, and six were of intermediate/indeterminate structure. Although a higher proportion of HOPs yielded soluble proteins with resolvable secondary structures, ORFans resembled HOPs with regard to most of the other features tested. Overall, we found that those ORFan and HOP genes that have persisted in the E. coli lineage were more likely to encode soluble and folded proteins, more likely to display environmental modulation of their gene expression, and, by extrapolation, more likely to be functional.
Affiliation(s)
- Hema Prasad Narra
- Department of Biochemistry & Molecular Biophysics, University of Arizona, Tucson, AZ 85721, USA
7
8
Yin Y, Fischer D. Identification and investigation of ORFans in the viral world. BMC Genomics 2008; 9:24. [PMID: 18205946 PMCID: PMC2245933 DOI: 10.1186/1471-2164-9-24] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.6] [Received: 03/09/2007] [Accepted: 01/19/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Genome-wide studies have already shed light on the evolution and enormous diversity of the viral world. Nevertheless, one of the unresolved mysteries in comparative genomics today is the abundance of ORFans: ORFs with no detectable sequence similarity to any other ORF in the databases. Recently, studies attempting to understand the origin and functions of bacterial ORFans have been reported. Here we present a first genome-wide identification and analysis of ORFans in the viral world, with focus on bacteriophages. RESULTS Almost one-third of all ORFs in 1,456 complete virus genomes correspond to ORFans, a figure significantly larger than that observed in prokaryotes. Like prokaryotic ORFans, viral ORFans are shorter and have a lower GC content than non-ORFans. Nevertheless, a statistically significant lower GC content is found only in a minority of viruses. By focusing on phages, we find that 38.4% of phage ORFs have no homologs in other phages, and 30.1% have no homologs in either the viral or the prokaryotic world. Phages with different host ranges have different percentages of ORFans, reflecting different sampling status and suggesting differing levels of diversity. Similarity searches of the phage ORFeome (ORFans and non-ORFans) against prokaryotic genomes show that almost half of the phage ORFs have prokaryotic homologs, suggesting the major role that horizontal transfer plays in bacterial evolution. Surprisingly, the percentage of phage ORFans with prokaryotic homologs is only 18.7%. This suggests that phage ORFans play a lesser role in horizontal transfer to prokaryotes, but may be among the major players contributing to the vast phage diversity. CONCLUSION Although the current sampling of viral genomes is extremely low, ORFans and near-ORFans are likely to continue to grow in number as more genomes are sequenced.
The abundance of phage ORFans may be partially due to the expected vast viral diversity, and may be instrumental in understanding viral evolution. The functions, origins and fates of the majority of viral ORFans remain a mystery. Further computational and experimental studies are likely to shed light on the mechanisms that have given rise to so many bacterial and viral ORFans.
Affiliation(s)
- Yanbin Yin
- Computer Science and Engineering Dept, 201 Bell Hall, University at Buffalo, Buffalo, NY 14260-2000, USA.
9
Al-Mailem DM, Hough DW, Danson MJ. The 2-oxoacid dehydrogenase multienzyme complex of Haloferax volcanii. Extremophiles 2007; 12:89-96. [PMID: 17571210 DOI: 10.1007/s00792-007-0091-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 01/15/2007] [Accepted: 04/26/2007] [Indexed: 10/23/2022]
Abstract
Those aerobic archaea whose genomes have been sequenced possess four adjacent genes that, by sequence comparisons with bacteria and eukarya, appear to encode the component enzymes of a 2-oxoacid dehydrogenase multienzyme complex. However, no catalytic activity of any such complex has ever been detected in the archaea. In Thermoplasma acidophilum, evidence has been presented that the heterologously expressed recombinant enzyme possesses activity with the branched-chain 2-oxoacids and, to a lesser extent, with pyruvate. In the current paper, we demonstrate that in Haloferax volcanii the four genes are transcribed as an operon in vivo. However, no functional complex or individual enzyme, except for the dihydrolipoamide dehydrogenase component, could be detected in this halophile grown on a variety of carbon sources. Dihydrolipoamide dehydrogenase is present at low catalytic activity, which increases three- to fourfold when Haloferax volcanii is grown on the branched-chain amino acids valine, leucine and isoleucine.
Affiliation(s)
- Dina M Al-Mailem
- Department of Biological Sciences, Faculty of Science, Kuwait University, PO Box 5969, Safat 13060, State of Kuwait
10
Wilson GA, Feil EJ, Lilley AK, Field D. Large-scale comparative genomic ranking of taxonomically restricted genes (TRGs) in bacterial and archaeal genomes. PLoS One 2007; 2:e324. [PMID: 17389915 PMCID: PMC1824705 DOI: 10.1371/journal.pone.0000324] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Received: 01/26/2007] [Accepted: 02/18/2007] [Indexed: 11/19/2022]
Abstract
BACKGROUND Lineage-specific, or taxonomically restricted genes (TRGs), especially those that are species and strain-specific, are of special interest because they are expected to play a role in defining exclusive ecological adaptations to particular niches. Despite this, they are relatively poorly studied and little understood, in large part because many are still orphans or only have homologues in very closely related isolates. This lack of homology confounds attempts to establish the likelihood that a hypothetical gene is expressed and, if so, to determine the putative function of the protein. METHODOLOGY/PRINCIPAL FINDINGS We have developed "QIPP" ("Quality Index for Predicted Proteins"), an index that scores the "quality" of a protein based on non-homology-based criteria. QIPP can be used to assign a value between zero and one to any protein based on comparing its features to other proteins in a given genome. We have used QIPP to rank the predicted proteins in the proteomes of Bacteria and Archaea. This ranking reveals that there is a large amount of variation in QIPP scores, and identifies many high-scoring orphans as potentially "authentic" (expressed) orphans. There are significant differences in the distributions of QIPP scores between orphan and non-orphan genes for many genomes and a trend for less well-conserved genes to have lower QIPP scores. CONCLUSIONS The implication of this work is that QIPP scores can be used to further annotate predicted proteins with information that is independent of homology. Such information can be used to prioritize candidates for further analysis. Data generated for this study can be found in the OrphanMine at http://www.genomics.ceh.ac.uk/orphan_mine.
Affiliation(s)
- Gareth A Wilson
- Centre for Ecology and Hydrology (CEH) Oxford, Oxford, United Kingdom.
11
Yin Y, Fischer D. On the origin of microbial ORFans: quantifying the strength of the evidence for viral lateral transfer. BMC Evol Biol 2006; 6:63. [PMID: 16914045 PMCID: PMC1559721 DOI: 10.1186/1471-2148-6-63] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.5] [Received: 04/10/2006] [Accepted: 08/16/2006] [Indexed: 11/10/2022]
Abstract
Background: The origin of microbial ORFans, ORFs having no detectable homology to other ORFs in the databases, is one of the unexplained puzzles of the post-genomic era. Several hypotheses on the origin of ORFans have been suggested in the last few years, most of them based on selected, relatively small subsets of ORFans. One of these hypotheses is that ORFans have been acquired through lateral transfer from viruses. Here we carry out a comprehensive, genome-wide study on the origins of ORFans to quantify the strength of current evidence supporting this hypothesis. Results: We performed similarity searches by querying all current ORFans against the public virus protein database. Surprisingly, we found that only 2.8% of all microbial ORFans have detectable homologs in viruses, while the percentage of non-ORFans with detectable homologs in viruses is 7.9%, a significantly higher figure. This suggests that the current evidence for an origin of ORFans by lateral transfer from viruses is at best weak. However, an analysis of individual genomes revealed a number of organisms with much higher percentages, many of them belonging to the Firmicutes and Gamma-proteobacteria. We provide evidence suggesting that the current virus database may be biased towards those viruses attacking Firmicutes and Gamma-proteobacteria. Conclusion: We conclude that as more viral genomes are sequenced, more microbial ORFans will find homologs in viruses, but this trend may vary considerably among individual genomes. Thus, lateral transfer from viruses alone is unlikely to explain the origin of the majority of ORFans in the majority of prokaryotes and consequently, other, not necessarily exclusive, mechanisms are likely to better explain the origin of the increasing number of ORFans.
Affiliation(s)
- Yanbin Yin
- Computer Science and Engineering Dept. 201 Bell Hall, University at Buffalo, Buffalo, NY 14260-2000, US
- Daniel Fischer
- Computer Science and Engineering Dept. 201 Bell Hall, University at Buffalo, Buffalo, NY 14260-2000, US
- Bioinformatics/Dept. of Computer Science, Ben Gurion University, Beer-Sheva 84015, Israel
12
Heath C, Jeffries AC, Hough DW, Danson MJ. Discovery of the catalytic function of a putative 2-oxoacid dehydrogenase multienzyme complex in the thermophilic archaeon Thermoplasma acidophilum. FEBS Lett 2004; 577:523-7. [PMID: 15556640 DOI: 10.1016/j.febslet.2004.10.058] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.8] [Received: 09/03/2004] [Revised: 10/20/2004] [Accepted: 10/21/2004] [Indexed: 11/27/2022]
Abstract
Those aerobic archaea whose genomes have been sequenced possess a single 4-gene operon that, by sequence comparisons with Bacteria and Eukarya, appears to encode the three component enzymes of a 2-oxoacid dehydrogenase multienzyme complex. However, no catalytic activity of any such complex has ever been detected in the Archaea. In the current paper, we have cloned and expressed the first two genes of this operon from the thermophilic archaeon, Thermoplasma acidophilum. We demonstrate that the protein products form an alpha2beta2 hetero-tetramer possessing the decarboxylase catalytic activity characteristic of the first component enzyme of a branched-chain 2-oxoacid dehydrogenase multienzyme complex. This represents the first report of the catalytic function of these putative archaeal multienzyme complexes.
Affiliation(s)
- Caroline Heath
- Centre for Extremophile Research, Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK
13
Siew N, Fischer D. Structural Biology Sheds Light on the Puzzle of Genomic ORFans. J Mol Biol 2004; 342:369-73. [PMID: 15327940 DOI: 10.1016/j.jmb.2004.06.073] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.6] [Received: 03/02/2004] [Revised: 06/09/2004] [Accepted: 06/19/2004] [Indexed: 10/26/2022]
Abstract
Genomic ORFans are orphan open reading frames (ORFs) with no significant sequence similarity to other ORFs. ORFans comprise 20-30% of the ORFs of most completely sequenced genomes. Because nothing can be learnt about ORFans via sequence homology, the functions and evolutionary origins of ORFans remain a mystery. Furthermore, because relatively few ORFans have been experimentally characterized, it has been suggested that most ORFans are not likely to correspond to functional, expressed proteins, but rather to spurious ORFs, pseudo-genes or to rapidly evolving proteins with non-essential roles. As a snapshot view of current ORFan structural studies, we searched for ORFans among proteins whose three-dimensional structures have been recently determined. We find that functional and structural studies of ORFans are not as underemphasized as previously suggested. These recently determined structures correspond to ORFans from all Kingdoms of life, and include proteins that have previously been functionally characterized, as well as structural genomics targets of unknown function labeled as "hypothetical proteins". This suggests that many of the ORFans in the databases are likely to correspond to expressed, functional (and even essential) proteins. Furthermore, the recently determined structures include examples of the various types of ORFans, suggesting that the functions and evolutionary origins of ORFans are diverse. Although this survey sheds some light on the ORFan mystery, further experimental studies are required to gain a better understanding of the role and origins of the tens of thousands of ORFans awaiting characterization.
Affiliation(s)
- Naomi Siew
- Department of Chemistry, Ben Gurion University Beer-Sheva 84105, Israel
14
Abstract
Singleton sequence ORFans are orphan ORFs (open reading frames) that have no detectable sequence similarity to any other sequence in the databases. ORFans are of particular interest not only as evolutionary puzzles but also because we can learn little about them using bioinformatics tools. Here, we present a first systematic analysis of singleton ORFans in the first 60 fully sequenced microbial genomes. We show that although ORFans have been underemphasized, the number of ORFans is steadily growing, currently accounting for 23,634 sequences. At the same time, the percentage of ORFans as a fraction of all sequences is slowly diminishing, and is currently about 14%. Short ORFans comprise about 61% of all ORFans. The abundance of short ORFans may be due to a yet unexplained artifact. The data also suggest that the number of longer ORFans may soon diminish as more genomes of closely related organisms become available. To better address the questions about the functions and origins of ORFans, we propose to focus further studies on the longer ORFans, with emphasis on three new types of ORFans: ORFan modules, paralogous ORFans, and orthologous ORFans. We conclude that the large number of ORFans reflects an intrinsic property of the genetic material not yet fully understood. Further computational and experimental studies aimed at understanding Nature's protein diversity should also include ORFans.
Affiliation(s)
- Naomi Siew
- Department of Chemistry, Ben Gurion University, Beer-Sheva, Israel
15
Charlebois RL, Clarke GDP, Beiko RG, St Jean A. Characterization of species-specific genes using a flexible, web-based querying system. FEMS Microbiol Lett 2003; 225:213-20. [PMID: 12951244 DOI: 10.1016/s0378-1097(03)00512-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.1] [Indexed: 11/26/2022]
Abstract
We describe a query-based web-accessible system (www.neurogadgets.com/bws.php) for facilitating comparative microbial genomics. A variety of query pages are available, each with numerous options, that allow a biologist to pose relevant questions of genomic data. We illustrate with a characterization of species-specific protein-coding genes (so-called "ORFans"), finding that they are on average smaller, faster evolving, and less G+C-rich, and that they encode proteins more basic in their predicted isoelectric point, compared with non-species-specific genes. Using a dual-threshold approach, we conclude that these are characteristics of true species-specific genes, rather than artifacts of mis-annotation.
16
Daubin V, Lerat E, Perrière G. The source of laterally transferred genes in bacterial genomes. Genome Biol 2003; 4:R57. [PMID: 12952536 PMCID: PMC193657 DOI: 10.1186/gb-2003-4-9-r57] [Citation(s) in RCA: 148] [Impact Index Per Article: 6.7] [Received: 04/16/2003] [Revised: 06/11/2003] [Accepted: 07/04/2003] [Indexed: 11/10/2022]
Abstract
BACKGROUND Laterally transferred genes have often been identified on the basis of compositional features that distinguish them from ancestral genes in the genome. These genes are usually A+T-rich, arguing either that there is a bias towards acquiring genes from donor organisms having low G+C contents or that genes acquired from organisms of similar genomic base compositions go undetected in these analyses. RESULTS By examining the genome contents of closely related, fully sequenced bacteria, we uncovered genes confined to a single genome and examined the sequence features of these acquired genes. The analysis shows that few transfer events are overlooked by compositional analyses. Most observed lateral gene transfers do not correspond to free exchange of regular genes among bacterial genomes, but more probably represent the constituents of phages or other selfish elements. CONCLUSIONS Although bacteria tend to acquire large amounts of DNA, the origin of these genes remains obscure. We have shown that contrary to what is often supposed, their composition cannot be explained by a previous genomic context. In contrast, these genes fit the description of recently described genes in lambdoid phages, named 'morons'. Therefore, results from genome content and compositional approaches to detect lateral transfers should not be cited as evidence for genetic exchange between distantly related bacteria.
Affiliation(s)
- Vincent Daubin
- Laboratoire de Biométrie et Biologie Evolutive, UMR CNRS 5558, Université Claude Bernard - Lyon 1, Cedex, France.
|
17
|
Abstract
The genomes of most newly sequenced organisms contain a significant fraction of ORFs (open reading frames) that match no other sequence in the databases. We refer to these singleton ORFs as sequence ORFans. Because little can be learned about ORFans by homology, the origin and functions of ORFans remain a mystery. However, in this era of full genome sequencing, it seems that ORFans have been underemphasized. In this minireview, we draw attention to the increasing number of ORFans and to the consequences of this growth to biological research in the postgenomic era.
Affiliation(s)
- Naomi Siew
- Department of Chemistry, Department of Computer Science, Ben Gurion University, Beer-Sheva 84105, Israel
|
18
|
Monchois V, Abergel C, Sturgis J, Jeudy S, Claverie JM. Escherichia coli ykfE ORFan gene encodes a potent inhibitor of C-type lysozyme. J Biol Chem 2001; 276:18437-41. [PMID: 11278658 DOI: 10.1074/jbc.m010297200] [Citation(s) in RCA: 90] [Impact Index Per Article: 3.8] [Indexed: 12/25/2022]
Abstract
The complete nucleotide sequences of over 37 microbial and three eukaryotic genomes are already publicly available, and more sequencing is in progress. Despite this accumulation of data, newly sequenced microbial genomes continue to reveal up to 50% of functionally uncharacterized "anonymous" genes. A majority of these anonymous proteins have homologues in other organisms, whereas the rest exhibit no clear similarity to any other sequence in the databases. This set of unique, apparently species-specific sequences is referred to as ORFans. The biochemical and structural analysis of ORFan gene products is of both evolutionary and functional interest. Here we report the cloning and expression of the Escherichia coli ORFan gene ykfE and the functional characterization of the encoded protein. Under physiological conditions, the protein is a homodimer with a strong affinity for C-type lysozyme, as revealed by co-purification and co-crystallization. Activity measurements and fluorescence studies demonstrated that the YkfE gene product is a potent C-type lysozyme inhibitor (Ki ≈ 1 nM). To denote this newly assigned function, ykfE has now been registered under the new gene name Ivy (inhibitor of vertebrate lysozyme) at the E. coli Genetic Stock Center.
Affiliation(s)
- V Monchois
- Information Génétique et Structurale, UMR1889 CNRS-AVENTIS and Laboratoire d'Ingénierie des Systèmes Macromoléculaires, UPR 9027, 31 Chemin Joseph Aiguier, 13402 Marseille, CEDEX 20, France.
|
19
|
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2001. [PMCID: PMC2447185 DOI: 10.1002/cfg.55] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2022]
|