1
|
Meng W, Kong L, Abulizi A, Cong J, Sun Z, Chang Y. Sex determination factor, a novel male-linked gene in the sea cucumber Apostichopus japonicus: Molecular characterization, expression patterns and effects of gene knockdown. Comp Biochem Physiol B Biochem Mol Biol 2025; 277:111071. [PMID: 39778676 DOI: 10.1016/j.cbpb.2025.111071] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2024] [Revised: 01/05/2025] [Accepted: 01/05/2025] [Indexed: 01/11/2025]
Abstract
Apostichopus japonicus is a highly significant marine aquaculture species. Research findings have indicated that male sea cucumbers demonstrate a more rapid growth rate compared to females, underscoring the potential advantages of establishing an all-male population. In this study, we identified a specific protein-coding gene (ORFan) within a 4565 bp male fragment and named it sex determination factor (sdf). The sdf transcript exhibited ubiquitous expression in various adult male tissues, along with dynamic expression patterns in the testis across different developmental stages. Notably, knockdown of the sdf gene through immersion of embryos in its specific vivo-morpholino oligomers (vivo-MO) resulted in significant changes in the expression levels of several sex-related genes including piwi1, vasa, foxl2, and DNMT3. Additionally, a transcriptomic analysis showed that sdf knockdown resulted in significant alterations in multiple biological processes encompassing various sex-related gene ontology terms such as male gonad development, ovarian follicle development, and steroidogenesis. These results provide a molecular foundation for comprehending ORFans in sea cucumbers while offering a valuable method for gene knockdown studies in echinoderms.
Affiliation(s)
- Weihan Meng
  - Key Laboratory of Mariculture & Stock Enhancement in North China's Sea, Ministry of Agriculture and Rural Affairs, Dalian Ocean University, Dalian 116023, China
- Lingnan Kong
  - Key Laboratory of Mariculture & Stock Enhancement in North China's Sea, Ministry of Agriculture and Rural Affairs, Dalian Ocean University, Dalian 116023, China
- Abudula Abulizi
  - Key Laboratory of Mariculture & Stock Enhancement in North China's Sea, Ministry of Agriculture and Rural Affairs, Dalian Ocean University, Dalian 116023, China
- Jingjing Cong
  - Key Laboratory of Mariculture & Stock Enhancement in North China's Sea, Ministry of Agriculture and Rural Affairs, Dalian Ocean University, Dalian 116023, China; School of Life Science, Liaoning Normal University, Dalian 116029, China
- Zhihui Sun
  - Key Laboratory of Mariculture & Stock Enhancement in North China's Sea, Ministry of Agriculture and Rural Affairs, Dalian Ocean University, Dalian 116023, China
- Yaqing Chang
  - Key Laboratory of Mariculture & Stock Enhancement in North China's Sea, Ministry of Agriculture and Rural Affairs, Dalian Ocean University, Dalian 116023, China
|
2
|
Orphan Genes Shared by Pathogenic Genomes Are More Associated with Bacterial Pathogenicity. mSystems 2019; 4:mSystems00290-18. [PMID: 30801025 PMCID: PMC6372840 DOI: 10.1128/msystems.00290-18] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 11/15/2018] [Accepted: 01/08/2019] [Indexed: 11/20/2022]
Abstract
Recent pangenome analyses of numerous bacterial species have suggested that each genome of a single species may have a significant fraction of its gene content unique or shared by a very few genomes (i.e., ORFans). We selected nine bacterial genera, each containing at least five pathogenic and five nonpathogenic genomes, to compare their ORFans in relation to pathogenicity-related genes. Pathogens in these genera are known to cause a number of common and devastating human diseases such as pneumonia, diphtheria, melioidosis, and tuberculosis. Thus, they are worthy of in-depth systems microbiology investigations, including the comparative study of ORFans between pathogens and nonpathogens. We provide direct evidence to suggest that ORFans shared by more pathogens are more associated with pathogenicity-related genes and thus are more important targets for development of new diagnostic markers or therapeutic drugs for bacterial infectious diseases. Orphan genes (also known as ORFans [i.e., orphan open reading frames]) are new genes that enable an organism to adapt to its specific living environment. Our focus in this study is to compare ORFans between pathogens (P) and nonpathogens (NP) of the same genus. Using the pangenome idea, we have identified 130,169 ORFans in nine bacterial genera (505 genomes) and classified these ORFans into four groups: (i) SS-ORFans (P), which are only found in a single pathogenic genome; (ii) SS-ORFans (NP), which are only found in a single nonpathogenic genome; (iii) PS-ORFans (P), which are found in multiple pathogenic genomes; and (iv) NS-ORFans (NP), which are found in multiple nonpathogenic genomes. Within the same genus, pathogens do not always have more genes, more ORFans, or more pathogenicity-related genes (PRGs)—including prophages, pathogenicity islands (PAIs), virulence factors (VFs), and horizontal gene transfers (HGTs)—than nonpathogens. 
Interestingly, in pathogens of the nine genera, the percentages of PS-ORFans are consistently higher than those of SS-ORFans, which is not true in nonpathogens. Similarly, in pathogens of the nine genera, the percentages of PS-ORFans matching the four types of PRGs are also always higher than those of SS-ORFans, but this is not true in nonpathogens. All of these findings suggest the greater importance of PS-ORFans for bacterial pathogenicity.
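The four-group classification described in this abstract can be sketched as a small function. This is only an illustrative sketch, not the authors' pipeline: the function name `classify_orfan`, the genome identifiers, and the choice to return `None` for gene families found in both pathogenic and nonpathogenic genomes are assumptions made for the example.

```python
from typing import Dict, Optional, Set


def classify_orfan(genomes_with_gene: Set[str],
                   is_pathogen: Dict[str, bool]) -> Optional[str]:
    """Assign one ORFan family to a group based on the genomes that carry it.

    genomes_with_gene: hypothetical IDs of genomes (within one genus) containing the ORFan.
    is_pathogen: hypothetical lookup, genome ID -> True for pathogen (P), False for nonpathogen (NP).
    """
    if not genomes_with_gene:
        raise ValueError("an ORFan must occur in at least one genome")

    statuses = {is_pathogen[g] for g in genomes_with_gene}
    if len(statuses) > 1:
        # Assumption: families shared by P and NP genomes fall outside the four groups.
        return None

    pathogenic = statuses.pop()
    if len(genomes_with_gene) == 1:
        # Found in a single genome only.
        return "SS-ORFan (P)" if pathogenic else "SS-ORFan (NP)"
    # Found in multiple genomes of the same class.
    return "PS-ORFan (P)" if pathogenic else "NS-ORFan (NP)"
```

Given such labels for every ORFan in a genus, the percentages of PS-ORFans and SS-ORFans compared in the abstract would follow from simple counts per group.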
|
3
|
Mitić NS, Malkov SN, Kovačević JJ, Pavlović-Lažetić GM, Beljanski MV. Structural disorder of plasmid-encoded proteins in Bacteria and Archaea. BMC Bioinformatics 2018; 19:158. [PMID: 29699482 PMCID: PMC5922023 DOI: 10.1186/s12859-018-2158-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/06/2017] [Accepted: 04/16/2018] [Indexed: 01/30/2023]
Abstract
Background In the last decade and a half it has been firmly established that a large number of proteins do not adopt a well-defined (ordered) structure under physiological conditions. Such intrinsically disordered proteins (IDPs) and intrinsically disordered (protein) regions (IDRs) are involved in essential cell processes through two basic mechanisms: the entropic chain mechanism which is responsible for rapid fluctuations among many alternative conformations, and molecular recognition via short recognition elements that bind to other molecules. IDPs possess a high adaptive potential and there is special interest in investigating their involvement in organism evolution. Results We analyzed 2554 Bacterial and 139 Archaeal proteomes, with a total of 8,455,194 proteins for disorder content and its implications for adaptation of organisms, using three disorder predictors and three measures. Along with other findings, we revealed that for all three predictors and all three measures (1) Bacteria exhibit significantly more disorder than Archaea; (2) plasmid-encoded proteins contain considerably more IDRs than proteins encoded on chromosomes (or whole genomes) in both prokaryote superkingdoms; (3) plasmid proteins are significantly more disordered than chromosomal proteins only in the group of proteins with no COG category assigned; (4) antitoxin proteins in comparison to other proteins, are the most disordered (almost double) in both Bacterial and Archaeal proteomes; (5) plasmidal proteins are more disordered than chromosomal proteins in Bacterial antitoxins and toxin-unclassified proteins, but have almost the same disorder content in toxin proteins. Conclusion Our results suggest that while disorder content depends on genome and proteome characteristics, it is more influenced by functional engagements than by gene location (on chromosome or plasmid). 
Electronic supplementary material The online version of this article (10.1186/s12859-018-2158-6) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Nenad S Mitić
  - Department of Computer Science, Faculty of Mathematics, University of Belgrade, P.O.B. 550 Studentski trg 16, Belgrade, 11001, Serbia
- Saša N Malkov
  - Department of Computer Science, Faculty of Mathematics, University of Belgrade, P.O.B. 550 Studentski trg 16, Belgrade, 11001, Serbia
- Jovana J Kovačević
  - Department of Computer Science, Faculty of Mathematics, University of Belgrade, P.O.B. 550 Studentski trg 16, Belgrade, 11001, Serbia
- Gordana M Pavlović-Lažetić
  - Department of Computer Science, Faculty of Mathematics, University of Belgrade, P.O.B. 550 Studentski trg 16, Belgrade, 11001, Serbia
- Miloš V Beljanski
  - Bio-lab, Institute of General and Physical Chemistry, P.O.B. 45, Studentski trg 12/V, Belgrade, 11001, Serbia
|
4
|
Pinos S, Pontarotti P, Raoult D, Merhej V. Identification of constraints influencing the bacterial genomes evolution in the PVC super-phylum. BMC Evol Biol 2017; 17:75. [PMID: 28274202 PMCID: PMC5343374 DOI: 10.1186/s12862-017-0921-3] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/28/2016] [Accepted: 02/21/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND Horizontal transfer plays an important role in the evolution of bacterial genomes, yet it obeys several constraints, including the ecological opportunity to meet other organisms, the presence of transfer systems, and the fitness of the transferred genes. Bacteria from the Planctomycetes, Verrucomicrobia, Chlamydiae (PVC) super-phylum have a compartmentalized cell plan delimited by an intracytoplasmic membrane that might constitute an additional constraint with particular impact on bacterial evolution. In this investigation, we studied the evolution of 33 genomes from PVC species and focused on the rate and the nature of horizontally transferred sequences in relation to their habitat and their cell plan. RESULTS Using a comparative phylogenomic approach, we showed that habitat influences the evolution of the bacterial genome's content and the flux of horizontal transfer of DNA (HT). Thus bacteria from soil, from insects and ubiquitous bacteria presented the highest average of horizontal transfer compared to bacteria living in water, extracellular bacteria in vertebrates, bacteria from amoeba and intracellular bacteria in vertebrates (with a mean of 379 versus 110 events per species, respectively, and 7.6% of each genome due to HT versus 4.8%). The partners of these transfers were mainly bacterial organisms (94.9%); they allowed us to differentiate environmental bacteria, which exchanged more with Proteobacteria, and bacteria from vertebrates, which exchanged more with Firmicutes. The functional analysis of the horizontal transfers revealed a convergent evolution, with an over-representation of genes encoding membrane biogenesis and lipid metabolism functions, among compartmentalized bacteria in the different habitats. CONCLUSIONS The presence of an intracytoplasmic membrane in PVC species seems to affect the genome's evolution through the selection of transferred DNA, according to their encoded functions.
Affiliation(s)
- Sandrine Pinos
  - Aix Marseille Université, CNRS, Centrale Marseille, I2M UMR 7373, Evolution Biologique et Modélisation, 3 place Victor Hugo, Marseille, 13331 France
  - Aix Marseille Univ, CNRS, IRD, INSERM, AP-HM URMITE, IHU-Méditerranée Infection, 19-21 Boulevard Jean Moulin, Marseille, 13005 France
- Pierre Pontarotti
  - Aix Marseille Université, CNRS, Centrale Marseille, I2M UMR 7373, Evolution Biologique et Modélisation, 3 place Victor Hugo, Marseille, 13331 France
- Didier Raoult
  - Aix Marseille Univ, CNRS, IRD, INSERM, AP-HM URMITE, IHU-Méditerranée Infection, 19-21 Boulevard Jean Moulin, Marseille, 13005 France
- Vicky Merhej
  - Aix Marseille Univ, CNRS, IRD, INSERM, AP-HM URMITE, IHU-Méditerranée Infection, 19-21 Boulevard Jean Moulin, Marseille, 13005 France
|
5
|
Zhu Q, Kosoy M, Dittmar K. HGTector: an automated method facilitating genome-wide discovery of putative horizontal gene transfers. BMC Genomics 2014; 15:717. [PMID: 25159222 PMCID: PMC4155097 DOI: 10.1186/1471-2164-15-717] [Citation(s) in RCA: 101] [Impact Index Per Article: 9.2] [Received: 05/11/2014] [Accepted: 08/20/2014] [Indexed: 11/23/2022]
Abstract
Background First-pass methods based on BLAST matches are commonly used as an initial step to separate the different phylogenetic histories of genes in microbial genomes, and target putative horizontal gene transfer (HGT) events. This will continue to be necessary given the rapid growth of genomic data and the technical difficulties in conducting large-scale explicit phylogenetic analyses. However, these methods often produce misleading results due to their inability to resolve indirect phylogenetic links and their vulnerability to stochastic events. Results A new computational method of rapid, exhaustive and genome-wide detection of HGT was developed, featuring the systematic analysis of BLAST hit distribution patterns in the context of a priori defined hierarchical evolutionary categories. Genes that fall beyond a series of statistically determined thresholds are identified as not adhering to the typical vertical history of the organisms in question, but instead having a putative horizontal origin. Tests on simulated genomic data suggest that this approach effectively targets atypically distributed genes that are highly likely to be HGT-derived, and exhibits robust performance compared to conventional BLAST-based approaches. This method was further tested on real genomic datasets, including Rickettsia genomes, and was compared to previous studies. Results show consistency with currently employed categories of HGT prediction methods. In-depth analysis of both simulated and real genomic data suggests that the method is notably insensitive to stochastic events such as gene loss, rate variation and database error, which are common challenges to the current methodology. An automated pipeline was created to implement this approach and was made publicly available at: https://github.com/DittmarLab/HGTector. The program is versatile, easily deployed, and has a low requirement for computational resources.
Conclusions HGTector is an effective tool for initial or standalone large-scale discovery of candidate HGT-derived genes. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-15-717) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Qiyun Zhu
- Department of Biological Sciences, University at Buffalo, State University of New York, 109 Cooke Hall, Buffalo, NY 14260, USA.
|
6
|
Milani L, Ghiselli F, Guerra D, Breton S, Passamonti M. A comparative analysis of mitochondrial ORFans: new clues on their origin and role in species with doubly uniparental inheritance of mitochondria. Genome Biol Evol 2013; 5:1408-34. [PMID: 23824218 PMCID: PMC3730352 DOI: 10.1093/gbe/evt101] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.4] [Indexed: 12/14/2022]
Abstract
Despite numerous comparative mitochondrial genomics studies revealing that animal mitochondrial genomes are highly conserved in terms of gene content, supplementary genes are sometimes found, often arising from gene duplication. Mitochondrial ORFans (ORFs having no detectable homology and unknown function) were found in bivalve molluscs with Doubly Uniparental Inheritance (DUI) of mitochondria. In DUI animals, two mitochondrial lineages are present: one transmitted through females (F-type) and the other through males (M-type), each showing a specific and conserved ORF. The analysis of 34 mitochondrial major Unassigned Regions of Musculista senhousia F- and M-mtDNA allowed us to verify the presence of novel mitochondrial ORFs in this species and to compare them with ORFs from other species with ascertained DUI, with other bivalves and with animals showing new mitochondrial elements. Overall, 17 ORFans from nine species were analyzed for structure and function. Many clues suggest that the analyzed ORFans arose from endogenization of viral genes. The co-option of such novel genes by viral hosts may have determined some evolutionary aspects of host life cycle, possibly involving mitochondria. The structure similarity of DUI ORFans within evolutionary lineages may also indicate that they originated from independent events. If these novel ORFs are in some way linked to DUI establishment, a multiple origin of DUI has to be considered. These putative proteins may have a role in the maintenance of sperm mitochondria during embryo development, possibly masking them from the degradation processes that normally affect sperm mitochondria in species with strictly maternal inheritance.
Affiliation(s)
- Liliana Milani
- Dipartimento di Scienze Biologiche, Geologiche ed Ambientali, University of Bologna, Bologna, Italy.
|
7
|
Abstract
The origin and evolution of "ORFans" (suspected genes without known relatives) remain unclear. Here, we take advantage of a unique opportunity to examine the population diversity of thousands of ORFans, based on a collection of 35 complete genomes of isolates of Escherichia coli and Shigella (which is included phylogenetically within E. coli). As expected from previous studies, ORFans are shorter and AT-richer in sequence than non-ORFans. We find that ORFans often are very narrowly distributed: the most common pattern is for an ORFan to be found in only one genome. We compared within-species population diversity of ORFan genes with those of two control groups of non-ORFan genes. Patterns of population variation suggest that most ORFans are not artifacts, but encode real genes whose protein-coding capacity is conserved, reflecting selection against nonsynonymous mutations. Nevertheless, nonsynonymous nucleotide diversity is higher than for non-ORFans, whereas synonymous diversity is roughly the same. In particular, there is a several-fold excess of ORFans in the highest decile of diversity relative to controls, which might be due to weaker purifying selection, positive selection, or a subclass of ORFans that are decaying.
Affiliation(s)
- Guoqin Yu
- Institute for Bioscience and Biotechnology Research, University of Maryland, USA.
|
8
|
Halachev MR, Loman NJ, Pallen MJ. Calculating orthologs in bacteria and Archaea: a divide and conquer approach. PLoS One 2011; 6:e28388. [PMID: 22174796 PMCID: PMC3236195 DOI: 10.1371/journal.pone.0028388] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 09/07/2011] [Accepted: 11/07/2011] [Indexed: 12/27/2022]
Abstract
Among proteins, orthologs are defined as those that are derived by vertical descent from a single progenitor in the last common ancestor of their host organisms. Our goal is to compute a complete set of protein orthologs derived from all currently available complete bacterial and archaeal genomes. Traditional approaches typically rely on all-against-all BLAST searching which is prohibitively expensive in terms of hardware requirements or computational time (requiring an estimated 18 months or more on a typical server). Here, we present xBASE-Orth, a system for ongoing ortholog annotation, which applies a "divide and conquer" approach and adopts a pragmatic scheme that trades accuracy for speed. Starting at species level, xBASE-Orth carefully constructs and uses pan-genomes as proxies for the full collections of coding sequences at each level as it progressively climbs the taxonomic tree using the previously computed data. This leads to a significant decrease in the number of alignments that need to be performed, which translates into faster computation, making ortholog computation possible on a global scale. Using xBASE-Orth, we analyzed an NCBI collection of 1,288 bacterial and 94 archaeal complete genomes with more than 4 million coding sequences in 5 weeks and predicted more than 700 million ortholog pairs, clustered in 175,531 orthologous groups. We have also identified sets of highly conserved bacterial and archaeal orthologs and in so doing have highlighted anomalies in genome annotation and in the proposed composition of the minimal bacterial genome. In summary, our approach allows for scalable and efficient computation of the bacterial and archaeal ortholog annotations. In addition, due to its hierarchical nature, it is suitable for incorporating novel complete genomes and alternative genome annotations. 
The computed ortholog data and a continuously evolving set of applications based on it are integrated in the xBASE database, available at http://www.xbase.ac.uk/.
Affiliation(s)
- Mihail R. Halachev
  - School of Biosciences, University of Birmingham, Birmingham, United Kingdom
- Nicholas J. Loman
  - School of Biosciences, University of Birmingham, Birmingham, United Kingdom
- Mark J. Pallen
  - School of Biosciences, University of Birmingham, Birmingham, United Kingdom
|
9
|
Hirtreiter AM, Calloni G, Forner F, Scheibe B, Puype M, Vandekerckhove J, Mann M, Hartl FU, Hayer-Hartl M. Differential substrate specificity of group I and group II chaperonins in the archaeon Methanosarcina mazei. Mol Microbiol 2009; 74:1152-68. [PMID: 19843217 DOI: 10.1111/j.1365-2958.2009.06924.x] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Indexed: 11/29/2022]
Abstract
Chaperonins are macromolecular machines that assist in protein folding. The archaeon Methanosarcina mazei has acquired numerous bacterial genes by horizontal gene transfer. As a result, both the bacterial group I chaperonin, GroEL, and the archaeal group II chaperonin, thermosome, coexist. A proteome-wide analysis of chaperonin interactors was performed to determine the differential substrate specificity of GroEL and thermosome. At least 13% of soluble M. mazei proteins interact with chaperonins, with the two systems having partially overlapping substrate sets. Remarkably, chaperonin selectivity is independent of phylogenetic origin and is determined by distinct structural and biochemical features of proteins. GroEL prefers well-conserved proteins with complex alpha/beta domains. In contrast, thermosome substrates comprise a group of faster-evolving proteins and contain a much wider range of different domain folds, including small all-alpha and all-beta modules, and a greater number of large multidomain proteins. Thus, the group II chaperonins may have facilitated the evolution of the highly complex proteomes characteristic of eukaryotic cells.
Affiliation(s)
- Angela M Hirtreiter
- Department of Cellular Biochemistry, Max Planck Institute of Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany
|
10
|
Abstract
ORFan genes can constitute a large fraction of a bacterial genome, but due to their lack of homologs, their functions have remained largely unexplored. To determine if particular features of ORFan-encoded proteins promote their presence in a genome, we analyzed properties of ORFans that originated over a broad evolutionary timescale. We also compared ORFan genes to another class of acquired genes, heterogeneous occurrence in prokaryotes (HOPs), which have homologs in other bacteria. A total of 54 ORFan and HOP genes selected from different phylogenetic depths in the Escherichia coli lineage were cloned, expressed, purified, and subjected to circular dichroism (CD) spectroscopy. A majority of genes could be expressed, but only 18 yielded sufficient soluble protein for spectral analysis. Of these, half were significantly alpha-helical, three were predominantly beta-sheet, and six were of intermediate/indeterminate structure. Although a higher proportion of HOPs yielded soluble proteins with resolvable secondary structures, ORFans resembled HOPs with regard to most of the other features tested. Overall, we found that those ORFan and HOP genes that have persisted in the E. coli lineage were more likely to encode soluble and folded proteins, more likely to display environmental modulation of their gene expression, and by extrapolation, are more likely to be functional.
Affiliation(s)
- Hema Prasad Narra
- Department of Biochemistry & Molecular Biophysics, University of Arizona, Tucson, AZ 85721, USA
|
11
|
Podell S, Gaasterland T, Allen EE. A database of phylogenetically atypical genes in archaeal and bacterial genomes, identified using the DarkHorse algorithm. BMC Bioinformatics 2008; 9:419. [PMID: 18840280 PMCID: PMC2573894 DOI: 10.1186/1471-2105-9-419] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Received: 05/14/2008] [Accepted: 10/07/2008] [Indexed: 01/30/2023]
Abstract
Background The process of horizontal gene transfer (HGT) is believed to be widespread in Bacteria and Archaea, but little comparative data is available addressing its occurrence in complete microbial genomes. Collection of high-quality, automated HGT prediction data based on phylogenetic evidence has previously been impractical for large numbers of genomes at once, due to prohibitive computational demands. DarkHorse, a recently described statistical method for discovering phylogenetically atypical genes on a genome-wide basis, provides a means to solve this problem through lineage probability index (LPI) ranking scores. LPI scores inversely reflect phylogenetic distance between a test amino acid sequence and its closest available database matches. Proteins with low LPI scores are good horizontal gene transfer candidates; those with high scores are not. Description The DarkHorse algorithm has been applied to 955 microbial genome sequences, and the results organized into a web-searchable relational database, called the DarkHorse HGT Candidate Resource . Users can select individual genomes or groups of genomes to screen by LPI score, search for protein functions by descriptive annotation or amino acid sequence similarity, or select proteins with unusual G+C composition in their underlying coding sequences. The search engine reports LPI scores for match partners as well as query sequences, providing the opportunity to explore whether potential HGT donor sequences are phylogenetically typical or atypical within their own genomes. This information can be used to predict whether or not sufficient information is available to build a well-supported phylogenetic tree using the potential donor sequence. 
Conclusion The DarkHorse HGT Candidate database provides a powerful, flexible set of tools for identifying phylogenetically atypical proteins, allowing researchers to explore both individual HGT events in single genomes, and large-scale HGT patterns among protein families and genome groups. Although the DarkHorse algorithm cannot, by itself, provide definitive proof of horizontal gene transfer, it is a flexible, powerful tool that can be combined with slower, more rigorous methods in situations where these other methods could not otherwise be applied.
Affiliation(s)
- Sheila Podell
- Marine Biology Research Division, Scripps Institution of Oceanography, University of California at San Diego, La Jolla, CA 92093 USA.
|
12
|
Yin Y, Fischer D. Identification and investigation of ORFans in the viral world. BMC Genomics 2008; 9:24. [PMID: 18205946 PMCID: PMC2245933 DOI: 10.1186/1471-2164-9-24] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.6] [Received: 03/09/2007] [Accepted: 01/19/2008] [Indexed: 11/10/2022]
Abstract
BACKGROUND Genome-wide studies have already shed light on the evolution and enormous diversity of the viral world. Nevertheless, one of the unresolved mysteries in comparative genomics today is the abundance of ORFans - ORFs with no detectable sequence similarity to any other ORF in the databases. Recently, studies attempting to understand the origin and functions of bacterial ORFans have been reported. Here we present a first genome-wide identification and analysis of ORFans in the viral world, with a focus on bacteriophages. RESULTS Almost one-third of all ORFs in 1,456 complete virus genomes correspond to ORFans, a figure significantly larger than that observed in prokaryotes. Like prokaryotic ORFans, viral ORFans are shorter and have a lower GC content than non-ORFans. Nevertheless, a statistically significantly lower GC content is found in only a minority of viruses. By focusing on phages, we find that 38.4% of phage ORFs have no homologs in other phages, and 30.1% have no homologs in either the viral or the prokaryotic world. Phages with different host ranges have different percentages of ORFans, reflecting different sampling status and suggesting various diversities. Similarity searches of the phage ORFeome (ORFans and non-ORFans) against prokaryotic genomes show that almost half of the phage ORFs have prokaryotic homologs, suggesting the major role that horizontal transfer plays in bacterial evolution. Surprisingly, the percentage of phage ORFans with prokaryotic homologs is only 18.7%. This suggests that phage ORFans play a lesser role in horizontal transfer to prokaryotes, but may be among the major players contributing to the vast phage diversity. CONCLUSION Although the current sampling of viral genomes is extremely low, ORFans and near-ORFans are likely to continue to grow in number as more genomes are sequenced.
The abundance of phage ORFans may be partially due to the expected vast viral diversity, and may be instrumental in understanding viral evolution. The functions, origins and fates of the majority of viral ORFans remain a mystery. Further computational and experimental studies are likely to shed light on the mechanisms that have given rise to so many bacterial and viral ORFans.
Affiliation(s)
- Yanbin Yin
- Computer Science and Engineering Dept, 201 Bell Hall, University at Buffalo, Buffalo, NY 14260-2000, USA.
13
Jékely G. Did the last common ancestor have a biological membrane? Biol Direct 2006; 1:35. [PMID: 17129384 PMCID: PMC1675992 DOI: 10.1186/1745-6150-1-35] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Received: 11/23/2006] [Accepted: 11/27/2006] [Indexed: 12/24/2022] Open
Abstract
All theories about the origin and evolution of membrane bound cells necessarily have to cope with the nature of the last common ancestor of cellular life. One of the most important aspects of this ancestor, whether it had a closed biological membrane or not, has recently been intensely debated. Having a consensus about it would be an important step towards an eventual (though probably still remote) synthesis of the best elements of the current multitude of cell evolution models. Here I analyse the structural and functional conservation of the few universally distributed proteins that were undoubtedly present in the last common ancestor and that carry out membrane-associated functions. These include the SecY subunit of the protein-conducting channel, the signal recognition particle, the signal recognition particle receptor, the signal peptidase, and the proton ATPase. The conserved structural and functional aspects of these proteins indicate that the last common ancestor was associated with a hydrophobic layer with two hydrophilic sides (an inside and an outside) that had a full-fledged and asymmetric protein insertion and translocation machinery and served as a permeability barrier for protons and other small molecules. It is difficult to escape the conclusion that the last common ancestor had a closed biological membrane from which all cellular membranes evolved.
Affiliation(s)
- Gáspár Jékely
- European Molecular Biology Laboratory, Meyerhofstrasse 1, Heidelberg, Germany.
14
Comas I, Moya A, Azad RK, Lawrence JG, Gonzalez-Candelas F. The evolutionary origin of Xanthomonadales genomes and the nature of the horizontal gene transfer process. Mol Biol Evol 2006; 23:2049-2057. [PMID: 16882701 DOI: 10.1093/molbev/msl075] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.9] [Indexed: 02/07/2023] Open
Abstract
Determining the influence of horizontal gene transfer (HGT) on phylogenomic analyses and the retrieval of a tree of life is relevant for our understanding of microbial genome evolution. It is particularly difficult to differentiate between phylogenetic incongruence due to noise and that resulting from HGT. We have performed a large-scale, detailed evolutionary analysis of the different phylogenetic signals present in the genomes of Xanthomonadales, a group of Proteobacteria. We show that the presence of phylogenetic noise is not an obstacle to inferring past and present HGTs during their evolution. The scenario derived from this analysis and other recently published reports reflect the confounding effects on bacterial phylogenomics of past and present HGT. Although transfers between closely related species are difficult to detect in genome-scale phylogenetic analyses, past transfers to the ancestor of extant groups appear as conflicting signals that occasionally might make it impossible to determine the evolutionary origin of the whole genome.
Affiliation(s)
- Iñaki Comas
- Instituto Cavanilles de Biodiversidad y Biología Evolutiva, Universidad de Valencia, Valencia, Spain.
15
Yin Y, Fischer D. On the origin of microbial ORFans: quantifying the strength of the evidence for viral lateral transfer. BMC Evol Biol 2006; 6:63. [PMID: 16914045 PMCID: PMC1559721 DOI: 10.1186/1471-2148-6-63] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.5] [Received: 04/10/2006] [Accepted: 08/16/2006] [Indexed: 11/10/2022] Open
Abstract
Background: The origin of microbial ORFans, ORFs having no detectable homology to other ORFs in the databases, is one of the unexplained puzzles of the post-genomic era. Several hypotheses on the origin of ORFans have been suggested in the last few years, most of which are based on selected, relatively small, subsets of ORFans. One of the hypotheses for the origin of ORFans is that they have been acquired through lateral transfer from viruses. Here we carry out a comprehensive, genome-wide study on the origins of ORFans to quantify the strength of current evidence supporting this hypothesis. Results: We performed similarity searches by querying all current ORFans against the public virus protein database. Surprisingly, we found that only 2.8% of all microbial ORFans have detectable homologs in viruses, while the percentage of non-ORFans with detectable homologs in viruses is 7.9%, a significantly higher figure. This suggests that the current evidence for the origin of ORFans from lateral transfer from viruses is at best weak. However, an analysis of individual genomes revealed a number of organisms with much higher percentages, many of them belonging to the Firmicutes and Gamma-proteobacteria. We provide evidence suggesting that the current virus database may be biased towards those viruses attacking Firmicutes and Gamma-proteobacteria. Conclusion: We conclude that as more viral genomes are sequenced, more microbial ORFans will find homologs in viruses, but this trend may vary considerably between individual genomes. Thus, lateral transfer from viruses alone is unlikely to explain the origin of the majority of ORFans in the majority of prokaryotes and consequently, other, not necessarily exclusive, mechanisms are likely to better explain the origin of the increasing number of ORFans.
Affiliation(s)
- Yanbin Yin
- Computer Science and Engineering Dept. 201 Bell Hall, University at Buffalo, Buffalo, NY 14260-2000, US
- Daniel Fischer
- Computer Science and Engineering Dept. 201 Bell Hall, University at Buffalo, Buffalo, NY 14260-2000, US
- Bioinformatics/Dept. of Computer Science, Ben Gurion University, Beer-Sheva 84015, Israel
16
Tanaka N, Abe T, Miyazaki S, Sugawara H. G-InforBIO: integrated system for microbial genomics. BMC Bioinformatics 2006; 7:368. [PMID: 16887044 PMCID: PMC1552091 DOI: 10.1186/1471-2105-7-368] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 03/28/2006] [Accepted: 08/04/2006] [Indexed: 11/10/2022] Open
Abstract
Background: Genome databases contain diverse kinds of information, including gene annotations and nucleotide and amino acid sequences. It is not easy to integrate such information for genomic study. Because there are few tools for integrated analyses of genomic data, we developed software that enables users to handle, manipulate, and analyze genome data with a variety of sequence analysis programs. Results: The G-InforBIO system is a novel tool for genome data management and sequence analysis. The system can import genome data encoded as eXtensible Markup Language documents as formatted text documents, including annotations and sequences, from DNA Data Bank of Japan and GenBank encoded as flat files. The genome database is constructed automatically after importing, and the database can be exported as documents formatted with eXtensible Markup Language or tab-delimited text. Users can retrieve data from the database by keyword searches, edit annotation data of genes, and process data with G-InforBIO. In addition, information in the G-InforBIO database can be analyzed seamlessly with nine different software programs, including programs for clustering and homology analyses. Conclusion: The G-InforBIO system simplifies genome analyses by integrating several available software programs to allow efficient handling and manipulation of genome data. G-InforBIO is freely available from the download site.
Affiliation(s)
- Naoto Tanaka
- Center for Information Biology and DDBJ, National Institute of Genetics 1111 Yata, Mishima, Shizuoka 411-8540, Japan
- Institute for Bioinformatics Research and Development (BIRD), Japan Science and Technology Corporation (JST), 5-3 Yonbancho, Chiyoda-ku, Tokyo 102-8666, Japan
- Laboratory of Information Biology, Faculty of Pharmaceutical Science, Tokyo University of Science, 2641 Yamazaki, Noda, Chiba 278-8510, Japan
- Takashi Abe
- Center for Information Biology and DDBJ, National Institute of Genetics 1111 Yata, Mishima, Shizuoka 411-8540, Japan
- SOKENDAI, Hayama, Kanagawa 240-0193, Japan
- Satoru Miyazaki
- Laboratory of Information Biology, Faculty of Pharmaceutical Science, Tokyo University of Science, 2641 Yamazaki, Noda, Chiba 278-8510, Japan
- Hideaki Sugawara
- Center for Information Biology and DDBJ, National Institute of Genetics 1111 Yata, Mishima, Shizuoka 411-8540, Japan
- SOKENDAI, Hayama, Kanagawa 240-0193, Japan
17
Gophna U, Charlebois RL, Doolittle WF. Ancient lateral gene transfer in the evolution of Bdellovibrio bacteriovorus. Trends Microbiol 2006; 14:64-9. [PMID: 16413191 DOI: 10.1016/j.tim.2005.12.008] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.7] [Received: 08/04/2005] [Revised: 11/21/2005] [Accepted: 12/21/2005] [Indexed: 10/25/2022]
Abstract
The recently sequenced genome of the predatory delta-proteobacterium Bdellovibrio bacteriovorus provides many insights into its metabolism and evolution. Because its genes are reasonably uniform in G+C content, it was suggested that B. bacteriovorus actively resists recombination with foreign DNA and horizontal transfer of DNA from other bacteria. To investigate this further, we carried out a variety of phylogenetic and comparative genomics analyses using data from >200 microbial genomes, including several published delta-proteobacteria. Although there might be little evidence for the extensive recent transfer of genes, we demonstrate that ancient lateral gene acquisition has shaped the B. bacteriovorus genome to a great extent.
Affiliation(s)
- Uri Gophna
- Department of Molecular Microbiology and Biotechnology, The George S. Wise Faculty of Life Sciences, Tel-Aviv University, Tel-Aviv 69978, Israel.
18
Mazumder R, Natale DA, Murthy S, Thiagarajan R, Wu CH. Computational identification of strain-, species- and genus-specific proteins. BMC Bioinformatics 2005; 6:279. [PMID: 16305751 PMCID: PMC1310627 DOI: 10.1186/1471-2105-6-279] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Received: 04/28/2005] [Accepted: 11/23/2005] [Indexed: 11/14/2022] Open
Abstract
Background: The identification of unique proteins at different taxonomic levels has both scientific and practical value. Strain-, species- and genus-specific proteins can provide insight into the criteria that define an organism and its relationship with close relatives. Such proteins can also serve as taxon-specific diagnostic targets. Description: A pipeline using a combination of computational and manual analyses of BLAST results was developed to identify strain-, species-, and genus-specific proteins and to catalog the closest sequenced relative for each protein in a proteome. Proteins encoded by a given strain are preliminarily considered to be unique if BLAST, using a comprehensive protein database, fails to retrieve (with an e-value better than 0.001) any protein not encoded by the query strain, species or genus (for strain-, species- and genus-specific proteins respectively), or if BLAST, using the best hit as the query (reverse BLAST), does not retrieve the initial query protein. Results are manually inspected for homology if the initial query is retrieved in the reverse BLAST but is not the best hit. Sequences unlikely to retrieve homologs using the default BLOSUM62 matrix (usually short sequences) are re-tested using the PAM30 matrix, thereby increasing the number of retrieved homologs and increasing the stringency of the search for unique proteins. The above protocol was used to examine several food- and water-borne pathogens. We find that the reverse BLAST step filters out about 22% of proteins with homologs that would otherwise be considered unique at the genus and species levels. Analysis of the annotations of unique proteins reveals that many are remnants of prophage proteins, or may be involved in virulence. The data generated from this study can be accessed and further evaluated from the CUPID (Core and Unique Protein Identification) system web site (updated semi-annually) at . Conclusion: CUPID provides a set of proteins specific to a genus, species or a strain, and identifies the most closely related organism.
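The decision rule described in this abstract (forward BLAST, reverse BLAST with the best hit, and manual review of ambiguous cases) can be sketched roughly as follows. This is an illustration under stated assumptions, not the CUPID implementation: the `Hit` record, the `classify` function and the `reverse_hits_for` callback are hypothetical names, BLAST results are assumed to be already parsed, a single `taxon` label stands in for strain, species or genus, and the PAM30 re-test for short sequences is omitted.

```python
from dataclasses import dataclass

E_VALUE_CUTOFF = 0.001  # threshold stated in the abstract


@dataclass
class Hit:
    subject_id: str     # identifier of the retrieved protein
    subject_taxon: str  # strain, species or genus label of that protein
    e_value: float


def classify(query_id, query_taxon, forward_hits, reverse_hits_for):
    """Return 'unique', 'not_unique' or 'manual_review' for one query protein.

    forward_hits: list of Hit objects from BLASTing the query.
    reverse_hits_for: hypothetical callable mapping a subject id to the
        list of Hit objects obtained when that subject is used as the query.
    """
    # Step 1: significant hits that fall outside the query taxon.
    outside = [h for h in forward_hits
               if h.e_value < E_VALUE_CUTOFF and h.subject_taxon != query_taxon]
    if not outside:
        return "unique"

    # Step 2: reverse BLAST, using the best outside hit as the query.
    best = min(outside, key=lambda h: h.e_value)
    reverse = sorted((h for h in reverse_hits_for(best.subject_id)
                      if h.e_value < E_VALUE_CUTOFF),
                     key=lambda h: h.e_value)
    retrieved = [h.subject_id for h in reverse]
    if query_id not in retrieved:
        return "unique"

    # Step 3: query retrieved but not as the best hit -> inspect by hand.
    if retrieved[0] != query_id:
        return "manual_review"
    return "not_unique"
```

In this sketch the reverse search is assumed to run against a database that contains the original query protein; the abstract does not specify this detail.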
Affiliation(s)
- Raja Mazumder
- Department of Biochemistry and Molecular Biology, Georgetown University Medical Center, 3900 Reservoir Rd., NW, Washington, DC 20057-1414, USA
- Darren A Natale
- Department of Biochemistry and Molecular Biology, Georgetown University Medical Center, 3900 Reservoir Rd., NW, Washington, DC 20057-1414, USA
- Sudhir Murthy
- DCWASA-DWT, 5000 Overlook Ave., SW, Washington, DC 20032, USA
- Rathi Thiagarajan
- Department of Biochemistry and Molecular Biology, Georgetown University Medical Center, 3900 Reservoir Rd., NW, Washington, DC 20057-1414, USA
- Cathy H Wu
- Department of Biochemistry and Molecular Biology, Georgetown University Medical Center, 3900 Reservoir Rd., NW, Washington, DC 20057-1414, USA
19
Renesto P, Azza S, Dolla A, Fourquet P, Vestris G, Gorvel JP, Raoult D. Proteome analysis of Rickettsia conorii by two-dimensional gel electrophoresis coupled with mass spectrometry. FEMS Microbiol Lett 2005; 245:231-8. [PMID: 15837377 DOI: 10.1016/j.femsle.2005.03.004] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.0] [Received: 01/28/2005] [Revised: 03/04/2005] [Accepted: 03/04/2005] [Indexed: 10/25/2022] Open
Abstract
The availability of the genome sequence offers the opportunity to further expand our knowledge about proteins expressed by Rickettsia conorii, a strictly intracellular bacterium responsible for Mediterranean spotted fever. Using two-dimensional polyacrylamide gel electrophoresis combined with MALDI-TOF mass spectrometry, we established the first reference map of the R. conorii proteome. This approach also allowed identification of GroEL as the major antigen recognized by rabbit serum and sera of infected patients. Altogether, this work opens the way to characterize the proteome of R. conorii, to compare protein profiles of different isolates or of bacteria maintained under different experimental conditions and to identify immunogenic proteins as potential vaccine targets.
Affiliation(s)
- Patricia Renesto
- Unité des Rickettsies, CNRS UMR 6020, IFR-48, Faculté de Médecine, 27 Boulevard Jean Moulin, 13385 Marseille, France.
20
Abstract
The remarkable diversity in the contents of genomes raises questions about how new genes and new functions originate. Recent evidence indicates that parasitism, particularly the molecular interactions between phage and their bacterial hosts, is a likely mechanism for generating new genes. This invention of such novel functions seems to be founded on a strategy that secures the short-term survival of parasitic elements and thereby contributes to the renovation of gene repertoires in their host.
Affiliation(s)
- Vincent Daubin
- Department of Biochemistry & Molecular Biophysics, University of Arizona, Tucson, AZ 85721, USA.
21
Abstract
There are many ways to group completed genome sequences in hierarchical patterns (trees) reflecting relationships between their genes. Such groupings help us organize biological information and bear crucially on underlying processes of genome and organismal evolution. Genome trees make use of all comparable genes but can variously weight the contributions of these genes according to similarity, congruent patterns of similarity, or prevalence among genomes. Here we explore such possible weighting strategies, in an analysis of 142 prokaryotic and 5 eukaryotic genomes. We demonstrate that alternate weighting strategies have different advantages, and we propose that each may have its specific uses in systematic or evolutionary biology. Comparisons of results obtained with different methods can provide further clues to major events and processes in genome evolution.
Affiliation(s)
- Uri Gophna
- Genome Atlantic and Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia
22
Charlebois RL, Doolittle WF. Computing prokaryotic gene ubiquity: rescuing the core from extinction. Genome Res 2005; 14:2469-77. [PMID: 15574825 PMCID: PMC534671 DOI: 10.1101/gr.3024704] [Citation(s) in RCA: 143] [Impact Index Per Article: 7.2] [Indexed: 11/25/2022]
Abstract
The genomic core concept has found several uses in comparative and evolutionary genomics. Defined as the set of all genes common to (ubiquitous among) all genomes in a phylogenetically coherent group, core size decreases as the number and phylogenetic diversity of the relevant group increase. Here, we focus on methods for defining the size and composition of the core of all genes shared by sequenced genomes of prokaryotes (Bacteria and Archaea). There are few (almost certainly fewer than 50) genes shared by all of the 147 genomes compared, surely insufficient to conduct all essential functions. Sequencing and annotation errors are responsible for the apparent absence of some genes, while very limited but genuine disappearances (from just one or a few genomes) can account for several others. Core size will continue to decrease as more genome sequences appear, unless the requirement for ubiquity is relaxed. Such relaxation seems consistent with any reasonable biological purpose for seeking a core, but it renders the problem of definition more problematic. We propose an alternative approach (the phylogenetically balanced core), which preserves some of the biological utility of the core concept. Cores, however delimited, preferentially contain informational rather than operational genes; we present a new hypothesis for why this might be so.
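The effect of relaxing the ubiquity requirement can be illustrated with a minimal sketch. It assumes each genome has already been reduced to a set of gene-family identifiers; the `core_genes` function, the `min_fraction` parameter and the toy genomes are hypothetical. This is a plain ubiquity threshold and does not implement the phylogenetically balanced core that the authors propose.

```python
from collections import Counter


def core_genes(genomes, min_fraction=1.0):
    """Gene families present in at least `min_fraction` of the genomes.

    genomes: dict mapping a genome name to a set of gene-family ids.
    min_fraction=1.0 gives the strict core (present in every genome);
    lower values relax the ubiquity requirement.
    """
    if not genomes:
        raise ValueError("no genomes given")
    if not 0.0 < min_fraction <= 1.0:
        raise ValueError("min_fraction must be in (0, 1]")
    # Count in how many genomes each family occurs.
    counts = Counter(fam for fams in genomes.values() for fam in set(fams))
    needed = min_fraction * len(genomes)
    return {fam for fam, n in counts.items() if n >= needed}


# Toy example: one genome appears to lack rpoB (e.g. an annotation error).
toy = {
    "genome_A": {"rpoB", "secY", "ftsZ"},
    "genome_B": {"rpoB", "secY"},
    "genome_C": {"secY", "ftsZ"},
}
strict_core = core_genes(toy)        # families found in all three genomes
relaxed_core = core_genes(toy, 0.6)  # families found in at least two genomes
```

In the toy data a single missing annotation is enough to remove a family from the strict core, which is the sensitivity the abstract describes.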
Affiliation(s)
- Robert L Charlebois
- Genome Atlantic, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, B3H 1X5, Canada
23
Beiko RG, Charlebois RL. GANN: genetic algorithm neural networks for the detection of conserved combinations of features in DNA. BMC Bioinformatics 2005; 6:36. [PMID: 15725347 PMCID: PMC553964 DOI: 10.1186/1471-2105-6-36] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Received: 08/05/2004] [Accepted: 02/22/2005] [Indexed: 11/16/2022] Open
Abstract
Background: The multitude of motif detection algorithms developed to date have largely focused on the detection of patterns in primary sequence. Since sequence-dependent DNA structure and flexibility may also play a role in protein-DNA interactions, the simultaneous exploration of sequence- and structure-based hypotheses about the composition of binding sites and the ordering of features in a regulatory region should be considered as well. The consideration of structural features requires the development of new detection tools that can deal with data types other than primary sequence. Results: GANN (available at ) is a machine learning tool for the detection of conserved features in DNA. The software suite contains programs to extract different regions of genomic DNA from flat files and convert these sequences to indices that reflect sequence and structural composition or the presence of specific protein binding sites. The machine learning component allows the classification of different types of sequences based on subsamples of these indices, and can identify the best combinations of indices and machine learning architecture for sequence discrimination. Another key feature of GANN is the replicated splitting of data into training and test sets, and the implementation of negative controls. In validation experiments, GANN successfully merged important sequence and structural features to yield good predictive models for synthetic and real regulatory regions. Conclusion: GANN is a flexible tool that can search through large sets of sequence and structural feature combinations to identify those that best characterize a set of sequences.
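The replicated splitting into training and test sets, and the idea of a negative control, can be illustrated with a short sketch. It is not GANN's code: the function names, the default replicate count and test fraction, and the use of label shuffling as the negative control are all assumptions made for illustration, since the abstract does not describe how GANN implements either step.

```python
import random


def replicated_splits(n_items, n_replicates=5, test_fraction=0.25, seed=0):
    """Yield (train_indices, test_indices) for several independent random splits."""
    rng = random.Random(seed)
    n_test = max(1, round(n_items * test_fraction))
    for _ in range(n_replicates):
        idx = list(range(n_items))
        rng.shuffle(idx)
        # First n_test shuffled indices form the test set, the rest the training set.
        yield sorted(idx[n_test:]), sorted(idx[:n_test])


def shuffled_labels(labels, seed=0):
    """Negative control (assumed form): same labels, randomly reassigned to items."""
    rng = random.Random(seed)
    out = list(labels)
    rng.shuffle(out)
    return out


splits = list(replicated_splits(8, n_replicates=3))
control = shuffled_labels([1, 1, 1, 1, 0, 0, 0, 0])
```

A model trained on the shuffled labels should perform near chance on the test sets; performance well above chance would suggest leakage or an artefact in the features.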
Affiliation(s)
- Robert G Beiko
- Institute for Molecular Bioscience, The University of Queensland, Brisbane 4072, Australia
- Department of Biology, University of Ottawa, Ottawa, ON, K1N 6N5, Canada
- Robert L Charlebois
- Genome Atlantic, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS, B3H 1X5, Canada
24
Abstract
Differences in gene repertoire among bacterial genomes are usually ascribed to gene loss or to lateral gene transfer from unrelated cellular organisms. However, most bacteria contain large numbers of ORFans, that is, annotated genes that are restricted to a particular genome and that possess no known homologs. The uniqueness of ORFans within a genome has precluded the use of a comparative approach to examine their function and evolution. However, by identifying sequences unique to monophyletic groups at increasing phylogenetic depths, we can make direct comparisons of the characteristics of ORFans of different ages in the Escherichia coli genome, and establish their functional status and evolutionary rates. Relative to the genes ancestral to gamma-Proteobacteria and to those genes distributed sporadically in other prokaryotic species, ORFans in the E. coli lineage are short, A+T rich, and evolve quickly. Moreover, most encode functional proteins. Based on these features, ORFans are not attributable to errors in gene annotation, limitations of current databases, or to failure of methods for detecting homology. Rather, ORFans in the genomes of free-living microorganisms apparently derive from bacteriophage and occasionally become established by assuming roles in key cellular functions.
Affiliation(s)
- Vincent Daubin
- Department of Biochemistry & Molecular Biophysics, University of Arizona, Tucson, Arizona 85721, USA.
25
Gophna U, Charlebois RL, Doolittle WF. Have archaeal genes contributed to bacterial virulence? Trends Microbiol 2004; 12:213-9. [PMID: 15120140 DOI: 10.1016/j.tim.2004.03.002] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.2] [Indexed: 11/16/2022]
Affiliation(s)
- Uri Gophna
- Genome Atlantic and Department of Biochemistry and Molecular Biology, Dalhousie University, 5850 College Street, Halifax, Nova Scotia B3H 1X5, Canada.