1
Barceló-Antemate D, Fontove-Herrera F, Santos W, Merino E. The effect of the genomic GC content bias of prokaryotic organisms on the secondary structures of their proteins. PLoS One 2023; 18:e0285201. [PMID: 37141209 PMCID: PMC10159118 DOI: 10.1371/journal.pone.0285201] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2022] [Accepted: 04/17/2023] [Indexed: 05/05/2023] Open
Abstract
One of the main characteristics of prokaryotic genomes is the ratio in which guanine-cytosine bases are used in their DNA sequences. This is known as the genomic GC content and varies widely, from values below 20% to values greater than 74%. It has been demonstrated that the genomic GC content varies in accordance with the phylogenetic distribution of organisms and influences the amino acid composition of their corresponding proteomes. This bias is particularly important for amino acids that are coded by GC content-rich codons such as alanine, glycine, and proline, as well as amino acids that are coded by AT-rich codons, such as lysine, asparagine, and isoleucine. In our study, we extend these results by considering the effect of the genomic GC content on the secondary structure of proteins. On a set of 192 representative prokaryotic genomes and proteome sequences, we identified through a bioinformatic study that the composition of the secondary structures of the proteomes varies in relation to the genomic GC content; random coils increase as the genomic GC content increases, while alpha-helices and beta-sheets present an inverse relationship. In addition, we found that the tendency of an amino acid to form part of a secondary structure of proteins is not ubiquitous, as previously expected, but varies according to the genomic GC content. Finally, we discovered that for some specific groups of orthologous proteins, the GC content of genes biases the composition of secondary structures of the proteins for which they code.
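The genomic GC content described in this abstract is simply the fraction of guanine and cytosine bases in a DNA sequence. A minimal sketch of that calculation follows; the function name `gc_content` and the toy sequence are illustrative assumptions, not taken from the paper, and ambiguous bases such as N are ignored here by choice.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among the unambiguous bases of seq.

    Assumption for this sketch: only A, C, G and T are counted, so
    ambiguity codes (e.g. N) do not affect the result.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return gc / (gc + at)


# Toy sequence (not real genomic data): 4 of its 10 bases are G or C.
print(f"{gc_content('ATGCGCATTA'):.2f}")  # prints 0.40
```

For a whole genome, the same function could be applied to the concatenated replicons; how plasmids are handled is a choice the abstract does not specify.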
Affiliation(s)
- Diana Barceló-Antemate
- Departamento de Microbiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Centro de Investigación en Dinámica Celular, Instituto de Investigación en Ciencias Básicas y Aplicadas, Universidad Autónoma del Estado de Morelos (UAEM), Cuernavaca, Morelos, México
- Walter Santos
- Departamento de Microbiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
- Enrique Merino
- Departamento de Microbiología Molecular, Instituto de Biotecnología, Universidad Nacional Autónoma de México, Cuernavaca, Morelos, México
2
Korenskaia AE, Matushkin YG, Lashin SA, Klimenko AI. Bioinformatic Assessment of Factors Affecting the Correlation between Protein Abundance and Elongation Efficiency in Prokaryotes. Int J Mol Sci 2022; 23:11996. [PMID: 36233299 PMCID: PMC9570070 DOI: 10.3390/ijms231911996] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2022] [Revised: 09/23/2022] [Accepted: 09/30/2022] [Indexed: 11/07/2022] Open
Abstract
Protein abundance is crucial for the majority of genetically regulated cell functions to act properly in prokaryotic organisms. Therefore, developing bioinformatic methods for assessing the efficiency of different stages of gene expression is of great importance for predicting the actual protein abundance. One of these steps is the evaluation of translation elongation efficiency based on mRNA sequence features, such as codon usage bias and mRNA secondary structure properties. In this study, we have evaluated correlation coefficients between experimentally measured protein abundance and predicted elongation efficiency characteristics for 26 prokaryotes, including non-model organisms, belonging to diverse taxonomic groups. The algorithm for assessing elongation efficiency takes into account not only codon bias, but also the number and energy of secondary structures in mRNA if those demonstrate an impact on the predicted elongation efficiency of the ribosomal protein genes. The results show that, for a number of organisms, secondary structures are a better predictor of protein abundance than codon usage bias. The bioinformatic analysis has revealed several factors associated with the value of the correlation coefficient. The first factor is the elongation efficiency optimization type: the organisms whose genomes are optimized for codon usage only have significantly higher correlation coefficients. The second factor is taxonomical identity: bacteria that belong to the class Bacilli tend to have higher correlation coefficients among the analyzed set. The third is growth rate, which is shown to be higher for the organisms with higher correlation coefficients between protein abundance and predicted translation elongation efficiency. The obtained results can be useful for further improvement of methods for protein abundance prediction.
Affiliation(s)
- Aleksandra E. Korenskaia
- Kurchatov Genomics Center, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, 630090 Novosibirsk, Russia
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, 630090 Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk National Research State University, Pirogova St. 1, 630090 Novosibirsk, Russia
- Yury G. Matushkin
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, 630090 Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk National Research State University, Pirogova St. 1, 630090 Novosibirsk, Russia
- Sergey A. Lashin
- Kurchatov Genomics Center, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, 630090 Novosibirsk, Russia
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, 630090 Novosibirsk, Russia
- Department of Natural Sciences, Novosibirsk National Research State University, Pirogova St. 1, 630090 Novosibirsk, Russia
- Alexandra I. Klimenko
- Kurchatov Genomics Center, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, 630090 Novosibirsk, Russia
- Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Science, Lavrentiev Avenue 10, 630090 Novosibirsk, Russia
3
Fuglsang A. Intragenic codon usage in proteobacteria: Translational selection, IS expansion and genomic shrinkage. Gene 2022; 809:146015. [PMID: 34655721 DOI: 10.1016/j.gene.2021.146015] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/05/2021] [Revised: 08/08/2021] [Accepted: 10/11/2021] [Indexed: 11/16/2022]
Abstract
This manuscript presents a method to systematically study intragenic variations in codon usage using correspondence analysis and the effective number of codons. The method is applied to >1100 proteobacteria. Codon usage bias (measured as inertia) increases with genome size; the same is true for the percentage of inertia explained by the first axis. It is shown that there is often a relaxed or more uniform codon usage near the gene termini. This is not seen in small genomes, notably those of intracellular organisms like Buchnera aphidicola or Rickettsia prowazekii, where translational selection plays less of a role. When genes from E. coli, for which translational selection is well described, are split into low, intermediate and high expression groups, it is shown that the intragenic codon usage pattern with more uniform usage at termini exists across all three expression groups. Furthermore, the correspondence analysis reveals a unique pattern in Bordetella pertussis due to IS expansion. This study thus shows that translational selection, genome shrinkage and IS expansion result in characteristic patterns in intragenic codon usage.
4
GC constituents and relative codon expressed amino acid composition in cyanobacterial phycobiliproteins. Gene 2014; 546:162-71. [PMID: 24933001 DOI: 10.1016/j.gene.2014.06.024] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 10/15/2013] [Revised: 04/17/2014] [Accepted: 06/12/2014] [Indexed: 02/01/2023]
Abstract
The genomic as well as structural relationship of phycobiliproteins (PBPs) in different cyanobacterial species are determined by nucleotides as well as amino acid composition. The genomic GC constituents influence the amino acid variability and codon usage of particular subunit of PBPs. We have analyzed 11 cyanobacterial species to explore the variation of amino acids and causal relationship between GC constituents and codon usage. The study at the first, second and third levels of GC content showed relatively more amino acid variability on the levels of G3+C3 position in comparison to the first and second positions. The amino acid encoded GC rich level including G rich and C rich or both correlate the codon variability and amino acid availability. The fluctuation in amino acids such as Arg, Ala, His, Asp, Gly, Leu and Glu in α and β subunits was observed at G1C1 position; however, fluctuation in other amino acids such as Ser, Thr, Cys and Trp was observed at G2C2 position. The coding selection pressure of amino acids such as Ala, Thr, Tyr, Asp, Gly, Ile, Leu, Asn, and Ser in α and β subunits of PBPs was more elaborated at G3C3 position. In this study, we observed that each subunit of PBPs is codon specific for particular amino acid. These results suggest that genomic constraint linked with GC constituents selects the codon for particular amino acids and furthermore, the codon level study may be a novel approach to explore many problems associated with genomics and proteomics of cyanobacteria.
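The G1C1, G2C2 and G3C3 levels mentioned in this abstract refer to GC content measured separately at the first, second and third codon positions. The sketch below shows one way such values could be computed for a single in-frame coding sequence; the function name `gc_by_codon_position` and the toy sequence are assumptions made for illustration and are not the authors' code.

```python
from typing import Tuple


def gc_by_codon_position(cds: str) -> Tuple[float, float, float]:
    """Return the GC fraction at codon positions 1, 2 and 3 of cds.

    Assumptions for this sketch: cds starts in frame, and any trailing
    partial codon is dropped before counting.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # length covered by complete codons
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    fractions = []
    for pos in range(3):
        bases = cds[pos:usable:3]  # every base at this codon position
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    return fractions[0], fractions[1], fractions[2]


# Toy sequence with codons ATG, GCC, AAA (not a real phycobiliprotein gene).
gc1, gc2, gc3 = gc_by_codon_position("ATGGCCAAA")
print(round(gc1, 3), round(gc2, 3), round(gc3, 3))  # prints 0.333 0.333 0.667
```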
5
Abstract
Human metapneumovirus (HMPV) is an important agent of acute respiratory tract infection in children, yet knowledge of its pathogenicity and molecular evolution is lacking. Herein, we report for the first time the synonymous codon usage patterns of the HMPV genome. The relative synonymous codon usage (RSCU) values, effective number of codons (ENC) values, nucleotide contents, and their correlations were analyzed among 17 available whole genomes of HMPV, including different genotypes. All preferred codons in HMPV end with an A or U nucleotide, which is strongly associated with the high proportion of these two nucleotides in its genomes. Mutation pressure rather than natural selection is the main factor that determines the bias of synonymous codon usage in HMPV. A complementary pattern of codon usage bias between HMPV and human cells was observed, and this phenomenon suggests that host cells might also act as an important factor affecting the codon usage bias. Moreover, the codon usage biases of the HMPV genotypes separate into different clades, which suggests that phylogenetic distance might be involved in the formation of codon usage bias as well. These analyses of synonymous codon usage bias in HMPV provide more information for better understanding its evolution and pathogenicity.
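The RSCU values named in this abstract are conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The following is a minimal sketch under that standard definition; the function name `rscu`, the use of the standard genetic code, and the toy sequence are assumptions for illustration, not the procedure reported in the study.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks stops).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict:
    """Return RSCU per codon for amino acids that occur in cds.

    RSCU = observed count * (number of synonymous codons) / total count
    for that amino acid. Met and Trp (one codon each) and stop codons
    are skipped; RNA input is accepted by mapping U to T.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if len(codons) == 1 or total == 0:
            continue
        for c in codons:
            values[c] = counts[c] * len(codons) / total
    return values


# Toy sequence: alanine encoded three times by GCT and once by GCC.
result = rscu("GCTGCTGCTGCC")
print(result["GCT"], result["GCC"], result["GCA"])  # prints 3.0 1.0 0.0
```

An RSCU above 1 marks a codon used more often than expected under equal usage, which is how "preferred" codons are usually read off such a table.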
6
Zhou T, Lu ZH, Sun X. The Correlation between Recombination Rate and Codon Bias in Yeast Mainly Results from Mutational Bias Associated with Recombination Rather than Hill-Robertson Interference. CONFERENCE PROCEEDINGS : ... ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL CONFERENCE 2012; 2005:4787-90. [PMID: 17281312 DOI: 10.1109/iembs.2005.1615542] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 11/07/2022]
Abstract
Codon usage has been reported to be correlated with local recombination rate, which can be explained by two proposed models. In the present study, correspondence analysis was used to investigate the major trends in codon usage variation among S. cerevisiae genes. It was found that the principal source of codon usage variation in yeast is the variance in expression levels, which is consistent with the previous translational selection model. Moreover, recombination rate is also correlated with the codon pattern, which might be a byproduct of mutational bias associated with recombination rather than the consequence of Hill-Robertson interference. A recent study analysed the genome sequence but reached the opposite conclusion: the positive correlation between recombination rate and codon bias in yeast mainly results from Hill-Robertson interference. In light of this conflicting result, we discuss the possible reason and find that the previous analysis was undermined by the mistaken assumption that weak selection acting at the expression level led to the correlation between recombination and codon bias.
Affiliation(s)
- T Zhou
- Key Laboratory of Molecular and Biomolecular Electronics of the Ministry of Education, Southeast University, Nanjing 210096, China
7
Ma F, Zhuang Y, Li Y, Xu X, Chen X. Usage Patterns of Codons Versus Complementary Codons Among Cellular Organisms and Organelles. J BIOL SYST 2011. [DOI: 10.1142/s0218339003000944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
The genetic code is one of the most important biological languages for communication between DNA and protein, so great attention has been paid to the usage bias of synonymous codons. Based on Grosjean and Ikemura's "optimum combination of codon-anticodon complex" and "translation efficiency" hypotheses, in this paper we propose that a biased codon usage is identical to its corresponding complementary codon usage preference. To test the hypothesis and reveal usage patterns between codons and corresponding complementary codons among different cellular organisms and organelles, the usage data of both codons and corresponding complementary ones from 28 cellular organisms and 20 organelles were analyzed. The results showed that: (1) there is a significantly positive correlation between codons and their complementary ones in most cellular organisms, chloroplasts and mitochondria; (2) all 32 single codon versus complementary codon pairs shared similar usage correlation patterns, with the numbers of significantly positive, unrelated and significantly negative pairs being 18, 12 and 2 within the 28 cellular organisms and 11, 17 and 4 within the 20 organelles, respectively, and some usage patterns of the 32 single pairs in cellular organisms are highly consistent with those of the two kinds of organelles, which strongly implies that their codon usage has undergone similar evolutionary selection in wobbling and modification; (3) the codon-frequency tree agreed fairly well with the traditional one. These results demonstrated the validity of our hypothesis, and indicated the usefulness of the correlation between codon and complementary codon usage in elucidating molecular evolutionary mechanisms.
Affiliation(s)
- Fei Ma
- Institute of Bioinformatics, Tsinghua University, Beijing 100084, China
- Yonglong Zhuang
- Institute of Bioinformatics, Tsinghua University, Beijing 100084, China
- Yanda Li
- Institute of Bioinformatics, Tsinghua University, Beijing 100084, China
- Xiaofeng Xu
- Life Science College, Nanjing Normal University, Nanjing 210097, China
- Xueping Chen
- College of Economics and Technology, University of Science and Technology of China, Hefei 230052, China
8
Viklund J, Ettema TJG, Andersson SGE. Independent genome reduction and phylogenetic reclassification of the oceanic SAR11 clade. Mol Biol Evol 2011; 29:599-615. [PMID: 21900598 DOI: 10.1093/molbev/msr203] [Citation(s) in RCA: 92] [Impact Index Per Article: 6.6] [Indexed: 12/25/2022] Open
Abstract
The SAR11 clade, here represented by Candidatus Pelagibacter ubique, is the most successful group of bacteria in the upper surface waters of the oceans. In contrast to previous studies that have associated the 1.3 Mb genome of Ca. Pelagibacter ubique with the less than 1.5 Mb genomes of the Rickettsiales, our phylogenetic analysis suggests that Ca. Pelagibacter ubique is most closely related to soil and aquatic Alphaproteobacteria with large genomes. This implies that the SAR11 clade and the Rickettsiales have undergone genome reduction independently. A gene flux analysis of 46 representative alphaproteobacterial genomes indicates the loss of more than 800 genes in each of Ca. Pelagibacter ubique and the Rickettsiales. Consistent with their different phylogenetic affiliations, the pattern of gene loss differs with a higher loss of genes for repair and recombination processes in Ca. Pelagibacter ubique as compared with a more extensive loss of genes for biosynthetic functions in the Rickettsiales. Some of the lost genes in Ca. Pelagibacter ubique, such as mutLS, recFN, and ruvABC, are conserved in all other alphaproteobacterial genomes including the small genomes of the Rickettsiales. The mismatch repair genes mutLS are absent from all currently sequenced SAR11 genomes and also underrepresented in the global ocean metagenome data set. We hypothesize that the unique loss of genes involved in repair and recombination processes in Ca. Pelagibacter ubique has been driven by selection and that this helps explain many of the characteristics of the SAR11 population, such as the streamlined genomes, the long branch lengths, the high recombination frequencies, and the extensive sequence divergence within the population.
Affiliation(s)
- Johan Viklund
- Department of Molecular Evolution, Evolutionary Biology Center, Science for Life Laboratory, Uppsala University, Uppsala, Sweden
9
Cardinale DJ, Duffy S. Single-stranded genomic architecture constrains optimal codon usage. BACTERIOPHAGE 2011; 1:219-224. [PMID: 22334868 PMCID: PMC3278643 DOI: 10.4161/bact.1.4.18496] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 09/21/2011] [Revised: 10/21/2011] [Accepted: 10/23/2011] [Indexed: 12/11/2022]
Abstract
Viral codon usage is shaped by the conflicting forces of mutational pressure and selection to match host patterns for optimal expression. We examined whether genomic architecture (single- or double-stranded DNA) influences the degree to which bacteriophage codon usage differs from that of their primary bacterial hosts and from each other. While both correlated equally with their hosts’ genomic nucleotide content, the coat genes of ssDNA phages were less well adapted than those of dsDNA phages to their hosts’ codon usage profiles due to their preference for codons ending in thymine. No specific biases were detected in dsDNA phage genomes. In all nine of ten cases of codon redundancy in which a specific codon was overrepresented, ssDNA phages favored the NNT codon. A cytosine to thymine biased mutational pressure working in conjunction with strong selection against non-synonymous mutations appears to be shaping codon usage bias in ssDNA viral genomes.
Affiliation(s)
- Daniel J Cardinale
- Department of Ecology, Evolution and Natural Resources; School of Environmental and Biological Sciences; Rutgers; The State University of New Jersey; New Brunswick, NJ USA
10
Selected codon usage bias in members of the class Mollicutes. Gene 2010; 473:110-8. [PMID: 21147204 DOI: 10.1016/j.gene.2010.11.010] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 07/16/2010] [Revised: 11/20/2010] [Accepted: 11/22/2010] [Indexed: 11/24/2022]
Abstract
Mollicutes are parasitic microorganisms mainly characterized by small cell sizes, reduced genomes and a strong mutational bias towards A and T. We analyzed the codon usage patterns of the completely sequenced genomes of bacteria that belong to this class. We found that for many organisms not only mutational bias but also selection has a major effect on codon usage. Through a comparative perspective and based on three widely used criteria we were able to classify Mollicutes according to the effect of selection on codon usage. We found conserved optimal codons in many species and studied the tRNA gene pool in each genome. Previous results are reinforced by the fact that, when selection is operative, the putative optimal codons found match the respective cognate tRNA. Finally, we trace the effect of selection backwards to the common ancestor of the class and estimate the phylogenetic inertia associated with this character. We discuss the possible scenarios that explain the observed evolutionary patterns.
11
Hershberg R, Petrov DA. Evidence that mutation is universally biased towards AT in bacteria. PLoS Genet 2010; 6:e1001115. [PMID: 20838599 PMCID: PMC2936535 DOI: 10.1371/journal.pgen.1001115] [Citation(s) in RCA: 318] [Impact Index Per Article: 21.2] [Received: 02/09/2010] [Accepted: 08/09/2010] [Indexed: 11/19/2022] Open
Abstract
Mutation is the engine that drives evolution and adaptation forward in that it generates the variation on which natural selection acts. Mutation is a random process that nevertheless occurs according to certain biases. Elucidating mutational biases and the way they vary across species and within genomes is crucial to understanding evolution and adaptation. Here we demonstrate that clonal pathogens that evolve under severely relaxed selection are uniquely suitable for studying mutational biases in bacteria. We estimate mutational patterns using sequence datasets from five such clonal pathogens belonging to four diverse bacterial clades that span most of the range of genomic nucleotide content. We demonstrate that across different types of sites and in all four clades mutation is consistently biased towards AT. This is true even in clades that have high genomic GC content. In all studied cases the mutational bias towards AT is primarily due to the high rate of C/G to T/A transitions. These results suggest that bacterial mutational biases are far less variable than previously thought. They further demonstrate that variation in nucleotide content cannot stem entirely from variation in mutational biases and that natural selection and/or a natural selection-like process such as biased gene conversion strongly affect nucleotide content.
Affiliation(s)
- Ruth Hershberg
- Department of Biology, Stanford University, Stanford, California, United States of America.
12
Davis JJ, Olsen GJ. Characterizing the native codon usages of a genome: an axis projection approach. Mol Biol Evol 2010; 28:211-21. [PMID: 20679093 PMCID: PMC3002238 DOI: 10.1093/molbev/msq185] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 02/05/2023] Open
Abstract
Codon usage can provide insights into the nature of the genes in a genome. Genes that are “native” to a genome (have not been recently acquired by horizontal transfer) range in codon usage from a low-bias “typical” usage to a more biased “high-expression” usage characteristic of genes encoding abundant proteins. Genes that differ from these native codon usages are candidates for foreign genes that have been recently acquired by horizontal gene transfer. In this study, we present a method for characterizing the codon usages of native genes—both typical and highly expressed—within a genome. Each gene is evaluated relative to a half line (or axis) in a 59D space of codon usage. The axis begins at the modal codon usage, the usage that matches the largest number of genes in the genome, and it passes through a point representing the codon usage of a set of genes with expression-related bias. A gene whose codon usage matches (does not significantly differ from) a point on this axis is a candidate native gene, and the location of its projection onto the axis provides a general estimate of its expression level. A gene that differs significantly from all points on the axis is a candidate foreign gene. This automated approach offers significant improvements over existing methods. We illustrate this by analyzing the genomes of Pseudomonas aeruginosa PAO1 and Bacillus anthracis A0248, which can be difficult to analyze with commonly used methods due to their biased base compositions. Finally, we use this approach to measure the proportion of candidate foreign genes in 923 bacterial and archaeal genomes. The organisms with the most homogeneous genomes (containing the fewest candidate foreign genes) are mostly endosymbionts and parasites, though with exceptions that include Pelagibacter ubique and Beutenbergia cavernae. 
The organisms with the most heterogeneous genomes (containing the most candidate foreign genes) include members of the genera Bacteroides, Corynebacterium, Desulfotalea, Neisseria, Xylella, and Thermobaculum.
Affiliation(s)
- James J Davis
- Department of Microbiology, University of Illinois at Urbana-Champaign
13
Codon Usage Patterns in Corynebacterium glutamicum: Mutational Bias, Natural Selection and Amino Acid Conservation. Comp Funct Genomics 2010; 2010:343569. [PMID: 20445740 PMCID: PMC2860111 DOI: 10.1155/2010/343569] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 10/12/2009] [Revised: 01/29/2010] [Accepted: 02/04/2010] [Indexed: 11/17/2022] Open
Abstract
The alternative synonymous codons in Corynebacterium glutamicum, a well-known bacterium used in industry for the production of amino acids, have been investigated by multivariate analysis. As C. glutamicum is a GC-rich organism, G and C are expected to predominate at the third position of codons. Indeed, overall codon usage analyses have indicated that C and/or G ending codons are predominant in this organism. Through multivariate statistical analysis, apart from mutational selection, we identified three other trends of codon usage variation among the genes. Firstly, the majority of highly expressed genes are scattered towards the positive end of the first axis, whereas the majority of lowly expressed genes are clustered towards the other end of the first axis. Furthermore, the distinct difference between the two sets of genes was that the C ending codons are predominant in putatively highly expressed genes, suggesting that the C ending codons are translationally optimal in this organism. Secondly, the majority of the putatively highly expressed genes tend to be located on the leading strand, which indicates that replicational and transcriptional selection might be invoked. Thirdly, highly expressed genes are more conserved than lowly expressed genes in terms of synonymous and nonsynonymous substitutions among orthologous genes from the genomes of C. glutamicum and C. diphtheriae. We also analyzed other factors such as the length of genes and hydrophobicity that might influence codon usage and found their contributions to be weak.
14
Affiliation(s)
- Ruth Hershberg
- Department of Biological Sciences, Stanford University, Stanford, California 94305;
- Dmitri A. Petrov
- Department of Biological Sciences, Stanford University, Stanford, California 94305;
15
Suzuki H, Brown CJ, Forney LJ, Top EM. Comparison of correspondence analysis methods for synonymous codon usage in bacteria. DNA Res 2008; 15:357-65. [PMID: 18940873 PMCID: PMC2608848 DOI: 10.1093/dnares/dsn028] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.2] [Indexed: 12/02/2022] Open
Abstract
Synonymous codon usage varies both between organisms and among genes within a genome, and arises due to differences in G + C content, replication strand skew, or gene expression levels. Correspondence analysis (CA) is widely used to identify major sources of variation in synonymous codon usage among genes and provides a way to identify horizontally transferred or highly expressed genes. Four methods of CA have been developed based on three kinds of input data: absolute codon frequency, relative codon frequency, and relative synonymous codon usage (RSCU) as well as within-group CA (WCA). Although different CA methods have been used in the past, no comprehensive comparative study has been performed to evaluate their effectiveness. Here, the four CA methods were evaluated by applying them to 241 bacterial genome sequences. The results indicate that WCA is more effective than the other three methods in generating axes that reflect variations in synonymous codon usage. Furthermore, WCA reveals sources that were previously unnoticed in some genomes; e.g. synonymous codon usage related to replication strand skew was detected in Rickettsia prowazekii. Though CA based on RSCU is widely used, our evaluation indicates that this method does not perform as well as WCA.
Affiliation(s)
- Haruo Suzuki
- Department of Biological Sciences and Initiative for Bioinformatics and Evolutionary Studies, University of Idaho, PO Box 443051, Moscow, Idaho 83844-3051, USA.
16
Fuglsang A. Impact of bias discrepancy and amino acid usage on estimates of the effective number of codons used in a gene, and a test for selection on codon usage. Gene 2007; 410:82-8. [PMID: 18248919 DOI: 10.1016/j.gene.2007.12.001] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 05/03/2007] [Revised: 10/22/2007] [Accepted: 12/03/2007] [Indexed: 11/26/2022]
Abstract
The effective number of codons (Nc) used in a gene is one of the most commonly used measures of synonymous codon usage bias, owing much of its popularity to the fact that it is species independent and that simulation studies have shown that it is less dependent on gene length than other measures. In this paper I provide a clear and practically meaningful definition of bias discrepancy (BD; when the degree of codon bias varies within a degeneracy class). Moreover, I evaluate the impact of BD and amino acid usage on estimates of Nc. It is shown that both factors have a significant effect on accuracy and precision. Both amino acid usage and BD influence accuracy considerably, especially in short genes. Finally, I demonstrate how the definition of bias discrepancy can be applied to investigate whether codon usage is influenced by selection, and I discuss this test in relation to the incongruous literature that exists for Buchnera sp. APS and Borrelia burgdorferi.
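For orientation, the Nc statistic discussed in this abstract is commonly computed with Wright's original estimator: a homozygosity F = (n Σp² − 1)/(n − 1) is calculated per amino acid, averaged within each degeneracy class, and combined as Nc = 2 + 9/F₂ + 1/F₃ + 5/F₄ + 3/F₆. The sketch below implements that classic form only; it is not the bias-discrepancy analysis of this paper. The function name, the handling of non-positive F values (skipped), and the choice to raise an error when a degeneracy class is unobserved are assumptions of this sketch.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks stops).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def effective_number_of_codons(cds: str) -> float:
    """Classic Wright-style Nc for one in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)
    # Homozygosity F per amino acid, grouped by degeneracy class (2, 3, 4, 6).
    f_by_class = defaultdict(list)
    for codons in families.values():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:
            continue  # Met/Trp, or too few observations to estimate F
        sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
        f = (n * sum_p2 - 1) / (n - 1)
        if f > 0:  # assumption: non-positive estimates are skipped
            f_by_class[k].append(f)
    nc = 2.0  # contribution of the single-codon amino acids Met and Trp
    for k, n_amino_acids in ((2, 9), (3, 1), (4, 5), (6, 3)):
        if not f_by_class[k]:
            raise ValueError(f"no usable amino acid with {k} synonymous codons")
        mean_f = sum(f_by_class[k]) / len(f_by_class[k])
        nc += n_amino_acids / mean_f
    return min(nc, 61.0)  # Nc cannot exceed the 61 sense codons


# Extreme bias: every amino acid always uses one and the same codon.
one_codon_each = {aa: c for c, aa in CODON_TABLE.items() if aa != "*"}
biased = "".join(c * 2 for c in one_codon_each.values())
print(effective_number_of_codons(biased))  # prints 20.0
```

The toy input shows the lower bound of the statistic: with one codon per amino acid every F equals 1, so Nc is 2 + 9 + 1 + 5 + 3 = 20.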
Collapse
Affiliation(s)
- Anders Fuglsang
- University of Copenhagen, Faculty of Pharmaceutical Sciences, 2 Universitetsparken, Copenhagen O, Denmark.
|
17
|
Charles H, Calevro F, Vinuelas J, Fayard JM, Rahbe Y. Codon usage bias and tRNA over-expression in Buchnera aphidicola after aromatic amino acid nutritional stress on its host Acyrthosiphon pisum. Nucleic Acids Res 2006; 34:4583-92. [PMID: 16963497 PMCID: PMC1636365 DOI: 10.1093/nar/gkl597] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Codon usage bias and relative abundances of tRNA isoacceptors were analysed in the obligate intracellular symbiotic bacterium, Buchnera aphidicola from the aphid Acyrthosiphon pisum, using a dedicated 35mer oligonucleotide microarray. Buchnera is archetypal of organisms living with minimal metabolic requirements and presents a reduced genome with high-evolutionary rate. Codon usage in Buchnera has been overcome by the high mutational bias towards AT bases. However, several lines of evidence for codon usage selection are given here. A significant correlation was found between tRNA relative abundances and codon composition of Buchnera genes. A significant codon usage bias was found for the choice of rare codons in Buchnera: C-ending codons are preferred in highly expressed genes, whereas G-ending codons are avoided. This bias is not explained by GC skew in the bacteria and might correspond to a selection for perfect matching between codon-anticodon pairs for some essential amino acids in Buchnera proteins. Nutritional stress applied to the aphid host induced a significant overexpression of most of the tRNA isoacceptors in bacteria. Although molecular regulation of the tRNA operons in Buchnera was not investigated, a correlation between relative expression levels and organization in transcription unit was found in the genome of Buchnera.
Affiliation(s)
- Hubert Charles
- Laboratoire de Biologie Fonctionnelle Insectes et Interactions, UMR INRA/INSA de Lyon, 203 Bâtiment Louis Pasteur, 69621 Villeurbanne Cedex, France.
|
18
|
Sällström B, Arnaout RA, Davids W, Bjelkmar P, Andersson SGE. Protein evolutionary rates correlate with expression independently of synonymous substitutions in Helicobacter pylori. J Mol Evol 2006; 62:600-14. [PMID: 16586017 DOI: 10.1007/s00239-005-0104-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2005] [Accepted: 12/20/2005] [Indexed: 11/29/2022]
Abstract
In free-living microorganisms, such as Escherichia coli and Saccharomyces cerevisiae, both synonymous and nonsynonymous substitution frequencies correlate with expression levels. Here, we have tested the hypothesis that the correlation between amino acid substitution rates and expression is a by-product of selection for codon bias and translational efficiency in highly expressed genes. To this end, we have examined the correlation between protein evolutionary rates and expression in the human gastric pathogen Helicobacter pylori, where the absence of selection on synonymous sites enables the two types of substitutions to be uncoupled. The results revealed a statistically significant negative correlation between expression levels and nonsynonymous substitutions in both H. pylori and E. coli. We also found that neighboring genes located on the same, but not on opposite strands, evolve at significantly more similar rates than random gene pairs, as expected by co-expression of genes located in the same operon. However, the two species differ in that synonymous substitutions show a strand-specific pattern in E. coli, whereas the weak similarity in synonymous substitutions for neighbors in H. pylori is independent of gene orientation. These results suggest a direct influence of expression levels on nonsynonymous substitution frequencies independent of codon bias and selective constraints on synonymous sites.
Affiliation(s)
- Björn Sällström
- Program of Molecular Evolution, Department of Evolution, Genomics and Systematics, Evolutionary Biology Center, Uppsala University, 752 36 Uppsala, Sweden
|
19
|
Das S, Paul S, Dutta C. Evolutionary constraints on codon and amino acid usage in two strains of human pathogenic actinobacteria Tropheryma whipplei. J Mol Evol 2006; 62:645-58. [PMID: 16557339 DOI: 10.1007/s00239-005-0164-6] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/04/2005] [Accepted: 12/20/2005] [Indexed: 12/13/2022]
Abstract
The factors governing codon and amino acid usages in the predicted protein-coding sequences of Tropheryma whipplei TW08/27 and Twist genomes have been analyzed. Multivariate analysis identifies the replicational-transcriptional selection coupled with DNA strand-specific asymmetric mutational bias as a major driving force behind the significant interstrand variations in synonymous codon usage patterns in T. whipplei genes, while a residual intrastrand synonymous codon bias is imparted by a selection force operating at the level of translation. The strand-specific mutational pressure has little influence on the amino acid usage, for which the mean hydropathy level and aromaticity are the major sources of variation, both having nearly equal impact. In spite of the intracellular lifestyle, the amino acid usage in highly expressed gene products of T. whipplei follows the cost-minimization hypothesis. The products of the highly expressed genes of these relatively A + T-rich actinobacteria prefer to use the residues encoded by GC-rich codons, probably due to greater conservation of a GC-rich ancestral state in the highly expressed genes, as suggested by the lower values of the rate of nonsynonymous divergences between orthologous sequences of highly expressed genes from the two strains of T. whipplei. Both the genomes under study are characterized by the presence of two distinct groups of membrane-associated genes, products of which exhibit significant differences in primary and potential secondary structures as well as in the propensity of protein disorder.
Affiliation(s)
- Sabyasachi Das
- Bioinformatics Centre, Indian Institute of Chemical Biology, 4 Raja S. C. Mullick Road, Kolkata 700 032, India
|
20
|
Ngwamidiba M, Blanc G, Raoult D, Fournier PE. Sca1, a previously undescribed paralog from autotransporter protein-encoding genes in Rickettsia species. BMC Microbiol 2006; 6:12. [PMID: 16504018 PMCID: PMC1388218 DOI: 10.1186/1471-2180-6-12] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/08/2005] [Accepted: 02/20/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Among the 17 genes encoding autotransporter proteins of the "surface cell antigen" (sca) family in the currently sequenced Rickettsia genomes, ompA, sca5 (ompB) and sca4 (gene D), have been extensively used for identification and phylogenetic purposes for Rickettsia species. However, none of these genes is present in all 20 currently validated Rickettsia species. Of the remaining 14 sca genes, sca1 is the only gene to be present in all nine sequenced Rickettsia genomes. To estimate whether the sca1 gene is present in all Rickettsia species and its usefulness as an identification and phylogenetic tool, we searched for sca1 genes in the four published Rickettsia genomes and amplified and sequenced this gene in the remaining 16 validated Rickettsia species. RESULTS Sca1 is the only one of the 17 rickettsial sca genes present in all 20 Rickettsia species. R. prowazekii and R. canadensis exhibit a split sca1 gene whereas the remaining species have a complete gene. Within the sca1 gene, we identified a 488-bp variable sequence fragment that can be amplified using a pair of conserved primers. Sequences of this fragment are specific for each Rickettsia species. The phylogenetic organization of Rickettsia species inferred from the comparison of sca1 sequences strengthens the classification based on the housekeeping gene gltA and is similar to those obtained from the analyses of ompA, sca5 and sca4, thus suggesting similar evolutionary constraints. We also observed that Sca1 protein sequences have evolved under a dual selection pressure: with the exception of typhus group rickettsiae, the amino-terminal part of the protein that encompasses the predicted passenger domain, has evolved under positive selection in rickettsiae. This suggests that the Sca1 protein interacts with the host. In contrast, the C-terminal portion containing the autotransporter domain has evolved under purifying selection. In addition, sca1 is transcribed in R. conorii, and might therefore be functional in this species. CONCLUSION The sca1 gene, encoding an autotransporter protein that evolves under dual evolution pressure, is the only sca-family gene to be conserved by all Rickettsia species. As such, it is a valuable identification target for these bacteria, especially because rickettsial isolates can be identified by amplification and sequencing of a discriminatory gene fragment using a single primer pair. It may also be used as a phylogenetic tool. However, its current functional status remains to be determined although it was found expressed in R. conorii.
Affiliation(s)
- Maxime Ngwamidiba
- Unité des rickettsies, IFR 48, CNRS UMR 6020, Faculté de médecine, Université de la Méditerranée, 27 Boulevard Jean Moulin, 13385 Marseille cedex 05, France
- Guillaume Blanc
- Information Génomique et Structurale, UPR 2589, 31, Chemin Joseph Aiguier, 13402 Marseille Cedex 20, France
- Didier Raoult
- Unité des rickettsies, IFR 48, CNRS UMR 6020, Faculté de médecine, Université de la Méditerranée, 27 Boulevard Jean Moulin, 13385 Marseille cedex 05, France
- Pierre-Edouard Fournier
- Unité des rickettsies, IFR 48, CNRS UMR 6020, Faculté de médecine, Université de la Méditerranée, 27 Boulevard Jean Moulin, 13385 Marseille cedex 05, France
|
21
|
Klasson L, Andersson SGE. Strong asymmetric mutation bias in endosymbiont genomes coincide with loss of genes for replication restart pathways. Mol Biol Evol 2006; 23:1031-9. [PMID: 16476690 DOI: 10.1093/molbev/msj107] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
A large majority of bacterial genomes show strand asymmetry, such that G and T preferentially accumulate on the leading strand. The mechanisms are unknown, but cytosine deaminations are thought to play an important role. Here, we have examined DNA strand asymmetry in three strains of the aphid endosymbiont Buchnera aphidicola. These are phylogenetically related, have similar genomic GC contents, and conserved gene order structures, yet B. aphidicola (Bp) shows a fourfold higher replication-induced strand bias than B. aphidicola (Sg) and (Ap). We rule out an increase in the overall substitution frequency as the major cause of the stronger strand bias in B. aphidicola (Bp). Instead, the results suggest that the higher GC skew in this species is caused by a different spectrum of mutations, including a relatively higher frequency of C to T mutations on the leading strand and/or of G to A mutations on the lagging strand. A comparative analysis of 20 gamma-proteobacterial genomes revealed that endosymbiont genomes lacking recA and other genes involved in replication restart processes, such as priA, which codes for primosomal helicase PriA, displayed the strongest strand bias. We hypothesize that cytosine deaminations accumulate during single-strand exposure at arrested replication forks and that inefficient restart mechanisms may lead to high DNA strand asymmetry in bacterial genomes.
Affiliation(s)
- Lisa Klasson
- Department of Molecular Evolution, Evolutionary Biology Center, Uppsala University, Uppsala, Sweden
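The strand asymmetry discussed in the abstract above is commonly summarized as GC skew, (G - C) / (G + C), computed in windows along one strand. The sketch below assumes that common definition; the function name `gc_skew` and the window parameters are illustrative choices, not the procedure used in the paper.

```python
def gc_skew(seq, window=1000, step=1000):
    """Return a list of (G - C) / (G + C) values for successive windows.

    Hypothetical helper: window and step sizes are arbitrary defaults.
    Windows without any G or C are reported as 0.0.
    """
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


# Toy sequence: a G-rich half followed by a C-rich half, mimicking the sign
# change expected where leading and lagging strands switch.
toy = "GGGCAAAA" + "CCCGTTTT"
print(gc_skew(toy, window=8, step=8))
```

On a real chromosome, the sign of the skew typically flips near the replication origin and terminus, which is why the windowed profile is often plotted cumulatively.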
|
22
|
Zhou T, Sun X, Lu Z. Synonymous codon usage in environmental chlamydia UWE25 reflects an evolutional divergence from pathogenic chlamydiae. Gene 2005; 368:117-25. [PMID: 16380221 DOI: 10.1016/j.gene.2005.10.035] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2005] [Revised: 09/22/2005] [Accepted: 10/27/2005] [Indexed: 10/25/2022]
Abstract
Publication of the complete genome sequence for the Acanthamoeba sp. endosymbiont UWE25 has illuminated the evolutionary history of chlamydiae. In this study, the codon usage bias in UWE25 and five other species of pathogenic chlamydiae was calculated. It was found that genomic composition constraints are the major source of codon usage variation in UWE25. This result is different from the former observation in pathogenic chlamydiae, whose genomic base composition is more unbiased. Four other factors, such as strand-specific mutational bias, natural selection acting at the level of translation, hydropathy level of each protein and the conservation level of amino acids also have influence in shaping the codon usage in these six species to some extent. Further analysis suggests that the high stability of the UWE25 genome partially accounts for the difference in codon usage pattern between environmental and pathogenic chlamydiae. Moreover, our results imply that the replicational selection pressure in pathogenic chlamydiae is stronger than that in UWE25. Analyzing the codon usage pattern in the environmental chlamydia and comparing it with that of the pathogenic chlamydiae may provide clues as to how the chlamydiae have evolved from their common ancestor.
Affiliation(s)
- Tong Zhou
- State Key Laboratory of Bioelectronics, Southeast University, 210096 China.
|
23
|
Das S, Ghosh S, Pan A, Dutta C. Compositional variation in bacterial genes and proteins with potential expression level. FEBS Lett 2005; 579:5205-10. [PMID: 16165133 DOI: 10.1016/j.febslet.2005.08.042] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2005] [Accepted: 08/22/2005] [Indexed: 11/22/2022]
Abstract
Usage of guanine and cytosine at three codon sites in eubacterial genes varies distinctly with potential expressivity, as predicted by Codon Adaptation Index (CAI). In bacteria with moderate/high GC-content, G(3) follows a biphasic relationship, while C(3) increases with CAI. In AT-rich bacteria, correlation of CAI is negative with G(3), but non-specific with C(3). Correlations of CAI with residues encoded by G-starting codons are positive, while with those by C-starting codons are usually negative/random. Average Size/Complexity Score and aromaticity of gene-products decrease with CAI, confirming general validity of cost-minimization principle in free-living eubacteria. Alcoholicity of bacterial gene-products usually decreases with expressivity.
Affiliation(s)
- Sabyasachi Das
- Bioinformatics Center, Indian Institute of Chemical Biology, 4, Raja S.C. Mullick Road, Kolkata 700 032, India
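The G(3) and C(3) quantities in the abstract above refer to guanine and cytosine usage at the third codon position. A minimal sketch follows, assuming they are simple fractions of third positions occupied by G or C in an in-frame coding sequence; the function name `third_position_usage` and the returned keys are hypothetical, and the paper may normalize differently.

```python
def third_position_usage(cds):
    """Return fractions of G, C, and G+C at third codon positions.

    Hypothetical helper; assumes `cds` starts in frame. A trailing partial
    codon contributes nothing because it has no third base.
    """
    cds = cds.upper()
    thirds = cds[2::3]  # bases at positions 3, 6, 9, ... of the sequence
    if not thirds:
        return {"G3": 0.0, "C3": 0.0, "GC3": 0.0}
    g3 = thirds.count("G") / len(thirds)
    c3 = thirds.count("C") / len(thirds)
    return {"G3": g3, "C3": c3, "GC3": g3 + c3}


# Codons ATG, GCC, AAG have third bases G, C, G.
print(third_position_usage("ATGGCCAAG"))
```

Computing these fractions per gene and plotting them against an expression proxy such as CAI is one way to look for the kind of relationship the abstract describes.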
|
24
|
Baldridge GD, Burkhardt N, Herron MJ, Kurtti TJ, Munderloh UG. Analysis of fluorescent protein expression in transformants of Rickettsia monacensis, an obligate intracellular tick symbiont. Appl Environ Microbiol 2005; 71:2095-105. [PMID: 15812043 PMCID: PMC1082560 DOI: 10.1128/aem.71.4.2095-2105.2005] [Citation(s) in RCA: 57] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
We developed and applied transposon-based transformation vectors for molecular manipulation and analysis of spotted fever group rickettsiae, which are obligate intracellular bacteria that infect ticks and, in some cases, mammals. Using the Epicentre EZ::TN transposon system, we designed transposons for simultaneous expression of a reporter gene and a chloramphenicol acetyltransferase (CAT) resistance marker. Transposomes (transposon-transposase complexes) were electroporated into Rickettsia monacensis, a rickettsial symbiont isolated from the tick Ixodes ricinus. Each transposon contained an expression cassette consisting of the rickettsial ompA promoter and a green fluorescent protein (GFP) reporter gene (GFPuv) or the ompB promoter and a red fluorescent protein reporter gene (DsRed2), followed by the ompA transcription terminator and a second ompA promoter CAT gene cassette. Selection with chloramphenicol gave rise to rickettsial populations with chromosomally integrated single-copy transposons as determined by PCR, Southern blotting, and sequence analysis. Reverse transcription-PCR and Northern blots demonstrated transcription of all three genes. GFPuv transformant rickettsiae exhibited strong fluorescence in individual cells, but DsRed2 transformants did not. Western blots confirmed expression of GFPuv in R. monacensis and in Escherichia coli, but DsRed2 was expressed only in E. coli. The DsRed2 gene, but not the GFPuv gene, contains many GC-rich amino acid codons that are rare in the preferred codon suite of rickettsiae, possibly explaining the failure to express DsRed2 protein in R. monacensis. We demonstrated that our vectors provide a means to study rickettsia-host cell interactions by visualizing GFPuv-fluorescent R. monacensis associated with actin tails in tick host cells.
Affiliation(s)
- Gerald D Baldridge
- Department of Entomology, University of Minnesota, 1980 Folwell Ave., St. Paul, MN 55108, USA.
|
25
|
Blanc G, Ngwamidiba M, Ogata H, Fournier PE, Claverie JM, Raoult D. Molecular evolution of rickettsia surface antigens: evidence of positive selection. Mol Biol Evol 2005; 22:2073-83. [PMID: 15972845 DOI: 10.1093/molbev/msi199] [Citation(s) in RCA: 93] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
The Rickettsia genus is a group of obligate intracellular parasitic alpha-proteobacteria that includes human pathogens responsible for the typhus disease and various types of spotted fevers. rOmpA and rOmpB are two members of the "surface cell antigen" (Sca) autotransporter (AT) protein family that may play key roles in the adhesion of the Rickettsia cells to the host tissue. These molecules are likely determinants for the pathogenicity of the Rickettsia and represent good candidates for vaccine development. We identified the 17 members of this family of outer-membrane proteins in nine fully sequenced Rickettsia genomes. The typical architecture of the Sca proteins is composed of an N-terminal signal peptide and a C-terminal AT domain that promote the export of the central passenger domain to the outside of the bacteria. A characteristic of this family is the frequent degradation of the genes, which results in different subsets of the sca genes being expressed among Rickettsia species. Here, we present a detailed analysis of their phylogenetic relationships and evolution. We provide strong evidence that rOmpA and rOmpB as well as three other members of the Sca protein family--Sca1, Sca2, and Sca4--have evolved under positive selection. The exclusive distribution of the predicted positively selected sites within the passenger domains of these proteins argues that these regions are involved in the interaction with the host and may be locked in "arms race" coevolutionary conflicts.
Affiliation(s)
- Guillaume Blanc
- Information Génomique et Structurale, UPR 2589, 31 Chemin Joseph Aiguier, 13402 Marseille Cedex 20, France.
|
26
|
Simmons MP, Carr TG, O'Neill K. Relative character-state space, amount of potential phylogenetic information, and heterogeneity of nucleotide and amino acid characters. Mol Phylogenet Evol 2005; 32:913-26. [PMID: 15288066 DOI: 10.1016/j.ympev.2004.04.011] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/08/2003] [Revised: 03/10/2004] [Indexed: 11/16/2022]
Abstract
We examined a broad selection of protein-coding loci from a diverse array of clades and genomes to quantify three factors that determine whether nucleotide or amino acid characters should be preferred for phylogenetic inference. First, we quantified the difference in observed character-state space between nucleotides and amino acids. Second, we quantified the loss of potential phylogenetic signal from silent substitutions when amino acids are used. Third, we used the disparity index to quantify the relative compositional heterogeneity of nucleotides and amino acids and then determined how commonly convergent (rather than unique) shifts in nucleotide and amino acid composition occur in a phylogenetic context. The greater potential phylogenetic signal for nucleotide characters was found to be enormous (on average 440% that of amino acids), whereas the greater observed character-state space for amino acids was less impressive (on average 150.4% that of nucleotides). While matrices of amino acid sequences had less compositional heterogeneity than their corresponding nucleotide sequences, heterogeneity in amino acid composition may be more homoplasious than heterogeneity in nucleotide composition. Given the ability of increased taxon sampling to better utilize the greater potential phylogenetic signal of nucleotide characters and decrease the potential for artifacts caused by heterogeneous nucleotide composition among taxa, we suggest that increased taxon sampling be performed whenever possible instead of restricting analyses to amino acid characters.
Affiliation(s)
- Mark P Simmons
- Department of Biology, Colorado State University, Fort Collins, CO 80523, USA.
|
27
|
Sharp PM, Bailes E, Grocock RJ, Peden JF, Sockett RE. Variation in the strength of selected codon usage bias among bacteria. Nucleic Acids Res 2005; 33:1141-53. [PMID: 15728743 PMCID: PMC549432 DOI: 10.1093/nar/gki242] [Citation(s) in RCA: 299] [Impact Index Per Article: 15.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2004] [Revised: 01/10/2005] [Accepted: 01/23/2005] [Indexed: 12/21/2022] Open
Abstract
Among bacteria, many species have synonymous codon usage patterns that have been influenced by natural selection for those codons that are translated more accurately and/or efficiently. However, in other species selection appears to have been ineffective. Here, we introduce a population genetics-based model for quantifying the extent to which selection has been effective. The approach is applied to 80 phylogenetically diverse bacterial species for which whole genome sequences are available. The strength of selected codon usage bias, S, is found to vary substantially among species; in 30% of the genomes examined, there was no significant evidence that selection had been effective. Values of S are highly positively correlated with both the number of rRNA operons and the number of tRNA genes. These results are consistent with the hypothesis that species exposed to selection for rapid growth have more rRNA operons, more tRNA genes and more strongly selected codon usage bias. For example, Clostridium perfringens, the species with the highest value of S, can have a generation time as short as 7 min.
Affiliation(s)
- Paul M Sharp
- Institute of Genetics, University of Nottingham, Queens Medical Centre, Nottingham NG7 2UH, UK.
|
28
|
Lithwick G, Margalit H. Relative predicted protein levels of functionally associated proteins are conserved across organisms. Nucleic Acids Res 2005; 33:1051-7. [PMID: 15718304 PMCID: PMC549420 DOI: 10.1093/nar/gki261] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
We show that the predicted protein levels of functionally related proteins change in a coordinated fashion over many unicellular organisms. For each protein, we created a profile containing a protein abundance measure in each of a set of organisms. We show that for functionally related proteins these profiles tend to be correlated. Using the Codon Adaptation Index as a predictor of protein abundance in 48 unicellular organisms, we demonstrated this phenomenon for two types of functional relations: for proteins that physically interact and for proteins involved in consecutive steps within a metabolic pathway. Our results suggest that the protein abundance levels of functionally related proteins co-evolve.
Affiliation(s)
- Hanah Margalit
- To whom correspondence should be addressed. Tel: +972 2 6758614; Fax: +972 2 6757308;
|
29
|
Dufresne A, Garczarek L, Partensky F. Accelerated evolution associated with genome reduction in a free-living prokaryote. Genome Biol 2005; 6:R14. [PMID: 15693943 PMCID: PMC551534 DOI: 10.1186/gb-2005-6-2-r14] [Citation(s) in RCA: 251] [Impact Index Per Article: 12.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/05/2004] [Revised: 12/02/2004] [Accepted: 12/07/2004] [Indexed: 01/15/2023] Open
Abstract
Prochlorococcus sp. are marine bacteria with very small genomes. The mechanisms by which these reduced genomes have evolved appear, however, to be distinct from those that have led to small genome size in intracellular bacteria. Background Three complete genomes of Prochlorococcus species, the smallest and most abundant photosynthetic organism in the ocean, have recently been published. Comparative genome analyses reveal that genome shrinkage has occurred within this genus, associated with a sharp reduction in G+C content. As all examples of genome reduction characterized so far have been restricted to endosymbionts or pathogens, with a host-dependent lifestyle, the observed genome reduction in Prochlorococcus is the first documented example of such a process in a free-living organism. Results Our results clearly indicate that genome reduction has been accompanied by an increased rate of protein evolution in P. marinus SS120 that is even more pronounced in P. marinus MED4. This acceleration has affected every functional category of protein-coding genes. In contrast, the 16S rRNA gene seems to have evolved clock-like in this genus. We observed that MED4 and SS120 have lost several DNA-repair genes, the absence of which could be related to the mutational bias and the acceleration of amino-acid substitution. Conclusions We have examined the evolutionary mechanisms involved in this process, which are different from those known from host-dependent organisms. Indeed, most substitutions that have occurred in Prochlorococcus have to be selectively neutral, as the large size of populations imposes low genetic drift and strong purifying selection. We assume that the major driving force behind genome reduction within the Prochlorococcus radiation has been a selective process favoring the adaptation of this organism to its environment. A scenario is proposed for genome evolution in this genus.
Affiliation(s)
- Alexis Dufresne
- Station Biologique, UMR 7127 CNRS et Université Paris 6, BP74, 29682 Roscoff Cedex, France
- Laurence Garczarek
- Station Biologique, UMR 7127 CNRS et Université Paris 6, BP74, 29682 Roscoff Cedex, France
- Frédéric Partensky
- Station Biologique, UMR 7127 CNRS et Université Paris 6, BP74, 29682 Roscoff Cedex, France
|
30
|
Dethlefsen L, Schmidt TM. Differences in codon bias cannot explain differences in translational power among microbes. BMC Bioinformatics 2005; 6:3. [PMID: 15636642 PMCID: PMC546186 DOI: 10.1186/1471-2105-6-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/16/2004] [Accepted: 01/06/2005] [Indexed: 11/15/2022] Open
Abstract
Background Translational power is the cellular rate of protein synthesis normalized to the biomass invested in translational machinery. Published data suggest a previously unrecognized pattern: translational power is higher among rapidly growing microbes, and lower among slowly growing microbes. One factor known to affect translational power is biased use of synonymous codons. The correlation within an organism between expression level and degree of codon bias among genes of Escherichia coli and other bacteria capable of rapid growth is commonly attributed to selection for high translational power. Conversely, the absence of such a correlation in some slowly growing microbes has been interpreted as the absence of selection for translational power. Because codon bias caused by translational selection varies between rapidly growing and slowly growing microbes, we investigated whether observed differences in translational power among microbes could be explained entirely by differences in the degree of codon bias. Although the data are not available to estimate the effect of codon bias in other species, we developed an empirically-based mathematical model to compare the translation rate of E. coli to the translation rate of a hypothetical strain which differs from E. coli only by lacking codon bias. Results Our reanalysis of data from the scientific literature suggests that translational power can differ by a factor of 5 or more between E. coli and slowly growing microbial species. Using empirical codon-specific in vivo translation rates for 29 codons, and several scenarios for extrapolating from these data to estimates over all codons, we find that codon bias cannot account for more than a doubling of the translation rate in E. coli, even with unrealistic simplifying assumptions that exaggerate the effect of codon bias. With more realistic assumptions, our best estimate is that codon bias accelerates translation in E. coli by no more than 60% in comparison to microbes with very little codon bias. Conclusions While codon bias confers a substantial benefit of faster translation and hence greater translational power, the magnitude of this effect is insufficient to explain observed differences in translational power among bacterial and archaeal species, particularly the differences between slowly growing and rapidly growing species. Hence, large differences in translational power suggest that the translational apparatus itself differs among microbes in ways that influence translational performance.
Affiliation(s)
- Les Dethlefsen
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan 48824, USA
- Department of Microbiology and Immunology, Stanford University, Palo Alto, California 94304, USA
- Thomas M Schmidt
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, Michigan 48824, USA
|
31
|
Carbone A, Képès F, Zinovyev A. Codon bias signatures, organization of microorganisms in codon space, and lifestyle. Mol Biol Evol 2004; 22:547-61. [PMID: 15537809 DOI: 10.1093/molbev/msi040] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.2] [Indexed: 11/13/2022] Open
Abstract
New and simple numerical criteria based on a codon adaptation index are applied to the complete genomic sequences of 80 Eubacteria and 16 Archaea, to infer weak and strong genome tendencies toward content bias, translational bias, and strand bias. These criteria can be applied to all microbial genomes, even those for which little biological information is known, and a codon bias signature, that is the collection of strong biases displayed by a genome, can be automatically derived. A codon bias space, where genomes are identified by their preferred codons, is proposed as a novel formal framework to interpret genomic relationships. Principal component analysis confirms that although GC content has a dominant effect on codon bias space, thermophilic and mesophilic species can be identified and separated by codon preferences. Two more examples concerning lifestyle are studied with linear discriminant analysis: suitable separating functions characterized by sets of preferred codons are provided to discriminate: translationally biased (hyper)thermophiles from mesophiles, and organisms with different respiratory characteristics, aerobic, anaerobic, facultative aerobic and facultative anaerobic. These results suggest that codon bias space might reflect the geometry of a prokaryotic "physiology space." Evolutionary perspectives are noted, numerical criteria and distances among organisms are validated on known cases, and various results and predictions are discussed both on methodological and biological grounds.
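The criteria in this abstract are described as being based on a codon adaptation index. As a rough illustration only, the sketch below computes the classical codon adaptation index (geometric mean of relative codon adaptiveness against a reference gene set); it is not the authors' specific criteria. The function names, the toy sequences, and the pseudocount of 0.5 for unobserved codons are assumptions made for this example.

```python
from collections import Counter
from math import exp, log

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))

# Group codons by the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def split_codons(seq):
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def relative_adaptiveness(reference_genes, pseudocount=0.5):
    """Weight of each codon = its count / count of the most used synonymous codon.

    Assumption: codons never seen in the reference set get `pseudocount`
    so that their weight is small but non-zero.
    """
    counts = Counter()
    for gene in reference_genes:
        counts.update(split_codons(gene))
    weights = {}
    for aa, codons in SYNONYMS.items():
        if aa == "*" or len(codons) == 1:  # skip stops, Met and Trp
            continue
        adjusted = {c: counts[c] or pseudocount for c in codons}
        top = max(adjusted.values())
        for c, n in adjusted.items():
            weights[c] = n / top
    return weights


def cai(gene, weights):
    """Geometric mean of the weights of the gene's codons (ignoring unweighted codons)."""
    logs = [log(weights[c]) for c in split_codons(gene) if c in weights]
    return exp(sum(logs) / len(logs)) if logs else float("nan")


# Toy usage: a hypothetical reference set that strongly prefers GCT for alanine.
weights = relative_adaptiveness(["GCTGCTGCTGCC"])
print(cai("GCTGCT", weights), cai("GCTGCC", weights))
```

In practice the reference set would be a group of presumed highly expressed genes; which genes to use is a modelling choice that this sketch leaves to the caller.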
Affiliation(s)
- A Carbone
- Génomique Analytique, Université Pierre et Marie Curie, INSERM U511, 91, Bd de l'Hôpital, 75013 Paris, France.
|
32
|
Herbeck JT, Wall DP, Wernegreen JJ. Gene expression level influences amino acid usage, but not codon usage, in the tsetse fly endosymbiont Wigglesworthia. Microbiology (Reading, England) 2003; 149:2585-2596. [PMID: 12949182 DOI: 10.1099/mic.0.26381-0] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022]
Abstract
Wigglesworthia glossinidia brevipalpis, the obligate bacterial endosymbiont of the tsetse fly Glossina brevipalpis, is characterized by extreme genome reduction and AT nucleotide composition bias. Here, multivariate statistical analyses are used to test the hypothesis that mutational bias and genetic drift shape synonymous codon usage and amino acid usage of Wigglesworthia. The results show that synonymous codon usage patterns vary little across the genome and do not distinguish genes of putative high and low expression levels, thus indicating a lack of translational selection. Extreme AT composition bias across the genome also drives relative amino acid usage, but predicted high-expression genes (ribosomal proteins and chaperonins) use GC-rich amino acids more frequently than do low-expression genes. The levels and configuration of amino acid differences between Wigglesworthia and Escherichia coli were compared to test the hypothesis that the relatively GC-rich amino acid profiles of high-expression genes reflect greater amino acid conservation at these loci. This hypothesis is supported by reduced levels of protein divergence at predicted high-expression Wigglesworthia genes and similar configurations of amino acid changes across expression categories. Combined, the results suggest that codon and amino acid usage in the Wigglesworthia genome reflect a strong AT mutational bias and elevated levels of genetic drift, consistent with expected effects of an endosymbiotic lifestyle and repeated population bottlenecks. However, these impacts of mutation and drift are apparently attenuated by selection on amino acid composition at high-expression genes.
Affiliation(s)
- Joshua T Herbeck
- Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA
- Dennis P Wall
- Department of Biological Sciences, Stanford University, Stanford, CA 94305, USA
- Jennifer J Wernegreen
- Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, 7 MBL Street, Woods Hole, MA 02543, USA
|
33
|
Palacios C, Wernegreen JJ. A strong effect of AT mutational bias on amino acid usage in Buchnera is mitigated at high-expression genes. Mol Biol Evol 2002; 19:1575-84. [PMID: 12200484 DOI: 10.1093/oxfordjournals.molbev.a004219] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.5] [Indexed: 11/12/2022] Open
Abstract
The advent of full genome sequences provides exceptionally rich data sets to explore molecular and evolutionary mechanisms that shape divergence among and within genomes. In this study, we use multivariate analysis to determine the processes driving genome-wide patterns of amino acid usage in the obligate endosymbiont Buchnera and its close free-living relative Escherichia coli. In the AT-rich Buchnera genome, the primary source of variation in amino acid usage differentiates high- and low-expression genes. Amino acids of high-expression Buchnera genes are generally less aromatic and use relatively GC-rich codons, suggesting that selection against aromatic amino acids and against amino acids with AT-rich codons is stronger in high-expression genes. Selection to maintain hydrophobic amino acids in integral membrane proteins is a primary factor driving protein evolution in E. coli but is a secondary factor in Buchnera. In E. coli, gene expression is a secondary force driving amino acid usage, and a correlation with tRNA abundance suggests that translational selection contributes to this effect. Although this and previous studies demonstrate that AT mutational bias and genetic drift influence amino acid usage in Buchnera, this genome-wide analysis argues that selection is sufficient to affect the amino acid content of proteins with different expression and hydropathy levels.
Affiliation(s)
- Carmen Palacios
- Josephine Bay Paul Center for Comparative Molecular Biology and Evolution, Marine Biological Laboratory, Woods Hole, Massachusetts 02543, USA
|
34
|
Amiri H, Alsmark CM, Andersson SGE. Proliferation and deterioration of Rickettsia palindromic elements. Mol Biol Evol 2002; 19:1234-43. [PMID: 12140235 DOI: 10.1093/oxfordjournals.molbev.a004184] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022] Open
Abstract
It has been suggested that Rickettsia Palindromic Elements (RPEs) have evolved as selfish DNA that mediate protein sequence evolution by being targeted to genes that code for RNA and proteins. Here, we have examined the phylogenetic depth of two RPEs that are located close to the genes encoding elongation factors Tu (tuf) and G (fus) in Rickettsia. An exceptional organization of the elongation factor genes was found in all 11 species examined, with complete or partial RPEs identified downstream of the tuf gene (RPE-tuf) in six species and of the fus gene (RPE-fus) in 10 species. A phylogenetic reconstruction shows that both RPE-tuf and RPE-fus have evolved in a manner that is consistent with the expected species divergence. The analysis provides evidence for independent loss of RPE-tuf in several species, possibly mediated by short repetitive sequences flanking the site of excision. The remaining RPE-tuf sequences evolve as neutral sequences in different stages of deterioration. Likewise, highly fragmented remnants of the RPE-fus sequence were identified in two species. This suggests that genome-specific differences in the content of RPEs are the result of recent loss rather than recent proliferation.
Affiliation(s)
- Haleh Amiri
- Department of Molecular Evolution, University of Uppsala, Sweden
|
35
|
|
36
|
Zeeberg B. Shannon information theoretic computation of synonymous codon usage biases in coding regions of human and mouse genomes. Genome Res 2002; 12:944-55. [PMID: 12045147 PMCID: PMC1383734 DOI: 10.1101/gr.213402] [Citation(s) in RCA: 40] [Impact Index Per Article: 1.7] [Received: 08/31/2001] [Accepted: 03/06/2002] [Indexed: 11/24/2022]
Abstract
Exonic GC of human mRNA reference sequences (RefSeqs), as well as A, C, G, and T in codon position 3 are linearly correlated with genomic GC. These observations utilize information from the completed human genome sequence and a large, high-quality set of human and mouse coding sequences, and are in accord with similar determinations published by others. A Shannon Information Theoretic measure of bias in synonymous codon usage was developed. When applied to either human or mouse RefSeqs, this measure is nonlinearly correlated with genomic, exonic, and third codon position A, C, G, and T. Information values between orthologous mouse and human RefSeqs are linearly correlated: mouse = 0.092 + 0.55 human. Mouse genes were consistently placed in genomic regions whose GC content was closer to 50% than was the GC content of the human ortholog. Since the (nonlinear) information versus percent GC curve has a minimum at 50% GC and monotonically increases with increasing distance from 50% GC, this phenomenon directly results in the low slope of 0.55. This appears to be a manifestation of an evolutionary strategy for placement of genes in regions of the genome with a GC content that relates synonymous codon bias and protein folding.
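To make the idea of an information-theoretic codon bias measure concrete, here is a minimal sketch that, for each amino acid in a coding sequence, reports how far the Shannon entropy of its synonymous codon usage falls below the maximum possible value. This is one common way to set up such a measure and is an assumption for illustration; it is not claimed to be the exact definition used in the paper, and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from math import log2

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))
DEGENERACY = Counter(CODON_TABLE.values())  # number of codons per amino acid


def synonymous_codon_information(seq):
    """Return {amino acid: log2(k) - H} in bits, where k is the number of
    synonymous codons and H is the entropy of their observed usage.

    0 bits means uniform usage; log2(k) bits means a single codon is always used.
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))
    usage = defaultdict(Counter)
    for codon in codons:
        aa = CODON_TABLE.get(codon)  # None for codons with ambiguous bases
        if aa is None or aa == "*":
            continue
        usage[aa][codon] += 1
    info = {}
    for aa, counts in usage.items():
        k = DEGENERACY[aa]
        if k == 1:  # Met and Trp carry no synonymous choice
            continue
        total = sum(counts.values())
        entropy = -sum((n / total) * log2(n / total) for n in counts.values())
        info[aa] = log2(k) - entropy
    return info


# Toy usage: alanine encoded by one codon only versus all four codons equally.
print(synonymous_codon_information("GCTGCTGCTGCT"))
print(synonymous_codon_information("GCTGCCGCAGCG"))
```

A per-gene summary could then be obtained by averaging these values, weighted by amino acid frequency; that aggregation step is left out here because the abstract does not specify it.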
Affiliation(s)
- Barry Zeeberg
- Laboratory of Molecular Pharmacology, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA.
|
37
|
Likhoshvai VA, Matushkin YG. Differentiation of single-cell organisms according to elongation stages crucial for gene expression efficacy. FEBS Lett 2002; 516:87-92. [PMID: 11959109 DOI: 10.1016/s0014-5793(02)02507-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
Abstract
We analyzed the interrelation between the efficiency of gene expression and the nucleotide composition of all protein-coding sequences in 38 unicellular organisms whose complete genomic sequences are known. These organisms comprise 37 prokaryotic (29 eubacteria and eight archaebacteria) and one eukaryotic (yeast) species. We demonstrated that frequency analysis of gene codon composition fails to reflect adequately the gene expression efficiency of all these organisms. We constructed a measure, the elongation efficiency index, that considers simultaneously the information on codon frequencies and the degree of mRNA local self-complementarity. This measure recognizes the ribosome-coding genes as highly expressed in all the unicellular organisms studied. According to our analysis, these species fall into five groups differentiated by the process that makes the key contribution to the elongation rate.
Affiliation(s)
- Vitali A Likhoshvai
- Laboratory of Molecular Evolution, Institute of Cytology and Genetics, Prospekt Lavrentieva 10, 630090, Novosibirsk, Russia.
|
38
|
Gupta SK, Ghosh TC. Gene expressivity is the main factor in dictating the codon usage variation among the genes in Pseudomonas aeruginosa. Gene 2001; 273:63-70. [PMID: 11483361 DOI: 10.1016/s0378-1119(01)00576-5] [Citation(s) in RCA: 89] [Impact Index Per Article: 3.7] [Indexed: 11/22/2022]
Abstract
Codon usage biases of all DNA sequences (length greater than or equal to 300 bp) from the complete genome of Pseudomonas aeruginosa have been analyzed. As P. aeruginosa is a GC-rich organism, G and/or C are expected to predominate in its codons. Overall codon usage data analysis indicates that codons ending in G and/or C are indeed predominant in this organism. However, multivariate statistical analysis indicates that there is a single major trend in the codon usage variation among the genes in this organism, which has a strong negative correlation with the expressivities of the genes. The majority of the lowly expressed genes are scattered towards the positive end of the major axis whereas the highly expressed genes are clustered towards the negative end. This is the first report in which codon usage in a prokaryotic organism with highly skewed base composition is found to be dictated mainly by translational selection, although other factors, such as gene length and the hydrophobicity of the encoded proteins, also influence the codon usage variation among the genes in this organism in a minor way.
Affiliation(s)
- S K Gupta
- Distributed Information Centre, Bose Institute, P 1/12, C.I.T. Scheme, VII M, Calcutta 700 054, India
|
39
|
Naya H, Romero H, Carels N, Zavala A, Musto H. Translational selection shapes codon usage in the GC-rich genome of Chlamydomonas reinhardtii. FEBS Lett 2001; 501:127-30. [PMID: 11470270 DOI: 10.1016/s0014-5793(01)02644-8] [Citation(s) in RCA: 70] [Impact Index Per Article: 2.9] [Indexed: 10/18/2022]
Abstract
In unicellular species codon usage is determined by mutational biases and natural selection. Among prokaryotes, the influence of these factors is different if the genome is skewed towards AT or GC, since in AT-rich organisms translational selection is absent. On the other hand, in AT-rich unicellular eukaryotes the two factors are present. In order to understand if GC-rich genomes display a similar behavior, the case of Chlamydomonas reinhardtii was studied. Since we found that translational selection strongly influences codon usage in this species, we conclude that there is not a common pattern among unicellular organisms.
Affiliation(s)
- H Naya
- Departamento de Biología Celular y Molecular, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay
|
40
|
Abstract
Studies of neutrally evolving sequences suggest that differences in eukaryotic genome sizes result from different rates of DNA loss. However, very few pseudogenes have been identified in microbial species, and the processes whereby genes and genomes deteriorate in bacteria remain largely unresolved. The typhus-causing agent, Rickettsia prowazekii, is exceptional in that as much as 24% of its 1.1-Mb genome consists of noncoding DNA and pseudogenes. To test the hypothesis that the noncoding DNA in the R. prowazekii genome represents degraded remnants of ancestral genes, we systematically examined all of the identified pseudogenes and their flanking sequences in three additional Rickettsia species. Consistent with the hypothesis, we observe sequence similarities between genes and pseudogenes in one species and intergenic DNA in another species. We show that the frequencies and average sizes of deletions are larger than insertions in neutrally evolving pseudogene sequences. Our results suggest that inactivated genetic material in the Rickettsia genomes deteriorates spontaneously due to a mutation bias for deletions and that the noncoding sequences represent DNA in the final stages of this degenerative process.
Affiliation(s)
- J O Andersson
- Department of Molecular Evolution, University of Uppsala, Uppsala, Sweden
|
41
|
Abstract
The endosymbiotic theory for the origin of mitochondria requires substantial modification. The three identifiable ancestral sources to the proteome of mitochondria are proteins descended from the ancestral alpha-proteobacteria symbiont, proteins with no homology to bacterial orthologs, and diverse proteins with bacterial affinities not derived from alpha-proteobacteria. Random mutations in the form of deletions large and small seem to have eliminated nonessential genes from the endosymbiont-mitochondrial genome lineages. This process, together with the transfer of genes from the endosymbiont-mitochondrial genome to nuclei, has led to a marked reduction in the size of mitochondrial genomes. All proteins of bacterial descent that are encoded by nuclear genes were probably transferred by the same mechanism, involving the disintegration of mitochondria or bacteria by the intracellular membranous vacuoles of cells to release nucleic acid fragments that transform the nuclear genome. This ongoing process has intermittently introduced bacterial genes to nuclear genomes. The genomes of the last common ancestor of all organisms, in particular of mitochondria, encoded cytochrome oxidase homologues. There are no phylogenetic indications either in the mitochondrial proteome or in the nuclear genomes that the initial or subsequent function of the ancestor to the mitochondria was anaerobic. In contrast, there are indications that relatively advanced eukaryotes adapted to anaerobiosis by dismantling their mitochondria and refitting them as hydrogenosomes. Accordingly, a continuous history of aerobic respiration seems to have been the fate of most mitochondrial lineages. The initial phases of this history may have involved aerobic respiration by the symbiont functioning as a scavenger of toxic oxygen. The transition to mitochondria capable of active ATP export to the host cell seems to have required recruitment of eukaryotic ATP transport proteins from the nucleus. 
The identity of the ancestral host of the alpha-proteobacterial endosymbiont is unclear, but there is no indication that it was an autotroph. There are no indications of a specific alpha-proteobacterial origin to genes for glycolysis. In the absence of data to the contrary, it is assumed that the ancestral host cell was a heterotroph.
Affiliation(s)
- C G Kurland
- Department of Molecular Evolution, Evolutionary Biology Centre, University of Uppsala, Uppsala SE 752 36, Lund University, Lund SE 223 62, Sweden.
|
42
|
Hooper SD, Berg OG. Gradients in nucleotide and codon usage along Escherichia coli genes. Nucleic Acids Res 2000; 28:3517-23. [PMID: 10982871 PMCID: PMC110745 DOI: 10.1093/nar/28.18.3517] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.2] [Indexed: 11/13/2022] Open
Abstract
The usage of codons and nucleotide combinations varies along genes and systematic variation causes gradients in usage. We have studied such gradients of nucleotides and nucleotide combinations and their immediate context in Escherichia coli. To distinguish mutational and selectional effects, the genes were subdivided into three groups with different codon usage bias and the gradients of nucleotide usage were studied in each group. Some combinations that can be associated with a propensity for processivity errors show strong negative gradients that become weaker in genes with low codon bias, consistent with a selection on translational efficiency. One of the strongest gradients is for third position G, which shows a pervasive positive gradient in usage in most contexts of surrounding bases.
Affiliation(s)
- S D Hooper
- Department of Molecular Evolution, EBC, Uppsala University, Norbyvägen 18C, SE-75236, Uppsala, Sweden
|
43
|
Romero H, Zavala A, Musto H. Codon usage in Chlamydia trachomatis is the result of strand-specific mutational biases and a complex pattern of selective forces. Nucleic Acids Res 2000; 28:2084-90. [PMID: 10773076 PMCID: PMC105376 DOI: 10.1093/nar/28.10.2084] [Citation(s) in RCA: 145] [Impact Index Per Article: 5.8] [Indexed: 11/13/2022] Open
Abstract
The patterns of synonymous codon choices of the completely sequenced genome of the bacterium Chlamydia trachomatis were analysed. We found that the most important source of variation among the genes results from whether the sequence is located on the leading or lagging strand of replication, resulting in an overrepresentation of G or C, respectively. This can be explained by different mutational biases associated with the different enzymes that replicate each strand. Next we found that most highly expressed sequences are located on the leading strand of replication. From this result, replicational-transcriptional selection can be invoked. Then, when the genes located on the leading strand are studied separately, the correspondence analysis detects a principal trend which discriminates between lowly and highly expressed sequences, the latter displaying a different codon usage pattern than the former, suggesting selection for translation, which is reinforced by the fact that Ks values between orthologous sequences from C. trachomatis and Chlamydia pneumoniae are much smaller in highly expressed genes. Finally, synonymous codon choices appear to be influenced by the hydropathy of each encoded protein and by the degree of amino acid conservation. Therefore, synonymous codon usage in C. trachomatis seems to be the result of a very complex balance among different factors, which raises the question of whether the forces driving codon usage patterns among microorganisms are more complex than generally accepted.
Affiliation(s)
- H Romero
- Laboratorio de Organización y Evolución del Genoma, Sección Bioquímica, Facultad de Ciencias, Iguá 4225, Montevideo 11400, Uruguay
|
44
|
Lafay B, Atherton JC, Sharp PM. Absence of translationally selected synonymous codon usage bias in Helicobacter pylori. Microbiology (Reading, England) 2000; 146 (Pt 4):851-860. [PMID: 10784043 DOI: 10.1099/00221287-146-4-851] [Citation(s) in RCA: 80] [Impact Index Per Article: 3.2] [Indexed: 11/18/2022]
Abstract
Synonymous codon usage in the complete genome of Helicobacter pylori was investigated. The moderate A+T-richness of the genome (G+C=39 mol%) is reflected in the overall synonymous codon usage but the frequencies of some codons cannot be explained by simple mutational biases. A low level of heterogeneity among genes was observed, but this does not appear to be due to varying mutational bias or translational selection. Some of the heterogeneity was due to amino acid composition variation among the encoded proteins, and some may be attributable to recent acquisition of genes from other species. Since Hel. pylori codon usage is not dominated by biased mutation patterns, the absence of evidence for translationally mediated selection among synonymous codons is striking. This has implications with regard to the life history of this species, and in particular suggests that Hel. pylori strains are not subject to periods of competitive exponential growth. Despite the lack of selected codon usage, base composition immediately after the translation initiation site is skewed, consistent with selection against secondary structure formation in this region.
Affiliation(s)
- Bénédicte Lafay
- Institute of Genetics, and Division of Gastroenterology, Department of Medicine and Institute of Infections and Immunity, University of Nottingham, Queen's Medical Centre, Nottingham NG7 2UH, UK
- John C Atherton
- Institute of Genetics, and Division of Gastroenterology, Department of Medicine and Institute of Infections and Immunity, University of Nottingham, Queen's Medical Centre, Nottingham NG7 2UH, UK
- Paul M Sharp
- Institute of Genetics, and Division of Gastroenterology, Department of Medicine and Institute of Infections and Immunity, University of Nottingham, Queen's Medical Centre, Nottingham NG7 2UH, UK
|
45
|
Romero H, Zavala A, Musto H. Compositional pressure and translational selection determine codon usage in the extremely GC-poor unicellular eukaryote Entamoeba histolytica. Gene 2000; 242:307-11. [PMID: 10721724 DOI: 10.1016/s0378-1119(99)00491-6] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Abstract
It is widely accepted that compositional pressure is the only factor shaping codon usage in unicellular species displaying extremely biased genomic compositions. This seems to be the case in the prokaryotes Mycoplasma capricolum, Rickettsia prowazekii and Borrelia burgdorferi (GC-poor), and in Micrococcus luteus (GC-rich). However, in the GC-poor unicellular eukaryotes Dictyostelium discoideum and Plasmodium falciparum, there is evidence that selection, acting at the level of translation, influences codon choices. This finding is doubly intriguing, since (1) the genomic GC levels of the above-mentioned eukaryotes are lower than the GC% of any studied bacteria, and (2) bacteria usually have larger effective population sizes than eukaryotes, and hence natural selection is expected to overcome the randomizing effects of genetic drift more efficiently among prokaryotes than among eukaryotes. In order to gain new insight into this problem, we analysed the patterns of codon preferences of the nuclear genes of Entamoeba histolytica, a unicellular eukaryote characterised by an extremely AT-rich genome (GC = 25%). The overall codon usage is strongly biased towards A and T in the third codon positions, and among the presumed highly expressed sequences, there is an increased relative usage of a subset of codons, many of which are C-ending. Since an increase in C in third codon positions is 'against' the compositional bias, we conclude that codon usage in E. histolytica, as happens in D. discoideum and P. falciparum, is the result of an equilibrium between compositional pressure and selection. These findings raise the question of why strongly compositionally biased eukaryotic cells may be more sensitive to the (presumed) slight differences among synonymous codons than compositionally biased bacteria.
Affiliation(s)
- H Romero
- Laboratorio de Organización y Evolución del Genoma, Sección Bioquímica, Facultad de Ciencias, Montevideo, Uruguay
|
46
|
Abstract
Studies of noncoding and pseudogene sequence diversity, particularly in Rickettsia, have begun to reveal the basic principles of genome degradation in microorganisms. Increasingly, studies of genes and genomes suggest that there has been an extensive amount of horizontal gene transfer among microorganisms. As this inflow of genetic material does not seem generally to have resulted in genome size expansions, however, degenerative processes must be at the very least as widespread as horizontal gene transfer. The basic principles of gene degradation and elimination that are being explored in Rickettsia are likely to be of major importance for our understanding of how microbial genomes evolve.
Affiliation(s)
- J O Andersson
- Department of Molecular Evolution, Uppsala University, Biomedical Center, Box 590, Uppsala, 751 24, Sweden.
|
47
|
Abstract
The sequence of an alpha-proteobacterial genome, that of Rickettsia prowazekii, is a substantial advance in microbial and evolutionary biology. The genome of this obligately aerobic intracellular parasite is small and is apparently still undergoing reduction, reflecting gene losses attributable to its intracellular parasitic lifestyle. Evolutionary analyses of proteins encoded in the genome contain the strongest phylogenetic evidence to date for the view that mitochondria descend from alpha-proteobacteria. Although both Rickettsia and mitochondrial genomes are highly reduced, it appears that genome reduction in these lineages has occurred independently. Rickettsia's genome encodes an ATP-generating machinery that is strikingly similar to that of aerobic mitochondria. But it does not encode homologues for the ATP-producing pathways of anaerobic mitochondria or hydrogenosomes, leaving an important issue regarding the origin and nature of the ancestral mitochondrial symbiont unresolved.
Affiliation(s)
- M Müller
- Rockefeller University, New York, NY 10021, USA.
|
48
|
Wilquet V, Van de Casteele M. The role of the codon first letter in the relationship between genomic GC content and protein amino acid composition. Res Microbiol 1999; 150:21-32. [PMID: 10096131 DOI: 10.1016/s0923-2508(99)80043-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.0] [Indexed: 11/21/2022]
Abstract
Analysis of the statistical distribution of amino acid compositions within 22 protein families shows that a GC bias generally affects proteins with a variety of functions from the extreme thermophile Thermus. This results in evident enrichment in amino acids of the group L, V, A, P, R and G and underrepresentation of amino acids of the group I, M, F, S, T, C and W. The strong amino acid composition biases noted in Thermus proteins are not related to thermoadaptation; they were also found in mesophilic homologues encoded by GC-rich genes. The results of a comparative analysis on large samples of translated sequences from 30 organisms, representing the three major kingdoms of life and including extremophiles, indicate a universal correlation between the usage of particular amino acids and the genomic GC content. It is concluded that the codon first letter plays a dominant role in translating the genomic GC signature into protein amino acid composition and sequences.
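Since the conclusion here turns on the GC content of the first codon letter, a short sketch of how position-specific GC content (often written GC1, GC2, GC3) can be computed for an in-frame coding sequence may help. The function name and the toy sequence are assumptions for illustration and do not come from the paper.

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3): the G+C fraction at codon positions 1, 2 and 3.

    Assumes `cds` is an in-frame coding sequence; a trailing partial codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    fractions = []
    for position in range(3):
        bases = cds[position:usable:3]  # every third base, starting at this position
        gc = sum(base in "GC" for base in bases)
        fractions.append(gc / len(bases) if bases else 0.0)
    return tuple(fractions)


# Toy usage: codons GCA, GCT, GAA.
gc1, gc2, gc3 = gc_by_codon_position("GCAGCTGAA")
print(gc1, gc2, gc3)
```

Comparing GC1 across genomes of different overall GC content is one simple way to examine the relationship the abstract describes, because changes at the first codon position usually change the encoded amino acid.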
Affiliation(s)
- V Wilquet
- Laboratoire de Microbiologie, Université Libre de Bruxelles (ULB), Belgium
|
49
|
Andersson SG, Kurland CG. Ancient and recent horizontal transfer events: the origins of mitochondria. APMIS Supplementum 1998; 84:5-14. [PMID: 9850675 DOI: 10.1111/j.1600-0463.1998.tb05641.x] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.4] [Indexed: 10/16/2022]
Affiliation(s)
- S G Andersson
- Department of Molecular Biology, Uppsala University, Sweden
|
50
|
|