1
|
Huttener R, Thorrez L, In't Veld T, Granvik M, Snoeck L, Van Lommel L, Schuit F. GC content of vertebrate exome landscapes reveal areas of accelerated protein evolution. BMC Evol Biol 2019; 19:144. [PMID: 31311498 PMCID: PMC6636035 DOI: 10.1186/s12862-019-1469-1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 03/18/2019] [Accepted: 06/26/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Rapid accumulation of vertebrate genome sequences renders comparative genomics a powerful approach to study macro-evolutionary events. The assessment of phylogenetic relationships between species routinely depends on the analysis of sequence homology at the nucleotide or protein level. RESULTS We analyzed mRNA GC content, codon usage and divergence of orthologous proteins in 55 vertebrate genomes. Data were visualized in genome-wide landscapes using a sliding window approach. Landscapes of GC content reveal both evolutionary conservation of clustered genes and lineage-specific changes, so that it was possible to construct a phylogenetic tree that closely matched the classic "tree of life". Landscapes of GC content also strongly correlated with landscapes of amino acid usage: positive correlation with glycine, alanine, arginine and proline and negative correlation with phenylalanine, tyrosine, methionine, isoleucine, asparagine and lysine. Peaks of GC content correlated strongly with increased protein divergence. CONCLUSIONS Landscapes of base and amino acid composition of the coding genome open a new approach in comparative genomics, allowing identification of discrete regions in which protein evolution accelerated over deep evolutionary time. Insight into the evolution of genome structure may spur novel studies assessing the evolutionary benefit of genes in particular genomic regions.
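As a rough illustration of the sliding window approach mentioned in this abstract, the Python sketch below turns per-gene GC fractions, taken in genomic order, into a smoothed landscape. The function names, the toy sequences and the window size are assumptions made for illustration only; the abstract does not state the window size or the implementation the authors used.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def sliding_mean(values, window):
    """Moving average over consecutive values; returns len(values) - window + 1 points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    smoothed = [running / window]
    for i in range(window, len(values)):
        # Slide the window one step: add the new value, drop the oldest one.
        running += values[i] - values[i - window]
        smoothed.append(running / window)
    return smoothed


# Hypothetical coding sequences, listed in their order along a chromosome.
genes_in_order = ["ATGAAATTTTAA", "ATGGCCGGCTGA", "ATGGCGCCCTAG", "ATGAATATATAA"]
per_gene_gc = [gc_fraction(g) for g in genes_in_order]
landscape = sliding_mean(per_gene_gc, window=2)
```

A real landscape would use many more genes per window; the running-sum form simply avoids recomputing each window from scratch.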
Affiliation(s)
- R Huttener
- Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- L Thorrez
- Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium; Tissue Engineering Laboratory, Dept of Development and Regeneration, KU Leuven, Kortrijk, Belgium
- T In't Veld
- Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- M Granvik
- Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- L Snoeck
- Tissue Engineering Laboratory, Dept of Development and Regeneration, KU Leuven, Kortrijk, Belgium
- L Van Lommel
- Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
- F Schuit
- Gene Expression Unit, Dept of Cellular and Molecular Medicine, KU Leuven, Leuven, Belgium
|
2
|
Van Campenhout J, Vanreusel A, Van Belleghem S, Derycke S. Transcription, Signaling Receptor Activity, Oxidative Phosphorylation, and Fatty Acid Metabolism Mediate the Presence of Closely Related Species in Distinct Intertidal and Cold-Seep Habitats. Genome Biol Evol 2015; 8:51-69. [PMID: 26637468 PMCID: PMC4758239 DOI: 10.1093/gbe/evv242] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 01/02/2023] Open
Abstract
Bathyal cold seeps are isolated extreme deep-sea environments characterized by low species diversity while biomass can be high. The Håkon Mosby mud volcano (Barents Sea, 1,280 m) is a rather stable, chemosynthetically driven habitat characterized by prominent surface bacterial mats with high sulfide concentrations and low oxygen levels. Here, the nematode Halomonhystera hermesi thrives in high abundances (11,000 individuals 10 cm−2). Halomonhystera hermesi is a member of the intertidal Halomonhystera disjuncta species complex that includes five cryptic species (GD1-5). GD1-5’s common habitat is characterized by strong environmental fluctuations. Here, we compared the transcriptomes of H. hermesi and GD1, H. hermesi’s closest relative. Genes encoding proteins involved in oxidative phosphorylation are more strongly expressed in H. hermesi than in GD1, and many genes were only observed in H. hermesi while being completely absent in GD1. Both observations could in part be attributed to high sulfide concentrations and low oxygen levels. Additionally, fatty acid elongation was also prominent in H. hermesi, confirming the importance of highly unsaturated fatty acids in this species. Significantly higher amounts of transcription factors and genes involved in signaling receptor activity were observed in GD1 (many of which were completely absent in H. hermesi), allowing fast signaling and transcriptional reprogramming which can mediate survival in dynamic intertidal environments. GC content was approximately 8% higher in H. hermesi coding unigenes, resulting in differential codon usage between both species and a higher proportion of amino acids with GC-rich codons in H. hermesi. In general, our results showed that most pathways were active in both environments and that only three genes are under natural selection. This indicates that plasticity should also be taken into consideration in the evolutionary history of Halomonhystera species. Such plasticity, as well as possible preadaptation to low oxygen and high sulfide levels, might have played an important role in the establishment of a cold-seep Halomonhystera population.
Affiliation(s)
- Jelle Van Campenhout
- Research Group Marine Biology, Biology Department, Ghent University, Belgium; Department of Biology, Center for Molecular Phylogenetics and Evolution (CeMoFe), Ghent University, Biology Department, Belgium
- Ann Vanreusel
- Research Group Marine Biology, Biology Department, Ghent University, Belgium
- Steven Van Belleghem
- Terrestrial Ecology Unit, Biology Department, Ghent University, Belgium; OD Taxonomy and Phylogeny, Royal Belgian Institute of Natural Sciences, Brussels, Belgium
- Sofie Derycke
- Research Group Marine Biology, Biology Department, Ghent University, Belgium; OD Taxonomy and Phylogeny, Royal Belgian Institute of Natural Sciences, Brussels, Belgium
|
3
|
Abstract
Amino acids are typically encoded by multiple synonymous codons that are not used with the same frequency. Codon usage bias has drawn considerable attention, and several explanations have been offered, including variation in GC-content between species. Focusing on a simple parameter, the combined GC proportion of all the synonymous codons for a particular amino acid (termed GCsyn), we try to deepen our understanding of the relationship between GC-content and amino acid/codon usage in more detail. We analyzed 65 widely distributed representative species and found a close association between GCsyn, GC-content, and amino acid usage. The overall usages of the four amino acids with the greatest GCsyn and the five amino acids with the lowest GCsyn both vary with the regional GC-content, whereas the usage of the remaining 11 amino acids with intermediate GCsyn is less variable. More interestingly, we discovered that codon usage frequencies are nearly constant in regions with similar GC-content. We further quantified the effects of regional GC-content variation (low to high) on amino acid usage and found that GC-content determines the usage variation of amino acids, especially those with extremely high GCsyn, which accounts for 76.7% of the changed GC-content for those regions. Our results suggest that GCsyn correlates with GC-content and has an impact on codon/amino acid usage. These findings suggest a novel approach to understanding the role of codon and amino acid usage in shaping genomic architecture and evolutionary patterns of organisms.
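The GCsyn parameter described in this abstract can be sketched in a few lines of Python. The sketch assumes the standard genetic code and an unweighted reading of the definition (every synonymous codon counts equally); the abstract does not say whether the authors weight codons by usage, so the names `CODON_TABLE` and `gcsyn` and the averaging choice are illustrative assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def gcsyn():
    """GC proportion pooled over all synonymous codons of each amino acid.

    Assumption: each synonymous codon is counted once (no usage weighting).
    """
    gc_bases = {}
    total_bases = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # stop codons encode no amino acid
        gc_bases[aa] = gc_bases.get(aa, 0) + sum(b in "GC" for b in codon)
        total_bases[aa] = total_bases.get(aa, 0) + 3
    return {aa: gc_bases[aa] / total_bases[aa] for aa in gc_bases}


values = gcsyn()
ranked = sorted(values, key=values.get, reverse=True)
highest_four = set(ranked[:4])   # Gly, Ala, Pro, Arg under this reading
lowest_five = set(ranked[-5:])   # Ile, Phe, Tyr, Asn, Lys under this reading
```

Under this unweighted reading, the four highest and five lowest amino acids match the GC-correlated and GC-anticorrelated residues reported in several of the other abstracts on this page.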
|
4
|
Zhou HQ, Ning LW, Zhang HX, Guo FB. Analysis of the relationship between genomic GC Content and patterns of base usage, codon usage and amino acid usage in prokaryotes: similar GC content adopts similar compositional frequencies regardless of the phylogenetic lineages. PLoS One 2014; 9:e107319. [PMID: 25255224 PMCID: PMC4177787 DOI: 10.1371/journal.pone.0107319] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 05/09/2014] [Accepted: 08/08/2014] [Indexed: 11/19/2022] Open
Abstract
The GC contents of 2670 prokaryotic genomes that belong to diverse phylogenetic lineages were analyzed in this paper. These genomes had GC contents that ranged from 13.5% to 74.9%. We analyzed the distance of base frequencies at the three codon positions, codon frequencies, and amino acid compositions across genomes with respect to the differences in the GC content of these prokaryotic species. We found that although the phylogenetic lineages were remote among some species, a similar genomic GC content forced them to adopt similar base usage patterns at the three codon positions, codon usage patterns, and amino acid usage patterns. Our work demonstrates that in prokaryotic genomes: a) base usage, codon usage, and amino acid usage change with GC content with a linear correlation; b) the distance of each usage has a linear correlation with the GC content difference; and c) GC content is more essential than phylogenetic lineage in determining base usage, codon usage, and amino acid usage. This work is exceptional in that we adopted intuitively graphic methods for all analyses, and we used these analyses to examine as many as 2670 prokaryotes. We hope that this work is helpful for understanding common features in the organization of microbial genomes.
Affiliation(s)
- Hui-Qi Zhou
- Center of Bioinformatics and Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China
- Lu-Wen Ning
- Center of Bioinformatics and Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China
- Hui-Xiong Zhang
- Center of Bioinformatics and Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China
- Feng-Biao Guo
- Center of Bioinformatics and Key Laboratory for NeuroInformation of the Ministry of Education, University of Electronic Science and Technology of China, Chengdu, China
|
5
|
GC constituents and relative codon expressed amino acid composition in cyanobacterial phycobiliproteins. Gene 2014; 546:162-71. [PMID: 24933001 DOI: 10.1016/j.gene.2014.06.024] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 10/15/2013] [Revised: 04/17/2014] [Accepted: 06/12/2014] [Indexed: 02/01/2023]
Abstract
The genomic as well as structural relationship of phycobiliproteins (PBPs) in different cyanobacterial species are determined by nucleotides as well as amino acid composition. The genomic GC constituents influence the amino acid variability and codon usage of particular subunit of PBPs. We have analyzed 11 cyanobacterial species to explore the variation of amino acids and causal relationship between GC constituents and codon usage. The study at the first, second and third levels of GC content showed relatively more amino acid variability on the levels of G3+C3 position in comparison to the first and second positions. The amino acid encoded GC rich level including G rich and C rich or both correlate the codon variability and amino acid availability. The fluctuation in amino acids such as Arg, Ala, His, Asp, Gly, Leu and Glu in α and β subunits was observed at G1C1 position; however, fluctuation in other amino acids such as Ser, Thr, Cys and Trp was observed at G2C2 position. The coding selection pressure of amino acids such as Ala, Thr, Tyr, Asp, Gly, Ile, Leu, Asn, and Ser in α and β subunits of PBPs was more elaborated at G3C3 position. In this study, we observed that each subunit of PBPs is codon specific for particular amino acid. These results suggest that genomic constraint linked with GC constituents selects the codon for particular amino acids and furthermore, the codon level study may be a novel approach to explore many problems associated with genomics and proteomics of cyanobacteria.
|
6
|
Pradel N, Ji B, Gimenez G, Talla E, Lenoble P, Garel M, Tamburini C, Fourquet P, Lebrun R, Bertin P, Denis Y, Pophillat M, Barbe V, Ollivier B, Dolla A. The first genomic and proteomic characterization of a deep-sea sulfate reducer: insights into the piezophilic lifestyle of Desulfovibrio piezophilus. PLoS One 2013; 8:e55130. [PMID: 23383081 PMCID: PMC3559428 DOI: 10.1371/journal.pone.0055130] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 11/23/2012] [Accepted: 12/18/2012] [Indexed: 01/19/2023] Open
Abstract
Desulfovibrio piezophilus strain C1TLV30(T) is a piezophilic anaerobe that was isolated from wood falls in the Mediterranean deep-sea. D. piezophilus represents a unique model for studying the adaptation of sulfate-reducing bacteria to hydrostatic pressure. Here, we report the 3.6 Mbp genome sequence of this piezophilic bacterium. An analysis of the genome revealed the presence of seven genomic islands as well as gene clusters that are most likely linked to life at a high hydrostatic pressure. Comparative genomics and differential proteomics identified the transport of solutes and amino acids as well as amino acid metabolism as major cellular processes for the adaptation of this bacterium to hydrostatic pressure. In addition, the proteome profiles showed that the abundance of key enzymes that are involved in sulfate reduction was dependent on hydrostatic pressure. A comparative analysis of orthologs from the non-piezophilic marine bacterium D. salexigens and D. piezophilus identified aspartic acid, glutamic acid, lysine, asparagine, serine and tyrosine as the amino acids preferentially replaced by arginine, histidine, alanine and threonine in the piezophilic strain. This work reveals the adaptation strategies developed by a sulfate reducer to a deep-sea lifestyle.
Affiliation(s)
- Nathalie Pradel
- Aix-Marseille Université, Université du Sud Toulon-Var, CNRS/INSU, IRD, MIO, UM110, Marseille, France
- * E-mail: (NP); (AD)
- Boyang Ji
- Aix-Marseille Université, CNRS, LCB, UMR 7283, Marseille, France
- Emmanuel Talla
- Aix-Marseille Université, CNRS, LCB, UMR 7283, Marseille, France
- Patricia Lenoble
- Laboratoire de Finition C.E.A., Institut de Génomique – Genoscope, Evry, France
- Marc Garel
- Aix-Marseille Université, Université du Sud Toulon-Var, CNRS/INSU, IRD, MIO, UM110, Marseille, France
- Christian Tamburini
- Aix-Marseille Université, Université du Sud Toulon-Var, CNRS/INSU, IRD, MIO, UM110, Marseille, France
- Régine Lebrun
- Plate-formes Protéomique et Transcriptomique FR3479, IBiSA Marseille-Protéomique. IMM - CNRS, Marseille, France
- Philippe Bertin
- UMR 7156, CNRS, Université Louis Pasteur, Strasbourg, France
- Yann Denis
- Plate-formes Protéomique et Transcriptomique FR3479, IBiSA Marseille-Protéomique. IMM - CNRS, Marseille, France
- Valérie Barbe
- Laboratoire de Finition C.E.A., Institut de Génomique – Genoscope, Evry, France
- Bernard Ollivier
- Aix-Marseille Université, Université du Sud Toulon-Var, CNRS/INSU, IRD, MIO, UM110, Marseille, France
- Alain Dolla
- Aix-Marseille Université, CNRS, LCB, UMR 7283, Marseille, France
- * E-mail: (NP); (AD)
|
7
|
Lightfield J, Fram NR, Ely B. Across bacterial phyla, distantly-related genomes with similar genomic GC content have similar patterns of amino acid usage. PLoS One 2011; 6:e17677. [PMID: 21423704 PMCID: PMC3053387 DOI: 10.1371/journal.pone.0017677] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.1] [Received: 11/26/2010] [Accepted: 02/07/2011] [Indexed: 11/24/2022] Open
Abstract
The GC content of bacterial genomes ranges from 16% to 75% and wide ranges of genomic GC content are observed within many bacterial phyla, including both Gram negative and Gram positive phyla. Thus, divergent genomic GC content has evolved repeatedly in widely separated bacterial taxa. Since genomic GC content influences codon usage, we examined codon usage patterns and predicted protein amino acid content as a function of genomic GC content within eight different phyla or classes of bacteria. We found that similar patterns of codon usage and protein amino acid content have evolved independently in all eight groups of bacteria. For example, in each group, use of amino acids encoded by GC-rich codons increased by approximately 1% for each 10% increase in genomic GC content, while the use of amino acids encoded by AT-rich codons decreased by a similar amount. This consistency within every phylum and class studied led us to conclude that GC content appears to be the primary determinant of the codon and amino acid usage patterns observed in bacterial genomes. These results also indicate that selection for translational efficiency of highly expressed genes is constrained by the genomic parameters associated with the GC content of the host genome.
Affiliation(s)
- John Lightfield
- Department of Biological Sciences, University of South Carolina, Columbia, South Carolina, United States of America
- Noah R. Fram
- Department of Biological Sciences, University of South Carolina, Columbia, South Carolina, United States of America
- Bert Ely
- Department of Biological Sciences, University of South Carolina, Columbia, South Carolina, United States of America
|
8
|
Crystal structure of a thermostable Old Yellow Enzyme from Thermus scotoductus SA-01. Biochem Biophys Res Commun 2010; 393:426-31. [DOI: 10.1016/j.bbrc.2010.02.011] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.4] [Received: 01/27/2010] [Accepted: 02/03/2010] [Indexed: 11/20/2022]
|
9
|
Mitreva M, Wendl MC, Martin J, Wylie T, Yin Y, Larson A, Parkinson J, Waterston RH, McCarter JP. Codon usage patterns in Nematoda: analysis based on over 25 million codons in thirty-two species. Genome Biol 2006; 7:R75. [PMID: 26271136 PMCID: PMC1779591 DOI: 10.1186/gb-2006-7-8-r75] [Citation(s) in RCA: 47] [Impact Index Per Article: 2.5] [Received: 04/20/2006] [Revised: 06/30/2006] [Accepted: 08/14/2006] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Codon usage has direct utility in molecular characterization of species and is also a marker for molecular evolution. To understand codon usage within the diverse phylum Nematoda, we analyzed a total of 265,494 expressed sequence tags (ESTs) from 30 nematode species. The full genomes of Caenorhabditis elegans and C. briggsae were also examined. A total of 25,871,325 codons were analyzed and a comprehensive codon usage table for all species was generated. This is the first codon usage table available for 24 of these organisms. RESULTS Codon usage similarity in Nematoda usually persists over the breadth of a genus but then rapidly diminishes even within each clade. Globodera, Meloidogyne, Pristionchus, and Strongyloides have the most highly derived patterns of codon usage. The major factor affecting differences in codon usage between species is the coding sequence GC content, which varies in nematodes from 32% to 51%. Coding GC content (measured as GC3) also explains much of the observed variation in the effective number of codons (R = 0.70), which is a measure of codon bias, and it even accounts for differences in amino acid frequency. Codon usage is also affected by neighboring nucleotides (N1 context). Coding GC content correlates strongly with estimated noncoding genomic GC content (R = 0.92). On examining abundant clusters in five species, candidate optimal codons were identified that may be preferred in highly expressed transcripts. CONCLUSION Evolutionary models indicate that total genomic GC content, probably the product of directional mutation pressure, drives codon usage rather than the converse, a conclusion that is supported by examination of nematode genomes.
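GC3, the measure of coding GC content named in this abstract, is the GC fraction at third codon positions. A minimal Python sketch follows, assuming an in-frame coding sequence given as a plain string; the function name and the decision to ignore a trailing partial codon are illustrative assumptions, not the authors' pipeline.

```python
def gc_by_codon_position(cds):
    """Return (GC1, GC2, GC3): the GC fraction at each codon position.

    Assumes the sequence starts in frame; a trailing partial codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("sequence must contain at least one complete codon")
    fractions = []
    for position in range(3):
        # Every third base, starting at this codon position.
        bases = cds[position:usable:3]
        fractions.append(sum(b in "GC" for b in bases) / len(bases))
    return tuple(fractions)


# Hypothetical two-codon example: ATG GCC
gc1, gc2, gc3 = gc_by_codon_position("ATGGCC")
```

Because most third-position changes are synonymous, GC3 is commonly used as a proxy for mutational bias that is less constrained by the encoded protein than GC1 or GC2.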
Affiliation(s)
- Makedonka Mitreva
- Genome Sequencing Center, Washington University School of Medicine, St Louis, Missouri 63108, USA
- Michael C Wendl
- Genome Sequencing Center, Washington University School of Medicine, St Louis, Missouri 63108, USA
- John Martin
- Genome Sequencing Center, Washington University School of Medicine, St Louis, Missouri 63108, USA
- Todd Wylie
- Genome Sequencing Center, Washington University School of Medicine, St Louis, Missouri 63108, USA
- Yong Yin
- Genome Sequencing Center, Washington University School of Medicine, St Louis, Missouri 63108, USA
- Allan Larson
- Department of Biology, Washington University, St. Louis, Missouri 63130, USA
- John Parkinson
- Hospital for Sick Children, Toronto, and Departments of Biochemistry/Medical Genetics and Microbiology, University of Toronto, M5G 1X8, Canada
- Robert H Waterston
- Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA
- James P McCarter
- Genome Sequencing Center, Washington University School of Medicine, St Louis, Missouri 63108, USA
- Divergence Inc., St Louis, Missouri 63141, USA
|
10
|
Pascal G, Médigue C, Danchin A. Persistent biases in the amino acid composition of prokaryotic proteins. Bioessays 2006; 28:726-38. [PMID: 16850406 DOI: 10.1002/bies.20431] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/08/2022]
Abstract
Correspondence analysis of 28 proteomes selected to span the entire realm of prokaryotes revealed universal biases in the proteins' amino acid distribution. Integral Inner Membrane Proteins always form an individual cluster, which can then be used to predict protein localisation in unknown proteomes, independently of the organism's biotope or kingdom. Orphan proteins are consistently rich in aromatic residues. Another bias is also ubiquitous: the amino acid composition is driven by the G + C content of the first codon position. An unexpected bias is driven, in many proteomes, by the AAN box of the genetic code, suggesting some functional biochemical relationship between asparagine and lysine. Less-significant biases are driven by the rare amino acids, cysteine and tryptophan. Some allow identification of species-specific functions or localisation such as surface or exported proteins. Errors in genome annotations are also revealed by correspondence analysis, making it useful for quality control and correction.
Affiliation(s)
- Géraldine Pascal
- Genoscope/CNRS UMR 8030, Atelier de Génomique Comparative, Evry, France
|
11
|
Yang J, Dong XC, Leng Y. Conformation biases of amino acids based on tripeptide microenvironment from PDB database. J Theor Biol 2005; 240:374-84. [PMID: 16290902 DOI: 10.1016/j.jtbi.2005.09.025] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 07/13/2005] [Revised: 09/28/2005] [Accepted: 09/29/2005] [Indexed: 11/30/2022]
Abstract
We have constructed a bank (FTTP) of tendentious factors of three states of three-peptide units from PDB database based on conformational dihedral angle library and demonstrated that amino acid biases toward protein secondary structure are present in natural protein sequences. Our research results reveal that 20 standard amino acids fall into three groups: nine residues inclined to alpha-helix with a common character (e.g. direct side chain aliphatic residues or positive/negative charged residues) arrange in three grades, viz EA, QKRLD, and MN, in turn; seven residues are apt to beta-strand with 2'-branched side chain aliphatic residues or benzyl-included residues, namely PV, IYTC, and F, in three ranks; and four residues SHWG show a double tendency to both alpha and beta. Noticeably, proline has the strongest ability to form extended conformation, especially the Re value up to 9.5298 at position 3 (Table 3). Thus, biases of codons show an evident tendency in protein folding, where GC-rich codons are mainly in charge of forming contracted conformation, especially the codon's first letter plays a dominant role in translating the genomic GC signature into protein sequences and structures. So, biases of amino acids will play an important role in protein folding, folding codons, refining domain, structure prediction, and structural genomics/proteomics.
Affiliation(s)
- Jie Yang
- Life Science College, State Key Laboratory of Pharmaceutical Biotechnology, Nanjing University, Nanjing 210093, PR China.
|
12
|
Abstract
The levels of cellular organization in living organisms are the results of a variety of selection pressures. We have investigated here the final outcome of this integrated selective process in proteins of the best known microbial models Escherichia coli, Bacillus subtilis, and Methanococcus jannaschii, supposed to have undergone separate evolution for more than 1 billion years. Using multivariate analysis methods, including correspondence analysis, we studied the overall amino acid composition of all proteins making a proteome. Starting from and further developing previous results that had pointed out some general forces driving the amino acid composition of the proteomes of these model bacteria, we explored the correlations existing between the structure and functions of the proteins forming a proteome and their amino acid composition. The electric charge of amino acids measured against hydrophobicity creates a highly homogeneous cluster, made exclusively of proteins that are core components of the cytoplasmic membrane of the cell (integral inner membrane proteins). A second bias is imposed by the G+C content of the genome, indicating that protein functions are so robust with respect to amino acid changes that they can accommodate a large shift in the nucleotide content of the genome. A remarkable role of aromatic amino acids was uncovered. Expressed orphan proteins are enriched in these residues, suggesting that they might participate in a process of gain of function during evolution.
Affiliation(s)
- Géraldine Pascal
- Genoscope/CNRS UMR 8030, Atelier de Génomique Comparative, Evry, France.
|
13
|
Bharanidharan D, Bhargavi GR, Uthanumallian K, Gautham N. Correlations between nucleotide frequencies and amino acid composition in 115 bacterial species. Biochem Biophys Res Commun 2004; 315:1097-103. [PMID: 14985126 DOI: 10.1016/j.bbrc.2004.01.129] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.0] [Received: 01/24/2004] [Indexed: 11/21/2022]
Abstract
We studied the correlations between amino acid composition and mononucleotide and dinucleotide frequencies in 115 bacterial genomes of varying G+C content. Observed amino acid frequencies were compared with those expected from the actual mononucleotide and dinucleotide frequencies. Both mononucleotide and dinucleotide frequencies correlate well with the amino acid frequency, with dinucleotide frequencies doing so better. Despite the strong correlations, some of the observed amino acid frequencies, in particular for Arg, Val, Asp, Glu, Ser, and Cys, were consistently different from predicted values in all genomes. We suggest that this variation from predicted values is a consequence of selection pressure at the level of amino acids, while the close correspondence to the predictions in residues such as Thr, Phe, Lys, and Asn arises only from mutation and selection pressure at the level of the nucleic acid sequences.
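One simple way to obtain amino acid frequencies "expected from the actual mononucleotide frequencies" is to treat the three codon positions as independent draws from the base composition. The Python sketch below does this under the standard genetic code and renormalizes over sense codons. It is an assumed null model for illustration; the abstract does not specify the authors' exact calculation, and `expected_aa_freqs` and the example compositions are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def expected_aa_freqs(base_freqs):
    """Expected amino acid frequencies if codon bases were drawn independently.

    base_freqs maps each of T, C, A, G to its frequency. Stop codons are
    excluded and the remaining probabilities are renormalized to sum to 1.
    """
    totals = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        probability = 1.0
        for base in codon:
            probability *= base_freqs[base]
        totals[aa] = totals.get(aa, 0.0) + probability
    norm = sum(totals.values())
    return {aa: value / norm for aa, value in totals.items()}


# Hypothetical base compositions: uniform versus a GC-rich genome (70% G+C).
uniform = expected_aa_freqs({"T": 0.25, "C": 0.25, "A": 0.25, "G": 0.25})
gc_rich = expected_aa_freqs({"T": 0.15, "C": 0.35, "A": 0.15, "G": 0.35})
```

Comparing such expectations with observed proteome composition, residue by residue, is the kind of contrast the abstract uses to separate nucleotide-level pressure from selection at the amino acid level.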
Affiliation(s)
- D Bharanidharan
- Department of Crystallography and Biophysics, University of Madras, Guindy Campus, Chennai 600 025, India
|
14
|
Schneider D, Liu Y, Gerstein M, Engelman DM. Thermostability of membrane protein helix-helix interaction elucidated by statistical analysis. FEBS Lett 2002; 532:231-6. [PMID: 12459496 DOI: 10.1016/s0014-5793(02)03687-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Indexed: 11/26/2022]
Abstract
A prerequisite for the survival of (micro)organisms at high temperatures is an adaptation of protein stability to extreme environmental conditions. In contrast to soluble proteins, where many factors have already been identified, the mechanisms by which the thermostability of membrane proteins is enhanced are almost unknown. The hydrophobic membrane environment constrains possible stabilizing factors for transmembrane domains, so that a difference might be expected between soluble and membrane proteins. Here we present sequence analysis of predicted transmembrane helices of the genomes from eight thermophilic and 12 mesophilic organisms. A comparison of the amino acid compositions indicates that more polar residues can be found in the transmembrane helices of thermophilic organisms. Particularly, the amino acids aspartic acid and glutamic acid replace the corresponding amides. Cysteine residues are found to be significantly decreased by about 70% in thermophilic membrane domains suggesting a non-specific function of most cysteine residues in transmembrane domains of mesophilic organisms. By a pair-motif analysis of the two sets of transmembrane helices, we found that the small residues glycine and serine contribute more to transmembrane helix-helix interactions in thermophilic organisms. This may result in a tighter packing of the helices allowing more hydrogen bond formation.
Affiliation(s)
- Dirk Schneider
- Department of Molecular Biophysics and Biochemistry, Yale University, P.O. Box 208114, New Haven, CT 06520-8114, USA
|
15
|
Berkhout B, Grigoriev A, Bakker M, Lukashov VV. Codon and amino acid usage in retroviral genomes is consistent with virus-specific nucleotide pressure. AIDS Res Hum Retroviruses 2002; 18:133-41. [PMID: 11839146 DOI: 10.1089/08892220252779674] [Citation(s) in RCA: 63] [Impact Index Per Article: 2.7] [Indexed: 11/12/2022] Open
Abstract
Retroviral RNA genomes are known to have a biased nucleotide composition. For instance, the plus-strand RNA of human immunodeficiency virus (HIV) is A-rich, and the genome of human T cell leukemia virus (HTLV) is C-rich, and other retroviruses have a U-rich or G-rich genome. The biased composition of these genomes is most likely caused by directional mutational pressure of the respective reverse transcriptase enzymes. Using a set of retroviral genomes with a distinct nucleotide composition, we performed skew analyses of the nucleotide bias along the complete viral genome. Distinct nucleotide signatures were apparent, and these typical patterns were generally conserved across the viral genome. Furthermore, it is demonstrated that this typical nucleotide bias, combined with a profound discrimination against the CpG dinucleotide sequence, strongly influences the codon usage of the retroviruses in a direct manner, and their amino acid usage in an indirect manner. The fact that both codon usage and amino acid usage are so closely entwined with the genome composition has important practical implications. For instance, the typical trends in nucleotide usage could influence the molecular phylogenetic reconstruction of the family Retroviridae.
Affiliation(s)
- Ben Berkhout
- Department of Human Retrovirology, Academic Medical Center, University of Amsterdam, 1105 AZ Amsterdam, The Netherlands.
16
Radomski JP, Slonimski PP. Genomic style of proteins: concepts, methods and analyses of ribosomal proteins from 16 microbial species. FEMS Microbiol Rev 2001; 25:425-35. [PMID: 11524132 DOI: 10.1111/j.1574-6976.2001.tb00585.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
Abstract
We have introduced the concept of genomic 'style' of proteins. By style we understand those properties of a large set of proteins which are specific to the genome of one species (species primary-self) and different from the genome of another species (species contrasted-self). To characterise the style, we took advantage of the frequencies of amino acids and dipeptides present in non-identical segments of the complete set of orthologous ribosomal proteins encoded by 16 microbial species. We confirm the dependence of the overall amino acid composition on the genomic (G+C) content, and introduce a rectification procedure making it possible to extricate appropriate species-specific characteristics, which are no longer related to this content. The rectified frequencies are used to calculate inter-species distance matrices, and to build genomic evolutionary trees. Remarkably, the phylograms derived from the frequencies of non-identical residues in proteins closely resemble the classical phylograms based upon the conservation of identical residues in ribosomal RNAs. We believe that the concept of genomic style of proteins can be a useful tool for the study of evolution.
Affiliation(s)
- J P Radomski
- Interdisciplinary Centre for Mathematical and Computational Modelling, Warsaw University, PL-02-106 Warsaw, Poland.
17
Kreil DP, Ouzounis CA. Identification of thermophilic species by the amino acid compositions deduced from their genomes. Nucleic Acids Res 2001; 29:1608-15. [PMID: 11266564 PMCID: PMC31282 DOI: 10.1093/nar/29.7.1608] [Citation(s) in RCA: 129] [Impact Index Per Article: 5.4] [Indexed: 11/13/2022]
Abstract
The global amino acid compositions as deduced from the complete genomic sequences of six thermophilic archaea, two thermophilic bacteria, 17 mesophilic bacteria and two eukaryotic species were analysed by hierarchical clustering and principal components analysis. Both methods showed an influence of several factors on amino acid composition. Although GC content has a dominant effect, thermophilic species can be identified by their global amino acid compositions alone. This study presents a careful statistical analysis of factors that affect amino acid composition and also yielded specific features of the average amino acid composition of thermophilic species. Moreover, we introduce the first example of a 'compositional tree' of species that takes into account not only homologous proteins, but also proteins unique to particular species. We expect this simple yet novel approach to be a useful additional tool for the study of phylogeny at the genome level.
Affiliation(s)
- D P Kreil
- University of Cambridge and European Bioinformatics Institute, Computational Genomics Group, Research Programme, The European Bioinformatics Institute, EMBL Outstation, Wellcome Trust Genome Campus, Cambridge CB10 1SD, UK.
18
Sauvé V, Sygusch J. Molecular cloning, expression, purification, and characterization of fructose-1,6-bisphosphate aldolase from Thermus aquaticus. Protein Expr Purif 2001; 21:293-302. [PMID: 11237691 DOI: 10.1006/prep.2000.1380] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Indexed: 02/03/2023]
Abstract
Fructose-1,6-bisphosphate aldolase from the thermophilic eubacterium Thermus aquaticus YT-1 was cloned and sequenced. Nucleotide-sequence analysis revealed an open reading frame coding for a 33-kDa protein of 305 amino acids having an amino acid sequence typical of thermophilic adaptation. Multiple sequence alignment classifies the enzyme as a class II B aldolase that shares similarity with aldolases from other extremophiles: Thermotoga maritima, Aquifex aeolicus, and Helicobacter pylori (49-54% identity, 76-81% homology). Taq FBP aldolase was overexpressed under tac promoter control in Escherichia coli and purified to homogeneity using heat treatment followed by two chromatographic steps. Yields of 40-50 mg of monodisperse protein were obtained per liter of culture. The quaternary structure is that of a homotetramer stabilized by an apparent 21-amino-acid insertion sequence. The recombinant protein is thermostable for at least 45 min at 80 °C with little residual activity below 60 °C. Kinetic characterization at 70 °C, the optimal growth temperature for T. aquaticus, indicates extreme negative subunit cooperativity (h = 0.32) with a limiting Km of 305 µM. The maximal specific activity (Vmax) is 46 U/mg at 70 °C.
Affiliation(s)
- V Sauvé
- Département de Biochimie, Université de Montréal, CP 6128, Succursale Centre Ville, Montréal, Québec, Canada H3C 3J7
19
Knight RD, Freeland SJ, Landweber LF. A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes. Genome Biol 2001; 2:RESEARCH0010. [PMID: 11305938 PMCID: PMC31479 DOI: 10.1186/gb-2001-2-4-research0010] [Citation(s) in RCA: 201] [Impact Index Per Article: 8.4] [Received: 11/21/2000] [Revised: 02/01/2001] [Accepted: 02/13/2001] [Indexed: 11/28/2022]
Abstract
BACKGROUND Correlations between genome composition (in terms of GC content) and usage of particular codons and amino acids have been widely reported, but poorly explained. We show here that a simple model of processes acting at the nucleotide level explains codon usage across a large sample of species (311 bacteria, 28 archaea and 257 eukaryotes). The model quantitatively predicts responses (slope and intercept of the regression line on genome GC content) of individual codons and amino acids to genome composition. RESULTS Codons respond to genome composition on the basis of their GC content relative to their synonyms (explaining 71-87% of the variance in response among the different codons, depending on measure). Amino-acid responses are determined by the mean GC content of their codons (explaining 71-79% of the variance). Similar trends hold for genes within a genome. Position-dependent selection for error minimization explains why individual bases respond differently to directional mutation pressure. CONCLUSIONS Our model suggests that GC content drives codon usage (rather than the converse). It unifies a large body of empirical evidence concerning relationships between GC content and amino-acid or codon usage in disparate systems. The relationship between GC content and codon and amino-acid usage is ahistorical; it is replicated independently in the three domains of living organisms, reinforcing the idea that genes and genomes at mutation/selection equilibrium reproduce a unique relationship between nucleic acid and protein composition. Thus, the model may be useful in predicting amino-acid or nucleotide sequences in poorly characterized taxa.
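The quantity the abstract names as determining amino-acid responses, the mean GC content of an amino acid's codons, can be computed directly from the standard genetic code. The sketch below is only an illustration of that one quantity, not the authors' model; the names `CODON_TABLE` and `mean_codon_gc` are hypothetical, and the use of the standard code (NCBI translation table 1) is an assumption.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1) in TCAG order; '*' marks stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def mean_codon_gc(table=CODON_TABLE):
    """Return {amino acid: mean GC fraction over its synonymous codons}."""
    per_aa = {}
    for codon, aa in table.items():
        if aa == "*":
            continue  # stop codons encode no amino acid
        gc_fraction = sum(base in "GC" for base in codon) / 3
        per_aa.setdefault(aa, []).append(gc_fraction)
    return {aa: sum(values) / len(values) for aa, values in per_aa.items()}


gc_by_aa = mean_codon_gc()
# Amino acids ordered from GC-poorest to GC-richest codon sets.
ranked = sorted(gc_by_aa, key=gc_by_aa.get)
```

In this ranking glycine, alanine and proline (codons GGN, GCN, CCN) sit at the GC-rich end, while lysine, phenylalanine and asparagine sit at the GC-poor end, which is the direction of the amino-acid trends the abstract attributes to genome GC content.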
Affiliation(s)
- Robin D Knight
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
- Stephen J Freeland
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
- Laura F Landweber
- Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
20
Singer GA, Hickey DA. Nucleotide bias causes a genomewide bias in the amino acid composition of proteins. Mol Biol Evol 2000; 17:1581-8. [PMID: 11070046 DOI: 10.1093/oxfordjournals.molbev.a026257] [Citation(s) in RCA: 183] [Impact Index Per Article: 7.3] [Indexed: 11/14/2022]
Abstract
We analyzed the nucleotide contents of several completely sequenced genomes, and we show that nucleotide bias can have a dramatic effect on the amino acid composition of the encoded proteins. By surveying the genes in 21 completely sequenced eubacterial and archaeal genomes, along with the entire Saccharomyces cerevisiae genome and two Plasmodium falciparum chromosomes, we show that biased DNA encodes biased proteins on a genomewide scale. The predicted bias affects virtually all genes within the genome, and it could be clearly seen even when we limited the analysis to sets of homologous gene sequences. Parallel patterns of compositional bias were found within the archaea and the eubacteria. We also found a positive correlation between the degree of amino acid bias and the magnitude of protein sequence divergence. We conclude that mutational bias can have a major effect on the molecular evolution of proteins. These results could have important implications for the interpretation of protein-based molecular phylogenies and for the inference of functional protein adaptation from comparative sequence data.
Affiliation(s)
- G A Singer
- Department of Biology, University of Ottawa, Ottawa, Ontario, Canada
21
Sekowska A, Danchin A, Risler JL. Phylogeny of related functions: the case of polyamine biosynthetic enzymes. Microbiology (Reading, England) 2000; 146 (Pt 8):1815-1828. [PMID: 10931887 DOI: 10.1099/00221287-146-8-1815] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.2] [Indexed: 11/18/2022]
Abstract
Genome annotation requires explicit identification of gene function. This task frequently uses protein sequence alignments with examples having a known function. Genetic drift, co-evolution of subunits in protein complexes and a variety of other constraints interfere with the relevance of alignments. Using a specific class of proteins, it is shown that a simple data analysis approach can help solve some of the problems posed. The origin of ureohydrolases has been explored by comparing sequence similarity trees, maximizing amino acid alignment conservation. The trees separate agmatinases from arginases but suggest the presence of unknown biases responsible for unexpected positions of some enzymes. Using factorial correspondence analysis, a distance tree between sequences was established, comparing regions with gaps in the alignments. The gap tree gives a consistent picture of functional kinship, perhaps reflecting some aspects of phylogeny, with a clear domain of enzymes encoding two types of ureohydrolases (agmatinases and arginases) and activities related to, but different from ureohydrolases. Several annotated genes appeared to correspond to a wrong assignment if the trees were significant. They were cloned and their products expressed and identified biochemically. This substantiated the validity of the gap tree. Its organization suggests a very ancient origin of ureohydrolases. Some enzymes of eukaryotic origin are spread throughout the arginase part of the trees: they might have been derived from the genes found in the early symbiotic bacteria that became the organelles. They were transferred to the nucleus when symbiotic genes had to escape Muller's ratchet. This work also shows that arginases and agmatinases share the same two manganese-ion-binding sites and exhibit only subtle differences that can be accounted for knowing the three-dimensional structure of arginases. 
In the absence of explicit biochemical data, extreme caution is needed when annotating genes having similarities to ureohydrolases.
Affiliation(s)
- Agnieszka Sekowska
- Hong Kong University Pasteur Research Centre, Dexter HC Man Building, 8 Sassoon Road, Pokfulam, Hong Kong
- Regulation of Gene Expression, Institut Pasteur, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France
- Antoine Danchin
- Hong Kong University Pasteur Research Centre, Dexter HC Man Building, 8 Sassoon Road, Pokfulam, Hong Kong
- Regulation of Gene Expression, Institut Pasteur, 28 rue du Docteur Roux, 75724 Paris Cedex 15, France
- Jean-Loup Risler
- Genome and Informatics, Université de Versailles-Saint-Quentin, 45 Avenue des Etats Unis, 78035 Versailles Cedex, France