1
Huang Y, Mao Z, Zhang Y, Zhao J, Luan X, Wu K, Yun L, Yu J, Shi Z, Liao X, Ma H. Omics data analysis reveals the system-level constraint on cellular amino acid composition. Synth Syst Biotechnol 2024; 9:304-311. [PMID: 38510205 PMCID: PMC10951587 DOI: 10.1016/j.synbio.2024.03.001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Revised: 03/01/2024] [Accepted: 03/01/2024] [Indexed: 03/22/2024] Open
Abstract
Proteins play a pivotal role in coordinating the functions of organisms, essentially governing their traits, as the dynamic arrangement of diverse amino acids leads to a multitude of folded configurations within peptide chains. Despite dynamic changes in the amino acid composition of individual proteins (referred to as AAP) and great variance in protein expression levels under different conditions, our study, utilizing transcriptomics data from four model organisms, uncovers surprising stability in the overall amino acid composition of the total cellular proteins (referred to as AACell). Although this value may vary between different species, we observed no significant differences among distinct strains of the same species. This indicates that organisms enforce system-level constraints to maintain a consistent AACell, even amid fluctuations in AAP and protein expression. Further exploration of this phenomenon promises insights into the intricate mechanisms orchestrating cellular protein expression and adaptation to varying environmental challenges.
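One way to read the AACell quantity in this abstract is as an expression-weighted average of per-protein amino acid counts. The sketch below illustrates that reading only: the function name, the toy sequences, and the choice of transcript abundance as the weight are assumptions for illustration, not the authors' pipeline.

```python
from collections import Counter

def cellular_aa_composition(proteins, expression):
    """Expression-weighted amino acid fractions over all proteins.

    proteins:   dict of gene id -> protein sequence (one-letter codes)
    expression: dict of gene id -> abundance (e.g. TPM); missing ids count as 0
    """
    totals = Counter()
    for gene, seq in proteins.items():
        weight = expression.get(gene, 0.0)
        for aa, n in Counter(seq).items():
            totals[aa] += weight * n
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no expressed proteins")
    return {aa: count / grand_total for aa, count in totals.items()}

# Hypothetical toy data: two genes, one expression condition.
proteins = {"geneA": "MKKL", "geneB": "MAAA"}
condition_1 = {"geneA": 10.0, "geneB": 30.0}
composition = cellular_aa_composition(proteins, condition_1)
```

Running the same function on a second expression profile and comparing the two dictionaries would show how far the weighted composition moves when expression changes.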
Affiliation(s)
- Yuanyuan Huang
  - College of Biotechnology, Tianjin University of Science and Technology, Tianjin, 300457, China
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Zhitao Mao
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Yue Zhang
  - College of Biotechnology, Tianjin University of Science and Technology, Tianjin, 300457, China
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Jianxiao Zhao
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
  - Frontier Science Center for Synthetic Biology and Key Laboratory of Systems Bioengineering (Ministry of Education), Tianjin University, Tianjin, 300072, China
- Xiaodi Luan
  - College of Biotechnology, Tianjin University of Science and Technology, Tianjin, 300457, China
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Ke Wu
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Lili Yun
  - Tianjin Medical Laboratory, BGI-Tianjin, BGI-Shenzhen, Tianjin, 300308, China
- Jing Yu
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Zhenkun Shi
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Xiaoping Liao
  - Haihe Laboratory of Synthetic Biology, Tianjin, 300308, China
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
- Hongwu Ma
  - Biodesign Center, Key Laboratory of Engineering Biology for Low-carbon Manufacturing, Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, Tianjin, 300308, China
  - National Center of Technology Innovation for Synthetic Biology, Tianjin, 300308, China
2
Szitenberg A, Cha S, Opperman CH, Bird DM, Blaxter ML, Lunt DH. Genetic Drift, Not Life History or RNAi, Determine Long-Term Evolution of Transposable Elements. Genome Biol Evol 2016; 8:2964-2978. [PMID: 27566762 PMCID: PMC5635653 DOI: 10.1093/gbe/evw208] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Accepted: 08/20/2016] [Indexed: 12/11/2022] Open
Abstract
Transposable elements (TEs) are a major source of genome variation across the branches of life. Although TEs may play an adaptive role in their host's genome, they are more often deleterious, and purifying selection is an important factor controlling their genomic loads. In contrast, life history, mating system, GC content, and RNAi pathways have been suggested to account for the disparity of TE loads in different species. Previous studies of fungal, plant, and animal genomes have reported conflicting results regarding the direction in which these genomic features drive TE evolution. Many of these studies have had limited power, however, because they studied taxonomically narrow systems, comparing only a limited number of phylogenetically independent contrasts, and did not address long-term effects on TE evolution. Here, we test the long-term determinants of TE evolution by comparing 42 nematode genomes spanning over 500 million years of diversification. This analysis includes numerous transitions between life history states and RNAi pathways, and evaluates whether these forces are sufficiently persistent to affect the long-term evolution of TE loads in eukaryotic genomes. Although we demonstrate statistical power to detect selection, we find no evidence that variation in these factors influences genomic TE loads across extended periods of time. In contrast, the effects of genetic drift appear to persist and control TE variation among species. We suggest that variation in the tested factors is largely inconsequential to the large differences in TE content observed between genomes, and that only by these large-scale comparisons can we distinguish long-term and persistent effects from transient or random changes.
Affiliation(s)
- Amir Szitenberg
  - Evolutionary Biology Group, School of Environmental Sciences, University of Hull, England, United Kingdom
  - The Dead Sea and Arava Science Center, Israel
- Soyeon Cha
  - Department of Plant Pathology, North Carolina State University
- David M Bird
  - Department of Plant Pathology, North Carolina State University
- Mark L Blaxter
  - School of Biological Sciences, Institute of Evolutionary Biology, University of Edinburgh, Scotland
- David H Lunt
  - Evolutionary Biology Group, School of Environmental Sciences, University of Hull, England, United Kingdom
3
Abstract
Amino acids are typically encoded by multiple synonymous codons that are not used with the same frequency. Codon usage bias has drawn considerable attention, and several explanations have been offered, including variation in GC-content between species. Focusing on a simple parameter, the combined GC proportion of all the synonymous codons for a particular amino acid (termed GCsyn), we try to deepen our understanding of the relationship between GC-content and amino acid/codon usage in more detail. We analyzed 65 widely distributed representative species and found a close association between GCsyn, GC-content, and amino acid usage. The overall usages of the four amino acids with the greatest GCsyn and the five amino acids with the lowest GCsyn both vary with the regional GC-content, whereas the usage of the remaining 11 amino acids with intermediate GCsyn is less variable. More interestingly, we discovered that codon usage frequencies are nearly constant in regions with similar GC-content. We further quantified the effects of regional GC-content variation (low to high) on amino acid usage and found that GC-content determines the usage variation of amino acids, especially those with extremely high GCsyn, which accounts for 76.7% of the changed GC-content for those regions. Our results suggest that GCsyn correlates with GC-content and has an impact on codon/amino acid usage. These findings suggest a novel approach to understanding the role of codon and amino acid usage in shaping genomic architecture and evolutionary patterns of organisms.
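The GCsyn parameter described in this abstract can be sketched directly from the standard genetic code. The code below assumes an unweighted definition (G and C bases summed over all synonymous codons, divided by the total number of bases in those codons); the abstract does not say whether the authors weight codons by usage, so treat this as an illustration only.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def gcsyn(amino_acid):
    """Unweighted G+C proportion over all synonymous codons of one amino acid."""
    codons = [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]
    gc_bases = sum(codon.count("G") + codon.count("C") for codon in codons)
    return gc_bases / (3 * len(codons))

# Rank the 20 amino acids from lowest to highest GCsyn.
residues = sorted(set(AMINO_ACIDS) - {"*"})
ranking = sorted(residues, key=gcsyn)
```

Under this definition the four highest values belong to Ala, Gly, Pro, and Arg, and the five lowest to Ile, Phe, Tyr, Asn, and Lys, which matches the "four highest, five lowest" grouping the abstract mentions.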
4
Nikbakht H, Xia X, Hickey DA. The evolution of genomic GC content undergoes a rapid reversal within the genus Plasmodium. Genome 2015; 57:507-11. [PMID: 25633864 DOI: 10.1139/gen-2014-0158] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Indexed: 11/22/2022]
Abstract
The genome of the malarial parasite Plasmodium falciparum is extremely AT rich. This bias toward a low GC content is a characteristic of several, but not all, species within the genus Plasmodium. We compared 4283 orthologous pairs of protein-coding sequences between Plasmodium falciparum and the less AT-biased Plasmodium vivax. Our results indicate that the common ancestor of these two species was also extremely AT rich. This means that, although there was a strong bias toward A+T during the early evolution of the ancestral Plasmodium lineage, there was a subsequent reversal of this trend during the more recent evolution of some species, such as P. vivax. Moreover, we show that not only is the P. vivax genome losing its AT richness, it is actually gaining a very significant degree of GC richness. This example illustrates the potential volatility of nucleotide content during the course of molecular evolution. Such reversible fluxes in nucleotide content within lineages could have important implications for phylogenetic reconstruction based on molecular sequence data.
Affiliation(s)
- Hamid Nikbakht
  - Department of Biology, Concordia University, Montreal, QC H4B 1R6, Canada
5
Mutational bias plays an important role in shaping longevity-related amino acid content in mammalian mtDNA-encoded proteins. J Mol Evol 2012; 74:332-41. [PMID: 22752047 DOI: 10.1007/s00239-012-9510-7] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 03/11/2012] [Accepted: 06/12/2012] [Indexed: 10/28/2022]
Abstract
During the course of evolution, amino acid shifts might have resulted in mitochondrial proteomes better endowed to resist oxidative stress. However, owing to the problem of distinguishing between functional constraints/adaptations in protein sequences and mutation-driven biases in the composition of these sequences, the adaptive value of such amino acid shifts remains under discussion. We have analyzed the coding sequences of mtDNA from 173 mammalian species, dissecting the effect of nucleotide composition on amino acid usages. We found remarkable cysteine avoidance in mtDNA-encoded proteins. However, no effect of longevity on cysteine content could be detected. On the other hand, nucleotide compositional shifts fully accounted for threonine usages. In spite of a strong effect of mutational bias on methionine abundances, our results suggest a role of selection in determining the composition of methionine. Whether this selective effect is linked or not to protection against oxidative stress is still a subject of debate.
6
Radomski JP, Slonimski PP. Alignment free characterization of the influenza-A hemagglutinin genes by the ISSCOR method. C R Biol 2012; 335:180-93. [PMID: 22464426 DOI: 10.1016/j.crvi.2012.01.001] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 12/07/2010] [Revised: 10/26/2011] [Accepted: 01/11/2012] [Indexed: 12/23/2022]
Abstract
Analyses and visualizations by the ISSCOR method of the influenza virus hemagglutinin genes of three different A-subtypes revealed some rather striking temporal (for A/H3N3) and spatial (for A/H5N1) relationships between groups of individual gene subsets. The application to the A/H1N1 set also revealed relationships between the seasonal H1 and the swine-like novel 2009 H1v variants in a quick and unambiguous manner. Based on these examples, we consider the application of the ISSCOR method to the analysis of large sets of homologous genes a worthwhile addition to the genomics toolbox: it allows rapid diagnostics of trends and may even aid early warning of newly emerging epidemiological threats.
Affiliation(s)
- Jan P Radomski
  - Interdisciplinary Center for Mathematical and Computational Modeling, Warsaw University, Warsaw, Poland.
7
Radomski JP, Slonimski PP. ISSCOR: Intragenic, Stochastic Synonymous Codon Occurrence Replacement--a new method for an alignment-free genome sequence analysis. C R Biol 2009; 332:336-50. [PMID: 19304264 DOI: 10.1016/j.crvi.2008.11.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/11/2008] [Revised: 11/06/2008] [Accepted: 12/02/2008] [Indexed: 11/17/2022]
Abstract
Synonymous codons do not occur at equal frequencies. Codon usage and codon bias have been extensively studied. However, the sequential order in which synonymous codons appear within a gene has not been studied until now. Here we describe an in silico method, which is the first attempt to tackle this problem: to what extent this sequential order is unique, and to what extent the succession of synonymous codons is important. This method, which we called Intragenic, Stochastic Synonymous Codon Occurrence Replacement (ISSCOR), generates, by a Monte Carlo approach, a set of genes which code for the same amino acid sequence, and display the same codon usage, but have random permutations of the synonymous codons, and therefore different sequential codon orders from the original gene. We analyze the complete genome of the bacterium Helicobacter pylori (containing 1574 protein coding genes), and show by various, alignment-free computational methods (e.g., frequency distribution of codon-pairs, as well as that of nucleotide bigrams in codon-pairs), that: (i) not only the succession of adjacent synonymous codons is far from random, but also, which is totally unexpected, the occurrences of non-adjacent synonymous codon-pairs are highly constrained, at strikingly long distances of dozens of nucleotides; (ii) the statistical deviations from the random synonymous codon order are overwhelming; and (iii) the pattern of nucleotide bigrams in codon-pairs can be used in a novel way for characterizing and comparing genes and genomes. Our results demonstrate that the sequential order of synonymous codons within a gene must be under a strong selective pressure, which is superimposed on the classical codon usage. This new dimension can be measured by the ISSCOR method, which is simple, robust, and should be useful for comparative and functional genomics.
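The core randomization step described in this abstract (same amino acid sequence, same codon usage, permuted order of synonymous codons) can be sketched as follows. This is a minimal illustration, not the ISSCOR implementation: the function names and the toy gene are hypothetical, and the downstream codon-pair statistics are not reproduced.

```python
import random
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def split_codons(cds):
    """Split a coding sequence into codons, ignoring any trailing partial codon."""
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]

def translate(cds):
    return "".join(CODON_TABLE[codon] for codon in split_codons(cds))

def shuffle_synonymous_codons(cds, rng):
    """Permute codons among the positions that encode the same amino acid."""
    codons = split_codons(cds)
    positions = defaultdict(list)
    for index, codon in enumerate(codons):
        positions[CODON_TABLE[codon]].append(index)
    shuffled = list(codons)
    for indices in positions.values():
        pool = [codons[i] for i in indices]
        rng.shuffle(pool)
        for i, codon in zip(indices, pool):
            shuffled[i] = codon
    return "".join(shuffled)

# Hypothetical toy gene; a Monte Carlo set is just many independent shuffles.
gene = "ATGGCTGCCGCAAAAAAGGCGTAA"
rng = random.Random(0)
replicates = [shuffle_synonymous_codons(gene, rng) for _ in range(100)]
```

Because codons only move between positions of the same amino acid, every replicate keeps both the protein sequence and the per-gene codon counts, so any statistic that differs between the original and the replicates reflects codon order alone.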
Affiliation(s)
- Jan P Radomski
  - Interdisciplinary Center for Mathematical and Computational Modeling, Warsaw University, Pawińskiego 5A, Bldg. D, 02106 Warsaw, Poland.
8
Gatherer D. Evolution of the G+C Content Frontier in the Rat Cytomegalovirus Genome. Virology (Auckl) 2008. [DOI: 10.4137/vrt.s1023] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/05/2022] Open
Abstract
Within the 230138 bp of the rat cytomegalovirus (RCMV) genome, the G+C content changes abruptly at position 142644, constituting a G+C content frontier. To the left of this point, overall G+C content is 69.2%, and to the right it is only 47.6%. A region of extremely low G+C content (33.8%) is found in the 5 kb immediately to the right of the frontier, in which there are no predicted coding sequences. To the right of position 147501, the G+C content rises and predicted coding sequences reappear. However, these genes are much shorter (average 848 bp, 50% G+C) than those in the left two-thirds of the genome (average 1462 bp, 70% G+C). Whole genome alignment of several viruses indicates that the initial ultra-low G+C region appeared in the common ancestor of the genera Cytomegalovirus and Muromegalovirus, and that the lowering of G+C in the right third has been a subsequent process in the lineage leading to RCMV. The left two-thirds of RCMV has stop codon occurrences at 67.5% of their expected level, based on a modified Markov chain model of stop codon distribution, and the corresponding figure for the right third is 78%. Therefore, despite heavy mutation pressure, selective constraint has operated in the right third of the RCMV genome to maintain a degree of gene length unusual for such low G+C sequences.
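An abrupt compositional change of the kind described in this abstract is typically located by scanning G+C content in windows along the sequence. The sketch below shows that idea on a synthetic sequence; the window size, the function names, and the "largest drop between adjacent windows" criterion are assumptions for illustration, not the method used in the paper.

```python
def gc_profile(sequence, window, step):
    """Return (start, gc_fraction) for each full window along the sequence."""
    sequence = sequence.upper()
    profile = []
    for start in range(0, len(sequence) - window + 1, step):
        chunk = sequence[start:start + window]
        gc = sum(chunk.count(base) for base in "GC")
        profile.append((start, gc / window))
    return profile

def sharpest_drop(profile):
    """Start coordinate of the window that follows the largest fall in G+C."""
    drops = [
        (profile[i][1] - profile[i + 1][1], profile[i + 1][0])
        for i in range(len(profile) - 1)
    ]
    return max(drops)[1]

# Synthetic example: a G+C-rich left half joined to an A+T-rich right half.
toy_genome = "GC" * 50 + "AT" * 50
profile = gc_profile(toy_genome, window=20, step=20)
frontier = sharpest_drop(profile)
```

On real data the resolution of the estimated frontier is limited by the window and step sizes, so a coarse scan followed by a finer one around the candidate position is a common refinement.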
Affiliation(s)
- Derek Gatherer
  - MRC Virology Unit, Institute of Virology, University of Glasgow, Church Street, Glasgow, G11 5JR, U.K
9
The unique genomic properties of sex-biased genes: insights from avian microarray data. BMC Genomics 2008; 9:148. [PMID: 18377635 PMCID: PMC2294128 DOI: 10.1186/1471-2164-9-148] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.1] [Received: 12/03/2007] [Accepted: 03/31/2008] [Indexed: 02/07/2023] Open
Abstract
Background: In order to develop a framework for the analysis of sex-biased genes, we present a characterization of microarray data comparing male and female gene expression in 18 day chicken embryos for brain, gonad, and heart tissue. Results: From the 15982 significantly expressed coding regions that have been assigned to either the autosomes or the Z chromosome (12979 in brain, 13301 in gonad, and 12372 in heart), roughly 18% were significantly sex-biased in any one tissue, though only 4 gene targets were biased in all tissues. The gonad was the most sex-biased tissue, followed by the brain. Sex-biased autosomal genes tended to be expressed at lower levels and in fewer tissues than unbiased gene targets, and autosomal somatic sex-biased genes had more expression noise than similar unbiased genes. Sex-biased genes linked to the Z-chromosome showed reduced expression in females, but not in males, when compared to unbiased Z-linked genes, and sex-biased Z-linked genes were also expressed in fewer tissues than unbiased Z coding regions. Third position GC content and codon usage bias showed some sex-biased effects, primarily for autosomal genes expressed in the gonad. Finally, there were several over-represented Gene Ontology terms in the sex-biased gene sets. Conclusion: On the whole, this analysis suggests that sex-biased genes have unique genomic and organismal properties that delineate them from genes that are expressed equally in males and females.
10
On the origin of synonymous codon usage divergence between thermophilic and mesophilic prokaryotes. FEBS Lett 2007; 581:5825-30. [DOI: 10.1016/j.febslet.2007.11.054] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 09/19/2007] [Revised: 11/14/2007] [Accepted: 11/16/2007] [Indexed: 01/24/2023]
11
Bharanidharan D, Bhargavi GR, Uthanumallian K, Gautham N. Correlations between nucleotide frequencies and amino acid composition in 115 bacterial species. Biochem Biophys Res Commun 2004; 315:1097-103. [PMID: 14985126 DOI: 10.1016/j.bbrc.2004.01.129] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.0] [Received: 01/24/2004] [Indexed: 11/21/2022]
Abstract
We studied the correlations between amino acid composition and mononucleotide and dinucleotide frequencies in 115 bacterial genomes of varying G+C content. Observed amino acid frequencies were compared with those expected from the actual mononucleotide and dinucleotide frequencies. Both mononucleotide and dinucleotide frequencies correlate well with the amino acid frequency, with dinucleotide frequencies doing so better. Despite the strong correlations, some of the observed amino acid frequencies, in particular for Arg, Val, Asp, Glu, Ser, and Cys, were consistently different from predicted values in all genomes. We suggest that this variation from predicted values is a consequence of selection pressure at the level of amino acids, while the close correspondence to the predictions in residues such as Thr, Phe, Lys, and Asn arises only from mutation and selection pressure at the level of the nucleic acid sequences.
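The comparison of observed amino acid frequencies with those "expected from the actual mononucleotide frequencies" can be illustrated with a simple null model. The sketch below assumes that the three codon positions are independent draws from one genome-wide base distribution and that stop codons are excluded and the remainder renormalized; the paper's exact procedure, and its dinucleotide variant, may differ.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def expected_aa_frequencies(base_freq):
    """Expected amino acid frequencies if codons were random draws of three bases.

    base_freq: dict mapping each of 'T', 'C', 'A', 'G' to its genome-wide frequency.
    """
    totals = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # stop codons carry no amino acid
        p = base_freq[codon[0]] * base_freq[codon[1]] * base_freq[codon[2]]
        totals[aa] = totals.get(aa, 0.0) + p
    sense_total = sum(totals.values())
    return {aa: p / sense_total for aa, p in totals.items()}

# Hypothetical base compositions: a balanced genome and a G+C-rich one.
uniform = {base: 0.25 for base in BASES}
gc_rich = {"G": 0.35, "C": 0.35, "A": 0.15, "T": 0.15}
expected_uniform = expected_aa_frequencies(uniform)
expected_gc_rich = expected_aa_frequencies(gc_rich)
```

With uniform base frequencies the expectation reduces to codon degeneracy (for example 6/61 for Leu), while the G+C-rich composition raises the expected share of Ala and lowers that of Lys; residues whose observed frequencies depart from such expectations in every genome are the ones the abstract attributes to selection at the amino acid level.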
Affiliation(s)
- D Bharanidharan
  - Department of Crystallography and Biophysics, University of Madras, Guindy Campus, Chennai 600 025, India
12
Chen SL, Lee W, Hottes AK, Shapiro L, McAdams HH. Codon usage between genomes is constrained by genome-wide mutational processes. Proc Natl Acad Sci U S A 2004; 101:3480-5. [PMID: 14990797 PMCID: PMC373487 DOI: 10.1073/pnas.0307827100] [Citation(s) in RCA: 247] [Impact Index Per Article: 11.8] [Indexed: 11/18/2022] Open
Abstract
Analysis of genome-wide codon bias shows that only two parameters effectively differentiate the genome-wide codon bias of 100 eubacterial and archaeal organisms. The first parameter correlates with genome GC content, and the second parameter correlates with context-dependent nucleotide bias. Both of these parameters may be calculated from intergenic sequences. Therefore, genome-wide codon bias in eubacteria and archaea may be predicted from intergenic sequences that are not translated. When these two parameters are calculated for genes from nonmammalian eukaryotic organisms, genes from the same organism again have similar values, and genome-wide codon bias may also be predicted from intergenic sequences. In mammals, genes from the same organism are similar only in the second parameter, because GC content varies widely among isochores. Our results suggest that, in general, genome-wide codon bias is determined primarily by mutational processes that act throughout the genome, and only secondarily by selective forces acting on translated sequences.
Affiliation(s)
- Swaine L Chen
  - Department of Developmental Biology, Stanford University School of Medicine, Beckman Center, B300, Stanford, CA 94304, USA.
13
Berkhout B, Grigoriev A, Bakker M, Lukashov VV. Codon and amino acid usage in retroviral genomes is consistent with virus-specific nucleotide pressure. AIDS Res Hum Retroviruses 2002; 18:133-41. [PMID: 11839146 DOI: 10.1089/08892220252779674] [Citation(s) in RCA: 63] [Impact Index Per Article: 2.7] [Indexed: 11/12/2022] Open
Abstract
Retroviral RNA genomes are known to have a biased nucleotide composition. For instance, the plus-strand RNA of human immunodeficiency virus (HIV) is A-rich, and the genome of human T cell leukemia virus (HTLV) is C-rich, and other retroviruses have a U-rich or G-rich genome. The biased composition of these genomes is most likely caused by directional mutational pressure of the respective reverse transcriptase enzymes. Using a set of retroviral genomes with a distinct nucleotide composition, we performed skew analyses of the nucleotide bias along the complete viral genome. Distinct nucleotide signatures were apparent, and these typical patterns were generally conserved across the viral genome. Furthermore, it is demonstrated that this typical nucleotide bias, combined with a profound discrimination against the CpG dinucleotide sequence, strongly influences the codon usage of the retroviruses in a direct manner, and their amino acid usage in an indirect manner. The fact that both codon usage and amino acid usage are so closely entwined with the genome composition has important practical implications. For instance, the typical trends in nucleotide usage could influence the molecular phylogenetic reconstruction of the family Retroviridae.
Affiliation(s)
- Ben Berkhout
  - Department of Human Retrovirology, Academic Medical Center, University of Amsterdam, 1105 AZ Amsterdam, The Netherlands.
14
Radomski JP, Slonimski PP. Genomic style of proteins: concepts, methods and analyses of ribosomal proteins from 16 microbial species. FEMS Microbiol Rev 2001; 25:425-35. [PMID: 11524132 DOI: 10.1111/j.1574-6976.2001.tb00585.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022] Open
Abstract
We have introduced the concept of genomic 'style' of proteins. By style we understand those properties of a large set of proteins which are specific to the genome of one species (species primary-self) and different from the genome of another species (species contrasted-self). To characterise the style, we took advantage of the frequencies of amino acids and dipeptides present in non-identical segments of the complete set of orthologous ribosomal proteins encoded by 16 microbial species. We confirm the dependence of the overall amino acid composition on the genomic (G+C) content, and introduce a rectification procedure making it possible to extricate appropriate species-specific characteristics, which are no longer related to this content. The rectified frequencies are used to calculate inter-species distance matrices, and to build genomic evolutionary trees. Remarkably, the phylograms derived from the frequencies of non-identical residues in proteins closely resemble the classical phylograms based upon the conservation of identical residues in ribosomal RNAs. We believe that the concept of genomic style of proteins can be a useful tool for the study of evolution.
Affiliation(s)
- J P Radomski
  - Interdisciplinary Centre for Mathematical and Computational Modelling, Warsaw University, PL-02-106 Warsaw, Poland.
15
Abstract
The human genome is described in the literature as being composed of isochores, i.e., long (hundreds of kilobases) segments with a homogeneous (G + C) content. We calculated the (G + C) content variations along the DNA molecules of human chromosomes 21 and 22 and found the variations to be higher everywhere than in the randomized sequences. Hence the (G + C) content is certainly not homogeneous on the isochore scale in the two human chromosomes. In addition, we found no significant difference between the two human molecules and the genome of Escherichia coli regarding the (G + C) content variations. Hence either no isochores are present in the DNA molecules of human chromosomes 21 and 22, or isochores are also present in the genome of E. coli. In any case, the present communication demonstrates that isochores should be defined in unambiguous molecular terms if they are to be used for an up-to-date characterization of genome structure.
Affiliation(s)
- D Häring
  - Institute of Biophysics, Academy of Sciences of the Czech Republic, Královopolská 135, Brno, CZ-61265, Czech Republic
16
Knight RD, Freeland SJ, Landweber LF. A simple model based on mutation and selection explains trends in codon and amino-acid usage and GC composition within and across genomes. Genome Biol 2001; 2:RESEARCH0010. [PMID: 11305938 PMCID: PMC31479 DOI: 10.1186/gb-2001-2-4-research0010] [Citation(s) in RCA: 206] [Impact Index Per Article: 8.6] [Received: 11/21/2000] [Revised: 02/01/2001] [Accepted: 02/13/2001] [Indexed: 11/28/2022] Open
Abstract
BACKGROUND: Correlations between genome composition (in terms of GC content) and usage of particular codons and amino acids have been widely reported, but poorly explained. We show here that a simple model of processes acting at the nucleotide level explains codon usage across a large sample of species (311 bacteria, 28 archaea and 257 eukaryotes). The model quantitatively predicts responses (slope and intercept of the regression line on genome GC content) of individual codons and amino acids to genome composition. RESULTS: Codons respond to genome composition on the basis of their GC content relative to their synonyms (explaining 71-87% of the variance in response among the different codons, depending on measure). Amino-acid responses are determined by the mean GC content of their codons (explaining 71-79% of the variance). Similar trends hold for genes within a genome. Position-dependent selection for error minimization explains why individual bases respond differently to directional mutation pressure. CONCLUSIONS: Our model suggests that GC content drives codon usage (rather than the converse). It unifies a large body of empirical evidence concerning relationships between GC content and amino-acid or codon usage in disparate systems. The relationship between GC content and codon and amino-acid usage is ahistorical; it is replicated independently in the three domains of living organisms, reinforcing the idea that genes and genomes at mutation/selection equilibrium reproduce a unique relationship between nucleic acid and protein composition. Thus, the model may be useful in predicting amino-acid or nucleotide sequences in poorly characterized taxa.
Affiliation(s)
- Robin D Knight
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
- Stephen J Freeland
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
- Laura F Landweber
  - Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544, USA
17
Singer GA, Hickey DA. Nucleotide bias causes a genomewide bias in the amino acid composition of proteins. Mol Biol Evol 2000; 17:1581-8. [PMID: 11070046 DOI: 10.1093/oxfordjournals.molbev.a026257] [Citation(s) in RCA: 184] [Impact Index Per Article: 7.4] [Indexed: 11/14/2022] Open
Abstract
We analyzed the nucleotide contents of several completely sequenced genomes, and we show that nucleotide bias can have a dramatic effect on the amino acid composition of the encoded proteins. By surveying the genes in 21 completely sequenced eubacterial and archaeal genomes, along with the entire Saccharomyces cerevisiae genome and two Plasmodium falciparum chromosomes, we show that biased DNA encodes biased proteins on a genomewide scale. The predicted bias affects virtually all genes within the genome, and it could be clearly seen even when we limited the analysis to sets of homologous gene sequences. Parallel patterns of compositional bias were found within the archaea and the eubacteria. We also found a positive correlation between the degree of amino acid bias and the magnitude of protein sequence divergence. We conclude that mutational bias can have a major effect on the molecular evolution of proteins. These results could have important implications for the interpretation of protein-based molecular phylogenies and for the inference of functional protein adaptation from comparative sequence data.
Affiliation(s)
- G A Singer
- Department of Biology, University of Ottawa, Ottawa, Ontario, Canada

18
Rodríguez-Trelles F, Tarrío R, Ayala FJ. Switch in codon bias and increased rates of amino acid substitution in the Drosophila saltans species group. Genetics 1999; 153:339-50. [PMID: 10471717 PMCID: PMC1460741 DOI: 10.1093/genetics/153.1.339] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022] Open
Abstract
We investigated the nucleotide composition of five genes, Xdh, Adh, Sod, Per, and 28SrRNA, in nine species of Drosophila (subgenus Sophophora) and one of Scaptodrosophila. The six species of the Drosophila saltans group markedly differ from the others in GC content and codon use bias. The GC content in the third codon position, and to a lesser extent in the first position and the introns, is higher in the D. melanogaster and D. obscura groups than in the D. saltans group (in Scaptodrosophila it is intermediate but closer to the melanogaster and obscura species). Differences are greater for Xdh than for Adh, Sod, Per, and 28SrRNA, which are functionally more constrained. We infer that rapid evolution of GC content in the saltans lineage is largely due to a shift in mutation pressure, which may have been associated with diminished natural selection due to smaller effective population numbers rather than reduced recombination rates. The rate of GC content evolution impacts the rate of protein evolution and may distort phylogenetic inferences. Previous observations suggesting that GC content evolution is very limited in Drosophila may have been distorted due to the restricted number of genes and species (mostly D. melanogaster) investigated.
Affiliation(s)
- F Rodríguez-Trelles
- Department of Ecology and Evolutionary Biology, University of California, Irvine, California 92697-2525, USA.

19
Lobry JR. Influence of genomic G+C content on average amino-acid composition of proteins from 59 bacterial species. Gene 1997; 205:309-16. [PMID: 9461405 DOI: 10.1016/s0378-1119(97)00403-4] [Citation(s) in RCA: 106] [Impact Index Per Article: 3.8] [Indexed: 02/06/2023] Open
Abstract
The amino-acid composition of 23,490 proteins from 59 bacterial species was analyzed as a function of genomic G+C content. Observed amino-acid frequencies were compared with those expected from a neutral model assuming the absence of selection on average protein composition. Integral membrane proteins and non-integral membrane proteins were analyzed separately. The average deviation from this neutral model shows that there is a selective pressure increasing content in charged amino acids for non-integral membrane proteins, and content in hydrophobic amino acids for integral membrane proteins. Amino-acid frequencies were greatly influenced by genomic G+C content, but the influence was found to be often weaker than predicted. This may be evidence for a selective pressure, maintaining most amino-acid frequencies close to an optimal value. Concordance between the genetic code and protein composition is discussed in the light of this observation.
Affiliation(s)
- J R Lobry
- CNRS UMR 5558-Laboratoire BGBP, Université Claude Bernard, Villeurbanne, France.

20
Jermiin LS, Foster PG, Graur D, Lowe RM, Crozier RH. Unbiased estimation of symmetrical directional mutation pressure from protein-coding DNA. J Mol Evol 1996; 42:476-80. [PMID: 8642618 DOI: 10.1007/bf02498643] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.2] [Indexed: 02/01/2023]
Abstract
The most generally applicable procedure for obtaining estimates of the symmetrical, or strand-nonspecific, directional mutation pressure (μD) on protein-coding DNA sequences is to determine the G+C content at synonymous codon sites (Psyn), and to divide Psyn by twice the arithmetic mean of the G+C content at synonymous codon sites of a large number of randomly generated, synonymously coding DNA sequences (P̄syn). Unfortunately, the original procedure yields biased estimates of Psyn and μD and is computationally expensive. We here present a fast procedure for estimating unbiased μD values. The procedure employs direct calculation of Psyn (approximately Psyn) and two normalization procedures, one for Psyn ≤ P̄syn and another for Psyn ≥ P̄syn. The normalization removes a bias sometimes caused by codons specifying arginine, asparagine, isoleucine, and leucine. Consequently, comparison of protein-coding genes that are translated using different genetic codes is facilitated.
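The original estimator described in the first sentence of this abstract can be written compactly. The notation is an assumption for illustration: $P_{\mathrm{syn}}$ is the observed G+C content at synonymous codon sites, and $\bar{P}_{\mathrm{syn}}$ is the arithmetic mean of the same quantity over the randomly generated, synonymously coding sequences.

```latex
\mu_D = \frac{P_{\mathrm{syn}}}{2\,\bar{P}_{\mathrm{syn}}}
```

Read this way, a gene whose synonymous-site G+C content equals the random expectation gives $\mu_D = 0.5$. The two normalization procedures mentioned in the abstract are not specified there and are not reproduced here.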

21
Fitzgerald DJ, Bronson EC, Anderson JN. Compositional similarities between the human immunodeficiency virus and surface antigens of pathogens. AIDS Res Hum Retroviruses 1996; 12:99-106. [PMID: 8834459 DOI: 10.1089/aid.1996.12.99] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.2] [Indexed: 02/02/2023] Open
Abstract
The genome of the human immunodeficiency virus (HIV) is rich in A but not U and deficient in C but not G. This asymmetric nucleotide bias is the major factor in determining the unusual composition of HIV proteins. In this report, we have identified the cellular genes in the GenBank database that are compositionally similar to HIV in order to further understand the significance of the nucleotide bias of the viral genome. A total of 101 genes in the bacterial and invertebrate subdivisions of the database were found to have a base composition that is similar to the composition of the HIV genome. The identified cellular sequences represent a discrete subset of the database since 81 of the 101 entries code for antigens from pathogens and nearly all of these organisms infect humans. The amino acid compositions of these surface antigens are also similar to the unusual composition of HIV proteins, which are deficient in proline and rich in lysine and other polar residues encoded by A-rich codons. The similarities between the HIV proteins and the immunodominant antigens from other pathogens may indicate a common pathogenic strategy for the promotion of immune dysregulation.
Affiliation(s)
- D J Fitzgerald
- Department of Biological Sciences, Purdue University, West Lafayette, Indiana 47907, USA

22
Xue H, Wong JT. Interferon induction of human tryptophanyl-tRNA synthetase safeguards the synthesis of tryptophan-rich immune-system proteins: a hypothesis. Gene 1995; 165:335-9. [PMID: 8522205 DOI: 10.1016/0378-1119(95)00550-p] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.6] [Indexed: 01/31/2023]
Abstract
Ever since the discovery that the human tryptophanyl-tRNA synthetase (TrpRS)-encoding gene is induced by interferon (IFN) [J. Fleckner et al., Proc. Natl. Acad. Sci. USA 88 (1991) 11520-11524] and contains IFN-response regulatory elements [Frolova et al., Gene 128 (1993) 237-245], the biological rationale for this induction has remained unresolved. A survey of immune system proteins in this study reveals that the human major histocompatibility complex (MHC) antigens, beta-2-microglobulin (beta MG) and complement factor B, which are known to be induced by IFN, together with immunoglobulins (Ig) are all exceptionally enriched in Trp residues, as compared to human proteins in general. It also reveals the conservation of a sequence motif, CX10-17 WX26-62C, in Ig domains. The conservation of this sequence motif and the utility of Trp residues within antigen-binding sites clearly contribute to the Trp enrichment in Ig. These observations suggest a biological rationale for the induction of TrpRS by IFN in safeguarding Trp incorporation for the IFN-enhanced synthesis of immunological molecules.

23
Andersson SG, Kurland CG. Genomic evolution drives the evolution of the translation system. Biochem Cell Biol 1995; 73:775-87. [PMID: 8721994 DOI: 10.1139/o95-086] [Citation(s) in RCA: 68] [Impact Index Per Article: 2.3] [Indexed: 02/01/2023] Open
Abstract
Our thesis is that the characteristics of the translational machinery and its organization are selected in part by evolutionary pressure on genomic traits that have nothing to do with translation per se. These genomic traits include size, composition, and architecture. To illustrate this point, we draw parallels between the structure of different genomes that have adapted to intracellular niches independently of each other. Our starting point is the general observation that the evolutionary history of organellar and parasitic bacteria has favored bantam genomes. Furthermore, we suggest that the constraints of the reductive mode of genomic evolution account for the divergence of the genetic code in mitochondria and the genetic organization of the translational system observed in parasitic bacteria. In particular, we associate codon reassignments in animal mitochondria with greatly simplified tRNA populations. Likewise, we relate the organization of translational genes in the obligate intracellular parasite Rickettsia prowazekii to the processes supporting the reductive mode of genomic evolution. Such findings provide strong support for the hypothesis that genomes of organelles and of parasitic bacteria have arisen from the much larger genomes of ancestral bacteria that have been reduced by intrachromosomal recombination and deletion events. A consequence of the reductive mode of genomic evolution is that the resulting translation systems may deviate markedly from conventional systems.
Affiliation(s)
- S G Andersson
- Department of Molecular Biology, Uppsala University, Sweden

24
Divergence of the phytochrome gene family predates angiosperm evolution and suggests that Selaginella and Equisetum arose prior to Psilotum. J Mol Evol 1995. [DOI: 10.1007/bf01215179] [Citation(s) in RCA: 28] [Impact Index Per Article: 0.9] [Indexed: 11/27/2022]

25

26
Abstract
The rates and patterns of evolution at silent sites in codons reveal much about the basic features of molecular evolution. Recent increases in the amount of sequence data available for various species and more precise knowledge of the chromosomal locations of those sequences, coming in particular from genome projects, reveal that some features of molecular evolution vary around the genome.
Affiliation(s)
- P M Sharp
- Department of Genetics, University of Nottingham, Queens Medical Centre, UK

27
Abstract
Proteins, on binding to a DNA sequence, alter the frequency and quality of mutations that occur in the sequence. This represents a reverse flow of information from proteins to DNA. Nucleosome binding causes patterns of UV-induced damage which, when converted to mutations by replication, will phase nucleosomes. We propose that DNA binding proteins create their own high- or low-affinity binding sites along DNA sequences by biased mutational pressure.
Affiliation(s)
- G P Holmquist
- Department of Biology, Beckman Research Institute, City of Hope Medical Center, Duarte, CA 91010

28
Brown CM, Stockwell PA, Dalphin ME, Tate WP. The translational termination signal database (TransTerm) now also includes initiation contexts. Nucleic Acids Res 1994; 22:3620-4. [PMID: 7937070 PMCID: PMC308332 DOI: 10.1093/nar/22.17.3620] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.5] [Indexed: 01/27/2023] Open
Abstract
The TransTerm database of termination codon contexts has been extended to include sense codon usage, and initiation codon contexts. The database was constructed from 23,721 coding sequences from 93 organisms. The database contains: a) the sequence around the termination codon (-10, +10); b) the sequence around the initiation codon (-20, +10); c) the length, 'G+C%' of the third position of codons (GC3), the 'codon adaptation index' (CAI) and the 'effective number of codons' statistic (Nc); d) summary tables for each organism including total codon usage, stop codon and tetranucleotide stop-signal usage, and matrices tallying base frequencies at each position around the initiation and termination codons. The data are arranged to facilitate investigation of the relationships between the three phases of protein synthesis. The database is available electronically from EMBL.
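Of the statistics listed under (c), GC3 is the simplest to reproduce. The following is a minimal sketch, assuming an in-frame coding sequence supplied as a plain string. The function name `gc3_percent` is hypothetical, and the abstract does not say whether TransTerm counts the stop codon; this sketch counts every codon it is given.

```python
def gc3_percent(cds):
    """Percentage of codon third positions that are G or C.

    Assumes `cds` is an in-frame coding sequence whose length is a
    non-zero multiple of three; case is ignored.
    """
    seq = cds.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("sequence length must be a non-zero multiple of 3")
    third_positions = seq[2::3]  # every third base, starting at codon position 3
    gc_count = sum(base in "GC" for base in third_positions)
    return 100.0 * gc_count / len(third_positions)


if __name__ == "__main__":
    # Codons ATG GCC AAA TAA have third bases G, C, A, A.
    print(gc3_percent("ATGGCCAAATAA"))
```

The codon adaptation index and the effective number of codons need reference codon-usage tables or per-amino-acid homozygosity estimates, so they are left out of this sketch.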
Affiliation(s)
- C M Brown
- Department of Biochemistry, University of Otago, Dunedin, New Zealand

29
Jermiin LS, Graur D, Lowe RM, Crozier RH. Analysis of directional mutation pressure and nucleotide content in mitochondrial cytochrome b genes. J Mol Evol 1994; 39:160-73. [PMID: 7932780 DOI: 10.1007/bf00163805] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.5] [Indexed: 01/27/2023]
Abstract
We present a new approach for analyzing directional mutation pressure and nucleotide content in protein-coding genes. Directional mutation pressure, the heterogenicity in the likelihood of different nucleotide substitutions, is used to explain the increasing or decreasing guanine-cytosine content (GC%) in DNA and is represented by μD, in agreement with Sueoka (1962, Proc Natl Acad Sci USA 48:582-592). The new method uses simulation to facilitate identification of significant A+T or G+C pressure as well as the comparison of directional mutation pressure among genes, even when they are translated by different genetic codes. We use the method to analyze the evolution of directional mutation pressure and nucleotide content of mitochondrial cytochrome b genes. Results from a survey of 110 taxa indicate that the cytochrome b genes of most taxa are subjected to significant directional mutation pressure and that the gene is subject to A+T pressure in most cases. Only in the anseriform bird Cairina moschata is the cytochrome b gene subject to significant G+C pressure. The GC% at nonsynonymous codon sites decreases proportionately with increasing A+T pressure, and with a slope less than one, indicating a presence of selective constraints. The cytochrome b genes of insects, nematodes, and eumycotes are subject to extreme A+T pressures (μD = 0.123, 0.224, and 0.130) and, in parallel, the GC% of the nonsynonymous codon sites has decreased from about 0.44 in organisms that are not subjected to A+T or G+C pressure to about 0.332, 0.323, and 0.367, respectively. The distribution of taxa according to the GC% at nonsynonymous codon sites and directional mutation pressure supports the notion that variation in these parameters is a phylogenetic component.
Affiliation(s)
- L S Jermiin
- School of Genetics and Human Variation, La Trobe University, Bundoora, Victoria, Australia

30
Holmquist GP, Filipski J. Organization of mutations along the genome: a prime determinant of genome evolution. Trends Ecol Evol 1994; 9:65-9. [DOI: 10.1016/0169-5347(94)90277-1] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.6] [Indexed: 12/13/2022]