1
Misawa K, Ootsuki R. A simple method for estimating time-irreversible nucleotide substitution rates in the SARS-CoV-2 genome. NAR Genom Bioinform 2024; 6:lqae009. [PMID: 39678027 PMCID: PMC11640943 DOI: 10.1093/nargab/lqae009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 01/07/2024] [Accepted: 01/17/2024] [Indexed: 12/17/2024] [Open Access]
Abstract
SARS-CoV-2 is the cause of the current worldwide pandemic of severe acute respiratory syndrome. The change of nucleotide composition of the SARS-CoV-2 genome is crucial for understanding the spread and transmission dynamics of the virus because viral nucleotide sequences are essential in identifying viral strains. Recent studies have shown that cytosine (C) to uracil (U) substitutions are overrepresented in SARS-CoV-2 genome sequences. These asymmetric substitutions between C and U indicate that traditional time-reversible substitution models cannot be applied to the evolution of SARS-CoV-2 sequences. Thus, we develop a new time-irreversible model of nucleotide substitutions to estimate the substitution rates in SARS-CoV-2 genomes. We investigated the number of nucleotide substitutions among the 7862 genomic sequences of SARS-CoV-2 registered in the Global Initiative on Sharing All Influenza Data (GISAID) that have been sampled from all over the world. Using the new method, the substitution rates in SARS-CoV-2 genomes were estimated. The C-to-U substitution rates of SARS-CoV-2 were estimated to be 1.95 × 10⁻³ ± 4.88 × 10⁻⁴ per site per year, compared with 1.48 × 10⁻⁴ ± 7.42 × 10⁻⁵ per site per year for all other types of substitutions.
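To illustrate why unequal C-to-U and U-to-C rates rule out a time-reversible model, the following minimal sketch builds a four-state rate matrix from the two point estimates quoted in the abstract and checks detailed balance. It assumes, purely for illustration, that every substitution other than C-to-U shares the single "all other types" rate. The function names are hypothetical, and this is not the authors' estimation method.

```python
BASES = "ACGU"


def rate_matrix(c_to_u, other):
    """Rate matrix with one rate for C->U and a shared rate for all other changes.

    Assumption for illustration only: the abstract reports one pooled rate for
    "all other types of substitutions", so every other off-diagonal entry uses it.
    """
    n = len(BASES)
    q = [[other if i != j else 0.0 for j in range(n)] for i in range(n)]
    q[BASES.index("C")][BASES.index("U")] = c_to_u
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


def stationary(q, iters=2000):
    """Stationary distribution by power iteration on the uniformized chain."""
    n = len(q)
    lam = 2.0 * max(-q[i][i] for i in range(n))  # keeps every diagonal of P positive
    p = [[(1.0 if i == j else 0.0) + q[i][j] / lam for j in range(n)] for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi


def max_flux_imbalance(q, pi):
    """Largest |pi_i q_ij - pi_j q_ji|; zero (up to rounding) for a reversible model."""
    n = len(q)
    return max(abs(pi[i] * q[i][j] - pi[j] * q[j][i])
               for i in range(n) for j in range(n) if i != j)


if __name__ == "__main__":
    q = rate_matrix(c_to_u=1.95e-3, other=1.48e-4)  # point estimates from the abstract
    pi = stationary(q)
    print(dict(zip(BASES, pi)), max_flux_imbalance(q, pi))
```

Setting both rates equal makes the imbalance vanish, which is the time-reversible special case; with the quoted estimates the stationary flux from C to U exceeds the flux from U to C, so detailed balance fails.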
Affiliation(s)
- Kazuharu Misawa
  - Department of Human Genetics, Yokohama City University Graduate School of Medicine, 3-9 Fukuura, Kanazawa-ku, Yokohama 236-0004, Japan
  - RIKEN Center for Advanced Intelligence Project, 1-4-1 Nihonbashi, Chuo-ku, Tokyo 103-0027, Japan
- Ryo Ootsuki
  - Department of Natural Sciences, Faculty of Arts and Sciences, 1-23-1 Komazawa, Setagaya-ku, Tokyo 154-8525, Japan
  - Department of Chemical and Biological Sciences, Faculty of Science, Japan Women's University, 2-8-1 Mejirodai, Bunkyo-ku, Tokyo 112-8681, Japan
2
Trexler M, Bányai L, Kerekes K, Patthy L. Evolution of termination codons of proteins and the TAG-TGA paradox. Sci Rep 2023; 13:14294. [PMID: 37653005 PMCID: PMC10471768 DOI: 10.1038/s41598-023-41410-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2023] [Accepted: 08/25/2023] [Indexed: 09/02/2023] [Open Access]
Abstract
In most eukaryotes and prokaryotes TGA is used at a significantly higher frequency than TAG as termination codon of protein-coding genes. Although this phenomenon was recognized several years ago, there is no generally accepted explanation for the TAG-TGA paradox. Our analyses of human mutation data revealed that out of the eighteen sense codons that can give rise to a nonsense codon by single base substitution, the CGA codon is exceptional: it gives rise to the TGA stop codon at an order of magnitude higher rate than the other codons. Here we propose that the TAG-TGA paradox is due to methylation and hypermutability of CpG dinucleotides. In harmony with this explanation, we show that the coding genomes of organisms with strong CpG methylation have a significant bias for TGA whereas those from organisms that lack CpG methylation use TGA and TAG termination codons with similar probability.
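The codon counts in this abstract can be checked by enumeration. The sketch below, which assumes the standard genetic code's three stop codons and uses hypothetical function names, lists every sense codon that reaches a stop codon by a single base substitution and then singles out the changes that are C-to-T or G-to-A transitions inside a CpG lying within the codon. CpG contexts that span a codon boundary depend on neighbouring codons and are deliberately ignored here.

```python
from itertools import product

STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code
BASES = "ACGT"


def stop_precursors():
    """Map each sense codon to the stop codons reachable by one base change."""
    result = {}
    for codon in map("".join, product(BASES, repeat=3)):
        if codon in STOPS:
            continue
        for pos, new in product(range(3), BASES):
            if new == codon[pos]:
                continue
            mutant = codon[:pos] + new + codon[pos + 1:]
            if mutant in STOPS:
                result.setdefault(codon, set()).add(mutant)
    return result


def cpg_transitions_to_stop():
    """Sense-to-stop changes that are CpG transitions within the codon itself."""
    hits = set()
    for codon, stops in stop_precursors().items():
        for stop in stops:
            pos = next(i for i in range(3) if codon[i] != stop[i])
            # C>T where the mutated C is followed by G in the same codon
            c_to_t = (codon[pos] == "C" and stop[pos] == "T"
                      and codon[pos + 1:pos + 2] == "G")
            # G>A where the mutated G is preceded by C (C>T on the other strand)
            g_to_a = (codon[pos] == "G" and stop[pos] == "A"
                      and pos > 0 and codon[pos - 1] == "C")
            if c_to_t or g_to_a:
                hits.add((codon, stop))
    return hits


if __name__ == "__main__":
    print(len(stop_precursors()), sorted(cpg_transitions_to_stop()))
```

Under these assumptions the enumeration yields eighteen precursor codons, matching the abstract, and CGA to TGA is the only within-codon CpG transition that creates a stop codon.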
Affiliation(s)
- Mária Trexler
  - Institute of Enzymology, Research Centre for Natural Sciences, Budapest, 1117, Hungary
- László Bányai
  - Institute of Enzymology, Research Centre for Natural Sciences, Budapest, 1117, Hungary
- Krisztina Kerekes
  - Institute of Enzymology, Research Centre for Natural Sciences, Budapest, 1117, Hungary
- László Patthy
  - Institute of Enzymology, Research Centre for Natural Sciences, Budapest, 1117, Hungary
3
Zou Z, Zhang J. Amino acid exchangeabilities vary across the tree of life. Science Advances 2019; 5:eaax3124. [PMID: 31840062 PMCID: PMC6892623 DOI: 10.1126/sciadv.aax3124] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 03/12/2019] [Accepted: 09/24/2019] [Indexed: 05/05/2023]
Abstract
Different amino acid pairs have drastically different relative exchangeabilities (REs), and accounting for this variation is an important and common practice in inferring phylogenies, testing selection, and predicting mutational effects, among other analyses. In all such endeavors, REs have been generally considered invariant among species; this assumption, however, has not been scrutinized. Using maximum likelihood to analyze 180 genome sequences, we estimated REs from 90 clades representing all three domains of life, and found numerous instances of substantial between-clade differences in REs. REs show more differences between orthologous proteins of different clades than unrelated proteins of the same clade, suggesting that REs are genome-wide, clade-specific features, probably a result of proteome-wide evolutionary changes in the physicochemical environments of amino acid residues. The discovery of among-clade RE variations cautions against assuming constant REs in various analyses and demonstrates a higher-than-expected complexity in mechanisms of proteome evolution.
4
Situ AJ, Ulmer TS. Universal principles of membrane protein assembly, composition and evolution. PLoS One 2019; 14:e0221372. [PMID: 31415673 PMCID: PMC6695178 DOI: 10.1371/journal.pone.0221372] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 06/04/2019] [Accepted: 08/05/2019] [Indexed: 11/18/2022] [Open Access]
Abstract
Structural diversity in α-helical membrane proteins (MP) arises from variations in helix-helix crossings and contacts that may bias amino acid usage. Here, we reveal systematic changes in transmembrane amino acid frequencies (f) as a function of the number of helices (n). For eukarya, breaks in f(n) trends of packing (Ala, Gly and Pro), polar, and hydrophobic residues identify different MP assembly principles for 2≤n≤7, 8≤n≤12 and n≥13. In bacteria, the first f break already occurs after n = 6 in correlation to an earlier n peak in MP size distribution and dominance of packing over polar interactions. In contrast to the later n brackets, the integration levels of helix bundles continuously increased in the first, most populous brackets indicating the formation of single structural units (domains). The larger first bracket of eukarya relates to a balance of polar and packing interactions that enlarges helix-helix combinatorial possibilities (MP diversity). Between the evolutionary old, packing and new, polar residues f anti-correlations extend over all biological taxa, broadly ordering them according to evolutionary history and allowing f estimates for the earliest forms of life. Next to evolutionary history, the amino acid composition of MP is determined by size (n), proteome diversity, and effective amino acid cost.
Affiliation(s)
- Alan J. Situ
  - Department of Physiology and Neuroscience, Keck School of Medicine, University of Southern California, Los Angeles, CA, United States of America
- Tobias S. Ulmer
  - Department of Physiology and Neuroscience, Keck School of Medicine, University of Southern California, Los Angeles, CA, United States of America
  - Department of Biochemistry and Molecular Medicine, Keck School of Medicine, University of Southern California, Los Angeles, CA, United States of America
5
Du MZ, Zhang C, Wang H, Liu S, Wei W, Guo FB. The GC Content as a Main Factor Shaping the Amino Acid Usage During Bacterial Evolution Process. Front Microbiol 2018; 9:2948. [PMID: 30581420 PMCID: PMC6292993 DOI: 10.3389/fmicb.2018.02948] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 06/08/2018] [Accepted: 11/16/2018] [Indexed: 11/13/2022] [Open Access]
Abstract
Understanding how proteins evolve is important, and the order of amino acids being recruited into the genetic codons was found to be an important factor shaping the amino acid composition of proteins. The latest work about the last universal common ancestor (LUCA) makes it possible to determine the potential factors shaping amino acid compositions during evolution. Those LUCA genes/proteins from Methanococcus maripaludis S2, which is one of the possible LUCA, were investigated. The evolutionary rates of these genes positively correlate with GC contents with P-value significantly lower than 0.05 for 94% homologous genes. Linear regression results showed that compositions of amino acids coded by GC-rich codons positively contribute to the evolutionary rates, while these amino acids tend to be gained in GC-rich organisms according to our results. The first principal component correlates with the GC content very well. The ratios of amino acids of the LUCA proteins coded by GC rich codons positively correlate with the GC content of different bacteria genomes, while the ratios of amino acids coded by AT rich codons negatively correlate with the increase of GC content of genomes. Next, we found that the recruitment order does correlate with the amino acid compositions, but gain and loss in codons showed newly recruited amino acids are not significantly increased along with the evolution. Thus, we conclude that GC content is a primary factor shaping amino acid compositions. GC content shapes amino acid composition to trade off the cost of amino acids with bases, which could be caused by the energy efficiency.
Affiliation(s)
- Meng-Ze Du
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Huan Wang
  - School of Life Sciences, Chongqing University, Chongqing, China
- Shuo Liu
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
- Wen Wei
  - School of Life Sciences, Chongqing University, Chongqing, China
- Feng-Biao Guo
  - School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, China
  - Centre for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
6
Yampolsky LY, Wolf YI, Bouzinier MA. Net Evolutionary Loss of Residue Polarity in Drosophilid Protein Cores Indicates Ongoing Optimization of Amino Acid Composition. Genome Biol Evol 2018; 9:2879-2892. [PMID: 28985302 PMCID: PMC5737390 DOI: 10.1093/gbe/evx191] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Accepted: 09/16/2017] [Indexed: 02/07/2023] [Open Access]
Abstract
Amino acid frequencies in proteins may not be at equilibrium. We consider two possible explanations for the nonzero net residue fluxes in drosophilid proteins. First, protein interiors may have a suboptimal residue composition and be under a selective pressure favoring stability, that is, leading to the loss of polar (and the gain of large) amino acids. One would then expect stronger net fluxes on the protein interior than at the exposed sites. Alternatively, if most of the polarity loss occurs at the exposed sites and the selective constraint on amino acid composition at such sites decreases over time, net loss of polarity may be neutral and caused by disproportionally high occurrence of polar residues at exposed, least constrained sites. We estimated net evolutionary fluxes of residue polarity and volume at sites with different solvent accessibility in conserved protein families from 12 species of Drosophila. Net loss of polarity, minuscule in magnitude, but consistent across all lineages, occurred at all sites except the most exposed ones, where net flux of polarity was close to zero or, in membrane proteins, even positive. At the intermediate solvent accessibility the net fluxes of polarity and volume were similar to neutral predictions, whereas much of the polarity loss not attributable to neutral expectations occurred at the buried sites. These observations are consistent with the hypothesis that residue composition in many proteins is structurally suboptimal and continues to evolve toward lower polarity in the protein interior, in particular in proteins with intracellular localization. The magnitude of polarity and volume changes was independent from the protein’s evolutionary age, indicating that the approach to equilibrium has been slow or that no such single equilibrium exists.
Affiliation(s)
- Lev Y Yampolsky
  - Department of Biological Sciences, East Tennessee State University
- Yuri I Wolf
  - National Center for Biotechnology Information, NIH, Bethesda, Maryland
7
Epigenetic Inheritance and Its Role in Evolutionary Biology: Re-Evaluation and New Perspectives. Biology 2016; 5:biology5020024. [PMID: 27231949 PMCID: PMC4929538 DOI: 10.3390/biology5020024] [Citation(s) in RCA: 98] [Impact Index Per Article: 10.9] [Received: 03/16/2016] [Revised: 04/26/2016] [Accepted: 05/11/2016] [Indexed: 01/08/2023]
Abstract
Epigenetics increasingly occupies a pivotal position in our understanding of inheritance, natural selection and, perhaps, even evolution. A survey of the PubMed database, however, reveals that the great majority (>93%) of epigenetic papers have an intra-, rather than an inter-generational focus, primarily on mechanisms and disease. Only about 1% of epigenetic papers even mention the nexus of epigenetics, natural selection and evolution. Yet, when environments are dynamic (e.g., climate change effects), there may be an “epigenetic advantage” to phenotypic switching by epigenetic inheritance, rather than by gene mutation. An epigenetically-inherited trait can arise simultaneously in many individuals, as opposed to a single individual with a gene mutation. Moreover, a transient epigenetically-modified phenotype can be quickly “sunsetted”, with individuals reverting to the original phenotype. Thus, epigenetic phenotype switching is dynamic and temporary and can help bridge periods of environmental stress. Epigenetic inheritance likely contributes to evolution both directly and indirectly. While there is as yet incomplete evidence of direct permanent incorporation of a complex epigenetic phenotype into the genome, doubtlessly, the presence of epigenetic markers and the phenotypes they create (which may sort quite separately from the genotype within a population) will influence natural selection and, so, drive the collective genotype of a population.
8
Bateson P, Gluckman P, Hanson M. The biology of developmental plasticity and the Predictive Adaptive Response hypothesis. J Physiol 2015; 592:2357-68. [PMID: 24882817 DOI: 10.1113/jphysiol.2014.271460] [Citation(s) in RCA: 328] [Impact Index Per Article: 32.8] [Indexed: 01/06/2023] [Open Access]
Abstract
Many forms of developmental plasticity have been observed and these are usually beneficial to the organism. The Predictive Adaptive Response (PAR) hypothesis refers to a form of developmental plasticity in which cues received in early life influence the development of a phenotype that is normally adapted to the environmental conditions of later life. When the predicted and actual environments differ, the mismatch between the individual's phenotype and the conditions in which it finds itself can have adverse consequences for Darwinian fitness and, later, for health. Numerous examples exist of the long-term effects of cues indicating a threatening environment affecting the subsequent phenotype of the individual organism. Other examples consist of the long-term effects of variations in environment within a normal range, particularly in the individual's nutritional environment. In mammals the cues to developing offspring are often provided by the mother's plane of nutrition, her body composition or stress levels. This hypothetical effect in humans is thought to be important by some scientists and controversial by others. In resolving the conflict, distinctions should be drawn between PARs induced by normative variations in the developmental environment and the ill effects on development of extremes in environment such as a very poor or very rich nutritional environment. Tests to distinguish between different developmental processes impacting on adult characteristics are proposed. Many of the mechanisms underlying developmental plasticity involve molecular epigenetic processes, and their elucidation in the context of PARs and more widely has implications for the revision of classical evolutionary theory.
Affiliation(s)
- Patrick Bateson
  - Department of Zoology, University of Cambridge, Cambridge, UK
- Peter Gluckman
  - Liggins Institute, University of Auckland, Auckland, New Zealand
- Mark Hanson
  - Institute of Developmental Sciences, Faculty of Medicine, University of Southampton and NIHR Nutrition Biomedical Research Centre, University Hospital Southampton, Southampton, UK
9
Mannige RV, Brooks CL, Shakhnovich EI. A universal trend among proteomes indicates an oily last common ancestor. PLoS Comput Biol 2012; 8:e1002839. [PMID: 23300421 PMCID: PMC3531291 DOI: 10.1371/journal.pcbi.1002839] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 05/03/2012] [Accepted: 10/28/2012] [Indexed: 11/19/2022] [Open Access]
Abstract
Despite progresses in ancestral protein sequence reconstruction, much needs to be unraveled about the nature of the putative last common ancestral proteome that served as the prototype of all extant lifeforms. Here, we present data that indicate a steady decline (oil escape) in proteome hydrophobicity over species evolvedness (node number) evident in 272 diverse proteomes, which indicates a highly hydrophobic (oily) last common ancestor (LCA). This trend, obtained from simple considerations (free from sequence reconstruction methods), was corroborated by regression studies within homologous and orthologous protein clusters as well as phylogenetic estimates of the ancestral oil content. While indicating an inherent irreversibility in molecular evolution, oil escape also serves as a rare and universal reaction-coordinate for evolution (reinforcing Darwin's principle of Common Descent), and may prove important in matters such as (i) explaining the emergence of intrinsically disordered proteins, (ii) developing composition- and speciation-based "global" molecular clocks, and (iii) improving the statistical methods for ancestral sequence reconstruction.
Affiliation(s)
- Ranjan V Mannige
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America.
10
Misawa K, Tajima F. New weighting methods for phylogenetic tree reconstruction using multiple loci. J Mol Evol 2012; 75:1-10. [PMID: 22871951 PMCID: PMC3480593 DOI: 10.1007/s00239-012-9513-4] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 07/02/2010] [Accepted: 07/13/2012] [Indexed: 11/24/2022]
Abstract
Efficient determination of evolutionary distances is important for the correct reconstruction of phylogenetic trees. The performance of the pooled distance required for reconstructing a phylogenetic tree can be improved by applying large weights to appropriate distances for reconstructing phylogenetic trees and small weights to inappropriate distances. We developed two weighting methods, the modified Tajima–Takezaki method and the modified least-squares method, for reconstructing phylogenetic trees from multiple loci. By computer simulations, we found that both of the new methods were more efficient in reconstructing correct topologies than the no-weight method. Hence, we reconstructed hominoid phylogenetic trees from mitochondrial DNA using our new methods, and found that the levels of bootstrap support were significantly increased by the modified Tajima–Takezaki and by the modified least-squares method.
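The idea of giving large weights to reliable per-locus distances and small weights to unreliable ones can be sketched as a weighted mean. The abstract does not define the modified Tajima–Takezaki or modified least-squares weights, so the example below swaps in generic inverse-variance weighting as a stand-in. The function names and the variance inputs are assumptions for illustration, not the authors' formulas.

```python
def unweighted_distance(distances):
    """No-weight pooling: plain mean of the per-locus distances."""
    if not distances:
        raise ValueError("need at least one locus")
    return sum(distances) / len(distances)


def pooled_distance(distances, variances):
    """Inverse-variance weighted mean of per-locus distances for one pair of taxa.

    Stand-in weighting scheme (assumption): loci whose distance estimates have
    small variance receive large weights, and noisy loci receive small weights.
    """
    if not distances or len(distances) != len(variances):
        raise ValueError("distances and variances must be non-empty and equal length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    weights = [1.0 / v for v in variances]
    return sum(w * d for w, d in zip(weights, distances)) / sum(weights)


if __name__ == "__main__":
    d = [0.10, 0.30]       # hypothetical distances at two loci
    var = [0.001, 0.1]     # hypothetical variances of those estimates
    print(unweighted_distance(d), pooled_distance(d, var))
```

With equal variances the weighted and unweighted pooled distances coincide; when one locus is much noisier, the pooled value moves toward the distance from the more precise locus.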
Affiliation(s)
- Kazuharu Misawa
  - Research Program for Computational Science, Research and Development Group for Next-generation Integrated Living Matter Simulation, Fusion of Data and Analysis Research and Development Team, RIKEN, 1-7-22 Suehiro-cho, Tsurumi, Yokohama 230-0045, Japan.
11
The impact of the organism on its descendants. Genetics Research International 2011; 2012:640612. [PMID: 22567396 PMCID: PMC3335618 DOI: 10.1155/2012/640612] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 07/07/2011] [Revised: 09/30/2011] [Accepted: 10/24/2011] [Indexed: 11/18/2022]
Abstract
Historically, evolutionary biologists have taken the view that an understanding of development is irrelevant to theories of evolution. However, the integration of several disciplines in recent years suggests that this position is wrong. The capacity of the organism to adapt to challenges from the environment can set up conditions that affect the subsequent evolution of its descendants. Moreover, molecular events arising from epigenetic processes can be transmitted from one generation to the next and influence genetic mutation. This in turn can facilitate evolution in the conditions in which epigenetic change was first initiated.
12
Misawa K. A codon substitution model that incorporates the effect of the GC contents, the gene density and the density of CpG islands of human chromosomes. BMC Genomics 2011; 12:397. [PMID: 21819607 PMCID: PMC3169530 DOI: 10.1186/1471-2164-12-397] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 01/31/2011] [Accepted: 08/06/2011] [Indexed: 11/16/2022] [Open Access]
Abstract
BACKGROUND Developing a model for codon substitutions is essential for the analyses of protein sequences. Recent studies on the mutation rates in the non-coding regions have shown that CpG mutation rates in the human genome are negatively correlated to the local GC content and to the densities of functional elements. This study aimed at understanding the effect of genomic features, namely, GC content, gene density, and frequency of CpG islands, on the rates of codon substitution in human chromosomes. RESULTS Codon substitution rates of CpG to TpG mutations, TpG to CpG mutations, and non-CpG transitions and transversions in humans were estimated by comparing the coding regions of thousands of human and chimpanzee genes and inferring their ancestral sequences by using macaque genes as the outgroup. Because the genomic features depend on one another, partial regression coefficients of these features were obtained. CONCLUSION The substitution rates of codons depend on gene densities of the chromosomes. Transcription-associated mutation is one such pressure. On the basis of these results, a model of codon substitutions that incorporates the effect of genomic features on codon substitution in human chromosomes was developed.
Affiliation(s)
- Kazuharu Misawa
  - Research Program for Computational Science, Research and Development Group for Next-Generation Integrated Living Matter Simulation, Fusion of Data and Analysis Research and Development Team, 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama City, Kanagawa 230-0045, Japan.
13
Misawa K, Kikuno RF. Relationship between amino acid composition and gene expression in the mouse genome. BMC Res Notes 2011; 4:20. [PMID: 21272306 PMCID: PMC3038927 DOI: 10.1186/1756-0500-4-20] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 06/25/2010] [Accepted: 01/27/2011] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND Codon bias is a phenomenon that refers to the differences in the frequencies of synonymous codons among different genes. In many organisms, natural selection is considered to be a cause of codon bias because codon usage in highly expressed genes is biased toward optimal codons. Methods have previously been developed to predict the expression level of genes from their nucleotide sequences, which is based on the observation that synonymous codon usage shows an overall bias toward a few codons called major codons. However, the relationship between codon bias and gene expression level, as proposed by the translation-selection model, is less evident in mammals. FINDINGS We investigated the correlations between the expression levels of 1,182 mouse genes and amino acid composition, as well as between gene expression and codon preference. We found that a weak but significant correlation exists between gene expression levels and amino acid composition in mouse. In total, less than 10% of variation of expression levels is explained by amino acid components. We found the effect of codon preference on gene expression was weaker than the effect of amino acid composition, because no significant correlations were observed with respect to codon preference. CONCLUSION These results suggest that it is difficult to predict expression level from amino acid components or from codon bias in mouse.
Affiliation(s)
- Kazuharu Misawa
  - Research Program for Computational Science, Research and Development Group for Next-Generation Integrated Living Matter Simulation, Fusion of Data and Analysis Research and Development Team, RIKEN, 4-6-1 Shirokane-dai, Minato-ku, Tokyo 108-8639, Japan.
14
Yampolsky LY, Bouzinier MA. Evolutionary patterns of amino acid substitutions in 12 Drosophila genomes. BMC Genomics 2010; 11 Suppl 4:S10. [PMID: 21143793 PMCID: PMC3005911 DOI: 10.1186/1471-2164-11-s4-s10] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND Harnessing vast amounts of genomic data in phylogenetic context stemming from massive sequencing of multiple closely related genomes requires new tools and approaches. We present a tool for the genome-wide analysis of frequencies and patterns of amino acid substitutions in multiple alignments of genes' coding regions, and a database of amino acid substitutions in the phylogeny of 12 Drosophila genomes. We illustrate the use of these resources to address three types of evolutionary genomics questions: about fluxes in amino acid composition in proteins, about asymmetries in amino acid substitutions and about patterns of molecular evolution in duplicated genes. RESULTS We demonstrate that amino acid composition of Drosophila proteins underwent a significant shift over the last 70 million years encompassed by the studied phylogeny, with less common amino acids (Cys, Met, His) increasing in frequency and more common ones (Ala, Leu, Glu) becoming less frequent. These fluxes are strongly correlated with polarity of source and destination amino acids, resulting in overall systematic decrease of mean polarity of amino acids found in Drosophila proteins. Frequency and radicality of amino acid substitutions are higher in paralogs than in orthologous single-copy genes and are higher in gene families with paralogs than in gene families without surviving duplications. Rate and radicality of substitutions, as expected, are negatively correlated with overall level and uniformity of gene expression. However, these correlations are not observed for substitutions occurring in duplicated genes, indicating a different selective constraint on the evolution of paralogous sequences. Clades resulting from duplications show a marked asymmetry in rate and radicality of amino acid substitutions, possibly a signal of widespread neofunctionalization. 
These patterns differ among protein families of different functionality, with genes coding for RNA-binding proteins differing from most other functional groups in terms of amino acid substitution patterns in duplicated and single-copy genes. CONCLUSIONS We demonstrate that deep phylogenetic analysis of amino acid substitutions can reveal interesting genome-wide patterns. Amino acid composition of drosophilid proteins is shaped by fluxes similar to those previously observed in prokaryotic, yeast and mammalian genomes, indicating globally present patterns. Increased frequency and radicality of amino acid substitutions in duplicated genes and the presence of asymmetry of these parameters between paralogous clades indicate widespread neofunctionalization among paralogs as the mechanism of duplication retention.
Affiliation(s)
- Lev Y Yampolsky
  - Department of Biological Sciences, East Tennessee State University, Johnson City, TN 37614, USA.
15
Misawa K, Kikuno RF. Evaluation of the effect of CpG hypermutability on human codon substitution. Gene 2008; 431:18-22. [PMID: 19059467 DOI: 10.1016/j.gene.2008.11.006] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 04/30/2008] [Revised: 10/10/2008] [Accepted: 11/06/2008] [Indexed: 10/21/2022]
Abstract
Understanding the cause underlying the changes in amino acid composition of proteins is essential for understanding protein evolution and function. Accurate models of DNA and protein evolution are essential for studying molecular evolution. Although many models have been developed, most models assume that each site evolves independently and that substitutions are time reversible. In mammals and other organisms, CpG hypermutability is one of the major causes of nucleotide mutations because CpG dinucleotides are often methylated at C, and the methyl-C mutation spontaneously deaminates to yield T about 3 times more rapidly than other types of point mutations. In this study, we evaluate the effect of CpG hypermutability on codon substitution by comparing thousands of coding regions in the human and chimpanzee genomes and by inferring ancestral sequences by using mouse as the outgroup. We found that 14% of synonymous and nonsynonymous substitutions on human genes were caused by CpG hypermutability. Based on these results, we developed a model that incorporates CpG hypermutability as well as the transition/transversion ratio and changes in the chemical properties of amino acids.
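The counting step behind a statement such as "14% of substitutions were caused by CpG hypermutability" can be sketched as follows. The example assumes that an ancestral sequence has already been inferred (the abstract does this with an outgroup) and that both sequences are aligned, ungapped, and uppercase. The function name is hypothetical and this is not the authors' pipeline.

```python
def count_cpg_substitutions(ancestor, descendant):
    """Return (cpg_transitions, other_substitutions) for two aligned sequences.

    A difference counts as a CpG transition when the ancestral base is the C of
    a CpG and became T, or the G of a CpG and became A (the same event seen on
    the complementary strand). Assumes equal-length, ungapped, uppercase DNA.
    """
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to the same length")
    cpg = other = 0
    for i, (a, d) in enumerate(zip(ancestor, descendant)):
        if a == d:
            continue
        c_to_t = a == "C" and d == "T" and ancestor[i + 1:i + 2] == "G"
        g_to_a = a == "G" and d == "A" and i > 0 and ancestor[i - 1] == "C"
        if c_to_t or g_to_a:
            cpg += 1
        else:
            other += 1
    return cpg, other


if __name__ == "__main__":
    cpg, other = count_cpg_substitutions("ACGTCGA", "ATGTCAA")  # toy sequences
    print(cpg, other)
```

Dividing the first count by the total number of substitutions gives the kind of proportion quoted in the abstract, although a real analysis would also need to handle gaps, ambiguous bases, and uncertainty in the inferred ancestor.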
Affiliation(s)
- Kazuharu Misawa
  - Chiba Industry Advancement Center, 2-6 Nakase, Mihama-ku, Chiba 261-7126, Japan.