1
|
Cote-L'Heureux AE, Sterner EG, Maurer-Alcalá XX, Katz LA. Lost in translation: conserved amino acid usage despite extreme codon bias in foraminifera. mBio 2025; 16:e0391624. [PMID: 40042280 PMCID: PMC11980380 DOI: 10.1128/mbio.03916-24] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/20/2025] [Accepted: 02/04/2025] [Indexed: 04/10/2025]
Abstract
Analyses of codon usage in eukaryotes suggest that amino acid usage responds to GC pressure so AT-biased substitutions drive higher usage of amino acids with AT-ending codons. Here, we combine single-cell transcriptomics and phylogenomics to explore codon usage patterns in foraminifera, a diverse and ancient clade of predominantly uncultivable microeukaryotes. We curate data from 1,044 gene families in 49 individuals representing 28 genera, generating perhaps the largest existing dataset of data from a predominantly uncultivable clade of protists, to analyze compositional bias and codon usage. We find extreme variation in composition, with a median GC content at fourfold degenerate silent sites below 3% in some species and above 75% in others. The most AT-biased species are distributed among diverse non-monophyletic lineages. Surprisingly, despite the extreme variation in compositional bias, amino acid usage is highly conserved across all foraminifera. By analyzing nucleotide, codon, and amino acid composition within this diverse clade of amoeboid eukaryotes, we expand our knowledge of patterns of genome evolution across the eukaryotic tree of life.
IMPORTANCE: Patterns of molecular evolution in protein-coding genes reflect trade-offs between substitution biases and selection on both codon and amino acid usage. Most analyses of these factors in microbial eukaryotes focus on model species such as Acanthamoeba, Plasmodium, and yeast, where substitution bias is a primary contributor to patterns of amino acid usage. Foraminifera, an ancient clade of single-celled eukaryotes, present a conundrum, as we find highly conserved amino acid usage underlain by divergent nucleotide composition, including extreme AT-bias at silent sites among multiple non-sister lineages. We speculate that these paradoxical patterns are enabled by the dynamic genome structure of foraminifera, whose life cycles can include genome endoreplication and chromatin extrusion.
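The GC content at fourfold degenerate sites reported in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `gc4` is hypothetical, the standard genetic code is assumed, and the input is assumed to be an in-frame coding sequence.

```python
# Codon prefixes whose third position is fourfold degenerate under the
# standard genetic code (assumed here): Leu CTN, Val GTN, Ser TCN,
# Pro CCN, Thr ACN, Ala GCN, Arg CGN, Gly GGN.
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def gc4(cds: str) -> float:
    """Return the fraction of G or C at fourfold degenerate third positions.

    `cds` is assumed to be an in-frame coding sequence (hypothetical input).
    """
    seq = cds.upper().replace("U", "T")
    gc = 0
    total = 0
    # Walk codon by codon, ignoring any trailing partial codon.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        # Count only unambiguous third bases in fourfold degenerate families.
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in "ACGT":
            total += 1
            if codon[2] in "GC":
                gc += 1
    if total == 0:
        raise ValueError("no fourfold degenerate sites found")
    return gc / total
```

A per-species summary such as the median in the abstract would then be taken over the `gc4` values of many genes; how the authors aggregated their data is not stated here.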
Affiliation(s)
- Elinor G. Sterner
- Department of Biological Sciences, Smith College, Northampton, Massachusetts, USA
- Xyrus X. Maurer-Alcalá
- Division of Invertebrate Zoology, Sackler Institute for Comparative Genomics, American Museum of Natural History, New York, New York, USA
- Laura A. Katz
- Department of Biological Sciences, Smith College, Northampton, Massachusetts, USA
- Program in Organismic Biology and Evolution, University of Massachusetts Amherst, Amherst, Massachusetts, USA
|
2
|
Christmas MJ, Dong MX, Meadows JRS, Kozyrev SV, Lindblad-Toh K. Interpreting mammalian synonymous site conservation in light of the unwanted transcript hypothesis. Nat Commun 2025; 16:2007. [PMID: 40011430 DOI: 10.1038/s41467-025-57179-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2024] [Accepted: 02/12/2025] [Indexed: 02/28/2025]
Abstract
Mammalian genomes are biased towards GC bases at third codon positions, likely due to a GC-biased ancestral genome and the selectively neutral recombination-related process of GC-biased gene conversion. The unwanted transcript hypothesis posits that this high GC content at synonymous sites may be beneficial for protecting against spurious transcripts, particularly in species with low effective population sizes. Utilising a 240 placental mammal genome alignment and single-base resolution conservation scores, we interpret sequence conservation at mammalian four-fold degenerate sites in this context and find evidence in support of the unwanted transcript hypothesis, including a strong GC bias, high conservation at sites relating to exon splicing, less human genetic variation at conserved four-fold degenerate sites, and conservation of sites important for epigenetic regulation of developmental genes. Additionally, we show that high conservation of four-fold degenerate sites in essential developmental genes, including homeobox genes, likely relates to the low mutation rates experienced by these genes.
Affiliation(s)
- Matthew J Christmas
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- SciLifeLab, Uppsala University, Uppsala, Sweden
- Michael X Dong
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- SciLifeLab, Uppsala University, Uppsala, Sweden
- Jennifer R S Meadows
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- SciLifeLab, Uppsala University, Uppsala, Sweden
- Sergey V Kozyrev
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- SciLifeLab, Uppsala University, Uppsala, Sweden
- Kerstin Lindblad-Toh
- Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- SciLifeLab, Uppsala University, Uppsala, Sweden
- Broad Institute of MIT and Harvard, Cambridge, MA, USA
|
3
|
Weber CC. Disentangling cobionts and contamination in long-read genomic data using sequence composition. G3 (Bethesda) 2024; 14:jkae187. [PMID: 39148415 PMCID: PMC11540323 DOI: 10.1093/g3journal/jkae187] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2024] [Revised: 08/02/2024] [Accepted: 08/02/2024] [Indexed: 08/17/2024]
Abstract
The recent acceleration in genome sequencing targeting previously unexplored parts of the tree of life presents computational challenges. Samples collected from the wild often contain sequences from several organisms, including the target, its cobionts, and contaminants. Effective methods are therefore needed to separate sequences. Though advances in sequencing technology make this task easier, it remains difficult to taxonomically assign sequences from eukaryotic taxa that are not well represented in databases. Therefore, reference-based methods alone are insufficient. Here, I examine how we can take advantage of differences in sequence composition between organisms to identify symbionts, parasites, and contaminants in samples, with minimal reliance on reference data. To this end, I explore data from the Darwin Tree of Life project, including hundreds of high-quality HiFi read sets from insects. Visualizing two-dimensional representations of read tetranucleotide composition learned by a variational autoencoder can reveal distinct components of a sample. Annotating the embeddings with additional information, such as coding density, estimated coverage, or taxonomic labels allows rapid assessment of the contents of a dataset. The approach scales to millions of sequences, making it possible to explore unassembled read sets, even for large genomes. Combined with interactive visualization tools, it allows a large fraction of cobionts reported by reference-based screening to be identified. Crucially, it also facilitates retrieving genomes for which suitable reference data are absent.
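The tetranucleotide composition that this abstract feeds into a variational autoencoder can be sketched in Python as a 256-element frequency vector per read. The function name `tetranucleotide_profile` is hypothetical, and the sketch makes simplifying assumptions that may differ from the published method: it does not merge reverse-complement k-mers, it skips windows containing ambiguous bases, and it does not include the autoencoder itself.

```python
from collections import Counter
from itertools import product

# All 256 unambiguous 4-mers in a fixed (lexicographic) order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetranucleotide_profile(read: str) -> list[float]:
    """Return the relative frequency of each 4-mer in `read`.

    Windows containing characters other than A, C, G, T are ignored.
    """
    seq = read.upper()
    # Count every overlapping window of length 4.
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    # Normalize over unambiguous 4-mers only.
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        raise ValueError("read has no unambiguous 4-mers")
    return [counts[k] / total for k in KMERS]
```

Vectors like this, one per read, are the kind of input a dimensionality-reduction model could embed in two dimensions.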
Affiliation(s)
- Claudia C Weber
- Tree of Life, Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton CB10 1SA, UK
|
4
|
Gaunt ER, Digard P. Compositional biases in RNA viruses: Causes, consequences and applications. Wiley Interdiscip Rev RNA 2022; 13:e1679. [PMID: 34155814 PMCID: PMC8420353 DOI: 10.1002/wrna.1679] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 04/08/2021] [Revised: 05/29/2021] [Accepted: 05/31/2021] [Indexed: 01/05/2023]
Abstract
If each of the four nucleotides were represented equally in the genomes of viruses and the hosts they infect, each base would occur at a frequency of 25%. However, this is not observed in nature. Similarly, the order of nucleotides is not random (e.g., in the human genome, guanine follows cytosine at a frequency of ~0.0125, or a quarter the number of times predicted by random representation). Codon usage and codon order are also nonrandom. Furthermore, nucleotide and codon biases vary between species. Such biases have various drivers, including cellular proteins that recognize specific patterns in nucleic acids and that, once triggered, induce mutations or invoke intrinsic or innate immune responses. In this review we examine the types of compositional biases identified in viral genomes and current understanding of the evolutionary mechanisms underpinning these trends. Finally, we consider the potential for large scale synonymous recoding strategies to engineer RNA virus vaccines, including those with pandemic potential, such as influenza A virus and Severe Acute Respiratory Syndrome Coronavirus 2. This article is categorized under: RNA in Disease and Development > RNA in Disease; RNA Evolution and Genomics > Computational Analyses of RNA; RNA Interactions with Proteins and Other Molecules > Protein-RNA Recognition.
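The comparison in this abstract between how often guanine follows cytosine and how often it would under random ordering is commonly expressed as an observed/expected ratio. A minimal Python sketch follows; the function name `cpg_obs_exp` is hypothetical, a single strand is assumed, and the expected frequency is taken as the product of the C and G base frequencies.

```python
def cpg_obs_exp(seq: str) -> float:
    """Return the observed/expected ratio of the CG dinucleotide.

    Observed: CG count divided by the number of dinucleotide positions.
    Expected: product of the single-base frequencies of C and G.
    """
    s = seq.upper()
    n = len(s)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    freq_c = s.count("C") / n
    freq_g = s.count("G") / n
    if freq_c == 0 or freq_g == 0:
        raise ValueError("expected CG frequency is zero")
    # "CG" cannot overlap itself, so str.count finds every occurrence.
    freq_cg = s.count("CG") / (n - 1)
    return freq_cg / (freq_c * freq_g)
```

A ratio near 1 means CG occurs about as often as base composition alone predicts; a ratio well below 1 indicates depletion of the kind the abstract describes for the human genome.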
Affiliation(s)
- Eleanor R. Gaunt
- Department of Infection and Immunity, The Roslin Institute, The University of Edinburgh, Edinburgh, UK
- Paul Digard
- Department of Infection and Immunity, The Roslin Institute, The University of Edinburgh, Edinburgh, UK
|
5
|
Jordan-Paiz A, Franco S, Martínez MA. Impact of Synonymous Genome Recoding on the HIV Life Cycle. Front Microbiol 2021; 12:606087. [PMID: 33796084 PMCID: PMC8007914 DOI: 10.3389/fmicb.2021.606087] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/14/2020] [Accepted: 02/25/2021] [Indexed: 12/19/2022]
Abstract
Synonymous mutations within protein coding regions introduce changes in DNA or messenger (m) RNA, without mutating the encoded proteins. Synonymous recoding of virus genomes has facilitated the identification of previously unknown virus biological features. Moreover, large-scale synonymous recoding of the genome of human immunodeficiency virus type 1 (HIV-1) has elucidated new antiviral mechanisms within the innate immune response, and has improved our knowledge of new functional virus genome structures, the relevance of codon usage for the temporal regulation of viral gene expression, and HIV-1 mutational robustness and adaptability. Continuous improvements in our understanding of the impacts of synonymous substitutions on virus phenotype - coupled with the decreased cost of chemically synthesizing DNA and improved methods for assembling DNA fragments - have enhanced our ability to identify potential HIV-1 and host factors and other aspects involved in the infection process. In this review, we address how silent mutagenesis impacts HIV-1 phenotype and replication capacity. We also discuss the general potential of synonymous recoding of the HIV-1 genome to elucidate unknown aspects of the virus life cycle, and to identify new therapeutic targets.
Affiliation(s)
- Ana Jordan-Paiz
- IrsiCaixa, Hospital Universitari Germans Trias i Pujol, Universitat Autònoma de Barcelona (UAB), Badalona, Spain
- Sandra Franco
- IrsiCaixa, Hospital Universitari Germans Trias i Pujol, Universitat Autònoma de Barcelona (UAB), Badalona, Spain
- Miguel Angel Martínez
- IrsiCaixa, Hospital Universitari Germans Trias i Pujol, Universitat Autònoma de Barcelona (UAB), Badalona, Spain
|
6
|
Martínez MA, Jordan-Paiz A, Franco S, Nevot M. Synonymous genome recoding: a tool to explore microbial biology and new therapeutic strategies. Nucleic Acids Res 2020; 47:10506-10519. [PMID: 31584076 PMCID: PMC6846928 DOI: 10.1093/nar/gkz831] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 08/13/2019] [Revised: 09/12/2019] [Accepted: 09/30/2019] [Indexed: 12/18/2022]
Abstract
Synthetic genome recoding is a new means of generating designed organisms with altered phenotypes. Synonymous mutations introduced into the protein coding region tolerate modifications in DNA or mRNA without modifying the encoded proteins. Synonymous genome-wide recoding has allowed the synthetic generation of different small-genome viruses with modified phenotypes and biological properties. Recently, a decreased cost of chemically synthesizing DNA and improved methods for assembling DNA fragments (e.g. lambda red recombination and CRISPR-based editing) have enabled the construction of an Escherichia coli variant with a 4-Mb synthetic synonymously recoded genome with a reduced number of sense codons (n = 59) encoding the 20 canonical amino acids. Synonymous genome recoding is increasing our knowledge of microbial interactions with innate immune responses, identifying functional genome structures, and strategically ameliorating cis-inhibitory signaling sequences related to splicing, replication (in eukaryotes), and complex microbe functions, unraveling the relevance of codon usage for the temporal regulation of gene expression and the microbe mutant spectrum and adaptability. New biotechnological and therapeutic applications of this methodology can easily be envisaged. In this review, we discuss how synonymous genome recoding may impact our knowledge of microbial biology and the development of new and better therapeutic methodologies.
Affiliation(s)
- Miguel Angel Martínez
- IrsiCaixa, Hospital Universitari Germans Trias i Pujol, Universitat Autònoma de Barcelona (UAB), Badalona, Spain
- Ana Jordan-Paiz
- IrsiCaixa, Hospital Universitari Germans Trias i Pujol, Universitat Autònoma de Barcelona (UAB), Badalona, Spain
- Sandra Franco
- IrsiCaixa, Hospital Universitari Germans Trias i Pujol, Universitat Autònoma de Barcelona (UAB), Badalona, Spain
- Maria Nevot
- IrsiCaixa, Hospital Universitari Germans Trias i Pujol, Universitat Autònoma de Barcelona (UAB), Badalona, Spain
|
7
|
Auboeuf D. Physicochemical Foundations of Life that Direct Evolution: Chance and Natural Selection are not Evolutionary Driving Forces. Life (Basel) 2020; 10:life10020007. [PMID: 31973071 PMCID: PMC7175370 DOI: 10.3390/life10020007] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 11/22/2019] [Revised: 01/15/2020] [Accepted: 01/16/2020] [Indexed: 12/11/2022]
Abstract
The current framework of evolutionary theory postulates that evolution relies on random mutations generating a diversity of phenotypes on which natural selection acts. This framework was established using a top-down approach as it originated from Darwinism, which is based on observations made of complex multicellular organisms and, then, modified to fit a DNA-centric view. In this article, it is argued that based on a bottom-up approach starting from the physicochemical properties of nucleic and amino acid polymers, we should reject the facts that (i) natural selection plays a dominant role in evolution and (ii) the probability of mutations is independent of the generated phenotype. It is shown that the adaptation of a phenotype to an environment does not correspond to organism fitness, but rather corresponds to maintaining the genome stability and integrity. In a stable environment, the phenotype maintains the stability of its originating genome and both (genome and phenotype) are reproduced identically. In an unstable environment (i.e., corresponding to variations in physicochemical parameters above a physiological range), the phenotype no longer maintains the stability of its originating genome, but instead influences its variations. Indeed, environment- and cellular-dependent physicochemical parameters define the probability of mutations in terms of frequency, nature, and location in a genome. Evolution is non-deterministic because it relies on probabilistic physicochemical rules, and evolution is driven by a bidirectional interplay between genome and phenotype in which the phenotype ensures the stability of its originating genome in a cellular and environmental physicochemical parameter-depending manner.
Affiliation(s)
- Didier Auboeuf
- Laboratory of Biology and Modelling of the Cell, Univ Lyon, ENS de Lyon, Univ Claude Bernard, CNRS UMR 5239, INSERM U1210, 46 Allée d'Italie, Site Jacques Monod, F-69007, Lyon, France
|
8
|
Tuteja R, McKeown PC, Ryan P, Morgan CC, Donoghue MTA, Downing T, O'Connell MJ, Spillane C. Paternally Expressed Imprinted Genes under Positive Darwinian Selection in Arabidopsis thaliana. Mol Biol Evol 2019; 36:1239-1253. [PMID: 30913563 PMCID: PMC6526901 DOI: 10.1093/molbev/msz063] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 12/13/2022]
Abstract
Genomic imprinting is an epigenetic phenomenon where autosomal genes display uniparental expression depending on whether they are maternally or paternally inherited. Genomic imprinting can arise from parental conflicts over resource allocation to the offspring, which could drive imprinted loci to evolve by positive selection. We investigate whether positive selection is associated with genomic imprinting in the inbreeding species Arabidopsis thaliana. Our analysis of 140 genes regulated by genomic imprinting in the A. thaliana seed endosperm demonstrates they are evolving more rapidly than expected. To investigate whether positive selection drives this evolutionary acceleration, we identified orthologs of each imprinted gene across 34 plant species and elucidated their evolutionary trajectories. Increased positive selection was sought by comparing its incidence among imprinted genes with nonimprinted controls. Strikingly, we find a statistically significant enrichment of imprinted paternally expressed genes (iPEGs) evolving under positive selection, 50.6% of the total, but no such enrichment for positive selection among imprinted maternally expressed genes (iMEGs). This suggests that maternally- and paternally expressed imprinted genes are subject to different selective pressures. Almost all positively selected amino acids were fixed across 80 sequenced A. thaliana accessions, suggestive of selective sweeps in the A. thaliana lineage. The imprinted genes under positive selection are involved in processes important for seed development including auxin biosynthesis and epigenetic regulation. Our findings support a genomic imprinting model for plants where positive selection can affect paternally expressed genes due to continued conflict with maternal sporophyte tissues, even when parental conflict is reduced in predominantly inbreeding species.
Affiliation(s)
- Reetu Tuteja
- Genetics & Biotechnology Lab, Plant & AgriBiosciences Research Centre (PABC), School of Natural Sciences, Ryan Institute, National University of Ireland Galway, Galway, Ireland; Center for Genomics and Systems Biology, New York University, New York, NY
- Peter C McKeown
- Genetics & Biotechnology Lab, Plant & AgriBiosciences Research Centre (PABC), School of Natural Sciences, Ryan Institute, National University of Ireland Galway, Galway, Ireland
- Pat Ryan
- Genetics & Biotechnology Lab, Plant & AgriBiosciences Research Centre (PABC), School of Natural Sciences, Ryan Institute, National University of Ireland Galway, Galway, Ireland
- Claire C Morgan
- School of Biotechnology, Faculty of Biological Sciences, Dublin City University, Dublin, Ireland; Division of Diabetes, Endocrinology and Metabolism, Imperial College London, London, United Kingdom
- Mark T A Donoghue
- Genetics & Biotechnology Lab, Plant & AgriBiosciences Research Centre (PABC), School of Natural Sciences, Ryan Institute, National University of Ireland Galway, Galway, Ireland; Memorial Sloan Kettering Cancer Center, New York, NY
- Tim Downing
- School of Biotechnology, Faculty of Biological Sciences, Dublin City University, Dublin, Ireland
- Mary J O'Connell
- Computational and Molecular Evolutionary Biology Research Group, School of Biology, Faculty of Biological Sciences, The University of Leeds, Leeds, United Kingdom; Computational and Molecular Evolutionary Biology Group, School of Life Sciences, University of Nottingham, Nottingham, United Kingdom
- Charles Spillane
- Genetics & Biotechnology Lab, Plant & AgriBiosciences Research Centre (PABC), School of Natural Sciences, Ryan Institute, National University of Ireland Galway, Galway, Ireland
|
9
|
Athey J, Alexaki A, Osipova E, Rostovtsev A, Santana-Quintero LV, Katneni U, Simonyan V, Kimchi-Sarfaty C. A new and updated resource for codon usage tables. BMC Bioinformatics 2017; 18:391. [PMID: 28865429 PMCID: PMC5581930 DOI: 10.1186/s12859-017-1793-7] [Citation(s) in RCA: 170] [Impact Index Per Article: 21.3] [Received: 06/19/2017] [Accepted: 08/15/2017] [Indexed: 01/24/2023]
Abstract
Background: Due to the degeneracy of the genetic code, most amino acids can be encoded by multiple synonymous codons. Synonymous codons naturally occur with different frequencies in different organisms. The choice of codons may affect protein expression, structure, and function. Recombinant gene technologies commonly take advantage of the former effect by implementing a technique termed codon optimization, in which codons are replaced with synonymous ones in order to increase protein expression. This technique relies on the accurate knowledge of codon usage frequencies. Accurately quantifying codon usage bias for different organisms is useful not only for codon optimization, but also for evolutionary and translation studies: phylogenetic relations of organisms, and host-pathogen co-evolution relationships, may be explored through their codon usage similarities. Furthermore, codon usage has been shown to affect protein structure and function through interfering with translation kinetics, and cotranslational protein folding.
Results: Despite the obvious need for accurate codon usage tables, currently available resources are either limited in scope, encompassing only organisms from specific domains of life, or greatly outdated. Taking advantage of the exponential growth of GenBank and the creation of NCBI’s RefSeq database, we have developed a new database, the High-performance Integrated Virtual Environment-Codon Usage Tables (HIVE-CUTs), to present and analyse codon usage tables for every organism with publicly available sequencing data. Compared to existing databases, this new database is more comprehensive, addresses concerns that limited the accuracy of earlier databases, and provides several new functionalities, such as the ability to view and compare codon usage between individual organisms and across taxonomical clades, through graphical representation or through commonly used indices. In addition, it is being routinely updated to keep up with the continuous flow of new data in GenBank and RefSeq.
Conclusion: Given the impact of codon usage bias on recombinant gene technologies, this database will facilitate effective development and review of recombinant drug products and will be instrumental in a wide area of biological research. The database is available at hive.biochemistry.gwu.edu/review/codon.
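A codon usage table of the kind this abstract describes can be illustrated with a short Python sketch that tallies codons in a coding sequence. This is not the HIVE-CUTs implementation: the function name `codon_usage_per_thousand` is hypothetical, and reporting frequencies per thousand codons is a convention assumed for this sketch.

```python
from collections import Counter


def codon_usage_per_thousand(cds: str) -> dict[str, float]:
    """Return each codon's frequency per 1,000 codons in `cds`.

    `cds` is assumed to be in frame; a trailing partial codon and codons
    containing ambiguous bases are ignored.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # Keep only codons made of unambiguous bases.
    codons = [c for c in codons if set(c) <= set("ACGT")]
    if not codons:
        raise ValueError("no complete unambiguous codons found")
    counts = Counter(codons)
    total = len(codons)
    return {codon: 1000 * n / total for codon, n in sorted(counts.items())}
```

Summing such counts over all coding sequences of an organism before normalizing would give an organism-level table that can be compared across species.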
Affiliation(s)
- John Athey
- Division of Plasma Protein Therapeutics, Office of Tissue and Advanced Therapies, Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, USA
- Aikaterini Alexaki
- Division of Plasma Protein Therapeutics, Office of Tissue and Advanced Therapies, Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, USA
- Ekaterina Osipova
- High Performance Integrated Environment, Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, USA
- Alexandre Rostovtsev
- High Performance Integrated Environment, Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, USA
- Luis V Santana-Quintero
- High Performance Integrated Environment, Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, USA
- Upendra Katneni
- Division of Plasma Protein Therapeutics, Office of Tissue and Advanced Therapies, Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, USA
- Vahan Simonyan
- High Performance Integrated Environment, Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, USA
- Chava Kimchi-Sarfaty
- Division of Plasma Protein Therapeutics, Office of Tissue and Advanced Therapies, Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, USA
|
10
|
Pancsa R, Tompa P. Coding Regions of Intrinsic Disorder Accommodate Parallel Functions. Trends Biochem Sci 2016; 41:898-906. [DOI: 10.1016/j.tibs.2016.08.009] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 06/19/2016] [Revised: 08/16/2016] [Accepted: 08/19/2016] [Indexed: 02/01/2023]
|
11
|
Chi PB, Liberles DA. Selection on protein structure, interaction, and sequence. Protein Sci 2016; 25:1168-78. [PMID: 26808055 PMCID: PMC4918422 DOI: 10.1002/pro.2886] [Citation(s) in RCA: 40] [Impact Index Per Article: 4.4] [Received: 11/12/2015] [Revised: 01/18/2016] [Accepted: 01/19/2016] [Indexed: 11/10/2022]
Abstract
Characterizing the probabilities of observing amino acid substitutions at specific sites in a protein over evolutionary time is a major goal in the field of molecular evolution. While purely statistical approaches at different levels of complexity exist, approaches rooted in underlying biological processes are necessary both to characterize the context-dependence of sequence changes (epistasis) and to extrapolate to sequences not observed in biological databases. To develop such approaches, an understanding of the different selective forces that act on amino acid substitution is necessary. Here, an overview is presented of selection on, and corresponding modeling of, folding stability, folding specificity, binding affinity and specificity for ligands, the evolution of new binding sites on protein surfaces, protein dynamics, intrinsic disorder, and protein aggregation, as well as the interplay with protein expression level (concentration) and biased mutational processes.
Affiliation(s)
- Peter B Chi
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, 19122
- Department of Mathematics and Computer Science, Ursinus College, Collegeville, Pennsylvania, 19426
- David A Liberles
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, 19122
|
12
|
Zou Y, Shao X, Dong D. Inferring the determinants of protein evolutionary rates in mammals. Gene 2016; 584:161-6. [PMID: 26899866 DOI: 10.1016/j.gene.2016.02.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2015] [Revised: 01/15/2016] [Accepted: 02/15/2016] [Indexed: 11/25/2022]
Abstract
Understanding the determinants of protein evolutionary rates is one of the most fundamental evolutionary questions. Previous studies have revealed that many biological variables are tightly associated with protein evolutionary rates in mammals. However, the dominant role of these biological variables and their combinatorial effects on the evolutionary rates of mammalian proteins are still poorly understood. In this work, we derived a quantitative model to correlate protein evolutionary rates with the levels of these variables. The results showed that only a small number of variables are necessary to accurately predict protein evolutionary rates, among which miRNA regulation plays the most important role. Our results suggested that biological variables are extensively interrelated and suffer from hidden redundancies in determining protein evolutionary rates. Various variables should be considered in a natural ensemble to comprehensively assess the determinants of protein evolutionary rate.
Affiliation(s)
- Yang Zou
- Laboratory of Molecular Ecology and Evolution, Institute of Estuarine and Coastal Research, East China Normal University, Shanghai 200062, China
- Xiaojian Shao
- Department of Human Genetics, McGill University, 740 Dr. Penfield Avenue, H3A 0G1 Montreal, Quebec, Canada
- Dong Dong
- Laboratory of Molecular Ecology and Evolution, Institute of Estuarine and Coastal Research, East China Normal University, Shanghai 200062, China
|
13
|
Martínez MA, Jordan-Paiz A, Franco S, Nevot M. Synonymous Virus Genome Recoding as a Tool to Impact Viral Fitness. Trends Microbiol 2016; 24:134-147. [DOI: 10.1016/j.tim.2015.11.002] [Citation(s) in RCA: 56] [Impact Index Per Article: 6.2] [Received: 09/19/2015] [Revised: 10/28/2015] [Accepted: 11/04/2015] [Indexed: 01/28/2023]
|
14
|
Ressayre A, Glémin S, Montalent P, Serre-Giardi L, Dillmann C, Joets J. Introns Structure Patterns of Variation in Nucleotide Composition in Arabidopsis thaliana and Rice Protein-Coding Genes. Genome Biol Evol 2015; 7:2913-28. [PMID: 26450849 PMCID: PMC4684703 DOI: 10.1093/gbe/evv189] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 11/19/2022]
Abstract
Plant genomes present a continuous range of variation in nucleotide composition (G + C content). In coding regions, G + C-poor species tend to have unimodal distributions of G + C content among genes within genomes and slight 5′–3′ gradients along genes. In contrast, G + C-rich species display bimodal distributions of G + C content among genes and steep 5′–3′ decreasing gradients along genes. The causes of these peculiar patterns are still poorly understood. Within two species (Arabidopsis thaliana and rice), each representative of one side of the continuum, we studied the consequences of intron presence on coding region and intron G + C content at different scales. By properly taking intron structure into account, we showed that, in both species, intron presence is associated with step changes in nucleotide, codon, and amino acid composition. This suggests that introns have a barrier effect structuring G + C content along genes and that previous continuous characterizations of the 5′–3′ gradients were artifactual. In external gene regions (located upstream first or downstream last introns), species-specific factors, such as GC-biased gene conversion, are shaping G + C content whereas in internal gene regions (surrounded by introns), G + C content is likely constrained to remain within a range common to both species.
Affiliation(s)
- Adrienne Ressayre
- UMR 0320/UMR 8120 Génétique Quantitative et Evolution-Le Moulon, INRA, Gif-sur-Yvette, France
- Sylvain Glémin
- Institut des Sciences de l'Evolution (ISEM), UMR 5554, Université de Montpellier, CNRS-IRD-EPHE, France; Department of Ecology and Genetics, Evolutionary Biology Centre, Uppsala University, Sweden
- Pierre Montalent
- UMR 0320/UMR 8120 Génétique Quantitative et Evolution-Le Moulon, INRA, Gif-sur-Yvette, France
- Laurana Serre-Giardi
- UMR 1345 IRHS Institut de Recherche en Horticulture et Semences, INRA, Centre de Recherche Angers-Nantes, Beaucousé, France
- Christine Dillmann
- UMR 0320/UMR 8120 Génétique Quantitative et Evolution-Le Moulon, Université Paris-Sud, Gif-sur-Yvette, France
- Johann Joets
- UMR 0320/UMR 8120 Génétique Quantitative et Evolution-Le Moulon, INRA, Gif-sur-Yvette, France
|
15
|
Balabanova L, Golotin V, Podvolotskaya A, Rasskazov V. Genetically modified proteins: functional improvement and chimeragenesis. Bioengineered 2015. [PMID: 26211369 DOI: 10.1080/21655979.2015.1075674] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 10/23/2022] [Open Access]
Abstract
This review focuses on the emerging role of site-specific mutagenesis and chimeragenesis for the functional improvement of proteins in areas where traditional protein engineering methods have been extensively used and practically exhausted. The novel path for the creation of the novel proteins has been created on the farther development of the new structure and sequence optimization algorithms for generating and designing the accurate structure models in result of x-ray crystallography studies of a lot of proteins and their mutant forms. Artificial genetic modifications aim to expand nature's repertoire of biomolecules. One of the most exciting potential results of mutagenesis or chimeragenesis finding could be design of effective diagnostics, bio-therapeutics and biocatalysts. A sampling of recent examples is listed below for the in vivo and in vitro genetically improvement of various binding protein and enzyme functions, with references for more in-depth study provided for the reader's benefit.
Affiliation(s)
- Larissa Balabanova
- G.B. Elyakov Pacific Institute of Bioorganic Chemistry, Far Eastern Branch, Russian Academy of Science, Vladivostok, Russia; Far Eastern Federal University, Vladivostok, Russia
- Vasily Golotin
- G.B. Elyakov Pacific Institute of Bioorganic Chemistry, Far Eastern Branch, Russian Academy of Science, Vladivostok, Russia; Far Eastern Federal University, Vladivostok, Russia
- Valery Rasskazov
- G.B. Elyakov Pacific Institute of Bioorganic Chemistry, Far Eastern Branch, Russian Academy of Science, Vladivostok, Russia
|
16
|
Smithers B, Oates ME, Gough J. Splice junctions are constrained by protein disorder. Nucleic Acids Res 2015; 43:4814-22. [PMID: 25934802 PMCID: PMC4446445 DOI: 10.1093/nar/gkv407] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 01/27/2015] [Accepted: 04/15/2015] [Indexed: 01/23/2023] [Open Access]
Abstract
We have discovered that positions of splice junctions in genes are constrained by the tolerance for disorder-promoting amino acids in the translated protein region. It is known that efficient splicing requires nucleotide bias at the splice junction; the preferred usage produces a distribution of amino acids that is disorder-promoting. We observe that efficiency of splicing, as seen in the amino-acid distribution, is not compromised to accommodate globular structure. Thus we infer that it is the positions of splice junctions in the gene that must be under constraint by the local protein environment. Examining exonic splicing enhancers found near the splice junction in the gene, reveals that these (short DNA motifs) are more prevalent in exons that encode disordered protein regions than exons encoding structured regions. Thus we also conclude that local protein features constrain efficient splicing more in structure than in disorder.
Affiliation(s)
- Ben Smithers
- Department of Computer Science, University of Bristol, Bristol, BS8 1UB, UK
- Matt E Oates
- Department of Computer Science, University of Bristol, Bristol, BS8 1UB, UK
- Julian Gough
- Department of Computer Science, University of Bristol, Bristol, BS8 1UB, UK
|
17
|
Karambataki M, Malousi A, Kouidou S. Risk-associated coding synonymous SNPs in type 2 diabetes and neurodegenerative diseases: genetic silence and the underrated association with splicing regulation and epigenetics. Mutat Res 2014; 770:85-93. [PMID: 25771874 DOI: 10.1016/j.mrfmmm.2014.09.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 04/07/2014] [Revised: 09/15/2014] [Accepted: 09/16/2014] [Indexed: 06/04/2023]
Abstract
Single nucleotide polymorphisms (SNPs) are tentatively critical with regard to disease predisposition, but coding synonymous SNPs (sSNPs) are generally considered "neutral". Nevertheless, sSNPs in serine/arginine-rich (SR) and splice-site (SS) exonic splicing enhancers (ESEs) or in exonic CpG methylation targets, could be decisive for splicing, particularly in aging-related conditions, where mis-splicing is frequently observed. We presently identified 33 genes T2D-related and 28 related to neurodegenerative diseases, by investigating the impact of the corresponding coding sSNPs on splicing and using gene ontology data and computational tools. Potentially critical (prominent) sSNPs comply with the following criteria: changing the splicing potential of prominent SR-ESEs or of significant SS-ESEs by >1.5 units (Δscore), or formation/deletion of ESEs with maximum splicing score. We also noted the formation/disruption of CpGs (tentative methylation sites of epigenetic sSNPs). All disease association studies involving sSNPs are also reported. Only 21/670 coding SNPs, mostly epigenetic, reported in 33 T2D-related genes, were found to be prominent coding synonymous. No prominent sSNPs have been recorded in three key T2D-related genes (GCGR, PPARGC1A, IGF1). Similarly, 20/366 coding synonymous were identified in ND related genes, mostly epigenetic. Meta-analysis showed that 17 of the above prominent sSNPs were previously investigated in association with various pathological conditions. Three out of four sSNPs (all epigenetic) were associated with T2D and one with NDs (branch site sSNP). Five were associated with other or related pathological conditions. None of the four sSNPs introducing new ESEs was found to be disease-associated. sSNPs introducing smaller Δscore changes (<1.5) in key proteins (INSR, IRS1, DISC1) were also correlated to pathological conditions. 
This data reveals that genetic variation in splicing-regulatory and particularly CpG sites might be related to disease predisposition and that in-silico analysis is useful for identifying sSNPs, which might be falsely identified as silent or synonymous.
Affiliation(s)
- M Karambataki
- Lab of Biological Chemistry, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- A Malousi
- Lab of Biological Chemistry, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
- S Kouidou
- Lab of Biological Chemistry, School of Medicine, Aristotle University of Thessaloniki, Thessaloniki, Greece
|
18
|
Falanga A, Stojanović O, Kiffer-Moreira T, Pinto S, Millán JL, Vlahoviček K, Baralle M. Exonic splicing signals impose constraints upon the evolution of enzymatic activity. Nucleic Acids Res 2014; 42:5790-8. [PMID: 24692663 PMCID: PMC4027185 DOI: 10.1093/nar/gku240] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 02/06/2023] [Open Access]
Abstract
Exon splicing enhancers (ESEs) overlap with amino acid coding sequences implying a dual evolutionary selective pressure. In this study, we map ESEs in the placental alkaline phosphatase gene (ALPP), absent in the corresponding exon of the ancestral tissue-non-specific alkaline phosphatase gene (ALPL). The ESEs are associated with amino acid differences between the transcripts in an area otherwise conserved. We switched out the ALPP ESEs sequences with the sequence from the related ALPL, introducing the associated amino acid changes. The resulting enzymes, produced by cDNA expression, showed different kinetic characteristics than ALPL and ALPP. In the organism, this enzyme will never be subjected to selection because gene splicing analysis shows exon skipping due to loss of the ESE. Our data prove that ESEs restrict the evolution of enzymatic activity. Thus, suboptimal proteins may exist in scenarios when coding nucleotide changes and consequent amino acid variation cannot be reconciled with the splicing function.
Affiliation(s)
- Alessia Falanga
- Molecular Pathology Group, International Centre for Genetic Engineering and Biotechnology (ICGEB), Padriciano 99, 34149 Trieste, Italy
- Ozren Stojanović
- Bioinformatics Group, Department of Molecular Biology, Division of Biology, Faculty of Science, University of Zagreb, Horvatovac 102a, 10000 Zagreb, Croatia
- Tina Kiffer-Moreira
- Sanford Children's Health Research Center, Sanford-Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, CA 92037, USA
- Sofia Pinto
- Bioinformatics Group, Department of Molecular Biology, Division of Biology, Faculty of Science, University of Zagreb, Horvatovac 102a, 10000 Zagreb, Croatia
- José Luis Millán
- Sanford Children's Health Research Center, Sanford-Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, CA 92037, USA
- Kristian Vlahoviček
- Bioinformatics Group, Department of Molecular Biology, Division of Biology, Faculty of Science, University of Zagreb, Horvatovac 102a, 10000 Zagreb, Croatia; Department of Informatics, University of Oslo, PO Box 1080 Blindern, NO-0316 Oslo, Norway
- Marco Baralle
- Molecular Pathology Group, International Centre for Genetic Engineering and Biotechnology (ICGEB), Padriciano 99, 34149 Trieste, Italy
|
19
|
Weber CC, Boussau B, Romiguier J, Jarvis ED, Ellegren H. Evidence for GC-biased gene conversion as a driver of between-lineage differences in avian base composition. Genome Biol 2014; 15:549. [PMID: 25496599 PMCID: PMC4290106 DOI: 10.1186/s13059-014-0549-1] [Citation(s) in RCA: 54] [Impact Index Per Article: 4.9] [Received: 04/08/2014] [Accepted: 11/19/2014] [Indexed: 11/23/2022] [Open Access]
Abstract
BACKGROUND While effective population size (Ne) and life history traits such as generation time are known to impact substitution rates, their potential effects on base composition evolution are less well understood. GC content increases with decreasing body mass in mammals, consistent with recombination-associated GC biased gene conversion (gBGC) more strongly impacting these lineages. However, shifts in chromosomal architecture and recombination landscapes between species may complicate the interpretation of these results. In birds, interchromosomal rearrangements are rare and the recombination landscape is conserved, suggesting that this group is well suited to assess the impact of life history on base composition. RESULTS Employing data from 45 newly and 3 previously sequenced avian genomes covering a broad range of taxa, we found that lineages with large populations and short generations exhibit higher GC content. The effect extends to both coding and non-coding sites, indicating that it is not due to selection on codon usage. Consistent with recombination driving base composition, GC content and heterogeneity were positively correlated with the rate of recombination. Moreover, we observed ongoing increases in GC in the majority of lineages. CONCLUSIONS Our results provide evidence that gBGC may drive patterns of nucleotide composition in avian genomes and are consistent with more effective gBGC in large populations and a greater number of meioses per unit time; that is, a shorter generation time. Thus, in accord with theoretical predictions, base composition evolution is substantially modulated by species life history.
Affiliation(s)
- Claudia C Weber
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36 Uppsala, Sweden
- Bastien Boussau
- Laboratoire de Biométrie et Biologie Evolutive, Université de Lyon, Université Lyon 1, CNRS, UMR5558 Villeurbanne, France
- Erich D Jarvis
- Department of Neurobiology, Howard Hughes Medical Institute, Duke University Medical Center, Durham, NC USA
- Hans Ellegren
- Department of Evolutionary Biology, Evolutionary Biology Centre, Uppsala University, Norbyvägen 18D, SE-752 36 Uppsala, Sweden
|
20
|
Affiliation(s)
- Robert J Weatheritt
- MRC Laboratory of Molecular Biology, Francis Crick Avenue, Cambridge CB2 0QH, UK
|
21
|
Stergachis AB, Haugen E, Shafer A, Fu W, Vernot B, Reynolds A, Raubitschek A, Ziegler S, LeProust EM, Akey JM, Stamatoyannopoulos JA. Exonic transcription factor binding directs codon choice and affects protein evolution. Science 2013; 342:1367-72. [PMID: 24337295 DOI: 10.1126/science.1243490] [Citation(s) in RCA: 208] [Impact Index Per Article: 17.3] [Indexed: 01/08/2023]
Abstract
Genomes contain both a genetic code specifying amino acids and a regulatory code specifying transcription factor (TF) recognition sequences. We used genomic deoxyribonuclease I footprinting to map nucleotide resolution TF occupancy across the human exome in 81 diverse cell types. We found that ~15% of human codons are dual-use codons ("duons") that simultaneously specify both amino acids and TF recognition sites. Duons are highly conserved and have shaped protein evolution, and TF-imposed constraint appears to be a major driver of codon usage bias. Conversely, the regulatory code has been selectively depleted of TFs that recognize stop codons. More than 17% of single-nucleotide variants within duons directly alter TF binding. Pervasive dual encoding of amino acid and regulatory information appears to be a fundamental feature of genome evolution.
Affiliation(s)
- Andrew B Stergachis
- Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA
|
22
|
Reference-free population genomics from next-generation transcriptome data and the vertebrate-invertebrate gap. PLoS Genet 2013; 9:e1003457. [PMID: 23593039 PMCID: PMC3623758 DOI: 10.1371/journal.pgen.1003457] [Citation(s) in RCA: 122] [Impact Index Per Article: 10.2] [Received: 09/18/2012] [Accepted: 03/04/2013] [Indexed: 01/19/2023] [Open Access]
Abstract
In animals, the population genomic literature is dominated by two taxa, namely mammals and drosophilids, in which fully sequenced, well-annotated genomes have been available for years. Data from other metazoan phyla are scarce, probably because the vast majority of living species still lack a closely related reference genome. Here we achieve de novo, reference-free population genomic analysis from wild samples in five non-model animal species, based on next-generation sequencing transcriptome data. We introduce a pipe-line for cDNA assembly, read mapping, SNP/genotype calling, and data cleaning, with specific focus on the issue of hidden paralogy detection. In two species for which a reference genome is available, similar results were obtained whether the reference was used or not, demonstrating the robustness of our de novo inferences. The population genomic profile of a hare, a turtle, an oyster, a tunicate, and a termite were found to be intermediate between those of human and Drosophila, indicating that the discordant genomic diversity patterns that have been reported between these two species do not reflect a generalized vertebrate versus invertebrate gap. The genomic average diversity was generally higher in invertebrates than in vertebrates (with the notable exception of termite), in agreement with the notion that population size tends to be larger in the former than in the latter. The non-synonymous to synonymous ratio, however, did not differ significantly between vertebrates and invertebrates, even though it was negatively correlated with genetic diversity within each of the two groups. This study opens promising perspective regarding genome-wide population analyses of non-model organisms and the influence of population size on non-synonymous versus synonymous diversity.
|
23
|
Nabiyouni M, Prakash A, Fedorov A. Vertebrate codon bias indicates a highly GC-rich ancestral genome. Gene 2013; 519:113-9. [PMID: 23376453 DOI: 10.1016/j.gene.2013.01.033] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Received: 11/06/2012] [Revised: 01/10/2013] [Accepted: 01/17/2013] [Indexed: 11/16/2022]
Abstract
Two factors are thought to have contributed to the origin of codon usage bias in eukaryotes: 1) genome-wide mutational forces that shape overall GC-content and create context-dependent nucleotide bias, and 2) positive selection for codons that maximize efficient and accurate translation. Particularly in vertebrates, these two explanations contradict each other and cloud the origin of codon bias in the taxon. On the one hand, mutational forces fail to explain GC-richness (~60%) of third codon positions, given the GC-poor overall genomic composition among vertebrates (~40%). On the other hand, positive selection cannot easily explain strict regularities in codon preferences. Large-scale bioinformatic assessment, of nucleotide composition of coding and non-coding sequences in vertebrates and other taxa, suggests a simple possible resolution for this contradiction. Specifically, we propose that the last common vertebrate ancestor had a GC-rich genome (~65% GC). The data suggest that whole-genome mutational bias is the major driving force for generating codon bias. As the bias becomes prominent, it begins to affect translation and can result in positive selection for optimal codons. The positive selection can, in turn, significantly modulate codon preferences.
Affiliation(s)
- Maryam Nabiyouni
- Program in Bioinformatics and Proteomics/Genomics, University of Toledo, Health Science Campus, Toledo, OH 43614, USA
|
24
|
Williams C, Hoppe HJ, Rezgui D, Strickland M, Forbes BE, Grutzner F, Frago S, Ellis RZ, Wattana-Amorn P, Prince SN, Zaccheo OJ, Nolan CM, Mungall AJ, Jones EY, Crump MP, Hassan AB. An exon splice enhancer primes IGF2:IGF2R binding site structure and function evolution. Science 2012; 338:1209-13. [PMID: 23197533 PMCID: PMC4658703 DOI: 10.1126/science.1228633] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.5] [Indexed: 12/16/2022]
Abstract
Placental development and genomic imprinting coevolved with parental conflict over resource distribution to mammalian offspring. The imprinted genes IGF2 and IGF2R code for the growth promoter insulin-like growth factor 2 (IGF2) and its inhibitor, mannose 6-phosphate (M6P)/IGF2 receptor (IGF2R), respectively. M6P/IGF2R of birds and fish do not recognize IGF2. In monotremes, which lack imprinting, IGF2 specifically bound M6P/IGF2R via a hydrophobic CD loop. We show that the DNA coding the CD loop in monotremes functions as an exon splice enhancer (ESE) and that structural evolution of binding site loops (AB, HI, FG) improved therian IGF2 affinity. We propose that ESE evolution led to the fortuitous acquisition of IGF2 binding by M6P/IGF2R that drew IGF2R into parental conflict; subsequent imprinting may then have accelerated affinity maturation.
Affiliation(s)
- Christopher Williams
- Department of Organic and Biological Chemistry, School of Chemistry, University of Bristol, Bristol BS8 1TS, UK
|
25
|
Aledo JC, Valverde H, Ruíz-Camacho M. Thermodynamic stability explains the differential evolutionary dynamics of cytochrome b and COX I in mammals. J Mol Evol 2012; 74:69-80. [PMID: 22362464 DOI: 10.1007/s00239-012-9489-0] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 09/11/2011] [Accepted: 02/02/2012] [Indexed: 12/29/2022]
Abstract
By using a combination of evolutionary and structural data from 231 species, we have addressed the relationship between evolution and structural features of cytochrome b and COX I, two mtDNA-encoded proteins. The interior of cytochrome b, in contrast to that of COX I, exhibits a remarkable tolerance to changes. The higher evolvability of cytochrome b contrasts with the lower rate of synonymous substitutions of its gene when compared to that of COX I, suggesting that the latter is subjected to a stronger purifying selection. We present evidences that the stability effect of mutations (ΔΔG) may be behind these differential behaviour.
Affiliation(s)
- Juan Carlos Aledo
- Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, 29071, Málaga, Spain
|
26
|
Wilburn DB, Bowen KE, Gregg RG, Cai J, Feldhoff PW, Houck LD, Feldhoff RC. Proteomic and UTR analyses of a rapidly evolving hypervariable family of vertebrate pheromones. Evolution 2012; 66:2227-39. [PMID: 22759298 DOI: 10.1111/j.1558-5646.2011.01572.x] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 12/20/2022]
Abstract
During the annual mating season, the mental gland of male plethodontid salamanders diverts its protein synthesizing capacity to the production of courtship pheromones that increase female receptivity. Plethodontid modulating factor (PMF), a highly disulfide-bonded 7-kDa pheromone, shows unusual hypervariability with each male expressing >30 isoforms. Twenty-eight PMFs were purified and matched by proteomic analyses to cDNA sequences. In contrast to coding sequence hypervariability, the untranslated regions (UTRs) show extraordinary conservation, no predicted microRNA binding sites, and an overlapping triplet polyadenylation signal. Full-length cDNA sequencing revealed three PMF gene classes containing subclasses of clustered sequences that support ≥ 13 PMF gene duplications. The unusual phenomena of hypervariable coding regions embedded within extremely conserved UTRs is proposed to occur by a disjunctive evolutionary process. During the short courtship season, the UTRs are hypothesized to subsume and coordinate the transcriptional and translational regulatory mechanisms of the mental gland. PMF, as a secreted protein with limited metabolic feedback in the male, is under minimal mutational restraint and thus has experienced highly accelerated rates of evolution. Consequently, plethodontid salamanders may provide a unique model for furthering our understanding of the selective forces that determine differential rates of gene duplication and evolution in protein families.
Affiliation(s)
- Damien B Wilburn
- Department of Biochemistry and Molecular Biology, University of Louisville Health Sciences Center, Louisville, Kentucky 40292, USA
|
27
|
Buschiazzo E, Ritland C, Bohlmann J, Ritland K. Slow but not low: genomic comparisons reveal slower evolutionary rate and higher dN/dS in conifers compared to angiosperms. BMC Evol Biol 2012; 12:8. [PMID: 22264329 PMCID: PMC3328258 DOI: 10.1186/1471-2148-12-8] [Citation(s) in RCA: 101] [Impact Index Per Article: 7.8] [Received: 07/28/2011] [Accepted: 01/20/2012] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND Comparative genomics can inform us about the processes of mutation and selection across diverse taxa. Among seed plants, gymnosperms have been lacking in genomic comparisons. Recent EST and full-length cDNA collections for two conifers, Sitka spruce (Picea sitchensis) and loblolly pine (Pinus taeda), together with full genome sequences for two angiosperms, Arabidopsis thaliana and poplar (Populus trichocarpa), offer an opportunity to infer the evolutionary processes underlying thousands of orthologous protein-coding genes in gymnosperms compared with an angiosperm orthologue set. RESULTS Based upon pairwise comparisons of 3,723 spruce and pine orthologues, we found an average synonymous genetic distance (dS) of 0.191, and an average dN/dS ratio of 0.314. Using a fossil-established divergence time of 140 million years between spruce and pine, we extrapolated a nucleotide substitution rate of 0.68 × 10⁻⁹ synonymous substitutions per site per year. When compared to angiosperms, this indicates a dramatically slower rate of nucleotide substitution rates in conifers: on average 15-fold. Coincidentally, we found a three-fold higher dN/dS for the spruce-pine lineage compared to the poplar-Arabidopsis lineage. This joint occurrence of a slower evolutionary rate in conifers with higher dN/dS, and possibly positive selection, showcases the uniqueness of conifer genome evolution. CONCLUSIONS Our results are in line with documented reduced nucleotide diversity, conservative genome evolution and low rates of diversification in conifers on the one hand and numerous examples of local adaptation in conifers on the other hand. We propose that reduced levels of nucleotide mutation in large and long-lived conifer trees, coupled with large effective population size, were the main factors leading to slow substitution rates but retention of beneficial mutations.
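As a rough check on the arithmetic in this abstract, the reported rate is what one obtains by dividing the mean synonymous distance by twice the divergence time, since both lineages accumulate substitutions after the split. The sketch below assumes that standard pairwise relation, r = dS / (2T); the abstract does not spell out the formula, and the function and constant names are illustrative only.

```python
def synonymous_rate(ds: float, divergence_years: float) -> float:
    """Substitutions per synonymous site per year, assuming r = dS / (2T).

    The factor of 2 reflects that dS is measured between two lineages,
    each of which has evolved independently for T years.
    """
    if divergence_years <= 0:
        raise ValueError("divergence time must be positive")
    return ds / (2.0 * divergence_years)


# Values quoted in the abstract (spruce vs. pine).
SPRUCE_PINE_DS = 0.191            # mean synonymous distance
SPRUCE_PINE_SPLIT_YEARS = 140e6   # fossil-established divergence time

rate = synonymous_rate(SPRUCE_PINE_DS, SPRUCE_PINE_SPLIT_YEARS)
print(f"{rate:.2e}")  # 6.82e-10, i.e. about 0.68 x 10^-9
```

Under this assumption the result agrees with the 0.68 × 10⁻⁹ figure quoted above to two significant digits.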
Affiliation(s)
- Emmanuel Buschiazzo
- Department of Forest Sciences, University of British Columbia, 2424 Main Mall, Vancouver, BC V6T 1Z4, Canada
|
28
|
Error prevention and mitigation as forces in the evolution of genes and genomes. Nat Rev Genet 2011; 12:875-81. [PMID: 22094950 DOI: 10.1038/nrg3092] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Indexed: 12/20/2022]
Abstract
Why are short introns rarely a multiple of three nucleotides long? Why do essential genes cluster? Why are genes in operons often lined up in the order in which they are needed in the encoded pathway? In this Opinion article, we argue that these and many other - ostensibly disparate - observations are all pieces of an emerging picture in which multiple aspects of gene anatomy and genome architecture have evolved in response to error-prone gene expression.
|
29
|
Determinants of translation efficiency and accuracy. Mol Syst Biol 2011; 7:481. [PMID: 21487400 PMCID: PMC3101949 DOI: 10.1038/msb.2011.14] [Citation(s) in RCA: 338] [Impact Index Per Article: 24.1] [Received: 10/29/2010] [Accepted: 02/15/2011] [Indexed: 12/17/2022] [Open Access]
Abstract
A given protein sequence can be encoded by an astronomical number of alternative nucleotide sequences. Recent research has revealed that this flexibility provides evolution with multiple ways to tune the efficiency and fidelity of protein translation and folding. Proper functioning of biological cells requires that the process of protein expression be carried out with high efficiency and fidelity. Given an amino-acid sequence of a protein, multiple degrees of freedom still remain that may allow evolution to tune efficiency and fidelity for each gene under various conditions and cell types. Particularly, the redundancy of the genetic code allows the choice between alternative codons for the same amino acid, which, although ‘synonymous,' may exert dramatic effects on the process of translation. Here we review modern developments in genomics and systems biology that have revolutionized our understanding of the multiple means by which translation is regulated. We suggest new means to model the process of translation in a richer framework that will incorporate information about gene sequences, the tRNA pool of the organism and the thermodynamic stability of the mRNA transcripts. A practical demonstration of a better understanding of the process would be a more accurate prediction of the proteome, given the transcriptome at a diversity of biological conditions.
|
30
|
Zago P, Buratti E, Stuani C, Baralle FE. Evolutionary connections between coding and splicing regulatory regions in the fibronectin EDA exon. J Mol Biol 2011; 411:1-15. [PMID: 21663748 DOI: 10.1016/j.jmb.2011.05.031] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 02/10/2011] [Revised: 05/16/2011] [Accepted: 05/20/2011] [Indexed: 01/03/2023]
Abstract
Research on exonic coding sequences has demonstrated that many substitutions at the amino acid level may also reflect profound changes at the level of splicing regulatory regions. These results have revealed that, for many alternatively spliced exons, there is considerable pressure to strike a balance between two different and sometimes conflicting forces: the drive to improve the quality and production efficiency of proteins and the maintenance of proper exon recognition by the splicing machinery. Up to now, the systems used to investigate these connections have mostly focused on short alternatively spliced exons that contain a high density of splicing regulatory elements. Although this is obviously a desirable feature in order to maximize the chances of spotting connections, it also complicates the process of drawing straightforward evolutionary pathways between different species (because of the numerous alternative pathways through which the same end point can be achieved). The alternatively spliced fibronectin extra domain A exon (also referred to as EDI or EIIIA) does not have these limitations, as its inclusion is already known to depend on a single exonic splicing enhancer element within its sequence. In this study, we have compared the rat and human fibronectin EDA exons with regard to RNA structure, exonic splicing enhancer strengths, and SR protein occupancy. The results gained from these analyses have then been used to perform an accurate evaluation of EDA sequences observed in a wide range of animal species. This comparison strongly suggests the existence of an evolutionary connection between changes at the nucleotide levels and the need to maintain efficient EDA recognition in different species.
Affiliation(s)
- Paola Zago
- International Center for Genetic Engineering and Biotechnology, Trieste, Italy
|
31
|
Wilke CO, Drummond DA. Signatures of protein biophysics in coding sequence evolution. Curr Opin Struct Biol 2010; 20:385-9. [PMID: 20395125 DOI: 10.1016/j.sbi.2010.03.004] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Received: 01/26/2010] [Accepted: 03/22/2010] [Indexed: 10/19/2022]
Abstract
Since the early days of molecular evolution, the conventional wisdom has been that the evolution of protein-coding genes is primarily determined by functional constraints. Yet recent evidence indicates that the evolution of these genes is strongly shaped by the biophysical processes of protein synthesis, protein folding, and specific as well as nonspecific protein-protein interactions. Selection pressures related to these biophysical processes affect primarily the amino-acid sequence of genes, but they also leave their mark on synonymous sites at the nucleotide level. While evidence for specific selection pressures related to protein biophysics is strong, there is currently no unifying framework that integrates the various selection pressures on coding sequences and disentangles their relative importance.
Affiliation(s)
- Claus O Wilke
- Center for Computational Biology and Bioinformatics, Institute for Cell and Molecular Biology, and Section of Integrative Biology, The University of Texas at Austin, Austin, TX, USA.
|
32
|
Zhang Z, Zhou L, Wang P, Liu Y, Chen X, Hu L, Kong X. Divergence of exonic splicing elements after gene duplication and the impact on gene structures. Genome Biol 2009; 10:R120. [PMID: 19883501 PMCID: PMC3091315 DOI: 10.1186/gb-2009-10-11-r120] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 06/15/2009] [Revised: 09/28/2009] [Accepted: 11/02/2009] [Indexed: 12/18/2022] Open
Abstract
An analysis of human exonic splicing elements in duplicated genes reveals their important role in the generation of new gene structures.
Background: The origin of new genes and their contribution to functional novelty has been the subject of considerable interest. There has been much progress in understanding the mechanisms by which new genes originate. Here we examine a novel way that new gene structures could originate, namely through the evolution of new alternative splicing isoforms after gene duplication.
Results: We studied the divergence of exonic splicing enhancers and silencers after gene duplication and the contributions of such divergence to the generation of new splicing isoforms. We found that exonic splicing enhancers and exonic splicing silencers diverge especially fast shortly after gene duplication. About 10% and 5% of paralogous exons undergo significantly asymmetric evolution of exonic splicing enhancers and silencers, respectively. When compared to pre-duplication ancestors, we found that there is a significant overall loss of exonic splicing enhancers and the magnitude increases with duplication age. Detailed examination reveals net gains and losses of exonic splicing enhancers and silencers in different copies and paralog clusters after gene duplication. Furthermore, we found that exonic splicing enhancer and silencer changes are mainly caused by synonymous mutations, though nonsynonymous changes also contribute. Finally, we found that exonic splicing enhancer and silencer divergence results in exon splicing state transitions (from constitutive to alternative or vice versa), and that the proportion of paralogous exon pairs with different splicing states also increases over time, consistent with previous predictions.
Conclusions: Our results suggest that exonic splicing enhancer and silencer changes after gene duplication have important roles in alternative splicing divergence and that these changes contribute to the generation of new gene structures.
Affiliation(s)
- Zhenguo Zhang
- The Key Laboratory of Stem Cell Biology, Institute of Health Sciences, Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS) and Shanghai Jiao Tong University School of Medicine (SJTUSM), 225 South Chong Qing Road, Shanghai 200025, PR China.
|
33
|
Abstract
Charles Darwin's theory of evolution was based on studies of biology at the species level. In the time since his death, studies at the molecular level have confirmed his ideas about the kinship of all life on Earth and have provided a wealth of detail about the evolutionary relationships between different species and a deeper understanding of the finer workings of natural selection. We now have a wealth of data, including the genome sequences of a wide range of organisms, an even larger number of protein sequences, a significant knowledge of the three-dimensional structures of proteins, DNA and other biological molecules, and a huge body of information about the operation of these molecules as systems in the molecular machinery of all living things. This issue of Biochemical Society Transactions contains papers from oral presentations given at a Biochemical Society Focused Meeting to commemorate the 200th Anniversary of Charles Darwin's birth, held on 26-27 January 2009 at the Wellcome Trust Conference Centre, Cambridge. The talks reported on some of the insights into evolution which have been obtained from the study of protein sequences, structures and systems.
|