1. Cherry JL. A Short-Term View of Protein Sequence Evolution from Salmonella. Genome Biol Evol 2025; 17:evaf040. [PMID: 40048608 PMCID: PMC11925014 DOI: 10.1093/gbe/evaf040] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/20/2025] [Indexed: 03/21/2025]
Abstract
Much of the study of protein sequence evolution is based on sequence changes inferred to have occurred in nature. The sequences compared for this purpose are usually sufficiently distant that purifying selection has had nearly its full effect and most of the changes inferred have been exposed to a variety of conditions. Here, I make use of large numbers of Salmonella genome sequences to study changes known to be of very recent origin because they are inferred from comparison of very closely related sequences. The effects of purifying selection are weak yet discernible on this short timescale: the ratio of nonsynonymous to synonymous changes is smaller than expected under selective neutrality, but only slightly so. Essential genes have lower rates of nonsynonymous change, as they do on a longer timescale, but much more of this association remains after controlling for expression level. Positive selection for nonsynonymous change is inferred for 151 genes. For nearly half of these, this is attributable to selection for loss of function. Other forms of positive selection inferred include selection for amino acid changes that make enzymes less sensitive to antibiotics and selection for activating changes to proteins involved in transcriptional regulation. Positively selected variants of many genes are likely favored only under unusual conditions and disfavored in the long term, making detection of the positive selection with more distant comparisons difficult or impossible. The short-term view provided by close comparisons complements the long-term view obtained from more distant comparisons such as those between species.
Affiliation(s)
- Joshua L Cherry
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Division of International Epidemiology and Population Studies, Fogarty International Center, National Institutes of Health, Bethesda, MD, USA
2. Buric F, Viknander S, Fu X, Lemke O, Carmona OG, Zrimec J, Szyrwiel L, Mülleder M, Ralser M, Zelezniak A. Amino acid sequence encodes protein abundance shaped by protein stability at reduced synthesis cost. Protein Sci 2025; 34:e5239. [PMID: 39665261 PMCID: PMC11635393 DOI: 10.1002/pro.5239] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 10/11/2024] [Accepted: 11/14/2024] [Indexed: 12/13/2024]
Abstract
Understanding what drives protein abundance is essential to biology, medicine, and biotechnology. Driven by evolutionary selection, an amino acid sequence is tailored to meet the required abundance of a proteome, underscoring the intricate relationship between sequence and functional demand. Yet, the specific role of amino acid sequences in determining proteome abundance remains elusive. Here we show that the amino acid sequence alone encodes over half of protein abundance variation across all domains of life, ranging from bacteria to mouse and human. With an attempt to go beyond predictions, we trained a manageable-size Transformer model to interpret latent factors predictive of protein abundances. Intuitively, the model's attention focused on the protein's structural features linked to stability and metabolic costs related to protein synthesis. To probe these relationships, we introduce MGEM (Mutation Guided by an Embedded Manifold), a methodology for guiding protein abundance through sequence modifications. We find that mutations which increase predicted abundance have significantly altered protein polarity and hydrophobicity, underscoring a connection between protein structural features and abundance. Through molecular dynamics simulations we revealed that abundance-enhancing mutations possibly contribute to protein thermostability by increasing rigidity, which occurs at a lower synthesis cost.
Affiliation(s)
- Filip Buric
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Sandra Viknander
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Xiaozhi Fu
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Oliver Lemke
- Department of Biochemistry, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Oriol Gracia Carmona
- Randall Centre for Cell & Molecular Biophysics, King's College London, London, UK
- Institute of Structural and Molecular Biology, University College London, London, UK
- Jan Zrimec
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Department of Biotechnology and Systems Biology, National Institute of Biology, Ljubljana, Slovenia
- Lukasz Szyrwiel
- Department of Biochemistry, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Michael Mülleder
- Core Facility High Throughput Mass Spectrometry, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Markus Ralser
- Department of Biochemistry, Charité – Universitätsmedizin Berlin, Berlin, Germany
- Aleksej Zelezniak
- Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg, Sweden
- Randall Centre for Cell & Molecular Biophysics, King's College London, London, UK
- Institute of Biotechnology, Life Sciences Centre, Vilnius University, Vilnius, Lithuania
3. Usmanova DR, Plata G, Vitkup D. Functional Optimization in Distinct Tissues and Conditions Constrains the Rate of Protein Evolution. Mol Biol Evol 2024; 41:msae200. [PMID: 39431545 PMCID: PMC11523136 DOI: 10.1093/molbev/msae200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Revised: 07/29/2024] [Accepted: 08/05/2024] [Indexed: 10/22/2024]
Abstract
Understanding the main determinants of protein evolution is a fundamental challenge in biology. Despite many decades of active research, the molecular and cellular mechanisms underlying the substantial variability of evolutionary rates across cellular proteins are not currently well understood. It also remains unclear how protein molecular function is optimized in the context of multicellular species and why many proteins, such as enzymes, are only moderately efficient on average. Our analysis of genomics and functional datasets reveals in multiple organisms a strong inverse relationship between the optimality of protein molecular function and the rate of protein evolution. Furthermore, we find that highly expressed proteins tend to be substantially more functionally optimized. These results suggest that cellular expression costs lead to more pronounced functional optimization of abundant proteins and that the purifying selection to maintain high levels of functional optimality significantly slows protein evolution. We observe that in multicellular species both the rate of protein evolution and the degree of protein functional efficiency are primarily affected by expression in several distinct cell types and tissues, specifically, in developed neurons with upregulated synaptic processes in animals and in young and fast-growing tissues in plants. Overall, our analysis reveals how various constraints from the molecular, cellular, and species' levels of biological organization jointly affect the rate of protein evolution and the level of protein functional adaptation.
Affiliation(s)
- Dinara R Usmanova
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Germán Plata
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- BiomEdit, Fishers, IN 46037, USA
- Dennis Vitkup
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
4. Potera K, Tomala K. Using yeasts for the studies of nonfunctional factors in protein evolution. Yeast 2024; 41:529-536. [PMID: 38895906 DOI: 10.1002/yea.3970] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/31/2024] [Revised: 05/08/2024] [Accepted: 06/06/2024] [Indexed: 06/21/2024]
Abstract
The evolution of protein sequence is driven not only by factors directly related to protein function and shape but also by nonfunctional factors. Such factors in protein evolution might be categorized as those connected to energetic costs, synthesis efficiency, and avoidance of misfolding and toxicity. A common approach to studying them is correlational analysis contrasting them with some characteristics of the protein, like amino acid composition, but these features are interdependent. To avoid possible bias, empirical studies are needed, and not enough work has been done to date. In this review, we describe the role of nonfunctional factors in protein evolution and present an experimental approach using yeast as a suitable model organism. The focus of the proposed approach is on the potential negative impact on the fitness of mutations that change protein properties not related to function and the frequency of mutations that change these properties. Experimental results of testing the misfolding avoidance hypothesis as an explanation for why highly expressed proteins evolve slowly are inconsistent with correlational research results. Therefore, more efforts should be made to empirically test the effects of nonfunctional factors in protein evolution and to contrast these results with the results of the correlational analysis approach.
Affiliation(s)
- Katarzyna Potera
- Faculty of Biology, Institute of Environmental Sciences, Jagiellonian University, Krakow, Poland
- Doctoral School of Exact and Natural Sciences, Jagiellonian University, Krakow, Poland
- Katarzyna Tomala
- Faculty of Biology, Institute of Environmental Sciences, Jagiellonian University, Krakow, Poland
5. Couce A, Limdi A, Magnan M, Owen SV, Herren CM, Lenski RE, Tenaillon O, Baym M. Changing fitness effects of mutations through long-term bacterial evolution. Science 2024; 383:eadd1417. [PMID: 38271521 DOI: 10.1126/science.add1417] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/25/2022] [Accepted: 12/12/2023] [Indexed: 01/27/2024]
Abstract
The distribution of fitness effects of new mutations shapes evolution, but it is challenging to observe how it changes as organisms adapt. Using Escherichia coli lineages spanning 50,000 generations of evolution, we quantify the fitness effects of insertion mutations in every gene. Macroscopically, the fraction of deleterious mutations changed little over time whereas the beneficial tail declined sharply, approaching an exponential distribution. Microscopically, changes in individual gene essentiality and deleterious effects often occurred in parallel; altered essentiality is only partly explained by structural variation. The identity and effect sizes of beneficial mutations changed rapidly over time, but many targets of selection remained predictable because of the importance of loss-of-function mutations. Taken together, these results reveal the dynamic, but statistically predictable, nature of mutational fitness effects.
Affiliation(s)
- Alejandro Couce
- Université Paris Cité and Université Sorbonne Paris Nord, Inserm, IAME, F-75018 Paris, France
- Department of Life Sciences, Imperial College London, London SW7 2AZ, UK
- Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM), 28223 Madrid, Spain
- Anurag Limdi
- Department of Biomedical Informatics, and Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Melanie Magnan
- Université Paris Cité and Université Sorbonne Paris Nord, Inserm, IAME, F-75018 Paris, France
- Siân V Owen
- Department of Biomedical Informatics, and Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Cristina M Herren
- Department of Biomedical Informatics, and Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Department of Marine and Environmental Sciences, Northeastern University, Boston, MA 02115, USA
- Richard E Lenski
- Department of Microbiology and Molecular Genetics, Michigan State University, East Lansing, MI 48824, USA
- Program in Ecology, Evolution, and Behavior, Michigan State University, East Lansing, MI 48824, USA
- Olivier Tenaillon
- Université Paris Cité and Université Sorbonne Paris Nord, Inserm, IAME, F-75018 Paris, France
- Université Paris Cité, Inserm, Institut Cochin, F-75014 Paris, France
- Michael Baym
- Department of Biomedical Informatics, and Laboratory of Systems Pharmacology, Harvard Medical School, Boston, MA 02115, USA
6. Luzuriaga-Neira AR, Ritchie AM, Payne BL, Carrillo-Parramon O, Liberles DA, Alvarez-Ponce D. Highly Abundant Proteins Are Highly Thermostable. Genome Biol Evol 2023; 15:evad112. [PMID: 37399326 DOI: 10.1093/gbe/evad112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/08/2023] [Indexed: 07/05/2023]
Abstract
Highly abundant proteins tend to evolve slowly (a trend called E-R anticorrelation), and a number of hypotheses have been proposed to explain this phenomenon. The misfolding avoidance hypothesis attributes the E-R anticorrelation to the abundance-dependent toxic effects of protein misfolding. To avoid these toxic effects, protein sequences (particularly those of highly expressed proteins) would be under selection to fold properly. One prediction of the misfolding avoidance hypothesis is that highly abundant proteins should exhibit high thermostability (i.e., a highly negative free energy of folding, ΔG). Thus far, only a handful of analyses have tested for a relationship between protein abundance and thermostability, producing contradictory results. These analyses have been limited by 1) the scarcity of ΔG data, 2) the fact that these data have been obtained by different laboratories and under different experimental conditions, 3) the problems associated with using proteins' melting energy (Tm) as a proxy for ΔG, and 4) the difficulty of controlling for potentially confounding variables. Here, we use computational methods to compare the free energy of folding of pairs of human-mouse orthologous proteins with different expression levels. Even though the effect size is limited, the most highly expressed ortholog is often the one with a more negative ΔG of folding, indicating that highly expressed proteins are often more thermostable.
Affiliation(s)
- Andrew M Ritchie
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, USA
- David A Liberles
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, Pennsylvania, USA
7. Hara Y, Kuraku S. The impact of local genomic properties on the evolutionary fate of genes. eLife 2023; 12:82290. [PMID: 37223962 DOI: 10.7554/elife.82290] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 07/29/2022] [Accepted: 04/25/2023] [Indexed: 05/25/2023]
Abstract
Functionally indispensable genes are likely to be retained and otherwise to be lost during evolution. This evolutionary fate of a gene can also be affected by factors independent of gene dispensability, including the mutability of genomic positions, but such features have not been examined well. To uncover the genomic features associated with gene loss, we investigated the characteristics of genomic regions where genes have been independently lost in multiple lineages. With a comprehensive scan of gene phylogenies of vertebrates with a careful inspection of evolutionary gene losses, we identified 813 human genes whose orthologs were lost in multiple mammalian lineages: designated 'elusive genes.' These elusive genes were located in genomic regions with rapid nucleotide substitution, high GC content, and high gene density. A comparison of the orthologous regions of such elusive genes across vertebrates revealed that these features had been established before the radiation of the extant vertebrates approximately 500 million years ago. The association of human elusive genes with transcriptomic and epigenomic characteristics illuminated that the genomic regions containing such genes were subject to repressive transcriptional regulation. Thus, the heterogeneous genomic features driving gene fates toward loss have been in place and may sometimes have relaxed the functional indispensability of such genes. This study sheds light on the complex interplay between gene function and local genomic properties in shaping gene evolution that has persisted since the vertebrate ancestor.
Affiliation(s)
- Yuichiro Hara
- Research Center for Genome & Medical Sciences, Tokyo Metropolitan Institute of Medical Science, Tokyo, Japan
- Shigehiro Kuraku
- Molecular Life History Laboratory, Department of Genomics and Evolutionary Biology, National Institute of Genetics, Mishima, Japan
- Department of Genetics, Sokendai (Graduate University for Advanced Studies), Mishima, Japan
- RIKEN Center for Biosystems Dynamics Research, Kobe, Japan
8. Hao J, Liang Y, Ping J, Li J, Shi W, Su Y, Wang T. Chloroplast gene expression level is negatively correlated with evolutionary rates and selective pressure while positively with codon usage bias in Ophioglossum vulgatum L. BMC Plant Biology 2022; 22:580. [PMID: 36510137 PMCID: PMC9746204 DOI: 10.1186/s12870-022-03960-8] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 03/06/2022] [Accepted: 11/24/2022] [Indexed: 05/02/2023]
Abstract
BACKGROUND: Characterization of the key factors determining gene expression level has been of significant interest. Previous studies on the relationship among evolutionary rates, codon usage bias, and expression level mostly focused on either nuclear genes or unicellular/multicellular organisms but few in chloroplast (cp) genes. Ophioglossum vulgatum is a unique fern and has important scientific and medicinal values. In this study, we sequenced its cp genome and transcriptome to estimate the evolutionary rates (dN and dS), selective pressure (dN/dS), gene expression level, codon usage bias, and their correlations.
RESULTS: The correlation coefficients between dN, dS, and dN/dS, and Transcripts Per Million (TPM) average values were -0.278 (P = 0.027 < 0.05), -0.331 (P = 0.008 < 0.05), and -0.311 (P = 0.013 < 0.05), respectively. The codon adaptation index (CAI) and tRNA adaptation index (tAI) were significantly positively correlated with TPM average values (P < 0.05).
CONCLUSIONS: Our results indicated that when the gene expression level was higher, the evolutionary rates and selective pressure were lower, but the codon usage bias was stronger. We provided evidence from cp gene data which supported the E-R (E stands for gene expression level and R stands for evolutionary rate) anti-correlation.
Affiliation(s)
- Jing Hao
- College of Life Sciences, South China Agricultural University, Guangzhou, 510642, China
- Yingyi Liang
- College of Life Sciences, South China Agricultural University, Guangzhou, 510642, China
- Jingyao Ping
- College of Life Sciences, South China Agricultural University, Guangzhou, 510642, China
- Jinye Li
- College of Life Sciences, South China Agricultural University, Guangzhou, 510642, China
- Wanxin Shi
- College of Life Sciences, South China Agricultural University, Guangzhou, 510642, China
- Yingjuan Su
- School of Life Sciences, Sun Yat-sen University, Guangzhou, 510275, China
- Research Institute of Sun Yat-sen University in Shenzhen, Shenzhen, 518057, China
- Ting Wang
- College of Life Sciences, South China Agricultural University, Guangzhou, 510642, China
9. Bédard C, Cisneros AF, Jordan D, Landry CR. Correlation between protein abundance and sequence conservation: what do recent experiments say? Curr Opin Genet Dev 2022; 77:101984. [PMID: 36162152 DOI: 10.1016/j.gde.2022.101984] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Revised: 08/23/2022] [Accepted: 08/26/2022] [Indexed: 01/27/2023]
Abstract
Cells evolve in a space of parameter values set by physical and chemical forces. These constraints create associations among cellular properties. A particularly strong association is the negative correlation between the rate of evolution of proteins and their abundance in the cell. Highly expressed proteins evolve slower than lowly expressed ones. Multiple hypotheses have been put forward to explain this relationship, including, for instance, the requirement for higher mRNA stability, misfolding avoidance, and misinteraction avoidance for highly expressed proteins. Here, we review some of these hypotheses, their predictions, and how they are supported to finally discuss recent experiments that have been performed to test these predictions.
Affiliation(s)
- Camille Bédard
- Département de Biologie, Faculté des Sciences et de Génie, Université Laval, G1V 0A6, Canada; Institut de Biologie Intégrative et des Systèmes, Université Laval, G1V 0A6, Canada; PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, G1V 0A6, Canada; Centre de Recherche sur les Données Massives, Université Laval, G1V 0A6, Canada. https://twitter.com/@CamilleBed17
- Angel F Cisneros
- Institut de Biologie Intégrative et des Systèmes, Université Laval, G1V 0A6, Canada; PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, G1V 0A6, Canada; Centre de Recherche sur les Données Massives, Université Laval, G1V 0A6, Canada; Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, G1V 0A6, Canada. https://twitter.com/@AngelFCC119
- David Jordan
- Institut de Biologie Intégrative et des Systèmes, Université Laval, G1V 0A6, Canada; PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, G1V 0A6, Canada; Centre de Recherche sur les Données Massives, Université Laval, G1V 0A6, Canada; Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, G1V 0A6, Canada. https://twitter.com/@DavidJordan1997
- Christian R Landry
- Département de Biologie, Faculté des Sciences et de Génie, Université Laval, G1V 0A6, Canada; Institut de Biologie Intégrative et des Systèmes, Université Laval, G1V 0A6, Canada; PROTEO, Le regroupement québécois de recherche sur la fonction, l'ingénierie et les applications des protéines, Université Laval, G1V 0A6, Canada; Centre de Recherche sur les Données Massives, Université Laval, G1V 0A6, Canada; Département de Biochimie, de Microbiologie et de Bio-informatique, Faculté des Sciences et de Génie, Université Laval, G1V 0A6, Canada.
10. Moldovan MA, Gaydukova SA. Unusual Dependence between Gene Expression and Negative Selection in Euplotes. Mol Biol 2022. [DOI: 10.1134/s0026893323010090] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/06/2022]
11. Shibai A, Kotani H, Sakata N, Furusawa C, Tsuru S. Purifying selection enduringly acts on the sequence evolution of highly expressed proteins in Escherichia coli. G3 Genes|Genomes|Genetics 2022; 12:6694045. [PMID: 36073932 PMCID: PMC9635659 DOI: 10.1093/g3journal/jkac235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2022] [Accepted: 08/27/2022] [Indexed: 11/17/2022]
Abstract
The evolutionary speed of a protein sequence is constrained by its expression level, with highly expressed proteins evolving relatively slowly. This negative correlation between expression levels and evolutionary rates (known as the E–R anticorrelation) has already been widely observed in past macroevolution between species from bacteria to animals. However, it remains unclear whether this seemingly general law also governs recent evolution, including past and de novo, within a species. However, the advent of genomic sequencing and high-throughput phenotyping, particularly for bacteria, has revealed fundamental gaps between the 2 evolutionary processes and has provided empirical data opposing the possible underlying mechanisms which are widely believed. These conflicts raise questions about the generalization of the E–R anticorrelation and the relevance of plausible mechanisms. To explore the ubiquitous impact of expression levels on molecular evolution and test the relevance of the possible underlying mechanisms, we analyzed the genome sequences of 99 strains of Escherichia coli for evolution within species in nature. We also analyzed genomic mutations accumulated under laboratory conditions as a model of de novo evolution within species. Here, we show that E–R anticorrelation is significant in both past and de novo evolution within species in E. coli. Our data also confirmed ongoing purifying selection on highly expressed genes. Ongoing selection included codon-level purifying selection, supporting the relevance of the underlying mechanisms. However, the impact of codon-level purifying selection on the constraints in evolution within species might be smaller than previously expected from evolution between species.
Affiliation(s)
- Atsushi Shibai
- Center for Biosystems Dynamics Research (BDR), RIKEN, Osaka 565-0874, Japan
- Hazuki Kotani
- Center for Biosystems Dynamics Research (BDR), RIKEN, Osaka 565-0874, Japan
- Natsue Sakata
- Center for Biosystems Dynamics Research (BDR), RIKEN, Osaka 565-0874, Japan
- Chikara Furusawa
- Center for Biosystems Dynamics Research (BDR), RIKEN, Osaka 565-0874, Japan
- Universal Biology Institute, School of Science, The University of Tokyo, Tokyo 113-0033, Japan
- Saburo Tsuru
- Universal Biology Institute, School of Science, The University of Tokyo, Tokyo 113-0033, Japan
12. Sarkar C, Alvarez-Ponce D. Extracellular domains of transmembrane proteins defy the expression level-evolutionary rate anticorrelation. Genome Biol Evol 2021; 14:6402012. [PMID: 34665250 PMCID: PMC8755491 DOI: 10.1093/gbe/evab235] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/14/2021] [Indexed: 11/13/2022]
Abstract
Highly expressed proteins tend to evolve slowly, a trend known as the expression level-rate of evolution (E-R) anticorrelation. Whereas the reasons for this anticorrelation remain unclear, the most influential hypotheses attribute it to highly expressed proteins being subjected to strong selective pressures to avoid misfolding and/or misinteraction. In accordance with these hypotheses, work in our laboratory has recently shown that extracellular (secreted) proteins lack an E-R anticorrelation (or exhibit a weaker than usual E-R anticorrelation). Extracellular proteins are folded inside the endoplasmic reticulum, where enhanced quality control of folding mechanisms exist, and function in the extracellular space, where misinteraction is unlikely to occur or to produce deleterious effects. Transmembrane proteins contain both intracellular domains (which are folded and function in the cytosol) and extracellular domains (which complete their folding in the endoplasmic reticulum and function in the extracellular space). We thus hypothesized that the extracellular domains of transmembrane proteins should exhibit a weaker E-R anticorrelation than their intracellular domains. Our analyses of human, Saccharomyces and Arabidopsis transmembrane proteins allowed us to confirm our hypothesis. Our results are in agreement with models attributing the E-R anticorrelation to the deleterious effects of misfolding and/or misinteraction.
Affiliation(s)
- Chandra Sarkar
- Department of Biology, University of Nevada, Reno, NV, USA
13. Latrille T, Lartillot N. Quantifying the impact of changes in effective population size and expression level on the rate of coding sequence evolution. Theor Popul Biol 2021; 142:57-66. [PMID: 34563555 DOI: 10.1016/j.tpb.2021.09.005] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/27/2021] [Revised: 09/08/2021] [Accepted: 09/11/2021] [Indexed: 02/07/2023]
Abstract
Molecular sequences are shaped by selection, where the strength of selection relative to drift is determined by effective population size (Ne). Populations with high Ne are expected to undergo stronger purifying selection, and consequently to show a lower substitution rate for selected mutations relative to the substitution rate for neutral mutations (ω). However, computational models based on biophysics of protein stability have suggested that ω can also be independent of Ne. Together, the response of ω to changes in Ne depends on the specific mapping from sequence to fitness. Importantly, an increase in protein expression level has been found empirically to result in decrease of ω, an observation predicted by theoretical models assuming selection for protein stability. Here, we derive a theoretical approximation for the response of ω to changes in Ne and expression level, under an explicit genotype-phenotype-fitness map. The method is generally valid for additive traits and log-concave fitness functions. We applied these results to proteins undergoing selection for their conformational stability and corroborate our findings with simulations under more complex models. We predict a weak response of ω to changes in either Ne or expression level, which are interchangeable. Based on empirical data, we propose that fitness based on the conformational stability may not be a sufficient mechanism to explain the empirically observed variation in ω across species. Other aspects of protein biophysics might be explored, such as protein-protein interactions, which can lead to a stronger response of ω to changes in Ne.
Affiliation(s)
- T Latrille
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622 Villeurbanne, France; École Normale Supérieure de Lyon, Université de Lyon, Université Lyon 1, Lyon, France.
- N Lartillot
- Université de Lyon, Université Lyon 1, CNRS, Laboratoire de Biométrie et Biologie Évolutive UMR 5558, F-69622 Villeurbanne, France
|
14
|
Variables Influencing Differences in Sequence Conservation in the Fission Yeast Schizosaccharomyces pombe. J Mol Evol 2021; 89:601-610. [PMID: 34436628 PMCID: PMC8599406 DOI: 10.1007/s00239-021-10028-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2021] [Accepted: 08/17/2021] [Indexed: 11/17/2022]
Abstract
Which variables determine the constraints on gene sequence evolution is one of the most central questions in molecular evolution. In the fission yeast Schizosaccharomyces pombe, an important model organism, the variables influencing the rate of sequence evolution have yet to be determined. Previous studies in other single celled organisms have generally found gene expression levels to be most significant, with numerous other variables such as gene length and functional importance identified as having a smaller impact. Using publicly available data, we used partial least squares regression, principal components regression, and partial correlations to determine the variables most strongly associated with sequence evolution constraints. We identify centrality in the protein–protein interactions network, amino acid composition, and cellular location as the most important determinants of sequence conservation. However, each factor only explains a small amount of variance, and there are numerous variables having a significant or heterogeneous influence. Our models explain more than half of the variance in dN, raising the possibility that future refined models could quantify the role of stochastics in evolutionary rate variation.
|
15
|
Biesiadecka MK, Sliwa P, Tomala K, Korona R. An Overexpression Experiment Does Not Support the Hypothesis That Avoidance of Toxicity Determines the Rate of Protein Evolution. Genome Biol Evol 2021; 12:589-596. [PMID: 32259256 PMCID: PMC7250497 DOI: 10.1093/gbe/evaa067] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Accepted: 04/01/2020] [Indexed: 12/22/2022] Open
Abstract
The misfolding avoidance hypothesis postulates that sequence mutations render proteins cytotoxic and therefore the higher the gene expression, the stronger the operation of selection against substitutions. This translates into the prediction that relative toxicity of extant proteins is higher for those evolving faster. In the present experiment, we selected pairs of yeast genes which were paralogous but evolving at different rates. We expressed them artificially to high levels. We expected that toxicity would be higher for ones bearing more mutations, especially as overcrowding should exacerbate rather than reverse the already existing differences in misfolding rates. We did find that the applied mode of overexpression caused a considerable decrease in fitness and that the decrease was proportional to the amount of excessive protein. However, it was not higher for proteins which are normally expressed at lower levels (and have less conserved sequence). This result was obtained consistently, regardless of whether the rate of growth or ability to compete in common cultures was used as a proxy for fitness. In additional experiments, we applied factors that reduce accuracy of translation or enhance structural instability of proteins. It did not change a consistent pattern of independence between the fitness cost caused by overexpression of a protein and the rate of its sequence evolution.
Affiliation(s)
- Piotr Sliwa
- Department of Genetics, Faculty of Biotechnology, University of Rzeszów, Poland
- Katarzyna Tomala
- Institute of Environmental Sciences, Faculty of Biology, Jagiellonian University, Cracow, Poland
- Ryszard Korona
- Institute of Environmental Sciences, Faculty of Biology, Jagiellonian University, Cracow, Poland
|
16
|
Razban RM, Dasmeh P, Serohijos AWR, Shakhnovich EI. Avoidance of protein unfolding constrains protein stability in long-term evolution. Biophys J 2021; 120:2413-2424. [PMID: 33932438 PMCID: PMC8390877 DOI: 10.1016/j.bpj.2021.03.042] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 10/16/2020] [Revised: 02/24/2021] [Accepted: 03/17/2021] [Indexed: 11/28/2022] Open
Abstract
Every amino acid residue can influence a protein's overall stability, making stability highly susceptible to change throughout evolution. We consider the distribution of protein stabilities evolutionarily permittable under two previously reported protein fitness functions: flux dynamics and misfolding avoidance. We develop an evolutionary dynamics theory and find that it agrees better with an extensive protein stability data set for dihydrofolate reductase orthologs under the misfolding avoidance fitness function rather than the flux dynamics fitness function. Further investigation with ribonuclease H data demonstrates that not any misfolded state is avoided; rather, it is only the unfolded state. At the end, we discuss how our work pertains to the universal protein abundance-evolutionary rate correlation seen across organisms' proteomes. We derive a closed-form expression relating protein abundance to evolutionary rate that captures Escherichia coli, Saccharomyces cerevisiae, and Homo sapiens experimental trends without fitted parameters.
Affiliation(s)
- Rostam M Razban
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts
- Pouria Dasmeh
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts; Departement de Biochimie, Université de Montréal, Montreal, Quebec, Canada
- Eugene I Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts.
|
17
|
Dubreuil B, Levy ED. Abundance Imparts Evolutionary Constraints of Similar Magnitude on the Buried, Surface, and Disordered Regions of Proteins. Front Mol Biosci 2021; 8:626729. [PMID: 33996892 PMCID: PMC8119896 DOI: 10.3389/fmolb.2021.626729] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 11/06/2020] [Accepted: 03/29/2021] [Indexed: 12/02/2022] Open
Abstract
An understanding of the forces shaping protein conservation is key, both for the fundamental knowledge it represents and to allow for optimal use of evolutionary information in practical applications. Sequence conservation is typically examined at one of two levels. The first is a residue-level, where intra-protein differences are analyzed and the second is a protein-level, where inter-protein differences are studied. At a residue level, we know that solvent-accessibility is a prime determinant of conservation. By inverting this logic, we inferred that disordered regions are slightly more solvent-accessible on average than the most exposed surface residues in domains. By integrating abundance information with evolutionary data within and across proteins, we confirmed a previously reported strong surface-core association in the evolution of structured regions, but we found a comparatively weak association between disordered and structured regions. The facts that disordered and structured regions experience different structural constraints and evolve independently provide a unique setup to examine an outstanding question: why is a protein’s abundance the main determinant of its sequence conservation? Indeed, any structural or biophysical property linked to the abundance-conservation relationship should increase the relative conservation of regions concerned with that property (e.g., disordered residues with mis-interactions, domain residues with misfolding). Surprisingly, however, we found the conservation of disordered and structured regions to increase in equal proportion with abundance. This observation implies that either abundance-related constraints are structure-independent, or multiple constraints apply to different regions and perfectly balance each other.
Affiliation(s)
- Benjamin Dubreuil
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Emmanuel D Levy
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
|
18
|
Wei C, Chen YM, Chen Y, Qian W. The Missing Expression Level-Evolutionary Rate Anticorrelation in Viruses Does Not Support Protein Function as a Main Constraint on Sequence Evolution. Genome Biol Evol 2021; 13:evab049. [PMID: 33713114 PMCID: PMC7989579 DOI: 10.1093/gbe/evab049] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Accepted: 03/06/2021] [Indexed: 12/13/2022] Open
Abstract
One of the central goals in molecular evolutionary biology is to determine the sources of variation in the rate of sequence evolution among proteins. Gene expression level is widely accepted as the primary determinant of protein evolutionary rate, because it scales with the extent of selective constraints imposed on a protein, leading to the well-known negative correlation between expression level and protein evolutionary rate (the E-R anticorrelation). Selective constraints have been hypothesized to entail the maintenance of protein function, the avoidance of cytotoxicity caused by protein misfolding or nonspecific protein-protein interactions, or both. However, empirical tests evaluating the relative importance of these hypotheses remain scarce, likely due to the nontrivial difficulties in distinguishing the effect of a deleterious mutation on a protein's function versus its cytotoxicity. We realized that examining the sequence evolution of viral proteins could overcome this hurdle. This is because purifying selection against mutations in a viral protein that result in cytotoxicity per se is likely relaxed, whereas purifying selection against mutations that impair viral protein function persists. Multiple analyses of SARS-CoV-2 and nine other virus species revealed a complete absence of any E-R anticorrelation. As a control, the E-R anticorrelation does exist in human endogenous retroviruses where purifying selection against cytotoxicity is present. Taken together, these observations do not support the maintenance of protein function as the main constraint on protein sequence evolution in cellular organisms.
Affiliation(s)
- Changshuo Wei
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Yan-Ming Chen
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
- Ying Chen
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
- Wenfeng Qian
- State Key Laboratory of Plant Genomics, Institute of Genetics and Developmental Biology, Innovation Academy for Seed Design, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
|
19
|
Usmanova DR, Plata G, Vitkup D. The Relationship between the Misfolding Avoidance Hypothesis and Protein Evolutionary Rates in the Light of Empirical Evidence. Genome Biol Evol 2021; 13:6081017. [PMID: 33432359 PMCID: PMC7874998 DOI: 10.1093/gbe/evab006] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Accepted: 01/07/2021] [Indexed: 12/14/2022] Open
Abstract
For more than a decade, the misfolding avoidance hypothesis (MAH) and related theories have dominated evolutionary discussions aimed at explaining the variance of the molecular clock across cellular proteins. In this study, we use various experimental data to further investigate the consistency of the MAH predictions with empirical evidence. We also critically discuss experimental results that motivated the MAH development and that are often viewed as evidence of its major contribution to the variability of protein evolutionary rates. We demonstrate, in Escherichia coli and Homo sapiens, the lack of a substantial negative correlation between protein evolutionary rates and Gibbs free energies of unfolding, a direct measure of protein stability. We then analyze multiple new genome-scale data sets characterizing protein aggregation and interaction propensities, the properties that are likely optimized in evolution to alleviate deleterious effects associated with toxic protein misfolding and misinteractions. Our results demonstrate that the propensity of proteins to aggregate, the fraction of charged amino acids, and protein stickiness do correlate with protein abundances. Nevertheless, across multiple organisms and various data sets we do not observe substantial correlations between proteins’ aggregation- and stability-related properties and evolutionary rates. Therefore, diverse empirical data support the conclusion that the MAH and similar hypotheses do not play a major role in mediating a strong negative correlation between protein expression and the molecular clock, and thus in explaining the variability of evolutionary rates across cellular proteins.
Affiliation(s)
- Dinara R Usmanova
- Department of Systems Biology, Columbia University, New York, NY, USA
- Germán Plata
- Department of Systems Biology, Columbia University, New York, NY, USA
- Elanco Animal Health, Greenfield, IN, USA
- Dennis Vitkup
- Department of Systems Biology, Columbia University, New York, NY, USA
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
|
20
|
Sun M, Zhang J. Chromosome-wide co-fluctuation of stochastic gene expression in mammalian cells. PLoS Genet 2019; 15:e1008389. [PMID: 31525198 PMCID: PMC6762216 DOI: 10.1371/journal.pgen.1008389] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 06/06/2019] [Revised: 09/26/2019] [Accepted: 08/28/2019] [Indexed: 12/31/2022] Open
Abstract
Gene expression is subject to stochastic noise, but to what extent and by which means such stochastic variations are coordinated among different genes are unclear. We hypothesize that neighboring genes on the same chromosome co-fluctuate in expression because of their common chromatin dynamics, and verify it at the genomic scale using allele-specific single-cell RNA-sequencing data of mouse cells. Unexpectedly, the co-fluctuation extends to genes that are over 60 million bases apart. We provide evidence that this long-range effect arises in part from chromatin co-accessibilities of linked loci attributable to three-dimensional proximity, which is much closer intra-chromosomally than inter-chromosomally. We further show that genes encoding components of the same protein complex tend to be chromosomally linked, likely resulting from natural selection for intracellular among-component dosage balance. These findings have implications for both the evolution of genome organization and optimal design of synthetic genomes in the face of gene expression noise.
Affiliation(s)
- Mengyi Sun
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, United States of America
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, Ann Arbor, MI, United States of America
|
21
|
Fesenko I, Kirov I, Kniazev A, Khazigaleeva R, Lazarev V, Kharlampieva D, Grafskaia E, Zgoda V, Butenko I, Arapidi G, Mamaeva A, Ivanov V, Govorun V. Distinct types of short open reading frames are translated in plant cells. Genome Res 2019; 29:1464-1477. [PMID: 31387879 PMCID: PMC6724668 DOI: 10.1101/gr.253302.119] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Received: 06/04/2019] [Accepted: 08/01/2019] [Indexed: 02/07/2023]
Abstract
Genomes contain millions of short (<100 codons) open reading frames (sORFs), which are usually dismissed during gene annotation. Nevertheless, peptides encoded by such sORFs can play important biological roles, and their impact on cellular processes has long been underestimated. Here, we analyzed approximately 70,000 transcribed sORFs in the model plant Physcomitrella patens (moss). Several distinct classes of sORFs that differ in terms of their position on transcripts and the level of evolutionary conservation are present in the moss genome. Over 5000 sORFs were conserved in at least one of 10 plant species examined. Mass spectrometry analysis of proteomic and peptidomic data sets suggested that tens of sORFs located on distinct parts of mRNAs and long noncoding RNAs (lncRNAs) are translated, including conserved sORFs. Translational analysis of the sORFs and main ORFs at a single locus suggested the existence of genes that code for multiple proteins and peptides with tissue-specific expression. Functional analysis of four lncRNA-encoded peptides showed that sORFs-encoded peptides are involved in regulation of growth and differentiation in moss. Knocking out lncRNA-encoded peptides resulted in a decrease of moss growth. In contrast, the overexpression of these peptides resulted in a diverse range of phenotypic effects. Our results thus open new avenues for discovering novel, biologically active peptides in the plant kingdom.
Affiliation(s)
- Igor Fesenko
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
- Ilya Kirov
- Laboratory of marker-assisted and genomic selection of plants, All-Russian Research Institute of Agricultural Biotechnology, 127550 Moscow, Russian Federation
- Andrey Kniazev
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
- Regina Khazigaleeva
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
- Vassili Lazarev
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation
- Moscow Institute of Physics and Technology (National Research University), 141701 Dolgoprudny, Moscow Region, Russian Federation
- Daria Kharlampieva
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation
- Ekaterina Grafskaia
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation
- Moscow Institute of Physics and Technology (National Research University), 141701 Dolgoprudny, Moscow Region, Russian Federation
- Viktor Zgoda
- Laboratory of System Biology, Institute of Biomedical Chemistry, 119121 Moscow, Russian Federation
- Ivan Butenko
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation
- Georgy Arapidi
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation
- Anna Mamaeva
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
- Vadim Ivanov
- Shemyakin and Ovchinnikov Institute of Bioorganic Chemistry of the Russian Academy of Sciences, 117997 Moscow, Russian Federation
- Vadim Govorun
- Federal Research and Clinical Center of Physical-Chemical Medicine of Federal Medical Biological Agency, 119435 Moscow, Russian Federation
|
22
|
Lipinska AP, Serrano-Serrano ML, Cormier A, Peters AF, Kogame K, Cock JM, Coelho SM. Rapid turnover of life-cycle-related genes in the brown algae. Genome Biol 2019; 20:35. [PMID: 30764885 PMCID: PMC6374913 DOI: 10.1186/s13059-019-1630-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 02/16/2018] [Accepted: 01/16/2019] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Sexual life cycles in eukaryotes involve a cyclic alternation between haploid and diploid phases. While most animals possess a diploid life cycle, many plants and algae alternate between multicellular haploid (gametophyte) and diploid (sporophyte) generations. In many algae, gametophytes and sporophytes are independent and free-living and may present dramatic phenotypic differences. The same shared genome can therefore be subject to different, even conflicting, selection pressures during each of the life cycle generations. Here, we analyze the nature and extent of genome-wide, generation-biased gene expression in four species of brown algae with contrasting levels of dimorphism between life cycle generations. RESULTS We show that the proportion of the transcriptome that is generation-specific is broadly associated with the level of phenotypic dimorphism between the life cycle stages. Importantly, our data reveals a remarkably high turnover rate for life-cycle-related gene sets across the brown algae and highlights the importance not only of co-option of regulatory programs from one generation to the other but also of a role for newly emerged, lineage-specific gene expression patterns in the evolution of the gametophyte and sporophyte developmental programs in this major eukaryotic group. Moreover, we show that generation-biased genes display distinct evolutionary modes, with gametophyte-biased genes evolving rapidly at the coding sequence level whereas sporophyte-biased genes tend to exhibit changes in their patterns of expression. CONCLUSION Our analysis uncovers the characteristics, expression patterns, and evolution of generation-biased genes and underlines the selective forces that shape this previously underappreciated source of phenotypic diversity.
Affiliation(s)
- Agnieszka P Lipinska
- Sorbonne Université, UPMC Univ Paris 06, CNRS, Algal Genetics Group, Integrative Biology of Marine Models, Station Biologique de Roscoff, CS 90074, F-29688, Roscoff, France
- Alexandre Cormier
- Laboratoire Ecologie et Biologie des Interactions, Equipe Ecologie Evolution Symbiose, Université de Poitiers, UMR CNRS 7267, Poitiers, France
- Kazuhiro Kogame
- Department of Biological Sciences, Faculty of Sciences, Hokkaido University, Sapporo, 060-0810, Japan
- J Mark Cock
- Sorbonne Université, UPMC Univ Paris 06, CNRS, Algal Genetics Group, Integrative Biology of Marine Models, Station Biologique de Roscoff, CS 90074, F-29688, Roscoff, France
- Susana M Coelho
- Sorbonne Université, UPMC Univ Paris 06, CNRS, Algal Genetics Group, Integrative Biology of Marine Models, Station Biologique de Roscoff, CS 90074, F-29688, Roscoff, France.
|
23
|
Izquierdo A, Fahrenberger M, Persampieri T, Benedict MQ, Giles T, Catteruccia F, Emes RD, Dottorini T. Evolution of gene expression levels in the male reproductive organs of Anopheles mosquitoes. Life Sci Alliance 2019; 2:e201800191. [PMID: 30623175 PMCID: PMC6315087 DOI: 10.26508/lsa.201800191] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 09/12/2018] [Revised: 12/21/2018] [Accepted: 12/21/2018] [Indexed: 12/31/2022] Open
Abstract
Modifications in gene expression determine many of the phenotypic differentiations between closely related species. This is particularly evident in reproductive tissues, where evolution of genes is more rapid, facilitating the appearance of distinct reproductive characteristics which may lead to species isolation and phenotypic variation. Large-scale, comparative analyses of transcript expression levels have been limited until recently by lack of inter-species data mining solutions. Here, by combining expression normalisation across lineages, multivariate statistical analysis, evolutionary rate, and protein-protein interaction analysis, we investigate ortholog transcripts in the male accessory glands and testes across five closely related species in the Anopheles gambiae complex. We first demonstrate that the differentiation by transcript expression is consistent with the known Anopheles phylogeny. Then, through clustering, we discover groups of transcripts with tissue-dependent expression patterns conserved across lineages, or lineage-dependent patterns conserved across tissues. The strongest associations with reproductive function, transcriptional regulatory networks, protein-protein subnetworks, and evolutionary rate are found for the groups of transcripts featuring large expression differences in lineage or tissue-conserved patterns.
Affiliation(s)
- Abril Izquierdo
- School of Veterinary Medicine and Science, Sutton Bonington Campus, University of Nottingham, Leicestershire, UK
- Martin Fahrenberger
- School of Veterinary Medicine and Science, Sutton Bonington Campus, University of Nottingham, Leicestershire, UK
- Tania Persampieri
- Department of Experimental Medicine, University of Perugia, Perugia, Italy
- Mark Q Benedict
- Centers for Disease Control and Prevention, Division of Parasitic Diseases and Malaria, Entomology Branch, Atlanta, GA, USA
- Tom Giles
- Advanced Data Analysis Centre, Sutton Bonington Campus, University of Nottingham, Leicestershire, UK
- Flaminia Catteruccia
- Department of Immunology and Infectious Diseases, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Richard D Emes
- School of Veterinary Medicine and Science, Sutton Bonington Campus, University of Nottingham, Leicestershire, UK
- Advanced Data Analysis Centre, Sutton Bonington Campus, University of Nottingham, Leicestershire, UK
- Tania Dottorini
- School of Veterinary Medicine and Science, Sutton Bonington Campus, University of Nottingham, Leicestershire, UK
|
24
|
Song H, Sun J, Yang G. Comparative analysis of selection mode reveals different evolutionary rate and expression pattern in Arachis duranensis and Arachis ipaënsis duplicated genes. Plant Mol Biol 2018; 98:349-361. [PMID: 30298428 DOI: 10.1007/s11103-018-0784-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 08/21/2018] [Accepted: 09/28/2018] [Indexed: 06/08/2023]
Abstract
Our results reveal that Ks is a determining factor affecting selective pressure and different evolution and expression patterns are detected between PSGs and NSGs in wild Arachis duplicates. Selective pressure, including purifying (negative) and positive selection, can be detected in organisms. However, studies on comparative evolutionary rates, gene expression patterns and gene features between negatively selected genes (NSGs) and positively selected genes (PSGs) are lagging in paralogs of plants. Arachis duranensis and Arachis ipaënsis are ancestors of the cultivated peanut, an important oil and protein crop. Here, we carried out a series of systematic analyses, comparing NSG and PSG in paralogs, using genome sequences and transcriptome datasets in A. duranensis and A. ipaënsis. We found that synonymous substitution rate (Ks) is a determining factor affecting selective pressure in A. duranensis and A. ipaënsis duplicated genes. Lower expression level, lower gene expression breadth, higher codon bias and shorter polypeptide length were found in PSGs and not in NSGs. The correlation analyses showed that gene expression breadth was positively correlated with polypeptide length and GC content at the first codon site (GC1) in PSGs and NSGs, respectively. There was a negative correlation between expression level and polypeptide length in PSGs. In NSGs, the Ks was positively correlated with expression level, gene expression breadth, GC1, and GC content at the third codon site (GC3), but selective pressure was negatively correlated with expression level, gene expression breadth, polypeptide length, GC1, and GC3 content. The function of most duplicated gene pairs was divergent under drought and nematode stress. Taken together, our results show that different evolution and expression patterns occur between PSGs and NSGs in paralogs of two wild Arachis species.
Affiliation(s)
- Hui Song
- Grassland Agri-husbandry Research Center, Qingdao Agricultural University, 700# Changcheng Road, Qingdao, China.
- Juan Sun
- Grassland Agri-husbandry Research Center, Qingdao Agricultural University, 700# Changcheng Road, Qingdao, China
- Guofeng Yang
- Grassland Agri-husbandry Research Center, Qingdao Agricultural University, 700# Changcheng Road, Qingdao, China.
|
25
|
Abstract
From bacteria to humans, ancient stress responses enable organisms to contend with damage to both the genome and the proteome. These pathways have long been viewed as fundamentally separate responses. Yet recent discoveries from multiple fields have revealed surprising links between the two. Many DNA-damaging agents also target proteins, and mutagenesis induced by DNA damage produces variant proteins that are prone to misfolding, degradation, and aggregation. Likewise, recent studies have observed pervasive engagement of a p53-mediated response, and other factors linked to maintenance of genomic integrity, in response to misfolded protein stress. Perhaps most remarkably, protein aggregation and self-assembly has now been observed in multiple proteins that regulate the DNA damage response. The importance of these connections is highlighted by disease models of both cancer and neurodegeneration, in which compromised DNA repair machinery leads to profound defects in protein quality control, and vice versa.
|
26
|
Song H, Gao H, Liu J, Tian P, Nan Z. Comprehensive analysis of correlations among codon usage bias, gene expression, and substitution rate in Arachis duranensis and Arachis ipaënsis orthologs. Sci Rep 2017; 7:14853. [PMID: 29093502 PMCID: PMC5665869 DOI: 10.1038/s41598-017-13981-1] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Received: 05/08/2017] [Accepted: 10/04/2017] [Indexed: 11/22/2022] Open
Abstract
The relationship between evolutionary rates and gene expression in model plant orthologs is well documented. However, little is known about the relationships between gene expression and evolutionary trends in Arachis orthologs. We identified 7,435 one-to-one orthologs, including 925 single-copy and 6,510 multiple-copy sequences in Arachis duranensis and Arachis ipaënsis. Codon usage was stronger for shorter polypeptides, which were encoded by codons with higher GC contents. Highly expressed coding sequences had higher codon usage bias, GC content, and expression breadth. Additionally, expression breadth was positively correlated with polypeptide length, but there was no correlation between gene expression and polypeptide length. Inferred selective pressure was also negatively correlated with both gene expression and expression breadth in all one-to-one orthologs, while positively but non-significantly correlated with gene expression in sequences with signatures of positive selection. Gene expression levels and expression breadth were significantly higher for single-copy genes than for multiple-copy genes. Similarly, the gene expression and expression breadth in sequences with signatures of purifying selection were higher than those of sequences with positive selective signatures. These results indicated that gene expression differed between single-copy and multiple-copy genes as well as sequences with signatures of positive and purifying selection.
Affiliation(s)
- Hui Song
- State Key Laboratory of Grassland Agro-ecosystems, College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, 730000, China
- Hongjuan Gao
- State Key Laboratory of Grassland Agro-ecosystems, College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, 730000, China
- Jing Liu
- State Key Laboratory of Grassland Agro-ecosystems, College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, 730000, China
- Pei Tian
- State Key Laboratory of Grassland Agro-ecosystems, College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, 730000, China
- Zhibiao Nan
- State Key Laboratory of Grassland Agro-ecosystems, College of Pastoral Agriculture Science and Technology, Lanzhou University, Lanzhou, 730000, China
|
27
|
Echave J, Wilke CO. Biophysical Models of Protein Evolution: Understanding the Patterns of Evolutionary Sequence Divergence. Annu Rev Biophys 2017; 46:85-103. [PMID: 28301766 DOI: 10.1146/annurev-biophys-070816-033819] [Citation(s) in RCA: 75] [Impact Index Per Article: 9.4] [Indexed: 12/30/2022]
Abstract
For decades, rates of protein evolution have been interpreted in terms of the vague concept of functional importance. Slowly evolving proteins or sites within proteins were assumed to be more functionally important and thus subject to stronger selection pressure. More recently, biophysical models of protein evolution, which combine evolutionary theory with protein biophysics, have completely revolutionized our view of the forces that shape sequence divergence. Slowly evolving proteins have been found to evolve slowly because of selection against toxic misfolding and misinteractions, linking their rate of evolution primarily to their abundance. Similarly, most slowly evolving sites in proteins are not directly involved in function, but mutating these sites has a large impact on protein structure and stability. In this article, we review the studies in the emerging field of biophysical protein evolution that have shaped our current understanding of sequence divergence patterns. We also propose future research directions to develop this nascent field.
Affiliation(s)
- Julian Echave
- Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, 1650 San Martín, Buenos Aires, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina
- Claus O Wilke
- Department of Integrative Biology, The University of Texas at Austin, Texas 78712
|
28
|
Grusz AL, Rothfels CJ, Schuettpelz E. Transcriptome sequencing reveals genome-wide variation in molecular evolutionary rate among ferns. BMC Genomics 2016; 17:692. [PMID: 27577050 PMCID: PMC5006594 DOI: 10.1186/s12864-016-3034-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 04/22/2016] [Accepted: 08/22/2016] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND: Transcriptomics in non-model plant systems has recently reached a point where the examination of nuclear genome-wide patterns in understudied groups is an achievable reality. This progress is especially notable in evolutionary studies of ferns, for which molecular resources to date have been derived primarily from the plastid genome. Here, we utilize transcriptome data in the first genome-wide comparative study of molecular evolutionary rate in ferns. We focus on the ecologically diverse family Pteridaceae, which comprises about 10% of fern diversity and includes the enigmatic vittarioid ferns, an epiphytic, tropical lineage known for dramatically reduced morphologies and radically elongated phylogenetic branch lengths. Using expressed sequence data for 2091 loci, we perform pairwise comparisons of molecular evolutionary rate among 12 species spanning the three largest clades in the family and ask whether previously documented heterogeneity in plastid substitution rates is reflected in their nuclear genomes. We then inquire whether variation in evolutionary rate is being shaped by genes belonging to specific functional categories and test for differential patterns of selection. RESULTS: We find significant, genome-wide differences in evolutionary rate for vittarioid ferns relative to all other lineages within the Pteridaceae, but we recover few significant correlations between faster/slower vittarioid loci and known functional gene categories. We demonstrate that the faster rates characteristic of the vittarioid ferns are likely not driven by positive selection, nor are they unique to any particular type of nucleotide substitution. CONCLUSIONS: Our results reinforce recently reviewed mechanisms hypothesized to shape molecular evolutionary rates in vittarioid ferns and provide novel insight into substitution rate variation both within and among fern nuclear genomes.
Affiliation(s)
- Amanda L. Grusz
- Department of Botany, Smithsonian Institution, MRC 166 PO Box 37012, Washington, DC, 20013-7012 USA
- Department of Biology, University of Minnesota Duluth, 1035 Kirby Drive, Duluth, MN 55812 USA
- Carl J. Rothfels
- Department of Integrative Biology, University of California Berkeley, 1001 Valley Life Sciences Building, Berkeley, CA 94720-2466 USA
- Eric Schuettpelz
- Department of Botany, Smithsonian Institution, MRC 166 PO Box 37012, Washington, DC, 20013-7012 USA
|
29
|
Price MN, Arkin AP. A Theoretical Lower Bound for Selection on the Expression Levels of Proteins. Genome Biol Evol 2016; 8:1917-28. [PMID: 27289091 PMCID: PMC4943197 DOI: 10.1093/gbe/evw126] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/26/2022] Open
Abstract
We use simple models of the costs and benefits of microbial gene expression to show that changing a protein's expression away from its optimum by 2-fold should reduce fitness by at least [Formula: see text], where P is the fraction of the cell's protein that the gene accounts for. As microbial genes are usually expressed at above 5 parts per million, and effective population sizes are likely to be above 10^6, this implies that 2-fold changes to gene expression levels are under strong selection, as [Formula: see text], where Ne is the effective population size and s is the selection coefficient. Thus, most gene duplications should be selected against. On the other hand, we predict that for most genes, small changes in the expression will be effectively neutral.
Affiliation(s)
- Morgan N Price
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Lab
- Adam P Arkin
- Environmental Genomics and Systems Biology, Lawrence Berkeley National Lab
|
30
|
Expression Differentiation Is Constrained to Low-Expression Proteins over Ecological Timescales. Genetics 2015; 202:273-83. [PMID: 26546003 DOI: 10.1534/genetics.115.180547] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 07/07/2015] [Accepted: 11/04/2015] [Indexed: 02/03/2023] Open
Abstract
Protein expression level is one of the strongest predictors of protein sequence evolutionary rate, with high-expression protein sequences evolving at slower rates than low-expression protein sequences largely because of constraints on protein folding and function. Expression evolutionary rates also have been shown to be negatively correlated with expression level across human and mouse orthologs over relatively long divergence times (i.e., ∼100 million years). Long-term evolutionary patterns, however, often cannot be extrapolated to microevolutionary processes (and vice versa), and whether this relationship holds for traits evolving under directional selection within a single species over ecological timescales (i.e., <5000 years) is unknown and not necessarily expected. Expression is a metabolically costly process, and the expression level of a particular protein is predicted to be a tradeoff between the benefit of its function and the costs of its expression. Selection should drive the expression level of all proteins close to values that maximize fitness, particularly for high-expression proteins because of the increased energetic cost of production. Therefore, stabilizing selection may reduce the amount of standing expression variation for high-expression proteins, and in combination with physiological constraints that may place an upper bound on the range of beneficial expression variation, these constraints could severely limit the availability of beneficial expression variants. To determine whether rapid-expression evolution was restricted to low-expression proteins owing to these constraints on highly expressed proteins over ecological timescales, we compared venom protein expression levels across mainland and island populations for three species of pit vipers. We detected significant differentiation in protein expression levels in two of the three species and found that rapid-expression differentiation was restricted to low-expression proteins. 
Our results suggest that various constraints on high-expression proteins reduce the availability of beneficial expression variants relative to low-expression proteins, enabling low-expression proteins to evolve and potentially lead to more rapid adaptation.
|
31
|
Abstract
The rate and mechanism of protein sequence evolution have been central questions in evolutionary biology since the 1960s. Although the rate of protein sequence evolution depends primarily on the level of functional constraint, exactly what determines functional constraint has remained unclear. The increasing availability of genomic data has enabled much needed empirical examinations on the nature of functional constraint. These studies found that the evolutionary rate of a protein is predominantly influenced by its expression level rather than functional importance. A combination of theoretical and empirical analyses has identified multiple mechanisms behind these observations and demonstrated a prominent role in protein evolution of selection against errors in molecular and cellular processes.
Affiliation(s)
- Jianzhi Zhang
- Department of Ecology and Evolutionary Biology, University of Michigan, 830 North University Avenue, Ann Arbor, Michigan 48109, USA
- Jian-Rong Yang
- Department of Ecology and Evolutionary Biology, University of Michigan, 830 North University Avenue, Ann Arbor, Michigan 48109, USA
|
32
|
Bush SJ, Kover PX, Urrutia AO. Lineage-specific sequence evolution and exon edge conservation partially explain the relationship between evolutionary rate and expression level in A. thaliana. Mol Ecol 2015; 24:3093-106. [PMID: 25930165 PMCID: PMC4480654 DOI: 10.1111/mec.13221] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 08/04/2014] [Revised: 04/21/2015] [Accepted: 04/28/2015] [Indexed: 02/06/2023]
Abstract
Rapidly evolving proteins can aid the identification of genes underlying phenotypic adaptation across taxa, but functional and structural elements of genes can also affect evolutionary rates. In plants, the ‘edges’ of exons, flanking intron junctions, are known to contain splice enhancers and to have a higher degree of conservation compared to the remainder of the coding region. However, the extent to which these regions may be masking indicators of positive selection or account for the relationship between dN/dS and other genomic parameters is unclear. We investigate the effects of exon edge conservation on the relationship of dN/dS to various sequence characteristics and gene expression parameters in the model plant Arabidopsis thaliana. We also obtain lineage-specific dN/dS estimates, making use of the recently sequenced genome of Thellungiella parvula, the second closest sequenced relative after the sister species Arabidopsis lyrata. Overall, we find that the effect of exon edge conservation, as well as the use of lineage-specific substitution estimates, upon dN/dS ratios partly explains the relationship between the rates of protein evolution and expression level. Furthermore, the removal of exon edges shifts dN/dS estimates upwards, increasing the proportion of genes potentially under adaptive selection. We conclude that lineage-specific substitutions and exon edge conservation have an important effect on dN/dS ratios and should be considered when assessing their relationship with other genomic parameters.
Affiliation(s)
- Stephen J Bush
- Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Paula X Kover
- Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
- Araxi O Urrutia
- Department of Biology and Biochemistry, University of Bath, Bath, BA2 7AY, UK
|
33
|
Codon-by-codon modulation of translational speed and accuracy via mRNA folding. PLoS Biol 2014; 12:e1001910. [PMID: 25051069 PMCID: PMC4106722 DOI: 10.1371/journal.pbio.1001910] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.7] [Received: 01/28/2014] [Accepted: 06/12/2014] [Indexed: 11/20/2022] Open
Abstract
Secondary structure in mRNAs modulates the speed of protein synthesis codon-by-codon to improve accuracy at important sites while ensuring high speed elsewhere. Rapid cell growth demands fast protein translational elongation to alleviate ribosome shortage. However, speedy elongation undermines translational accuracy because of a mechanistic tradeoff. Here we provide genomic evidence in budding yeast and mouse embryonic stem cells that the efficiency–accuracy conflict is alleviated by slowing down the elongation at structurally or functionally important residues to ensure their translational accuracies while sacrificing the accuracy for speed at other residues. Our computational analysis in yeast with codon resolution suggests that mRNA secondary structures serve as elongation brakes to control the speed and hence the fidelity of protein translation. The position-specific effect of mRNA folding on translational accuracy is further demonstrated experimentally by swapping synonymous codons in a yeast transgene. Our findings explain why highly expressed genes tend to have strong mRNA folding, slow translational elongation, and conserved protein sequences. The exquisite codon-by-codon translational modulation uncovered here is a testament to the power of natural selection in mitigating efficiency–accuracy conflicts, which are prevalent in biology. Protein synthesis by ribosomal translation is a vital cellular process, but our understanding of its regulation has been poor. Because the number of ribosomes in the cell is limited, rapid growth relies on fast translational elongation. The accuracy of translation must also be maintained, and in an ideal scenario, both speed and accuracy should be maximized to sustain rapid and productive growth. However, existing data suggest a tradeoff between speed and accuracy, making it impossible to simultaneously maximize both. 
A potential solution is slowing the elongation at functionally or structurally important sites to ensure their translational accuracies, while sacrificing accuracy for speed at other sites. Here, we show that budding yeast and mouse embryonic stem cells indeed use this strategy. We discover that a codon-by-codon adaptive modulation of translational elongation is accomplished by mRNA secondary structures, which serve as brakes to control the elongation speed and hence translational fidelity. Our findings explain why highly expressed genes tend to have strong mRNA folding, slow translational elongation, and conserved protein sequences. The exquisite translational modulation reflects the power of natural selection in mitigating efficiency–accuracy conflicts, and our study offers a general framework for analyzing similar conflicts, which are widespread in biology.
|
34
|
Chang TY, Liao BY. Flagellated algae protein evolution suggests the prevalence of lineage-specific rules governing evolutionary rates of eukaryotic proteins. Genome Biol Evol 2013; 5:913-22. [PMID: 23563973 PMCID: PMC3673635 DOI: 10.1093/gbe/evt055] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/29/2023] Open
Abstract
Understanding the general rules governing the rate of protein evolution is fundamental to evolutionary biology. However, attempts to address this issue in yeasts and mammals have revealed considerable differences in the relative importance of determinants for protein evolutionary rates. This phenomenon was previously explained by the fact that yeasts and mammals are different in many cellular and genomic properties. Flagellated algae species have several cellular and genomic characteristics that are intermediate between yeasts and mammals. Using partial correlation analyses on the evolution of 6,921 orthologous proteins from Chlamydomonas reinhardtii and Volvox carteri, we examined factors influencing evolutionary rates of proteins in flagellated algae. Previous studies have shown that mRNA abundance and gene compactness are strong determinants for protein evolutionary rates in yeasts and mammals, respectively. We show that both factors also influence algae protein evolution with mRNA abundance having a larger impact than gene compactness on the rates of algae protein evolution. More importantly, among all the factors examined, coding sequence (CDS) length has the strongest (positive) correlation with protein evolutionary rates. This correlation between CDS length and the rates of protein evolution is not due to alignment-related issues or domain density. These results suggest no simple and universal rules governing protein evolutionary rates across different eukaryotic lineages. Instead, gene properties influence the rate of protein evolution in a lineage-specific manner.
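The partial correlation analysis mentioned in this abstract can be illustrated with a minimal sketch. It assumes a first-order Pearson partial correlation (studies of this kind often use rank-based variants instead), and the function names and toy numbers below are hypothetical, not taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z.

    Uses the standard first-order formula
    r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)).
    """
    r_xy = pearson(x, y)
    r_xz = pearson(x, z)
    r_yz = pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical toy data: x and y are each driven by z plus unrelated noise.
z = [0, 0, 1, 1]      # e.g. a confounding gene property
x = [1, -1, 2, 0]     # e.g. evolutionary rate
y = [1, -1, 0, 2]     # e.g. another gene property

raw = pearson(x, y)
controlled = partial_corr(x, y, z)
print(f"raw r = {raw:.3f}, partial r given z = {controlled:.3f}")
```

In this toy case the raw correlation between x and y comes entirely from their shared dependence on z, so it disappears once z is controlled for.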
Affiliation(s)
- Ting-Yan Chang
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Taiwan, Republic of China
|
35
|
Bush SJ, Castillo-Morales A, Tovar-Corona JM, Chen L, Kover PX, Urrutia AO. Presence-absence variation in A. thaliana is primarily associated with genomic signatures consistent with relaxed selective constraints. Mol Biol Evol 2013; 31:59-69. [PMID: 24072814 PMCID: PMC3879440 DOI: 10.1093/molbev/mst166] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Indexed: 12/28/2022] Open
Abstract
The sequencing of multiple genomes of the same plant species has revealed polymorphic gene and exon loss. Genes associated with disease resistance are overrepresented among those showing structural variations, suggesting an adaptive role for gene and exon presence–absence variation (PAV). To shed light on the possible functional relevance of polymorphic coding region loss and the mechanisms driving this process, we characterized genes that have lost entire exons or their whole coding regions in 17 fully sequenced Arabidopsis thaliana accessions. We found that although a significant enrichment in genes associated with certain functional categories is observed, PAV events are largely restricted to genes with signatures of reduced essentiality: PAV genes tend to be newer additions to the genome, tissue specific, and lowly expressed. In addition, PAV genes are located in regions of lower gene density and higher transposable element density. Partial coding region PAV events were associated with only a marginal reduction in gene expression level in the affected accession and occurred in genes with higher levels of alternative splicing in the Col-0 accession. Together, these results suggest that although adaptive scenarios cannot be ruled out, PAV events can be explained without invoking them.
Affiliation(s)
- Stephen J Bush
- Department of Biology and Biochemistry, University of Bath, Bath, United Kingdom
|
36
|
Differential requirements for mRNA folding partially explain why highly expressed proteins evolve slowly. Proc Natl Acad Sci U S A 2013; 110:E678-86. [PMID: 23382244 DOI: 10.1073/pnas.1218066110] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.7] [Indexed: 11/18/2022] Open
Abstract
The cause of the tremendous among-protein variation in the rate of sequence evolution is a central subject of molecular evolution. Expression level has been identified as a leading determinant of this variation among genes encoded in the same genome, but the underlying mechanisms are not fully understood. We here propose and demonstrate that a requirement for stronger folding of more abundant mRNAs results in slower evolution of more highly expressed genes and proteins. Specifically, we show that: (i) the higher the expression level of a gene, the greater the selective pressure for its mRNA to fold; (ii) random mutations are more likely to decrease mRNA folding when occurring in highly expressed genes than in lowly expressed genes; and (iii) amino acid substitution rate is negatively correlated with mRNA folding strength, with or without the control of expression level. Furthermore, synonymous (dS) and nonsynonymous (dN) nucleotide substitution rates are both negatively correlated with mRNA folding strength. However, counterintuitively, dS and dN are differentially constrained by selection for mRNA folding, resulting in a significant correlation between mRNA folding strength and dN/dS, even when gene expression level is controlled. The direction and magnitude of this correlation is determined primarily by the G+C frequency at third codon positions. Together, these findings explain why highly expressed genes evolve slowly, demonstrate a major role of natural selection at the mRNA level in constraining protein evolution, and reveal a previously unrecognized and unexpected form of nonprotein-level selection that impacts dN/dS.
|
37
|
Park C, Qian W, Zhang J. Genomic evidence for elevated mutation rates in highly expressed genes. EMBO Rep 2012; 13:1123-9. [PMID: 23146897 DOI: 10.1038/embor.2012.165] [Citation(s) in RCA: 85] [Impact Index Per Article: 6.5] [Received: 07/05/2012] [Revised: 09/10/2012] [Accepted: 10/05/2012] [Indexed: 11/09/2022] Open
Abstract
Reporter gene assays have demonstrated both transcription-associated mutagenesis (TAM) and transcription-coupled repair, but the net impact of transcription on mutation rate remains unclear, especially at the genomic scale. Using comparative genomics of related species as well as mutation accumulation lines, we show in yeast that the rate of point mutation in a gene increases with the expression level of the gene. Transcription induces mutagenesis on both DNA strands, indicating simultaneous actions of several TAM mechanisms. A significant positive correlation is also detected between the human germline mutation rate and expression level. These results indicate that transcription is overall mutagenic.
Affiliation(s)
- Chungoo Park
- Department of Ecology and Evolutionary Biology, University of Michigan, 1075 Natural Science Building, 830 North University Avenue, Ann Arbor, Michigan 48109, USA
|
38
|
Nabholz B, Ellegren H, Wolf JBW. High Levels of Gene Expression Explain the Strong Evolutionary Constraint of Mitochondrial Protein-Coding Genes. Mol Biol Evol 2012; 30:272-84. [DOI: 10.1093/molbev/mss238] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Indexed: 01/03/2023] Open
|
39
|
Abstract
Much molecular-evolution research is concerned with sequence analysis. Yet these sequences represent real, three-dimensional molecules with complex structure and function. Here I highlight a growing trend in the field to incorporate molecular structure and function into computational molecular-evolution work. I consider three focus areas: reconstruction and analysis of past evolutionary events, such as phylogenetic inference or methods to infer selection pressures; development of toy models and simulations to identify fundamental principles of molecular evolution; and atom-level, highly realistic computational modeling of molecular structure and function aimed at making predictions about possible future evolutionary events.
Affiliation(s)
- Claus O Wilke
- Institute of Cell and Molecular Biology, The University of Texas at Austin, Austin, Texas, United States of America.
|
40
|
Abstract
Horizontal gene transfer (HGT), the movement of genetic material from one species to another, is a common phenomenon in prokaryotic evolution. Although the rate of HGT is known to vary among genes, our understanding of the cause of this variation, currently summarized by two rules, is far from complete. The first rule states that informational genes, which are involved in DNA replication, transcription, and translation, have lower transferabilities than operational genes. The second rule asserts that protein interactivity negatively impacts gene transferability. Here, we hypothesize that high expression hampers HGT, because the fitness cost of an HGT to the recipient, arising from the 1) energy expenditure in transcription and translation, 2) cytotoxic protein misfolding, 3) reduction in cellular translational efficiency, 4) detrimental protein misinteraction, and 5) disturbance of the optimal protein concentration or cell physiology, increases with the expression level of the transferred gene. To test this hypothesis, we examined laboratory and natural HGTs to Escherichia coli. We observed lower transferabilities of more highly expressed genes, even after controlling the confounding factors from the two established rules and the genic GC content. Furthermore, expression level predicts gene transferability better than all other factors examined. We also confirmed the significant negative impact of gene expression on the rate of HGTs to 127 of 133 genomes of eubacteria and archaebacteria. Together, these findings establish the gene expression level as a major determinant of horizontal gene transferability. They also suggest that most successful HGTs are initially slightly deleterious, fixed because of their negligibly low costs rather than high benefits to the recipient.
Affiliation(s)
- Chungoo Park
- Department of Ecology and Evolutionary Biology, University of Michigan, MI, USA
|
41
|
Protein misinteraction avoidance causes highly expressed proteins to evolve slowly. Proc Natl Acad Sci U S A 2012; 109:E831-40. [PMID: 22416125 DOI: 10.1073/pnas.1117408109] [Citation(s) in RCA: 129] [Impact Index Per Article: 9.9] [Indexed: 11/18/2022] Open
Abstract
The tempo and mode of protein evolution have been central questions in biology. Genomic data have shown a strong influence of the expression level of a protein on its rate of sequence evolution (E-R anticorrelation), which is currently explained by the protein misfolding avoidance hypothesis. Here, we show that this hypothesis does not fully explain the E-R anticorrelation, especially for protein surface residues. We propose that natural selection against protein-protein misinteraction, which wastes functional molecules and is potentially toxic, constrains the evolution of surface residues. Because highly expressed proteins are under stronger pressures to avoid misinteraction, surface residues are expected to show an E-R anticorrelation. Our molecular-level evolutionary simulation and yeast genomic analysis confirm multiple predictions of the hypothesis. These findings show a pluralistic origin of the E-R anticorrelation and reveal the role of protein misinteraction, an inherent property of complex cellular systems, in constraining protein evolution.
|
42
|
Moreira R, Balseiro P, Romero A, Dios S, Posada D, Novoa B, Figueras A. Gene expression analysis of clams Ruditapes philippinarum and Ruditapes decussatus following bacterial infection yields molecular insights into pathogen resistance and immunity. Dev Comp Immunol 2012; 36:140-9. [PMID: 21756933 DOI: 10.1016/j.dci.2011.06.012] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Received: 05/13/2011] [Revised: 06/24/2011] [Accepted: 06/25/2011] [Indexed: 05/15/2023]
Abstract
The carpet shell clam (Ruditapes decussatus) and Manila clam (Ruditapes philippinarum), which are cultured bivalve species with important commercial value, are affected by diseases that result in large economic losses. Because knowledge of the molecular mechanisms of the immune response in bivalves, especially clams, is scarce and fragmentary, we examined all Expressed Sequence Tag (EST) resources available in public databases for these two species to increase our knowledge of genes related to immune function in these animals. After automatic annotation and classification of the 3784 unannotated ESTs of R. decussatus and 4607 of R. philippinarum found in GenBank, 424 ESTs of R. decussatus and 464 of R. philippinarum were found to be putatively involved in immune response. These were carefully reviewed and reannotated. As a result, 13 immune-related ESTs were selected and studied to compare the immune response of R. decussatus and R. philippinarum following a Vibrio alginolyticus challenge. Quantitative PCR was performed, and the expression of each EST was determined. The results showed that, in R. philippinarum, the immune response seems to be faster than that in R. decussatus. Additionally, expression of NF-κB activating genes in R. decussatus did not seem to be sufficient to promote an immune response after Vibrio infection. R. philippinarum, however, was able to trigger and efficiently regulate the transcriptional activity of NF-κB, even when low expression values were reported.
Affiliation(s)
- R Moreira
- Instituto de Investigaciones Marinas (IIM), Consejo Superior de Investigaciones Científicas (CSIC), Eduardo Cabello 6, 36208 Vigo, Spain
|
43
|
Chain FJJ, Dushoff J, Evans BJ. The odds of duplicate gene persistence after polyploidization. BMC Genomics 2011; 12:599. [PMID: 22151890 PMCID: PMC3258412 DOI: 10.1186/1471-2164-12-599] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 05/27/2011] [Accepted: 12/12/2011] [Indexed: 12/26/2022] Open
Abstract
Background: Gene duplication is an important biological phenomenon associated with genomic redundancy, degeneration, specialization, innovation, and speciation. After duplication, both copies continue functioning when natural selection favors duplicated protein function or expression, or when mutations make them functionally distinct before one copy is silenced. Results: Here we quantify the degree to which genetic parameters related to gene expression, molecular evolution, and gene structure in a diploid frog (Silurana tropicalis) influence the odds of functional persistence of orthologous duplicate genes in a closely related tetraploid species (Xenopus laevis). Using public databases and 454 pyrosequencing, we obtained genetic and expression data from S. tropicalis orthologs of 3,387 X. laevis paralogs and 4,746 X. laevis singletons, the most comprehensive dataset for African clawed frogs yet analyzed. Using logistic regression, we demonstrate that the most important predictors of the odds of duplicate gene persistence in the tetraploid species are the total gene expression level and the evenness of expression across tissues and development in the diploid species. Slow protein evolution and information density (fewer exons, shorter introns) in the diploid are also positively correlated with duplicate gene persistence in the tetraploid. Conclusions: Our findings suggest that a combination of factors contributes to duplicate gene persistence following whole genome duplication, but that the total expression level and the evenness of expression across tissues and through development before duplication are most important. We speculate that these parameters are useful predictors of duplicate gene longevity after whole genome duplication in other taxa.
Affiliation(s)
- Frédéric J J Chain
- Department of Biology, McMaster University, 1280 Main Street West, Hamilton, ON, L8S 4K1, Canada.
|
44
|
Lin CH, Lian CY, Hsiung CA, Chen FC. Changes in transcriptional orientation are associated with increases in evolutionary rates of enterobacterial genes. BMC Bioinformatics 2011; 12 Suppl 9:S19. [PMID: 22152004 PMCID: PMC3283321 DOI: 10.1186/1471-2105-12-s9-s19] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/02/2022] Open
Abstract
Background: Changes in transcriptional orientation (“CTOs”) occur frequently in prokaryotic genomes. Such changes usually result from genomic inversions, which may cause a conflict between the directions of replication and transcription and an increase in mutation rate. However, CTOs do not always lead to replication-transcription confrontation. Furthermore, CTOs may cause deleterious disruptions of operon structure and/or gene regulation. The currently existing CTOs may indicate relaxation of selection pressure. Therefore, it is of interest to investigate whether CTOs have an independent effect on the evolutionary rates of the affected genes, and whether these genes are subject to any type of selection pressure in prokaryotes. Methods: Three closely related enterobacteria, Escherichia coli, Klebsiella pneumoniae and Salmonella enterica serovar Typhimurium, were selected for comparisons of synonymous (dS) and nonsynonymous (dN) substitution rates between the genes that have experienced changes in transcriptional orientation (changed-orientation genes, “COGs”) and those that have not (same-orientation genes, “SOGs”). The dN/dS ratio was also derived to evaluate the selection pressure on the analyzed genes. Confounding factors in the estimation of evolutionary rates, such as gene essentiality, gene expression level, replication-transcription confrontation, and decreased dS at gene terminals, were controlled in the COG-SOG comparisons. Results: We demonstrate that COGs have significantly higher dN and dS than SOGs when a series of confounding factors are controlled. However, the dN/dS ratios are similar between the two gene groups, suggesting that the increase in dS can sufficiently explain the increase in dN in COGs. Therefore, the increases in evolutionary rates in COGs may be mainly mutation-driven. Conclusions: Here we show that CTOs can increase the evolutionary rates of the affected genes. This effect is independent of the replication-transcription confrontation, which is suggested to be the major cause of inversion-associated evolutionary rate increases. The real cause of such evolutionary rate increases remains unclear but is worth further exploration.
Affiliation(s)
- Chieh-Hua Lin
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, 35 Keyen Road, Zhunan Town, Miaoli County, Taiwan, Republic of China
|