1
Metzger BPH, Park Y, Starr TN, Thornton JW. Epistasis facilitates functional evolution in an ancient transcription factor. eLife 2024; 12:RP88737. [PMID: 38767330 PMCID: PMC11105156 DOI: 10.7554/elife.88737] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/22/2024]
Abstract
A protein's genetic architecture - the set of causal rules by which its sequence produces its functions - also determines its possible evolutionary trajectories. Prior research has proposed that the genetic architecture of proteins is very complex, with pervasive epistatic interactions that constrain evolution and make function difficult to predict from sequence. Most of this work has analyzed only the direct paths between two proteins of interest - excluding the vast majority of possible genotypes and evolutionary trajectories - and has considered only a single protein function, leaving unaddressed the genetic architecture of functional specificity and its impact on the evolution of new functions. Here, we develop a new method based on ordinal logistic regression to directly characterize the global genetic determinants of multiple protein functions from 20-state combinatorial deep mutational scanning (DMS) experiments. We use it to dissect the genetic architecture and evolution of a transcription factor's specificity for DNA, using data from a combinatorial DMS of an ancient steroid hormone receptor's capacity to activate transcription from two biologically relevant DNA elements. We show that the genetic architecture of DNA recognition consists of a dense set of main and pairwise effects that involve virtually every possible amino acid state in the protein-DNA interface, but higher-order epistasis plays only a tiny role. Pairwise interactions enlarge the set of functional sequences and are the primary determinants of specificity for different DNA elements. They also massively expand the number of opportunities for single-residue mutations to switch specificity from one DNA target to another. By bringing variants with different functions close together in sequence space, pairwise epistasis therefore facilitates rather than constrains the evolution of new functions.
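The abstract describes main effects and pairwise effects acting through ordinal logistic regression. The following is a minimal sketch of a generic cumulative-logit (proportional-odds) model with those two kinds of terms. It is not the authors' implementation. The three sites, amino-acid states, coefficients, thresholds and the three-class phenotype (null < weak < strong) are all hypothetical. The sketch also models a single function, whereas the paper models activation on two DNA elements.

```python
import math
from itertools import combinations

# Hypothetical toy model: three variable sites, each genotype a tuple of
# one-letter amino acids. All coefficients are invented for illustration.
MAIN_EFFECTS = {
    (0, "E"): 0.0, (0, "G"): -1.0,
    (1, "G"): 0.0, (1, "S"): 0.5,
    (2, "A"): 0.0, (2, "V"): -0.5,
}
# Pairwise (second-order) epistatic terms, keyed by two (site, state) pairs
# listed in increasing site order.
PAIR_EFFECTS = {
    ((0, "G"), (1, "S")): 2.0,
}
INTERCEPT = 0.0
# Increasing cut points between three ordered classes: null < weak < strong.
THRESHOLDS = [-1.0, 1.0]


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def latent_score(genotype):
    """Latent genetic score: intercept + main effects + pairwise effects."""
    states = list(enumerate(genotype))  # [(site, state), ...] in site order
    score = INTERCEPT + sum(MAIN_EFFECTS.get(s, 0.0) for s in states)
    for a, b in combinations(states, 2):  # combinations keep site order
        score += PAIR_EFFECTS.get((a, b), 0.0)
    return score


def class_probabilities(genotype):
    """Proportional-odds probabilities of each ordered class.

    P(Y > k) = logistic(score - theta_k); each class probability is the
    difference between consecutive exceedance probabilities.
    """
    s = latent_score(genotype)
    exceed = [logistic(s - t) for t in THRESHOLDS]
    probs = [1.0 - exceed[0]]
    probs += [exceed[k] - exceed[k + 1] for k in range(len(exceed) - 1)]
    probs.append(exceed[-1])
    return probs


print(class_probabilities(("G", "S", "A")))  # pairwise term present
print(class_probabilities(("E", "S", "A")))  # pairwise term absent
```

With these made-up numbers, the pairwise term raises the probability that genotype G-S-A falls in the strong class to roughly 0.62. The corresponding probability for E-S-A, which lacks the interaction, is roughly 0.38. This is a toy version of the idea that pairwise epistasis can enlarge the set of functional sequences.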
Affiliation(s)
- Brian PH Metzger
- Department of Ecology and Evolution, University of Chicago, Chicago, United States
- Yeonwoo Park
- Program in Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, United States
- Tyler N Starr
- Department of Biochemistry and Molecular Biophysics, University of Chicago, Chicago, United States
- Joseph W Thornton
- Department of Ecology and Evolution, University of Chicago, Chicago, United States
- Department of Human Genetics, University of Chicago, Chicago, United States
2
Yépez Y, Marcano-Ruiz M, Bezerra RS, Fam B, Ximenez JP, Silva WA, Bortolini MC. Evolutionary history of the SARS-CoV-2 Gamma variant of concern (P.1): a perfect storm. Genet Mol Biol 2022; 45:e20210309. [PMID: 35266951 PMCID: PMC8908351 DOI: 10.1590/1678-4685-gmb-2021-0309] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 09/30/2021] [Accepted: 12/29/2021] [Indexed: 12/11/2022]
Abstract
Our goal was to describe in more detail the evolutionary history of Gamma and two derived lineages (P.1.1 and P.1.2), which are part of the arms race that SARS-CoV-2 wages with its host. A total of 4,977 sequences of the Gamma strain of SARS-CoV-2 from Brazil were analyzed. We detected 194 sites under positive selection in 12 genes/ORFs: Spike, N, M, E, ORF1a, ORF1b, ORF3, ORF6, ORF7a, ORF7b, ORF8, and ORF10. Some diagnostic sites for Gamma lacked a signature of positive selection in our study, but these were not fixed, apparently escaping the action of purifying selection. Our network analyses revealed branches leading to expanding haplotypes with sites under selection that were detected only when P.1.1 and P.1.2 were considered. The P.1.2-exclusive haplotype H_5 originated from a non-synonymous mutational step (H3509Y) in H_1 of ORF1a. The selected allele, 3509Y, represents an adaptive novelty involving ORF1a of P.1. Finally, we discuss how phenomena such as epistasis and antagonistic pleiotropy could limit the emergence of new alleles (and combinations thereof) in SARS-CoV-2 lineages, maintaining infectivity in humans while providing rapid-response capabilities to face the arms race triggered by host immune responses.
Affiliation(s)
- Yuri Yépez
- Universidade Federal do Rio Grande do Sul, Departamento de Genética, Laboratório de Evolução Humana e Molecular, Porto Alegre, RS, Brazil
- Mariana Marcano-Ruiz
- Universidade Federal do Rio Grande do Sul, Departamento de Genética, Laboratório de Evolução Humana e Molecular, Porto Alegre, RS, Brazil
- Rafael S Bezerra
- Universidade de São Paulo, Faculdade de Medicina de Ribeirão Preto, Departamento de Genética, Ribeirão Preto, SP, Brazil
- Bibiana Fam
- Universidade Federal do Rio Grande do Sul, Departamento de Genética, Laboratório de Evolução Humana e Molecular, Porto Alegre, RS, Brazil
- João Pb Ximenez
- Universidade de São Paulo, Faculdade de Medicina de Ribeirão Preto, Departamento de Genética, Ribeirão Preto, SP, Brazil
- Wilson A Silva
- Universidade de São Paulo, Faculdade de Medicina de Ribeirão Preto, Departamento de Genética, Ribeirão Preto, SP, Brazil; Instituto de Pesquisa do Câncer de Guarapuava, Guarapuava, PR, Brazil
- Maria Cátira Bortolini
- Universidade Federal do Rio Grande do Sul, Departamento de Genética, Laboratório de Evolução Humana e Molecular, Porto Alegre, RS, Brazil
3
Youssef N, Susko E, Roger AJ, Bielawski JP. Shifts in amino acid preferences as proteins evolve: A synthesis of experimental and theoretical work. Protein Sci 2021; 30:2009-2028. [PMID: 34322924 PMCID: PMC8442975 DOI: 10.1002/pro.4161] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/13/2021] [Revised: 07/19/2021] [Accepted: 07/26/2021] [Indexed: 11/08/2022]
Abstract
Amino acid preferences vary across sites and time. While variation across sites is widely accepted, the extent and frequency of temporal shifts are contentious. Our understanding of the drivers of amino acid preference change is incomplete: To what extent are temporal shifts driven by adaptive versus nonadaptive evolutionary processes? We review phenomena that cause preferences to vary (e.g., evolutionary Stokes shift, contingency, and entrenchment) and clarify how they differ. To determine the extent and prevalence of shifted preferences, we review experimental and theoretical studies. Analyses of natural sequence alignments often detect decreases in homoplasy (convergence and reversion) rates and variation in replacement rates over time, signals that are consistent with temporally changing preferences. While approaches inferring shifts in preferences from patterns in natural alignments are valuable, they are indirect, since multiple mechanisms (both adaptive and nonadaptive) could lead to the observed signal. Alternatively, site-directed mutagenesis experiments allow for a more direct assessment of shifted preferences. They corroborate evidence from multiple sequence alignments, revealing that the preference for an amino acid at a site varies depending on the background sequence. However, shifts in preferences are usually minor in magnitude, and sites with significantly shifted preferences are low in frequency. These small yet consistent perturbations in preferences could nevertheless jeopardize the accuracy of inference procedures that assume constant preferences. We conclude by discussing whether and how such shifts in preferences might influence widely used time-homogeneous inference procedures and potential ways to mitigate such effects.
Affiliation(s)
- Noor Youssef
- Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Edward Susko
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
- Andrew J. Roger
- Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Joseph P. Bielawski
- Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada
- Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada
4
Paré P, Reales G, Paixão-Côrtes VR, Vargas-Pinilla P, Viscardi LH, Fam B, Pissinatti A, Santos FR, Bortolini MC. Molecular evolutionary insights from PRLR in mammals. Gen Comp Endocrinol 2021; 309:113791. [PMID: 33872604 DOI: 10.1016/j.ygcen.2021.113791] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/27/2021] [Revised: 04/02/2021] [Accepted: 04/13/2021] [Indexed: 12/12/2022]
Abstract
Prolactin (PRL) is a pleiotropic neurohormone secreted by the mammalian pituitary gland into the blood, thus reaching many tissues and organs beyond the brain. PRL binds to its receptor, PRLR, eliciting a molecular signaling cascade. This system modulates essential mammalian behaviors and promotes notable modifications in female reproductive tissues and organs. Here, we explore how the intracellular domain of PRLR (PRLR-ICD) modulates the expression of the PRLR gene. Despite differences in the reproductive strategies between eutherian and metatherian mammals, there is no clear distinction between PRLR-ICD functional motifs. However, we found selection signatures that showed differences between groups, with many conserved functional elements strongly maintained through purifying selection across the class Mammalia. We observed a few residues under relaxed selection, the levels of which were more pronounced in Eutheria and particularly striking in primates (Simiiformes), which could represent a pre-adaptive genetic element protected from purifying selection. Alternative new motifs, such as YLDP (residues 318-321) and others involving residues Y283 and Y290, may already be functional. These motifs would have been co-opted in primates as part of a complex genetic repertoire related to some derived adaptive phenotypes, but these changes would have no impact on the primordial functions that characterize the mammals as a whole and that are related to the PRL-PRLR system.
Affiliation(s)
- Pamela Paré
- Laboratório de Evolução Humana e Molecular, Programa de Pós-Graduação em Genética e Biologia Molecular, Departamento de Genética, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, RS, Brazil
- Guillermo Reales
- Laboratório de Evolução Humana e Molecular, Programa de Pós-Graduação em Genética e Biologia Molecular, Departamento de Genética, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, RS, Brazil; Cambridge Institute of Therapeutic Immunology & Infectious Disease (CITIID), Jeffrey Cheah Biomedical Centre, Cambridge Biomedical Campus, University of Cambridge, Puddicombe Way, Cambridge CB2 0AW, UK; Department of Medicine, University of Cambridge School of Clinical Medicine, Cambridge Biomedical Campus, Cambridge CB2 0QQ, UK
- Vanessa R Paixão-Côrtes
- Laboratório de Biologia Evolutiva e Genômica (LABEG), Programa de Pós-Graduação em Biodiversidade e Evolução, Instituto de Biologia, Universidade Federal da Bahia (UFBA), Salvador, BA, Brazil
- Pedro Vargas-Pinilla
- Laboratório de Evolução Humana e Molecular, Programa de Pós-Graduação em Genética e Biologia Molecular, Departamento de Genética, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, RS, Brazil; Faculdade de Medicina de Ribeirão Preto, Departamento de Bioquímica e Imunologia, Universidade de São Paulo, Ribeirão Preto, SP, Brazil
- Lucas Henriques Viscardi
- Laboratório de Evolução Humana e Molecular, Programa de Pós-Graduação em Genética e Biologia Molecular, Departamento de Genética, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, RS, Brazil
- Bibiana Fam
- Laboratório de Evolução Humana e Molecular, Programa de Pós-Graduação em Genética e Biologia Molecular, Departamento de Genética, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, RS, Brazil
- Fabrício R Santos
- Laboratório de Biodiversidade e Evolução Molecular, Departamento de Genética, Ecologia e Evolução da Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, MG, Brazil
- Maria Cátira Bortolini
- Laboratório de Evolução Humana e Molecular, Programa de Pós-Graduação em Genética e Biologia Molecular, Departamento de Genética, Universidade Federal do Rio Grande do Sul (UFRGS), Porto Alegre, RS, Brazil
5
Tao Q, Barba-Montoya J, Huuki LA, Durnan MK, Kumar S. Relative Efficiencies of Simple and Complex Substitution Models in Estimating Divergence Times in Phylogenomics. Mol Biol Evol 2020; 37:1819-1831. [PMID: 32119075 PMCID: PMC7253201 DOI: 10.1093/molbev/msaa049] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 12/13/2022]
Abstract
The conventional wisdom in molecular evolution is to apply parameter-rich models of nucleotide and amino acid substitutions for estimating divergence times. However, the actual extent of the difference between time estimates produced by highly complex models compared with those from simple models is yet to be quantified for contemporary data sets that frequently contain sequences from many species and genes. In a reanalysis of many large multispecies alignments from diverse groups of taxa, we found that the use of the simplest models can produce divergence time estimates and credibility intervals similar to those obtained from the complex models applied in the original studies. This result is surprising because the use of simple models underestimates sequence divergence for all the data sets analyzed. We found three fundamental reasons for the observed robustness of time estimates to model complexity in many practical data sets. First, the estimates of branch lengths and node-to-tip distances under the simplest model show an approximately linear relationship with those produced by using the most complex models applied on data sets with many sequences. Second, relaxed clock methods automatically adjust rates on branches that experience considerable underestimation of sequence divergences, resulting in time estimates that are similar to those from complex models. And, third, the inclusion of even a few good calibrations in an analysis can reduce the difference in time estimates from simple and complex models. The robustness of time estimates to model complexity in these empirical data analyses is encouraging, because all phylogenomics studies use statistical models that are oversimplified descriptions of actual evolutionary substitution processes.
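The abstract says simple models underestimate sequence divergence relative to complex ones. Two textbook distance corrections can illustrate this. The abstract does not say which models were compared. This sketch therefore assumes, for illustration only, that the simple model is Jukes-Cantor (JC69) and that the more complex model adds gamma-distributed rate variation across sites (JC69+Γ with shape alpha). The divergence values and alpha are arbitrary.

```python
import math


def jc69_distance(p):
    """JC69 distance (substitutions per site) from the proportion p of differing sites."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc69_gamma_distance(p, alpha):
    """JC69 distance with gamma-distributed rates across sites (shape alpha > 0).

    d = (3 * alpha / 4) * ((1 - 4p/3) ** (-1/alpha) - 1); tends to JC69 as alpha grows.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the JC69 correction")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 0.75 * alpha * ((1.0 - 4.0 * p / 3.0) ** (-1.0 / alpha) - 1.0)


# Compare the simple and gamma-corrected estimates across observed divergences.
for p in (0.05, 0.10, 0.20, 0.30, 0.40):
    simple = jc69_distance(p)
    corrected = jc69_gamma_distance(p, alpha=0.5)
    print(f"p={p:.2f}  JC69={simple:.3f}  JC69+G={corrected:.3f}  "
          f"ratio={corrected / simple:.2f}")
```

For any p > 0 the JC69 estimate is the smaller of the two, and the gap widens as divergence increases. Whether the relationship between the two estimates is close to linear in real multispecies data sets is the empirical question the paper addresses. This toy does not settle that question.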
Affiliation(s)
- Qiqing Tao
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA; Department of Biology, Temple University, Philadelphia, PA
- Jose Barba-Montoya
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA; Department of Biology, Temple University, Philadelphia, PA
- Louise A Huuki
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA
- Mary Kathleen Durnan
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA; Department of Biology, Temple University, Philadelphia, PA
- Sudhir Kumar
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA; Department of Biology, Temple University, Philadelphia, PA; Center for Excellence in Genome Medicine and Research, King Abdulaziz University, Jeddah, Saudi Arabia
6
Missaggia BO, Reales G, Cybis GB, Hünemeier T, Bortolini MC. Adaptation and co-adaptation of skin pigmentation and vitamin D genes in native Americans. Am J Med Genet C Semin Med Genet 2020; 184:1060-1077. [PMID: 33325159 DOI: 10.1002/ajmg.c.31873] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/15/2020] [Revised: 11/23/2020] [Accepted: 12/02/2020] [Indexed: 11/06/2022]
Abstract
We carried out an exhaustive review of human skin color variation and the extent to which it may be related to vitamin D metabolism and other photosensitive molecules. We discuss evolutionary contexts that modulate this variability and the hypotheses postulated to explain them; for example, a small amount of melanin in the skin facilitates vitamin D production, making fair skin advantageous in an environment with low incident radiation. In contrast, more melanin protects folate from degradation in an environment with a high incidence of radiation. Some Native American populations have a skin color at odds with what would be expected for the amount of radiation in the environment in which they live, a finding that challenges the so-called "vitamin D-folate hypothesis." Since food is also a source of vitamin D, dietary habits should also be considered. Here we argue that a gene network approach provides tools to explain this phenomenon, since it indicates potential alleles co-evolving in a compensatory way. We identified alleles of the vitamin D metabolism and pigmentation pathways segregating together, but in different proportions, in agriculturalists and hunter-gatherers. Finally, we highlight how an evolutionary approach can be useful for understanding current topics of medical interest.
Affiliation(s)
- Bruna Oliveira Missaggia
- Genetics Department, Biosciences Institute, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Guillermo Reales
- Genetics Department, Biosciences Institute, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Gabriela B Cybis
- Statistics Department, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
- Tábita Hünemeier
- Department of Genetics and Evolutionary Biology, Biosciences Institute, Universidade de São Paulo, São Paulo, SP, Brazil
- Maria Cátira Bortolini
- Genetics Department, Biosciences Institute, Universidade Federal do Rio Grande do Sul, Porto Alegre, RS, Brazil
7
Stolyarova AV, Nabieva E, Ptushenko VV, Favorov AV, Popova AV, Neverov AD, Bazykin GA. Senescence and entrenchment in evolution of amino acid sites. Nat Commun 2020; 11:4603. [PMID: 32929079 PMCID: PMC7490271 DOI: 10.1038/s41467-020-18366-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/10/2019] [Accepted: 08/20/2020] [Indexed: 01/01/2023]
Abstract
Amino acid propensities at a site change in the course of protein evolution. This may happen for two reasons. Changes may be triggered by substitutions at epistatically interacting sites elsewhere in the genome. Alternatively, they may arise due to environmental changes that are external to the genome. Here, we design a framework for distinguishing between these alternatives. Using analytical modelling and simulations, we show that they cause opposite dynamics of the fitness of the allele currently occupying the site: it tends to increase with the time since its origin due to epistasis ("entrenchment"), but to decrease due to random environmental fluctuations ("senescence"). By analysing the genomes of vertebrates and insects, we show that the amino acids originating at negatively selected sites experience strong entrenchment. By contrast, the amino acids originating at positively selected sites experience senescence. We propose that senescence of the current allele is a cause of adaptive evolution.
Affiliation(s)
- A V Stolyarova
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Skolkovo, 143028, Russia
- E Nabieva
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Skolkovo, 143028, Russia
- Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, 127051, Russia
- V V Ptushenko
- Department of Photochemistry and Photobiology, N. M. Emanuel Institute of Biochemical Physics of Russian Academy of Sciences, Moscow, 119334, Russia
- A. N. Belozersky Institute of Physical-Chemical Biology, M. V. Lomonosov Moscow State University, Moscow, 119992, Russia
- A V Favorov
- Division of Biostatistics and Bioinformatics, Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, Johns Hopkins School of Medicine, Baltimore, MD, 21205, USA
- Laboratory of System Biology and Computational Genetics, Vavilov Institute of General Genetics, Moscow, 119991, Russia
- A V Popova
- Department of Molecular Diagnostics, Central Research Institute for Epidemiology, Moscow, 111123, Russia
- A D Neverov
- Department of Molecular Diagnostics, Central Research Institute for Epidemiology, Moscow, 111123, Russia
- G A Bazykin
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Skolkovo, 143028, Russia
- Institute for Information Transmission Problems (Kharkevich Institute), Russian Academy of Sciences, Moscow, 127051, Russia
8
Burskaia V, Naumenko S, Schelkunov M, Bedulina D, Neretina T, Kondrashov A, Yampolsky L, Bazykin GA. Excessive Parallelism in Protein Evolution of Lake Baikal Amphipod Species Flock. Genome Biol Evol 2020; 12:1493-1503. [PMID: 32653919 DOI: 10.1093/gbe/evaa138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 07/03/2020] [Indexed: 11/12/2022]
Abstract
Repeated emergence of similar adaptations is often explained by parallel evolution of underlying genes. However, evidence of parallel evolution at the amino acid level is limited. When the analyzed species are highly divergent, this can be due to epistatic interactions underlying the dynamic nature of amino acid preferences: the same amino acid substitution may have different phenotypic effects on different genetic backgrounds. Distantly related species also often inhabit radically different environments, which makes the emergence of parallel adaptations less likely. Here, we hypothesize that parallel molecular adaptations are more prevalent between closely related species. We analyze the rate of parallel evolution in genome-scale sets of orthologous genes in three groups of species with widely ranging levels of divergence: 46 species of the relatively recent Lake Baikal amphipod radiation, a species flock of very closely related cichlids, and a set of significantly more divergent vertebrates. Strikingly, in genes of amphipods, the rate of parallel substitutions at nonsynonymous sites exceeded that at synonymous sites, suggesting rampant selection driving parallel adaptation. At sites of parallel substitutions, intraspecies polymorphism is low, suggesting that parallelism has been driven by positive selection and is therefore adaptive. By contrast, in cichlids, the rate of nonsynonymous parallel evolution was similar to that at synonymous sites, whereas in vertebrates this rate was lower than that at synonymous sites, indicating that in these groups of species parallel substitutions are mainly fixed by drift.
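The abstract compares the rate of parallel substitutions at nonsynonymous sites with the rate at synonymous sites, much like a dN/dS-style ratio. In this sketch, parallel events are counted in each site class, divided by the number of sites in that class, and the two per-site rates are compared. The normalization is deliberately simplified under that assumption and is not the authors' pipeline, which works on phylogenies with inferred substitutions. The function names and the counts in the example are hypothetical. In the abstract, a nonsynonymous rate above the synonymous one (a ratio above 1) in amphipods is read as a sign of selection.

```python
def per_site_rate(parallel_events, n_sites):
    """Parallel substitution events per site within one site class."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return parallel_events / n_sites


def parallel_rate_ratio(nonsyn_events, nonsyn_sites, syn_events, syn_sites):
    """Ratio of nonsynonymous to synonymous per-site parallel substitution rates.

    A value above 1 means parallel substitutions are more frequent per site at
    amino-acid-changing sites than at synonymous sites.
    """
    syn_rate = per_site_rate(syn_events, syn_sites)
    if syn_rate == 0:
        raise ValueError("no synonymous parallel events: ratio undefined")
    return per_site_rate(nonsyn_events, nonsyn_sites) / syn_rate


# Hypothetical counts, for illustration only: 0.1 vs 0.05 events per site.
print(parallel_rate_ratio(300, 3000, 50, 1000))  # prints 2.0
```

Normalizing by the number of sites in each class matters because the two classes differ in size. Comparing raw event counts would confound the number of opportunities for substitution with the rate itself.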
Affiliation(s)
- Valentina Burskaia
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Moscow Oblast, Russia
- Sergey Naumenko
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevitch Institute), Moscow, Russia
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts
- Mikhail Schelkunov
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Moscow Oblast, Russia
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevitch Institute), Moscow, Russia
- Daria Bedulina
- Institute of Biology, Irkutsk State University, Russia
- Baikal Research Centre, Irkutsk, Russia
- Tatyana Neretina
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevitch Institute), Moscow, Russia
- N.A. Pertsov White Sea Biological Station, Lomonosov Moscow State University, Primorskiy, Russia
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Russia
- Alexey Kondrashov
- Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Russia
- Department of Ecology and Evolutionary Biology, University of Michigan
- Lev Yampolsky
- Department of Biological Sciences, East Tennessee State University
- Georgii A Bazykin
- Center of Life Sciences, Skolkovo Institute of Science and Technology, Moscow, Moscow Oblast, Russia
- Institute for Information Transmission Problems of the Russian Academy of Sciences (Kharkevitch Institute), Moscow, Russia
9
Unified rational protein engineering with sequence-based deep representation learning. Nat Methods 2019; 16:1315-1322. [PMID: 31636460 DOI: 10.1038/s41592-019-0598-1] [Citation(s) in RCA: 438] [Impact Index Per Article: 87.6] [Received: 04/08/2019] [Accepted: 09/11/2019] [Indexed: 01/03/2023]
Abstract
Rational protein engineering requires a holistic understanding of protein function. Here, we apply deep learning to unlabeled amino-acid sequences to distill the fundamental features of a protein into a statistical representation that is semantically rich and structurally, evolutionarily and biophysically grounded. We show that the simplest models built on top of this unified representation (UniRep) are broadly applicable and generalize to unseen regions of sequence space. Our data-driven approach predicts the stability of natural and de novo designed proteins, and the quantitative function of molecularly diverse mutants, competitively with state-of-the-art methods. UniRep further enables an efficiency improvement of two orders of magnitude in a protein engineering task. UniRep is a versatile summary of fundamental protein features that can be applied across protein engineering informatics.
10
Kuzminkova AA, Sokol AD, Ushakova KE, Popadin KY, Gunbin KV. mtProtEvol: the resource presenting molecular evolution analysis of proteins involved in the function of Vertebrate mitochondria. BMC Evol Biol 2019; 19:47. [PMID: 30813887 PMCID: PMC6391778 DOI: 10.1186/s12862-019-1371-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 12/20/2022]
Abstract
BACKGROUND Heterotachy is the variation in the evolutionary rate of aligned sites in different parts of the phylogenetic tree. It occurs mainly due to epistatic interactions among substitutions, which are highly complex and make protein evolution difficult to study. The vast majority of computational approaches for studying these epistatic interactions or their evolutionary consequences in proteins are computationally expensive. However, it has recently been shown that the evolution of residue solvent accessibility (RSA) is tightly linked with changes in protein fitness and intra-protein epistatic interactions. This provides a computationally fast alternative: comparing the evolutionary rates of amino acid replacements with the rates of evolutionary change in RSA in order to detect shifts in epistatic interactions. RESULTS Based on RSA information, data randomization and phylogenetic approaches, we constructed a software pipeline that can be used to analyze the evolutionary consequences of intra-protein epistatic interactions with relatively low computational time. We analyzed the evolution of 512 protein families tightly linked to mitochondrial function in vertebrates and created "mtProtEvol", a web resource with data on protein evolution. In strict agreement with lifespan and metabolic rate data, we demonstrated that different functional categories of mitochondria-related proteins are subject to selection for accelerated or decelerated RSA evolution in rodents and primates. For example, accelerated RSA evolution in rodents was found for Krebs cycle enzymes, the respiratory chain and reactive oxygen species metabolism, whereas in primates it was found for stress response, translation and mtDNA integrity. Decelerated RSA evolution in rodents was found for the translational machinery and oxidative stress response components.
CONCLUSIONS mtProtEvol is an interactive resource focused on the evolutionary analysis of epistatic interactions in protein families involved in vertebrate mitochondrial function and is available at http://bioinfodbs.kantiana.ru/mtProtEvol/. This resource and the devised software pipeline may be a useful tool for researchers in the area of protein evolution.
Affiliation(s)
- Anastasia A. Kuzminkova
- Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- Anastasia D. Sokol
- Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- Kristina E. Ushakova
- Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- Konstantin Yu. Popadin
- Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Konstantin V. Gunbin
- Center for Mitochondrial Functional Genomics, School of Life Science, Immanuel Kant Baltic Federal University, Kaliningrad, Russia
- Center of Brain Neurobiology and Neurogenetics, Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
- Novosibirsk State University, Novosibirsk, Russia
11
Doroshkov AV, Konstantinov DK, Afonnikov DA, Gunbin KV. The evolution of gene regulatory networks controlling Arabidopsis thaliana L. trichome development. BMC Plant Biol 2019; 19:53. [PMID: 30813891 PMCID: PMC6393967 DOI: 10.1186/s12870-019-1640-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Indexed: 05/09/2023]
Abstract
BACKGROUND The variation in structure and function of gene regulatory networks (GRNs) participating in organismal development is key to understanding species-specific evolutionary strategies. Even the tiniest modification of a developmental GRN might result in a substantial change in a complex morphogenetic pattern. The great variety of trichomes and their accessibility make them a useful model for studying the molecular processes of cell fate determination, cell cycle control and cellular morphogenesis. A large number of genes regulating the morphogenesis of A. thaliana trichomes have now been described. Here we aimed to study the evolution of the GRN that defines trichome formation and to evaluate its importance in other developmental processes. RESULTS To study the evolution of the trichome-formation GRN, we combined classical phylogenetic analysis with information on the GRN topology and composition in major plant taxa. This approach allowed us to estimate both the times of evolutionary emergence of the GRN components, which are mainly proteins, and the relative rates of their molecular evolution. Various simplifications of protein structure (based on the position of amino acid residues in the protein globule, secondary structure type, and structural disorder) allowed us to demonstrate evolutionary associations between changes in protein globules and speciation/duplication events. We discuss their potential involvement in protein-protein interactions and GRN function. CONCLUSIONS We hypothesize that the divergence and/or specialization of the trichome-forming GRN is linked to the emergence of plant taxa. Information about the structural targets of protein evolution in the GRN may predict switching points in gene network function over the course of evolution. We also propose a list of candidate genes responsible for the development of trichomes in a wide range of plant species.
Affiliation(s)
- Alexey V. Doroshkov
- The Siberian Branch of the Russian Academy of Sciences (IC&G SB RAS), The Institute of Cytology and Genetics, Novosibirsk, Russia
- Novosibirsk State University (NSU), Novosibirsk, Russia
- Dmitrii K. Konstantinov
- The Siberian Branch of the Russian Academy of Sciences (IC&G SB RAS), The Institute of Cytology and Genetics, Novosibirsk, Russia
- Novosibirsk State University (NSU), Novosibirsk, Russia
- Dmitrij A. Afonnikov
- The Siberian Branch of the Russian Academy of Sciences (IC&G SB RAS), The Institute of Cytology and Genetics, Novosibirsk, Russia
- Novosibirsk State University (NSU), Novosibirsk, Russia
- Konstantin V. Gunbin
- Novosibirsk State University (NSU), Novosibirsk, Russia
- School of Life Science, Immanuel Kant Federal Baltic University, Kaliningrad, Russia
- Center of Brain Neurobiology and Neurogenetics, Institute of Cytology and Genetics SB RAS, Novosibirsk, Russia
12
Storz JF. Compensatory mutations and epistasis for protein function. Curr Opin Struct Biol 2018; 50:18-25. [PMID: 29100081 PMCID: PMC5936477 DOI: 10.1016/j.sbi.2017.10.009] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Received: 09/06/2017] [Revised: 10/05/2017] [Accepted: 10/12/2017] [Indexed: 01/09/2023]
Abstract
Adaptive protein evolution may be facilitated by neutral amino acid mutations that confer no benefit when they first arise but which potentiate subsequent function-altering mutations via direct or indirect structural mechanisms. Theoretical and empirical results indicate that such compensatory interactions (intramolecular epistasis) can exert a strong influence on trajectories of protein evolution. For this reason, assessing the form and prevalence of intramolecular epistasis and characterizing biophysical mechanisms of compensatory interaction are important research goals at the nexus of structural biology and molecular evolution. Here I review recent insights derived from protein-engineering studies, and I describe an approach for identifying and characterizing mechanisms of epistasis that integrates experimental data on structure-function relationships with analyses of comparative sequence data.
Affiliation(s)
- Jay F Storz
- University of Nebraska, School of Biological Sciences, Lincoln, NE 68588-0114, United States.
13
Platt A, Weber CC, Liberles DA. Protein evolution depends on multiple distinct population size parameters. BMC Evol Biol 2018; 18:17. [PMID: 29422024 PMCID: PMC5806465 DOI: 10.1186/s12862-017-1085-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/25/2017] [Accepted: 11/20/2017] [Indexed: 01/08/2023]
Abstract
That population size affects the fate of new mutations arising in genomes, modulating both how frequently they arise and how efficiently natural selection is able to filter them, is well established. It is therefore clear that these distinct roles for population size that characterize different processes should affect the evolution of proteins and need to be carefully defined. Empirical evidence is consistent with a role for demography in influencing protein evolution, supporting the idea that functional constraints alone do not determine the composition of coding sequences. Given that the relationship between population size, mutant fitness and fixation probability has been well characterized, estimating fitness from observed substitutions is well within reach with well-formulated models. Molecular evolution research has, therefore, increasingly begun to leverage concepts from population genetics to quantify the selective effects associated with different classes of mutation. However, in order for this type of analysis to provide meaningful information about the intra- and inter-specific evolution of coding sequences, a clear definition of concepts of population size, what they influence, and how they are best parameterized is essential. Here, we present an overview of the many distinct concepts that “population size” and “effective population size” may refer to, what they represent for studying proteins, and how this knowledge can be harnessed to produce better specified models of protein evolution.
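The separate roles of population size described above can be made concrete with one common form of Kimura's diffusion approximation, in which the census size N sets the starting frequency of a new mutant (1/(2N)) while the effective size Ne sets the strength of drift. This is an illustrative sketch, not the authors' model: the function names, the genic-selection convention (heterozygote fitness 1 + s), and the assumption that 4·Ne·|s| is small enough for `exp` not to overflow are ours.

```python
import math


def fixation_probability(s: float, N: int, Ne: float) -> float:
    """Kimura-style diffusion approximation for the fixation probability of a
    single new mutant allele in a diploid population (illustrative sketch).

    s  -- selection coefficient (assumed convention: heterozygote fitness 1 + s)
    N  -- census size; fixes the initial frequency p = 1 / (2N)
    Ne -- effective size; sets how strongly drift opposes selection
    """
    p = 1.0 / (2 * N)
    if s == 0.0:
        # Neutral limit: an allele fixes with probability equal to its frequency.
        return p
    # expm1 keeps precision when 4*Ne*s*p is tiny; assumes 4*Ne*|s| is moderate.
    return math.expm1(-4.0 * Ne * s * p) / math.expm1(-4.0 * Ne * s)


def substitution_rate(mu: float, s: float, N: int, Ne: float) -> float:
    """Substitutions per generation: new mutants per generation (2*N*mu)
    multiplied by the probability that each one fixes."""
    return 2 * N * mu * fixation_probability(s, N, Ne)


if __name__ == "__main__":
    N = 10_000
    for Ne in (100, 1_000, 10_000):
        # The same slightly deleterious mutation fixes far less often as Ne grows.
        print(Ne, substitution_rate(1e-8, -1e-3, N, Ne))
```

In the neutral case the rate reduces to mu whatever N and Ne are; away from neutrality the two sizes enter through different terms, which is the kind of distinction the review argues models should parameterize explicitly.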
Affiliation(s)
- Alexander Platt
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, 19121, USA
- Claudia C Weber
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, 19121, USA
- David A Liberles
- Department of Biology and Center for Computational Genetics and Genomics, Temple University, Philadelphia, 19121, USA.
14
Klink GV, Bazykin GA. Parallel Evolution of Metazoan Mitochondrial Proteins. Genome Biol Evol 2018; 9:1341-1350. [PMID: 28595327 PMCID: PMC5520408 DOI: 10.1093/gbe/evx025] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Accepted: 02/06/2017] [Indexed: 12/11/2022]
Abstract
Amino acid propensities at amino acid sites change with time due to epistatic interactions or a changing environment, affecting the probabilities of fixation of different amino acids. Such changes should lead to an increased rate of homoplasies (reversals, parallelisms, and convergences) in closely related species. Here, we reconstruct the phylogeny of twelve mitochondrial proteins from several thousand metazoan species, and measure the phylogenetic distances between branches at which either the same allele originated repeatedly due to homoplasies, or different alleles originated due to divergent substitutions. The mean phylogenetic distance between parallel substitutions is ∼20% lower than the mean phylogenetic distance between divergent substitutions, indicating that a variant fixed in a species is more likely to be deleterious in a more phylogenetically remote species, compared with a more closely related species. These findings are robust to artefacts of phylogenetic reconstruction or of pooling of sites from different conservation classes or functional groups, and imply that single-position fitness landscapes change at rates similar to rates of amino acid changes.
Affiliation(s)
- Galya V Klink
- Institute for Information Transmission Problems (Kharkevich Institute) of the Russian Academy of Sciences, Moscow, Russia
- Georgii A Bazykin
- Institute for Information Transmission Problems (Kharkevich Institute) of the Russian Academy of Sciences, Moscow, Russia
- Skolkovo Institute of Science and Technology, Skolkovo, Russia
15
Goldstein RA, Pollock DD. Sequence entropy of folding and the absolute rate of amino acid substitutions. Nat Ecol Evol 2017; 1:1923-1930. [PMID: 29062121 PMCID: PMC5701738 DOI: 10.1038/s41559-017-0338-9] [Citation(s) in RCA: 39] [Impact Index Per Article: 5.6] [Received: 01/12/2017] [Accepted: 09/05/2017] [Indexed: 12/01/2022]
Abstract
Adequate representations of protein evolution should consider how the acceptance of mutations depends on the sequence context in which they arise. However, epistatic interactions among sites in a protein result in temporal and spatial heterogeneity in substitution rates beyond the capabilities of current models. Here, we exploit parallels between amino acid substitutions and chemical reaction kinetics to develop an improved theory of protein evolution. We constructed a mechanistic framework for modelling amino acid substitution rates that employs the formalisms of statistical mechanics, with population genetics principles underlying the analysis. Theoretical analyses and computer simulations of proteins under purifying selection for thermodynamic stability show that substitution rates and the stabilisation of resident amino acids (the ‘evolutionary Stokes shift’) can be predicted from biophysics and the effect of sequence entropy alone. Furthermore, we demonstrate that substitutions predominantly occur when epistatic interactions result in near neutrality; substitution rates are determined by how often epistasis results in such nearly neutral conditions. This theory provides a general framework for modelling protein sequence change under purifying selection, potentially explains patterns of convergence and mutation rates in real proteins that are incompatible with previous models, and provides a better null model for the detection of adaptive changes.
Affiliation(s)
- Richard A Goldstein
- Division of Infection and Immunity, University College London, London, WC1E 6BT, UK
- David D Pollock
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO, 80045, USA.
16
Ivankov DN. Exact correspondence between walk in nucleotide and protein sequence spaces. PLoS One 2017; 12:e0182525. [PMID: 28800638 PMCID: PMC5553642 DOI: 10.1371/journal.pone.0182525] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 02/16/2017] [Accepted: 07/07/2017] [Indexed: 11/29/2022]
Abstract
In the course of evolution, genes traverse the nucleotide sequence space, which translates to a trajectory of changes in the protein sequence in protein sequence space. The correspondence between regions of the nucleotide and protein sequence spaces is understood in general but not in detail. One of the unexplored questions is how many sequences a protein can reach with a certain number of nucleotide substitutions in its gene sequence. Here I propose an algorithm to calculate the volume of protein sequence space accessible to a given protein sequence as a function of the number of nucleotide substitutions made in the protein-coding sequence. The algorithm uses dynamic programming and completes all calculations within a couple of seconds on a desktop computer. I apply the algorithm to green fluorescent protein and obtain a number of sequences four times higher than previously estimated. However, given the astronomically large size of the protein sequence space, the previous estimate can still be considered acceptable as an order-of-magnitude estimate. The proposed algorithm has practical applications in the study of evolutionary trajectories in sequence space.
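The counting problem above lends itself to dynamic programming over codons. Because codons change independently, the fewest nucleotide substitutions needed for a gene to encode a given protein is the sum of per-codon minima, so per-codon counts can be combined by polynomial convolution. This is a minimal sketch of that idea, not necessarily Ivankov's exact algorithm; it assumes the standard genetic code, counts proteins reachable within at most k substitutions (the starting protein included), and excludes sequences containing stop codons. `reachable_proteins` and `codon_distance_counts` are hypothetical names.

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_distance_counts(codon: str) -> list[int]:
    """counts[d] = number of amino acids (stops excluded) whose closest
    codon lies exactly d nucleotide substitutions away from `codon`."""
    nearest: dict[str, int] = {}
    for other, aa in GENETIC_CODE.items():
        if aa == "*":
            continue
        d = sum(x != y for x, y in zip(codon, other))
        nearest[aa] = min(d, nearest.get(aa, 3))
    counts = [0, 0, 0, 0]
    for d in nearest.values():
        counts[d] += 1
    return counts


def reachable_proteins(gene: str, k: int) -> int:
    """Number of distinct stop-free protein sequences encoded by some gene
    within at most k nucleotide substitutions of `gene`."""
    if len(gene) % 3:
        raise ValueError("gene length must be a multiple of 3")
    # ways[d] = number of protein prefixes whose minimal distance is exactly d
    ways = [1]
    for i in range(0, len(gene), 3):
        counts = codon_distance_counts(gene[i:i + 3].upper())
        new = [0] * (len(ways) + 3)
        for d1, n1 in enumerate(ways):
            for d2, n2 in enumerate(counts):
                new[d1 + d2] += n1 * n2
        # Distances only grow as codons are added, so anything above k is dropped.
        ways = new[: k + 1]
    return sum(ways)


print(reachable_proteins("ATGATG", 1))  # 13
```

For the toy gene ATGATG, one substitution reaches the original protein plus six single-residue variants at each of the two positions. Keeping only the distance histogram rather than enumerating sequences is what lets this kind of calculation scale to a full-length protein.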
Affiliation(s)
- Dmitry N. Ivankov
- Laboratory of Evolutionary Genomics, Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Laboratory of Protein Physics, Institute of Protein Research of the Russian Academy of Sciences, Pushchino, Moscow region, Russia
17
Bazykin GA. Changing preferences: deformation of single position amino acid fitness landscapes and evolution of proteins. Biol Lett 2016; 11:rsbl.2015.0315. [PMID: 26445980 DOI: 10.1098/rsbl.2015.0315] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Indexed: 01/20/2023]
Abstract
The fitness landscape (the function that relates genotypes to fitness) and its role in directing evolution are central objects of evolutionary biology. However, its huge dimensionality precludes understanding of even the basic aspects of its shape. One way to approach it is to ask a simpler question: what are the properties of a function that assigns fitness to each possible variant at just one particular site (a single-position fitness landscape), and how does it change in the course of evolution? Analyses of genomic data from multiple species and multiple individuals within a species have proved beyond reasonable doubt that fitness functions of positions throughout the genome do themselves change with time, thus shaping protein evolution. Here, I will briefly review the literature that addresses these dynamics, focusing on recent genome-scale analyses of fitness functions of amino acid sites, i.e. vectors of fitnesses of 20 individual amino acid variants at a given position of a protein. The set of amino acids that confer high fitness at a particular position changes with time, and the rate of this change is comparable with the rate at which a position evolves, implying that this process plays a major role in evolutionary dynamics. However, the causes of these changes remain largely unclear.
Affiliation(s)
- Georgii A Bazykin
- Institute for Information Transmission Problems (Kharkevich Institute) of the Russian Academy of Sciences, Moscow 127051, Russia
- Faculty of Bioengineering and Bioinformatics and Belozersky Institute of Physico-Chemical Biology, Lomonosov Moscow State University, Moscow 119234, Russia
- Pirogov Russian National Research Medical University, Moscow 117997, Russia
18
Starr TN, Thornton JW. Epistasis in protein evolution. Protein Sci 2016; 25:1204-18. [PMID: 26833806 PMCID: PMC4918427 DOI: 10.1002/pro.2897] [Citation(s) in RCA: 285] [Impact Index Per Article: 35.6] [Received: 12/02/2015] [Revised: 01/25/2016] [Accepted: 01/27/2016] [Indexed: 01/18/2023]
Abstract
The structure, function, and evolution of proteins depend on physical and genetic interactions among amino acids. Recent studies have used new strategies to explore the prevalence, biochemical mechanisms, and evolutionary implications of these interactions, called epistasis, within proteins. Here we describe an emerging picture of pervasive epistasis in which the physical and biological effects of mutations change over the course of evolution in a lineage-specific fashion. Epistasis can restrict the trajectories available to an evolving protein or open new paths to sequences and functions that would otherwise have been inaccessible. We describe two broad classes of epistatic interactions, which arise from different physical mechanisms and have different effects on evolutionary processes. Specific epistasis, in which one mutation influences the phenotypic effect of few other mutations, is caused by direct and indirect physical interactions between mutations, which nonadditively change the protein's physical properties, such as conformation, stability, or affinity for ligands. In contrast, nonspecific epistasis describes mutations that modify the effect of many others; these typically behave additively with respect to the physical properties of a protein but exhibit epistasis because of a nonlinear relationship between the physical properties and their biological effects, such as function or fitness. Both types of interaction are rampant, but specific epistasis has stronger effects on the rate and outcomes of evolution, because it imposes stricter constraints and modulates evolutionary potential more dramatically; it therefore makes evolution more contingent on low-probability historical events and leaves stronger marks on the sequences, structures, and functions of protein families.
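The nonspecific case described above can be illustrated with a two-state folding model, a standard textbook example rather than one taken from the review: two mutations that add exactly in folding free energy become non-additive once energy is mapped through the sigmoidal fraction-folded curve. The stabilities, the RT value, and the function names are assumptions chosen only for illustration.

```python
import math

RT = 0.593  # kcal/mol near 25 C (assumed)


def fraction_folded(stability: float) -> float:
    """Two-state model. `stability` is the unfolding free energy in kcal/mol;
    positive values favor the folded state."""
    return 1.0 / (1.0 + math.exp(-stability / RT))


def epistasis(wt: float, a: float, b: float, ab: float) -> float:
    """Deviation of the double mutant from additivity of the single-mutant
    effects, measured on whatever scale the inputs are given in."""
    return (ab - wt) - ((a - wt) + (b - wt))


# Hypothetical values: two destabilizing mutations that add exactly in energy.
wt_stability = 3.0
ddg_a = ddg_b = -2.0
stability = {
    "wt": wt_stability,
    "a": wt_stability + ddg_a,
    "b": wt_stability + ddg_b,
    "ab": wt_stability + ddg_a + ddg_b,
}
order = ("wt", "a", "b", "ab")

energy_epistasis = epistasis(*(stability[g] for g in order))
folded = {g: fraction_folded(v) for g, v in stability.items()}
function_epistasis = epistasis(*(folded[g] for g in order))

print(f"epistasis on the energy scale:   {energy_epistasis:+.3f}")
print(f"epistasis on the function scale: {function_epistasis:+.3f}")
```

With these numbers each single mutant still keeps most of its protein folded while the double mutant is mostly unfolded, so epistasis is zero on the energy scale but strongly negative on the function scale.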
Affiliation(s)
- Tyler N Starr
- Graduate Program in Biochemistry and Molecular Biophysics, University of Chicago, Chicago, Illinois, 60637
- Joseph W Thornton
- Departments of Ecology and Evolution and Human Genetics, University of Chicago, Chicago, Illinois, 60637
19
Epistasis and the Dynamics of Reversion in Molecular Evolution. Genetics 2016; 203:1335-51. [PMID: 27194749 DOI: 10.1534/genetics.116.188961] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.9] [Received: 03/08/2016] [Accepted: 04/27/2016] [Indexed: 12/27/2022]
Abstract
Recent studies of protein evolution contend that the longer an amino acid substitution is present at a site, the less likely it is to revert to the amino acid previously occupying that site. Here we study this phenomenon of decreasing reversion rates rigorously and in a much more general context. We show that, under weak mutation and for arbitrary fitness landscapes, reversion rates decrease with time for any site that is involved in at least one epistatic interaction. Specifically, we prove that, at stationarity, the hazard function of the distribution of waiting times until reversion is strictly decreasing for any such site. Thus, in the presence of epistasis, the longer a particular character has been absent from a site, the less likely the site will revert to its prior state. We also explore several examples of this general result, which share a common pattern whereby the probability of having reverted increases rapidly at short times to some substantial value before becoming almost flat after a few substitutions at other sites. This pattern indicates a characteristic tendency for reversion to occur either almost immediately after the initial substitution or only after a very long time.
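A simplified toy, not the paper's proof, shows why heterogeneous reversion rates produce a decreasing hazard. Suppose epistasis makes the reversion rate at a site depend on the state of an interacting site, and that state is treated as frozen: the waiting time is then a mixture of exponentials, and its hazard h(t) = f(t)/S(t) falls from the weighted mean rate toward the slowest rate, because the sites that have not yet reverted are increasingly those in slow (entrenched) backgrounds. The weights, rates, and function name are hypothetical.

```python
import math


def mixture_hazard(t: float, weights: list[float], rates: list[float]) -> float:
    """Hazard h(t) = f(t) / S(t) for a waiting time drawn from a mixture of
    exponentials: with probability weights[i] the rate is rates[i]."""
    survival = sum(w * math.exp(-r * t) for w, r in zip(weights, rates))
    density = sum(w * r * math.exp(-r * t) for w, r in zip(weights, rates))
    return density / survival


# Hypothetical: half of backgrounds tolerate reversion (fast), half resist it (slow).
weights, rates = [0.5, 0.5], [2.0, 0.1]
for t in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(f"t={t:>4}: hazard={mixture_hazard(t, weights, rates):.3f}")
```

With a single rate (no dependence on background) the hazard is constant, which is the memoryless, epistasis-free baseline; the paper's result concerns the more general case in which the background keeps evolving.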
20
Local fitness landscape of the green fluorescent protein. Nature 2016; 533:397-401. [PMID: 27193686 PMCID: PMC4968632 DOI: 10.1038/nature17995] [Citation(s) in RCA: 275] [Impact Index Per Article: 34.4] [Received: 04/28/2015] [Accepted: 04/07/2016] [Indexed: 01/16/2023]
Abstract
Fitness landscapes [1,2], depictions of how genotypes manifest at the phenotypic level, form the basis for our understanding of many areas of biology [2–7], yet their properties remain elusive. Studies addressing this issue often consider specific genes and their function as a proxy for fitness [2,4], experimentally assessing the impact on function of single mutations and their combinations in a specific sequence [2,5,8–15] or in different sequences [2,3,5,16–18]. However, systematic high-throughput studies of the local fitness landscape of an entire protein have not yet been reported. Here, we chart an extensive region of the local fitness landscape of the green fluorescent protein from Aequorea victoria (avGFP) by measuring the native function, fluorescence, of tens of thousands of derivative genotypes of avGFP. We find that its fitness landscape is narrow, with half of genotypes with two mutations showing reduced fluorescence and half of genotypes with five mutations being completely non-fluorescent. The narrowness is enhanced by epistasis, which was detected in up to 30% of genotypes with multiple mutations, arising mostly through the cumulative impact of slightly deleterious mutations causing a threshold-like decrease of protein stability and concomitant loss of fluorescence. A model of orthologous sequence divergence spanning hundreds of millions of years predicted the extent of epistasis in our data, indicating congruence between the fitness landscape properties at the local and global scales. The characterization of the local fitness landscape of avGFP has important implications for a number of fields including molecular evolution, population genetics and protein design.
21
Julien P, Miñana B, Baeza-Centurion P, Valcárcel J, Lehner B. The complete local genotype-phenotype landscape for the alternative splicing of a human exon. Nat Commun 2016; 7:11558. [PMID: 27161764 PMCID: PMC4866304 DOI: 10.1038/ncomms11558] [Citation(s) in RCA: 72] [Impact Index Per Article: 9.0] [Received: 10/01/2015] [Accepted: 04/08/2016] [Indexed: 01/21/2023]
Abstract
The properties of genotype–phenotype landscapes are crucial for understanding evolution but are not characterized for most traits. Here, we present a >95% complete local landscape for a defined molecular function—the alternative splicing of a human exon (FAS/CD95 exon 6, involved in the control of apoptosis). The landscape provides important mechanistic insights, revealing that regulatory information is dispersed throughout nearly every nucleotide in an exon, that the exon is more robust to the effects of mutations than its immediate neighbours in genotype space, and that high mutation sensitivity (evolvability) will drive the rapid divergence of alternative splicing between species unless it is constrained by selection. Moreover, the extensive epistasis in the landscape predicts that exonic regulatory sequences may diverge between species even when exon inclusion levels are functionally important and conserved by selection. Genotype–phenotype landscapes are an important characteristic for understanding the evolution of traits. Here the authors construct the local landscape for the alternative splicing of FAS/CD95 exon 6, revealing the regulation of splicing and the evolution of regulatory information between species.
Affiliation(s)
- Philippe Julien
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
- Universitat Pompeu Fabra (UPF), Barcelona 08003, Spain
- Belén Miñana
- Universitat Pompeu Fabra (UPF), Barcelona 08003, Spain
- Gene Regulation, Stem Cells and Cancer Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
- Pablo Baeza-Centurion
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
- Universitat Pompeu Fabra (UPF), Barcelona 08003, Spain
- Juan Valcárcel
- Universitat Pompeu Fabra (UPF), Barcelona 08003, Spain
- Gene Regulation, Stem Cells and Cancer Program, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluís Companys 23, 08010 Barcelona, Spain
- Ben Lehner
- EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Dr Aiguader 88, Barcelona 08003, Spain
- Universitat Pompeu Fabra (UPF), Barcelona 08003, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Pg. Lluís Companys 23, 08010 Barcelona, Spain
22
Abstract
To what extent is the convergent evolution of protein function attributable to convergent or parallel changes at the amino acid level? The mutations that contribute to adaptive protein evolution may represent a biased subset of all possible beneficial mutations owing to mutation bias and/or variation in the magnitude of deleterious pleiotropy. A key finding is that the fitness effects of amino acid mutations are often conditional on genetic background. This context dependence (epistasis) can reduce the probability of convergence and parallelism because it reduces the number of possible mutations that are unconditionally acceptable in divergent genetic backgrounds. Here, I review factors that influence the probability of replicated evolution at the molecular level.
Affiliation(s)
- Jay F Storz
- School of Biological Sciences, University of Nebraska, Lincoln, Nebraska 68588, USA
23
Arenas M. Trends in substitution models of molecular evolution. Front Genet 2015; 6:319. [PMID: 26579193 PMCID: PMC4620419 DOI: 10.3389/fgene.2015.00319] [Citation(s) in RCA: 78] [Impact Index Per Article: 8.7] [Received: 07/27/2015] [Accepted: 10/09/2015] [Indexed: 11/13/2022]
Abstract
Substitution models of evolution describe the process of genetic variation through fixed mutations and constitute the basis of evolutionary analysis at the molecular level. Almost 40 years after the development of the first substitution models, highly sophisticated and data-specific substitution models continue to emerge with the aim of better mimicking real evolutionary processes. Here I describe current trends in substitution models of DNA, codon and amino acid sequence evolution, including advantages and pitfalls of the most popular models. The perspective concludes that despite the large number of currently available substitution models, further research is required for more realistic modeling, especially for DNA coding and amino acid data. Additionally, the development of more accurate complex models should be coupled with new implementations and improvements of methods and frameworks for substitution model selection and downstream evolutionary analysis.
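As a concrete example of what a DNA substitution model contributes, the oldest and simplest one, Jukes-Cantor (JC69), converts the observed proportion p of differing sites into an expected number of substitutions per site, d = -(3/4) ln(1 - 4p/3), correcting for multiple hits at the same site. The sketch below assumes JC69's defining simplifications (equal base frequencies, equal rates between all bases) and uses hypothetical function names and toy sequences; the richer models surveyed in the perspective relax exactly these assumptions.

```python
import math


def jc69_distance(p: float) -> float:
    """Jukes-Cantor (1969) distance: expected substitutions per site given the
    observed proportion p of sites that differ between two DNA sequences."""
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc69_expected_p(d: float) -> float:
    """Inverse relation: expected proportion of differing sites after d
    substitutions per site under JC69 (saturates at 0.75)."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned, equal-length positions that differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and of equal length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)


p = p_distance("ACGTACGTAC", "ACGTTCGAAC")
print(p, jc69_distance(p))
```

The corrected distance is always at least as large as p, and the gap widens as p approaches the saturation value of 0.75, where the observed differences no longer carry information about how many substitutions occurred.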
Affiliation(s)
- Miguel Arenas
- Institute of Molecular Pathology and Immunology of the University of Porto, Porto, Portugal
24
Contingency and entrenchment in protein evolution under purifying selection. Proc Natl Acad Sci U S A 2015; 112:E3226-35. [PMID: 26056312 DOI: 10.1073/pnas.1412933112] [Citation(s) in RCA: 140] [Impact Index Per Article: 15.6] [Indexed: 01/06/2023]
Abstract
The phenotypic effect of an allele at one genetic site may depend on alleles at other sites, a phenomenon known as epistasis. Epistasis can profoundly influence the process of evolution in populations and shape the patterns of protein divergence across species. Whereas epistasis between adaptive substitutions has been studied extensively, relatively little is known about epistasis under purifying selection. Here we use computational models of thermodynamic stability in a ligand-binding protein to explore the structure of epistasis in simulations of protein sequence evolution. Even though the predicted effects on stability of random mutations are almost completely additive, the mutations that fix under purifying selection are enriched for epistasis. In particular, the mutations that fix are contingent on previous substitutions: Although nearly neutral at their time of fixation, these mutations would be deleterious in the absence of preceding substitutions. Conversely, substitutions under purifying selection are subsequently entrenched by epistasis with later substitutions: They become increasingly deleterious to revert over time. Our results imply that, even under purifying selection, protein sequence evolution is often contingent on history and so it cannot be predicted by the phenotypic effects of mutations assayed in the ancestral background.