1
Feyertag F, Berninsone PM, Alvarez-Ponce D. N-glycoproteins exhibit a positive expression level-evolutionary rate correlation. J Evol Biol 2019; 32:390-394. [PMID: 30697857 DOI: 10.1111/jeb.13420] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 10/30/2018] [Revised: 01/23/2019] [Accepted: 01/25/2019] [Indexed: 12/22/2022]
Abstract
The different proteins of any proteome evolve at enormously different rates. One of the primary factors influencing rates of protein evolution is expression level, with highly expressed proteins tending to evolve at slow rates. This phenomenon, known as the expression level-evolutionary rate (E-R) anticorrelation, has been attributed to the abundance-dependent deleterious effects of misfolding or misinteraction. We have recently shown that secreted proteins either lack an E-R anticorrelation or exhibit a significantly reduced E-R anticorrelation. This effect may be due to the strict quality control to which secreted proteins are subject in the endoplasmic reticulum (which is expected to reduce the rate of misfolding and its deleterious effects) or to their extracellular location (expected to reduce the rate of misinteraction and its deleterious effects). Among secreted proteins, N-glycosylated ones are under particularly strong quality control. Here, we investigate how N-linked glycosylation affects the E-R anticorrelation. Strikingly, we observe a positive E-R correlation among N-glycosylated proteins. That is, N-glycoproteins that are highly expressed evolve at faster rates than lowly expressed N-glycoproteins, in contrast to what is observed among intracellular proteins.
Affiliation(s)
- Felix Feyertag
- Department of Biology, University of Nevada, Reno, Reno, Nevada
2
Alvarez-Ponce D, Feyertag F, Chakraborty S. Position Matters: Network Centrality Considerably Impacts Rates of Protein Evolution in the Human Protein-Protein Interaction Network. Genome Biol Evol 2018; 9:1742-1756. [PMID: 28854629 PMCID: PMC5570066 DOI: 10.1093/gbe/evx117] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Accepted: 07/01/2017] [Indexed: 02/06/2023]
Abstract
The proteins of any organism evolve at disparate rates. A long list of factors affecting rates of protein evolution have been identified. However, the relative importance of each factor in determining rates of protein evolution remains unresolved. The prevailing view is that evolutionary rates are dominantly determined by gene expression, and that other factors such as network centrality have only a marginal effect, if any. However, this view is largely based on analyses in yeasts, and accurately measuring the importance of the determinants of rates of protein evolution is complicated by the fact that the different factors are often correlated with each other, and by the relatively poor quality of available functional genomics data sets. Here, we use correlation, partial correlation and principal component regression analyses to measure the contributions of several factors to the variability of the rates of evolution of human proteins. For this purpose, we analyzed the entire human protein–protein interaction data set and the human signal transduction network—a network data set of exceptionally high quality, obtained by manual curation, which is expected to be virtually free from false positives. In contrast with the prevailing view, we observe that network centrality (measured as the number of physical and nonphysical interactions, betweenness, and closeness) has a considerable impact on rates of protein evolution. Surprisingly, the impact of centrality on rates of protein evolution seems to be comparable, or even superior according to some analyses, to that of gene expression. Our observations seem to be independent of potentially confounding factors and from the limitations (biases and errors) of interactomic data sets.
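The partial correlation analysis mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the toy numbers are invented for illustration, and Pearson correlation is used for simplicity (rank-transforming the inputs first would give a Spearman-style variant). It computes the first-order partial correlation between two variables (for example, evolutionary rate and network centrality) while controlling for a third (for example, expression level).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for z.

    Uses r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2)).
    """
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data (invented): x and y are both driven by z and share nothing else,
# so their raw correlation is positive but their partial correlation vanishes.
z = [1, 1, -1, -1]
x = [2, 0, 0, -2]
y = [2, 0, -2, 0]
raw = pearson(x, y)
controlled = partial_correlation(x, y, z)
```

In this toy case the raw correlation between `x` and `y` is entirely explained by `z`, which is the kind of confounding the abstract says must be controlled for when factors are correlated with each other.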
3
Feyertag F, Alvarez-Ponce D. Disulfide Bonds Enable Accelerated Protein Evolution. Mol Biol Evol 2018; 34:1833-1837. [PMID: 28431018 DOI: 10.1093/molbev/msx135] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Indexed: 02/07/2023]
Abstract
The different proteins of any proteome evolve at enormously different rates. What factors contribute to this variability, and to what extent, is still a largely open question. We hypothesized that disulfide bonds, by increasing protein stability, should make proteins' structures relatively independent of their amino acid sequences, thus acting as buffers of deleterious mutations and enabling accelerated sequence evolution. In agreement with this hypothesis, we observed that membrane proteins with disulfide bonds evolved 88% faster than those without disulfide bonds, and that extracellular proteins with disulfide bonds evolved 49% faster than those without disulfide bonds. In addition, genes encoding proteins with disulfide bonds exhibit an increased likelihood of showing signatures of positive selection. Multivariate analyses indicate that the trend is independent of a number of potentially confounding factors. The effect, however, is not observed among the longest proteins, which can become stabilized by mechanisms other than disulfide bonds.
Affiliation(s)
- Felix Feyertag
- Department of Biology, University of Nevada-Reno, Reno, NV
4
Alvarez-Ponce D, Sabater-Muñoz B, Toft C, Ruiz-González MX, Fares MA. Essentiality Is a Strong Determinant of Protein Rates of Evolution during Mutation Accumulation Experiments in Escherichia coli. Genome Biol Evol 2016; 8:2914-2927. [PMID: 27566759 PMCID: PMC5630975 DOI: 10.1093/gbe/evw205] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 12/16/2022]
Abstract
The Neutral Theory of Molecular Evolution is considered the most powerful theory to understand the evolutionary behavior of proteins. One of the main predictions of this theory is that essential proteins should evolve slower than dispensable ones owing to increased selective constraints. Comparison of genomes of different species, however, has revealed only small differences between the rates of evolution of essential and nonessential proteins. In some analyses, these differences vanish once confounding factors are controlled for, whereas in other cases essentiality seems to have an independent, albeit small, effect. It has been argued that comparing relatively distant genomes may entail a number of limitations. For instance, many of the genes that are dispensable in controlled lab conditions may be essential in some of the conditions faced in nature. Moreover, essentiality can change during evolution, and rates of protein evolution are simultaneously shaped by a variety of factors, whose individual effects are difficult to isolate. Here, we conducted two parallel mutation accumulation experiments in Escherichia coli, during 5,500–5,750 generations, and compared the genomes at different points of the experiments. Our approach (a short-term experiment, under highly controlled conditions) enabled us to overcome many of the limitations of previous studies. We observed that essential proteins evolved substantially slower than nonessential ones during our experiments. Strikingly, rates of protein evolution were only moderately affected by expression level and protein length.
Affiliation(s)
- Beatriz Sabater-Muñoz
- Instituto de Biología Molecular y Celular de Plantas (CSIC-UPV), Valencia, Spain; Department of Genetics, Smurfit Institute of Genetics, University of Dublin, Trinity College Dublin, Dublin, Ireland
- Christina Toft
- Department of Genetics, University of Valencia, Valencia, Spain; Departamento de Biotecnología, Instituto de Agroquímica y Tecnología de los Alimentos (CSIC), Valencia, Spain
- Mario X Ruiz-González
- Instituto de Biología Molecular y Celular de Plantas (CSIC-UPV), Valencia, Spain; Current Address: Secretaría de Educación Superior, Ciencia, Tecnología e Innovación, Proyecto Prometeo; Departamento de Ciencias Biológicas, Universidad Técnica Particular de Loja, Loja, Ecuador
- Mario A Fares
- Instituto de Biología Molecular y Celular de Plantas (CSIC-UPV), Valencia, Spain; Department of Genetics, Smurfit Institute of Genetics, University of Dublin, Trinity College Dublin, Dublin, Ireland
5
Faure G, Koonin EV. Universal distribution of mutational effects on protein stability, uncoupling of protein robustness from sequence evolution and distinct evolutionary modes of prokaryotic and eukaryotic proteins. Phys Biol 2015; 12:035001. [PMID: 25927823 DOI: 10.1088/1478-3975/12/3/035001] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 01/04/2023]
Abstract
Robustness to destabilizing effects of mutations is thought of as a key factor of protein evolution. The connections between two measures of robustness, the relative core size and the computationally estimated effect of mutations on protein stability (ΔΔG), protein abundance and the selection pressure on protein-coding genes (dN/dS) were analyzed for the organisms with a large number of available protein structures including four eukaryotes, two bacteria and one archaeon. The distribution of the effects of mutations in the core on protein stability is universal and indistinguishable in eukaryotes and bacteria, centered at slightly destabilizing amino acid replacements, and with a heavy tail of more strongly destabilizing replacements. The distribution of mutational effects in the hyperthermophilic archaeon Thermococcus gammatolerans is significantly shifted toward strongly destabilizing replacements which is indicative of stronger constraints that are imposed on proteins in hyperthermophiles. The median effect of mutations is strongly, positively correlated with the relative core size, in evidence of the congruence between the two measures of protein robustness. However, both measures show only limited correlations to the expression level and selection pressure on protein-coding genes. Thus, the degree of robustness reflected in the universal distribution of mutational effects appears to be a fundamental, ancient feature of globular protein folds whereas the observed variations are largely neutral and uncoupled from short term protein evolution. A weak anticorrelation between protein core size and selection pressure is observed only for surface residues in prokaryotes but a stronger anticorrelation is observed for all residues in eukaryotic proteins. This substantial difference between proteins of prokaryotes and eukaryotes is likely to stem from the demonstrable higher compactness of prokaryotic proteins.
Affiliation(s)
- Guilhem Faure
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
6
Abstract
Levels of selective constraint vary among proteins. Although strong constraint on a protein is often attributed to its functional importance, evolutionary rate may also be limited if a protein is fragile, such that a large proportion of amino acid replacements reduce its fitness. To determine the relative contributions of essentiality and fragility to selective constraint, we compared relationships of selection against nonsense mutations (snon) and selection against missense mutations (smis) to protein sequence conservation (Ka). As expected, snon is greater than smis; however, the correlation between smis and Ka is nearly three times stronger than the correlation between snon and Ka. Moreover, examination of relationships to gene expression level, tissue specificity, and number of protein-protein interactions shows that smis is more strongly correlated than snon to all three measures of biological function. Thus, our analysis reveals that slowly evolving proteins are under strong selective constraint primarily because they are fragile, and that this association likely exists because allowing a protein to function improperly, rather than removing it from a biological network, can negatively affect the functions of other molecules it interacts with and their downstream products.
Affiliation(s)
- Raquel Assis
- Department of Biology, Pennsylvania State University
7
Alvarez-Ponce D, Fares MA. Evolutionary rate and duplicability in the Arabidopsis thaliana protein-protein interaction network. Genome Biol Evol 2013; 4:1263-74. [PMID: 23160177 PMCID: PMC3542556 DOI: 10.1093/gbe/evs101] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Indexed: 12/13/2022]
Abstract
Genes show a bewildering variation in their patterns of molecular evolution, as a result of the action of different levels and types of selective forces. The factors underlying this variation are, however, still poorly understood. In the last decade, the position of proteins in the protein-protein interaction network has been put forward as a determinant factor of the evolutionary rate and duplicability of their encoding genes. This conclusion, however, has been based on the analysis of the limited number of microbes and animals for which interactome-level data are available (essentially, Escherichia coli, yeast, worm, fly, and humans). Here, we study, for the first time, the relationship between the position of proteins in the high-density interactome of a plant (Arabidopsis thaliana) and the patterns of molecular evolution of their encoding genes. We found that genes whose encoded products act at the center of the network are more evolutionarily constrained than those acting at the network periphery. This trend remains significant when potential confounding factors (gene expression level and breadth, duplicability, function, and length of the encoded products) are controlled for. Even though the correlation between centrality measures and rates of evolution is generally weak, for some functional categories, it is comparable in strength to (or even stronger than) the correlation between evolutionary rates and expression levels or breadths. In addition, genes encoding interacting proteins in the network evolve at relatively similar rates. Finally, Arabidopsis proteins encoded by duplicated genes are more highly connected than those encoded by singleton genes. This observation is in agreement with the patterns observed in humans, but in contrast with those observed in E. coli, yeast, worm, and fly (whose duplicated genes tend to act at the periphery of the network), implying that the relationship between duplicability and centrality inverted at least twice during eukaryote evolution. Taken together, these results indicate that the structure of the A. thaliana network constrains the evolution of its components at multiple levels.
Affiliation(s)
- David Alvarez-Ponce
- Department of Abiotic Stress, Integrative and Systems Biology Laboratory, Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas (CSIC-UPV), Valencia, Spain.
8
Park S, Yang JS, Kim J, Shin YE, Hwang J, Park J, Jang SK, Kim S. Evolutionary history of human disease genes reveals phenotypic connections and comorbidity among genetic diseases. Sci Rep 2012; 2:757. [PMID: 23091697 PMCID: PMC3477654 DOI: 10.1038/srep00757] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Received: 07/05/2012] [Accepted: 10/03/2012] [Indexed: 01/02/2023]
Abstract
The extent to which evolutionary changes have impacted the phenotypic relationships among human diseases remains unclear. In this work, we report that phenotypically similar diseases are connected by the evolutionary constraints on human disease genes. Human disease groups can be classified into slowly or rapidly evolving classes, where the diseases in the slowly evolving class are enriched with morphological phenotypes and those in the rapidly evolving class are enriched with physiological phenotypes. Our findings establish a clear evolutionary connection between disease classes and disease phenotypes for the first time. Furthermore, the high comorbidity found between diseases connected by similar evolutionary constraints enables us to improve the predictability of the relative risk of human diseases. We find the evolutionary constraints on disease genes are a new layer of molecular connection in the network-based exploration of human diseases.
Affiliation(s)
- Solip Park
- School of Interdisciplinary Bioscience and Bioengineering, Biotechnology Research Center, Pohang University of Science and Technology, Pohang, Korea
9
Rorick M. Quantifying protein modularity and evolvability: a comparison of different techniques. Biosystems 2012; 110:22-33. [PMID: 22796584 DOI: 10.1016/j.biosystems.2012.06.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/15/2011] [Revised: 06/20/2012] [Accepted: 06/27/2012] [Indexed: 10/28/2022]
Abstract
Modularity increases evolvability by reducing constraints on adaptation and by allowing preexisting parts to function in new contexts for novel uses. Protein evolution provides an excellent context to study the causes and consequences of biological modularity. In order to address such questions, however, an index for protein modularity is necessary. This paper proposes a simple index for protein modularity-"module density"-which is the number of evolutionarily independent modules that compose a protein divided by the number of amino acids in the protein. The decomposition of proteins into constituent modules can be accomplished by either of two classes of methods. The first class of methods relies on "suppositional" criteria to assign amino acids to modules, whereas the second class of methods relies on "coevolutionary" criteria for this task. One simple and practical method from the first class consists of approximating the number of modules in a protein as the number of regular secondary structure elements (i.e., helices and sheets). Methods based on coevolutionary criteria require more elaborate data, but they have the advantage of being able to specify modules without prior assumptions about why they exist. Given the increasing availability of datasets sampling protein mutational spectra (e.g., from comparative genomics, experimental evolution, and computational prediction), methods based on coevolutionary criteria will likely become more promising in the near future. The ability to meaningfully quantify protein modularity via simple indices has the potential to aid future efforts to understand protein evolutionary rate determinants, improve molecular evolution models and engineer novel proteins.
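The "module density" index defined in this abstract (number of modules divided by number of amino acids) can be sketched in a few lines. This is an illustrative sketch only, not the paper's implementation: the function names are hypothetical, and the per-residue string encoding (`H` for helix, `E` for strand, any other character for everything else) is an assumption made here to approximate modules as regular secondary structure elements, as the abstract suggests.

```python
from itertools import groupby


def count_secondary_structure_elements(ss: str) -> int:
    """Count maximal runs of 'H' (helix) or 'E' (strand) in a per-residue string.

    The one-letter encoding is an assumption of this sketch.
    """
    return sum(1 for state, _run in groupby(ss) if state in ("H", "E"))


def module_density(n_modules: int, n_residues: int) -> float:
    """Module density: number of modules divided by protein length."""
    if n_residues <= 0:
        raise ValueError("n_residues must be positive")
    if n_modules < 0:
        raise ValueError("n_modules must be non-negative")
    return n_modules / n_residues


def module_density_from_string(ss: str) -> float:
    """Approximate module density using secondary structure elements as modules."""
    return module_density(count_secondary_structure_elements(ss), len(ss))


# Invented 12-residue example: one helix, one strand, one more helix.
example = "HHHH--EEE-HH"
density = module_density_from_string(example)
```

For the invented string above, three elements over twelve residues give a density of 0.25. Modules defined by coevolutionary criteria, the second class of methods in the abstract, would need a different counting step but could reuse `module_density` unchanged.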
Affiliation(s)
- Mary Rorick
- University of Michigan, Department of Ecology and Evolutionary Biology, Ann Arbor, MI 48109-1048, United States.
10
Linard B, Nguyen NH, Prosdocimi F, Poch O, Thompson JD. EvoluCode: Evolutionary Barcodes as a Unifying Framework for Multilevel Evolutionary Data. Evol Bioinform Online 2011; 8:61-77. [PMID: 22267905 PMCID: PMC3256995 DOI: 10.4137/ebo.s8814] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/05/2022]
Abstract
Evolutionary systems biology aims to uncover the general trends and principles governing the evolution of biological networks. An essential part of this process is the reconstruction and analysis of the evolutionary histories of these complex, dynamic networks. Unfortunately, the methodologies for representing and exploiting such complex evolutionary histories in large scale studies are currently limited. Here, we propose a new formalism, called EvoluCode (Evolutionary barCode), which allows the integration of different evolutionary parameters (eg, sequence conservation, orthology, synteny …) in a unifying format and facilitates the multilevel analysis and visualization of complex evolutionary histories at the genome scale. The advantages of the approach are demonstrated by constructing barcodes representing the evolution of the complete human proteome. Two large-scale studies are then described: (i) the mapping and visualization of the barcodes on the human chromosomes and (ii) automatic clustering of the barcodes to highlight protein subsets sharing similar evolutionary histories and their functional analysis. The methodologies developed here open the way to the efficient application of other data mining and knowledge extraction techniques in evolutionary systems biology studies. A database containing all EvoluCode data is available at: http://lbgi.igbmc.fr/barcodes.
Affiliation(s)
- Benjamin Linard
- Laboratoire De Bioinformatique Et Génomique Intégratives, Institut de Génétique et de Biologie Moléculaire et Cellulaire CNRS/INSERM/UDS, Illkirch, France
- Ngoc Hoan Nguyen
- Laboratoire De Bioinformatique Et Génomique Intégratives, Institut de Génétique et de Biologie Moléculaire et Cellulaire CNRS/INSERM/UDS, Illkirch, France
- Olivier Poch
- Laboratoire De Bioinformatique Et Génomique Intégratives, Institut de Génétique et de Biologie Moléculaire et Cellulaire CNRS/INSERM/UDS, Illkirch, France
- Julie D. Thompson
- Laboratoire De Bioinformatique Et Génomique Intégratives, Institut de Génétique et de Biologie Moléculaire et Cellulaire CNRS/INSERM/UDS, Illkirch, France
11
Rorick MM, Wagner GP. Protein structural modularity and robustness are associated with evolvability. Genome Biol Evol 2011; 3:456-75. [PMID: 21602570 PMCID: PMC3134980 DOI: 10.1093/gbe/evr046] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Indexed: 01/21/2023]
Abstract
Theory suggests that biological modularity and robustness allow for maintenance of fitness under mutational change, and when this change is adaptive, for evolvability. Empirical demonstrations that these traits promote evolvability in nature remain scant however. This is in part because modularity, robustness, and evolvability are difficult to define and measure in real biological systems. Here, we address whether structural modularity and/or robustness confer evolvability at the level of proteins by looking for associations between indices of protein structural modularity, structural robustness, and evolvability. We propose a novel index for protein structural modularity: the number of regular secondary structure elements (helices and strands) divided by the number of residues in the structure. We index protein evolvability as the proportion of sites with evidence of being under positive selection multiplied by the average rate of adaptive evolution at these sites, and we measure this as an average over a phylogeny of 25 mammalian species. We use contact density as an index of protein designability, and thus, structural robustness. We find that protein evolvability is positively associated with structural modularity as well as structural robustness and that the effect of structural modularity on evolvability is independent of the structural robustness index. We interpret these associations to be the result of reduced constraints on amino acid substitutions in highly modular and robust protein structures, which results in faster adaptation through natural selection.
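The evolvability index described in this abstract (proportion of sites with evidence of positive selection, multiplied by the average rate of adaptive evolution at those sites) can be written as a short sketch. The inputs are assumptions of this example rather than anything specified in the abstract: a list of per-site rate estimates and a parallel list of flags marking which sites are considered positively selected. The function name and the toy values are hypothetical, and the averaging over a phylogeny that the authors describe is not modelled here.

```python
from statistics import mean


def evolvability_index(site_rates, positively_selected):
    """Proportion of positively selected sites times their mean adaptive rate.

    site_rates: per-site rate estimates (assumed input, e.g. omega-like values).
    positively_selected: parallel booleans flagging sites under positive selection.
    """
    if len(site_rates) != len(positively_selected):
        raise ValueError("inputs must have the same length")
    if not site_rates:
        raise ValueError("at least one site is required")
    selected = [rate for rate, flag in zip(site_rates, positively_selected) if flag]
    if not selected:
        # No site shows evidence of positive selection.
        return 0.0
    proportion = len(selected) / len(site_rates)
    return proportion * mean(selected)


# Invented four-site example: two flagged sites with rates 2.0 and 4.0.
index = evolvability_index([0.1, 2.0, 0.3, 4.0], [False, True, False, True])
```

With the invented values, half of the sites are flagged and their mean rate is 3.0, so the index is 1.5. Returning 0.0 when no site is flagged is a design choice of this sketch.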
Affiliation(s)
- Mary M Rorick
- Department of Genetics, Yale University, New Haven, Connecticut, USA.
12
Wolf YI, Gopich IV, Lipman DJ, Koonin EV. Relative contributions of intrinsic structural-functional constraints and translation rate to the evolution of protein-coding genes. Genome Biol Evol 2010; 2:190-9. [PMID: 20624725 PMCID: PMC2940324 DOI: 10.1093/gbe/evq010] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Indexed: 12/22/2022]
Abstract
A long-standing assumption in evolutionary biology is that the evolution rate of protein-coding genes depends, largely, on specific constraints that affect the function of the given protein. However, recent research in evolutionary systems biology revealed unexpected, significant correlations between evolution rate and characteristics of genes or proteins that are not directly related to specific protein functions, such as expression level and protein–protein interactions. The strongest connections were consistently detected between protein sequence evolution rate and the expression level of the respective gene. A recent genome-wide proteomic study revealed an extremely strong correlation between the abundances of orthologous proteins in distantly related animals, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster. We used the extensive protein abundance data from this study along with short-term evolutionary rates (ERs) of orthologous genes in nematodes and flies to estimate the relative contributions of structural–functional constraints and the translation rate to the evolution rate of protein-coding genes. Together the intrinsic constraints and translation rate account for approximately 50% of the variance of the ERs. The contribution of constraints is estimated to be 3- to 5-fold greater than the contribution of translation rate.
Affiliation(s)
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA
13
The universal distribution of evolutionary rates of genes and distinct characteristics of eukaryotic genes of different apparent ages. Proc Natl Acad Sci U S A 2009; 106:7273-80. [PMID: 19351897 DOI: 10.1073/pnas.0901808106] [Citation(s) in RCA: 171] [Impact Index Per Article: 10.7] [Indexed: 11/18/2022]
Abstract
The evolutionary rates of protein-coding genes in an organism span, approximately, 3 orders of magnitude and show a universal, approximately log-normal distribution in a broad variety of species from prokaryotes to mammals. This universal distribution implies a steady-state process, with identical distributions of evolutionary rates among genes that are gained and genes that are lost. A mathematical model of such process is developed under the single assumption of the constancy of the distributions of the propensities for gene loss (PGL). This model predicts that genes of different ages, that is, genes with homologs detectable at different phylogenetic depths, substantially differ in those variables that correlate with PGL. We computationally partition protein-coding genes from humans, flies, and Aspergillus fungus into age classes, and show that genes of different ages retain the universal log-normal distribution of evolutionary rates, with a shift toward higher rates in "younger" classes but also with a substantial overlap. The only exception involves human primate-specific genes that show a heavy tail of rapidly evolving genes, probably owing to gene annotation artifacts. As predicted, the gene age classes differ in characteristics correlated with PGL. Compared with "young" genes (e.g., mammal-specific human ones), "old" genes (e.g., eukaryote-specific), on average, are longer, are expressed at a higher level, possess a higher intron density, evolve slower on the short time scale, and are subject to stronger purifying selection. Thus, genome evolution fits a simple model with approximately uniform rates of gene gain and loss, without major bursts of genomic innovation.
14
Li M, Huang Y, Xiao Y. Effects of external interactions on protein sequence-structure relations of beta-trefoil fold. Proteins 2009; 72:1161-70. [PMID: 18320584 DOI: 10.1002/prot.22010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022]
Abstract
Proteins with symmetric structures are ideal models to investigate the sequence-structure relations. We investigate proteins with beta-trefoil fold and find they have different degrees of sequence symmetries although they show similar symmetric structures. To understand this, we calculate the strength of interactions of the beta-trefoil folds with surrounding environments and find the low degrees of sequence symmetries are often correlated with large external interactions. Our results give an additional confirmation of Anfinsen's thermodynamic hypothesis that protein structures are not only determined by their sequences but also by their surrounding environments. We suggest the external interactions should be considered additionally in protein structure prediction through ab initio folding.
Affiliation(s)
- Mingfeng Li
- Department of Physics, Biomolecular Physics and Modeling Group, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
15
Wolf MY, Wolf YI, Koonin EV. Comparable contributions of structural-functional constraints and expression level to the rate of protein sequence evolution. Biol Direct 2008; 3:40. [PMID: 18840284 PMCID: PMC2572155 DOI: 10.1186/1745-6150-3-40] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.0] [Received: 10/01/2008] [Accepted: 10/07/2008] [Indexed: 01/01/2023]
Abstract
Background: Proteins show a broad range of evolutionary rates. Understanding the factors that are responsible for the characteristic rate of evolution of a given protein arguably is one of the major goals of evolutionary biology. A long-standing general assumption used to be that the evolution rate is, primarily, determined by the specific functional constraints that affect the given protein. These constraints were traditionally thought to depend both on the specific features of the protein's structure and its biological role. The advent of systems biology brought about new types of data, such as expression level and protein-protein interactions, and unexpectedly, a variety of correlations between protein evolution rate and these variables have been observed. The strongest connections by far were repeatedly seen between protein sequence evolution rate and the expression level of the respective gene. It has been hypothesized that this link is due to the selection for the robustness of the protein structure to mistranslation-induced misfolding that is particularly important for highly expressed proteins and is the dominant determinant of the sequence evolution rate.
Results: This work is an attempt to assess the relative contributions of protein domain structure and function, on the one hand, and expression level on the other hand, to the rate of sequence evolution. To this end, we performed a genome-wide analysis of the effect of the fusion of a pair of domains in multidomain proteins on the difference in the domain-specific evolutionary rates. The mistranslation-induced misfolding hypothesis would predict that, within multidomain proteins, fused domains, on average, should evolve at substantially closer rates than the same domains in different proteins because, within a multidomain protein, all domains are translated at the same rate. We performed a comprehensive comparison of the evolutionary rates of mammalian and plant protein domains that are either joined in multidomain proteins or contained in distinct proteins. Substantial homogenization of evolutionary rates in multidomain proteins was, indeed, observed in both animals and plants, although highly significant differences between domain-specific rates remained. The contributions of the translation rate, as determined by the effect of the fusion of a pair of domains within a multidomain protein, and intrinsic, domain-specific structural-functional constraints appear to be comparable in magnitude.
Conclusion: Fusion of domains in a multidomain protein results in substantial homogenization of the domain-specific evolutionary rates but significant differences between domain-specific evolution rates remain. Thus, the rate of translation and intrinsic structural-functional constraints both exert sizable and comparable effects on sequence evolution.
Reviewers: This article was reviewed by Sergei Maslov, Dennis Vitkup, Claus Wilke (nominated by Orly Alter), and Allan Drummond (nominated by Joel Bader). For the full reviews, please go to the Reviewers' Reports section.
Affiliation(s)
- Maxim Y Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
|
16
|
Rose MR, Oakley TH. The new biology: beyond the Modern Synthesis. Biol Direct 2007; 2:30. [PMID: 18036242 PMCID: PMC2222615 DOI: 10.1186/1745-6150-2-30] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Received: 10/17/2007] [Accepted: 11/24/2007] [Indexed: 12/22/2022] Open
Abstract
BACKGROUND The last third of the 20th Century featured an accumulation of research findings that severely challenged the assumptions of the "Modern Synthesis" which provided the foundations for most biological research during that century. The foundations of that "Modernist" biology had thus largely crumbled by the start of the 21st Century. This in turn raises the question of foundations for biology in the 21st Century. CONCLUSION Like the physical sciences in the first half of the 20th Century, biology at the start of the 21st Century is achieving a substantive maturity of theory, experimental tools, and fundamental findings thanks to relatively secure foundations in genomics. Genomics has also forced biologists to connect evolutionary and molecular biology, because these formerly Balkanized disciplines have been brought together as actors on the genomic stage. Biologists are now addressing the evolution of genetic systems using more than the concepts of population biology alone, and the problems of cell biology using more than the tools of biochemistry and molecular biology alone. It is becoming increasingly clear that solutions to such basic problems as aging, sex, development, and genome size potentially involve elements of biological science at every level of organization, from molecule to population. The new biology knits together genomics, bioinformatics, evolutionary genetics, and other such general-purpose tools to supply novel explanations for the paradoxes that undermined Modernist biology.
Affiliation(s)
- Michael R Rose
- Department of Ecology and Evolutionary Biology, University of California, Irvine, CA, 92697-2525 USA
- Todd H Oakley
- Department of Ecology, Evolution, and Marine Biology, University of California, Santa Barbara, CA 93106-9610 USA
|
17
|
Abstract
All protein-coding genes have a phylogenetic history that, when understood, can lead to deep insights into the diversification or conservation of function, the evolution of developmental complexity, and the molecular basis of disease. An important part of reconstructing the relationships among genes in different organisms is an accurate method for finding orthologs, together with an accurate measure of evolutionary diversification. The present chapter details such a method, called the reciprocal smallest distance algorithm (RSD). This approach improves upon the common procedure of taking reciprocal best Basic Local Alignment Search Tool hits (RBH) in the identification of orthologs by using global sequence alignment and maximum likelihood estimation of evolutionary distances to detect orthologs between two genomes. RSD finds many putative orthologs missed by RBH because it is less likely to be misled by the presence of close paralogs in genomes. The package offers considerable flexibility in parameter settings, allowing the user to search for increasingly distant orthologs between highly divergent species, among other advantages. The flexibility of this tool makes it a unique and powerful addition to other available approaches for ortholog detection.
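The reciprocity step at the core of the approach summarized in this abstract can be illustrated with a minimal sketch. It assumes the pairwise evolutionary distances are already available in a single symmetric table; in the method described above they would come from global alignment and maximum likelihood estimation, which this sketch does not attempt. The gene names, distance values, and function names are hypothetical and are not taken from the RSD package.

```python
from typing import Dict, List, Tuple

# Hypothetical distances from genes of genome A (outer keys) to genes of
# genome B (inner keys). These values are made up for illustration only.
Distances = Dict[str, Dict[str, float]]


def nearest(dist: Distances) -> Dict[str, str]:
    """Map each query gene to the candidate with the smallest distance."""
    return {q: min(row, key=row.get) for q, row in dist.items() if row}


def transpose(dist: Distances) -> Distances:
    """Re-index the table so genome B genes become the queries."""
    out: Distances = {}
    for a, row in dist.items():
        for b, d in row.items():
            out.setdefault(b, {})[a] = d
    return out


def reciprocal_smallest_distance(dist_ab: Distances) -> List[Tuple[str, str, float]]:
    """Keep pairs (a, b) where b is closest to a and a is closest to b."""
    a_to_b = nearest(dist_ab)
    b_to_a = nearest(transpose(dist_ab))
    return sorted(
        (a, b, dist_ab[a][b]) for a, b in a_to_b.items() if b_to_a.get(b) == a
    )


toy = {
    "a1": {"b1": 0.12, "b2": 0.40},
    "a2": {"b1": 0.15, "b2": 0.35},  # close paralog of a1, not reciprocal
    "a3": {"b2": 0.30},
}
pairs = reciprocal_smallest_distance(toy)
print(pairs)
```

In this toy table, `a2` is nearest to `b1`, but `b1` is nearer to `a1`, so only `a1`/`b1` and `a3`/`b2` are reported as putative orthologs.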
|
18
|
Koonin EV, Wolf YI. Evolutionary systems biology: links between gene evolution and function. Curr Opin Biotechnol 2006; 17:481-7. [PMID: 16962765 DOI: 10.1016/j.copbio.2006.08.003] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.6] [Received: 06/20/2006] [Revised: 08/10/2006] [Accepted: 08/30/2006] [Indexed: 11/25/2022]
Abstract
The recent accumulation of genome-wide data on various facets of gene expression, function and evolution stimulated the emergence of a new field, evolutionary systems biology. Many significant correlations were detected between variables that characterize the functioning of a gene, such as expression level, knockout effect, connectivity of genetic and protein-protein interaction networks, and variables that describe gene evolution, such as sequence evolution rate and propensity for gene loss. The first attempts at multidimensional analysis of genomic data yielded composite variables that describe the 'status' of a gene in the genomic community. However, it remains uncertain whether different functional variables affect gene evolution synergistically or whether there is a single, dominant factor. The number of translation events, linked to selection for translational robustness, was proposed as a candidate for such a major determinant of protein evolution. These developments show that, although the methodological basis of evolutionary systems biology is not yet fully solidified, this area of research is already starting to yield fundamental biological insights.
Affiliation(s)
- Eugene V Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
|
19
|
Abstract
Recent genome analyses revealed intriguing correlations between variables characterizing the functioning of a gene, such as expression level (EL), connectivity of genetic and protein-protein interaction networks, and knockout effect, and variables describing gene evolution, such as sequence evolution rate (ER) and propensity for gene loss. Typically, variables within each of these classes are positively correlated, e.g. products of highly expressed genes also have a propensity to be involved in many protein-protein interactions, whereas variables between classes are negatively correlated, e.g. highly expressed genes, on average, evolve slower than weakly expressed genes. Here, we describe principal component (PC) analysis of seven genome-related variables and propose biological interpretations for the first three PCs. The first PC reflects a gene's 'importance', or the 'status' of a gene in the genomic community, with positive contributions from knockout lethality, EL, number of protein-protein interaction partners and the number of paralogues, and negative contributions from sequence ER and gene loss propensity. The next two PCs define a plane that seems to reflect the functional and evolutionary plasticity of a gene. Specifically, PC2 can be interpreted as a gene's 'adaptability' whereby genes with high adaptability readily duplicate, have many genetic interaction partners and tend to be non-essential. PC3 also might reflect the role of a gene in organismal adaptation albeit with a negative rather than a positive contribution of genetic interactions; we provisionally designate this PC 'reactivity'. The interpretation of PC2 and PC3 as measures of a gene's plasticity is compatible with the observation that genes with high values of these PCs tend to be expressed in a condition- or tissue-specific manner. 
Functional classes of genes substantially vary in status, adaptability and reactivity, with the highest status characteristic of the translation system and cytoskeletal proteins, highest adaptability seen in cellular processes and signalling genes, and top reactivity characteristic of metabolic enzymes.
|
20
|
Sällström B, Arnaout RA, Davids W, Bjelkmar P, Andersson SGE. Protein evolutionary rates correlate with expression independently of synonymous substitutions in Helicobacter pylori. J Mol Evol 2006; 62:600-14. [PMID: 16586017 DOI: 10.1007/s00239-005-0104-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 04/25/2005] [Accepted: 12/20/2005] [Indexed: 11/29/2022]
Abstract
In free-living microorganisms, such as Escherichia coli and Saccharomyces cerevisiae, both synonymous and nonsynonymous substitution frequencies correlate with expression levels. Here, we have tested the hypothesis that the correlation between amino acid substitution rates and expression is a by-product of selection for codon bias and translational efficiency in highly expressed genes. To this end, we have examined the correlation between protein evolutionary rates and expression in the human gastric pathogen Helicobacter pylori, where the absence of selection on synonymous sites enables the two types of substitutions to be uncoupled. The results revealed a statistically significant negative correlation between expression levels and nonsynonymous substitutions in both H. pylori and E. coli. We also found that neighboring genes located on the same, but not on opposite strands, evolve at significantly more similar rates than random gene pairs, as expected by co-expression of genes located in the same operon. However, the two species differ in that synonymous substitutions show a strand-specific pattern in E. coli, whereas the weak similarity in synonymous substitutions for neighbors in H. pylori is independent of gene orientation. These results suggest a direct influence of expression levels on nonsynonymous substitution frequencies independent of codon bias and selective constraints on synonymous sites.
Affiliation(s)
- Björn Sällström
- Program of Molecular Evolution, Department of Evolution, Genomics and Systematics, Evolutionary Biology Center, Uppsala University, 752 36 Uppsala, Sweden
|