1
|
Majid I, Sergeev YV. Linking Protein Stability to Pathogenicity: Predicting Clinical Significance of Single-Missense Mutations in Ocular Proteins Using Machine Learning. Int J Mol Sci 2024; 25:11649. [PMID: 39519200 PMCID: PMC11546782 DOI: 10.3390/ijms252111649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2024] [Revised: 10/28/2024] [Accepted: 10/28/2024] [Indexed: 11/16/2024] Open
Abstract
Understanding the effect of single-missense mutations on protein stability is crucial for clinical decision-making and therapeutic development. The impact of these mutations on protein stability and 3D structure remains underexplored. Here, we developed a program to investigate the relationship between pathogenic mutations and protein unfolding and compared seven machine learning (ML) models to predict the clinical significance of single-missense mutations with unknown impacts, based on protein stability parameters. We analyzed seven proteins associated with ocular disease-causing genes. The program revealed an R-squared value of 0.846 using Decision Tree Regression between pathogenic mutations and decreased protein stability, with 96.20% of pathogenic mutations in RPE65 leading to protein instability. Among the ML models, Random Forest achieved the highest AUC (0.922) and PR AUC (0.879) in predicting the clinical significance of mutations with unknown effects. Our findings indicate that most pathogenic mutations affecting protein stability occur in alpha-helices, beta-pleated sheets, and active sites. This study suggests that protein stability can serve as a valuable parameter for interpreting the clinical significance of single-missense mutations in ocular proteins.
Affiliation(s)
- Yuri V. Sergeev
- Ophthalmic Genetics and Visual Function Branch, National Eye Institute, National Institutes of Health, Bethesda, MD 20892, USA
|
2
|
Usmanova DR, Plata G, Vitkup D. Functional Optimization in Distinct Tissues and Conditions Constrains the Rate of Protein Evolution. Mol Biol Evol 2024; 41:msae200. [PMID: 39431545 PMCID: PMC11523136 DOI: 10.1093/molbev/msae200] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2024] [Revised: 07/29/2024] [Accepted: 08/05/2024] [Indexed: 10/22/2024] Open
Abstract
Understanding the main determinants of protein evolution is a fundamental challenge in biology. Despite many decades of active research, the molecular and cellular mechanisms underlying the substantial variability of evolutionary rates across cellular proteins are not currently well understood. It also remains unclear how protein molecular function is optimized in the context of multicellular species and why many proteins, such as enzymes, are only moderately efficient on average. Our analysis of genomics and functional datasets reveals in multiple organisms a strong inverse relationship between the optimality of protein molecular function and the rate of protein evolution. Furthermore, we find that highly expressed proteins tend to be substantially more functionally optimized. These results suggest that cellular expression costs lead to more pronounced functional optimization of abundant proteins and that the purifying selection to maintain high levels of functional optimality significantly slows protein evolution. We observe that in multicellular species both the rate of protein evolution and the degree of protein functional efficiency are primarily affected by expression in several distinct cell types and tissues, specifically, in developed neurons with upregulated synaptic processes in animals and in young and fast-growing tissues in plants. Overall, our analysis reveals how various constraints from the molecular, cellular, and species' levels of biological organization jointly affect the rate of protein evolution and the level of protein functional adaptation.
Affiliation(s)
- Dinara R Usmanova
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Germán Plata
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- BiomEdit, Fishers, IN 46037, USA
- Dennis Vitkup
- Department of Systems Biology, Columbia University, New York, NY 10032, USA
- Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA
|
3
|
Analysis of the Contribution of Intrinsic Disorder in Shaping Potyvirus Genetic Diversity. Viruses 2022; 14:v14091959. [PMID: 36146764 PMCID: PMC9504506 DOI: 10.3390/v14091959] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 07/11/2022] [Revised: 08/30/2022] [Accepted: 08/31/2022] [Indexed: 12/30/2022] Open
Abstract
Intrinsically disordered regions (IDRs) are abundant in the proteome of RNA viruses. The multifunctional properties of these regions are widely documented and their structural flexibility is associated with the low constraint in their amino acid positions. Therefore, from an evolutionary standpoint, these regions could have a greater propensity to accumulate non-synonymous mutations (NS) than highly structured regions (ORs, or 'ordered regions'). To address this hypothesis, we compared the distribution of NS, which we relate here to mutational robustness, in IDRs and ORs in the genome of potyviruses, a major genus of plant viruses. For this purpose, a simulation model was built and used to distinguish a possible selection phenomenon in the biological datasets from randomly generated mutations. We analyzed several short-term experimental evolution datasets. An analysis was also performed on the natural diversity of three different species of potyviruses reflecting their long-term evolution. We observed that the mutational robustness of IDRs is significantly higher than that of ORs. Moreover, the substitutions in the ORs are very constrained by the conservation of the physico-chemical properties of the amino acids. This feature is not found in the IDRs where the substitutions tend to be more random. This reflects the weak structural constraints in these regions, wherein an amino acid polymorphism is naturally conserved. In the course of evolution, potyvirus IDRs and ORs follow different evolutionary paths with respect to their mutational robustness. These results have led the authors to consider the hypothesis that IDRs and their associated amino acid polymorphism could constitute a potential adaptive reservoir.
|
4
|
Pollet L, Lambourne L, Xia Y. Structural Determinants of Yeast Protein-Protein Interaction Interface Evolution at the Residue Level. J Mol Biol 2022; 434:167750. [PMID: 35850298 DOI: 10.1016/j.jmb.2022.167750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2021] [Revised: 06/09/2022] [Accepted: 07/12/2022] [Indexed: 12/01/2022]
Abstract
Interfaces of contact between proteins play important roles in determining the proper structure and function of protein-protein interactions (PPIs). Therefore, to fully understand PPIs, we need to better understand the evolutionary design principles of PPI interfaces. Previous studies have uncovered that interfacial sites are more evolutionarily conserved than other surface protein sites. Yet, little is known about the nature and relative importance of evolutionary constraints in PPI interfaces. Here, we explore constraints imposed by the structure of the microenvironment surrounding interfacial residues on residue evolutionary rate using a large dataset of over 700 structural models of baker's yeast PPIs. We find that interfacial residues are, on average, systematically more conserved than all other residues with a similar degree of total burial as measured by relative solvent accessibility (RSA). Besides, we find that RSA of the residue when the PPI is formed is a better predictor of interfacial residue evolutionary rate than RSA in the monomer state. Furthermore, we investigate four structure-based measures of residue interfacial involvement, including change in RSA upon binding (ΔRSA), number of residue-residue contacts across the interface, and distance from the center or the periphery of the interface. Integrated modeling for evolutionary rate prediction in interfaces shows that ΔRSA plays a dominant role among the four measures of interfacial involvement, with minor, but independent contributions from other measures. These results yield insight into the evolutionary design of interfaces, improving our understanding of the role that structure plays in the molecular evolution of PPIs at the residue level.
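The ΔRSA measure named in this abstract can be illustrated with a minimal Python sketch. The sign convention (monomer RSA minus complex RSA), the function names, and the zero threshold are assumptions made here for illustration and are not taken from the paper.

```python
def delta_rsa(rsa_monomer, rsa_complex):
    """Per-residue change in RSA upon binding.

    Assumed convention: RSA in the monomer minus RSA in the complex,
    so residues buried by the partner get a positive value.
    """
    if len(rsa_monomer) != len(rsa_complex):
        raise ValueError("RSA lists must have the same length")
    return [m - c for m, c in zip(rsa_monomer, rsa_complex)]


def interfacial_residues(rsa_monomer, rsa_complex, threshold=0.0):
    """Indices of residues whose RSA drops by more than `threshold` on binding.

    The threshold is a hypothetical cutoff chosen for this example.
    """
    deltas = delta_rsa(rsa_monomer, rsa_complex)
    return [i for i, d in enumerate(deltas) if d > threshold]


# Toy values: residues 1 and 2 lose solvent exposure when the complex forms.
monomer = [0.50, 0.20, 0.80, 0.05]
in_complex = [0.50, 0.05, 0.30, 0.05]
print(interfacial_residues(monomer, in_complex))
```

In practice the RSA values would come from a solvent-accessibility tool run once on the isolated chain and once on the complex.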
Affiliation(s)
- Léah Pollet
- Department of Bioengineering, Faculty of Engineering, McGill University, Montreal, QC, Canada
- Luke Lambourne
- Center for Cancer Systems Biology (CCSB), Dana-Farber Cancer Institute, Boston, MA, USA; Department of Genetics, Blavatnik Institute, Harvard Medical School, Boston, MA, USA; Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Yu Xia
- Department of Bioengineering, Faculty of Engineering, McGill University, Montreal, QC, Canada
|
5
|
Bæk KT, Kepp KP. Assessment of AlphaFold2 for Human Proteins via Residue Solvent Exposure. J Chem Inf Model 2022; 62:3391-3400. [PMID: 35785970 DOI: 10.1021/acs.jcim.2c00243] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Indexed: 11/29/2022]
Abstract
As only 35% of human proteins feature (often partial) PDB structures, the protein structure prediction tool AlphaFold2 (AF2) could have a massive impact on human biology and medicine, making independent benchmarks of interest. We studied AF2's ability to describe the backbone solvent exposure as a functionally important and easily interpretable "natural coordinate" of protein conformation, using human proteins as a test case. After screening for appropriate comparative sets, we matched 1818 human proteins predicted by AF2 against 7585 unique experimental PDBs, and after curation for sequence overlap, we assessed 1264 comparative pairs comprising 115 unique AF2 structures and 652 unique experimental structures. AF2 performed markedly worse for multimers, whereas ligands, cofactors, and experimental resolution were interestingly not very important for performance. AF2 performed excellently for monomer proteins. Challenges relating to specific groups of residues and multimers were analyzed. We identified larger deviations for lower-confidence scores (pLDDT), and exposed residues and polar residues (e.g., Asp, Glu, Asn) being less accurately described than hydrophobic residues. Proline conformations were the hardest to predict, probably due to a common location in dynamic solvent-accessible parts. In summary, using solvent exposure as a metric, we quantified the performance of AF2 for human proteins and provided estimates of the expected agreement as a function of ligand presence, multimer/monomer status, local residue solvent exposure, pLDDT, and amino acid type. Overall performance was found to be excellent.
Affiliation(s)
- Kristoffer T Bæk
- DTU Chemistry, Technical University of Denmark, Building 206, Kgs. Lyngby 2800, Denmark
- Kasper P Kepp
- DTU Chemistry, Technical University of Denmark, Building 206, Kgs. Lyngby 2800, Denmark
|
6
|
Abstract
The spike protein (S-protein) of SARS-CoV-2, the protein that enables the virus to infect human cells, is the basis for many vaccines and a hotspot of concerning virus evolution. Here, we discuss the outstanding progress in structural characterization of the S-protein and how these structures facilitate analysis of virus function and evolution. We emphasize the differences in reported structures and that analysis of structure-function relationships is sensitive to the structure used. We show that the average residue solvent exposure in nearly complete structures is a good descriptor of open vs closed conformation states. Because of structural heterogeneity of functionally important surface-exposed residues, we recommend using averages of a group of high-quality protein structures rather than a single structure before reaching conclusions on specific structure-function relationships. To illustrate these points, we analyze some significant chemical tendencies of prominent S-protein mutations in the context of the available structures. In the discussion of new variants, we emphasize the selectivity of binding to ACE2 vs prominent antibodies rather than simply the antibody escape or ACE2 affinity separately. We note that larger chemical changes, in particular increased electrostatic charge or side-chain volume of exposed surface residues, are recurring in mutations of concern, plausibly related to adaptation to the negative surface potential of human ACE2. We also find indications that the fixated mutations of the S-protein in the main variants are less destabilizing than would be expected on average, possibly pointing toward a selection pressure on the S-protein. The richness of available structures for all of these situations provides an enormously valuable basis for future research into these structure-function relationships.
Affiliation(s)
- Rukmankesh Mehra
- Department of Chemistry, Indian Institute of Technology Bhilai, Sejbahar, Raipur 492015, Chhattisgarh, India
- Kasper P. Kepp
- DTU Chemistry, Technical University of Denmark, Building 206, 2800 Kongens Lyngby, Denmark
|
7
|
Zhang H, Bei Z, Xi W, Hao M, Ju Z, Saravanan KM, Zhang H, Guo N, Wei Y. Evaluation of residue-residue contact prediction methods: From retrospective to prospective. PLoS Comput Biol 2021; 17:e1009027. [PMID: 34029314 PMCID: PMC8177648 DOI: 10.1371/journal.pcbi.1009027] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 10/30/2020] [Revised: 06/04/2021] [Accepted: 04/28/2021] [Indexed: 12/31/2022] Open
Abstract
Sequence-based residue contact prediction plays a crucial role in protein structure reconstruction. In recent years, the combination of evolutionary coupling analysis (ECA) and deep learning (DL) techniques has made tremendous progress in residue contact prediction, so a comprehensive assessment of current methods on a large-scale benchmark data set is much needed. In this study, we evaluate 18 contact predictors on 610 non-redundant proteins and 32 CASP13 targets from a wide range of perspectives. The results show that different methods have different application scenarios: (1) DL methods based on multiple categories of inputs and large training sets are the best choices for low-contact-density proteins, such as the intrinsically disordered ones, and for proteins with shallow multi-sequence alignments (MSAs). (2) With at least 5L (L is sequence length) effective sequences in the MSA, all the methods show their best performance, and methods that rely only on the MSA as input can achieve results comparable to methods that adopt multi-source inputs. (3) For top L/5 and L/2 predictions, DL methods can predict more hydrophobic interactions while ECA methods predict more salt bridges and disulfide bonds. (4) ECA methods can detect more secondary structure interactions, while DL methods can accurately excavate more contact patterns and prune isolated false positives. In general, multi-input DL methods with large training sets dominate current approaches with the best overall performance. Despite the great success of current DL methods, there is still much room for further improvement: (1) With shallow MSAs, performance is greatly affected. (2) Current methods show lower precision for inter-domain than for intra-domain contact predictions, as well as very high imbalances in precision between intra-domains. (3) Strong prediction similarities between DL methods indicate that more feature types and diversified models need to be developed. (4) The runtime of most methods can be further optimized.
The amino acid sequence of a protein ultimately determines its tertiary structure, and the tertiary structure determines its function(s) and plays a key role in understanding biological processes and disease pathogenesis. Protein tertiary structure can be determined using experimental techniques such as cryo-electron microscopy, nuclear magnetic resonance and X-ray crystallography, which are very expensive and time-consuming. As an alternative, researchers are trying to use in silico methods to predict the 3D structures. Residue contact-assisted protein folding paves an avenue for sequence-based protein structure prediction and therefore has become one of the most challenging and promising problems in structural bioinformatics. Over the past years, contact prediction has undergone continuous evolution in techniques. Through a retrospective analysis of traditional machine learning, evolutionary coupling analysis, and consensus machine learning methods and a multi-perspective study of recently developed deep learning methods, we explore the most advanced contact predictors, pursue application scenarios for different methods, and seek prospective directions for further improvement. We anticipate that our study will serve as a practical and useful guide for the development of future approaches to contact prediction.
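The "top L/5 and L/2" precision mentioned in this abstract can be sketched in Python as follows. This is an illustrative sketch rather than the benchmark's own evaluation code: the function name, the dictionary input format, and the minimum sequence separation of 6 are assumptions.

```python
def top_l_over_k_precision(scores, true_contacts, seq_len, k=5, min_sep=6):
    """Precision of the top L/k predicted contacts.

    scores: dict mapping residue pairs (i, j) to a predicted contact score.
    true_contacts: set of residue pairs (i, j) that are real contacts.
    min_sep: assumed minimum sequence separation |i - j| for a pair to count.
    """
    # Keep only pairs that are far enough apart in sequence.
    candidates = [(pair, s) for pair, s in scores.items()
                  if abs(pair[0] - pair[1]) >= min_sep]
    # Highest-scoring predictions first.
    candidates.sort(key=lambda item: item[1], reverse=True)
    n_top = max(1, seq_len // k)
    top = candidates[:n_top]
    if not top:
        return 0.0
    hits = sum(1 for pair, _ in top if pair in true_contacts)
    # If fewer than L/k candidates exist, precision is taken over those available.
    return hits / len(top)


# Toy example with a hypothetical 10-residue protein.
scores = {(0, 8): 0.9, (1, 9): 0.8, (2, 3): 0.99, (0, 7): 0.4}
true_contacts = {(0, 8), (0, 7)}
print(top_l_over_k_precision(scores, true_contacts, seq_len=10, k=5))
```

The short-range pair (2, 3) is excluded by the separation filter even though it has the highest score, which is why such filters matter when comparing predictors.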
Affiliation(s)
- Huiling Zhang
- University of Chinese Academy of Sciences, Beijing, China
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Zhendong Bei
- Cloud Computing Department, Alibaba Group, Hangzhou, China
- Wenhui Xi
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Min Hao
- College of Electronic and Information Engineering, Southwest University, Chongqing, China
- Zhen Ju
- University of Chinese Academy of Sciences, Beijing, China
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Konda Mani Saravanan
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Haiping Zhang
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Ning Guo
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Yanjie Wei
- University of Chinese Academy of Sciences, Beijing, China
- Centre for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
|
8
|
Decomposing Structural Response Due to Sequence Changes in Protein Domains with Machine Learning. J Mol Biol 2020; 432:4435-4446. [PMID: 32485208 DOI: 10.1016/j.jmb.2020.05.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/15/2020] [Revised: 05/06/2020] [Accepted: 05/27/2020] [Indexed: 10/24/2022]
Abstract
How protein domain structure changes in response to mutations is not well understood. Some mutations change the structure drastically, while most only result in small changes. To gain an understanding of this, we decompose the relationship between changes in domain sequence and structure using machine learning. We select pairs of evolutionarily related domains with a broad range of evolutionary distances. In contrast to earlier studies, we do not find a strictly linear relationship between sequence and structural changes. We train a random forest regressor that predicts the structural similarity between pairs with an average accuracy of 0.029 lDDT (local Distance Difference Test) score, and a correlation coefficient of 0.92. Decomposing the feature importance shows that the domain length, or analogously, size, is the most important feature. Our model enables assessing deviations in relative structural response, and thus prediction of evolutionary trajectories, in protein domains across evolution.
|
9
|
Razban RM, Gilson AI, Durfee N, Strobelt H, Dinkla K, Choi JM, Pfister H, Shakhnovich EI. ProteomeVis: a web app for exploration of protein properties from structure to sequence evolution across organisms' proteomes. Bioinformatics 2018; 34:3557-3565. [PMID: 29741573 PMCID: PMC6184454 DOI: 10.1093/bioinformatics/bty370] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 12/01/2017] [Revised: 03/27/2018] [Accepted: 05/03/2018] [Indexed: 01/27/2023] Open
Abstract
Motivation: Protein evolution spans time scales and its effects span the length of an organism. A web app named ProteomeVis is developed to provide a comprehensive view of protein evolution in the Saccharomyces cerevisiae and Escherichia coli proteomes. ProteomeVis interactively creates protein chain graphs, where edges between nodes represent structure and sequence similarities within user-defined ranges, to study the long time scale effects of protein structure evolution. The short time scale effects of protein sequence evolution are studied by sequence evolutionary rate (ER) correlation analyses with protein properties that span from the molecular to the organismal level. Results: We demonstrate the utility and versatility of ProteomeVis by investigating the distribution of edges per node in organismal protein chain universe graphs (oPCUGs) and putative ER determinants. S. cerevisiae and E. coli oPCUGs are scale-free with scaling constants of 1.79 and 1.56, respectively. Both scaling constants can be explained by a previously reported theoretical model describing protein structure evolution. Protein abundance most strongly correlates with ER among properties in ProteomeVis, with Spearman correlations of -0.49 (P-value < 10^-10) and -0.46 (P-value < 10^-10) for S. cerevisiae and E. coli, respectively. This result is consistent with previous reports that found protein expression to be the most important ER determinant. Availability and implementation: ProteomeVis is freely accessible at http://proteomevis.chem.harvard.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
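The Spearman correlation between protein abundance and evolutionary rate reported in this abstract is a rank-based statistic. A minimal pure-Python sketch is given below; the function names and toy data are hypothetical and this is not ProteomeVis code.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: more abundant proteins tend to evolve more slowly.
abundance = [10, 200, 3000, 40000]
evolutionary_rate = [0.9, 0.5, 0.6, 0.1]
print(spearman(abundance, evolutionary_rate))
```

Because only ranks enter the calculation, the result is unchanged by monotonic transformations such as taking the logarithm of abundance.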
Affiliation(s)
- Rostam M Razban
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Amy I Gilson
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Niamh Durfee
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Hendrik Strobelt
- School of Engineering & Applied Sciences, Harvard University, Cambridge, MA, USA
- Kasper Dinkla
- School of Engineering & Applied Sciences, Harvard University, Cambridge, MA, USA
- Jeong-Mo Choi
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
- Hanspeter Pfister
- School of Engineering & Applied Sciences, Harvard University, Cambridge, MA, USA
- Eugene I Shakhnovich
- Department of Chemistry & Chemical Biology, Harvard University, Cambridge, MA, USA
|
10
|
Sydykova DK, Jack BR, Spielman SJ, Wilke CO. Measuring evolutionary rates of proteins in a structural context. F1000Res 2017; 6:1845. [PMID: 29167739 DOI: 10.12688/f1000research.12874.1] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Accepted: 11/18/2017] [Indexed: 11/20/2022] Open
Abstract
We describe how to measure site-specific rates of evolution in protein-coding genes and how to correlate these rates with structural features of the expressed protein, such as relative solvent accessibility, secondary structure, or weighted contact number. We present two alternative approaches to rate calculations: One based on relative amino-acid rates, and the other based on site-specific codon rates measured as dN/dS. We additionally provide a code repository containing scripts to facilitate the specific analysis protocols we recommend.
Affiliation(s)
- Dariya K Sydykova
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
- Benjamin R Jack
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
- Stephanie J Spielman
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
- Claus O Wilke
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
|
11
|
Sydykova DK, Jack BR, Spielman SJ, Wilke CO. Measuring evolutionary rates of proteins in a structural context. F1000Res 2017; 6:1845. [PMID: 29167739 PMCID: PMC5676193 DOI: 10.12688/f1000research.12874.2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Accepted: 01/31/2018] [Indexed: 12/14/2022] Open
Abstract
We describe how to measure site-specific rates of evolution in protein-coding genes and how to correlate these rates with structural features of the expressed protein, such as relative solvent accessibility, secondary structure, or weighted contact number. We present two alternative approaches to rate calculations: One based on relative amino-acid rates, and the other based on site-specific codon rates measured as dN/dS. We additionally provide a code repository containing scripts to facilitate the specific analysis protocols we recommend.
Affiliation(s)
- Dariya K Sydykova
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
- Benjamin R Jack
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
- Stephanie J Spielman
- Institute for Genomics and Evolutionary Medicine, Temple University, Philadelphia, PA, 19122, USA
- Claus O Wilke
- Department of Integrative Biology, The University of Texas at Austin, Austin, TX, 78712, USA
|
12
|
Sydykova DK, Wilke CO. Calculating site-specific evolutionary rates at the amino-acid or codon level yields similar rate estimates. PeerJ 2017; 5:e3391. [PMID: 28584717 PMCID: PMC5452972 DOI: 10.7717/peerj.3391] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 01/19/2017] [Accepted: 05/08/2017] [Indexed: 11/20/2022] Open
Abstract
Site-specific evolutionary rates can be estimated from codon sequences or from amino-acid sequences. For codon sequences, the most popular methods use some variation of the dN∕dS ratio. For amino-acid sequences, one widely-used method is called Rate4Site, and it assigns a relative conservation score to each site in an alignment. How site-wise dN∕dS values relate to Rate4Site scores is not known. Here we elucidate the relationship between these two rate measurements. We simulate sequences with known dN∕dS, using either dN∕dS models or mutation–selection models for simulation. We then infer Rate4Site scores on the simulated alignments, and we compare those scores to either true or inferred dN∕dS values on the same alignments. We find that Rate4Site scores generally correlate well with true dN∕dS, and the correlation strengths increase in alignments with greater sequence divergence and more taxa. Moreover, Rate4Site scores correlate very well with inferred (as opposed to true) dN∕dS values, even for small alignments with little divergence. Finally, we verify this relationship between Rate4Site and dN∕dS in a variety of empirical datasets. We conclude that codon-level and amino-acid-level analysis frameworks are directly comparable and yield very similar inferences.
Affiliation(s)
- Dariya K Sydykova
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA
- Claus O Wilke
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX, USA
|
13
|
The Role of Evolutionary Selection in the Dynamics of Protein Structure Evolution. Biophys J 2017; 112:1350-1365. [PMID: 28402878 DOI: 10.1016/j.bpj.2017.02.029] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 10/31/2016] [Revised: 02/16/2017] [Accepted: 02/22/2017] [Indexed: 02/05/2023] Open
Abstract
Homology modeling is a powerful tool for predicting a protein's structure. This approach is successful because proteins whose sequences are only 30% identical still adopt the same structure, while structure similarity rapidly deteriorates beyond the 30% threshold. By studying the divergence of protein structure as sequence evolves in real proteins and in evolutionary simulations, we show that this nonlinear sequence-structure relationship emerges as a result of selection for protein folding stability in divergent evolution. Fitness constraints prevent the emergence of unstable protein evolutionary intermediates, thereby enforcing evolutionary paths that preserve protein structure despite broad sequence divergence. However, on longer timescales, evolution is punctuated by rare events where the fitness barriers obstructing structure evolution are overcome and discovery of new structures occurs. We outline biophysical and evolutionary rationale for broad variation in protein family sizes, prevalence of compact structures among ancient proteins, and more rapid structure evolution of proteins with lower packing density.
|
14
|
Abrusán G, Marsh JA. Alpha Helices Are More Robust to Mutations than Beta Strands. PLoS Comput Biol 2016; 12:e1005242. [PMID: 27935949 PMCID: PMC5147804 DOI: 10.1371/journal.pcbi.1005242] [Citation(s) in RCA: 77] [Impact Index Per Article: 8.6] [Received: 05/26/2016] [Accepted: 11/08/2016] [Indexed: 12/30/2022] Open
Abstract
The rapidly increasing amount of data on human genetic variation has resulted in a growing demand to identify pathogenic mutations computationally, as their experimental validation is currently beyond reach. Here we show that alpha helices and beta strands differ significantly in their ability to tolerate mutations: helices can accumulate more mutations than strands without change, due to the higher numbers of inter-residue contacts in helices. This results in two patterns: a) the same number of mutations causes less structural change in helices than in strands; b) helices diverge more rapidly in sequence than strands within the same domains. Additionally, both helices and strands are significantly more robust than coils. Based on this observation we show that human missense mutations that change secondary structure are more likely to be pathogenic than those that do not. Moreover, inclusion of predicted secondary structure changes shows significant utility for improving upon state-of-the-art pathogenicity predictions.
The factors that determine the robustness and evolvability of proteins are still largely unknown. In this work the authors show that different secondary structure elements of proteins (helices and strands) differ in their ability to tolerate mutations, and demonstrate that it is caused by differences in the number of non-covalent residue interactions within these secondary structure units. The results suggest that engineering de novo all-alpha proteins should be easier than all-beta ones, as more sequences can fold to the same topology. Additionally, secondary structure can be used to improve current methods of pathogenicity predictions; mutations that change secondary structure are more likely to be pathogenic than mutations that do not, due to their strong destabilizing effect on protein structure.
Affiliation(s)
- György Abrusán
  - MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, United Kingdom
  - Institute of Biochemistry, Biological Research Centre of the Hungarian Academy of Sciences, Szeged, Temesvári krt. 62, Hungary
- Joseph A. Marsh
  - MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Western General Hospital, Crewe Road, Edinburgh EH4 2XU, United Kingdom
|
15
|
Chesmore KN, Bartlett J, Cheng C, Williams SM. Complex Patterns of Association between Pleiotropy and Transcription Factor Evolution. Genome Biol Evol 2016; 8:3159-3170. [PMID: 27635052 PMCID: PMC5174740 DOI: 10.1093/gbe/evw228] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 12/14/2022] Open
Abstract
Pleiotropy has been claimed to constrain gene evolution, but the specific mechanisms and extent of these constraints have been difficult to demonstrate. The expansion of molecular data makes it possible to investigate these pleiotropic effects. Few classes of genes have been characterized as intensely as human transcription factors (TFs). We therefore analyzed the evolutionary rates of full TF proteins, along with their DNA binding domains and protein-protein interacting domains (PID), in light of the degree of pleiotropy, measured by the number of TF-TF interactions or the number of DNA-binding targets. Data were extracted from the ENCODE ChIP-Seq dataset, the STRING v 9.2 database, and the NHGRI GWAS catalog. Evolutionary rates of proteins and domains were calculated using the PAML CodeML package. Our analysis shows that the numbers of TF-TF interactions and DNA binding targets are associated with constrained gene evolution; however, the constraint caused by the number of DNA binding targets was restricted to the DNA binding domains, whereas the number of TF-TF interactions constrained the full protein and did so more strongly. Additionally, we found a positive correlation between the number of protein-PIDs and the evolutionary rates of the protein-PIDs. These findings show that not only does pleiotropy associate with constrained protein evolution but the constraint differs by domain function. Finally, we show that GWAS-associated TF genes are more highly pleiotropic: the GWAS data illustrate that mutations in highly pleiotropic genes are more likely to be associated with disease phenotypes.
Affiliation(s)
- Kevin N Chesmore
  - Department of Genetics, Geisel School of Medicine, Dartmouth College, Hanover, NH
- Jacquelaine Bartlett
  - Department of Genetics, Geisel School of Medicine, Dartmouth College, Hanover, NH
- Chao Cheng
  - Department of Genetics, Geisel School of Medicine, Dartmouth College, Hanover, NH
- Scott M Williams
  - Department of Genetics, Geisel School of Medicine, Dartmouth College, Hanover, NH
|
16
|
Xiang F, Fang Y, Xiang J. Structural and evolutionary relationships among RuBisCOs inferred from their large and small subunits. Z Naturforsch C 2016; 71:181-9. [PMID: 27049618 DOI: 10.1515/znc-2016-0014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2015] [Accepted: 03/06/2016] [Indexed: 11/15/2022]
Abstract
Ribulose 1,5-bisphosphate carboxylase/oxygenase (RuBisCO) is the key enzyme for assimilating CO2 into the biosphere. The nonredundant structural data sets for three RuBisCO domain superfamilies, i.e. large subunit C-terminal domain (LSC), large subunit N-terminal domain (LSN) and small subunit domain (SS), were selected using QR factorization based on the structural alignment with QH as the similarity measure. The structural phylogenies were then constructed to investigate a possible functional significance of the evolutionary diversification. The LSC could have occurred in both bacteria and archaea, and has evolved towards increased complexity in both bacteria and eukaryotes, with a 4-helix-2-helix-2-helix bundle being extended into a 5-helix-3-helix-3-helix one at the LSC carboxyl-terminus. The structural variations of LSN could have originated not only in bacteria with a short coil, but also in eukaryotes with a long one. Meanwhile, the SS dendrogram can be attributed to the structural variations at the βA-βB-loop region. All the structural variations observed in the coil regions influence the catalytic performance or CO2/O2 selectivities of RuBisCOs from different species. Such findings provide insights into RuBisCO improvements.
|
17
|
Echave J, Spielman SJ, Wilke CO. Causes of evolutionary rate variation among protein sites. Nat Rev Genet 2016; 17:109-21. [PMID: 26781812 DOI: 10.1038/nrg.2015.18] [Citation(s) in RCA: 180] [Impact Index Per Article: 20.0] [Indexed: 12/13/2022]
Abstract
It has long been recognized that certain sites within a protein, such as sites in the protein core or catalytic residues in enzymes, are evolutionarily more conserved than other sites. However, our understanding of rate variation among sites remains surprisingly limited. Recent progress to address this includes the development of a wide array of reliable methods to estimate site-specific substitution rates from sequence alignments. In addition, several molecular traits have been identified that correlate with site-specific mutation rates, and novel mechanistic biophysical models have been proposed to explain the observed correlations. Nonetheless, current models explain, at best, approximately 60% of the observed variance, highlighting the limitations of current methods and models and the need for new research directions.
Affiliation(s)
- Julian Echave
  - Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, 1650 San Martín, Buenos Aires, Argentina
- Stephanie J Spielman
  - Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA
- Claus O Wilke
  - Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA
|
18
|
Shahzad K, Mittenthal JE, Caetano-Anollés G. The organization of domains in proteins obeys Menzerath-Altmann's law of language. BMC Syst Biol 2015; 9:44. [PMID: 26260760 PMCID: PMC4531524 DOI: 10.1186/s12918-015-0192-9] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 02/25/2015] [Accepted: 07/30/2015] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The combination of domains in multidomain proteins enhances their function and structure but lengthens the molecules and increases their cost at cellular level. METHODS: The dependence of domain length on the number of domains a protein holds was surveyed for a set of 60 proteomes representing free-living organisms from all kingdoms of life. Distributions were fitted using non-linear functions and fitted parameters interpreted with a formulation of decreasing returns. RESULTS: We find that domain length decreases with increasing number of domains in proteins, following the Menzerath-Altmann (MA) law of language. Highly significant negative correlations exist for the set of proteomes examined. Mathematically, the MA law is expressed as a power law relationship that unfolds when molecular persistence P is a function of domain accretion. P holds two terms, one reflecting the matter-energy cost of adding domains and extending their length, the other reflecting how domain length and number impinge on information and biophysics. The pattern of diminishing returns can therefore be explained as a frustrated interplay between the strategies of economy, flexibility and robustness, matching previously observed trade-offs in the domain makeup of proteomes. Proteomes of Archaea, Fungi and to a lesser degree Plants show the largest push towards molecular economy, each at their own economic stratum. Fungi increase domain size in single domain proteins while reinforcing the pattern of diminishing returns. In contrast, Metazoa, and to lesser degrees Protista and Bacteria, relax economy. Metazoa achieves maximum flexibility and robustness by harboring compact molecules and complex domain organization, offering a new functional vocabulary for molecular biology. CONCLUSIONS: The tendency of parts to decrease their size when systems enlarge is universal for language and music, and now for parts of macromolecules, extending the MA law to natural systems.
Affiliation(s)
- Jay E Mittenthal
  - Department of Cell and Developmental Biology, Urbana, IL, 61801, USA
- Gustavo Caetano-Anollés
  - Illinois Informatics Institute, Urbana, IL, 61801, USA
  - Department of Crop Sciences, Evolutionary Bioinformatics Laboratory, University of Illinois, 332 NSRC, Urbana, IL, 61801, USA
|
19
|
Faure G, Koonin EV. Universal distribution of mutational effects on protein stability, uncoupling of protein robustness from sequence evolution and distinct evolutionary modes of prokaryotic and eukaryotic proteins. Phys Biol 2015; 12:035001. [PMID: 25927823 DOI: 10.1088/1478-3975/12/3/035001] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 01/04/2023]
Abstract
Robustness to destabilizing effects of mutations is thought of as a key factor of protein evolution. The connections between two measures of robustness, the relative core size and the computationally estimated effect of mutations on protein stability (ΔΔG), protein abundance and the selection pressure on protein-coding genes (dN/dS) were analyzed for the organisms with a large number of available protein structures including four eukaryotes, two bacteria and one archaeon. The distribution of the effects of mutations in the core on protein stability is universal and indistinguishable in eukaryotes and bacteria, centered at slightly destabilizing amino acid replacements, and with a heavy tail of more strongly destabilizing replacements. The distribution of mutational effects in the hyperthermophilic archaeon Thermococcus gammatolerans is significantly shifted toward strongly destabilizing replacements which is indicative of stronger constraints that are imposed on proteins in hyperthermophiles. The median effect of mutations is strongly, positively correlated with the relative core size, in evidence of the congruence between the two measures of protein robustness. However, both measures show only limited correlations to the expression level and selection pressure on protein-coding genes. Thus, the degree of robustness reflected in the universal distribution of mutational effects appears to be a fundamental, ancient feature of globular protein folds whereas the observed variations are largely neutral and uncoupled from short term protein evolution. A weak anticorrelation between protein core size and selection pressure is observed only for surface residues in prokaryotes but a stronger anticorrelation is observed for all residues in eukaryotic proteins. This substantial difference between proteins of prokaryotes and eukaryotes is likely to stem from the demonstrable higher compactness of prokaryotic proteins.
Affiliation(s)
- Guilhem Faure
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
20
|
Tóth-Petróczy A, Tawfik DS. The robustness and innovability of protein folds. Curr Opin Struct Biol 2014; 26:131-8. [PMID: 25038399 DOI: 10.1016/j.sbi.2014.06.007] [Citation(s) in RCA: 97] [Impact Index Per Article: 8.8] [Received: 11/26/2013] [Revised: 06/26/2014] [Accepted: 06/26/2014] [Indexed: 11/30/2022]
Abstract
Assignment of protein folds to functions indicates that >60% of folds carry out one or two enzymatic functions, while few folds, for example, the TIM-barrel and Rossmann folds, exhibit hundreds. Are there structural features that make a fold amenable to functional innovation (innovability)? Do these features relate to robustness--the ability to readily accumulate sequence changes? We discuss several hypotheses regarding the relationship between the architecture of a protein and its evolutionary potential. We describe how, in a seemingly paradoxical manner, opposite properties, such as high stability and rigidity versus conformational plasticity and structural order versus disorder, promote robustness and/or innovability. We hypothesize that polarity--differentiation and low connectivity between a protein's scaffold and its active-site--is a key prerequisite for innovability.
Affiliation(s)
- Agnes Tóth-Petróczy
  - Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
- Dan S Tawfik
  - Department of Biological Chemistry, Weizmann Institute of Science, Rehovot 76100, Israel
|
21
|
Chuang TJ, Chiang TW. Impacts of pretranscriptional DNA methylation, transcriptional transcription factor, and posttranscriptional microRNA regulations on protein evolutionary rate. Genome Biol Evol 2014; 6:1530-1541. [PMID: 24923326 PMCID: PMC4080426 DOI: 10.1093/gbe/evu124] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Accepted: 06/05/2014] [Indexed: 12/24/2022] Open
Abstract
Gene expression is largely regulated by DNA methylation, transcription factor (TF), and microRNA (miRNA) before, during, and after transcription, respectively. Although the evolutionary effects of TF/miRNA regulation have been widely studied, an evolutionary analysis that simultaneously accounts for DNA methylation, TF, and miRNA regulation, and the question of whether promoter methylation and gene body (coding region) methylation have different effects on the rate of gene evolution, remain uninvestigated. Here, we compared human-macaque and human-mouse protein evolutionary rates against experimentally determined single base-resolution DNA methylation data, revealing that promoter methylation level is positively correlated with protein evolutionary rates but negatively correlated with TF/miRNA regulation, whereas the opposite was observed for gene body methylation level. Our results showed that the relative importance of these regulatory factors in determining the rate of mammalian protein evolution is as follows: promoter methylation ≈ miRNA regulation > gene body methylation > TF regulation, and further indicated that promoter methylation and miRNA regulation have a significant dependent effect on protein evolutionary rates. Although the mechanisms underlying cooperation between DNA methylation and TFs/miRNAs in gene regulation remain unclear, our study helps to illuminate not only the impact of these regulatory factors on mammalian protein evolution but also their intricate interaction within gene regulatory networks.
Affiliation(s)
- Trees-Juen Chuang
  - Division of Physical & Computational Genomics, Genomics Research Center, Academia Sinica, Taipei, Taiwan
- Tai-Wei Chiang
  - Division of Physical & Computational Genomics, Genomics Research Center, Academia Sinica, Taipei, Taiwan
|
22
|
Yan W, Sun M, Hu G, Zhou J, Zhang W, Chen J, Chen B, Shen B. Amino acid contact energy networks impact protein structure and evolution. J Theor Biol 2014; 355:95-104. [PMID: 24703984 DOI: 10.1016/j.jtbi.2014.03.032] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 02/08/2014] [Accepted: 03/21/2014] [Indexed: 01/13/2023]
Abstract
One of the most challenging tasks in structural proteomics is to understand the relationship between protein structure, biological function, and evolution. An understanding of amino acid networks based on protein topology has an important role in the study of this relationship; however, the relationship between the network parameters underlying protein topology and structural properties or evolutionary rate is still unknown. To investigate this further, we modeled the three-dimensional structure of proteins as amino acid contact energy networks (AACENs), with nodes represented as amino acid residues and edges established according to environment-dependent residue-residue contact energies. Five other types of networks were also constructed to investigate their topological parameters and compare their effect on protein structure and evolution: (1) a random contact network (RCN), (2) a rewiring network with the same degree distribution as AACEN (RNDD), (3) long-range contact energy networks with and without the backbone connectivity (LCEN_BBs and LCENs), and (4) short-range contact energy networks (SCENs). The results indicated that the long-range link percentage and the network clustering coefficient showed a significantly positive and negative correlation, respectively, with protein secondary structure density. In addition, the long-range link percentage and network diameter had a significantly positive and negative correlation, respectively, with evolutionary rate. To our knowledge, this is the first study to identify the potential role of long-range links and network diameter in protein evolution.
Affiliation(s)
- Wenying Yan
  - Center for Systems Biology, Soochow University, No. 1, Shizi Street, Suzhou, Jiangsu 215006, China
- Maomin Sun
  - Center for Systems Biology, Soochow University, No. 1, Shizi Street, Suzhou, Jiangsu 215006, China; Laboratory Animal Research Center, School of Medical, Soochow University, China
- Guang Hu
  - Center for Systems Biology, Soochow University, No. 1, Shizi Street, Suzhou, Jiangsu 215006, China
- Jianhong Zhou
  - Center for Systems Biology, Soochow University, No. 1, Shizi Street, Suzhou, Jiangsu 215006, China
- Wenyu Zhang
  - Center for Systems Biology, Soochow University, No. 1, Shizi Street, Suzhou, Jiangsu 215006, China
- Jiajia Chen
  - Center for Systems Biology, Soochow University, No. 1, Shizi Street, Suzhou, Jiangsu 215006, China; Department of Chemistry and Biological Engineering, Suzhou University of Science and Technology, Jiangsu, Suzhou 215011, China
- Biao Chen
  - Center for Systems Biology, Soochow University, No. 1, Shizi Street, Suzhou, Jiangsu 215006, China
- Bairong Shen
  - Center for Systems Biology, Soochow University, No. 1, Shizi Street, Suzhou, Jiangsu 215006, China
|
23
|
Chen YC, Cheng JH, Tsai ZTY, Tsai HK, Chuang TJ. The impact of trans-regulation on the evolutionary rates of metazoan proteins. Nucleic Acids Res 2013; 41:6371-80. [PMID: 23658220 PMCID: PMC3711421 DOI: 10.1093/nar/gkt349] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 02/07/2013] [Revised: 04/10/2013] [Accepted: 04/14/2013] [Indexed: 11/13/2022] Open
Abstract
Transcription factor (TF) and microRNA (miRNA) are two crucial trans-regulatory factors that coordinately control gene expression. Understanding the impacts of these two factors on the rate of protein sequence evolution is of great importance in evolutionary biology. While many biological factors associated with evolutionary rate variations have been studied, an evolutionary analysis that simultaneously accounts for TF and miRNA regulation across metazoans is still lacking. Here, we provide a series of statistical analyses to assess the influences of TF and miRNA regulation on evolutionary rates across metazoans (human, mouse and fruit fly). Our results reveal that the negative correlations between trans-regulation and evolutionary rates hold well across metazoans, but the strength of TF regulation as a rate indicator becomes weak when the other confounding factors that may affect evolutionary rates are controlled. We show that miRNA regulation tends to be a more essential indicator of evolutionary rates than TF regulation, and the combination of TF and miRNA regulation has a significant dependent effect on protein evolutionary rates. We also show that trans-regulation (especially miRNA regulation) is much more important in human/mouse than in fruit fly in determining protein evolutionary rates, suggesting a considerable variation in rate determinants between vertebrates and invertebrates.
Affiliation(s)
- Yi-Ching Chen
  - Institute of Information Science, Academia Sinica, Taipei 115, Taiwan, Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei 115, Taiwan and Genomic Research Center, Academia Sinica, Taipei 115, Taiwan
- Jen-Hao Cheng
  - Institute of Information Science, Academia Sinica, Taipei 115, Taiwan, Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei 115, Taiwan and Genomic Research Center, Academia Sinica, Taipei 115, Taiwan
- Zing Tsung-Yeh Tsai
  - Institute of Information Science, Academia Sinica, Taipei 115, Taiwan, Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei 115, Taiwan and Genomic Research Center, Academia Sinica, Taipei 115, Taiwan
- Huai-Kuang Tsai
  - Institute of Information Science, Academia Sinica, Taipei 115, Taiwan, Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei 115, Taiwan and Genomic Research Center, Academia Sinica, Taipei 115, Taiwan
- Trees-Juen Chuang
  - Institute of Information Science, Academia Sinica, Taipei 115, Taiwan, Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, Taipei 115, Taiwan and Genomic Research Center, Academia Sinica, Taipei 115, Taiwan
|
24
|
Choi SS, Hannenhalli S. Three independent determinants of protein evolutionary rate. J Mol Evol 2013; 76:98-111. [PMID: 23400388 DOI: 10.1007/s00239-013-9543-6] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 11/09/2012] [Accepted: 01/16/2013] [Indexed: 12/15/2022]
Abstract
One of the most widely accepted ideas related to the evolutionary rates of proteins is that functionally important residues or regions evolve slower than other regions, a reasonable outcome of which should be a slower evolutionary rate of the proteins with a higher density of functionally important sites. Oddly, the role of functional importance, mainly measured by essentiality, in determining evolutionary rate has been challenged in recent studies. Several variables other than protein essentiality, such as expression level, gene compactness, protein-protein interactions, etc., have been suggested to affect protein evolutionary rate. In the present review, we try to refine the concept of functional importance of a gene, and consider three factors (functional importance, expression level, and gene compactness) as independent determinants of the evolutionary rate of a protein, based not only on their known correlation with evolutionary rate but also on a reasonable mechanistic model. We suggest a framework based on these mechanistic models to correctly interpret the correlations between evolutionary rates and the various variables as well as the interrelationships among the variables.
Affiliation(s)
- Sun Shim Choi
  - Department of Medical Biotechnology, College of Biomedical Science, and Institute of Bioscience & Biotechnology, Kangwon National University, Chuncheon, South Korea
|
25
|
Independent effects of protein core size and expression on residue-level structure-evolution relationships. PLoS One 2012; 7:e46602. [PMID: 23056364 PMCID: PMC3463513 DOI: 10.1371/journal.pone.0046602] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Received: 05/07/2012] [Accepted: 09/03/2012] [Indexed: 11/25/2022] Open
Abstract
Recently, we demonstrated that yeast protein evolutionary rate at the level of individual amino acid residues scales linearly with degree of solvent accessibility. This residue-level structure-evolution relationship is sensitive to protein core size: surface residues from large-core proteins evolve much faster than those from small-core proteins, while buried residues are equally constrained independent of protein core size. In this work, we investigate the joint effects of protein core size and expression on the residue-level structure-evolution relationship. At the whole-protein level, protein expression is a much more dominant determinant of protein evolutionary rate than protein core size. In contrast, at the residue level, protein core size and expression both have major impacts on protein structure-evolution relationships. In addition, protein core size and expression influence residue-level structure-evolution relationships in qualitatively different ways. Protein core size preferentially affects the non-synonymous substitution rates of surface residues compared to buried residues, and has little influence on synonymous substitution rates. In comparison, protein expression uniformly affects all residues independent of degree of solvent accessibility, and affects both non-synonymous and synonymous substitution rates. Protein core size and expression exert largely independent effects on protein evolution at the residue level, and can combine to produce dramatic changes in the slope of the linear relationship between residue evolutionary rate and solvent accessibility. Our residue-level findings demonstrate that protein core size and expression are both important, yet qualitatively different, determinants of protein evolution. These results underscore the complementary nature of residue-level and whole-protein analysis of protein evolution.
|
26
|
Scherrer MP, Meyer AG, Wilke CO. Modeling coding-sequence evolution within the context of residue solvent accessibility. BMC Evol Biol 2012; 12:179. [PMID: 22967129 PMCID: PMC3527230 DOI: 10.1186/1471-2148-12-179] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.5] [Received: 04/11/2012] [Accepted: 09/03/2012] [Indexed: 11/30/2022] Open
Abstract
Background: Protein structure mediates site-specific patterns of sequence divergence. In particular, residues in the core of a protein (solvent-inaccessible residues) tend to be more evolutionarily conserved than residues on the surface (solvent-accessible residues). Results: Here, we present a model of sequence evolution that explicitly accounts for the relative solvent accessibility of each residue in a protein. Our model is a variant of the Goldman-Yang 1994 (GY94) model in which all model parameters can be functions of the relative solvent accessibility (RSA) of a residue. We apply this model to a data set comprised of nearly 600 yeast genes, and find that an evolutionary-rate ratio ω that varies linearly with RSA provides a better model fit than an RSA-independent ω or an ω that is estimated separately in individual RSA bins. We further show that the branch length t and the transition-transversion ratio κ also vary with RSA. The RSA-dependent GY94 model performs better than an RSA-dependent Muse-Gaut 1994 (MG94) model in which the synonymous and non-synonymous rates individually are linear functions of RSA. Finally, protein core size affects the slope of the linear relationship between ω and RSA, and gene expression level affects both the intercept and the slope. Conclusions: Structure-aware models of sequence evolution provide a significantly better fit than traditional models that neglect structure. The linear relationship between ω and RSA implies that genes are better characterized by their ω slope and intercept than by just their mean ω.
Affiliation(s)
- Michael P Scherrer
  - Center for Computational Biology and Bioinformatics, Institute for Cellular and Molecular Biology, and Section of Integrative Biology, The University of Texas at Austin, Austin, TX 78712, USA
|
27
|
Toll-Riera M, Bostick D, Albà MM, Plotkin JB. Structure and age jointly influence rates of protein evolution. PLoS Comput Biol 2012; 8:e1002542. [PMID: 22693443 PMCID: PMC3364943 DOI: 10.1371/journal.pcbi.1002542] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.3] [Received: 01/27/2012] [Accepted: 04/17/2012] [Indexed: 12/01/2022] Open
Abstract
What factors determine a protein's rate of evolution are actively debated. Especially unclear is the relative role of intrinsic factors of present-day proteins versus historical factors such as protein age. Here we study the interplay of structural properties and evolutionary age, as determinants of protein evolutionary rate. We use a large set of one-to-one orthologs between human and mouse proteins, with mapped PDB structures. We report that previously observed structural correlations also hold within each age group – including relationships between solvent accessibility, designability, and evolutionary rates. However, age also plays a crucial role: age modulates the relationship between solvent accessibility and rate. Additionally, younger proteins, despite being less designable, tend to evolve faster than older proteins. We show that previously reported relationships between age and rate cannot be explained by structural biases among age groups. Finally, we introduce a knowledge-based potential function to study the stability of proteins through large-scale computation. We find that older proteins are more stable for their native structure, and more robust to mutations, than younger ones. Our results underscore that several determinants, both intrinsic and historical, can interact to determine rates of protein evolution. Rates of protein evolution vary dramatically within and between organisms. But the factors that determine a protein's evolutionary rate are still under debate, despite extensive studies over the past decade. Several determinants have been proposed, for example gene expression, the importance of the gene for the organism, the number of physical or genetic interactions it has, its structural characteristics, or when it originated. Here we study how age and structural characteristics interact with one another to influence evolutionary rates. We use a set of one-to-one orthologs of human and mouse proteins, with known crystal structures. We find that these two determinants interact: for example, the age of a protein modulates how its structure correlates with evolutionary rate. Nonetheless, the influence of age on evolutionary rate cannot be explained by its interplay with structure.
Collapse
Affiliation(s)
- Macarena Toll-Riera
- Evolutionary Genomics Group, Fundació Institut Municipal d'Investigació Mèdica (FIMIM)- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- David Bostick
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- M. Mar Albà
- Evolutionary Genomics Group, Fundació Institut Municipal d'Investigació Mèdica (FIMIM)- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Catalan Institution for Research and Advanced Studies (ICREA), Barcelona, Spain
- * E-mail: (MMA); (JBP)
- Joshua B. Plotkin
- Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
- * E-mail: (MMA); (JBP)
|
28
|
Chen FC, Liao BY, Pan CL, Lin HY, Chang AYF. Assessing determinants of exonic evolutionary rates in mammals. Mol Biol Evol 2012; 29:3121-9. [PMID: 22504521 DOI: 10.1093/molbev/mss116] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022] Open
Abstract
From studies investigating the differences in evolutionary rates between genes, gene compactness and gene expression level have been identified as important determinants of gene-level protein evolutionary rate, as represented by nonsynonymous to synonymous substitution rate (d(N)/d(S)) ratio. However, the causes of exon-level variances in d(N)/d(S) are less understood. Here, we use principal component regression to examine to what extent 13 exon features explain the variance in d(N), d(S), and the d(N)/d(S) ratio of human-rhesus macaque or human-mouse orthologous exons. The exon features were grouped into six functional categories: expression features, mRNA splicing features, structural-functional features, compactness features, exon duplicability, and other features, including G + C content and exon length. Although expression features are important for determining d(N) and d(N)/d(S) between exons of different genes, structural-functional features and splicing features explained more of the variance for exons of the same genes. Furthermore, we show that compactness features can explain only a relatively small percentage of variance in exon-level d(N) or d(N)/d(S) in either between-gene or within-gene comparison. By contrast, d(S) yielded inconsistent results in the human-mouse comparison and the human-rhesus macaque comparison. This inconsistency may suggest rapid evolutionary changes of the mutation landscape in mammals. Our results suggest that between-gene and within-gene variation in d(N)/d(S) (and d(N)) are driven by different evolutionary forces and that the role of mRNA splicing in causing the variation in evolutionary rates of coding sequences may be underappreciated.
Affiliation(s)
- Feng-Chi Chen
- Division of Biostatistics and Bioinformatics, Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County, Taiwan, Republic of China.
|
29
|
Comparative analysis of the structural and expressional parameters of microRNA target genes. Gene 2012; 497:103-9. [PMID: 22305979 DOI: 10.1016/j.gene.2012.01.033] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 01/05/2012] [Accepted: 01/18/2012] [Indexed: 02/02/2023]
Abstract
MicroRNAs (miRNAs) generally pair with the 3'UTRs of their target mRNAs to repress gene expression. It has been reported that miRNA targets (TGs) are longer and evolve more slowly than non-targets (NTGs). We confirmed this observation and also found novel structural and expressional characteristics of TGs. The length difference between TGs and NTGs was greatest for the 3'UTRs, although a difference was also observed for CDSs and introns. Widely expressed genes were shorter for both TGs and NTGs; however, TGs were significantly longer than NTGs in all ranges of expression. TGs were more likely than NTGs to be widely expressed, which might explain why TGs evolve more slowly than NTGs. Finally, we found that TG mRNAs have faster decay rates. In addition, the decay rate of a TG mRNA transcript was found to be positively correlated with the number or density of target sites located in that TG's mRNA transcript.
|
30
|
The relationship between relative solvent accessibility and evolutionary rate in protein evolution. Genetics 2011; 188:479-88. [PMID: 21467571 DOI: 10.1534/genetics.111.128025] [Citation(s) in RCA: 87] [Impact Index Per Article: 6.2] [Indexed: 11/18/2022] Open
Abstract
Recent work with Saccharomyces cerevisiae shows a linear relationship between the evolutionary rate of sites and the relative solvent accessibility (RSA) of the corresponding residues in the folded protein. Here, we aim to develop a mathematical model that can reproduce this linear relationship. We first demonstrate that two models that both seem reasonable choices (a simple model in which selection strength correlates with RSA and a more complex model based on RSA-dependent amino acid distributions) fail to reproduce the observed relationship. We then develop a model on the basis of observed site-specific amino acid distributions and show that this model behaves appropriately. We conclude that evolutionary rates are directly linked to the distribution of amino acids at individual sites. Because of this link, any future insight into the biophysical mechanisms that determine amino acid distributions will improve our understanding of evolutionary rates.
|
31
|
Levy ED. A Simple Definition of Structural Regions in Proteins and Its Use in Analyzing Interface Evolution. J Mol Biol 2010; 403:660-70. [DOI: 10.1016/j.jmb.2010.09.028] [Citation(s) in RCA: 133] [Impact Index Per Article: 8.9] [Received: 06/24/2010] [Revised: 08/19/2010] [Accepted: 09/13/2010] [Indexed: 10/19/2022]
|
32
|
Wang D, Qiu C, Zhang H, Wang J, Cui Q, Yin Y. Human microRNA oncogenes and tumor suppressors show significantly different biological patterns: from functions to targets. PLoS One 2010; 5. [PMID: 20927335 PMCID: PMC2948010 DOI: 10.1371/journal.pone.0013067] [Citation(s) in RCA: 136] [Impact Index Per Article: 9.1] [Received: 06/03/2010] [Accepted: 09/08/2010] [Indexed: 12/21/2022] Open
Abstract
MicroRNAs (miRNAs) are small noncoding RNAs which play essential roles in many important biological processes. Therefore, their dysfunction is associated with a variety of human diseases, including cancer. Increasing evidence shows that miRNAs can act as oncogenes or tumor suppressors, and although there is great interest in research into these cancer-associated miRNAs, little is known about them. In this study, we performed a comprehensive analysis of putative human miRNA oncogenes and tumor suppressors. We found that miRNA oncogenes and tumor suppressors clearly show different patterns in function, evolutionary rate, expression, chromosome distribution, molecule size, free energy, transcription factors, and targets. For example, miRNA oncogenes are located mainly in the amplified regions in human cancers, whereas miRNA tumor suppressors are located mainly in the deleted regions. miRNA oncogenes tend to cleave target mRNAs more frequently than miRNA tumor suppressors. These results indicate that these two types of cancer-associated miRNAs play different roles in cancer formation and development. Moreover, the patterns identified here can discriminate novel miRNA oncogenes and tumor suppressors with a high degree of accuracy. This study represents the first large-scale bioinformatic analysis of human miRNA oncogenes and tumor suppressors. Our findings provide help for not only understanding of miRNAs in cancer but also for the specific identification of novel miRNAs as miRNA oncogenes and tumor suppressors. In addition, the data presented in this study will be valuable for the study of both miRNAs and cancer.
Affiliation(s)
- Dong Wang
- Department of Biomedical Informatics, Peking University Health Science Center, Beijing, China
- Institute of Systems Biomedicine, Peking University Health Science Center, Beijing, China
- Chengxiang Qiu
- Department of Biomedical Informatics, Peking University Health Science Center, Beijing, China
- Institute of Systems Biomedicine, Peking University Health Science Center, Beijing, China
- Haijun Zhang
- Department of Biomedical Informatics, Peking University Health Science Center, Beijing, China
- Institute of Systems Biomedicine, Peking University Health Science Center, Beijing, China
- Juan Wang
- Department of Biomedical Informatics, Peking University Health Science Center, Beijing, China
- Institute of Systems Biomedicine, Peking University Health Science Center, Beijing, China
- Qinghua Cui
- Department of Biomedical Informatics, Peking University Health Science Center, Beijing, China
- Institute of Systems Biomedicine, Peking University Health Science Center, Beijing, China
- * E-mail: (QC); (YY)
- Yuxin Yin
- Institute of Systems Biomedicine, Peking University Health Science Center, Beijing, China
- Department of Pathology, Peking University Health Science Center, Beijing, China
- * E-mail: (QC); (YY)
|
33
|
Wolf YI, Gopich IV, Lipman DJ, Koonin EV. Relative contributions of intrinsic structural-functional constraints and translation rate to the evolution of protein-coding genes. Genome Biol Evol 2010; 2:190-9. [PMID: 20624725 PMCID: PMC2940324 DOI: 10.1093/gbe/evq010] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 12/22/2022] Open
Abstract
A long-standing assumption in evolutionary biology is that the evolution rate of protein-coding genes depends, largely, on specific constraints that affect the function of the given protein. However, recent research in evolutionary systems biology revealed unexpected, significant correlations between evolution rate and characteristics of genes or proteins that are not directly related to specific protein functions, such as expression level and protein–protein interactions. The strongest connections were consistently detected between protein sequence evolution rate and the expression level of the respective gene. A recent genome-wide proteomic study revealed an extremely strong correlation between the abundances of orthologous proteins in distantly related animals, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster. We used the extensive protein abundance data from this study along with short-term evolutionary rates (ERs) of orthologous genes in nematodes and flies to estimate the relative contributions of structural–functional constraints and the translation rate to the evolution rate of protein-coding genes. Together the intrinsic constraints and translation rate account for approximately 50% of the variance of the ERs. The contribution of constraints is estimated to be 3- to 5-fold greater than the contribution of translation rate.
Affiliation(s)
- Yuri I Wolf
- National Center for Biotechnology Information, National Library of Medicine, Bethesda, Maryland, USA
|
34
|
Begum T, Ghosh TC. Understanding the Effect of Secondary Structures and Aggregation on Human Protein Folding Class Evolution. J Mol Evol 2010; 71:60-9. [DOI: 10.1007/s00239-010-9364-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 12/18/2009] [Accepted: 06/23/2010] [Indexed: 12/01/2022]
|
35
|
Qiu C, Wang J, Yao P, Wang E, Cui Q. microRNA evolution in a human transcription factor and microRNA regulatory network. BMC Syst Biol 2010; 4:90. [PMID: 20584335 PMCID: PMC2914650 DOI: 10.1186/1752-0509-4-90] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 01/14/2010] [Accepted: 06/29/2010] [Indexed: 02/08/2023]
Abstract
BACKGROUND microRNAs (miRNAs) are important cellular components. The understanding of their evolution is of critical importance for the understanding of their function. Although some specific evolutionary rules of miRNAs have been revealed, the rules of miRNA evolution in cellular networks remain largely unexplored. According to knowledge from protein-coding genes, the investigations of gene evolution in the context of biological networks often generate valuable observations that cannot be obtained by traditional approaches. RESULTS Here, we conducted the first systems-level analysis of miRNA evolution in a human transcription factor (TF)-miRNA regulatory network that describes the regulatory relations among TFs, miRNAs, and target genes. We found that the architectural structure of the network provides constraints and functional innovations for miRNA evolution and that miRNAs showed different and even opposite evolutionary patterns from TFs and other protein-coding genes. For example, miRNAs preferentially coevolved with their activators but not with their inhibitors. During transcription, rapidly evolving TFs frequently activated but rarely repressed miRNAs. In addition, conserved miRNAs tended to regulate rapidly evolving targets, and upstream miRNAs evolved more rapidly than downstream miRNAs. CONCLUSIONS In this study, we performed the first systems level analysis of miRNA evolution. The findings suggest that miRNAs have a unique evolution process and thus may have unique functions and roles in various biological processes and diseases. Additionally, the network presented here is the first TF-miRNA regulatory network, which will be a valuable platform of systems biology.
Affiliation(s)
- Chengxiang Qiu
- Department of Biomedical Informatics, Peking University Health Science Center, Beijing, China
|
36
|
Wilke CO, Drummond DA. Signatures of protein biophysics in coding sequence evolution. Curr Opin Struct Biol 2010; 20:385-9. [PMID: 20395125 DOI: 10.1016/j.sbi.2010.03.004] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Received: 01/26/2010] [Accepted: 03/22/2010] [Indexed: 10/19/2022]
Abstract
Since the early days of molecular evolution, the conventional wisdom has been that the evolution of protein-coding genes is primarily determined by functional constraints. Yet recent evidence indicates that the evolution of these genes is strongly shaped by the biophysical processes of protein synthesis, protein folding, and specific as well as nonspecific protein-protein interactions. Selection pressures related to these biophysical processes affect primarily the amino-acid sequence of genes, but they also leave their mark on synonymous sites at the nucleotide level. While evidence for specific selection pressures related to protein biophysics is strong, there is currently no unifying framework that integrates the various selection pressures on coding sequences and disentangles their relative importance.
Affiliation(s)
- Claus O Wilke
- Center for Computational Biology and Bioinformatics, Institute for Cell and Molecular Biology, and Section of Integrative Biology, The University of Texas at Austin, Austin, TX, USA.
|
37
|
Morgan CC, Loughran NB, Walsh TA, Harrison AJ, O'Connell MJ. Positive selection neighboring functionally essential sites and disease-implicated regions of mammalian reproductive proteins. BMC Evol Biol 2010; 10:39. [PMID: 20149245 PMCID: PMC2830953 DOI: 10.1186/1471-2148-10-39] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Received: 07/20/2009] [Accepted: 02/11/2010] [Indexed: 11/16/2022] Open
Abstract
Background Reproductive proteins are central to the continuation of all mammalian species. The evolution of these proteins has been greatly influenced by environmental pressures induced by pathogens, rival sperm, sexual selection and sexual conflict. Positive selection has been demonstrated in many of these proteins with particular focus on primate lineages. However, the mammalia are a diverse group in terms of mating habits, population sizes and germ line generation times. We have examined the selective pressures at work on a number of novel reproductive proteins across a wide variety of mammalia. Results We show that selective pressures on reproductive proteins are highly varied. Of the 10 genes analyzed in detail, all contain signatures of positive selection either across specific sites or in specific lineages or a combination of both. Our analyses of SP56 and Col1a1 are entirely novel and the results show positively selected sites present in each gene. Our findings for the Col1a1 gene are suggestive of a link between positive selection and severe disease type. We find evidence in our dataset to suggest that interacting proteins are evolving in symphony, most likely to maintain interacting functionality. Conclusion Our in silico analyses show that positively selected sites occur near catalytically important regions, suggesting selective pressure to maximize efficient fertilization. In those cases where a mechanism of protein function is not fully understood, the sites presented here represent ideal candidates for mutational study. This work has highlighted the widespread heterogeneity in mutational rates across the mammalia and specifically has shown that the evolution of reproductive proteins is highly varied depending on the species and interacting partners. We have shown that positive selection and disease are closely linked in the Col1a1 gene.
Affiliation(s)
- Claire C Morgan
- Bioinformatics and Molecular Evolution Group, School of Biotechnology, Dublin City University, Glasnevin, Dublin 9, Ireland
|
38
|
Drummond DA, Wilke CO. The evolutionary consequences of erroneous protein synthesis. Nat Rev Genet 2009; 10:715-24. [PMID: 19763154 DOI: 10.1038/nrg2662] [Citation(s) in RCA: 377] [Impact Index Per Article: 23.6] [Indexed: 12/26/2022]
Abstract
Errors in protein synthesis disrupt cellular fitness, cause disease phenotypes and shape gene and genome evolution. Experimental and theoretical results on this topic have accumulated rapidly in disparate fields, such as neurobiology, protein biosynthesis and degradation and molecular evolution, but with limited communication among disciplines. Here, we review studies of error frequencies, the cellular and organismal consequences of errors and the attendant long-range evolutionary responses to errors. We emphasize major areas in which little is known, such as the failure rates of protein folding, in addition to areas in which technological innovations may enable imminent gains, such as the elucidation of translational missense error frequencies. Evolutionary responses to errors fall into two broad categories: adaptations that minimize errors and their attendant costs and adaptations that exploit errors for the organism's benefit.
Affiliation(s)
- D Allan Drummond
- FAS Center for Systems Biology, Harvard University, 52 Oxford Street, Cambridge, MA 02138, USA
|
39
|
Carter R, Drouin G. The evolutionary rates of eukaryotic RNA polymerases and of their transcription factors are affected by the level of concerted evolution of the genes they transcribe. Mol Biol Evol 2009; 26:2515-20. [PMID: 19633229 DOI: 10.1093/molbev/msp164] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 01/14/2023] Open
Abstract
A defining characteristic of all eukaryotes is the presence of three RNA polymerases, each of which transcribes a particular subset of nuclear genes. RNA polymerase I transcribes rRNA genes; RNA polymerase II transcribes mRNA, miRNA, snRNA, and snoRNA genes; and RNA polymerase III transcribes 5S rRNA and tRNA genes. Here, we use the sequences of up to 25 Ascomycete species to show that the type of genes transcribed by each RNA polymerase affects their evolutionary rates and those of their transcription factors (TFs). The RNA polymerase subunits and TFs of genes whose promoters experience higher levels of concerted evolution evolve significantly faster than those experiencing lower levels of concerted evolution. The rates of evolution of RNA polymerase genes and their TFs are therefore not only the result of diverse selective constraints but are also influenced by the level of concerted evolution of the genes they transcribe.
Affiliation(s)
- Robert Carter
- Département de Biologie et Centre de Recherche Avancée en Génomique Environnementale, Université d'Ottawa, Ottawa, Ontario, Canada
|
40
|
Franzosa EA, Xia Y. Structural determinants of protein evolution are context-sensitive at the residue level. Mol Biol Evol 2009; 26:2387-95. [PMID: 19597162 DOI: 10.1093/molbev/msp146] [Citation(s) in RCA: 142] [Impact Index Per Article: 8.9] [Indexed: 12/21/2022] Open
Abstract
Structural properties of a protein residue's microenvironment have long been implicated as agents of selective constraint. Although these properties are inherently quantitative, structure-based studies of protein evolution tend to rely upon coarse distinctions between "surface" and "buried" residues and between "interfacial" and "noninterfacial" residues. Using homology-mapped yeast protein structures, we explore the relationships between residue evolution and continuous structural properties of the residue microenvironment, including solvent accessibility, density and distribution of residue-residue contacts, and burial depth. We confirm the role of solvent exposure as a major structural determinant of residue evolution and also identify a weak secondary effect arising from packing density. The relationship between solvent exposure and evolutionary rate (d(N)/d(S)) is found to be strong, positive, and linear. This reinforces the notion that residue burial is a continuous property with quantitative fitness implications. Next, we demonstrate systematic variation in residue-level structure-evolution relationships resulting from changes in global physical and biological contexts. We find that increasing protein-core size yields a more rapid relaxation of selective constraint as solvent exposure increases, although solvent-excluded residues remain similarly constrained. Finally, we analyze the selective constraint in protein-protein interfaces, revealing two fundamentally different yet separable components: continuous structural constraint that scales with total residue burial and a more surprising fixed functional constraint that accompanies any degree of interface involvement. These discoveries serve to elucidate and unite structure-evolution relationships at the residue and whole-protein levels.
|
41
|
Singh ND, Arndt PF, Clark AG, Aquadro CF. Strong evidence for lineage and sequence specificity of substitution rates and patterns in Drosophila. Mol Biol Evol 2009; 26:1591-605. [PMID: 19351792 DOI: 10.1093/molbev/msp071] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.2] [Indexed: 12/29/2022] Open
Abstract
Rates of single nucleotide substitution in Drosophila are highly variable within the genome, and several examples illustrate that evolutionary rates differ among Drosophila species as well. Here, we use a maximum likelihood method to quantify lineage-specific substitutional patterns and apply this method to 4-fold degenerate synonymous sites and introns from more than 8,000 genes aligned in the Drosophila melanogaster group. We find that within species, different classes of sequence evolve at different rates, with long introns evolving most slowly and short introns evolving most rapidly. Relative rates of individual single nucleotide substitutions vary approximately 3-fold among lineages, yielding patterns of substitution that are comparatively less GC-biased in the melanogaster species complex relative to Drosophila yakuba and Drosophila erecta. These results are consistent with a model coupling a mutational shift toward reduced GC content, or a shift in mutation-selection balance, in the D. melanogaster species complex, with variation in selective constraint among different classes of DNA sequence. Finally, base composition of coding and intronic sequences is not at equilibrium with respect to substitutional patterns, which primarily reflects the slow rate of the substitutional process. These results thus support the view that mutational and/or selective processes are labile on an evolutionary timescale and that if the process is indeed selection driven, then the distribution of selective constraint is variable across the genome.
Affiliation(s)
- Nadia D Singh
- Department of Molecular Biology and Genetics, Cornell University.
|
42
|
Abstract
Contemporary protein architectures can be regarded as molecular fossils, historical imprints that mark important milestones in the history of life. Whereas sequences change at a considerable pace, higher-order structures are constrained by the energetic landscape of protein folding, the exploration of sequence and structure space, and complex interactions mediated by the proteostasis and proteolytic machineries of the cell. The survey of architectures in the living world that was fuelled by recent structural genomic initiatives has been summarized in protein classification schemes, and the overall structure of fold space explored with novel bioinformatic approaches. However, metrics of general structural comparison have not yet unified architectural complexity using the 'shared and derived' tenet of evolutionary analysis. In contrast, a shift of focus from molecules to proteomes and a census of protein structure in fully sequenced genomes were able to uncover global evolutionary patterns in the structure of proteins. Timelines of discovery of architectures and functions unfolded episodes of specialization, reductive evolutionary tendencies of architectural repertoires in proteomes and the rise of modularity in the protein world. They revealed a biologically complex ancestral proteome and the early origin of the archaeal lineage. Studies also identified an origin of the protein world in enzymes of nucleotide metabolism harbouring the P-loop-containing triphosphate hydrolase fold and the explosive discovery of metabolic functions that recapitulated well-defined prebiotic shells and involved the recruitment of structures and functions. These observations have important implications for origins of modern biochemistry and diversification of life.
|
43
|
Kahali B, Ahmad S, Ghosh TC. Exploring the evolutionary rate differences of party hub and date hub proteins in Saccharomyces cerevisiae protein-protein interaction network. Gene 2008; 429:18-22. [PMID: 18973798 DOI: 10.1016/j.gene.2008.09.032] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 01/29/2008] [Revised: 09/25/2008] [Accepted: 09/25/2008] [Indexed: 01/05/2023]
Abstract
Evolutionary rates of party hub and date hub proteins of Saccharomyces cerevisiae are analyzed from the perspective of protein order and disorder and of three-dimensional structural context, such as the solvent accessibility of amino acid residues. Our results suggest that the lower evolutionary rate of party hub proteins relative to date hub proteins is contributed solely by the ordered regions of the corresponding proteins. Moreover, the slower evolutionary rate of party hub proteins relative to their date hub counterparts can be attributed to the presence of buried amino acid residues. Thus, our work furthers the understanding of the evolutionary rate differences between the two types of hub proteins in S. cerevisiae.
Affiliation(s)
- Bratati Kahali
- Bioinformatics Centre, Bose Institute, P 1/12, C.I.T. Scheme VII M, Kolkata 700 054, India
|
44
|
Wolf MY, Wolf YI, Koonin EV. Comparable contributions of structural-functional constraints and expression level to the rate of protein sequence evolution. Biol Direct 2008; 3:40. [PMID: 18840284 PMCID: PMC2572155 DOI: 10.1186/1745-6150-3-40] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Received: 10/01/2008] [Accepted: 10/07/2008] [Indexed: 01/01/2023] Open
Abstract
Background Proteins show a broad range of evolutionary rates. Understanding the factors that are responsible for the characteristic rate of evolution of a given protein arguably is one of the major goals of evolutionary biology. A long-standing general assumption used to be that the evolution rate is, primarily, determined by the specific functional constraints that affect the given protein. These constraints were traditionally thought to depend both on the specific features of the protein's structure and its biological role. The advent of systems biology brought about new types of data, such as expression level and protein-protein interactions, and unexpectedly, a variety of correlations between protein evolution rate and these variables have been observed. The strongest connections by far were repeatedly seen between protein sequence evolution rate and the expression level of the respective gene. It has been hypothesized that this link is due to the selection for the robustness of the protein structure to mistranslation-induced misfolding that is particularly important for highly expressed proteins and is the dominant determinant of the sequence evolution rate. Results This work is an attempt to assess the relative contributions of protein domain structure and function, on the one hand, and expression level on the other hand, to the rate of sequence evolution. To this end, we performed a genome-wide analysis of the effect of the fusion of a pair of domains in multidomain proteins on the difference in the domain-specific evolutionary rates. The mistranslation-induced misfolding hypothesis would predict that, within multidomain proteins, fused domains, on average, should evolve at substantially closer rates than the same domains in different proteins because, within a multidomain protein, all domains are translated at the same rate.
We performed a comprehensive comparison of the evolutionary rates of mammalian and plant protein domains that are either joined in multidomain proteins or contained in distinct proteins. Substantial homogenization of evolutionary rates in multidomain proteins was, indeed, observed in both animals and plants, although highly significant differences between domain-specific rates remained. The contributions of the translation rate, as determined by the effect of the fusion of a pair of domains within a multidomain protein, and intrinsic, domain-specific structural-functional constraints appear to be comparable in magnitude. Conclusion Fusion of domains in a multidomain protein results in substantial homogenization of the domain-specific evolutionary rates but significant differences between domain-specific evolution rates remain. Thus, the rate of translation and intrinsic structural-functional constraints both exert sizable and comparable effects on sequence evolution. Reviewers This article was reviewed by Sergei Maslov, Dennis Vitkup, Claus Wilke (nominated by Orly Alter), and Allan Drummond (nominated by Joel Bader). For the full reviews, please go to the Reviewers' Reports section.
Affiliation(s)
- Maxim Y Wolf
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
|