1
Chang KT, Guo J, di Ronza A, Sardiello M. Aminode: Identification of Evolutionary Constraints in the Human Proteome. Sci Rep 2018; 8:1357. [PMID: 29358731 PMCID: PMC5778061 DOI: 10.1038/s41598-018-19744-w] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Received: 07/05/2017] [Accepted: 01/05/2018] [Indexed: 12/12/2022] Open
Abstract
Evolutionarily constrained regions (ECRs) are a hallmark for sites of critical importance for a protein's structure or function. ECRs can be inferred by comparing the amino acid sequences from multiple protein homologs in the context of the evolutionary relationships that link the analyzed proteins. The compilation and analysis of the datasets required to infer ECRs, however, are time consuming and require skills in coding and bioinformatics, which can limit the use of ECR analysis in the biomedical community. Here, we developed Aminode, a user-friendly webtool for the routine and rapid inference of ECRs. Aminode is pre-loaded with the results of the analysis of the whole human proteome compared with proteomes from 62 additional vertebrate species. Profiles of the relative rates of amino acid substitution and ECR maps of human proteins are available for immediate search and download on the Aminode website. Aminode can also be used for custom analyses of protein families of interest. Interestingly, mapping of known missense variants shows great enrichment of pathogenic variants and depletion of non-pathogenic variants in Aminode-generated ECRs, suggesting that ECR analysis may help evaluate the potential pathogenicity of variants of unknown significance. Aminode is freely available at http://www.aminode.org .
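As a rough illustration of how a per-position substitution-rate profile could be turned into a map of constrained regions, the sketch below smooths the profile with a sliding window and reports runs that fall below a threshold. This is not Aminode's actual algorithm: the function name `constrained_regions`, the window size, the threshold, and the minimum region length are all hypothetical choices made for the example.

```python
from statistics import mean


def constrained_regions(rates, window=5, threshold=0.5, min_len=3):
    """Return (start, end) pairs (0-based, inclusive) of low-rate regions.

    rates: per-position relative substitution rates (hypothetical input).
    A position counts as constrained when the mean rate in a window
    centered on it is below `threshold`; runs shorter than `min_len`
    positions are discarded.
    """
    half = window // 2
    n = len(rates)
    # Smooth the profile; windows are truncated at the sequence ends.
    smoothed = [mean(rates[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]

    regions, start = [], None
    for i, value in enumerate(smoothed):
        if value < threshold and start is None:
            start = i  # a constrained run begins here
        elif value >= threshold and start is not None:
            if i - start >= min_len:
                regions.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and n - start >= min_len:
        regions.append((start, n - 1))
    return regions


# Toy profile: fast-evolving flanks around a slowly evolving core.
profile = [1.0] * 5 + [0.0] * 6 + [1.0] * 5
print(constrained_regions(profile, window=3))
```

On real data the threshold would presumably be chosen relative to the protein's overall rate distribution rather than fixed in advance.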
Affiliation(s)
- Kevin T Chang
- Department of Molecular and Human Genetics, Baylor College of Medicine, Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, 77030, USA
- Junyan Guo
- Department of Molecular and Human Genetics, Baylor College of Medicine, Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, 77030, USA
- Microsoft Corporation, 1 Microsoft Way, Redmond, WA, 98052, USA
- Alberto di Ronza
- Department of Molecular and Human Genetics, Baylor College of Medicine, Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, 77030, USA
- Marco Sardiello
- Department of Molecular and Human Genetics, Baylor College of Medicine, Jan and Dan Duncan Neurological Research Institute, Texas Children's Hospital, Houston, TX, 77030, USA.
2
Phylogenetic Gaussian process model for the inference of functionally important regions in protein tertiary structures. PLoS Comput Biol 2014; 10:e1003429. [PMID: 24453956 PMCID: PMC3894161 DOI: 10.1371/journal.pcbi.1003429] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 07/31/2013] [Accepted: 11/22/2013] [Indexed: 11/30/2022] Open
Abstract
A critical question in biology is the identification of functionally important amino acid sites in proteins. Because functionally important sites are under stronger purifying selection, site-specific substitution rates tend to be lower than usual at these sites. A large number of phylogenetic models have been developed to estimate site-specific substitution rates in proteins, and extraordinarily low substitution rates have been used as evidence of function. Most of the existing tools, e.g. Rate4Site, assume that site-specific substitution rates are independent across sites. However, site-specific substitution rates may be strongly correlated in the protein tertiary structure, since functionally important sites tend to be clustered together to form functional patches. We have developed a new model, GP4Rate, which incorporates the Gaussian process model with the standard phylogenetic model to identify slowly evolved regions in protein tertiary structures. GP4Rate uses the Gaussian process to define a nonparametric prior distribution of site-specific substitution rates, which naturally captures the spatial correlation of substitution rates. Simulations suggest that GP4Rate can potentially estimate site-specific substitution rates with a much higher accuracy than Rate4Site and tends to report slowly evolved regions rather than individual sites. In addition, GP4Rate can estimate the strength of the spatial correlation of substitution rates from the data. By applying GP4Rate to a set of mammalian B7-1 genes, we found a highly conserved region which coincides with experimental evidence. GP4Rate may be a useful tool for the in silico prediction of functionally important regions in proteins with known structures.
To understand how a protein functions, a critical step is to know which regions in its tertiary structure may be functionally important. Functionally important protein regions are typically more conserved than other regions because mutations in these regions are more likely to be deleterious. A number of phylogenetic models have been developed to identify conserved sites or regions in proteins by comparing protein sequences from multiple species. However, most of these methods treat amino acid sites independently and do not consider the spatial clustering of conserved sites in the protein tertiary structure. Therefore, their power to identify functional protein regions is limited. We develop a new statistical model, GP4Rate, which combines the information from the protein sequences and the protein tertiary structure to infer conserved regions. We demonstrate that GP4Rate outperforms Rate4Site, the most widely used phylogenetic software for inferring functional amino acid sites, via simulations and a case study of B7-1 genes. GP4Rate is a potentially useful tool for guiding mutagenesis experiments or providing insights into the relationship between protein structures and functions.
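To make the idea of a spatially correlated prior more concrete, the sketch below builds a covariance matrix over residues from their 3D coordinates, so that residues close together in the structure receive strongly correlated rates. The squared-exponential kernel, the function name `se_covariance`, and the default length scale are assumptions made for illustration; the abstract does not state which covariance function GP4Rate uses.

```python
import math


def se_covariance(coords, length_scale=8.0, variance=1.0):
    """Squared-exponential covariance between residues.

    coords: list of (x, y, z) residue coordinates in angstroms
            (e.g. C-alpha atoms; hypothetical input).
    length_scale: distance over which correlation decays (assumed value).
    variance: prior variance of the latent rate at each site (assumed value).
    """
    n = len(coords)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Squared Euclidean distance between residues i and j.
            d2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            cov[i][j] = variance * math.exp(-d2 / (2.0 * length_scale ** 2))
    return cov


# Two residues 5 angstroms apart and a third one 50 angstroms away.
toy_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (30.0, 40.0, 0.0)]
for row in se_covariance(toy_coords):
    print([round(value, 4) for value in row])
```

Under such a prior, a larger length scale corresponds to stronger spatial correlation, which is the kind of quantity the abstract says can be estimated from the data.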
3
Bell RE, Ben-Tal N. In silico identification of functional protein interfaces. Comp Funct Genomics 2010; 4:420-3. [PMID: 18629079 PMCID: PMC2447364 DOI: 10.1002/cfg.309] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 05/22/2003] [Revised: 06/03/2003] [Accepted: 06/03/2003] [Indexed: 12/02/2022] Open
Abstract
Proteins perform many of their biological roles through protein–protein, protein–DNA or protein–ligand interfaces. The identification of the amino acids comprising these interfaces often enhances our understanding of the biological function of the proteins. Many methods for the detection of functional interfaces have been developed, and large-scale analyses have provided assessments of their accuracy. Among them are those that consider the size of the protein interface, its amino acid composition and its physicochemical and geometrical properties. Other methods to this effect use statistical potential functions of pairwise interactions, and evolutionary information. The rationale of the evolutionary approach is that functional and structural constraints impose selective pressure; hence, biologically important interfaces often evolve at a slower pace than do other external regions of the protein. Recently, an algorithm, Rate4Site, and a web-server, ConSurf (http://consurf.tau.ac.il/), for the identification of functional interfaces based on the evolutionary relations among homologous proteins as reflected in phylogenetic trees, were developed in our laboratory. The explicit use of the tree topology and branch lengths makes the method remarkably accurate and sensitive. Here we demonstrate its potency in the identification of the functional interfaces of a hypothetical protein, the structure of which was determined as part of the international structural genomics effort. Finally, we propose to combine complementary procedures, in order to enhance the overall performance of methods for the identification of functional interfaces in proteins.
Affiliation(s)
- Rachel E Bell
- Department of Biochemistry, The George S. Wise Faculty of Life Sciences, Tel Aviv University, Ramat Aviv 69978, Israel
4
Ertekin A, Nussinov R, Haliloglu T. Association of putative concave protein-binding sites with the fluctuation behavior of residues. Protein Sci 2007; 15:2265-77. [PMID: 17008715 PMCID: PMC2242393 DOI: 10.1110/ps.051815006] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Indexed: 01/03/2023]
Abstract
Here, we propose a binding site prediction method based on the high-frequency end of the spectrum of protein structural dynamics in the native state. The spectrum is obtained using an elastic network model, the Gaussian network model (GNM). High frequency vibrating (HFV) residues are determined from the dynamics of the fastest modes. HFV residue clusters and the associated surface patch residues are tested for their likelihood of being located at binding interfaces using two different data sets, the Benchmark Set of mainly enzymes and antigen/antibodies and the Cluster Set of more diverse structures. The binding interface is identified to be within 7.5 Å of the HFV residue clusters in the Benchmark Set and Cluster Set, for 77% and 70% of the structures, respectively. The success rate increases to 88% and 84%, respectively, by using the surface patches. The results suggest that concave binding interfaces, typically those of enzyme-binding sites, are enriched in HFV residues. Thus, we expect that the association of HFV residues with the interfaces is mostly for enzymes. If, however, a binding region has invaginations and cavities, as in some of the antigen/antibodies and in cases in the Cluster data set, we expect it would be detected there too. This implies that binding sites possess several (inter-related) properties such as cavities, high packing density, conservation, and disposition for hotspots at binding surfaces. It further suggests that the high frequency vibrating residue-based approach is a potential tool for identification of regions likely to serve as protein-binding sites. The software is available at http://www.prc.boun.edu.tr/PRC/software.html.
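A minimal sketch of the GNM idea, assuming one node per residue and a 7.0 Å contact cutoff (a commonly used value, not one stated in the abstract): build the Kirchhoff (connectivity) matrix and find the fastest mode by power iteration, which converges to the eigenvector with the largest eigenvalue. The function names are hypothetical, and the published method uses several of the fastest modes and additional clustering steps that are not reproduced here.

```python
import math


def kirchhoff(coords, cutoff=7.0):
    """Kirchhoff matrix of a Gaussian network model.

    Off-diagonal entries are -1 for residue pairs within `cutoff`
    angstroms; each diagonal entry is the residue's contact count.
    """
    n = len(coords)
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                gamma[i][j] = gamma[j][i] = -1.0
                gamma[i][i] += 1.0
                gamma[j][j] += 1.0
    return gamma


def fastest_mode(gamma, iterations=500):
    """Squared residue amplitudes in the highest-frequency mode.

    Uses plain power iteration, so it returns only the single
    fastest mode (largest eigenvalue of the Kirchhoff matrix).
    """
    n = len(gamma)
    vec = [float(i + 1) for i in range(n)]  # arbitrary nonzero start vector
    for _ in range(iterations):
        nxt = [sum(gamma[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:  # no contacts at all: nothing to iterate on
            break
        vec = [x / norm for x in nxt]
    return [x * x for x in vec]


# Toy "structure": one densely contacted residue with four neighbors.
toy = [(0, 0, 0), (5, 0, 0), (-5, 0, 0), (0, 5, 0), (0, -5, 0)]
amplitudes = fastest_mode(kirchhoff(toy))
print(max(range(len(toy)), key=amplitudes.__getitem__))  # index of the peak residue
```

In this toy case the peak falls on the most densely packed residue, in line with the abstract's observation that binding sites combine properties such as high packing density and HFV behavior.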
Affiliation(s)
- Asli Ertekin
- Polymer Research Center and Chemical Engineering Department, Bogazici University, Bebek 34342, Istanbul, Turkey
5
Yu J, Thorne JL. Testing for spatial clustering of amino acid replacements within protein tertiary structure. J Mol Evol 2006; 62:682-92. [PMID: 16752209 DOI: 10.1007/s00239-005-0107-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Received: 05/02/2005] [Accepted: 11/30/2005] [Indexed: 11/25/2022]
Abstract
Widely used models of protein evolution ignore protein structure. Therefore, these models do not predict spatial clustering of amino acid replacements with respect to tertiary structure. One formal and biologically implausible possibility is that there is no tendency for amino acid replacements to be spatially clustered during evolution. An alternative to this is that amino acid replacements are spatially clustered and this spatial clustering can be fully explained by a tendency for similar rates of amino acid replacement at sites that are nearby in protein tertiary structure. A third possibility is that the amount of clustering exceeds that which can be explained solely on the basis of independently evolving protein sites with spatially clustered replacement rates. We introduce two simple and not very parametric hypothesis tests that help distinguish these three possibilities. We then apply these tests to 273 homologous protein families. The null hypothesis of no spatial clustering is rejected for 102 of 273 families. The explanation of spatially clustered rates but independent change among sites is rejected for 43 families. These findings need to be reconciled with the common practice of basing evolutionary inferences on models that assume independent change among sites.
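The first null hypothesis (no spatial clustering of replacements) can be illustrated with a simple permutation test. The sketch below is a generic stand-in, not the tests introduced by the authors: it compares the mean pairwise 3D distance among replaced sites with the same statistic for randomly chosen sites. The function names, the choice of statistic, and the number of permutations are all assumptions.

```python
import math
import random
from itertools import combinations


def mean_pairwise_distance(coords, sites):
    """Mean Euclidean distance over all pairs of the given sites (needs >= 2 sites)."""
    pairs = list(combinations(sites, 2))
    return sum(math.dist(coords[a], coords[b]) for a, b in pairs) / len(pairs)


def clustering_p_value(coords, replaced_sites, permutations=2000, seed=0):
    """One-sided permutation p-value for spatial clustering.

    Small values indicate that the replaced sites lie closer together
    in the structure than equally many randomly drawn sites.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = mean_pairwise_distance(coords, replaced_sites)
    k = len(replaced_sites)
    hits = 0
    for _ in range(permutations):
        sample = rng.sample(range(len(coords)), k)
        if mean_pairwise_distance(coords, sample) <= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (permutations + 1)


# Toy structure: 30 residues on a straight line, 1 angstrom apart.
line = [(float(i), 0.0, 0.0) for i in range(30)]
print(clustering_p_value(line, [0, 1, 2, 3]))     # tightly clustered sites
print(clustering_p_value(line, [0, 10, 20, 29]))  # widely spread sites
```

A test of this kind addresses only the first of the three possibilities; separating clustered rates from dependent change among sites, as the abstract describes, requires a model of site-specific rates.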
Affiliation(s)
- Jiaye Yu
- Bioinformatics Research Center, North Carolina State University, Raleigh, NC 27695-7566, USA
6
Abstract
As complete genomes accumulate and the generation of genomic biodiversity proceeds at an accelerating pace, the need to understand the interaction between sequence evolution and protein structure and function rises in prominence. The pattern and pace of substitutions in proteins can provide important clues to functional importance, functional divergence, and adaptive response. Coevolution between amino acid residues and the context dependence of the evolutionary process are often ignored, however, because of their complexity, but they are critical for the accurate interpretation of reconstructed evolutionary events. Because residues interact with one another, and because the effect of substitutions can depend on the structural and physiological environment in which they occur, an accurate science of evolutionary functional genomics and a complete understanding of selection in proteins require a better understanding of how context dependence affects protein evolution. Here, we present new evidence from vertebrate cytochrome oxidase sequences that pairwise coevolutionary interactions between protein residues are highly dependent on tertiary and secondary structure. We also discuss theoretical predictions that impinge on our expectations of how protein residues may interact over long distances because of their shared need to maintain protein stability.
Affiliation(s)
- Zhengyuan O Wang
- Department of Biological Sciences, Biological Computation and Visualization Center, Louisiana State University, Baton Rouge, Louisiana 70803, USA
7
Mayrose I, Mitchell A, Pupko T. Site-specific evolutionary rate inference: taking phylogenetic uncertainty into account. J Mol Evol 2005; 60:345-53. [PMID: 15871045 DOI: 10.1007/s00239-004-0183-8] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.6] [Received: 06/14/2004] [Accepted: 09/09/2004] [Indexed: 11/24/2022]
Abstract
The evolutionary rate at an amino acid site is indicative of how conserved this site is and, in turn, allows evaluating the importance of this site in maintaining the structure/function of the protein. When evolutionary rates are estimated, one must reconstruct the phylogenetic tree describing the evolutionary relationship among the sequences under study. However, if the inferred phylogenetic tree is incorrect, it can lead to erroneous site-specific rate estimates. Here we describe a novel Bayesian method that uses Markov chain Monte Carlo methodology to integrate over the space of all possible trees and model parameters. By doing so, the method considers alternative evolutionary scenarios weighted by their posterior probabilities. We show that this comprehensive evolutionary approach is superior over methods that are based on only a single tree. We illustrate the potential of our algorithm by analyzing the conservation pattern of the potassium channel protein family.
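The core of weighting alternative scenarios by their posterior probabilities can be shown with a small averaging step. The sketch assumes that site rates have already been estimated on each candidate tree and that a posterior weight is available for each tree; the function name and inputs are hypothetical, and the MCMC sampling that would produce them is not shown.

```python
def posterior_mean_rates(rates_per_tree, posteriors):
    """Average site-specific rates over trees, weighted by posterior probability.

    rates_per_tree: one list of per-site rates for each candidate tree.
    posteriors: matching posterior weights (normalized here if needed).
    """
    total = sum(posteriors)
    n_sites = len(rates_per_tree[0])
    return [
        sum(w * rates[site] for w, rates in zip(posteriors, rates_per_tree)) / total
        for site in range(n_sites)
    ]


# Two candidate trees that disagree about the rate at each of two sites.
rates = [[1.0, 2.0], [3.0, 4.0]]
print(posterior_mean_rates(rates, [0.75, 0.25]))  # → [1.5, 2.5]
```

When trees are drawn by MCMC, each tree appears among the samples roughly in proportion to its posterior probability, so an unweighted mean over the samples plays the same role as the explicit weights used here.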
Affiliation(s)
- Itay Mayrose
- Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Israel
8
Blouin C, Boucher Y, Roger AJ. Inferring functional constraints and divergence in protein families using 3D mapping of phylogenetic information. Nucleic Acids Res 2003; 31:790-7. [PMID: 12527789 PMCID: PMC140515 DOI: 10.1093/nar/gkg151] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022] Open
Abstract
Comparative sequence analysis has been used to study specific questions about the structure and function of proteins for many years. Here we propose a knowledge-based framework in which the maximum likelihood rate of evolution is used to quantify the level of constraint on the identity of a site. We demonstrate that site-rate mapping on 3D structures using datasets of rhodopsin-like G-protein receptors and alpha- and beta-tubulins provides an excellent tool for pinpointing the functional features shared between orthologous and paralogous proteins. In addition, functional divergence within protein families can be inferred by examining the differences in the site rates, the differences in the chemical properties of the side chains or amino acid usage between aligned sites. Two novel analytical methods are introduced to characterize rate-independent functional divergence. These are tested using a dataset of two classes of HMG-CoA reductases for which only one class can perform both the forward and reverse reaction. We show that functionally divergent sites occur in a cluster of sites interacting with the catalytic residues and that this information should facilitate the design of experimental strategies to directly test functional properties of residues.
Affiliation(s)
- Christian Blouin
- Canadian Institute for Advanced Research, Program in Evolutionary Biology, Genome Atlantic, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, NS B3H 4H7, Canada.
9
Dean AM, Neuhauser C, Grenier E, Golding GB. The pattern of amino acid replacements in alpha/beta-barrels. Mol Biol Evol 2002; 19:1846-64. [PMID: 12411594 DOI: 10.1093/oxfordjournals.molbev.a004009] [Citation(s) in RCA: 58] [Impact Index Per Article: 2.6] [Indexed: 11/14/2022] Open
Abstract
The determinants of site-to-site variability in the rate of amino acid replacement in alpha/beta-barrel enzyme structures are investigated. Of 125 available alpha/beta-barrel structures, only 25 meet a variety of phylogenetic and statistical criteria necessary to ensure sufficient data for reliable analysis. These 25 enzyme structures (from a wide variety of taxa with diverse lifestyles in diverse habitats) differ greatly in size, number, and topology of domains in addition to the alpha/beta-barrel, quaternary structure, metabolic role, reaction catalyzed, presence of prosthetic groups, regulatory mechanisms, use of cofactors, and catalytic mechanisms. Yet, with the exception of ribulose-1,5-bisphosphate carboxylase, all structures have similar frequency distributions of amino acid replacement rates. Hence, site-specific variability in rates of evolution is largely independent of differences in biology, biochemistry, and molecular structure. A correlation between site-specific rate variation and (1) distance from the active site, (2) solvent accessibility, and (3) treating glycines in unusual main-chain conformations as a separate class, explains approximately half the causal variation. Secondary structure exerts little influence on the pattern and distribution of replacements. Additional domains and subunits, side-chain hydrogen bonds, unusual side-chain rotamers, nonplanar peptide bonds, strained main-chain conformations, and buried hydrophilic-charged residues contribute little to variability among sites because they are rare. Nonlinear models do not improve the fits. In several enzymes, deviations from the typical pattern of replacements suggest the possible action of natural selection. A statistical analysis shows that, in all cases, much of the remaining unexplained variation is not attributable to chance and that other, as yet unidentified, causal relations must exist.
Affiliation(s)
- Antony M Dean
- The Biological Process Technology Institute, University of Minnesota, St. Paul, 55108, USA.
10
Simon AL, Stone EA, Sidow A. Inference of functional regions in proteins by quantification of evolutionary constraints. Proc Natl Acad Sci U S A 2002; 99:2912-7. [PMID: 11880638 PMCID: PMC122447 DOI: 10.1073/pnas.042692299] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.4] [Indexed: 11/18/2022] Open
Abstract
Likelihood estimates of local rates of evolution within proteins reveal that selective constraints on structure and function are quantitatively stable over billions of years of divergence. The stability of constraints produces an intramolecular clock that gives each protein a characteristic pattern of evolutionary rates along its sequence. This pattern allows the identification of constrained regions and, because the rate of evolution is a quantitative measure of the strength of the constraint, of their functional importance. We show that results from such analyses, which require only sequence alignments, are consistent with experimental and mutational data. The methodology has significant predictive power and may be used to guide structure-function studies for any protein represented by a modest number of homologs in sequence databases.
Affiliation(s)
- Alexander L Simon
- Program in Cancer Biology and Department of Pathology, Stanford University Medical School, Stanford, CA 94305-5324, USA