1
Del Amparo R, Arenas M. HIV Protease and Integrase Empirical Substitution Models of Evolution: Protein-Specific Models Outperform Generalist Models. Genes (Basel) 2021; 13:61. [PMID: 35052404 PMCID: PMC8774313 DOI: 10.3390/genes13010061] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/09/2021] [Revised: 12/22/2021] [Accepted: 12/22/2021] [Indexed: 12/24/2022]
Abstract
Diverse phylogenetic methods require a substitution model of evolution that should mimic, as accurately as possible, the real substitution process. At the protein level, empirical substitution models have traditionally been based on a large number of different proteins from particular taxonomic levels. However, these models assume that all of the proteins of a taxonomic level evolve under the same substitution patterns. We believe that this assumption is highly unrealistic and should be relaxed by considering protein-specific substitution models that account for protein-specific selection processes. In order to test this hypothesis, we inferred and evaluated four new empirical substitution models for the protease and integrase of HIV and other viruses. We found that these models fit the evolutionary process of these proteins more accurately than any of the currently available empirical substitution models. We conclude that evolutionary inferences from protein sequences are more accurate if they are based on protein-specific substitution models rather than taxonomic-specific (generalist) substitution models. We also present four new empirical substitution models of protein evolution that could be useful for phylogenetic inferences of viral protease and integrase.
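As a rough illustration of what "empirical" means here, such models are estimated from amino acid replacements observed in aligned sequences. The sketch below is a minimal counting approach in the spirit of early empirical matrices; it is not necessarily the procedure the authors used (the abstract does not describe it), and the toy alignment and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# The 20 standard amino acids; gaps and ambiguity codes are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def count_replacements(alignment):
    """Count unordered amino acid pairs per column for every pair of sequences.

    Identical pairs such as ('A', 'A') are kept so the diagonal is available.
    """
    counts = Counter()
    for seq1, seq2 in combinations(alignment, 2):
        for a, b in zip(seq1, seq2):
            if a in AMINO_ACIDS and b in AMINO_ACIDS:
                # Sorting makes ('D', 'E') and ('E', 'D') the same key,
                # so the resulting matrix is symmetric by construction.
                counts[tuple(sorted((a, b)))] += 1
    return counts


def residue_frequencies(alignment):
    """Estimate equilibrium frequencies as plain residue proportions."""
    residues = [r for seq in alignment for r in seq if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("alignment contains no standard amino acids")
    totals = Counter(residues)
    return {aa: totals[aa] / len(residues) for aa in AMINO_ACIDS}


# Hypothetical toy alignment of three short protein fragments.
toy = ["ACDA", "ACEA", "GCDA"]
pair_counts = count_replacements(toy)
freqs = residue_frequencies(toy)
print(pair_counts[("D", "E")], pair_counts[("A", "G")])  # 2 2
```

Counting over all sequence pairs ignores the phylogeny and multiple substitutions at the same site, so this only shows the raw ingredients of an empirical model, not a finished estimator.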
Affiliation(s)
- Roberto Del Amparo
- Centro de Investigacións Biomédicas (CINBIO), University of Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Miguel Arenas
- Centro de Investigacións Biomédicas (CINBIO), University of Vigo, 36310 Vigo, Spain
- Department of Biochemistry, Genetics and Immunology, University of Vigo, 36310 Vigo, Spain
- Galicia Sur Health Research Institute (IIS Galicia Sur), 36310 Vigo, Spain
2
Barreto CAV, Baptista SJ, Preto AJ, Matos-Filipe P, Mourão J, Melo R, Moreira I. Prediction and targeting of GPCR oligomer interfaces. Prog Mol Biol Transl Sci 2020; 169:105-149. [PMID: 31952684 DOI: 10.1016/bs.pmbts.2019.11.007] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/18/2022]
Abstract
GPCR oligomerization has emerged as a hot topic in the GPCR field in recent years. Receptors that are part of these oligomers can influence each other's function, although it is not yet entirely understood how these interactions work. The existence of such a highly complex network of interactions between GPCRs opens the possibility of alternative targets for new therapeutic approaches. However, challenges remain in the characterization of these complexes, especially at the interface level. Different experimental approaches, such as FRET or BRET, are usually combined to study GPCR oligomer interactions. Computational methods have been applied as a useful tool for retrieving information from GPCR sequences and from the few accessible X-ray-resolved oligomeric structures, as well as for predicting new and trustworthy GPCR oligomeric interfaces. Machine-learning (ML) approaches have recently helped overcome some limitations of other methods. By joining and evaluating multiple structure-, sequence- and co-evolution-based features in the same algorithm, it is possible to dilute the issues of particular structures and residues that arise from the experimental methodology into all-encompassing algorithms capable of accurately predicting GPCR-GPCR interfaces. All these methods, used singly or in combination, provide useful information about GPCR oligomerization and its role in GPCR function and dynamics. Altogether, we present experimental, computational and machine-learning methods used to study oligomer interfaces, as well as strategies that have been used to target these dynamic complexes.
Affiliation(s)
- Carlos A V Barreto
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
- Salete J Baptista
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal; Centro de Ciências e Tecnologias Nucleares, Instituto Superior Técnico, Universidade de Lisboa, CTN, LRS, Portugal
- António José Preto
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
- Pedro Matos-Filipe
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal
- Joana Mourão
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal; Institute for Interdisciplinary Research, University of Coimbra, Coimbra, Portugal
- Rita Melo
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal; Centro de Ciências e Tecnologias Nucleares, Instituto Superior Técnico, Universidade de Lisboa, CTN, LRS, Portugal
- Irina Moreira
- Center for Neuroscience and Cell Biology, University of Coimbra, Coimbra, Portugal; Science and Technology Faculty, University of Coimbra, Coimbra, Portugal.
3
Beaulieu JM, O’Meara BC, Zaretzki R, Landerer C, Chai J, Gilchrist MA. Population Genetics Based Phylogenetics Under Stabilizing Selection for an Optimal Amino Acid Sequence: A Nested Modeling Approach. Mol Biol Evol 2019; 36:834-851. [PMID: 30521036 PMCID: PMC6445302 DOI: 10.1093/molbev/msy222] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/14/2022]
Abstract
We present a new phylogenetic approach, selection on amino acids and codons (SelAC), whose substitution rates are based on a nested model linking protein expression to population genetics. Unlike simpler codon models that assume a single substitution matrix for all sites, our model more realistically represents the evolution of protein-coding DNA under the assumption of consistent, stabilizing selection using a cost-benefit approach. This cost-benefit approach allows us to generate a set of 20 optimal amino acid-specific matrix families using just a handful of parameters and naturally links the strength of stabilizing selection to protein synthesis levels, which we can estimate. Using a yeast data set of 100 orthologs for 6 taxa, we find SelAC fits the data much better than popular models by 10^4 to 10^5 Akaike information criterion units adjusted for small sample bias. Our results also indicate that nested, mechanistic models better predict observed data patterns, highlighting the improvement in biological realism in amino acid sequence evolution that our model provides. Additional parameters estimated by SelAC indicate that a large amount of nonphylogenetic, but biologically meaningful, information can be inferred from existing data. For example, SelAC prediction of gene-specific protein synthesis rates correlates well with both empirical (r=0.33-0.48) and other theoretical predictions (r=0.45-0.64) for multiple yeast species. SelAC also provides estimates of the optimal amino acid at each site. Finally, because SelAC is a nested approach based on clearly stated biological assumptions, future modifications, such as including shifts in the optimal amino acid sequence within or across lineages, are possible.
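The model comparison above relies on the Akaike information criterion corrected for small samples (AICc). The standard definitions are AIC = 2k - 2 ln L and AICc = AIC + 2k(k+1)/(n - k - 1), where k is the number of free parameters, L the maximized likelihood and n the sample size. A minimal sketch follows; the log-likelihoods and parameter counts in the example are invented, and taking n as the number of alignment sites is an assumption.

```python
def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion, 2k - 2 ln L (lower is better)."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """AIC plus the small-sample correction 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    return aic(log_likelihood, k) + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical fits of two models to the same data (n = 500 sites).
simple = aicc(log_likelihood=-1000.0, k=10, n=500)
richer = aicc(log_likelihood=-950.0, k=20, n=500)
delta = simple - richer  # positive: the richer model is preferred
print(round(simple, 3), round(richer, 3), round(delta, 3))
```

The difference between two AICc values is what "AIC units" in the abstract refers to; the absolute values themselves carry no meaning.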
Affiliation(s)
- Jeremy M Beaulieu
- Department of Biological Sciences, University of Arkansas, Fayetteville, AR
- Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville, TN
- National Institute for Mathematical and Biological Synthesis, Knoxville, TN
- Brian C O’Meara
- Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville, TN
- National Institute for Mathematical and Biological Synthesis, Knoxville, TN
- Cedric Landerer
- Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville, TN
- National Institute for Mathematical and Biological Synthesis, Knoxville, TN
- Juanjuan Chai
- National Institute for Mathematical and Biological Synthesis, Knoxville, TN
- Suite 1039, White Plains, NY
- Michael A Gilchrist
- Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville, TN
- National Institute for Mathematical and Biological Synthesis, Knoxville, TN
4
Arenas M, Sánchez-Cobos A, Bastolla U. Maximum-Likelihood Phylogenetic Inference with Selection on Protein Folding Stability. Mol Biol Evol 2015; 32:2195-207. [PMID: 25837579 DOI: 10.1093/molbev/msv085] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Indexed: 12/15/2022]
Abstract
Despite intense work, incorporating constraints on protein native structures into the mathematical models of molecular evolution remains difficult, because most models and programs assume that protein sites evolve independently, whereas protein stability is maintained by interactions between sites. Here, we address this problem by developing a new mean-field substitution model that generates independent site-specific amino acid distributions with constraints on the stability of the native state against both unfolding and misfolding. The model depends on a background distribution of amino acids and one selection parameter that we fix by maximizing the likelihood of the observed protein sequence. The analytic solution of the model shows that the main determinant of the site-specific distributions is the number of native contacts of the site and that the most variable sites are those with an intermediate number of native contacts. The mean-field models obtained by taking into account misfolded conformations yield larger likelihoods than models that only consider the native state, because their average hydrophobicity is more realistic, and they produce, on average, stable sequences for most proteins. We evaluated the mean-field model with respect to empirical substitution models on 12 test data sets of different protein families. In all cases, the observed site-specific sequence profiles presented smaller Kullback-Leibler divergence from the mean-field distributions than from the empirical substitution model. Next, we obtained substitution rates by combining the mean-field frequencies with an empirical substitution model. The resulting mean-field substitution model assigns larger likelihood than the empirical model to all studied families when we consider sequences with identity larger than 0.35, plausibly a condition that enforces conservation of the native structure across the family.
We found that the mean-field model performs better than other structurally constrained models with similar or higher complexity. With respect to the much more complex model recently developed by Bordner and Mittelmann, which takes into account pairwise terms in the amino acid distributions and also optimizes the exchangeability matrix, our model performed worse for data with small sequence divergence but better for data with larger sequence divergence. The mean-field model has been implemented in the computer program Prot_Evol, which is freely available at http://ub.cbm.uam.es/software/Prot_Evol.php.
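The comparison described in this abstract uses the Kullback-Leibler divergence between each observed site-specific amino acid profile and a model's predicted distribution. A minimal sketch, assuming the divergence is taken in the direction D(observed || model) and averaged over sites (the abstract states neither detail); the small floor `eps` on model probabilities is also an assumption, added to avoid division by zero.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) in nats for two discrete distributions over the same states."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:  # terms with p_i == 0 contribute nothing
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


def mean_site_kl(observed_profiles, model_profiles):
    """Average divergence over alignment sites; lower means a closer fit."""
    if len(observed_profiles) != len(model_profiles):
        raise ValueError("one model profile is needed per observed site")
    divergences = [kl_divergence(p, q) for p, q in zip(observed_profiles, model_profiles)]
    return sum(divergences) / len(divergences)


# Toy example over a 2-state alphabet (real profiles have 20 states).
observed = [[1.0, 0.0], [0.5, 0.5]]
model_a = [[0.5, 0.5], [0.5, 0.5]]
model_b = [[0.9, 0.1], [0.5, 0.5]]
print(mean_site_kl(observed, model_a) > mean_site_kl(observed, model_b))  # True
```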
Affiliation(s)
- Miguel Arenas
- Department of Cell Biology and Immunology, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, Madrid, Spain
- Agustin Sánchez-Cobos
- Department of Cell Biology and Immunology, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, Madrid, Spain
- Ugo Bastolla
- Department of Cell Biology and Immunology, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), Universidad Autónoma de Madrid, Madrid, Spain
5
Wu CH, Suchard MA, Drummond AJ. Bayesian selection of nucleotide substitution models and their site assignments. Mol Biol Evol 2012; 30:669-88. [PMID: 23233462 PMCID: PMC3563969 DOI: 10.1093/molbev/mss258] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Indexed: 12/21/2022]
Abstract
Probabilistic inference of a phylogenetic tree from molecular sequence data is predicated on a substitution model describing the relative rates of change between character states along the tree for each site in the multiple sequence alignment. Commonly, one assumes that the substitution model is homogeneous across sites within large partitions of the alignment, assigns these partitions a priori, and then fixes their underlying substitution model to the best-fitting model from a hierarchy of named models. Here, we introduce an automatic model selection and model averaging approach within a Bayesian framework that simultaneously estimates the number of partitions, the assignment of sites to partitions, the substitution model for each partition, and the uncertainty in these selections. This new approach is implemented as an add-on to the BEAST 2 software platform. We find that this approach dramatically improves the fit of the nucleotide substitution model compared with existing approaches, and we show, using a number of example data sets, that as many as nine partitions are required to explain the heterogeneity in nucleotide substitution process across sites in a single gene analysis. In some instances, this improved modeling of the substitution process can have a measurable effect on downstream inference, including the estimated phylogeny, relative divergence times, and effective population size histories.
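The substitution model described here assigns each pair of character states a probability of change along a branch, and partitioned analyses evaluate each block of sites under its own parameters. As the simplest example, the sketch below uses the Jukes-Cantor (JC69) nucleotide model and sums the log-likelihood of a two-sequence alignment over partitions that each have their own distance. It only illustrates partition-specific likelihoods: the partition labels, site assignments and distances are hypothetical, and the Bayesian estimation of partitions and models described in the abstract is not implemented.

```python
import math


def jc69_prob(same: bool, d: float) -> float:
    """JC69 transition probability after d expected substitutions per site.

    Returns P(i -> i) if same is True, else P(i -> j) for one specific j != i.
    """
    decay = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def partition_loglik(seq1, seq2, sites, d):
    """Log-likelihood of the given columns of a pairwise alignment under JC69.

    Each site contributes log(pi_a * P(a -> b)) with pi_a = 1/4.
    Requires d > 0 whenever a listed site differs (P = 0 at d = 0).
    """
    return sum(math.log(0.25 * jc69_prob(seq1[i] == seq2[i], d)) for i in sites)


# Hypothetical alignment split into a slowly and a quickly evolving partition.
s1, s2 = "ACGTACGTAC", "ACGTACCTGA"
partitions = {"slow": ([0, 1, 2, 3, 4, 5], 0.05), "fast": ([6, 7, 8, 9], 1.0)}
total = sum(partition_loglik(s1, s2, sites, d) for sites, d in partitions.values())
print(round(total, 3))
```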
Affiliation(s)
- Chieh-Hsi Wu
- Department of Computer Science, University of Auckland, Auckland, New Zealand
6
Roure B, Philippe H. Site-specific time heterogeneity of the substitution process and its impact on phylogenetic inference. BMC Evol Biol 2011; 11:17. [PMID: 21235782 PMCID: PMC3034684 DOI: 10.1186/1471-2148-11-17] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.6] [Received: 09/15/2010] [Accepted: 01/14/2011] [Indexed: 11/13/2022]
Abstract
Background: Model violations constitute the major limitation in inferring accurate phylogenies. Characterizing properties of the data that are not being correctly handled by current models is therefore of prime importance. One of the properties of protein evolution is the variation of the relative rate of substitutions across sites and over time, the latter being the phenomenon called heterotachy. Its effect on phylogenetic inference has recently received considerable attention, which led to the development of new models of sequence evolution. However, thus far the focus has been on the quantitative heterogeneity of the evolutionary process, thereby overlooking more qualitative variations. Results: We studied the importance of variation of the site-specific amino-acid substitution process over time and its possible impact on phylogenetic inference. We used the CAT model to define an infinite mixture of substitution processes characterized by equilibrium frequencies over the twenty amino acids, a useful proxy for qualitatively estimating the evolutionary process. Using two large datasets, we show that significant qualitative changes in site-specific substitution properties occurred over time. To test whether this unaccounted qualitative variation can lead to an erroneous phylogenetic tree, we analyzed a concatenation of mitochondrial proteins in which Cnidaria and Porifera were erroneously grouped. The progressive removal of the sites with the most heterogeneous CAT profiles across clades led to the recovery of the monophyly of Eumetazoa (Cnidaria+Bilateria), suggesting that this heterogeneity can negatively influence phylogenetic inference. Conclusion: The time-heterogeneity of the amino-acid replacement process is therefore an important evolutionary aspect that should be incorporated in future models of sequence change.
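The site-removal experiment in this abstract can be approximated in a simplified form. The study ranks sites by how heterogeneous their CAT profiles are across clades; the sketch below swaps in raw per-clade amino acid frequencies and the total variation distance as a stand-in heterogeneity score, which is an assumption and not the CAT-based measure. The clade alignments are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def site_profile(column):
    """Amino acid frequency profile of one alignment column (gaps ignored)."""
    counts = Counter(r for r in column if r in AMINO_ACIDS)
    total = sum(counts.values())
    # An all-gap column gets an all-zero profile.
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


def total_variation(p, q):
    """Half the L1 distance between two profiles: 0 identical, 1 disjoint."""
    return 0.5 * sum(abs(p[aa] - q[aa]) for aa in AMINO_ACIDS)


def rank_sites_by_heterogeneity(clade_a, clade_b):
    """Site indices ordered from most to least different between two clades."""
    n_sites = len(clade_a[0])
    scores = []
    for i in range(n_sites):
        p = site_profile(seq[i] for seq in clade_a)
        q = site_profile(seq[i] for seq in clade_b)
        scores.append((total_variation(p, q), i))
    # Highest score first; ties broken by site index for reproducibility.
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]


def remove_sites(alignment, sites):
    """Drop the given column indices from every sequence."""
    drop = set(sites)
    return ["".join(r for i, r in enumerate(seq) if i not in drop) for seq in alignment]


clade_a = ["AAK", "AAK"]
clade_b = ["AWK", "AWL"]
order = rank_sites_by_heterogeneity(clade_a, clade_b)
print(order)  # [1, 2, 0]
print(remove_sites(clade_a + clade_b, order[:1]))  # ['AK', 'AK', 'AK', 'AL']
```

Removing sites progressively, as in the study, would mean re-inferring the tree after each call to `remove_sites` with a growing prefix of `order`; tree inference itself is outside this sketch.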
Affiliation(s)
- Béatrice Roure
- Département de Biochimie, Centre Robert-Cedergren, Université de Montréal, Succursale Centre-Ville, Québec, Canada
7
Marsh L. A model for protein sequence evolution based on selective pressure for protein stability: application to hemoglobins. Evol Bioinform Online 2009; 5:107-18. [PMID: 19812731 PMCID: PMC2747123 DOI: 10.4137/ebo.s3120] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/06/2022]
Abstract
Negative selection against protein instability is a central influence on the evolution of proteins. Protein stability is maintained over evolution despite changes in the underlying sequences. An empirical all-site stability-based model of evolution was developed to focus on the selection of residues arising from their contributions to protein stability. In this model, site rates could vary. A structure-based method was used to predict stationary frequencies of hemoglobin residues based on their propensity to promote protein stability at a site. Sites with destabilizing residues were shown to change more rapidly in hemoglobins than sites with stabilizing residues. For diverse proteins, the results were consistent with stability-based selection. Maximum likelihood studies with hemoglobins supported the stability-based model over simple Poisson-based methods. These observations are consistent with suggestions that purifying selection to maintain protein structural stability plays a dominant role in protein evolution.
Affiliation(s)
- Lorraine Marsh
- Department of Biology, Long Island University, Brooklyn, NY 11201, USA.
8
Kryazhimskiy S, Bazykin GA, Plotkin JB, Dushoff J. Directionality in the evolution of influenza A haemagglutinin. Proc Biol Sci 2008; 275:2455-64. [PMID: 18647721 PMCID: PMC2603193 DOI: 10.1098/rspb.2008.0521] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022]
Abstract
The evolution of haemagglutinin (HA), an important influenza virus antigen, has been the subject of intensive research for more than two decades. Many characteristics of HA's sequence evolution are captured by standard Markov chain substitution models. Such models assign equal fitness to all accessible amino acids at a site. We show, however, that such models strongly underestimate the number of homoplastic amino acid substitutions during the course of HA's evolution, i.e. substitutions that repeatedly give rise to the same amino acid at a site. We develop statistics to detect individual homoplastic events and find that they preferentially occur at positively selected epitopic sites. Our results suggest that the evolution of the influenza A HA, including evolution by positive selection, is strongly affected by the long-term site-specific preferences for individual amino acids.
Affiliation(s)
- Sergey Kryazhimskiy
- Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA.
9
Blackburne BP, Hay AJ, Goldstein RA. Changing selective pressure during antigenic changes in human influenza H3. PLoS Pathog 2008; 4:e1000058. [PMID: 18451985 PMCID: PMC2323114 DOI: 10.1371/journal.ppat.1000058] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.0] [Received: 09/19/2007] [Accepted: 04/04/2008] [Indexed: 11/18/2022]
Abstract
The rapid evolution of influenza viruses presents difficulties in maintaining the optimal efficiency of vaccines. Amino acid substitutions result in antigenic drift, a process whereby antisera raised in response to one virus have reduced effectiveness against future viruses. Interestingly, while amino acid substitutions occur at a relatively constant rate, the antigenic properties of H3 move in a discontinuous, step-wise manner. It is not clear why this punctuated evolution occurs, whether this represents simply the fact that some substitutions affect these properties more than others, or if this is indicative of a changing relationship between the virus and the host. In addition, the role of changing glycosylation of the haemagglutinin in these shifts in antigenic properties is unknown. We analysed the antigenic drift of HA1 from human influenza H3 using a model of sequence change that allows for variation in selective pressure at different locations in the sequence, as well as at different parts of the phylogenetic tree. We detect significant changes in selective pressure that occur preferentially during major changes in antigenic properties. Despite the large increase in glycosylation during the past 40 years, changes in glycosylation did not correlate either with changes in antigenic properties or with significantly more rapid changes in selective pressure. The locations that undergo changes in selective pressure are largely in places undergoing adaptive evolution, in antigenic locations, and in locations or near locations undergoing substitutions that characterise the change in antigenicity of the virus. Our results suggest that the relationship of the virus to the host changes with time, with the shifts in antigenic properties representing changes in this relationship. This suggests that the virus and host immune system are evolving different methods to counter each other. 
While we are able to characterise the rapid increase in glycosylation of the haemagglutinin over time in human influenza H3, an increase not present in influenza in birds, this increase seems unrelated to the observed changes in antigenic properties.
MESH Headings
- Animals
- Antigenic Variation/genetics
- Antigenic Variation/immunology
- Antigens, Viral/immunology
- COS Cells
- Cell Fusion
- Chlorocebus aethiops
- DNA, Viral/genetics
- Evolution, Molecular
- Genetic Drift
- HeLa Cells
- Hemagglutinin Glycoproteins, Influenza Virus/genetics
- Hemagglutinin Glycoproteins, Influenza Virus/immunology
- Humans
- Influenza A virus/genetics
- Influenza A virus/immunology
- Influenza A virus/pathogenicity
- Influenza, Human/genetics
- Influenza, Human/immunology
- Influenza, Human/virology
- Leukocytes, Mononuclear/immunology
- Leukocytes, Mononuclear/virology
- Macrophages/immunology
- Macrophages/virology
- Selection, Genetic
Affiliation(s)
- Benjamin P. Blackburne
- Division of Mathematical Biology, National Institute of Medical Research, Mill Hill, London, United Kingdom
- Alan J. Hay
- Division of Virology, National Institute of Medical Research, Mill Hill, London, United Kingdom
- Richard A. Goldstein
- Division of Mathematical Biology, National Institute of Medical Research, Mill Hill, London, United Kingdom
10
Bastolla U, Porto M, Ortíz AR. Local interactions in protein folding determined through an inverse folding model. Proteins 2008; 71:278-99. [DOI: 10.1002/prot.21730] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/09/2022]
11
The Structurally Constrained Neutral Model of Protein Evolution. ACTA ACUST UNITED AC 2007. [DOI: 10.1007/978-3-540-35306-5_4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
12
Conant GC, Wagner GP, Stadler PF. Modeling amino acid substitution patterns in orthologous and paralogous genes. Mol Phylogenet Evol 2006; 42:298-307. [PMID: 16942891 DOI: 10.1016/j.ympev.2006.07.006] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Received: 02/01/2006] [Revised: 06/12/2006] [Accepted: 07/06/2006] [Indexed: 11/29/2022]
Abstract
We study to what degree patterns of amino acid substitution vary between genes using two models of protein-coding gene evolution. The first divides the amino acids into groups, with one substitution rate for pairs of residues in the same group and a second for those in differing groups. Unlike previous applications of this model, the groups themselves are estimated from data by simulated annealing. The second model makes substitution rates a function of the physical and chemical similarity between two residues. Because we model the evolution of coding DNA sequences as opposed to protein sequences, artifacts arising from the differing numbers of nucleotide substitutions required to bring about various amino acid substitutions are avoided. Using 10 alignments of related sequences (five of orthologous genes and five gene families), we do find differences in substitution patterns. We also find that, although patterns of amino acid substitution vary temporally within the history of a gene, variation is not greater in paralogous than in orthologous genes. Improved understanding of such gene-specific variation in substitution patterns may have implications for applications such as sequence alignment and phylogenetic inference.
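The first model in this abstract can be pictured as a symmetric exchangeability table in which every amino acid pair receives one of two rates, depending on whether both residues fall in the same group. A minimal sketch follows; the six-group partition is hypothetical and not one estimated in the study, which fits the groups by simulated annealing and models coding DNA rather than protein sequences directly.

```python
from itertools import combinations


def grouped_exchangeabilities(groups, within_rate, between_rate):
    """Map each unordered amino acid pair to a within- or between-group rate."""
    group_of = {}
    for index, members in enumerate(groups):
        for aa in members:
            if aa in group_of:
                raise ValueError(f"{aa} is assigned to more than one group")
            group_of[aa] = index
    rates = {}
    for a, b in combinations(sorted(group_of), 2):
        same_group = group_of[a] == group_of[b]
        rates[(a, b)] = within_rate if same_group else between_rate
    return rates


# Hypothetical partition of the 20 amino acids into six groups.
groups = ["ILMV", "FWY", "DE", "KRH", "STNQ", "AGPC"]
rates = grouped_exchangeabilities(groups, within_rate=2.0, between_rate=0.5)
print(len(rates), rates[("I", "L")], rates[("D", "K")])  # 190 2.0 0.5
```

Estimating the groups would mean searching over partitions to maximize the likelihood of the data, which is the part the authors handle with simulated annealing; that search is not shown.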
Affiliation(s)
- Gavin C Conant
- Smurfit Institute of Genetics, Trinity College, University of Dublin, Dublin 2, Ireland.
13
Bastolla U, Porto M, Roman HE, Vendruscolo M. A protein evolution model with independent sites that reproduces site-specific amino acid distributions from the Protein Data Bank. BMC Evol Biol 2006; 6:43. [PMID: 16737532 PMCID: PMC1570368 DOI: 10.1186/1471-2148-6-43] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Received: 11/17/2005] [Accepted: 05/31/2006] [Indexed: 11/10/2022]
Abstract
Background: Since thermodynamic stability is a global property of proteins that has to be conserved during evolution, the selective pressure at a given site of a protein sequence depends on the amino acids present at other sites. However, models of molecular evolution that aim at reconstructing the evolutionary history of macromolecules become computationally intractable if such correlations between sites are explicitly taken into account. Results: We introduce an evolutionary model with sites evolving independently under a global constraint on the conservation of structural stability. This model consists of a selection process, which depends on two hydrophobicity parameters that can be computed from protein sequences without any fit, and a mutation process for which we consider various models. It quantitatively reproduces the results of Structurally Constrained Neutral (SCN) simulations of protein evolution in which the stability of the native state is explicitly computed and conserved. We then compare the predicted site-specific amino acid distributions with those sampled from the Protein Data Bank (PDB). The parameters of the mutation model, whose number varies between zero and five, are fitted from the data. The mean correlation coefficient between predicted and observed site-specific amino acid distributions is larger than <r> = 0.70 for a mutation model with no free parameters and no genetic code. In contrast, considering only the mutation process with no selection yields a mean correlation coefficient of <r> = 0.56 with three fitted parameters. The mutation model that best fits the data takes into account an increased mutation rate at CpG dinucleotides, yielding <r> = 0.90 with five parameters. Conclusion: The effective selection process that we propose reproduces well the amino acid distributions observed in the protein sequences in the PDB. Its simplicity makes it very promising for likelihood calculations in phylogenetic studies.
Interestingly, in this approach the mutation process influences the effective selection process, i.e. selection and mutation must be entangled in order to obtain effectively independent sites. This interdependence between mutation and selection reflects the deep influence that mutation has on the evolutionary process: The bias in the mutation influences the thermodynamic properties of the evolving proteins, in agreement with comparative studies of bacterial proteomes, and it also influences the rate of accepted mutations.
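The agreement statistic <r> quoted in this abstract is a mean correlation coefficient between predicted and observed site-specific amino acid distributions. A minimal sketch, assuming a Pearson correlation computed per site over the amino acid states and then averaged over sites; the toy vectors are illustrative only (real profiles have 20 entries).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return s_xy / math.sqrt(s_xx * s_yy)


def mean_site_correlation(predicted, observed):
    """Average per-site correlation between predicted and observed profiles."""
    if len(predicted) != len(observed):
        raise ValueError("need one predicted profile per observed site")
    rs = [pearson(p, o) for p, o in zip(predicted, observed)]
    return sum(rs) / len(rs)


# Toy example: one site agrees perfectly, the other is perfectly reversed.
predicted = [[1, 2, 3], [1, 2, 3]]
observed = [[2, 4, 6], [3, 2, 1]]
print(mean_site_correlation(predicted, observed))
```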
Affiliation(s)
- Ugo Bastolla
- Centro de Biología Molecular "Severo Ochoa", (CSIC-UAM), Cantoblanco, 28049 Madrid, Spain
- Markus Porto
- Institut für Festkörperphysik, Technische Universität Darmstadt, Hochschulstr. 8, 64289 Darmstadt, Germany
- H Eduardo Roman
- Dipartimento di Fisica, Università di Milano Bicocca, Piazza della Scienza 3, 20126 Milano, Italy
- Michele Vendruscolo
- Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge CB2 1EW, UK
14
Reggio PH. Computational methods in drug design: modeling G protein-coupled receptor monomers, dimers, and oligomers. AAPS J 2006; 8:E322-36. [PMID: 16796383 PMCID: PMC3231557 DOI: 10.1007/bf02854903] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Indexed: 10/21/2022]
Abstract
G protein-coupled receptors (GPCRs) are membrane proteins that serve as very important links through which cellular signal transduction mechanisms are activated. Many vital physiological events such as sensory perception, immune defense, cell communication, chemotaxis, and neurotransmission are mediated by GPCRs. Not surprisingly, GPCRs are major targets for drug development today. Most modeling studies in the GPCR field have focused upon the creation of a model of a single GPCR (i.e., a GPCR monomer) based upon the crystal structure of the Class A GPCR, rhodopsin. However, the emerging concept of GPCR dimerization has challenged our notion of the monomeric GPCR as the functional unit. Recent work has shown not only that many GPCRs exist as homo- and heterodimers but also that GPCR oligomeric assembly may have important functional roles. This review focuses first on methodology for the creation of monomeric GPCR models. Special emphasis is given to the identification of localized regions where the structure of a GPCR may diverge from that of bovine rhodopsin. The review then focuses on GPCR dimers and oligomers and the bioinformatics methods available for identifying homo- and heterodimer interfaces.
Affiliation(s)
- Patricia H Reggio
- Center for Drug Design, Department of Chemistry and Biochemistry, University of North Carolina Greensboro, Greensboro, NC 27402, USA.
15
Minshull J, Ness JE, Gustafsson C, Govindarajan S. Predicting enzyme function from protein sequence. Curr Opin Chem Biol 2005; 9:202-9. [PMID: 15811806 DOI: 10.1016/j.cbpa.2005.02.003] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]
Abstract
There are two main reasons to try to predict an enzyme's function from its sequence. The first is to identify the components and thus the functional capabilities of an organism; the second is to create enzymes with specific properties. Genomics, expression analysis, proteomics and metabonomics are largely directed towards understanding how information flows from DNA sequence to protein functions within an organism. This review focuses on information flow in the opposite direction: the applicability of what is being learned from natural enzymes to improve methods for catalyst design.
16
Filizola M, Weinstein H. The study of G-protein coupled receptor oligomerization with computational modeling and bioinformatics. FEBS J 2005; 272:2926-38. [PMID: 15955053 DOI: 10.1111/j.1742-4658.2005.04730.x] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.3] [Indexed: 10/25/2022]
Abstract
To achieve a structural context for the analysis of G-protein coupled receptor (GPCR) oligomers, molecular modeling must be used to predict the corresponding interaction interfaces. The task is complicated by the paucity of detailed structural data at atomic resolution, and the large number of possible modes in which the bundles of seven transmembrane (TM) segments of the interacting GPCR monomers can be packed together into dimers and/or higher-order oligomers. Approaches and tools offered by bioinformatics can be used to reduce the complexity of this task and, combined with computational modeling, can serve to yield testable predictions for the structural properties of oligomers. Most of the bioinformatics methods take advantage of the evolutionary relation that exists among GPCRs, as expressed in their sequences and measurable in the common elements of their structural and functional features. These common elements are responsible for the presence of detectable patterns of motifs and correlated mutations evident from the alignment of the sequences of these complex biological systems. The decoding of these patterns in terms of structural and functional determinants can provide indications about the most likely interfaces of dimerization/oligomerization of GPCRs. We review here the main approaches from bioinformatics, enhanced by computational molecular modeling, that have been used to predict likely interfaces of dimerization/oligomerization of GPCRs, and compare results from their application to rhodopsin-like GPCRs. A compilation of the most frequently predicted GPCR oligomerization interfaces points to specific regions of TMs 4-6.
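One common way to score correlated mutations between two aligned positions is the mutual information between alignment columns. The sketch below illustrates that generic score. It is not any specific method covered in the review. The function name is hypothetical, gaps are treated as ordinary symbols, and no correction for shared phylogeny is applied.

```python
import math
from collections import Counter


def column_mutual_information(col_i, col_j):
    """Mutual information (in nats) between two alignment columns.

    col_i and col_j are equal-length sequences of residues, one entry per
    aligned sequence. Simple covariation score: no gap handling, no
    pseudocounts, no phylogenetic correction.
    """
    n = len(col_i)
    count_i = Counter(col_i)              # residue counts in column i
    count_j = Counter(col_j)              # residue counts in column j
    count_ij = Counter(zip(col_i, col_j))  # joint residue-pair counts
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log[p(a,b) / (p(a) p(b))], written in terms of counts
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi
```

For two perfectly covarying columns such as `"AALL"` and `"DDKK"`, the score is ln 2 (about 0.693). For independent columns such as `"AALL"` and `"DKDK"`, it is 0.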
Affiliation(s)
- Marta Filizola
- Department of Physiology and Biophysics, Weill Medical College of Cornell University, NY 10021, USA.
17
Bastolla U, Porto M, Roman HE, Vendruscolo M. Looking at structure, stability, and evolution of proteins through the principal eigenvector of contact matrices and hydrophobicity profiles. Gene 2005; 347:219-30. [PMID: 15777696 DOI: 10.1016/j.gene.2004.12.015] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Received: 10/15/2004] [Revised: 11/29/2004] [Accepted: 12/10/2004] [Indexed: 11/28/2022]
Abstract
We review and further develop an analytical model that describes how thermodynamic constraints on the stability of the native state influence protein evolution in a site-specific manner. To this end, we represent both protein sequences and protein structures as vectors: structures are represented by the principal eigenvector (PE) of the protein contact matrix, a quantity that closely resembles the effective connectivity of each site; sequences are represented through the "interactivity" of each amino acid type, using novel parameters that are correlated with hydropathy scales. These interactivity parameters correlate more strongly than the other hydropathy scales we examined with: (1) the change in unfolding free energy upon mutation for proteins with two-state thermodynamics; (2) genomic properties such as genome size and genome-wide GC content; (3) the main eigenvectors of the substitution matrices. The evolutionary average of the interactivity vector correlates very strongly with the PE of a protein structure. Using this result, we derive an analytic expression for site-specific distributions of amino acids across protein families in the form of Boltzmann distributions whose "inverse temperature" is a function of the PE component. We show that our predictions are in agreement with site-specific amino acid distributions obtained from the Protein Data Bank, and we determine the mutational model that best fits the observed site-specific amino acid distributions. Interestingly, the optimal model almost minimizes the rate at which deleterious mutations are eliminated by natural selection.
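The principal eigenvector of a residue contact matrix can be computed with NumPy, as a sketch of the structural quantity used here. This is an illustration under assumptions, not the authors' definition. It uses one representative atom per residue and an 8 Å contact cutoff, and the contact definition used in the paper may differ. The function names are hypothetical.

```python
import numpy as np


def contact_matrix(coords, cutoff=8.0):
    """Binary residue contact matrix.

    coords: (n_residues, 3) array, one representative atom per residue
    (an assumption). Two residues are in contact if closer than `cutoff`
    angstroms (assumed value).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    c = (dist < cutoff).astype(float)
    np.fill_diagonal(c, 0.0)  # no self-contacts
    return c


def principal_eigenvector(c):
    """Unit-norm eigenvector of the largest eigenvalue of a symmetric
    contact matrix, sign-fixed so its components are non-negative
    (Perron-Frobenius, for a connected contact graph)."""
    vals, vecs = np.linalg.eigh(c)
    pe = vecs[:, np.argmax(vals)]
    if pe.sum() < 0:
        pe = -pe
    return pe / np.linalg.norm(pe)
```

In a star-shaped contact map, the hub residue receives the largest component. This matches the reading of PE components as effective connectivity: highly connected, typically buried sites score highest.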
Affiliation(s)
- Ugo Bastolla
- Centro de Astrobiología, INTA-CSIC, c.tra de Ajalvir km.4, E-28850, Torrejón de Ardoz, Madrid, Spain.
18
Porto M, Roman HE, Vendruscolo M, Bastolla U. Prediction of site-specific amino acid distributions and limits of divergent evolutionary changes in protein sequences. Mol Biol Evol 2004; 22:630-8. [PMID: 15537801 DOI: 10.1093/molbev/msi048] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.7] [Indexed: 11/13/2022]
Abstract
We derive an analytic expression for site-specific stationary distributions of amino acids from the structurally constrained neutral (SCN) model of protein evolution with conservation of folding stability. The stationary distributions that we obtain have a Boltzmann-like shape, and their effective temperature parameter, measuring the limit of divergent evolutionary changes at a given site, can be predicted from a site-specific topological property, the principal eigenvector of the contact matrix of the native conformation of the protein. These analytic results, obtained without free parameters, are compared with simulations of the SCN model and with the site-specific amino acid distributions obtained from the Protein Data Bank. These results also provide new insights into how the topology of a protein fold influences its designability, i.e., the number of sequences compatible with that fold. The dependence of the effective temperature on the principal eigenvector decreases for longer proteins, as a possible consequence of the fact that selection for thermodynamic stability becomes weaker in this case.
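The effective temperature at a site limits how far its amino acid distribution can spread. A generic Boltzmann-form profile, p(a) ∝ exp(β·s(a)), can illustrate this point. The sketch is not the authors' analytic expression. The per-residue scores s(a) are hypothetical placeholders, and β stands in for a site-specific inverse temperature. Larger β (lower effective temperature) concentrates the distribution, so fewer amino acids are tolerated at that site.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def boltzmann_profile(scores, beta):
    """Boltzmann-form distribution p(a) proportional to exp(beta * score(a))."""
    x = beta * np.asarray(scores, dtype=float)
    x -= x.max()  # subtract the maximum to keep exp() numerically stable
    w = np.exp(x)
    return w / w.sum()


def entropy(p):
    """Shannon entropy in nats, ignoring zero-probability states."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


# Hypothetical per-residue scores, for illustration only (not published values).
scores = np.linspace(-1.0, 1.0, len(AMINO_ACIDS))
for beta in (0.0, 1.0, 5.0):
    print(beta, round(entropy(boltzmann_profile(scores, beta)), 3))
```

At β = 0 the profile is uniform, with entropy ln 20. The entropy then decreases as β grows, which mirrors a narrower range of divergent change at "colder" sites.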
Affiliation(s)
- Markus Porto
- Institut für Festkörperphysik, Technische Universität Darmstadt, Hochschulstr. 8, 64289 Darmstadt, Germany.
19
McClellan DA, Palfreyman EJ, Smith MJ, Moss JL, Christensen RG, Sailsbery JK. Physicochemical Evolution and Molecular Adaptation of the Cetacean and Artiodactyl Cytochrome b Proteins. Mol Biol Evol 2004; 22:437-55. [PMID: 15509727 DOI: 10.1093/molbev/msi028] [Citation(s) in RCA: 106] [Impact Index Per Article: 5.3] [Indexed: 12/28/2022]
Abstract
Cetaceans have most likely experienced metabolic shifts since evolutionarily diverging from their terrestrial ancestors, shifts that may be reflected in the proteins such as cytochrome b that are responsible for metabolic efficiency. However, accepted statistical methods for detecting molecular adaptation are largely biased against even moderately conservative proteins because the primary criterion involves a comparison of nonsynonymous and synonymous substitution rates (dN/dS); they do not allow for the possibility that adaptation may come in the form of very few amino acid changes. We apply the MM01 model to the possible molecular adaptation of cytochrome b among cetaceans because it does not rely on a dN/dS ratio, instead evaluating positive selection in terms of the amino acid properties that comprise protein phenotypes that selection at the molecular level may act upon. We also apply the codon-degeneracy model (CDM), which focuses on evaluating overall patterns of nucleotide substitution in terms of base exchange, codon position, and synonymy to estimate the overall effect of selection. Using these relatively new models, we characterize the molecular adaptation that has occurred in the cetacean cytochrome b protein by comparing revealed amino acid replacement patterns to those found among artiodactyls, the modern terrestrial mammals found to be most closely related to cetaceans. Our findings suggest that several regions of the cetacean cytochrome b protein have experienced molecular adaptation. Also, these adaptations are spatially associated with domain structure, protein function, and the structure and function of the cytochrome bc1 complex and its constituents. We also have found a general correlation between the results of the analytical software programs TreeSAAP (which implements the MM01 model) and CDM (which implements the codon-degeneracy model).
Affiliation(s)
- D A McClellan
- Department of Integrative Biology, Brigham Young University, Provo, Utah, USA.
20
Soyer OS, Goldstein RA. Predicting functional sites in proteins: site-specific evolutionary models and their application to neurotransmitter transporters. J Mol Biol 2004; 339:227-42. [PMID: 15123434 DOI: 10.1016/j.jmb.2004.03.025] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Received: 11/10/2003] [Revised: 02/26/2004] [Accepted: 03/09/2004] [Indexed: 11/21/2022]
Abstract
Currently there exist several computational methods for predicting the functional sites in a set of homologous proteins based on their sequences. Due to difficulties in defining the functional site in a protein, it is not trivial to compare the performance of these methods, evaluate their limitations and quantify improvements by new approaches. Here, we use extensive mutation data from two proteins, Lac repressor and subtilisin, to perform such an analysis. Along with the evaluation of existing approaches, we describe a site class model of evolution as a tool to predict functional sites in proteins. The results indicate that this model, which simulates the evolution process at the amino acid level using site-specific substitution matrices, provides the most accurate information on functional sites in a given protein family. Secondly, we present an application of this model to neurotransmitter transporters, a superfamily of proteins of which we have limited experimental knowledge. Based on this application we present testable hypotheses regarding the mechanism of action of these proteins.
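Site-specific substitution matrices can be made concrete by giving each site class its own equilibrium profile π. From π one derives a rate matrix Q and transition probabilities P(t) = exp(Qt). The sketch below assumes equal exchangeabilities, an F81-style matrix extended to 20 states, purely for simplicity. It is not the parameterisation used by Soyer and Goldstein, and the function names are hypothetical.

```python
import numpy as np


def f81_rate_matrix(pi):
    """Rate matrix with Q[i, j] = pi[j] for i != j (equal exchangeabilities,
    an assumption); diagonal chosen so every row sums to zero."""
    pi = np.asarray(pi, dtype=float)
    q = np.tile(pi, (len(pi), 1))
    np.fill_diagonal(q, 0.0)
    np.fill_diagonal(q, -q.sum(axis=1))
    return q


def transition_matrix(q, pi, t):
    """P(t) = expm(Q t) for a time-reversible Q.

    Uses the symmetric matrix S = D^{1/2} Q D^{-1/2}, D = diag(pi), so that
    expm(Q t) = D^{-1/2} expm(S t) D^{1/2}.
    """
    sqrt_pi = np.sqrt(np.asarray(pi, dtype=float))
    s = (sqrt_pi[:, None] * q) / sqrt_pi[None, :]
    vals, vecs = np.linalg.eigh(s)
    exp_s = (vecs * np.exp(vals * t)) @ vecs.T
    return exp_s * (sqrt_pi[None, :] / sqrt_pi[:, None])
```

Each site class k would get its own profile π_k and hence its own P_k(t). For this particular Q, P(t) also has the closed form e^(-t) I + (1 - e^(-t)) 1π^T, which gives a convenient check on the numerical route.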
Affiliation(s)
- Orkun S Soyer
- Department of Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
21
Lartillot N, Philippe H. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol Biol Evol 2004; 21:1095-109. [PMID: 15014145 DOI: 10.1093/molbev/msh112] [Citation(s) in RCA: 1020] [Impact Index Per Article: 51.0] [Indexed: 11/13/2022]
Abstract
Most current models of sequence evolution assume that all sites of a protein evolve under the same substitution process, characterized by a 20 x 20 substitution matrix. Here, we propose to relax this assumption by developing a Bayesian mixture model that allows the amino-acid replacement pattern at different sites of a protein alignment to be described by distinct substitution processes. Our model, named CAT, assumes the existence of distinct processes (or classes) differing by their equilibrium frequencies over the 20 residues. Through the use of a Dirichlet process prior, the total number of classes and their respective amino-acid profiles, as well as the affiliations of each site to a given class, are all free variables of the model. In this way, the CAT model is able to adapt to the complexity actually present in the data, and it yields an estimate of the substitutional heterogeneity through the posterior mean number of classes. We show that a significant level of heterogeneity is present in the substitution patterns of proteins, and that the standard one-matrix model fails to account for this heterogeneity. By evaluating the Bayes factor, we demonstrate that the standard model is outperformed by CAT on all of the data sets which we analyzed. Altogether, these results suggest that the complexity of the pattern of substitution of real sequences is better captured by the CAT model, offering the possibility of studying its impact on phylogenetic reconstruction and its connections with structure-function determinants.
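A Dirichlet process prior over site classes can be sampled through its Chinese restaurant process representation. Each site joins an existing class with probability proportional to that class's size, or opens a new class with probability proportional to a concentration parameter α. The sketch below only draws from the prior. Each class gets a 20-state equilibrium profile from a symmetric Dirichlet base distribution, which is an assumption. Fitting CAT to data requires MCMC over these assignments and is not shown. The function names are hypothetical.

```python
import numpy as np


def sample_crp_assignments(n_sites, alpha, rng):
    """Draw class labels for n_sites from a Chinese restaurant process,
    the sequential representation of a Dirichlet process prior."""
    labels = np.empty(n_sites, dtype=int)
    counts = []  # number of sites already assigned to each class
    for i in range(n_sites):
        # Existing class k has weight counts[k]; a new class has weight alpha.
        weights = np.array(counts + [alpha], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(counts):
            counts.append(1)  # open a new class
        else:
            counts[k] += 1
        labels[i] = k
    return labels, counts


def sample_class_profiles(n_classes, rng, concentration=1.0):
    """One equilibrium profile over the 20 amino acids per class, drawn from
    a symmetric Dirichlet base distribution (assumed for illustration)."""
    return rng.dirichlet(np.full(20, concentration), size=n_classes)


rng = np.random.default_rng(0)
labels, counts = sample_crp_assignments(n_sites=500, alpha=2.0, rng=rng)
profiles = sample_class_profiles(len(counts), rng)
```

The number of classes is not fixed in advance. It grows roughly logarithmically with the number of sites, at a rate set by α. This is how the prior lets the number of classes be a free variable, as described in the abstract.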
Affiliation(s)
- Nicolas Lartillot
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Montréal, Québec, Canada.
22
Goldberg NR, Beuming T, Soyer OS, Goldstein RA, Weinstein H, Javitch JA. Probing conformational changes in neurotransmitter transporters: a structural context. Eur J Pharmacol 2003; 479:3-12. [PMID: 14612133 DOI: 10.1016/j.ejphar.2003.08.052] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.7] [Indexed: 11/16/2022]
Abstract
The Na⁺/Cl⁻-dependent neurotransmitter transporters, a family of proteins responsible for the reuptake of neurotransmitters and other small molecules from the synaptic cleft, have been the focus of intensive research in recent years. The biogenic amine transporters, a subset of this larger family, are especially intriguing as they are the targets for many psychoactive compounds, including cocaine and amphetamines, as well as many antidepressants. In the absence of a high-resolution structure for any transporter in this family, research into the structure-function relationships of these transporters has relied on analysis of the effects of site-directed mutagenesis as well as of chemical modification of reactive residues. The aim of this review is to establish a structural context for the experimental study of these transporters through various computational approaches and to highlight what is known about the conformational changes associated with function in these transporters. We also present a novel numbering scheme to assist in the comparison of aligned positions between sequences of the neurotransmitter transporter family, a comparison that will be of increasing importance as additional experimental data are amassed.
Affiliation(s)
- Naomi R Goldberg
- Center for Molecular Recognition, Columbia University, P&S 11-401, Box 7, 630 West 168th Street, New York, NY 10032, USA
23
Abstract
Homologous sequences are correlated due to their common ancestry. Probabilistic models of sequence evolution are employed routinely to properly account for these phylogenetic correlations. These increasingly realistic models provide a basis for studying evolution and for exploiting it to better understand protein structure and function. Notable recent advances have been made in the treatment of insertion and deletion events, the estimation of amino-acid replacement rates, and the detection of positive selection.
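Felsenstein's pruning algorithm is one standard way such models account for shared ancestry: it sums over the unobserved ancestral states. Below is a minimal sketch for a single nucleotide site under the Jukes-Cantor model. The nested-list tree encoding and the function names are assumptions made for illustration.

```python
import numpy as np


def jc69(t, n_states=4):
    """Jukes-Cantor transition probabilities for branch length t
    (expected substitutions per site)."""
    e = np.exp(-n_states * t / (n_states - 1))
    same = 1.0 / n_states + (n_states - 1) / n_states * e
    diff = 1.0 / n_states - e / n_states
    p = np.full((n_states, n_states), diff)
    np.fill_diagonal(p, same)
    return p


def partial_likelihood(node, states):
    """Felsenstein pruning.

    A node is either a leaf name (str) or a list of (child, branch_length)
    pairs; `states` maps leaf names to observed states 0-3. Returns the
    vector L[x] = P(data below node | node in state x).
    """
    if isinstance(node, str):
        vec = np.zeros(4)
        vec[states[node]] = 1.0
        return vec
    result = np.ones(4)
    for child, t in node:
        # sum over the child's state y of P[x, y] * L_child[y]
        result *= jc69(t) @ partial_likelihood(child, states)
    return result


def site_likelihood(tree, states):
    """Likelihood of one site, with the uniform JC69 equilibrium at the root."""
    return float(np.full(4, 0.25) @ partial_likelihood(tree, states))
```

Consider two leaves joined at the root that show the same base. The site likelihood then equals 0.25 × P_AA(t1 + t2), a consequence of time reversibility. Summed over all possible leaf patterns, the likelihoods total 1.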
Affiliation(s)
- J L Thorne
- Program in Statistical Genetics, Statistics Department, Box 8203, North Carolina State University, Raleigh, North Carolina 27695-8203, USA.
24
Pollock DD, Eisen JA, Doggett NA, Cummings MP. A case for evolutionary genomics and the comprehensive examination of sequence biodiversity. Mol Biol Evol 2000; 17:1776-88. [PMID: 11110893 DOI: 10.1093/oxfordjournals.molbev.a026278] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.1] [Indexed: 11/14/2022]
Abstract
Comparative analysis is one of the most powerful methods available for understanding the diverse and complex systems found in biology, but it is often limited by a lack of comprehensive taxonomic sampling. Despite the recent development of powerful genome technologies capable of producing sequence data in large quantities (witness the recently completed first draft of the human genome), there has been relatively little change in how evolutionary studies are conducted. The application of genomic methods to evolutionary biology is a challenge, in part because gene segments from different organisms are manipulated separately, requiring individual purification, cloning, and sequencing. We suggest that a feasible approach to collecting genome-scale data sets for evolutionary biology (i.e., evolutionary genomics) may consist of combining DNA samples prior to cloning and sequencing, followed by computational reconstruction of the original sequences. This approach will allow the full benefit of automated protocols developed by genome projects to be realized; taxon sampling levels can easily increase to thousands for targeted genomes and genomic regions. Sequence diversity at this level will dramatically improve the quality and accuracy of phylogenetic inference, as well as the accuracy and resolution of comparative evolutionary studies. In particular, it will be possible to make accurate estimates of normal evolution in the context of constant structural and functional constraints (i.e., site-specific substitution probabilities), along with accurate estimates of changes in evolutionary patterns, including pairwise coevolution between sites, adaptive bursts, and changes in selective constraints. These estimates can then be used to understand and predict the effects of protein structure and function on sequence evolution and to predict unknown details of protein structure, function, and functional divergence.
In order to demonstrate the practicality of these ideas and the potential benefit for functional genomic analysis, we describe a pilot project we are conducting to simultaneously sequence large numbers of vertebrate mitochondrial genomes.
Affiliation(s)
- D D Pollock
- Theoretical Biology and Biophysics, Theoretical Division, Los Alamos National Laboratory, Los Alamos, New Mexico, USA.