1
Li B, Fooksa M, Heinze S, Meiler J. Finding the needle in the haystack: towards solving the protein-folding problem computationally. Crit Rev Biochem Mol Biol 2018; 53:1-28. [PMID: 28976219 PMCID: PMC6790072 DOI: 10.1080/10409238.2017.1380596] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 05/16/2017] [Revised: 08/22/2017] [Accepted: 09/13/2017] [Indexed: 12/22/2022]
Abstract
Prediction of protein tertiary structures from amino acid sequence and understanding the mechanisms of how proteins fold, collectively known as "the protein folding problem," has been a grand challenge in molecular biology for over half a century. Theories have been developed that provide us with an unprecedented understanding of protein folding mechanisms. However, computational simulation of protein folding is still difficult, and prediction of protein tertiary structure from amino acid sequence is an unsolved problem. Progress toward a satisfying solution has been slow due to challenges in sampling the vast conformational space and deriving sufficiently accurate energy functions. Nevertheless, several techniques and algorithms have been adopted to overcome these challenges, and the last two decades have seen exciting advances in enhanced sampling algorithms, computational power and tertiary structure prediction methodologies. This review aims at summarizing these computational techniques, specifically conformational sampling algorithms and energy approximations that have been frequently used to study protein-folding mechanisms or to de novo predict protein tertiary structures. We hope that this review can serve as an overview on how the protein-folding problem can be studied computationally and, in cases where experimental approaches are prohibitive, help the researcher choose the most relevant computational approach for the problem at hand. We conclude with a summary of current challenges faced and an outlook on potential future directions.
Affiliation(s)
- Bian Li
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Michaela Fooksa
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
  - Chemical and Physical Biology Graduate Program, Vanderbilt University, Nashville, TN, USA
- Sten Heinze
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Jens Meiler
  - Department of Chemistry, Vanderbilt University, Nashville, TN, USA
  - Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
2
Broom A, Jacobi Z, Trainor K, Meiering EM. Computational tools help improve protein stability but with a solubility tradeoff. J Biol Chem 2017; 292:14349-14361. [PMID: 28710274 DOI: 10.1074/jbc.m117.784165] [Citation(s) in RCA: 69] [Impact Index Per Article: 9.9] [Received: 03/03/2017] [Revised: 07/11/2017] [Indexed: 01/18/2023] Open
Abstract
Accurately predicting changes in protein stability upon amino acid substitution is a much sought after goal. Destabilizing mutations are often implicated in disease, whereas stabilizing mutations are of great value for industrial and therapeutic biotechnology. Increasing protein stability is an especially challenging task, with random substitution yielding stabilizing mutations in only ∼2% of cases. To overcome this bottleneck, computational tools that aim to predict the effect of mutations have been developed; however, achieving accuracy and consistency remains challenging. Here, we combined 11 freely available tools into a meta-predictor (meieringlab.uwaterloo.ca/stabilitypredict/). Validation against ∼600 experimental mutations indicated that our meta-predictor has improved performance over any of the individual tools. The meta-predictor was then used to recommend 10 mutations in a previously designed protein of moderate thermodynamic stability, ThreeFoil. Experimental characterization showed that four mutations increased protein stability and could be amplified through ThreeFoil's structural symmetry to yield several multiple mutants with >2-kcal/mol stabilization. By avoiding residues within functional sites, we could maintain ThreeFoil's glycan-binding capacity. Despite successfully achieving substantial stabilization, however, almost all mutations decreased protein solubility, the most common cause of protein design failure. Examination of the 600-mutation data set revealed that stabilizing mutations on the protein surface tend to increase hydrophobicity and that the individual tools favor this approach to gain stability. Thus, whereas currently available tools can increase protein stability and combining them into a meta-predictor yields enhanced reliability, improvements to the potentials/force fields underlying these tools are needed to avoid gaining protein stability at the cost of solubility.
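The abstract does not say how the 11 tools are combined. As a purely illustrative sketch, one simple way to build a consensus is to average the per-tool ΔΔG predictions for each mutation. The tool names, mutation labels, values, and the sign convention (negative means stabilizing) below are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def meta_predict(predictions):
    """Average per-tool ddG predictions (kcal/mol) for each mutation.

    `predictions` maps a tool name to {mutation: predicted ddG}.
    Only mutations scored by every tool are combined.
    Plain averaging is an assumption, not the published method.
    """
    tools = list(predictions.values())
    shared = set(tools[0]).intersection(*tools[1:])
    return {m: mean(t[m] for t in tools) for m in sorted(shared)}


# Hypothetical numbers for illustration only (negative = stabilizing here).
example = {
    "tool_a": {"A10V": -0.8, "K25L": 0.4},
    "tool_b": {"A10V": -0.4, "K25L": 1.0},
    "tool_c": {"A10V": -0.6, "K25L": 0.1, "D40N": 0.3},
}
consensus = meta_predict(example)
```

Mutations that some tool cannot score (here `D40N`) are dropped rather than averaged over fewer tools, which keeps every consensus value comparable.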
Affiliation(s)
- Aron Broom
  - Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Zachary Jacobi
  - Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
- Kyle Trainor
  - Department of Chemistry, University of Waterloo, Waterloo, Ontario N2L 3G1, Canada
3
Pucci F, Bourgeas R, Rooman M. Predicting protein thermal stability changes upon point mutations using statistical potentials: Introducing HoTMuSiC. Sci Rep 2016; 6:23257. [PMID: 26988870 PMCID: PMC4796876 DOI: 10.1038/srep23257] [Citation(s) in RCA: 75] [Impact Index Per Article: 9.4] [Received: 10/27/2015] [Accepted: 02/19/2016] [Indexed: 12/15/2022] Open
Abstract
The accurate prediction of the impact of an amino acid substitution on the thermal stability of a protein is a central issue in protein science, and is of key relevance for the rational optimization of various bioprocesses that use enzymes in unusual conditions. Here we present one of the first computational tools to predict the change in melting temperature ΔTm upon point mutations, given the protein structure and, when available, the melting temperature Tm of the wild-type protein. The key ingredients of our model structure are standard and temperature-dependent statistical potentials, which are combined with the help of an artificial neural network. The model structure was chosen on the basis of a detailed thermodynamic analysis of the system. The parameters of the model were identified on a set of more than 1,600 mutations with experimentally measured ΔTm. The performance of our method was tested using a strict 5-fold cross-validation procedure, and was found to be significantly superior to that of competing methods. We obtained a root mean square deviation between predicted and experimental ΔTm values of 4.2 °C that reduces to 2.9 °C when ten percent outliers are removed. A webserver-based tool is freely available for non-commercial use at soft.dezyme.com.
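The two error figures quoted above (RMSD over all mutations, and RMSD after removing ten percent outliers) can be illustrated with a short sketch. It assumes that the outliers are the points with the largest absolute prediction error, which the abstract does not state. The data are invented for illustration.

```python
import math


def rmsd(pred, exp):
    """Root mean square deviation between predicted and experimental values."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, exp)) / len(pred))


def rmsd_without_outliers(pred, exp, fraction=0.10):
    """RMSD after discarding the given fraction of points with the largest
    absolute error (an assumed outlier definition)."""
    pairs = sorted(zip(pred, exp), key=lambda pe: abs(pe[0] - pe[1]))
    keep = len(pairs) - int(round(fraction * len(pairs)))
    kept = pairs[:keep]
    return rmsd([p for p, _ in kept], [e for _, e in kept])


# Invented dTm values in degrees C: nine errors of 1 and one error of 10.
predicted = [1.0] * 9 + [10.0]
experimental = [0.0] * 10
full = rmsd(predicted, experimental)
trimmed = rmsd_without_outliers(predicted, experimental)
```

A single large error dominates the full RMSD, which is why reporting both numbers gives a fairer picture of typical accuracy.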
Affiliation(s)
- Fabrizio Pucci
  - Department of BioModeling, BioInformatics & BioProcesses, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, 1050 Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, CP 263, Triumph Bld, 1050 Brussels, Belgium
- Raphaël Bourgeas
  - Department of BioModeling, BioInformatics & BioProcesses, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, 1050 Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, CP 263, Triumph Bld, 1050 Brussels, Belgium
- Marianne Rooman
  - Department of BioModeling, BioInformatics & BioProcesses, Université Libre de Bruxelles, CP 165/61, Roosevelt Ave. 50, 1050 Brussels, Belgium
  - Interuniversity Institute of Bioinformatics in Brussels, CP 263, Triumph Bld, 1050 Brussels, Belgium
4
Betancourt MR. Another look at the conditions for the extraction of protein knowledge-based potentials. Proteins 2009; 76:72-85. [PMID: 19089977 DOI: 10.1002/prot.22320] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 01/28/2023]
Abstract
Protein knowledge-based potentials are effective free energies obtained from databases of known protein structures. They are used to parameterize coarse-grained protein models in many folding simulation and structure prediction methods. Two common approaches are used in the derivation of knowledge-based potentials. One assumes that the energy parameters optimize the native structure stability. The other assumes that interaction events are related to their energies according to the Boltzmann distribution, and that they are distributed independently of other events, that is, the quasi-chemical approximation. Here, these assumptions are systematically tested by extracting contact energies from artificial databases of lattice proteins with predefined pairwise contact energies. Databases of protein sequences are designed to either satisfy the Boltzmann distribution at high or low temperatures, or to simultaneously optimize the native stability and folding kinetics. It is found that the quasi-chemical approximation, with the ideal reference state, accurately reproduces the true energies for high temperature Boltzmann distributed sequences (weakly interacting residues), but less accurately at low temperatures, where the sequences correspond to energy minima and the residues are strongly interacting. To overcome this problem, an iterative procedure for Boltzmann distributed sequences is introduced, which accounts for interacting residue correlations and eliminates the need for the quasi-chemical approximation. In this case, the energies are accurately reproduced at any ensemble temperature. However, when the database of sequences designed for optimal stability and kinetics is used, the energy correlation is less than optimal using either method, exhibiting random and systematic deviations from linearity. Therefore, the assumption that native structures are maximally stable or that sequences are determined according to the Boltzmann distribution seems to be inadequate for obtaining accurate energies. The limited number of sequences in the database and the inhomogeneous concentration of amino acids from one structure to another do not seem to be major obstacles for improving the quality of the extracted pairwise energies, with the exception of repulsive interactions.
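As a rough sketch of the quasi-chemical approximation described above, contact energies can be estimated (in units of kT) as the negative logarithm of the observed contact frequency divided by the frequency expected if residues paired at random. The random-pairing reference used here is an assumption made for illustration and may differ from the "ideal reference state" of the paper. The two-letter alphabet and the counts are invented.

```python
import math
from collections import Counter


def quasi_chemical_energies(contacts, composition):
    """Estimate contact energies in units of kT.

    contacts: list of (type_i, type_j) contact pairs observed in a database.
    composition: {type: mole fraction} of each residue type.
    """
    counts = Counter(tuple(sorted(c)) for c in contacts)
    total = sum(counts.values())
    energies = {}
    for (a, b), n_obs in counts.items():
        # Expected fraction of a-b contacts if residues paired at random
        # (factor 2 because an unordered a-b pair can arise two ways).
        expected = composition[a] * composition[b] * (1 if a == b else 2)
        energies[(a, b)] = -math.log((n_obs / total) / expected)
    return energies


# Invented hydrophobic (H) / polar (P) contact counts for illustration.
contacts = [("H", "H")] * 6 + [("H", "P")] * 2 + [("P", "P")] * 2
energies = quasi_chemical_energies(contacts, {"H": 0.5, "P": 0.5})
```

Pairs seen more often than random pairing predicts (H-H here) get negative, attractive energies, and under-represented pairs get positive ones.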
Affiliation(s)
- Marcos R Betancourt
  - Department of Physics, Indiana University Purdue University Indianapolis, Indianapolis, Indiana 46202, USA.
5
Abstract
The amino acid composition of intrinsically disordered proteins and protein segments characteristically differs from that of ordered proteins. This observation forms the basis of several disorder prediction methods. These, however, usually perform worse for smaller proteins (or segments) than for larger ones. We show that the regions of amino acid composition space corresponding to ordered and disordered proteins overlap with each other, and the extent of the overlap (the "twilight zone") is larger for short than for long chains. To explain this finding, we used two-dimensional lattice model proteins containing hydrophobic, polar, and charged monomers and revealed the relation among chain length, amino acid composition, and disorder. Because the number of chain configurations exponentially grows with chain length, a larger fraction of longer chains can reach a low-energy, ordered state than do shorter chains. The amount of information carried by the amino acid composition about whether a protein or segment is (dis)ordered grows with increasing chain length. Smaller proteins rely more on specific interactions for stability, which limits the possible accuracy of disorder prediction methods. For proteins in the "twilight zone", size can determine order, as illustrated by the example of two-state homodimers.
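The exponential growth of chain configurations with chain length can be made concrete by counting self-avoiding walks on a square lattice. This is a minimal sketch that treats each walk as a chain backbone. It ignores monomer types and energies, and it does not remove rotations or reflections, so it is a simplification of the lattice model described in the abstract.

```python
def count_walks(n_steps, pos=(0, 0), visited=None):
    """Count self-avoiding walks of n_steps bonds on a 2D square lattice."""
    if visited is None:
        visited = {pos}
    if n_steps == 0:
        return 1
    total = 0
    x, y = pos
    for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
        if nxt not in visited:
            visited.add(nxt)           # occupy the site
            total += count_walks(n_steps - 1, nxt, visited)
            visited.remove(nxt)        # backtrack
    return total


counts = [count_walks(n) for n in range(1, 7)]
# counts == [4, 12, 36, 100, 284, 780]
```

Each extra bond multiplies the number of configurations by a factor between 2.7 and 3 at these lengths.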
6
Rykunov D, Fiser A. Effects of amino acid composition, finite size of proteins, and sparse statistics on distance-dependent statistical pair potentials. Proteins 2007; 67:559-68. [PMID: 17335003 DOI: 10.1002/prot.21279] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.1] [Indexed: 11/06/2022]
Abstract
Statistical distance-dependent pair potentials are frequently used in a variety of folding, threading, and modeling studies of proteins. The applicability of these types of potentials is tightly connected to the reliability of statistical observations. We explored the possible origin and extent of false positive signals in statistical potentials by analyzing their distance dependence in a variety of randomized protein-like models. While on average potentials derived from such models are expected to equal zero at any distance, we demonstrate that systematic and significant distortions exist. These distortions originate from the limited statistical counts in local environments of proteins and from the limited size of protein structures at large distances. We suggest that these systematic errors in statistical potentials are connected to the dependence of amino acid composition on protein size and to variation in protein sizes. Additionally, atom-based potentials are dominated by a false positive signal that is due to correlation among distances measured from atoms of one residue to atoms of another residue. The significance of residue-based pairwise potentials at various spatial pair separations was assessed in this study and it was found that as few as approximately 50% of potential values were statistically significant at distances below 4 Å, and only at most approximately 80% of them were significant at larger pair separations. A new definition for reference state, free of the observed systematic errors, is suggested. It has been demonstrated to generate statistical potentials that compare favorably to other publicly available ones.
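The effect of sparse statistics can be sketched with a toy experiment: pair distances are drawn uniformly at random into bins, so the true potential is zero everywhere, and the apparent potential is computed as the negative logarithm of the observed over expected counts. This is only an illustration of the sampling-noise argument, not the randomized protein-like models or the reference state used in the paper. Bin count, sample sizes, and seed are arbitrary choices.

```python
import math
import random


def random_potential(n_pairs, n_bins=20, seed=0):
    """Apparent potential (in kT) per distance bin for uniformly random pairs."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    for _ in range(n_pairs):
        counts[rng.randrange(n_bins)] += 1
    expected = n_pairs / n_bins
    # Empty bins are skipped because the logarithm is undefined there.
    return [-math.log(c / expected) for c in counts if c > 0]


def mean_abs(values):
    return sum(abs(v) for v in values) / len(values)


sparse = mean_abs(random_potential(200))      # about 10 counts per bin
dense = mean_abs(random_potential(200_000))   # about 10,000 counts per bin
```

With few counts per bin the apparent potential deviates noticeably from zero even though no interaction exists, and the deviation shrinks as the counts grow.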
Affiliation(s)
- Dmitry Rykunov
  - Department of Biochemistry, Seaver Center for Bioinformatics, Albert Einstein College of Medicine, Bronx, New York 10461, USA
7
Shen MY, Sali A. Statistical potential for assessment and prediction of protein structures. Protein Sci 2007; 15:2507-24. [PMID: 17075131 PMCID: PMC2242414 DOI: 10.1110/ps.062416606] [Citation(s) in RCA: 1758] [Impact Index Per Article: 103.4] [Indexed: 10/24/2022]
Abstract
Protein structures in the Protein Data Bank provide a wealth of data about the interactions that determine the native states of proteins. Using probability theory, we derive an atomic distance-dependent statistical potential from a sample of native structures that does not depend on any adjustable parameters (Discrete Optimized Protein Energy, or DOPE). DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of the native structures. The DOPE potential was extracted from a nonredundant set of 1472 crystallographic structures. We tested DOPE and five other scoring functions by the detection of the native state among six multiple target decoy sets, the correlation between the score and model error, and the identification of the most accurate non-native structure in the decoy set. For all decoy sets, DOPE is the best performing function in terms of all criteria, except for a tie in one criterion for one decoy set. To facilitate its use in various applications, such as model assessment, loop modeling, and fitting into cryo-electron microscopy mass density maps combined with comparative protein structure modeling, DOPE was incorporated into the modeling package MODELLER-8.
Affiliation(s)
- Min-Yi Shen
  - Department of Biopharmaceutical Sciences, Department of Pharmaceutical Chemistry, University of California at San Francisco, San Francisco, California 94158, USA.
8
Gilis D, Biot C, Buisine E, Dehouck Y, Rooman M. Development of novel statistical potentials describing cation-pi interactions in proteins and comparison with semiempirical and quantum chemistry approaches. J Chem Inf Model 2006; 46:884-93. [PMID: 16563020 DOI: 10.1021/ci050395b] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 12/12/2022]
Abstract
Novel statistical potentials derived from known protein structures are presented. They are designed to describe cation-pi and amino-pi interactions between a positively charged amino acid or an amino acid carrying a partially charged amino group and an aromatic moiety. These potentials are based on the propensity of residue types to be separated by a certain spatial distance or to have a given relative orientation. Several such potentials, describing different kinds of correlations between residue types, distances, and orientations, are derived and combined in a way that maximizes their information content and minimizes their redundancy. To test the ability of these potentials to describe cation-pi and amino-pi systems, we compare their energies with those computed with the CHARMM molecular mechanics force field and with quantum chemistry calculations at the Hartree-Fock level (HF) and at the second order of the Møller-Plesset perturbation theory (MP2). The latter calculations are performed in the gas phase and in acetone, in order to mimic the average dielectric constant of protein environments. The energies computed with the best of our statistical potentials and with gas-phase HF or MP2 show correlation coefficients up to 0.96 when considering one side-chain degree of freedom in the statistical potentials and up to 0.94 when using a totally simplified model excluding all side-chain degrees of freedom. These potentials perform as well as, or better than, the CHARMM molecular mechanics force field that uses a much more detailed protein representation. The good performance of our cation-pi statistical potentials suggests their utility in protein structure and stability prediction and in protein design.
Affiliation(s)
- Dimitri Gilis
  - Unité de Bioinformatique Génomique et Structurale, Université Libre de Bruxelles, CP 165/61, 50 Avenue F Roosevelt, 1050 Bruxelles, Belgium.
9
Gilis D. In silico analysis of the thermodynamic stability changes of psychrophilic and mesophilic alpha-amylases upon exhaustive single-site mutations. J Chem Inf Model 2006; 46:1509-16. [PMID: 16711770 DOI: 10.1021/ci050473v] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/28/2022]
Abstract
Identifying sequence modifications that distinguish psychrophilic from mesophilic proteins is important for designing enzymes with different thermodynamic stabilities and for understanding the underlying mechanisms. The PoPMuSiC algorithm is used to introduce, in silico, all the single-site mutations in four mesophilic and one psychrophilic chloride-dependent alpha-amylases and to evaluate the changes in thermodynamic stability. The analysis of the distribution of the sequence positions that could be stabilized upon mutation shows a clear difference between the three domains of psychrophilic and mesophilic alpha-amylases. Most of the mutations stabilizing the psychrophilic enzyme are found in domains B and C, contrary to the mesophilic proteins where they are preferentially situated in the catalytic domain A. Moreover, the calculations show that the environment of some residues responsible for the activity of the psychrophilic protein has evolved to reinforce favorable interactions with these residues. In the second part, these results are exploited to propose rationally designed mutations that are predicted to confer mesophilic-like thermodynamic properties on the psychrophilic enzyme. Interestingly, most of the mutations found in domain C strengthen the interactions with domain A, in agreement with suggestions made on the basis of structural analyses. Although this study focuses on single-site mutations, the thermodynamic effects of the recommended mutations should be additive if the mutated residues are not close in space.
Affiliation(s)
- Dimitri Gilis
  - Genomic and Structural Bioinformatics, Université Libre de Bruxelles, Avenue F. Roosevelt 50 CP 165/61, 1050 Brussels, Belgium.
10
Abstract
We propose a novel and flexible derivation scheme of statistical, database-derived potentials, which allows one to take into account simultaneously specific correlations between several sequence and structure descriptors. This scheme leads to the decomposition of the total folding free energy of a protein into a sum of lower order terms, thereby making it possible to analyze each contribution independently and clarify its significance and importance, to avoid overcounting certain contributions, and to deal more efficiently with the limited size of the database. In addition, this derivation scheme appears quite general, for many previously developed potentials can be expressed as particular cases of our formalism. We use this formalism as a framework to generate different residue-based energy functions, whose performances are assessed on the basis of their ability to discriminate genuine proteins from decoy models. The optimal potential is generated as a combination of several coupling terms, measuring correlations between residue types, backbone torsion angles, solvent accessibilities, relative positions along the sequence, and interresidue distances. This potential outperforms all tested residue-based potentials, and even several atom-based potentials. Its incorporation in algorithms aiming at predicting protein structure and stability should therefore substantially improve their performances.
Affiliation(s)
- Y Dehouck
  - Unité de Bioinformatique génomique et structurale, Université Libre de Bruxelles, 1050 Brussels, Belgium.
11
Frenz CM. Neural network-based prediction of mutation-induced protein stability changes in Staphylococcal nuclease at 20 residue positions. Proteins 2005; 59:147-51. [PMID: 15723345 DOI: 10.1002/prot.20400] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 11/07/2022]
Abstract
Protein-based therapeutics are playing an increasingly important role in the treatment of diseases, including diabetes and cancer. The viability of these treatments, however, is highly dependent on the stability of the therapeutic, since stability affects both the shelf life of the therapeutic and its active life in the body. Stability engineering can, therefore, be used to increase the effectiveness of protein-based therapeutics. Computational methods of protein stability prediction have been under development for about a decade, but complex molecular interactions make stability prediction difficult and computationally intensive. A rapid computational method of protein stability prediction is developed using feed-forward neural networks and used to predict mutation-induced stability changes in Staphylococcal nuclease. The input to the neural network consisted of sequences of evolutionarily based amino acid similarity scores that were obtained through the comparison of the amino acids in a mutation-containing sequence to their positional counterparts in the baseline wild-type amino acid sequence. A training set was created which consisted of similarity score sequences, for which the stabilities of the corresponding amino acid sequences were known, paired with the stabilities of the sequences relative to that of the baseline. Back-propagation of error was used to train the network to output accurate relative stability scores for the sequences in the training set. Neural network-based relative stability predictions for 55 sequences containing mutation combinations not found in the training set had an accuracy of 92.8%.
Affiliation(s)
- Christopher M Frenz
  - Department of Biochemistry, New York Medical College, Basic Sciences Building, Valhalla, New York 10595, USA.
12
Saraboji K, Gromiha MM, Ponnuswamy MN. Relative importance of secondary structure and solvent accessibility to the stability of protein mutants. Comput Biol Chem 2005; 29:25-35. [PMID: 15680583 DOI: 10.1016/j.compbiolchem.2004.12.002] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Received: 10/07/2004] [Revised: 12/07/2004] [Accepted: 12/07/2004] [Indexed: 10/25/2022]
Abstract
Understanding the factors influencing the stability of protein mutants is an important task in molecular and computational biology. In this work, we have approached this problem by examining the relative importance of secondary structure and solvent accessibility of the mutant residue for understanding/predicting the stability of protein mutants. We have used hydrophobic, electrostatic and hydrogen bond free energy terms and nine unique physicochemical, energetic and conformational properties of amino acids in the present study, and these parameters have been related to changes in thermal stability (ΔTm) of all the single mutants of lysozymes based on single and multiple correlation coefficients. As expected, the properties reflecting hydrophobicity and hydrophobic free energy play a major role in distinguishing stabilizing and destabilizing mutants. The hydrophobic free energy due to carbon and nitrogen atoms distinguishes the stability of coil and strand mutations with accuracies of 100 and 90%, respectively. In agreement with previous results, the subgroup classification based on secondary structure and the information about its location in the structure yielded a good relationship with the experimental ΔTm. We found that the secondary structure information is equally or more important than solvent accessibility for understanding the stability of protein mutants. The comparison of amino acid properties with free-energy terms indicates that the energetic contribution explains the mutant stability better in the coil region whereas the amino acid properties do better in the strand region. Further, the combination of free energies with amino acid properties increased the correlation significantly. The present study demonstrates the importance of classifying the mutants based on secondary structure for understanding the stability of proteins upon mutation.
Affiliation(s)
- K Saraboji
  - Department of Crystallography and Biophysics, University of Madras, Guindy Campus, Chennai 600025, India
13
Dehouck Y, Gilis D, Rooman M. Database-derived potentials dependent on protein size for in silico folding and design. Biophys J 2005; 87:171-81. [PMID: 15240455 PMCID: PMC1304340 DOI: 10.1529/biophysj.103.037861] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/18/2022] Open
Abstract
Knowledge-based potentials are widely used in simulations of protein folding, structure prediction, and protein design. Their advantages include limited computational requirements and the ability to deal with low-resolution protein models compatible with long-scale simulations. Their drawbacks include their dependence on specific features of the dataset from which they are derived, such as the size of the proteins it contains, and the fact that their physical meaning is still a subject of debate. We address these issues by probing the theoretical validity of these potentials as mean-force potentials that take the solvent implicitly into account and involve entropic contributions due to atomic degrees of freedom and solvation. The dependence on the size of the system is checked on distance-dependent amino acid pair potentials, derived from six protein structure sets containing proteins of increasing length N. For large inter-residue distances, they are found to display the theoretically predicted 1/N behavior weighted by a factor depending on the boundaries and the compressibility of the system. For short distances, different trends are observed according to the nature of the residue pairs and their ability to form, for example, electrostatic, cation-pi or pi-pi interactions, or hydrophobic packing. The results of this analysis are used to devise a novel protein size-dependent distance potential, which displays an improved performance in discriminating native sequence-structure matches among decoy models.
Affiliation(s)
- Yves Dehouck
  - Bioinformatique Génomique et Structurale, Université Libre de Bruxelles, Brussels, Belgium.
14
EvDTree: structure-dependent substitution profiles based on decision tree classification of 3D environments. BMC Bioinformatics 2005; 6:4. [PMID: 15638949 PMCID: PMC545998 DOI: 10.1186/1471-2105-6-4] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 10/22/2004] [Accepted: 01/10/2005] [Indexed: 12/04/2022] Open
Abstract
Background: Structure-dependent substitution matrices increase the accuracy of sequence alignments when the 3D structure of one sequence is known, and are successful e.g. in fold recognition. We propose a new automated method, EvDTree, based on a decision tree algorithm, for automatic derivation of amino acid substitution probabilities from a set of sequence-structure alignments. The main advantage over other approaches is an unbiased automatic selection of the most informative structural descriptors and associated values or thresholds. This feature allows automatic derivation of structure-dependent substitution scores for any specific set of structures, without the need to empirically determine best descriptors and parameters.
Results: Decision trees for residue substitutions were constructed for each residue type from sequence-structure alignments extracted from the HOMSTRAD database. For each tree cluster, environment-dependent substitution profiles were derived. The resulting structure-dependent substitution scores were assessed using a criterion based on the mean ranking of observed substitution among all possible substitutions and in sequence-structure alignments. The automatically built EvDTree substitution scores provide significantly better results than conventional matrices and similar or slightly better results than other structure-dependent matrices. EvDTree has been applied to small disulfide-rich proteins as a test case to automatically derive specific substitution scores providing better results than non-specific substitution scores. Analyses of the decision tree classifications provide useful information on the relative importance of different structural descriptors.
Conclusions: We propose a fully automatic method for the classification of structural environments and inference of structure-dependent substitution profiles. We show that this approach is more accurate than existing methods for various applications. The easy adaptation of EvDTree to any specific data set opens the way for class-specific structure-dependent substitution scores which can be used in threading-based remote homology searches.
|
15
|
de Bakker PIW, DePristo MA, Burke DF, Blundell TL. Ab initio construction of polypeptide fragments: Accuracy of loop decoy discrimination by an all-atom statistical potential and the AMBER force field with the Generalized Born solvation model. Proteins 2003; 51:21-40. [PMID: 12596261 DOI: 10.1002/prot.10235] [Citation(s) in RCA: 120] [Impact Index Per Article: 5.7] [Indexed: 12/24/2022]
Abstract
The accuracy of model selection from decoy ensembles of protein loop conformations was explored by comparing the performance of the Samudrala-Moult all-atom statistical potential (RAPDF) and the AMBER molecular mechanics force field, including the Generalized Born/surface area solvation model. Large ensembles of consistent loop conformations, represented at atomic detail with idealized geometry, were generated for a large test set of protein loops 2 to 12 residues long by a novel ab initio method called RAPPER that relies on fine-grained residue-specific phi/psi propensity tables for conformational sampling. Ranking the conformers on the basis of RAPDF scores resulted in selected conformers that had an average global, non-superimposed RMSD for all heavy mainchain atoms ranging from 1.2 Å for 4-mers to 2.9 Å for 8-mers to 6.2 Å for 12-mers. After filtering on the basis of anchor geometry and RAPDF scores, ranking by energy minimization of the AMBER/GBSA potential energy function selected conformers that had global RMSD values of 0.5 Å for 4-mers, 2.3 Å for 8-mers, and 5.0 Å for 12-mers. Minimized fragments had, on average, consistently lower RMSD values (by 0.1 Å) than their initial conformations. The importance of the Generalized Born solvation energy term is reflected by the observation that the average RMSD accuracy for all loop lengths was worse when this term was omitted. There are, however, still many cases where the AMBER gas-phase minimization selected conformers of lower RMSD than the AMBER/GBSA minimization. The AMBER/GBSA energy function had better correlation with RMSD to native than the RAPDF. When the ensembles were supplemented with conformations extracted from experimental structures, a dramatic improvement in selection accuracy was observed at longer lengths (average RMSD of 1.3 Å for 8-mers) when scoring with the AMBER/GBSA force field. This work provides the basis for a promising hybrid approach of ab initio and knowledge-based methods for loop modeling.
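The global, non-superimposed RMSD reported in this abstract can be illustrated with a minimal Python sketch. The function name `global_rmsd` and the tuple-based coordinate format are assumptions made for illustration; they are not taken from RAPPER or from the paper. The sketch assumes both conformations list the same atoms in the same order and already share one coordinate frame, so no superposition is applied.

```python
import math


def global_rmsd(model_atoms, native_atoms):
    """RMSD between two equally ordered lists of (x, y, z) coordinates.

    No superposition is performed: both structures are assumed to be
    expressed in the same frame (hypothetical helper, for illustration).
    """
    if not model_atoms or len(model_atoms) != len(native_atoms):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared distances between corresponding atoms.
    squared = sum(
        (mx - nx) ** 2 + (my - ny) ** 2 + (mz - nz) ** 2
        for (mx, my, mz), (nx, ny, nz) in zip(model_atoms, native_atoms)
    )
    return math.sqrt(squared / len(model_atoms))


# Example: every atom is displaced by 1 along z.
loop_model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
loop_native = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(global_rmsd(loop_model, loop_native))
```

Skipping the superposition step means that errors in the orientation of the loop relative to the rest of the protein are counted in the score, not only errors in its internal shape.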
Affiliation(s)
- Paul I W de Bakker
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom.
|
16
|
Kuznetsov IB, Rackovsky S. Discriminative ability with respect to amino acid types: assessing the performance of knowledge-based potentials without threading. Proteins 2002; 49:266-84. [PMID: 12211006 DOI: 10.1002/prot.10211] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/06/2022]
Abstract
We present a novel method designed to analyze the discriminative ability of knowledge-based potentials with respect to the 20 residue types. The method is based on the preference of amino acids for specific types of protein environment, and uses a virtual mutagenesis experiment to estimate how much information a given potential can provide about environments of each amino acid type. This allows one to test and optimize the performance of real potentials at the level of individual amino acids, using actual data on residue environments from a dataset of known protein structures. We have applied our method to long-range and medium-range pairwise distance-dependent potentials. The results of our study indicate that these potentials are only able to discriminate between a very limited number of residue types, and that discriminative ability is extremely sensitive to the choice of parameters used to construct the potentials, and even to the size of the training dataset. We also show that different types of pairwise distance potentials are dominated by different types of interactions. These dominant interactions strongly depend on the type of approximation used to define residue position. For each potential, our methodology is able to identify a potential-specific amino acid distance matrix and a reduced amino acid alphabet of any specified size, which may have implications for sequence alignment and multibody models.
Affiliation(s)
- Igor B Kuznetsov
- Department of Biomathematical Sciences, Mount Sinai School of Medicine, New York, New York 10029, USA
|
17
|
Abstract
A protein structure model generally needs to be evaluated to assess whether or not it has the correct fold. To improve fold assessment, four types of a residue-level statistical potential were optimized, including distance-dependent, contact, Phi/Psi dihedral angle, and accessible surface statistical potentials. Approximately 10,000 test models with the correct and incorrect folds were built by automated comparative modeling of protein sequences of known structure. The criterion used to discriminate between the correct and incorrect models was the Z-score of the model energy. The performance of a Z-score was determined as a function of many variables in the derivation and use of the corresponding statistical potential. The performance was measured by the fractions of the correctly and incorrectly assessed test models. The most discriminating combination of any one of the four tested potentials is the sum of the normalized distance-dependent and accessible surface potentials. The distance-dependent potential that is optimal for assessing models of all sizes uses both Cα and Cβ atoms as interaction centers, distinguishes between all 20 standard residue types, has the distance range of 30 Å, and is derived and used by taking into account the sequence separation of the interacting atom pairs. The terms for the sequentially local interactions are significantly less informative than those for the sequentially nonlocal interactions. The accessible surface potential that is optimal for assessing models of all sizes uses Cβ atoms as interaction centers and distinguishes between all 20 standard residue types. The performance of the tested statistical potentials is not likely to improve significantly with an increase in the number of known protein structures used in their derivation. The parameters of fold assessment whose optimal values vary significantly with model size include the size of the known protein structures used to derive the potential and the distance range of the accessible surface potential. Fold assessment by statistical potentials is most difficult for the very small models. This difficulty presents a challenge to fold assessment in large-scale comparative modeling, which produces many small and incomplete models. The results described in this study provide a basis for an optimal use of statistical potentials in fold assessment.
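The Z-score criterion named in this abstract can be sketched in a few lines of Python. The abstract does not say how the reference energy distribution is generated, so the `reference_energies` argument below is an assumption (for example, energies of the same sequence in shuffled or alternative conformations), and the function name `energy_z_score` is hypothetical.

```python
from statistics import mean, pstdev


def energy_z_score(model_energy, reference_energies):
    """Number of standard deviations the model energy lies from the
    mean of a reference energy distribution.

    How the reference set is built is an assumption of this sketch;
    the abstract only states that a Z-score of the model energy is used.
    """
    mu = mean(reference_energies)
    sigma = pstdev(reference_energies)  # population standard deviation
    if sigma == 0:
        raise ValueError("reference energies have zero variance")
    return (model_energy - mu) / sigma


# A model whose energy is well below the reference mean gets a
# strongly negative Z-score, which suggests a native-like fold.
print(energy_z_score(0.0, [1.0, 3.0]))  # → -2.0
```

Normalizing by the spread of the reference distribution makes scores more comparable across models of different sizes than raw energies would be, although the abstract notes that very small models remain the hardest to assess.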
Affiliation(s)
- Francisco Melo
- Laboratories of Molecular Biophysics, Pels Family Center for Biochemistry and Structural Biology, The Rockefeller University, New York, New York 10021, USA
|
18
|
Gilis D, Massar S, Cerf NJ, Rooman M. Optimality of the genetic code with respect to protein stability and amino-acid frequencies. Genome Biol 2001; 2:RESEARCH0049. [PMID: 11737948 PMCID: PMC60310 DOI: 10.1186/gb-2001-2-11-research0049] [Citation(s) in RCA: 131] [Impact Index Per Article: 5.7] [Received: 06/19/2001] [Revised: 07/06/2001] [Accepted: 09/28/2001] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND The genetic code is known to be efficient in limiting the effect of mistranslation errors. A misread codon often codes for the same amino acid or one with similar biochemical properties, so the structure and function of the coded protein remain relatively unaltered. Previous studies have attempted to address this question quantitatively, by estimating the fraction of randomly generated codes that do better than the genetic code in respect of overall robustness. We extended these results by investigating the role of amino-acid frequencies in the optimality of the genetic code. RESULTS We found that taking the amino-acid frequency into account decreases the fraction of random codes that beat the natural code. This effect is particularly pronounced when more refined measures of the amino-acid substitution cost are used than hydrophobicity. To show this, we devised a new cost function by evaluating in silico the change in folding free energy caused by all possible point mutations in a set of protein structures. With this function, which measures protein stability while being unrelated to the code's structure, we estimated that around two random codes in a billion (10^9) are fitter than the natural code. When alternative codes are restricted to those that interchange biosynthetically related amino acids, the genetic code appears even more optimal. CONCLUSIONS These results lead us to discuss the role of amino-acid frequencies and other parameters in the genetic code's evolution, in an attempt to propose a tentative picture of primitive life.
Affiliation(s)
- D Gilis
- Biomolecular Engineering, Université Libre de Bruxelles, ave F D Roosevelt, 1050 Bruxelles, Belgium.
|
19
|
Abstract
The location of protein subunits that form early during folding, constituted of consecutive secondary structure elements with some intrinsic stability and favorable tertiary interactions, is predicted using a combination of threading algorithms and local structure prediction methods. Two folding units are selected among the candidates identified in a database of known protein structures: the fragment 15-55 of 434 cro, an all-alpha protein, and the fragment 1-35 of ubiquitin, an alpha/beta protein. These units are further analyzed by means of Monte Carlo simulated annealing using several database-derived potentials describing different types of interactions. Our results suggest that the local interactions along the chain dominate in the first folding steps of both fragments, and that the formation of some of the secondary structures necessarily occurs before structure compaction. These findings led us to define a prediction protocol that improves the accuracy of the predicted structures. It involves a first simulation with a local interaction potential only, whose final conformation is used as the starting structure of a second simulation that uses a combination of local interaction and distance potentials. The root mean square deviations between the coordinates of predicted and native structures are as low as 2-4 Å in most trials. The possibility of extending this protocol to the prediction of full proteins is discussed. Proteins 2001;42:164-176.
Affiliation(s)
- D Gilis
- Ingénierie Biomoléculaire, Université Libre de Bruxelles, Bruxelles, Belgium.
|
20
|
Vijayakumar M, Zhou HX. Prediction of Residue−Residue Pair Frequencies in Proteins. J Phys Chem B 2000. [DOI: 10.1021/jp001757f] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]
Affiliation(s)
- M. Vijayakumar
- Department of Physics, Drexel University, Philadelphia, Pennsylvania 19104
- Huan-Xiang Zhou
- Department of Physics, Drexel University, Philadelphia, Pennsylvania 19104
|
21
|
Shan Y, Zhou HX. Correspondence of potentials of mean force in proteins and in liquids. J Chem Phys 2000. [DOI: 10.1063/1.1288920] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/15/2022] Open
|
22
|
Abstract
We examine the interactions between amino acid residues in the context of their secondary structural environments (helix, strand, and coil) in proteins. Effective contact energies for an expanded 60-residue alphabet (20 aa x three secondary structural states) are estimated from the residue-residue contacts observed in known protein structures. Similar to the prototypical contact energies for 20 aa, the newly derived energy parameters reflect mainly the hydrophobic interactions; however, the relative strength of such interactions shows a strong dependence on the secondary structural environment, with nonlocal interactions in beta-sheet structures and alpha-helical structures dominating the energy table. Environment-dependent residue contact energies outperform existing residue pair potentials in both threading and three-dimensional contact prediction tests and should be generally applicable to protein structure prediction.
Affiliation(s)
- C Zhang
- Department of Chemistry and E. O. Lawrence Berkeley National Laboratory, University of California, Berkeley, CA 94720, USA
|
23
|
Vendruscolo M, Najmanovich R, Domany E. Can a pairwise contact potential stabilize native protein folds against decoys obtained by threading? Proteins 2000; 38:134-48. [PMID: 10656261 DOI: 10.1002/(sici)1097-0134(20000201)38:2<134::aid-prot3>3.0.co;2-a] [Citation(s) in RCA: 95] [Impact Index Per Article: 4.0] [Indexed: 11/10/2022]
Abstract
We present a method to derive contact energy parameters from large sets of proteins. The basic requirement on which our method is based is that for each protein in the database the native contact map has lower energy than all its decoy conformations that are obtained by threading. Only when this condition is satisfied can one use the proposed energy function for fold identification. Such a set of parameters can be found (by perceptron learning) if Mp, the number of proteins in the database, is not too large. Other aspects that influence the existence of such a solution are the exact definition of contact and the value of the critical distance Rc, below which two residues are considered to be in contact. Another important novel feature of our approach is its ability to determine whether an energy function of some suitable proposed form can or cannot be parameterized in a way that satisfies our basic requirement. As a demonstration of this, we determine the region in the (Rc, Mp) plane in which the problem is solvable, i.e., we can find a set of contact parameters that simultaneously stabilize all the native conformations. We show that for large enough databases the contact approximation to the energy cannot stabilize all the native folds even against the decoys obtained by gapless threading.
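The perceptron condition described in this abstract can be sketched as follows, assuming the contact energy is linear in the number of contacts of each type, so that E = w · n for a count vector n. The function name `learn_contact_parameters`, the count-vector representation, and the epoch limit are assumptions for illustration and are not taken from the paper. The learner looks for weights w such that every decoy has higher energy than its native map, that is, w · (n_decoy − n_native) > 0.

```python
def learn_contact_parameters(pairs, max_epochs=1000, rate=1.0):
    """Perceptron search for contact energies that rank natives below decoys.

    pairs: list of (native_counts, decoy_counts), each a list giving the
    number of contacts of every contact type (hypothetical representation).
    Returns the weight vector, or None if no solution was found within
    max_epochs (which is what happens when the problem is unsolvable).
    """
    diffs = [[d - n for n, d in zip(native, decoy)] for native, decoy in pairs]
    weights = [0.0] * len(diffs[0])
    for _ in range(max_epochs):
        updated = False
        for x in diffs:
            # Energy gap E(decoy) - E(native) must be strictly positive.
            if sum(w * xi for w, xi in zip(weights, x)) <= 0:
                weights = [w + rate * xi for w, xi in zip(weights, x)]
                updated = True
        if not updated:
            return weights  # every decoy now lies above its native
    return None


# Two contact types; the native has two contacts of type 0.
pairs = [([2, 0], [0, 2]), ([2, 0], [1, 1])]
print(learn_contact_parameters(pairs))  # → [-2.0, 2.0]
```

When the constraints contradict each other, as the abstract reports for large enough databases, the loop never completes a pass without an update and the sketch returns `None` after `max_epochs`.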
Affiliation(s)
- M Vendruscolo
- Department of Physics of Complex Systems, Weizmann Institute of Science, Rehovot, Israel.
|
24
|
Gromiha MM, Oobatake M, Kono H, Uedaira H, Sarai A. Relationship between amino acid properties and protein stability: buried mutations. Journal of Protein Chemistry 1999; 18:565-78. [PMID: 10524774 DOI: 10.1023/a:1020603401001] [Citation(s) in RCA: 64] [Impact Index Per Article: 2.6] [Indexed: 11/12/2022]
Abstract
In order to understand the mechanism of protein stability and to develop a simple method for predicting mutation-induced stability changes, we analyzed the relationship between stability changes caused by buried mutations and changes in 48 amino acid properties. As expected from the importance of hydrophobicity, properties reflecting hydrophobicity are strongly correlated with the stability of proteins. We found that subgroup classification based on secondary structure increased correlations significantly, and mutations within beta-strand segments correlated better than did those in alpha-helical segments, which may result from stronger hydrophobicity of the beta-strands. Multiple regression analyses incorporating combinations of three properties from among all possible combinations of the 48 properties increased the correlation coefficient to 0.88 and by an average of 13% for all data sets. Analyzing the stability of tryptophan synthase mutants with Glu49 replaced by all other residues except Arg revealed that combining buriedness, solvent-accessible surface area for denatured protein, and unfolding Gibbs free energy change increased the correlation to 0.95. Consideration of sequence and structural information (neighboring residues in sequence and in space) did not significantly strengthen the correlations in buried mutations, suggesting that nonspecific interactions dominate in the interior of proteins.
Affiliation(s)
- M M Gromiha
- Tsukuba Life Science Center, The Institute of Physical and Chemical Research (RIKEN), Ibaraki, Japan
|
25
|
Abstract
By following a consistent line of physical reasoning, some fundamental understanding about the foldability of proteins has been achieved. In recent years, this has led to the development of a number of successful algorithms for optimizing potential energy functions for folding protein models. The differences between the folding mechanisms of simple, contact-based lattice proteins and more traditional, realistic protein models, however, still call for further development of the potentials in addition to the optimization approaches.
Affiliation(s)
- M H Hao
- Boehringer Ingelheim Pharmaceuticals Inc. R6-5, 900 Ridgebury Road, PO Box 368, Ridgefield, CT 06877, USA
|