1
|
Three-dimensional protein structure prediction: Methods and computational strategies. Comput Biol Chem 2014; 53PB:251-276. [DOI: 10.1016/j.compbiolchem.2014.10.001] [Citation(s) in RCA: 121] [Impact Index Per Article: 12.1] [Received: 06/12/2014] [Revised: 10/03/2014] [Accepted: 10/07/2014] [Indexed: 01/01/2023]
|
2
|
Hamelryck T, Borg M, Paluszewski M, Paulsen J, Frellsen J, Andreetta C, Boomsma W, Bottaro S, Ferkinghoff-Borg J. Potentials of mean force for protein structure prediction vindicated, formalized and generalized. PLoS One 2010; 5:e13714. [PMID: 21103041 PMCID: PMC2978081 DOI: 10.1371/journal.pone.0013714] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Received: 07/07/2010] [Accepted: 10/04/2010] [Indexed: 11/26/2022]
Abstract
Understanding protein structure is of crucial importance in science, medicine and biotechnology. For about two decades, knowledge-based potentials based on pairwise distances – so-called “potentials of mean force” (PMFs) – have been center stage in the prediction and design of protein structure and the simulation of protein folding. However, the validity, scope and limitations of these potentials are still vigorously debated and disputed, and the optimal choice of the reference state – a necessary component of these potentials – is an unsolved problem. PMFs are loosely justified by analogy to the reversible work theorem in statistical physics, or by a statistical argument based on a likelihood function. Both justifications are insightful but leave many questions unanswered. Here, we show for the first time that PMFs can be seen as approximations to quantities that do have a rigorous probabilistic justification: they naturally arise when probability distributions over different features of proteins need to be combined. We call these quantities “reference ratio distributions” deriving from the application of the “reference ratio method.” This new view is not only of theoretical relevance but leads to many insights that are of direct practical use: the reference state is uniquely defined and does not require external physical insights; the approach can be generalized beyond pairwise distances to arbitrary features of protein structure; and it becomes clear for which purposes the use of these quantities is justified. We illustrate these insights with two applications, involving the radius of gyration and hydrogen bonding. In the latter case, we also show how the reference ratio method can be iteratively applied to sculpt an energy funnel. Our results considerably increase the understanding and scope of energy functions derived from known biomolecular structures.
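The ratio at the heart of a pairwise-distance PMF can be sketched in a few lines. The following is a minimal illustration of the classical form, an energy of -kT ln(P_observed / P_reference) per distance bin, and not the authors' reference ratio implementation. The function name `pmf_from_counts`, the pseudocount, and the unit choice kT = 1 are assumptions made for this example.

```python
import math


def pmf_from_counts(observed, reference, kT=1.0, pseudocount=1.0):
    """Per-bin energy -kT * ln(P_obs / P_ref) from two distance histograms.

    `observed` holds counts from known structures, `reference` holds counts
    from the chosen reference state. The pseudocount (an assumption of this
    sketch) keeps empty bins from producing log(0).
    """
    if len(observed) != len(reference):
        raise ValueError("histograms must have the same number of bins")
    obs_total = sum(observed) + pseudocount * len(observed)
    ref_total = sum(reference) + pseudocount * len(reference)
    energies = []
    for obs, ref in zip(observed, reference):
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies


# Hypothetical two-bin histograms: the first bin is over-represented
# relative to the reference state, so it receives a favorable energy.
energies = pmf_from_counts([30, 10], [20, 20])
```

Bins that are more populated in the observed data than in the reference state come out with negative (favorable) energy, which is why the choice of reference state matters so much in the debate the abstract describes.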
Affiliation(s)
- Thomas Hamelryck
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- * E-mail: (TH); (JFB)
- Mikael Borg
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Martin Paluszewski
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jonas Paulsen
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jes Frellsen
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Christian Andreetta
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Wouter Boomsma
- Biomedical Engineering, Technical University of Denmark (DTU) Elektro, Technical University of Denmark, Lyngby, Denmark
- Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- Sandro Bottaro
- Biomedical Engineering, Technical University of Denmark (DTU) Elektro, Technical University of Denmark, Lyngby, Denmark
- Jesper Ferkinghoff-Borg
- Biomedical Engineering, Technical University of Denmark (DTU) Elektro, Technical University of Denmark, Lyngby, Denmark
- * E-mail: (TH); (JFB)
|
3
|
Reconstruction of protein backbones from the BriX collection of canonical protein fragments. PLoS Comput Biol 2008; 4:e1000083. [PMID: 18483555 PMCID: PMC2367438 DOI: 10.1371/journal.pcbi.1000083] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Received: 09/21/2007] [Accepted: 04/07/2008] [Indexed: 12/23/2022]
Abstract
As modeling of changes in backbone conformation still lacks a computationally efficient solution, we developed a discretisation of the conformational states accessible to the protein backbone similar to the successful rotamer approach in side chains. The BriX fragment database, consisting of fragments from 4 to 14 residues long, was realized through identification of recurrent backbone fragments from a non-redundant set of high-resolution protein structures. BriX contains an alphabet of more than 1,000 frequently observed conformations per peptide length for 6 different variation levels. Analysis of the performance of BriX revealed an average structural coverage of protein structures of more than 99% within a root mean square distance (RMSD) of 1 Angstrom. Globally, we are able to reconstruct protein structures with an average accuracy of 0.48 Angstrom RMSD. As expected, regular structures are well covered, but, interestingly, many loop regions that appear irregular at first glance are also found to form a recurrent structural motif, albeit with lower frequency of occurrence than regular secondary structures. Larger loop regions could be completely reconstructed from smaller recurrent elements, between 4 and 8 residues long. Finally, we observed that a significant number of short sequences tend to display strong structural ambiguity between alpha helix and extended conformations. When the sequence length increases, this so-called sequence plasticity is no longer observed, illustrating the context dependency of polypeptide structures.

Large-scale DNA sequencing efforts produce large amounts of protein sequence data. However, in order to understand the function of a protein, its tertiary three-dimensional structure is required. Despite worldwide efforts in structural biology, experimental protein structures are determined at a significantly slower pace. As a result, computational methods for protein structure prediction receive significant attention. A large part of the structure prediction problem lies in the enormous size of the problem: proteins seem to occur in an infinite variety of shapes. Here, we propose that this huge complexity may be overcome by identifying recurrent protein fragments, which are frequently reused as building blocks to construct proteins that were hitherto thought to be unrelated. The BriX database is the outcome of identifying about 2,000 canonical shapes among 1,261 protein structures. We show any given protein can be reconstructed from this library of building blocks at a very high resolution, suggesting that the modelling of protein backbones may be greatly aided by our database.
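The two figures quoted in this abstract, an RMSD per fragment and a coverage fraction within a 1 Angstrom cutoff, can be illustrated with a short sketch. It assumes the two coordinate sets are already optimally superimposed (no fitting step is performed here), and the function names `rmsd` and `coverage` are hypothetical, not part of BriX.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square distance between two equally sized lists of (x, y, z).

    Assumes the structures are already superimposed; a real comparison
    would first apply an optimal rigid-body fit.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


def coverage(best_match_rmsds, threshold=1.0):
    """Fraction of fragments whose best library match lies within `threshold`."""
    if not best_match_rmsds:
        raise ValueError("need at least one fragment")
    hits = sum(1 for value in best_match_rmsds if value <= threshold)
    return hits / len(best_match_rmsds)


# Hypothetical best-match RMSD values (in Angstrom) for four fragments.
covered = coverage([0.2, 0.9, 1.5, 0.4])
```

Reported this way, coverage answers "how much of the protein has at least one good library fragment", while the average reconstruction RMSD measures how good the assembled model is overall.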
|
4
|
Holmes JB, Tsai J. Some fundamental aspects of building protein structures from fragment libraries. Protein Sci 2004; 13:1636-50. [PMID: 15152094 PMCID: PMC2279988 DOI: 10.1110/ps.03494504] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 10/26/2022]
Abstract
We have investigated some of the basic principles that influence generation of protein structures using a fragment-based, random insertion method. We tested buildup methods and fragment library quality for accuracy in constructing a set of known structures. The parameters most influential in the construction procedure are bond and torsion angles, with minor inaccuracies in bond angles alone causing >6 Å Cα RMSD for a 150-residue protein. Idealization to a standard set of values corrects this problem, but changes the torsion angles and does not work for every structure. Alternatively, we found using Cartesian coordinates instead of torsion angles did not reduce performance and can potentially increase speed and accuracy. Under conditions simulating ab initio structure prediction, fragment library quality can be suboptimal and still produce near-native structures. Using various clustering criteria, we created a number of libraries and used them to predict a set of native structures based on nonnative fragments. Local Cα RMSD fit of fragments, library size, and takeoff/landing angle criteria weakly influence the accuracy of the models. Based on a fragment's minimal perturbation upon insertion into a known structure, a seminative fragment library was created that produced more accurate structures with fragments that were less similar to native fragments than the other sets. These results suggest that fragments need only contain native-like subsections, which when correctly overlapped, can recreate a native-like model. For fragment-based, random insertion methods used in protein structure prediction and design, our findings help to define the parameters this method needs to generate near-native structures.
Affiliation(s)
- J Bradley Holmes
- Department of Biophysics and Biochemistry, Texas A&M University, College Station, TX 77843, USA
|
5
|
DePristo MA, de Bakker PIW, Lovell SC, Blundell TL. Ab initio construction of polypeptide fragments: efficient generation of accurate, representative ensembles. Proteins 2003; 51:41-55. [PMID: 12596262 DOI: 10.1002/prot.10285] [Citation(s) in RCA: 116] [Impact Index Per Article: 5.5] [Indexed: 12/27/2022]
Abstract
We describe a novel method to generate ensembles of conformations of the main-chain atoms [N, Cα, C, O, Cβ] for a sequence of amino acids within the context of a fixed protein framework. Each conformation satisfies fundamental stereo-chemical restraints such as idealized geometry, favorable phi/psi angles, and excluded volume. The ensembles include conformations both near and far from the native structure. Algorithms for effective conformational sampling and constant time overlap detection permit the generation of thousands of distinct conformations in minutes. Unlike previous approaches, our method samples dihedral angles from fine-grained phi/psi state sets, which we demonstrate is superior to exhaustive enumeration from coarse phi/psi sets. Applied to a large set of loop structures, our method samples consistently near-native conformations, averaging 0.4, 1.1, and 2.2 Å main-chain root-mean-square deviations for four, eight, and twelve residue long loops, respectively. The ensembles make ideal decoy sets to assess the discriminatory power of a selection method. Using these decoy sets, we conclude that quality of anchor geometry cannot reliably identify near-native conformations, though the selection results are comparable to previous loop prediction methods. In a subsequent study (de Bakker et al.: Proteins 2003;51:21-40), we demonstrate that the AMBER forcefield with the Generalized Born solvation model identifies near-native conformations significantly better than previous methods.
Affiliation(s)
- Mark A DePristo
- Department of Biochemistry, University of Cambridge, Cambridge, United Kingdom.
|
6
|
Abstract
The location of protein subunits that form early during folding, constituted of consecutive secondary structure elements with some intrinsic stability and favorable tertiary interactions, is predicted using a combination of threading algorithms and local structure prediction methods. Two folding units are selected among the candidates identified in a database of known protein structures: the fragment 15-55 of 434 cro, an all-alpha protein, and the fragment 1-35 of ubiquitin, an alpha/beta protein. These units are further analyzed by means of Monte Carlo simulated annealing using several database-derived potentials describing different types of interactions. Our results suggest that the local interactions along the chain dominate in the first folding steps of both fragments, and that the formation of some of the secondary structures necessarily occurs before structure compaction. These findings led us to define a prediction protocol that efficiently improves the accuracy of the predicted structures. It involves a first simulation with a local interaction potential only, whose final conformation is used as the starting structure of a second simulation that uses a combination of local interaction and distance potentials. The root mean square deviations between the coordinates of predicted and native structures are as low as 2-4 Å in most trials. The possibility of extending this protocol to the prediction of full proteins is discussed. Proteins 2001;42:164-176.
Affiliation(s)
- D Gilis
- Ingénierie Biomoléculaire, Université Libre de Bruxelles, Bruxelles, Belgium.
|
7
|
Convex Global Underestimation for Molecular Structure Prediction. 2001. [DOI: 10.1007/978-1-4757-5284-7_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
|
8
|
Abstract
The genome projects produce an enormous amount of sequence data that needs to be annotated in terms of molecular structure and biological function. These tasks have triggered additional initiatives like structural genomics. The intention is to determine as many protein structures as possible, in the most efficient way, and to exploit the solved structures for the assignment of biological function to hypothetical proteins. We discuss the impact of these developments on protein classification, gene function prediction, and protein structure prediction.
Affiliation(s)
- F S Domingues
- Center for Applied Molecular Engineering, Institute for Chemistry, University of Salzburg, Jakob Haringer Strasse 3, A-5020 Salzburg, Austria
|
9
|
Abstract
We describe an extensive test of Geocore, an ab initio peptide folding algorithm. We studied 18 short molecules for which there are structures in the Protein Data Bank; chains are up to 31 monomers long. Except for the very shortest peptides, an extremely simple energy function is sufficient to discriminate the true native state from more than 10^8 lowest energy conformations that are searched explicitly for each peptide. A high incidence of native-like structures is found within the best few hundred conformations generated by Geocore for each amino acid sequence. Predictions improve when the number of discrete phi/psi choices is increased.
Affiliation(s)
- K Ishikawa
- Central Research Laboratories, Ajinomoto Co., Kawasaki, Japan
|
10
|
Lathrop RH, Rogers RG, White JV, Gaitatzes C, Smith TF, Bienkowska J, Bryant BK, Buturović LJ, Nambudripad R. Analysis and algorithms for protein sequence–structure alignment. Computational Methods in Molecular Biology 1998. [DOI: 10.1016/s0167-7306(08)60469-x] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.4] [Indexed: 12/31/2022]
|
11
|
Ausiello G, Cesareni G, Helmer-Citterich M. Escher: A new docking procedure applied to the reconstruction of protein tertiary structure. Proteins 1997. [DOI: 10.1002/(sici)1097-0134(199708)28:4<556::aid-prot9>3.0.co;2-7] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
|
12
|
Abstract
The conformation of thymosin beta 9 in solution of 40% (v/v) 1,1,1,3,3,3-hexafluoro-2-propanol-d2 in water has been investigated by two-dimensional 1H-nmr spectroscopy. Under this condition thymosin beta 9 adopts an ordered structure. The determination of the conformation of the peptide was based on a set of 304 approximate interproton distance constraints derived from nuclear Overhauser enhancement measurements. The conformation of thymosin beta 9 includes two helical regions from residues 4 to 27 and 32 to 41. The two helices are separated by a poorly defined loop region between amino acids 28 and 31; the N-terminus of thymosin beta 9 shows random-coil structure only.
Affiliation(s)
- R Stoll
- Abteilung für Physikalische Biochemie des Physiologisch-chemischen Institutes der Universität Tübingen, FRG
|
13
|
Bahar I, Jernigan RL. Inter-residue potentials in globular proteins and the dominance of highly specific hydrophilic interactions at close separation. J Mol Biol 1997; 266:195-214. [PMID: 9054980 DOI: 10.1006/jmbi.1996.0758] [Citation(s) in RCA: 244] [Impact Index Per Article: 9.0] [Indexed: 02/03/2023]
Abstract
Residue-specific potentials between pairs of side-chains and pairs of side-chain-backbone interaction sites have been generated by collecting radial distribution data for 302 protein structures. Multiple atomic interactions have been utilized to enhance the specificity and smooth the distance-dependence of the potentials. The potentials are demonstrated to successfully discriminate correct sequences in inverse folding experiments. Many specific effects are observable in the non-bonded potentials; grouping of residue types is inappropriate, since each residue type manifests some unique behavior. Only a weak dependence is seen on protein size and composition. Effective contact potentials operating in three different environments (self, solvent-exposed and residue-exposed) and over any distance range are presented. The effective contact potentials obtained from the integration of radial distributions over the distance interval r ≤ 6.4 Å are in excellent agreement with published values. The hydrophobic interactions are verified to be dominantly strong in this range. Comparison of these with a newly derived set of effective contact potentials for closer inter-residue separations (r ≤ 4.0 Å) demonstrates drastic changes in the most favorable interactions. In the closer approach case, where the number of pairs with a given residue is approximately one, the highly specific interactions between charged and polar side-chains predominate. These closer approach values could be utilized to select successively the relative positions and directions of residue side-chains in protein simulations, following a hierarchical algorithm optimizing side-chain-side-chain interactions over the two successively closer distance ranges. The homogeneous contribution to stability is stronger than the specific contribution by about a factor of 5. Overall, the total non-bonded interaction energy calculated for individual proteins follows a dependence on the number of residues of the form n^1.28, indicating an enhanced stability for larger proteins.
Affiliation(s)
- I Bahar
- Molecular Structure Section, National Cancer Institute, National Institutes of Health, Bethesda MD 20892-5677, USA
|
14
|
Wiedemann P, Giehl K, Almo SC, Fedorov AA, Girvin M, Steinberger P, Rüdiger M, Ortner M, Sippl M, Dolecek C, Kraft D, Jockusch B, Valenta R. Molecular and structural analysis of a continuous birch profilin epitope defined by a monoclonal antibody. J Biol Chem 1996; 271:29915-21. [PMID: 8939935 DOI: 10.1074/jbc.271.47.29915] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.2] [Indexed: 02/03/2023]
Abstract
The interaction of a mouse monoclonal antibody (4A6) and birch profilin, a structurally well conserved actin- and phosphoinositide-binding protein and cross-reactive allergen, was characterized. In contrast to serum IgE from allergic patients, which shows cross-reactivity with most plants, monoclonal antibody 4A6 selectively reacted with tree pollen profilins. Using synthetic overlapping peptides, a continuous hexapeptide epitope was identified. The exchange of a single amino acid (Gln-47 → Glu) within the epitope was found to abolish the binding of monoclonal antibody 4A6 to other plant profilins. The NMR analyses of the birch and the nonreactive timothy grass profilin peptides showed that the loss of binding was not due to major structural differences. Both peptides adopted extended conformations similar to that observed for the epitope in the x-ray crystal structure of the native birch profilin. Binding studies with peptides and birch profilin mutants generated by in vitro mutagenesis demonstrated that the change of Gln-47 to acidic amino acids (e.g. Glu or Asp) led to electrostatic repulsion of monoclonal antibody 4A6. In conclusion, the molecular and structural analyses of the interaction of a monoclonal antibody with a continuous peptide epitope, recognized in a conformation similar to that displayed on the native protein, are presented.
Affiliation(s)
- P Wiedemann
- Institute of General and Experimental Pathology, University of Vienna, A-1090 Vienna, Austria
|
15
|
Beutler TC, Dill KA. A fast conformational search strategy for finding low energy structures of model proteins. Protein Sci 1996; 5:2037-43. [PMID: 8897604 PMCID: PMC2143263 DOI: 10.1002/pro.5560051010] [Citation(s) in RCA: 50] [Impact Index Per Article: 1.8] [Indexed: 02/02/2023]
Abstract
We describe a new computer algorithm for finding low-energy conformations of proteins. It is a chain-growth method that uses a heuristic bias function to help assemble a hydrophobic core. We call it the Core-directed chain Growth method (CG). We test the CG method on several well-known literature examples of HP lattice model proteins [in which proteins are modeled as sequences of hydrophobic (H) and polar (P) monomers], ranging from 20-64 monomers in two dimensions, and up to 88-mers in three dimensions. Previous nonexhaustive methods--Monte Carlo, a Genetic Algorithm, Hydrophobic Zippers, and Contact Interactions--have been tried on these same model sequences. CG is substantially better at finding the global optima, and avoiding local optima, and it does so in comparable or shorter times. CG finds the global minimum energy of the longest HP lattice model chain for which the global optimum is known, a 3D 88-mer that has only been reachable before by the CHCC complete search method. CG has the potential advantage that it should have nonexponential scaling with chain length. We believe this is a promising method for conformational searching in protein folding algorithms.
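The HP lattice model that this abstract uses as a test bed has a very compact energy function, sketched below under the usual convention that each pair of H monomers on adjacent lattice sites, but not adjacent in the chain, contributes -1. The function name `hp_energy` and the tuple-based coordinates are choices made for this example; the CG search procedure itself is not reproduced here.

```python
def hp_energy(sequence, coords):
    """Energy of an HP lattice conformation: -1 per non-bonded H-H contact.

    `sequence` is a string of 'H' and 'P'; `coords` is a list of integer
    lattice points (2D or 3D tuples), one per monomer, in chain order.
    """
    if len(sequence) != len(coords):
        raise ValueError("one coordinate per monomer is required")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    energy = 0
    for i in range(len(sequence)):
        # Start at i + 2 so that chain neighbors are never counted as contacts.
        for j in range(i + 2, len(sequence)):
            if sequence[i] == "H" and sequence[j] == "H":
                lattice_distance = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
                if lattice_distance == 1:
                    energy -= 1
    return energy


# A 4-mer folded into a unit square brings its two terminal H monomers into contact.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
folded_energy = hp_energy("HPPH", square)
```

Because the energy only rewards buried H-H contacts, a search heuristic that biases growth toward a hydrophobic core, as the abstract describes, is a natural fit for this model.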
Affiliation(s)
- T C Beutler
- Department of Pharmaceutical Chemistry, University of California, San Francisco 94143-1204, USA
|
16
|
Cheng B, Nayeem A, Scheraga HA. From secondary structure to three-dimensional structure: Improved dihedral angle probability distribution function for use with energy searches for native structures of polypeptides and proteins. J Comput Chem 1996. [DOI: 10.1002/(sici)1096-987x(199609)17:12<1453::aid-jcc6>3.0.co;2-j] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
17
|
Yue K, Dill KA. Folding proteins with a simple energy function and extensive conformational searching. Protein Sci 1996; 5:254-61. [PMID: 8745403 PMCID: PMC2143350 DOI: 10.1002/pro.5560050209] [Citation(s) in RCA: 75] [Impact Index Per Article: 2.7] [Indexed: 02/01/2023]
Abstract
We describe a computer algorithm for predicting the three-dimensional structures of proteins using only their amino acid sequences. The method differs from others in two ways: (1) it uses very few energy parameters, representing hydrophobic and polar interactions, and (2) it uses a new "constraint-based exhaustive" searching method, which appears to be among the fastest and most complete search methods yet available for realistic protein models. It finds a relatively small number of low-energy conformations, among which are native-like conformations, for crambin (1CRN), avian pancreatic polypeptide (1PPT), melittin (2MLT), and apamin. Thus, the lowest-energy states of very simple energy functions may predict the native structures of globular proteins.
Affiliation(s)
- K Yue
- Department of Pharmaceutical Chemistry, University of California at San Francisco, 94143, USA
|
18
|
Feinberg J, Mery J, Heitz F, Benyamin Y, Roustan C. Correlations between biological activity and structural properties for two short homologous sequences in thymosin beta4 and gelsolin. International Journal of Peptide and Protein Research 1996; 47:62-9. [PMID: 8907501 DOI: 10.1111/j.1399-3011.1996.tb00811.x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 02/03/2023]
Abstract
Gelsolin and thymosin beta4 appear to be two important actin-associated proteins involved in the regulation of actin polymerization. It has been widely demonstrated that thymosin is the major cellular actin-sequestering factor shifting the polymerization equilibrium of actin towards a monomeric state. At the same time gelsolin, a Ca2+ and inositol phosphate sensitive protein, regulates actin filament length. The interactions of these two proteins with actin are rather complex and require the participation of several complementary peptide sequences. We have identified a common motif, (I, V)EKFD, in the two proteins in the functional sequences so far examined. Gelsolin- and thymosin beta4-related peptides including the common motif were synthesized and their structural and functional properties studied. These two sequences exert a major inhibitory effect on salt-induced actin polymerization. We used circular dichroism and Fourier-transform infrared spectroscopy to show that the two synthetic peptides present some secondary structure in solution. As far as the peptide derived from the thymosin sequence was concerned, alpha-helical structure was induced by trifluoroethanol as observed with the full-length molecule. These experiments underscore the importance of the conformational state of peptide fragments in their biological activities. ELISA and fluorescence measurements have been used to identify the binding regions of these fragments to a C-terminal region (subdomain 1) of the actin sequence. Our results also emphasize the relationship between the propensity of small sequences to form secondary structures and their propensity for biological activity as related to actin interaction and inhibition of actin polymerization.
Affiliation(s)
- J Feinberg
- Centre for Research in Macromolecular Biochemistry (CNRS), Laboratory for Research on Cellular Motility, University of Montpellier 1, France
|
19
|
Flöckner H, Braxenthaler M, Lackner P, Jaritz M, Ortner M, Sippl MJ. Progress in fold recognition. Proteins 1995; 23:376-86. [PMID: 8710830 DOI: 10.1002/prot.340230311] [Citation(s) in RCA: 74] [Impact Index Per Article: 2.6] [Indexed: 02/01/2023]
Abstract
The prediction experiment reveals that fold recognition has become a powerful tool in structural biology. We applied our fold recognition technique to 13 target sequences. In two cases, replication terminating protein and prosequence of subtilisin, the predicted structures are very similar to the experimentally determined folds. For the first time, in a public blind test, the unknown structures of proteins have been predicted ahead of experiment to an accuracy approaching molecular detail. In two other cases the approximate folds have been predicted correctly. According to the assessors there were 12 recognizable folds among the target proteins. In our postprediction analysis we find that in 7 cases our fold recognition technique is successful. In several of the remaining cases the predicted folds have interesting features in common with the experimental results. We present our procedure, discuss the results, and comment on several fundamental and technical problems encountered in fold recognition.
Affiliation(s)
- H Flöckner
- Center for Applied Molecular Engineering, University of Salzburg, Austria
|
20
|
Abstract
We describe a computer algorithm to predict native structures of proteins and peptides from their primary sequences, their known native radii of gyration, and their known disulfide bonding patterns, starting from random conformations. Proteins are represented as simplified real-space main chains with single-bead side chains. Nonlocal interactions are taken from structural database-derived statistical potentials, as in an earlier treatment. Local interactions are taken from simulations of (phi, psi) energy surfaces for each amino acid generated using the Biosym Discover program. Conformational searching is done by a genetic algorithm-based method. Reasonable structures are obtained for melittin (a 26-mer), avian pancreatic polypeptide inhibitor (a 36-mer), crambin (a 46-mer), apamin (an 18-mer), tachyplesin (a 17-mer), C-peptide of ribonuclease A (a 13-mer), and four different designed helical peptides. A hydrogen bond interaction was tested and found to be generally unnecessary for helical peptides, but it helps fold some sheet regions in these structures. For the few longer chains we tested, the method appears not to converge. In those cases, it appears to recover native-like secondary structures, but gets incorrect tertiary folds.
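The known native radius of gyration used as an input in this abstract is a simple geometric quantity. The sketch below computes it for a set of equally weighted beads and adds a quadratic penalty toward a target value; the penalty form and both function names are assumptions for illustration, since the abstract does not say how the restraint enters the search.

```python
import math


def radius_of_gyration(coords):
    """Root mean square distance of equally weighted points from their centroid."""
    if not coords:
        raise ValueError("need at least one coordinate")
    n = len(coords)
    center = [sum(point[k] for point in coords) / n for k in range(3)]
    squared = sum(
        sum((point[k] - center[k]) ** 2 for k in range(3)) for point in coords
    )
    return math.sqrt(squared / n)


def rg_penalty(coords, target_rg):
    """Hypothetical restraint term: squared deviation from the target radius."""
    return (radius_of_gyration(coords) - target_rg) ** 2


# Two beads at x = +1 and x = -1 have a radius of gyration of exactly 1.
example_rg = radius_of_gyration([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)])
```

A term of this kind steers a search toward compact conformations of the right overall size without specifying any individual contact.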
Affiliation(s)
- S Sun
- Structural Biochemistry Program, Frederick Biomedical Supercomputing Center, National Cancer Institute, Frederick Cancer Research and Development Center, Maryland 21702, USA
|
21
|
Evans JS, Mathiowetz AM, Chan SI, Goddard WA. De novo prediction of polypeptide conformations using dihedral probability grid Monte Carlo methodology. Protein Sci 1995; 4:1203-16. [PMID: 7549884 PMCID: PMC2143148 DOI: 10.1002/pro.5560040618] [Citation(s) in RCA: 23] [Impact Index Per Article: 0.8] [Indexed: 01/25/2023]
Abstract
We tested the dihedral probability grid Monte Carlo (DPG-MC) methodology to determine optimal conformations of polypeptides by applying it to predict the low energy ensemble for two peptides whose solution NMR structures are known: integrin receptor peptide (YGRGDSP, Type II beta-turn) and S3 alpha-helical peptide (YMSEDEL KAAEAAFKRHGPT). DPG-MC involves importance sampling, local random stepping in the vicinity of a current local minima, and Metropolis sampling criteria for acceptance or rejection of new structures. Internal coordinate values are based on side-chain-specific dihedral angle probability distributions (from analysis of high-resolution protein crystal structures). Important features of DPG-MC are: (1) Each DPG-MC step selects the torsion angles (phi, psi, chi) from a discrete grid that are then applied directly to the structure. The torsion angle increments can be taken as S = 60, 30, 15, 10, or 5 degrees, depending on the application. (2) DPG-MC utilizes a temperature-dependent probability function (P) in conjunction with Metropolis sampling to accept or reject new structures. For each peptide, we found close agreement with the known structure for the low energy conformational ensemble located with DPG-MC. This suggests that DPG-MC will be useful for predicting conformations of other polypeptides.
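Two ingredients named in this abstract, a discrete torsion grid with increment S and a Metropolis accept/reject test, can be sketched as follows. This uses the standard Metropolis criterion and is not the authors' probability function P; the function names, the grid origin at -180 degrees, and the Boltzmann constant in kcal/(mol K) are assumptions of the example.

```python
import math
import random

# Boltzmann constant in kcal/(mol K); the energy unit is an assumption here.
K_B = 0.0019872


def grid_angles(step_deg):
    """Torsion values on a regular grid covering [-180, 180) in `step_deg` steps."""
    if step_deg <= 0 or 360 % step_deg != 0:
        raise ValueError("step must be a positive divisor of 360")
    return [-180 + k * step_deg for k in range(360 // step_deg)]


def metropolis_accept(delta_e, temperature, rng=random):
    """Standard Metropolis test: always accept downhill moves, and accept
    uphill moves with probability exp(-delta_e / (K_B * temperature))."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / (K_B * temperature))


# Pick a trial phi value from a 30-degree grid with a seeded generator.
rng = random.Random(0)
trial_phi = rng.choice(grid_angles(30))
```

A finer grid (smaller S) gives more candidate angles per step, at the cost of a larger search space, which matches the abstract's remark that the increment depends on the application.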
Affiliation(s)
- J S Evans
- Arthur Amos Noyes Laboratory for Chemical Physics (127-72), California Institute of Technology, Pasadena 91125, USA
|
22
|
Abstract
Knowledge-based potentials and energy functions are extracted from a number of databases of known protein structures. Recent developments have shown that this type of potential is successful in many areas of protein structure research. Among these are quality assessment and error recognition of folds and the prediction of unknown structures by fold-recognition techniques.
Affiliation(s)
- M J Sippl
- Center for Applied Molecular Engineering, University of Salzburg, Austria
|
23
|
Abstract
Currently, the prediction of three-dimensional (3D) protein structure from sequence alone is an exceedingly difficult task. As an intermediate step, a much simpler task has been pursued extensively: predicting 1D strings of secondary structure. Here, we present an analysis of another 1D projection from 3D structure: the relative solvent accessibility of each residue. We show that solvent accessibility is less conserved in 3D homologues than is secondary structure, and hence is predicted less accurately from automatic homology modeling; the correlation coefficient of relative solvent accessibility between 3D homologues is only 0.77, and the average accuracy of predictions based on sequence alignments is only 0.68. The latter number provides an effective upper limit on the accuracy of predicting accessibility from sequence when homology modeling is not possible. We introduce a neural network system that predicts relative solvent accessibility (projected onto ten discrete states) using evolutionary profiles of amino acid substitutions derived from multiple sequence alignments. Evaluated in a cross-validation test on 238 unique proteins, the correlation between predicted and observed relative accessibility is 0.54. Interpreted in terms of a three-state (buried, intermediate, exposed) description of relative accessibility, the fraction of correctly predicted residue states is about 58%. In absolute terms this accuracy appears poor, but given the relatively low conservation of accessibility in 3D families, the network system is not far from its likely optimal performance. The most reliably predicted fraction of the residues (50%) is predicted as accurately as by automatic homology modeling. Prediction is best for buried residues, e.g., 86% of the completely buried sites are correctly predicted as having 0% relative accessibility.
Affiliation(s)
- B Rost
- Protein Design Group, European Molecular Biology Laboratory, Heidelberg, Germany
|
24
|
Mihelić M, Voelter W. Distribution and biological activity of β-thymosins. Amino Acids 1994; 6:1-13. [DOI: 10.1007/bf00808118] [Citation(s) in RCA: 20] [Impact Index Per Article: 0.7] [Received: 01/07/1993] [Accepted: 06/11/1993] [Indexed: 11/28/2022]
|
25
|
Czisch M, Schleicher M, Hörger S, Voelter W, Holak TA. Conformation of thymosin beta 4 in water determined by NMR spectroscopy. Eur J Biochem 1993; 218:335-44. [PMID: 8269922 DOI: 10.1111/j.1432-1033.1993.tb18382.x] [Citation(s) in RCA: 59] [Impact Index Per Article: 1.9] [Indexed: 01/29/2023]
Abstract
The conformational preferences of a 43-amino-acid G-actin-binding peptide, thymosin beta 4, in water at 1, 4 and 14 degrees C, and at pH 3.0 and 6.5 were studied by NMR. NMR showed that thymosin beta 4 lacks a uniquely folded conformation in water. However, some preferential alpha-helical conformations of thymosin beta 4 can be observed in aqueous solutions. The segment at residues 5-16 showed characteristic interactions for conformations in both the beta-strand and alpha-helical regions of the phi-psi space, based on strong C alpha H(i)-NH(i+1) interactions and NH-NH, C alpha H(i)-NH(i+3), and C alpha H(i)-C beta H(i+3) interactions, respectively. At 1-4 degrees C, another segment at residues 31-37 also shows both beta and alpha conformations, forming however a less well-defined helix than the segment at residues 5-16. At 14 degrees C, the conformational population of the helix at positions 5-16 is shifted more towards the random and turn-like structures, whereas the segment at positions 31-37 becomes exclusively a random coil.
Affiliation(s)
- M Czisch
- Max-Planck-Institut für Biochemie, Germany
|
26
|
Abstract
A major problem in the determination of the three-dimensional structure of proteins concerns the quality of the structural models obtained from the interpretation of experimental data. New developments in X-ray crystallography and nuclear magnetic resonance spectroscopy have accelerated the process of structure determination and the biological community is confronted with a steadily increasing number of experimentally determined protein folds. However, in the recent past several experimentally determined protein structures have been proven to contain major errors, indicating that in some cases the interpretation of experimental data is difficult and may yield incorrect models. Such problems can be avoided when computational methods are employed which complement experimental structure determinations. A prerequisite of such computational tools is that they are independent of the parameters obtained from a particular experiment. In addition such techniques are able to support and accelerate experimental structure determinations. Here we present techniques based on knowledge-based mean fields which can be used to judge the quality of protein folds. The methods can be used to identify misfolded structures as well as faulty parts of structural models. The techniques are even applicable in cases where only the C alpha trace of a protein conformation is available. The capabilities of the technique are demonstrated using correct and incorrect protein folds.
Affiliation(s)
- M J Sippl
- Center for Applied Molecular Engineering, University of Salzburg, Austria
|
27
|
Sippl MJ. Boltzmann's principle, knowledge-based mean fields and protein folding. An approach to the computational determination of protein structures. J Comput Aided Mol Des 1993; 7:473-501. [PMID: 8229096 DOI: 10.1007/bf02337562] [Citation(s) in RCA: 269] [Impact Index Per Article: 8.7] [Indexed: 01/29/2023]
Abstract
The data base of known protein structures contains a tremendous amount of information on protein-solvent systems. Boltzmann's principle enables the extraction of this information in the form of potentials of mean force. The resulting force field constitutes an energetic model for protein-solvent systems. We outline the basic physical principles of this approach to protein folding and summarize several techniques which are useful in the development of knowledge-based force fields. Among the applications presented are the validation of experimentally determined protein structures, data base searches which aim at the identification of native-like sequence structure pairs, sequence structure alignments and the calculation of protein conformations from amino acid sequences.
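The extraction step mentioned in this abstract is commonly written as an inverse Boltzmann relation, E(r) = -kT ln(p_obs(r) / p_ref(r)). The sketch below shows that generic form only. The abstract itself gives no formula, so the function name, the binning, the pseudocount smoothing, and the choice of reference counts are assumptions for illustration rather than the paper's procedure.

```python
import math


def inverse_boltzmann(observed_counts, reference_counts, kT=1.0, pseudocount=1.0):
    """Per-bin energy E[i] = -kT * ln(p_obs[i] / p_ref[i]).

    observed_counts: distance-bin counts for one residue-pair type (assumed input).
    reference_counts: counts for the chosen reference state (assumed input).
    pseudocount: additive smoothing so empty bins do not give log(0) (assumption).
    """
    if len(observed_counts) != len(reference_counts):
        raise ValueError("observed and reference histograms must have equal length")
    n_bins = len(observed_counts)
    obs_total = sum(observed_counts) + pseudocount * n_bins
    ref_total = sum(reference_counts) + pseudocount * n_bins
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies
```

Bins that are populated more often than the reference predicts receive negative (favorable) energies, and depleted bins receive positive ones. How the reference state is defined is the contested part of such potentials, which is why it is left as an explicit input here.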
Affiliation(s)
- M J Sippl
- Center for Applied Molecular Engineering, University of Salzburg, Austria
|
28
|
Vriend G, Eijsink V. Prediction and analysis of structure, stability and unfolding of thermolysin-like proteases. J Comput Aided Mol Des 1993; 7:367-96. [PMID: 8229092 DOI: 10.1007/bf02337558] [Citation(s) in RCA: 65] [Impact Index Per Article: 2.1] [Indexed: 01/29/2023]
Abstract
Bacillus neutral proteases (NPs) form a group of well-characterized homologous enzymes, that exhibit large differences in thermostability. The three-dimensional (3D) structures of several of these enzymes have been modelled on the basis of the crystal structures of the NPs of B. thermoproteolyticus (thermolysin) and B. cereus. Several new techniques have been developed to improve the model-building procedures. Also a 'model-building by mutagenesis' strategy was used, in which mutants were designed just to shed light on parts of the structures that were particularly hard to model. The NP models have been used for the prediction of site-directed mutations aimed at improving the thermostability of the enzymes. Predictions were made using several novel computational techniques, such as position-specific rotamer searching, packing quality analysis and property-profile database searches. Many stabilizing mutations were predicted and produced: improvement of hydrogen bonding, exclusion of buried water molecules, capping helices, improvement of hydrophobic interactions and entropic stabilization have been applied successfully. At elevated temperatures NPs are irreversibly inactivated as a result of autolysis. It has been shown that this denaturation process is independent of the protease activity and concentration and that the inactivation follows first-order kinetics. From this it has been conjectured that local unfolding of (surface) loops, which renders the protein susceptible to autolysis, is the rate-limiting step. Despite the particular nature of the thermal denaturation process, normal rules for protein stability can be applied to NPs. However, rather than stabilizing the whole protein against global unfolding, only a small region has to be protected against local unfolding. In contrast to proteins in general, mutational effects in proteases are not additive and their magnitude is strongly dependent on the location of the mutation. 
Mutations that alter the stability of the NP by a large amount are located in a relatively weak region (or more precisely, they affect a local unfolding pathway with a relatively low free energy of activation). One weak region, that is supposedly important in the early steps of NP unfolding, has been determined in the NP of B. stearothermophilus. After eliminating this weakest link a drastic increase in thermostability was observed and the search for the second-weakest link, or the second-lowest energy local unfolding pathway is now in progress. Hopefully, this approach can be used to unravel the entire early phase of unfolding.
Affiliation(s)
- G Vriend
- EMBL, Protein Design Group, Heidelberg, Germany
|
29
|
Sun S. Reduced representation model of protein structure prediction: statistical potential and genetic algorithms. Protein Sci 1993; 2:762-85. [PMID: 8495198 PMCID: PMC2142494 DOI: 10.1002/pro.5560020508] [Citation(s) in RCA: 125] [Impact Index Per Article: 4.0] [Indexed: 01/31/2023]
Abstract
A reduced representation model, which has been described in previous reports, was used to predict the folded structures of proteins from their primary sequences and random starting conformations. The molecular structure of each protein has been reduced to its backbone atoms (with ideal fixed bond lengths and valence angles) and each side chain approximated by a single virtual united-atom. The coordinate variables were the backbone dihedral angles phi and psi. A statistical potential function, which included local and nonlocal interactions and was computed from known protein structures, was used in the structure minimization. A novel approach, employing the concepts of genetic algorithms, has been developed to simultaneously optimize a population of conformations. With the information of primary sequence and the radius of gyration of the crystal structure only, and starting from randomly generated initial conformations, I have been able to fold melittin, a protein of 26 residues, with high computational convergence. The computed structures have a root mean square error of 1.66 Å (distance matrix error = 0.99 Å) on average to the crystal structure. Similar results for avian pancreatic polypeptide inhibitor, a protein of 36 residues, are obtained. Application of the method to apamin, an 18-residue polypeptide with two disulfide bonds, shows that it folds apamin to native-like conformations with the correct disulfide bonds formed.
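The idea of optimizing a population of (phi, psi) conformations with a genetic algorithm can be sketched as follows. This is a generic illustration and not the method of the paper: the toy score stands in for the statistical potential, and the truncation selection, one-point crossover, Gaussian mutation, and all names and parameter values are assumptions.

```python
import random

TARGET = (-60.0, -45.0)  # hypothetical (phi, psi) optimum used by the toy score


def toy_score(conf):
    """Hypothetical stand-in for a statistical potential; lower is better."""
    return sum((phi - TARGET[0]) ** 2 + (psi - TARGET[1]) ** 2 for phi, psi in conf)


def wrap(angle):
    """Map an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def random_conformation(n_res, rng):
    return [(rng.uniform(-180.0, 180.0), rng.uniform(-180.0, 180.0))
            for _ in range(n_res)]


def crossover(a, b, rng):
    """One-point crossover: residues before the cut from a, the rest from b."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(conf, rng, rate=0.1, sigma=20.0):
    """Perturb each residue's dihedrals with probability `rate`."""
    out = []
    for phi, psi in conf:
        if rng.random() < rate:
            phi = wrap(phi + rng.gauss(0.0, sigma))
            psi = wrap(psi + rng.gauss(0.0, sigma))
        out.append((phi, psi))
    return out


def evolve(n_res=8, pop_size=40, generations=60, seed=0):
    """Evolve a population; returns the best conformation and the score history."""
    rng = random.Random(seed)
    pop = [random_conformation(n_res, rng) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=toy_score)
        history.append(toy_score(pop[0]))
        parents = pop[: pop_size // 2]  # truncation selection keeps the better half
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children  # parents survive unchanged (elitism)
    pop.sort(key=toy_score)
    history.append(toy_score(pop[0]))
    return pop[0], history
```

Because the better half of each generation is carried over unchanged, the best score recorded in `history` never gets worse from one generation to the next. A real application would replace `toy_score` with an energy evaluated on the built backbone geometry.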
Affiliation(s)
- S Sun
- Department of Biophysical Science, State University of New York, Buffalo 14214
|