1
Maddhuri Venkata Subramaniya SR, Terashi G, Jain A, Kagaya Y, Kihara D. Protein Contact Map Refinement for Improving Structure Prediction Using Generative Adversarial Networks. Bioinformatics 2021; 37:3168-3174. [PMID: 33787852 PMCID: PMC8504630 DOI: 10.1093/bioinformatics/btab220] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 09/04/2020] [Revised: 02/28/2021] [Accepted: 03/30/2021] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Protein structure prediction remains one of the most important problems in computational biology and biophysics. In the past few years, protein residue-residue contact prediction has undergone substantial improvement, which has made it a critical driving force for successful protein structure prediction. Boosting the accuracy of contact predictions has, therefore, moved to the forefront of protein structure prediction. RESULTS We present a novel contact map refinement method, ContactGAN, which uses Generative Adversarial Networks (GAN). ContactGAN made a significant improvement over predictions made by recent contact prediction methods when tested on three datasets including protein structure modeling targets in CASP13 and CASP14. We show an improvement in the precision of contact prediction, which translated into an improvement in the accuracy of protein tertiary structure models. On the other hand, the observed improvement over trRosetta was relatively small; the reasons for this are discussed. ContactGAN will be a valuable addition to the structure prediction pipeline for achieving an extra gain in contact prediction accuracy. AVAILABILITY https://github.com/kiharalab/ContactGAN. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Genki Terashi
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Aashish Jain
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Yuki Kagaya
- Graduate School of Information Sciences, Tohoku University, Japan
- Daisuke Kihara
- Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
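The precision of contact prediction mentioned in the ContactGAN abstract above is usually reported for the top-ranked residue pairs. The following is a minimal sketch of such a metric, not the authors' evaluation code: the function name `top_k_precision`, the default sequence separation of 24 residues, and the default of scoring the top L pairs are assumptions chosen for illustration.

```python
import numpy as np

def top_k_precision(pred_probs, true_contacts, min_separation=24, k=None):
    """Fraction of the k highest-scoring predicted pairs that are true contacts.

    pred_probs     : (L, L) array of predicted contact probabilities
    true_contacts  : (L, L) boolean array of native contacts
    min_separation : only pairs with j - i >= min_separation are scored (assumed default)
    k              : number of top pairs to score; defaults to L (assumed default)
    """
    pred = np.asarray(pred_probs, dtype=float)
    truth = np.asarray(true_contacts, dtype=bool)
    n = pred.shape[0]
    if k is None:
        k = n
    # Upper-triangle index pairs (i, j) with j - i >= min_separation.
    i, j = np.triu_indices(n, k=min_separation)
    if i.size == 0:
        return 0.0
    # Rank the eligible pairs by predicted probability, highest first.
    order = np.argsort(pred[i, j])[::-1][:k]
    return float(truth[i[order], j[order]].mean())
```

Comparing this value for a raw contact map and for its refined version is one simple way to quantify whether a refinement step helped.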
2
Karczyńska AS, Czaplewski C, Krupa P, Mozolewska MA, Joo K, Lee J, Liwo A. Ergodicity and model quality in template-restrained canonical and temperature/Hamiltonian replica exchange coarse-grained molecular dynamics simulations of proteins. J Comput Chem 2017; 38:2730-2746. [DOI: 10.1002/jcc.25070] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 04/15/2017] [Revised: 07/10/2017] [Accepted: 09/01/2017] [Indexed: 01/22/2023]
Affiliation(s)
- Agnieszka S. Karczyńska
- Faculty of Chemistry; University of Gdańsk, ul. Wita Stwosza 63; Gdańsk 80-308 Poland
- Center for In Silico Protein Science; Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu; Seoul 02455 Republic of Korea
- School of Computational Sciences; Korea Institute for Advanced Study, 85 Hoegiro Dongdaemun-gu; Seoul 02455 Republic of Korea
- Cezary Czaplewski
- Faculty of Chemistry; University of Gdańsk, ul. Wita Stwosza 63; Gdańsk 80-308 Poland
- Paweł Krupa
- Faculty of Chemistry; University of Gdańsk, ul. Wita Stwosza 63; Gdańsk 80-308 Poland
- Institute of Physics, Polish Academy of Sciences, Aleja Lotników 32/46; Warsaw PL 02668 Poland
- Magdalena A. Mozolewska
- Faculty of Chemistry; University of Gdańsk, ul. Wita Stwosza 63; Gdańsk 80-308 Poland
- Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5; Warsaw 01-248 Poland
- Keehyoung Joo
- School of Computational Sciences; Korea Institute for Advanced Study, 85 Hoegiro Dongdaemun-gu; Seoul 02455 Republic of Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu; Seoul 02455 Republic of Korea
- Jooyoung Lee
- Center for In Silico Protein Science; Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu; Seoul 02455 Republic of Korea
- School of Computational Sciences; Korea Institute for Advanced Study, 85 Hoegiro Dongdaemun-gu; Seoul 02455 Republic of Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu; Seoul 02455 Republic of Korea
- Adam Liwo
- Faculty of Chemistry; University of Gdańsk, ul. Wita Stwosza 63; Gdańsk 80-308 Poland
- Center for In Silico Protein Science; Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu; Seoul 02455 Republic of Korea
- School of Computational Sciences; Korea Institute for Advanced Study, 85 Hoegiro Dongdaemun-gu; Seoul 02455 Republic of Korea
3
Mozolewska MA, Krupa P, Zaborowski B, Liwo A, Lee J, Joo K, Czaplewski C. Use of Restraints from Consensus Fragments of Multiple Server Models To Enhance Protein-Structure Prediction Capability of the UNRES Force Field. J Chem Inf Model 2016; 56:2263-2279. [DOI: 10.1021/acs.jcim.6b00189] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 12/11/2022]
Affiliation(s)
- Paweł Krupa
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
- Adam Liwo
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
- Center for In Silico Protein Structure and School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
- Jooyoung Lee
- Center for In Silico Protein Structure and School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
- Keehyoung Joo
- Center for Advanced Computation, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
- Cezary Czaplewski
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
4
Abstract
Protein structure prediction and protein docking prediction are two related problems in molecular biology. We suggest the use of multiple docking in the process of protein structure prediction. Once reliable structural models are predicted for disjoint fragments of the protein target sequence, a combinatorial assembly may be used to predict their native arrangement. Here, we present CombDock, a combinatorial docking algorithm for the structural units assembly problem. We have tested the algorithm on various examples using both domains and domain substructures as input. Inaccurate models of the structural units were also used, to test the robustness of the algorithm. The algorithm was able to predict a near-native arrangement of the input structural units in almost all of the cases, showing that the combinatorial approach succeeds in overcoming the inexact shape complementarity caused by the inaccuracy of the models.
Affiliation(s)
- Yuval Inbar
- School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Haim J. Wolfson
- School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
- Ruth Nussinov
- Sackler Institute of Molecular Medicine, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel and Basic Research Program, SAIC-Frederick Inc., Laboratory of Experimental and Computational Biology, NCI - FCRDC, Bldg 469, Rm 151, Frederick, MD 21702, USA
5
6
Abstract
Motivation: Thermodynamics-based dynamic programming RNA secondary structure algorithms have been of immense importance in molecular biology, where applications range from the detection of novel selenoproteins using expressed sequence tag (EST) data, to the determination of microRNA genes and their targets. Dynamic programming algorithms have been developed to compute the minimum free energy secondary structure and partition function of a given RNA sequence, the minimum free-energy and partition function for the hybridization of two RNA molecules, etc. However, the applicability of dynamic programming methods depends on disallowing certain types of interactions (pseudoknots, zig-zags, etc.), as their inclusion renders structure prediction a nondeterministic polynomial time (NP)-complete problem. Nevertheless, such interactions have been observed in X-ray structures. Results: A non-Boltzmannian Monte Carlo algorithm was designed by Wang and Landau to estimate the density of states for complex systems, such as the Ising model, that exhibit a phase transition. In this article, we apply the Wang-Landau (WL) method to compute the density of states for secondary structures of a given RNA sequence, and for hybridizations of two RNA sequences. Our method is shown to be much faster than existing software, such as RNAsubopt. From the density of states, we compute the partition function over all secondary structures and over all pseudoknot-free hybridizations. The advantage of the WL method is that by adding a function to evaluate the free energy of arbitrary pseudoknotted structures and of arbitrary hybridizations, we can estimate thermodynamic parameters for situations known to be NP-complete. This extension to pseudoknots will be made in the sequel to this article; in contrast, the current article describes the WL algorithm applied to pseudoknot-free secondary structures and hybridizations.
Availability: The WL RNA hybridization web server is under construction at http://bioinformatics.bc.edu/clotelab/. Contact: clote@bc.edu
Affiliation(s)
- Feng Lou
- Laboratoire de Recherche en Informatique, Université Paris-Sud XI, bât. 490, 91405 Orsay cedex, France
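The Wang-Landau idea described in the abstract above (estimate the density of states g(E) by a random walk that penalises already-visited energies, then obtain the partition function from g(E)) can be sketched on a toy system. This is not the authors' RNA implementation: the system below is a hypothetical string of independent bits whose energy is the number of set bits, and the function names, flatness criterion, batch size and stopping threshold are all assumptions made for illustration.

```python
import math
import random

def wang_landau_ln_g(n_bits=8, ln_f_final=1e-3, flatness=0.8, batch=10_000, seed=0):
    """Estimate ln g(E) for a toy system of n_bits bits, with E = number of set bits."""
    rng = random.Random(seed)
    state = [0] * n_bits
    energy = 0
    ln_g = [0.0] * (n_bits + 1)   # running estimate of ln g(E)
    hist = [0] * (n_bits + 1)     # visit histogram for the current stage
    ln_f = 1.0                    # modification factor, halved after each flat stage
    while ln_f > ln_f_final:
        for _ in range(batch):
            i = rng.randrange(n_bits)
            # Flipping a 0 raises the energy by 1, flipping a 1 lowers it by 1.
            new_energy = energy + (1 - 2 * state[i])
            # Accept with probability min(1, g(E_old) / g(E_new)).
            if rng.random() < math.exp(min(0.0, ln_g[energy] - ln_g[new_energy])):
                state[i] ^= 1
                energy = new_energy
            ln_g[energy] += ln_f
            hist[energy] += 1
        # Move to the next stage once every energy level has been visited evenly enough.
        if min(hist) >= flatness * sum(hist) / len(hist):
            hist = [0] * len(hist)
            ln_f /= 2.0
    # Normalise so that the total number of states equals 2 ** n_bits.
    shift = max(ln_g)
    log_total = shift + math.log(sum(math.exp(v - shift) for v in ln_g))
    offset = n_bits * math.log(2.0) - log_total
    return [v + offset for v in ln_g]

def partition_function(ln_g, kT=1.0):
    """Z = sum over E of g(E) * exp(-E / kT), for energy levels E = 0, 1, ..., len(ln_g) - 1."""
    return sum(math.exp(lg - e / kT) for e, lg in enumerate(ln_g))
```

For this toy system the exact density of states is the binomial coefficient C(n_bits, E), so the estimate can be checked directly; for RNA structures the move set and the energy function would have to be replaced by structure-specific ones.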
7
Rajgaria R, Wei Y, Floudas CA. Contact prediction for beta and alpha-beta proteins using integer linear optimization and its impact on the first principles 3D structure prediction method ASTRO-FOLD. Proteins 2010; 78:1825-46. [PMID: 20225257 PMCID: PMC2858251 DOI: 10.1002/prot.22696] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Indexed: 12/24/2022]
Abstract
An integer linear optimization model is presented to predict residue contacts in beta, alpha + beta, and alpha/beta proteins. The total energy of a protein is expressed as the sum of a Cα-Cα distance-dependent contact energy contribution and a hydrophobic contribution. The model selects contacts that assign the lowest energy to the protein structure while satisfying a set of constraints that are included to enforce certain physically observed topological information. A new method based on hydrophobicity is proposed to find the beta-sheet alignments. These beta-sheet alignments are used as constraints for contacts between residues of beta-sheets. This model was tested on three independent protein test sets and CASP8 test proteins consisting of beta, alpha + beta, and alpha/beta proteins and it was found to perform very well. The average accuracy of the predictions (separated by at least six residues) was approximately 61%. The average true positive and false positive distances were also calculated for each of the test sets and they are 7.58 Å and 15.88 Å, respectively. Residue contact prediction can be directly used to facilitate protein tertiary structure prediction. The proposed residue contact prediction model is incorporated into the first principles protein tertiary structure prediction approach, ASTRO-FOLD. The effectiveness of the contact prediction model was further demonstrated by the improvement in the quality of the protein structure ensemble generated using the predicted residue contacts for a test set of 10 proteins.
Affiliation(s)
- R. Rajgaria
- Department of Chemical Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
- Y. Wei
- Department of Chemical Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
- C. A. Floudas
- Department of Chemical Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
8
McAllister SR, Floudas CA. An improved hybrid global optimization method for protein tertiary structure prediction. Computational Optimization and Applications 2010; 45:377-413. [PMID: 20357906 PMCID: PMC2847311 DOI: 10.1007/s10589-009-9277-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 05/29/2023]
Abstract
First principles approaches to the protein structure prediction problem must search through an enormous conformational space to identify low-energy, near-native structures. In this paper, we describe the formulation of the tertiary structure prediction problem as a nonlinear constrained minimization problem, where the goal is to minimize the energy of a protein conformation subject to constraints on torsion angles and interatomic distances. The core of the proposed algorithm is a hybrid global optimization method that combines the benefits of the αBB deterministic global optimization approach with conformational space annealing. These global optimization techniques employ a local minimization strategy that combines torsion angle dynamics and rotamer optimization to identify and improve the selection of initial conformations and then applies a sequential quadratic programming approach to further minimize the energy of the protein conformations subject to constraints. The proposed algorithm demonstrates the ability to identify both lower energy protein structures, as well as larger ensembles of low-energy conformations.
9
Abia D, Bastolla U, Chacón P, Fábrega C, Gago F, Morreale A, Tramontano A. In memoriam. Proteins 2010; 78:iii-viii. [DOI: 10.1002/prot.22660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
10
Liu T, Horst JA, Samudrala R. A novel method for predicting and using distance constraints of high accuracy for refining protein structure prediction. Proteins 2009; 77:220-34. [PMID: 19422061 DOI: 10.1002/prot.22434] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/10/2022]
Abstract
The principal bottleneck in protein structure prediction is the refinement of models from lower accuracies to the resolution observed by experiment. We developed a novel constraints-based refinement method that identifies a high number of accurate input constraints from initial models and rebuilds them using restrained torsion angle dynamics (rTAD). We previously created a Bayesian statistics-based residue-specific all-atom probability discriminatory function (RAPDF) to discriminate native-like models by measuring the probability of accuracy for atom type distances within a given model. Here, we exploit RAPDF to score (i.e., filter) constraints from initial predictions that may or may not be close to a native-like state, obtain consensus of top scoring constraints amongst five initial models, and compile sets with no redundant residue pair constraints. We find that this method consistently produces a large and highly accurate set of distance constraints from which to build refinement models. We further optimize the balance between accuracy and coverage of constraints by producing multiple structure sets using different constraint distance cutoffs, and note that the cutoff governs spatially near versus distant effects in model generation. This complete procedure of deriving distance constraints for rTAD simulations improves the quality of initial predictions significantly in all cases evaluated by us. Our procedure represents a significant step in solving the protein structure prediction and refinement problem, by enabling the use of consensus constraints, RAPDF, and rTAD for protein structure modeling and refinement.
Affiliation(s)
- Tianyun Liu
- Department of Genetics, Stanford University, Stanford, California, USA
11
Lahti JL, Silverman AP, Cochran JR. Interrogating and predicting tolerated sequence diversity in protein folds: application to E. elaterium trypsin inhibitor-II cystine-knot miniprotein. PLoS Comput Biol 2009; 5:e1000499. [PMID: 19730675 PMCID: PMC2725296 DOI: 10.1371/journal.pcbi.1000499] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 05/01/2009] [Accepted: 08/04/2009] [Indexed: 11/18/2022] Open
Abstract
Cystine-knot miniproteins (knottins) are promising molecular scaffolds for protein engineering applications. Members of the knottin family have multiple loops capable of displaying conformationally constrained polypeptides for molecular recognition. While previous studies have illustrated the potential of engineering knottins with modified loop sequences, a thorough exploration into the tolerated loop lengths and sequence space of a knottin scaffold has not been performed. In this work, we used the Ecballium elaterium trypsin inhibitor II (EETI) as a model member of the knottin family and constructed libraries of EETI loop-substituted variants with diversity in both amino acid sequence and loop length. Using yeast surface display, we isolated properly folded EETI loop-substituted clones and applied sequence analysis tools to assess the tolerated diversity of both amino acid sequence and loop length. In addition, we used covariance analysis to study the relationships between individual positions in the substituted loops, based on the expectation that correlated amino acid substitutions will occur between interacting residue pairs. We then used the results of our sequence and covariance analyses to successfully predict loop sequences that facilitated proper folding of the knottin when substituted into EETI loop 3. The sequence trends we observed in properly folded EETI loop-substituted clones will be useful for guiding future protein engineering efforts with this knottin scaffold. Furthermore, our findings demonstrate that the combination of directed evolution with sequence and covariance analyses can be a powerful tool for rational protein engineering.
Affiliation(s)
- Jennifer L. Lahti
- Department of Bioengineering, Cancer Center, Bio-X Program, Stanford University, Stanford, California, United States of America
- Adam P. Silverman
- Department of Bioengineering, Cancer Center, Bio-X Program, Stanford University, Stanford, California, United States of America
- Jennifer R. Cochran
- Department of Bioengineering, Cancer Center, Bio-X Program, Stanford University, Stanford, California, United States of America
12
Rajgaria R, McAllister SR, Floudas CA. Towards accurate residue-residue hydrophobic contact prediction for alpha helical proteins via integer linear optimization. Proteins 2009; 74:929-47. [PMID: 18767158 DOI: 10.1002/prot.22202] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 12/22/2022]
Abstract
A new optimization-based method is presented to predict the hydrophobic residue contacts in alpha-helical proteins. The proposed approach uses a high-resolution, distance-dependent force field to calculate the interaction energy between different residues of a protein. The formulation predicts the hydrophobic contacts by minimizing the sum of these contact energies. These residue contacts are highly useful in narrowing down the conformational space searched by protein structure prediction algorithms. The proposed algorithm also offers the advantage of producing a rank-ordered list of the best contact sets. This model was tested on four independent alpha-helical protein test sets and was found to perform very well. The average accuracy of the predictions (separated by at least six residues) obtained using the presented method was approximately 66% for single domain proteins. The average true positive and false positive distances were also calculated for each protein test set and they are 8.87 Å and 14.67 Å, respectively.
Affiliation(s)
- R Rajgaria
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA
13
Vicatos S, Kaznessis YN. Separating true positive predicted residue contacts from false positive ones in mainly alpha proteins, using constrained Metropolis MC simulations. Proteins 2008; 70:539-52. [PMID: 17879348 DOI: 10.1002/prot.21553] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
Abstract
We present a method that significantly improves the accuracy of predicted proximal residue pairs in protein molecules. Computational methods for predicting pairs of amino acids that are distant in the protein sequence but close in the protein 3D structure can benefit attempts to recognize the fold of a protein molecule in silico. Unfortunately, currently available methods suffer from low predictive accuracy. In this work, we use Monte Carlo simulations to fold protein molecules with proximal pair predictions used as additional energy constraints. To test our methods, we study molecules with known tertiary structures. With Monte Carlo, we generate ensembles of structures for each set of residue constraints. The distribution of the root mean square deviation of the folded structures from the known native structure reveals clear information about the accuracy of the constraint sets used. With recursive substitutions of constraints, false positive predictions are identified and filtered out and significant improvements in accuracy are observed.
Affiliation(s)
- Spyridon Vicatos
- Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, Minnesota 55455, USA
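The abstract above describes Metropolis Monte Carlo folding in which predicted proximal pairs act as additional energy constraints. A minimal sketch of the two ingredients follows. It is not the authors' force field: the flat-bottomed harmonic penalty, the 8 Å target distance, the weight, and both function names are assumptions introduced for illustration only.

```python
import math
import random

def restraint_energy(coords, contacts, target=8.0, weight=1.0):
    """Penalty for predicted contact pairs (i, j) that are farther apart than `target`.

    coords   : sequence of (x, y, z) positions, one per residue
    contacts : iterable of (i, j) index pairs predicted to be in contact
    The flat-bottomed harmonic form and the 8 Å default are illustrative assumptions.
    """
    total = 0.0
    for i, j in contacts:
        d = math.dist(coords[i], coords[j])
        if d > target:
            total += weight * (d - target) ** 2
    return total

def metropolis_accept(delta_e, kT, rng=random):
    """Metropolis criterion: accept downhill moves, uphill ones with probability exp(-dE / kT)."""
    return delta_e <= 0.0 or rng.random() < math.exp(-delta_e / kT)
```

In a simulation loop, `delta_e` would be the change in the sum of the physical energy and the restraint energy caused by a trial move; a restraint set containing many false positives would then be expected to pull the ensemble away from the native structure.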
14
Fuchs A, Martin-Galiano AJ, Kalman M, Fleishman S, Ben-Tal N, Frishman D. Co-evolving residues in membrane proteins. Bioinformatics 2007; 23:3312-9. [DOI: 10.1093/bioinformatics/btm515] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.4] [Indexed: 11/13/2022] Open
15
Cheng J, Baldi P. Improved residue contact prediction using support vector machines and a large feature set. BMC Bioinformatics 2007; 8:113. [PMID: 17407573 PMCID: PMC1852326 DOI: 10.1186/1471-2105-8-113] [Citation(s) in RCA: 174] [Impact Index Per Article: 10.2] [Received: 12/28/2006] [Accepted: 04/02/2007] [Indexed: 11/12/2022] Open
Abstract
Background Predicting protein residue-residue contacts is an important 2D prediction task. It is useful for ab initio structure prediction and understanding protein folding. In spite of steady progress over the past decade, contact prediction remains largely unsolved. Results Here we develop a new contact map predictor (SVMcon) that uses support vector machines to predict medium- and long-range contacts. SVMcon integrates profiles, secondary structure, relative solvent accessibility, contact potentials, and other useful features. On the same test data set, SVMcon's accuracy is 4% higher than the latest version of the CMAPpro contact map predictor. SVMcon recently participated in the seventh edition of the Critical Assessment of Techniques for Protein Structure Prediction (CASP7) experiment and was evaluated along with seven other contact map predictors. SVMcon was ranked as one of the top predictors, yielding the second best coverage and accuracy for contacts with sequence separation >= 12 on 13 de novo domains. Conclusion We describe SVMcon, a new contact map predictor that uses SVMs and a large set of informative features. SVMcon yields good performance on medium- to long-range contact predictions and can be modularly incorporated into a structure prediction pipeline.
Affiliation(s)
- Jianlin Cheng
- School of Electrical Engineering and Computer Science, University of Central Florida, Orlando, FL 32816-2362, USA
- Pierre Baldi
- School of Information and Computer Sciences, University of California Irvine, Irvine, CA 92617, USA
16
Skolnick J, Kolinski A. Monte Carlo Approaches to the Protein Folding Problem. Advances in Chemical Physics 2007. [DOI: 10.1002/9780470141649.ch7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/27/2023]
17
McAllister SR, Mickus BE, Klepeis JL, Floudas CA. Novel approach for alpha-helical topology prediction in globular proteins: generation of interhelical restraints. Proteins 2007; 65:930-52. [PMID: 17029234 DOI: 10.1002/prot.21095] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/08/2022]
Abstract
The protein folding problem represents one of the most challenging problems in computational biology. Distance constraints and topology predictions can be highly useful for the folding problem in reducing the conformational space that must be searched by deterministic algorithms to find a protein structure of minimum conformational energy. We present a novel optimization framework for predicting topological contacts and generating interhelical distance restraints between hydrophobic residues in alpha-helical globular proteins. It should be emphasized that since the model does not make assumptions about the form of the helices, it is applicable to all alpha-helical proteins, including helices with kinks and irregular helices. This model aims at enhancing the ASTRO-FOLD protein folding approach of Klepeis and Floudas (Journal of Computational Chemistry 2003;24:191-208), which finds the structure of global minimum conformational energy via a constrained nonlinear optimization problem. The proposed topology prediction model was evaluated on 26 alpha-helical proteins ranging from 2 to 8 helices and 35 to 159 residues, and the best identified average interhelical distances corresponding to the predicted contacts fell below 11 Å in all 26 of these systems. Given the positive results of applying the model to several protein systems, the importance of interhelical hydrophobic-to-hydrophobic contacts in determining the folding of alpha-helical globular proteins is highlighted.
Affiliation(s)
- S R McAllister
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA
18
Jayaram B, Bhushan K, Shenoy SR, Narang P, Bose S, Agrawal P, Sahu D, Pandey V. Bhageerath: an energy based web enabled computer software suite for limiting the search space of tertiary structures of small globular proteins. Nucleic Acids Res 2006; 34:6195-204. [PMID: 17090600 PMCID: PMC1693886 DOI: 10.1093/nar/gkl789] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.6] [Indexed: 11/29/2022] Open
Abstract
We describe here an energy based computer software suite for narrowing down the search space of tertiary structures of small globular proteins. The protocol comprises eight different computational modules that form an automated pipeline. It combines physics based potentials with biophysical filters to arrive at 10 plausible candidate structures starting from sequence and secondary structure information. The methodology has been validated here on 50 small globular proteins consisting of 2–3 helices and strands with known tertiary structures. For each of these proteins, a structure within 3–6 Å RMSD (root mean square deviation) of the native has been obtained in the 10 lowest energy structures. The protocol has been web enabled and is accessible at .
Affiliation(s)
- B Jayaram
- Department of Chemistry and Supercomputing Facility for Bioinformatics and Computational Biology, Indian Institute of Technology Delhi Hauz Khas, New Delhi 110 016, India.
19
Carbonaro B, Vitale F, Giordano C. On a 3D-matrix representation of the tertiary structure of a protein. ACTA ACUST UNITED AC 2006. [DOI: 10.1016/j.mcm.2005.07.003] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/24/2022]
20
Halperin I, Wolfson H, Nussinov R. Correlated mutations: advances and limitations. A study on fusion proteins and on the Cohesin-Dockerin families. Proteins 2006; 63:832-45. [PMID: 16508975 DOI: 10.1002/prot.20933] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.8] [Indexed: 11/09/2022]
Abstract
Correlated mutations have been repeatedly exploited for intramolecular contact map prediction. Over the last decade these efforts yielded several methods for measuring correlated mutations. Nevertheless, the application of correlated mutations for the prediction of intermolecular interactions has not yet been explored. This gap is due to several obstacles, such as 3D complexes availability, paralog discrimination, and the availability of sequence pairs that are required for inter- but not intramolecular analyses. Here we selected for analysis fusion protein families that bypass some of these obstacles. We find that several correlated mutation measurements yield reasonable accuracy for intramolecular contact map prediction on the fusion dataset. However, the accuracy level drops sharply in intermolecular contact prediction. This drop in accuracy does not always occur. In the Cohesin-Dockerin family, reasonable accuracy is achieved in the prediction of both intra- and intermolecular contacts. The Cohesin-Dockerin family is well suited for correlated mutation analysis. Because, however, this family constitutes a special case (it has radical mutations, has domain repeats, within each species each Dockerin domain interacts with each Cohesin domain, see below), the successful prediction in this family does not point to a general potential in using correlated mutations for predicting intermolecular contacts. Overall, the results of our study indicate that current methodologies of correlated mutations analysis are not suitable for large-scale intermolecular contact prediction, and thus cannot assist in docking. With current measurements, sequence availability, sequence annotations, and underdeveloped sequence pairing methods, correlated mutations can yield reasonable accuracy only for a handful of families.
Affiliation(s)
- Inbal Halperin
- Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
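The abstract above refers to several methods for measuring correlated mutations without naming them. As an illustration only, the sketch below computes mutual information between two columns of a multiple sequence alignment, which is one widely used covariation score and is not claimed to be among the measures evaluated by the authors. The function name and the plain list-of-strings alignment format are assumptions.

```python
import math
from collections import Counter

def column_mutual_information(msa, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    msa : list of equal-length aligned sequences (strings)
    High values indicate that the residue at column i is informative about column j.
    """
    n = len(msa)
    pair_counts = Counter((seq[i], seq[j]) for seq in msa)
    counts_i = Counter(seq[i] for seq in msa)
    counts_j = Counter(seq[j] for seq in msa)
    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written with raw counts to avoid extra divisions.
        mi += p_ab * math.log2(p_ab * n * n / (counts_i[a] * counts_j[b]))
    return mi
```

For intermolecular analysis the two columns would come from the paired sequences of two interacting proteins, which is where the sequence pairing obstacles discussed in the abstract arise.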
|
21
|
Graña O, Eyrich VA, Pazos F, Rost B, Valencia A. EVAcon: a protein contact prediction evaluation service. Nucleic Acids Res 2005; 33:W347-51. [PMID: 15980486 PMCID: PMC1160172 DOI: 10.1093/nar/gki411] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/27/2022] Open
Abstract
Here we introduce EVAcon, an automated web service that evaluates the performance of contact prediction servers. Currently, EVAcon is monitoring nine servers, four of which are specialized in contact prediction and five are general structure prediction servers. Results are compared for all newly determined experimental structures deposited into PDB (∼5–50 per week). EVAcon allows for a precise comparison of the results based on a system of common protein subsets and the commonly accepted evaluation criteria that are also used in the corresponding category of the CASP assessment. EVAcon is a new service added to the functionality of the EVA system for the continuous evaluation of protein structure prediction servers. The new service is accessible from any of the three EVA mirrors: PDG (CNB-CSIC, Madrid); CUBIC (Columbia University, NYC); and Sali Lab (UCSF, San Francisco).
Affiliation(s)
- Volker A. Eyrich
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA
- Burkhard Rost
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA
- Alfonso Valencia
- To whom correspondence should be addressed. Tel: +34 91 585 4570; Fax: +34 91 585 4506;
|
22
|
Vicatos S, Reddy BVB, Kaznessis Y. Prediction of distant residue contacts with the use of evolutionary information. Proteins 2005; 58:935-49. [PMID: 15645442 DOI: 10.1002/prot.20370] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.1] [Indexed: 11/09/2022]
Abstract
In this work we present a novel correlated mutations analysis (CMA) method that is significantly more accurate than previously reported CMA methods. Calculation of correlation coefficients is based on physicochemical properties of residues (predictors) and not on substitution matrices. This results in reliable prediction of pairs of residues that are distant in protein sequence but proximal in its three-dimensional tertiary structure. Multiple sequence alignments (MSA) containing a sequence of known structure for 127 families from the PFAM database have been selected so that all major protein architectures described in the CATH classification database are represented. Protein sequences in the selected families were filtered so that only those evolutionarily close to the target protein remain in the MSA. The average accuracy obtained for the alpha beta class of proteins was 26.8% of predicted proximal pairs with average improvement over random accuracy (IOR) of 6.41. Average accuracy is 20.6% for the mainly beta class and 14.4% for the mainly alpha class. The optimum correlation coefficient cutoff (cc cutoff) was found to be around 0.65. The first predictor, which correlates to hydrophobicity, provides the most reliable results. The other two predictors give good predictions which can be used in conjunction with those of the first one. When a stricter cc cutoff is chosen, the average accuracy increases significantly (38.76% for alpha beta class), but the trade-off is a smaller number of predictions. The use of solvent accessible area estimations for filtering false positives out of the predictions is promising.
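The accuracy and improvement-over-random figures quoted in this abstract can be illustrated with a minimal Python sketch. It assumes that accuracy is the fraction of predicted pairs that are true contacts and that IOR is that accuracy divided by the chance of hitting a contact when a candidate pair is drawn at random; the abstract does not spell out these definitions, so both are assumptions, and all function names are hypothetical.

```python
from itertools import combinations


def _normalize(pairs):
    """Store each residue pair as a sorted tuple so (i, j) equals (j, i)."""
    return {tuple(sorted(p)) for p in pairs}


def contact_accuracy(predicted, true_contacts):
    """Fraction of predicted residue pairs that are real contacts."""
    predicted = _normalize(predicted)
    if not predicted:
        return 0.0
    return len(predicted & _normalize(true_contacts)) / len(predicted)


def random_accuracy(length, true_contacts, min_separation):
    """Chance that a uniformly drawn candidate pair is a real contact.

    Candidate pairs are all (i, j) with j - i >= min_separation
    (assumed definition of "distant in sequence").
    """
    candidates = {
        (i, j)
        for i, j in combinations(range(length), 2)
        if j - i >= min_separation
    }
    if not candidates:
        return 0.0
    return len(candidates & _normalize(true_contacts)) / len(candidates)


def improvement_over_random(predicted, true_contacts, length, min_separation):
    """Ratio of prediction accuracy to random accuracy (assumed IOR definition)."""
    baseline = random_accuracy(length, true_contacts, min_separation)
    if baseline == 0:
        raise ValueError("no true contacts among candidate pairs")
    return contact_accuracy(predicted, true_contacts) / baseline
```

For a 10-residue toy chain with a minimum separation of 3 there are 28 candidate pairs. If 4 of them are true contacts and 2 of 4 predictions are correct, the accuracy is 0.5 and the ratio is 3.5.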
Affiliation(s)
- Spyridon Vicatos
- Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, Minnesota 55455, USA
|
23
|
Afonnikov DA, Kolchanov NA. CRASP: a program for analysis of coordinated substitutions in multiple alignments of protein sequences. Nucleic Acids Res 2004; 32:W64-8. [PMID: 15215352 PMCID: PMC441589 DOI: 10.1093/nar/gkh451] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Indexed: 11/14/2022] Open
Abstract
Recent results suggest that during evolution certain substitutions at protein sites may occur in a coordinated manner due to interactions between amino acid residues. Information on these coordinated substitutions may be useful for analysis of protein structure and function. CRASP is an Internet-available software tool for the detection and analysis of coordinated substitutions in multiple alignments of protein sequences. The approach is based on estimation of the correlation coefficient between the values of a physicochemical parameter at a pair of positions of sequence alignment. The program enables the user to detect and analyze pairwise relationships between amino acid substitutions at protein sequence positions, estimate the contribution of the coordinated substitutions to the evolutionary invariance or variability in integral protein physicochemical characteristics such as the net charge of protein residues and hydrophobic core volume. The CRASP program is available at http://wwwmgs.bionet.nsc.ru/mgs/programs/crasp/.
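The core idea described here, a correlation coefficient between the values of a physicochemical parameter at two alignment positions, can be sketched in a few lines of Python. This is an illustration only, not the CRASP implementation: the Kyte-Doolittle hydropathy scale is an assumed stand-in for the parameter, the plain Pearson coefficient is an assumed choice, and the function names are hypothetical.

```python
from math import sqrt

# Kyte-Doolittle hydropathy values, used only as an example parameter.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def column_correlations(alignment, scale=HYDROPATHY, min_rows=3):
    """Correlate parameter values for every pair of alignment columns.

    Rows with a gap or unknown residue at either position are skipped;
    column pairs with fewer than min_rows usable rows are omitted.
    """
    length = len(alignment[0])
    result = {}
    for i in range(length):
        for j in range(i + 1, length):
            rows = [
                (scale[seq[i]], scale[seq[j]])
                for seq in alignment
                if seq[i] in scale and seq[j] in scale
            ]
            if len(rows) < min_rows:
                continue
            xs, ys = zip(*rows)
            result[(i, j)] = pearson(xs, ys)
    return result
```

Pairs whose coefficient exceeds a chosen cutoff would then be reported as candidate coordinated substitutions; the cutoff itself is left to the caller.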
|
24
|
Abstract
We present a novel method, HMMSTR-CM, for protein contact map predictions. Contact potentials were calculated by using HMMSTR, a hidden Markov model for local sequence structure correlations. Targets were aligned against protein templates using a Bayesian method, and contact maps were generated by using these alignments. Contact potentials then were used to evaluate these templates. An ab initio method based on the target contact potentials using a rule-based strategy to model the protein-folding pathway was developed. Fold recognition and ab initio methods were combined to produce accurate, protein-like contact maps. Pathways sometimes led to an unambiguous prediction of topology, even without using templates. The results on CASP5 targets are discussed. Also included is a brief update on the quality of fully automated ab initio predictions using the I-sites server.
Affiliation(s)
- Yu Shao
- Department of Biology, Rensselaer Polytechnic Institute, Troy, New York 12180, USA
|
25
|
|
26
|
Bystroff C, Shao Y, Yuan X. Five Hierarchical Levels of Sequence-Structure Correlation in Proteins. Appl Bioinformatics 2004; 3:97-104. [PMID: 15693735 DOI: 10.2165/00822942-200403020-00004] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/02/2022]
Abstract
This article reviews recent work towards modelling protein folding pathways using a bioinformatics approach. Statistical models have been developed for sequence-structure correlations in proteins at five levels of structural complexity: (i) short motifs; (ii) extended motifs; (iii) nonlocal pairs of motifs; (iv) 3-dimensional arrangements of multiple motifs; and (v) global structural homology. We review statistical models, including sequence profiles, hidden Markov models (HMMs) and interaction potentials, for the first four levels of structural detail. The I-sites (folding Initiation sites) Library models short local structure motifs. Each succeeding level has a statistical model, as follows: HMMSTR (HMM for STRucture) is an HMM for extended motifs; HMMSTR-CM (Contact Maps) is a model for pairwise interactions between motifs; and SCALI-HMM (HMMs for Structural Core ALIgnments) is a set of HMMs for the spatial arrangements of motifs. The parallels between the statistical models and theoretical models for folding pathways are discussed in this article; however, global sequence models are not discussed because they have been extensively reviewed elsewhere. The data used and algorithms presented in this article are available at http://www.bioinfo.rpi.edu/~bystrc/ (click on "servers" or "downloads") or by request to bystrc@rpi.edu .
Affiliation(s)
- Christopher Bystroff
- Biology Department, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY 12180, USA.
|
27
|
|
28
|
Abstract
Protein residues that are critical for structure and function are expected to be conserved throughout evolution. Here, we investigate the extent to which these conserved residues are clustered in three-dimensional protein structures. In 92% of the proteins in a data set of 79 proteins, the most conserved positions in multiple sequence alignments are significantly more clustered than randomly selected sets of positions. The comparison to random subsets is not necessarily appropriate, however, because the signal could be the result of differences in the amino acid composition of sets of conserved residues compared to random subsets (hydrophobic residues tend to be close together in the protein core), or differences in sequence separation of the residues in the different sets. In order to overcome these limits, we compare the degree of clustering of the conserved positions on the native structure and on alternative conformations generated by the de novo structure prediction method Rosetta. For 65% of the 79 proteins, the conserved residues are significantly more clustered in the native structure than in the alternative conformations, indicating that the clustering of conserved residues in protein structures goes beyond that expected purely from sequence locality and composition effects. The differences in the spatial distribution of conserved residues can be utilized in de novo protein structure prediction: We find that for 79% of the proteins, selection of the Rosetta generated conformations with the greatest clustering of the conserved residues significantly enriches the fraction of close-to-native structures.
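The random-subset baseline mentioned at the start of this abstract can be sketched as a small permutation test. The sketch below is illustrative only: it assumes clustering is measured by the mean pairwise distance between residue coordinates (for example C-alpha atoms), which the abstract does not specify, and the function names are hypothetical. As the abstract itself notes, this baseline ignores composition and sequence-separation effects, which is why the authors go on to compare against alternative conformations.

```python
import random
from itertools import combinations
from math import dist


def mean_pairwise_distance(coords, positions):
    """Average Euclidean distance over all pairs of the given positions.

    Requires at least two positions.
    """
    pairs = list(combinations(positions, 2))
    return sum(dist(coords[i], coords[j]) for i, j in pairs) / len(pairs)


def clustering_p_value(coords, conserved, trials=1000, seed=0):
    """Empirical p-value that random position sets are at least as compact.

    coords: list of (x, y, z) tuples, one per residue.
    conserved: indices of the conserved residues.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance(coords, conserved)
    all_positions = range(len(coords))
    hits = 0
    for _ in range(trials):
        sample = rng.sample(all_positions, len(conserved))
        if mean_pairwise_distance(coords, sample) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (trials + 1)
```

A small p-value indicates that the conserved positions are more tightly clustered than most random sets of the same size.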
Affiliation(s)
- Ora Schueler-Furman
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
|
29
|
Joshi RR, Jyothi S. Ab-initio prediction and reliability of protein structural genomics by PROPAINOR algorithm. Comput Biol Chem 2003; 27:241-52. [PMID: 12927100 DOI: 10.1016/s0097-8485(02)00074-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
Abstract
We have formulated the ab-initio prediction of the 3D-structure of proteins as a probabilistic programming problem where the inter-residue 3D-distances are treated as random variables. Lower and upper bounds for these random variables and the corresponding probabilities are estimated by nonparametric statistical methods and knowledge-based heuristics. In this paper we focus on the probabilistic computation of the 3D-structure using these distance interval estimates. Validation of the predicted structures shows our method to be more accurate than other computational methods reported so far. Our method is also found to be computationally more efficient than other existing ab-initio structure prediction methods. Moreover, we provide a reliability index for the predicted structures. Because of its computational simplicity and its applicability to any random sequence, our algorithm called PROPAINOR (PROtein structure Prediction by AI and Nonparametric Regression) has significant scope in computational protein structural genomics.
Affiliation(s)
- Rajani R Joshi
- BJM School of Bioscience and Bioengineering, Indian Institute of Technology, Powai, 400076, Mumbai, India.
|
30
|
Ortiz AR, Strauss CEM, Olmea O. MAMMOTH (matching molecular models obtained from theory): an automated method for model comparison. Protein Sci 2002; 11:2606-21. [PMID: 12381844 PMCID: PMC2373724 DOI: 10.1110/ps.0215902] [Citation(s) in RCA: 320] [Impact Index Per Article: 14.5] [Indexed: 10/27/2022]
Abstract
Advances in structural genomics and protein structure prediction require the design of automatic, fast, objective, and well benchmarked methods capable of comparing and assessing the similarity of low-resolution three-dimensional structures, via experimental or theoretical approaches. Here, a new method for sequence-independent structural alignment is presented that allows comparison of an experimental protein structure with an arbitrary low-resolution protein tertiary model. The heuristic algorithm is given and then used to show that it can describe random structural alignments of proteins with different folds with good accuracy by an extreme value distribution. From this observation, a structural similarity score between two proteins or two different conformations of the same protein is derived from the likelihood of obtaining a given structural alignment by chance. The performance of the derived score is then compared with well established, consensus manual-based scores and data sets. We found that the new approach correlates better than other tools with the gold standard provided by a human evaluator. Timings indicate that the algorithm is fast enough for routine use with large databases of protein models. Overall, our results indicate that the new program (MAMMOTH) will be a good tool for protein structure comparisons in structural genomics applications. MAMMOTH is available from our web site at http://physbio.mssm.edu/~ortizg/.
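The step from an extreme value distribution of random alignment scores to a chance-based similarity score can be illustrated with a short Python sketch. It assumes a Gumbel (type I extreme value) form with location `mu` and scale `sigma`; these parameter names, and the function name, are hypothetical, and the fitted values used by MAMMOTH are not given in the abstract.

```python
from math import exp, expm1


def evd_p_value(score, mu, sigma):
    """Probability of a random alignment scoring at least `score`.

    Uses the Gumbel survival function 1 - exp(-exp(-z)) with
    z = (score - mu) / sigma. mu and sigma would have to be fitted to
    scores from unrelated structure pairs (assumed, not from the source).
    """
    z = (score - mu) / sigma
    if z < -50:
        # exp(-z) is huge here, so the survival probability is 1 to
        # double precision; returning early also avoids overflow.
        return 1.0
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
    return -expm1(-exp(-z))
```

Scores far above `mu` give probabilities close to zero, meaning the alignment is unlikely to have arisen by chance.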
Affiliation(s)
- Angel R Ortiz
- Department of Physiology and Biophysics, Mount Sinai School of Medicine, New York University, New York, New York 10029, USA.
|
31
|
de la Cruz X, Hutchinson EG, Shepherd A, Thornton JM. Toward predicting protein topology: an approach to identifying beta hairpins. Proc Natl Acad Sci U S A 2002; 99:11157-62. [PMID: 12177429 PMCID: PMC123226 DOI: 10.1073/pnas.162376199] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.6] [Indexed: 11/18/2022] Open
Abstract
Although secondary structure prediction methods have recently improved, progress from secondary to tertiary structure prediction has been limited. A promising but largely unexplored route to this goal is to predict structure motifs from secondary structure knowledge. Here we present a novel method for the recognition of beta hairpins that combines secondary structure predictions and threading methods by using a database search and a neural network approach. The method successfully predicts 48% and 77%, respectively, of all hairpin and nonhairpin beta-coil-beta motifs in a protein database. We find that the main contributors to motif recognition are predicted accessibility and turn propensities.
Affiliation(s)
- Xavier de la Cruz
- Institut Català per la Recerca i Estudis Avançats (ICREA), Passeig Lluis Companys, 23, 08018 Barcelona, Spain.
|
32
|
Szyperski T, Yeh DC, Sukumaran DK, Moseley HNB, Montelione GT. Reduced-dimensionality NMR spectroscopy for high-throughput protein resonance assignment. Proc Natl Acad Sci U S A 2002; 99:8009-14. [PMID: 12060747 PMCID: PMC123011 DOI: 10.1073/pnas.122224599] [Citation(s) in RCA: 155] [Impact Index Per Article: 7.0] [Received: 11/15/2001] [Accepted: 04/12/2002] [Indexed: 11/18/2022] Open
Abstract
A suite of reduced-dimensionality ¹³C,¹⁵N,¹H-triple-resonance NMR experiments is presented for rapid and complete protein resonance assignment. Even when using short measurement times, these experiments allow one to retain the high spectral resolution required for efficient automated analysis. "Sampling limited" and "sensitivity limited" data collection regimes are defined, respectively, depending on whether the sampling of the indirect dimensions or the sensitivity of a multidimensional NMR experiment per se determines the minimally required measurement time. We show that reduced-dimensionality NMR spectroscopy is a powerful approach to avoid the "sampling limited regime"; i.e., a standard set of ten experiments proposed here allows one to effectively adapt minimal measurement times to sensitivity requirements. This is of particular interest in view of the greatly increased sensitivity of NMR spectrometers equipped with cryogenic probes. As a step toward fully automated analysis, the program AUTOASSIGN has been extended to provide sequential backbone and ¹³Cβ resonance assignments from these reduced-dimensionality NMR data.
Affiliation(s)
- Thomas Szyperski
- Departments of Chemistry and Structural Biology, State University of New York, Buffalo, NY 14260, USA.
|
33
|
Zhang C, Hou J, Kim SH. Fold prediction of helical proteins using torsion angle dynamics and predicted restraints. Proc Natl Acad Sci U S A 2002; 99:3581-5. [PMID: 11904420 PMCID: PMC122566 DOI: 10.1073/pnas.052003799] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.5] [Indexed: 11/18/2022] Open
Abstract
We describe a procedure for predicting the tertiary folds of alpha-helical proteins from their primary sequences. The central component of the procedure is a method for predicting interhelical contacts that is based on a helix-packing model. Instead of predicting the individual contacts, our method attempts to identify the entire patch of contacts that involve residues regularly spaced in the sequences. We use this component to glue together two powerful existing methods: a secondary structure prediction program, whose output serves as the input to the contact prediction algorithm, and the torsion angle dynamics program, which uses the predicted tertiary contacts and secondary structural states to assemble three-dimensional structures. In the final step, the procedure uses the initial set of simulated structures to refine the predicted contacts for a new round of structure calculation. When tested against 24 small to medium-sized proteins representing a wide range of helical folds, the completely automated procedure is able to consistently generate native-like models within a limited number of trials.
Affiliation(s)
- Chao Zhang
- Department of Chemistry and E. O. Lawrence Berkeley National Laboratory, University of California, Berkeley, CA 94720, USA
|
34
|
Visiers I, Ballesteros JA, Weinstein H. Three-dimensional representations of G protein-coupled receptor structures and mechanisms. Methods Enzymol 2002; 343:329-71. [PMID: 11665578 DOI: 10.1016/s0076-6879(02)43145-x] [Citation(s) in RCA: 151] [Impact Index Per Article: 6.9] [Indexed: 12/24/2022]
Affiliation(s)
- Irache Visiers
- Department of Physiology and Biophysics, Mount Sinai School of Medicine, New York, New York 10029, USA
|
35
|
de la Cruz X, Sillitoe I, Orengo C. Use of structure comparison methods for the refinement of protein structure predictions. I. Identifying the structural family of a protein from low-resolution models. Proteins 2002; 46:72-84. [PMID: 11746704 DOI: 10.1002/prot.10002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]
Abstract
Predicting the three-dimensional structure of proteins is still one of the most challenging problems in molecular biology. Despite its difficulty, several investigators have started to produce consistently low-resolution predictions for small proteins. However, in most of these cases, the prediction accuracy is still too low to make them useful. In the present article, we address the problem of obtaining better-quality predictions, starting from low-resolution models. To this end, we have devised a new procedure that uses these models, together with structure comparison methods, to identify the structural family of the target protein. This would allow, in a second step not described in the present work, to refine the predictions using conserved features of the identified family. In our approach, the structure database is investigated using predictions, at different accuracy levels, for a given protein. As query structures, we used both low-resolution versions of the native structures, as well as different sets of low accuracy predictions. In general, we found that for predictions with a resolution of ≥5-7 Å, structure comparison methods were able to identify the fold of a protein in the top positions.
Affiliation(s)
- Xavier de la Cruz
- Departamento de Bioquímica y Biología Molecular, Facultad de Químicas, Universidad de Barcelona, Barcelona, Spain.
|
36
|
Kihara D, Lu H, Kolinski A, Skolnick J. TOUCHSTONE: an ab initio protein structure prediction method that uses threading-based tertiary restraints. Proc Natl Acad Sci U S A 2001; 98:10125-30. [PMID: 11504922 PMCID: PMC56926 DOI: 10.1073/pnas.181328398] [Citation(s) in RCA: 100] [Impact Index Per Article: 4.3] [Received: 05/09/2001] [Accepted: 06/28/2001] [Indexed: 11/18/2022] Open
Abstract
The successful prediction of protein structure from amino acid sequence requires two features: an efficient conformational search algorithm and an energy function with a global minimum in the native state. As a step toward addressing both issues, a threading-based method of secondary and tertiary restraint prediction has been developed and applied to ab initio folding. Such restraints are derived by extracting consensus contacts and local secondary structure from at least weakly scoring structures that, in some cases, can lack any global similarity to the sequence of interest. Furthermore, to generate representative protein structures, a reduced lattice-based protein model is used with replica exchange Monte Carlo to explore conformational space. We report results on the application of this methodology, termed TOUCHSTONE, to 65 proteins whose lengths range from 39 to 146 residues. For 47 (40) proteins, a cluster centroid whose rms deviation from native is below 6.5 (5) Å is found in one of the five lowest energy centroids. The number of correctly predicted proteins increases to 50 when atomic detail is added and a knowledge-based atomic potential is combined with clustered and nonclustered structures for candidate selection. The combination of the ratio of the relative number of contacts to the protein length and the number of clusters generated by the folding algorithm is a reliable indicator of the likelihood of successful fold prediction, thereby opening the way for genome-scale ab initio folding.
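Replica exchange Monte Carlo, named in this abstract as the conformational search, runs copies of the system at several temperatures and periodically tries to swap neighbors. A minimal Python sketch of the standard Metropolis swap criterion follows. It is a generic illustration, not the TOUCHSTONE code: the function names, the reduced units (`k_b = 1`), and the even/odd pairing scheme are assumptions.

```python
import math
import random


def swap_probability(energy_i, energy_j, temp_i, temp_j, k_b=1.0):
    """Metropolis probability of exchanging two replicas.

    p = min(1, exp((beta_i - beta_j) * (E_i - E_j))), beta = 1 / (k_b * T).
    """
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swaps(energies, temperatures, rng, offset=0):
    """Try to exchange neighboring temperature slots.

    energies[c] is the current energy of configuration c, which starts in
    temperature slot c. Returns a list `order` where order[k] is the index
    of the configuration occupying temperatures[k] after the swap attempts.
    Alternating offset between 0 and 1 on successive calls pairs up
    different neighbors.
    """
    order = list(range(len(energies)))
    for k in range(offset, len(order) - 1, 2):
        a, b = order[k], order[k + 1]
        p = swap_probability(
            energies[a], energies[b], temperatures[k], temperatures[k + 1]
        )
        if rng.random() < p:
            order[k], order[k + 1] = b, a
    return order
```

A swap that moves the lower-energy configuration to the colder temperature is always accepted; the reverse move is accepted with an exponentially small probability, which lets high-temperature replicas cross barriers while low-temperature replicas refine.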
Affiliation(s)
- D Kihara
- Laboratory of Computational Genomics, Donald Danforth Plant Science Center, 893 North Warson Road, St. Louis, MO 63141, USA
|
37
|
Bonneau R, Baker D. Ab initio protein structure prediction: progress and prospects. Annu Rev Biophys Biomol Struct 2001; 30:173-89. [PMID: 11340057 DOI: 10.1146/annurev.biophys.30.1.173] [Citation(s) in RCA: 226] [Impact Index Per Article: 9.8] [Indexed: 11/09/2022]
Abstract
Considerable recent progress has been made in the field of ab initio protein structure prediction, as witnessed by the third Critical Assessment of Structure Prediction (CASP3). In spite of this progress, much work remains, for the field has yet to produce consistently reliable ab initio structure prediction protocols. In this work, we review the features of current ab initio protocols in an attempt to highlight the foundations of recent progress in the field and suggest promising directions for future work.
Affiliation(s)
- R Bonneau
- Department of Biochemistry, University of Washington, Seattle, Washington, Box 357350, 98195, USA.
|
38
|
Hassinen T, Peräkylä M. New energy terms for reduced protein models implemented in an off-lattice force field. J Comput Chem 2001. [DOI: 10.1002/jcc.1080] [Citation(s) in RCA: 106] [Impact Index Per Article: 4.6] [Indexed: 11/12/2022]
|
39
|
Abstract
The cooperative folding of proteins implies a description by multibody potentials. Such multibody potentials can be generalized from common two-body statistical potentials through a relation to probability distributions of residue clusters via the Boltzmann condition. In this exploratory study, we compare a four-body statistical potential, defined by the Delaunay tessellation of protein structures, to the Miyazawa-Jernigan (MJ) potential for protein structure prediction, using a lattice chain growth algorithm. We use the four-body potential as a discriminatory function for conformational ensembles generated with the MJ potential and examine performance on a set of 22 proteins of 30-76 residues in length. We find that the four-body potential yields comparable results to the two-body MJ potential, namely, an average coordinate root-mean-square deviation (cRMSD) value of 8 Å for the lowest energy configurations of all-alpha proteins, and somewhat poorer cRMSD values for other protein classes. For both two and four-body potentials, superpositions of some predicted and native structures show a rough overall agreement. Formulating the four-body potential using larger data sets and direct, but costly, generation of conformational ensembles with multibody potentials may offer further improvements. Proteins 2001;43:161-174.
Affiliation(s)
- H H Gan
- Department of Chemistry and Courant Institute of Mathematical Sciences, New York University and the Howard Hughes Medical Institute, 251 Mercer Street, New York, NY 10012, USA
|
40
|
Abstract
The location of protein subunits that form early during folding, constituted of consecutive secondary structure elements with some intrinsic stability and favorable tertiary interactions, is predicted using a combination of threading algorithms and local structure prediction methods. Two folding units are selected among the candidates identified in a database of known protein structures: the fragment 15-55 of 434 cro, an all-alpha protein, and the fragment 1-35 of ubiquitin, an alpha/beta protein. These units are further analyzed by means of Monte Carlo simulated annealing using several database-derived potentials describing different types of interactions. Our results suggest that the local interactions along the chain dominate in the first folding steps of both fragments, and that the formation of some of the secondary structures necessarily occurs before structure compaction. These findings led us to define a prediction protocol that efficiently improves the accuracy of the predicted structures. It involves a first simulation with a local interaction potential only, whose final conformation is used as a starting structure of a second simulation that uses a combination of local interaction and distance potentials. The root mean square deviations between the coordinates of predicted and native structures are as low as 2-4 Å in most trials. The possibility of extending this protocol to the prediction of full proteins is discussed. Proteins 2001;42:164-176.
Affiliation(s)
- D Gilis
- Ingénierie Biomoléculaire, Université Libre de Bruxelles, Bruxelles, Belgium.
|
41
|
Abstract
We present a novel technique of sampling the configurations of helical proteins. Assuming knowledge of native secondary structure, we employ assembly rules gathered from a database of existing structures to enumerate the geometrically possible three-dimensional arrangements of the constituent helices. We produce a library of possible folds for 25 helical protein cores. In each case, our method finds significant numbers of conformations close to the native structure. In addition, we assign coordinates to all atoms for four of the 25 proteins and show that this has a small effect on the number of near-native conformations. In the context of database-driven exhaustive enumeration our method performs extremely well, yielding significant percentages of conformations (between 0.02% and 82%) within 6 Å of the native structure. The method's speed and efficiency make it a valuable tool for predicting protein structure.
Affiliation(s)
- B Fain
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA
|
42
|
|
43
|
Abstract
We propose a coarse-grained model of proteins that takes into account solvent effects and apply it for simulating folding of a three-helix-bundle protein. The energy functional form, refined from our previous work (Takada et al., J Chem Phys 1999;110:11616-11629), tries to closely imitate real physico-chemical interactions. In particular, the hydrogen bond that depends on local dielectric constant, the helix capping effect, and side-chain entropic effects are included. With use of the model, we simulate folding of the GA module of an albumin binding domain, 1prb(7-53), finding that most trajectories reach the native topology within 1 μs. In the simulation, helices 1 and 3 are mostly formed earlier, accompanied by non-specific collapse, while the second helix is intrinsically less stable and is formed with the help of tertiary contacts at a later stage. We compute an analog of the transition state ensemble and compare it with those of other three-helix-bundle proteins. The transition state of 1prb(7-53) includes a few specific tertiary contacts of the C terminus of helix 3 with the loop region between helices 1 and 2. This resembles, but is not equivalent to, an early formed region of fragment B of staphylococcal protein A, but is quite different from the folding transient structures of a de novo designed three-helix-bundle peptide.
Affiliation(s)
- S Takada
- Department of Chemistry, Faculty of Science, Kobe University, Rokkodai Nada, Kobe, Japan.
|
44
|
Wilder J, Shakhnovich EI. Proteins with selected sequences: a heteropolymeric study. Phys Rev E Stat Phys Plasmas Fluids Relat Interdiscip Topics 2000; 62:7100-10. [PMID: 11102067 DOI: 10.1103/physreve.62.7100] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Received: 05/05/2000] [Indexed: 11/07/2022]
Abstract
Protein sequences are expected not to be random but selected in order to form a stable native structure that is kinetically accessible. Therefore our model contains a selective temperature in sequence space (see [S. Ramanathan and E. Shakhnovich, Phys. Rev. E 50, 1303 (1994)] ) to optimize the sequence for the target conformation statistically. Replica calculations, which go beyond quadratic approximations in the field-theoretical Hamiltonian, are presented. A phase diagram indicating the temperatures and selective temperatures at which transitions to a frozen globule, i.e., the native state, occur is obtained. It is shown that going beyond the quadratic approximation in the field Hamiltonian is very important, since it results in a significant change of the phase diagram. Moreover, we suggest that a one-step replica permutation symmetry scheme is sufficient to solve the model. In addition to this we present a result for the sequence correlation function along the chain in the case of a short-ranged potential between the monomers. A correlation function between monomers that form a contact in the native state is given depending on the temperature and the interaction parameter.
Affiliation(s)
- J Wilder
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, Massachusetts 02138, USA.
|
45
|
Simmerling C, Lee MR, Ortiz AR, Kolinski A, Skolnick J, Kollman PA. Combining MONSSTER and LES/PME to Predict Protein Structure from Amino Acid Sequence: Application to the Small Protein CMTI-1. J Am Chem Soc 2000. [DOI: 10.1021/ja993119k] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Affiliation(s)
- Carlos Simmerling, Matthew R. Lee, Angel R. Ortiz, Andrzej Kolinski, Jeffrey Skolnick, Peter A. Kollman
- Contribution from the Department of Pharmaceutical Chemistry, University of California, 513 Parnassus, San Francisco, California 94143-0446, Department of Molecular Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037
|
46
|
Xia Y, Huang ES, Levitt M, Samudrala R. Ab initio construction of protein tertiary structures using a hierarchical approach. J Mol Biol 2000; 300:171-85. [PMID: 10864507 DOI: 10.1006/jmbi.2000.3835] [Citation(s) in RCA: 141] [Impact Index Per Article: 5.9] [Indexed: 11/22/2022]
Abstract
We present a hierarchical method to predict protein tertiary structure models from sequence. We start with complete enumeration of conformations using a simple tetrahedral lattice model. We then build conformations with increasing detail, and at each step select a subset of conformations using empirical energy functions with increasing complexity. After enumeration on lattice, we select a subset of low energy conformations using a statistical residue-residue contact energy function, and generate all-atom models using predicted secondary structure. A combined knowledge-based atomic level energy function is then used to select subsets of the all-atom models. The final predictions are generated using a consensus distance geometry procedure. We test the feasibility of the procedure on a set of 12 small proteins covering a wide range of protein topologies. A rigorous double-blind test of our method was made under the auspices of the CASP3 experiment, where we did ab initio structure predictions for 12 proteins using this approach. The performance of our methodology at CASP3 is reasonably good and completely consistent with our initial tests.
Affiliation(s)
- Y Xia
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA, 94305, USA
|
47
|
Skolnick J, Fetrow JS. From genes to protein structure and function: novel applications of computational approaches in the genomic era. Trends Biotechnol 2000; 18:34-9. [PMID: 10631780 DOI: 10.1016/s0167-7799(99)01398-0] [Citation(s) in RCA: 92] [Impact Index Per Article: 3.8] [Indexed: 11/20/2022]
Abstract
The genome-sequencing projects are providing a detailed 'parts list' of life. A key to comprehending this list is understanding the function of each gene and each protein at various levels. Sequence-based methods for function prediction are inadequate because of the multifunctional nature of proteins. However, just knowing the structure of the protein is also insufficient for prediction of multiple functional sites. Structural descriptors for protein functional sites are crucial for unlocking the secrets in both the sequence and structural-genomics projects.
Affiliation(s)
- J Skolnick
- Danforth Plant Science Center, Laboratory of Computational Genomics, St Louis, MO 63108, USA.
|
48
|
Gan HH, Tropsha A, Schlick T. Generating folded protein structures with a lattice chain growth algorithm. J Chem Phys 2000. [DOI: 10.1063/1.1289822] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.5] [Indexed: 11/14/2022] Open
|
49
|
|
50
|
Olmea O, Rost B, Valencia A. Effective use of sequence correlation and conservation in fold recognition. J Mol Biol 1999; 293:1221-39. [PMID: 10547297 DOI: 10.1006/jmbi.1999.3208] [Citation(s) in RCA: 131] [Impact Index Per Article: 5.2] [Indexed: 11/22/2022]
Abstract
Protein families are a rich source of information; sequence conservation and sequence correlation are two of the main properties that can be derived from the analysis of multiple sequence alignments. Sequence conservation is related to the direct evolutionary pressure to retain the chemical characteristics of some positions in order to maintain a given function. Sequence correlation is attributed to the small sequence adjustments needed to maintain protein stability against constant mutational drift. Here, we showed that sequence conservation and correlation were each frequently informative enough to detect incorrectly folded proteins. Furthermore, combining conservation, correlation, and polarity, we achieved an almost perfect discrimination between native and incorrectly folded proteins. Thus, we made use of this information for threading by evaluating the models suggested by a threading method according to the degree of proximity of the corresponding correlated, conserved, and apolar residues. The results showed that the fold recognition capacity of a given threading approach could be improved almost fourfold by selecting the alignments that score best under the three different sequence-based approaches.
Affiliation(s)
- O Olmea
- Protein Design Group, CNB-CSIC, Cantoblanco, Madrid, E-28049, Spain
|