1
Bæk KT, Kepp KP. Assessment of AlphaFold2 for Human Proteins via Residue Solvent Exposure. J Chem Inf Model 2022; 62:3391-3400. [PMID: 35785970 DOI: 10.1021/acs.jcim.2c00243] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 11/29/2022]
Abstract
As only 35% of human proteins feature (often partial) PDB structures, the protein structure prediction tool AlphaFold2 (AF2) could have massive impact on human biology and medicine fields, making independent benchmarks of interest. We studied AF2's ability to describe the backbone solvent exposure as a functionally important and easily interpretable "natural coordinate" of protein conformation, using human proteins as test case. After screening for appropriate comparative sets, we matched 1818 human proteins predicted by AF2 against 7585 unique experimental PDBs, and after curation for sequence overlap, we assessed 1264 comparative pairs comprising 115 unique AF2 structures and 652 unique experimental structures. AF2 performed markedly worse for multimers, whereas ligands, cofactors, and experimental resolution were interestingly not very important for performance. AF2 performed excellently for monomer proteins. Challenges relating to specific groups of residues and multimers were analyzed. We identified larger deviations for lower-confidence scores (pLDDT), and exposed residues and polar residues (e.g., Asp, Glu, Asn) being less accurately described than hydrophobic residues. Proline conformations were the hardest to predict, probably due to a common location in dynamic solvent-accessible parts. In summary, using solvent exposure as a metric, we quantified the performance of AF2 for human proteins and provided estimates of the expected agreement as a function of ligand presence, multimer/monomer status, local residue solvent exposure, pLDDT, and amino acid type. Overall performance was found to be excellent.
Affiliation(s)
- Kristoffer T Bæk
- DTU Chemistry, Technical University of Denmark, Building 206, Kgs. Lyngby 2800, Denmark
- Kasper P Kepp
- DTU Chemistry, Technical University of Denmark, Building 206, Kgs. Lyngby 2800, Denmark
2
Skolnick J, Gao M, Zhou H, Singh S. AlphaFold 2: Why It Works and Its Implications for Understanding the Relationships of Protein Sequence, Structure, and Function. J Chem Inf Model 2021; 61:4827-4831. [PMID: 34586808 DOI: 10.1021/acs.jcim.1c01114] [Citation(s) in RCA: 81] [Impact Index Per Article: 27.0] [Indexed: 02/06/2023]
Abstract
AlphaFold 2 (AF2) was the star of CASP14, the last biennial structure prediction experiment. Using novel deep learning, AF2 predicted the structures of many difficult protein targets at or near experimental resolution. Here, we present our perspective of why AF2 works and show that it is a very sophisticated fold recognition algorithm that exploits the completeness of the library of single domain PDB structures. It has also learned local side chain packing rearrangements that enable it to refine proteins to high resolution. The benefits and limitations of its ability to predict the structures of many more proteins at or close to atomic detail are discussed.
Affiliation(s)
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Mu Gao
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biological Sciences, Georgia Institute of Technology, Atlanta, Georgia 30332, United States
- Suresh Singh
- Twilight Design, 4 Adams Road, Kendall Park, New Jersey 08824, United States
3
DESTINI: A deep-learning approach to contact-driven protein structure prediction. Sci Rep 2019; 9:3514. [PMID: 30837676 PMCID: PMC6401133 DOI: 10.1038/s41598-019-40314-1] [Citation(s) in RCA: 40] [Impact Index Per Article: 8.0] [Received: 11/01/2018] [Accepted: 02/12/2019] [Indexed: 11/09/2022] Open
Abstract
The amino acid sequence of a protein encodes the blueprint of its native structure. To predict the corresponding structural fold from the protein’s sequence is one of the most challenging problems in computational biology. In this work, we introduce DESTINI (deep structural inference for proteins), a novel computational approach that combines a deep-learning algorithm for protein residue/residue contact prediction with template-based structural modelling. For the first time, the significantly improved predictive ability is demonstrated in the large-scale tertiary structure prediction of over 1,200 single-domain proteins. DESTINI successfully predicts the tertiary structure of four times the number of “hard” targets (those with poor quality templates) that were previously intractable, viz, a “glass-ceiling” for previous template-based approaches, and also improves model quality for “easy” targets (those with good quality templates). The significantly better performance by DESTINI is largely due to the incorporation of better contact prediction into template modelling. To understand why deep-learning accomplishes more accurate contact prediction, systematic clustering reveals that deep-learning predicts coherent, native-like contact patterns compared to co-evolutionary analysis. Taken together, this work presents a promising strategy towards solving the protein structure prediction problem.
4
Karczyńska AS, Czaplewski C, Krupa P, Mozolewska MA, Joo K, Lee J, Liwo A. Ergodicity and model quality in template-restrained canonical and temperature/Hamiltonian replica exchange coarse-grained molecular dynamics simulations of proteins. J Comput Chem 2017; 38:2730-2746. [DOI: 10.1002/jcc.25070] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 04/15/2017] [Revised: 07/10/2017] [Accepted: 09/01/2017] [Indexed: 01/22/2023]
Affiliation(s)
- Agnieszka S. Karczyńska
- Faculty of Chemistry; University of Gdańsk, ul. Wita Stwosza 63; Gdańsk 80-308 Poland
- Center for In Silico Protein Science; Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu; Seoul 02455 Republic of Korea
- School of Computational Sciences; Korea Institute for Advanced Study, 85 Hoegiro Dongdaemun-gu; Seoul 02455 Republic of Korea
- Cezary Czaplewski
- Faculty of Chemistry; University of Gdańsk, ul. Wita Stwosza 63; Gdańsk 80-308 Poland
- Paweł Krupa
- Faculty of Chemistry; University of Gdańsk, ul. Wita Stwosza 63; Gdańsk 80-308 Poland
- Institute of Physics, Polish Academy of Sciences, Aleja Lotników 32/46; Warsaw PL 02668 Poland
- Magdalena A. Mozolewska
- Faculty of Chemistry; University of Gdańsk, ul. Wita Stwosza 63; Gdańsk 80-308 Poland
- Institute of Computer Science, Polish Academy of Sciences, ul. Jana Kazimierza 5; Warsaw 01-248 Poland
- Keehyoung Joo
- School of Computational Sciences; Korea Institute for Advanced Study, 85 Hoegiro Dongdaemun-gu; Seoul 02455 Republic of Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu; Seoul 02455 Republic of Korea
- Jooyoung Lee
- Center for In Silico Protein Science; Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu; Seoul 02455 Republic of Korea
- School of Computational Sciences; Korea Institute for Advanced Study, 85 Hoegiro Dongdaemun-gu; Seoul 02455 Republic of Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu; Seoul 02455 Republic of Korea
- Adam Liwo
- Faculty of Chemistry; University of Gdańsk, ul. Wita Stwosza 63; Gdańsk 80-308 Poland
- Center for In Silico Protein Science; Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu; Seoul 02455 Republic of Korea
- School of Computational Sciences; Korea Institute for Advanced Study, 85 Hoegiro Dongdaemun-gu; Seoul 02455 Republic of Korea
5
Mozolewska MA, Krupa P, Zaborowski B, Liwo A, Lee J, Joo K, Czaplewski C. Use of Restraints from Consensus Fragments of Multiple Server Models To Enhance Protein-Structure Prediction Capability of the UNRES Force Field. J Chem Inf Model 2016; 56:2263-2279. [DOI: 10.1021/acs.jcim.6b00189] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Indexed: 12/11/2022]
Affiliation(s)
- Paweł Krupa
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
- Adam Liwo
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
- Center for In Silico Protein Structure and School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
- Jooyoung Lee
- Center for In Silico Protein Structure and School of Computational Sciences, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
- Keehyoung Joo
- Center for Advanced Computation, Korea Institute for Advanced Study, 85 Hoegiro, Dongdaemun-gu, Seoul 130-722, Republic of Korea
- Cezary Czaplewski
- Faculty of Chemistry, University of Gdańsk, Wita Stwosza 63, 80-308 Gdańsk, Poland
6
Hoque MT, Yang Y, Mishra A, Zhou Y. sDFIRE: Sequence-specific statistical energy function for protein structure prediction by decoy selections. J Comput Chem 2016; 37:1119-24. [DOI: 10.1002/jcc.24298] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 10/17/2015] [Revised: 12/06/2015] [Accepted: 12/13/2015] [Indexed: 12/15/2022]
Affiliation(s)
- Md Tamjidul Hoque
- Computer Science, University of New Orleans, New Orleans, Louisiana 70148
- Yuedong Yang
- Institute for Glycomics and School of Informatics and Communication Technology, Griffith University, Queensland 4222, Australia
- Avdesh Mishra
- Computer Science, University of New Orleans, New Orleans, Louisiana 70148
- Yaoqi Zhou
- Institute for Glycomics and School of Informatics and Communication Technology, Griffith University, Queensland 4222, Australia
7
Kurcinski M, Jamroz M, Blaszczyk M, Kolinski A, Kmiecik S. CABS-dock web server for the flexible docking of peptides to proteins without prior knowledge of the binding site. Nucleic Acids Res 2015; 43:W419-24. [PMID: 25943545 PMCID: PMC4489223 DOI: 10.1093/nar/gkv456] [Citation(s) in RCA: 265] [Impact Index Per Article: 29.4] [Received: 03/05/2015] [Accepted: 04/24/2015] [Indexed: 01/15/2023] Open
Abstract
Protein–peptide interactions play a key role in cell functions. Their structural characterization, though challenging, is important for the discovery of new drugs. The CABS-dock web server provides an interface for modeling protein–peptide interactions using a highly efficient protocol for the flexible docking of peptides to proteins. While other docking algorithms require pre-defined localization of the binding site, CABS-dock does not require such knowledge. Given a protein receptor structure and a peptide sequence (and starting from random conformations and positions of the peptide), CABS-dock performs a simulation search for the binding site, allowing for full flexibility of the peptide and small fluctuations of the receptor backbone. This protocol was extensively tested over the largest dataset of non-redundant protein–peptide interactions available to date (including bound and unbound docking cases). For over 80% of bound and unbound dataset cases, we obtained models with high or medium accuracy (sufficient for practical applications). Additionally, as optional features, CABS-dock can exclude user-selected binding modes from the docking search or increase the level of flexibility for chosen receptor fragments. CABS-dock is freely available as a web server at http://biocomp.chem.uw.edu.pl/CABSdock.
Affiliation(s)
- Mateusz Kurcinski
- Department of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Michal Jamroz
- Department of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Maciej Blaszczyk
- Department of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Andrzej Kolinski
- Department of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Sebastian Kmiecik
- Department of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
8
Marks DS, Colwell LJ, Sheridan R, Hopf TA, Pagnani A, Zecchina R, Sander C. Protein 3D structure computed from evolutionary sequence variation. PLoS One 2011; 6:e28766. [PMID: 22163331 PMCID: PMC3233603 DOI: 10.1371/journal.pone.0028766] [Citation(s) in RCA: 731] [Impact Index Per Article: 56.2] [Received: 11/10/2011] [Accepted: 11/14/2011] [Indexed: 11/19/2022] Open
Abstract
The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7–4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold, details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of protein structures, new strategies in protein and drug design, and the identification of functional genetic variants in normal and disease genomes.
Affiliation(s)
- Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, United States of America.
9
Zhou Y, Duan Y, Yang Y, Faraggi E, Lei H. Trends in template/fragment-free protein structure prediction. Theor Chem Acc 2011; 128:3-16. [PMID: 21423322 PMCID: PMC3030773 DOI: 10.1007/s00214-010-0799-2] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Received: 05/17/2010] [Accepted: 08/15/2010] [Indexed: 12/13/2022]
Abstract
Predicting the structure of a protein from its amino acid sequence is a long-standing unsolved problem in computational biology. Its solution would be of both fundamental and practical importance as the gap between the number of known sequences and the number of experimentally solved structures widens rapidly. Currently, the most successful approaches are based on fragment/template reassembly. Lacking progress in template-free structure prediction calls for novel ideas and approaches. This article reviews trends in the development of physical and specific knowledge-based energy functions as well as sampling techniques for fragment-free structure prediction. Recent physical- and knowledge-based studies demonstrated that it is possible to sample and predict highly accurate protein structures without borrowing native fragments from known protein structures. These emerging approaches with fully flexible sampling have the potential to move the field forward.
Affiliation(s)
- Yaoqi Zhou
- School of Informatics, Indiana Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, 719 Indiana Ave #319, Walker Plaza Building, Indianapolis, IN 46202 USA
- Yong Duan
- UC Davis Genome Center and Department of Applied Science, University of California, One Shields Avenue, Davis, CA USA
- College of Physics, Huazhong University of Science and Technology, 1037 Luoyu Road, 430074 Wuhan, China
- Yuedong Yang
- School of Informatics, Indiana Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, 719 Indiana Ave #319, Walker Plaza Building, Indianapolis, IN 46202 USA
- Eshel Faraggi
- School of Informatics, Indiana Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, 719 Indiana Ave #319, Walker Plaza Building, Indianapolis, IN 46202 USA
- Hongxing Lei
- UC Davis Genome Center and Department of Applied Science, University of California, One Shields Avenue, Davis, CA USA
- Beijing Institute of Genomics, Chinese Academy of Sciences, 100029 Beijing, China
10
Rajgaria R, Wei Y, Floudas CA. Contact prediction for beta and alpha-beta proteins using integer linear optimization and its impact on the first principles 3D structure prediction method ASTRO-FOLD. Proteins 2010; 78:1825-46. [PMID: 20225257 PMCID: PMC2858251 DOI: 10.1002/prot.22696] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Indexed: 12/24/2022]
Abstract
An integer linear optimization model is presented to predict residue contacts in beta, alpha + beta, and alpha/beta proteins. The total energy of a protein is expressed as the sum of a Cα-Cα distance-dependent contact energy contribution and a hydrophobic contribution. The model selects contacts that assign the lowest energy to the protein structure while satisfying a set of constraints that are included to enforce certain physically observed topological information. A new method based on hydrophobicity is proposed to find the beta-sheet alignments. These beta-sheet alignments are used as constraints for contacts between residues of beta-sheets. This model was tested on three independent protein test sets and CASP8 test proteins consisting of beta, alpha + beta, and alpha/beta proteins, and it was found to perform very well. The average accuracy of the predictions (separated by at least six residues) was approximately 61%. The average true positive and false positive distances were also calculated for each of the test sets and they are 7.58 Å and 15.88 Å, respectively. Residue contact prediction can be directly used to facilitate protein tertiary structure prediction. The proposed residue contact prediction model is incorporated into the first principles protein tertiary structure prediction approach, ASTRO-FOLD. The effectiveness of the contact prediction model was further demonstrated by the improvement in the quality of the protein structure ensemble generated using the predicted residue contacts for a test set of 10 proteins.
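The evaluation quantities quoted in this abstract (prediction accuracy for pairs at least six residues apart, and mean true-positive and false-positive distances) can be illustrated with a minimal sketch. It is not the authors' code: the function name `evaluate_contacts`, the toy coordinates, and the distance cutoff are assumptions made for illustration, since the abstract does not state the contact threshold that was used.

```python
import math

def evaluate_contacts(coords, predicted, cutoff, min_separation=6):
    """Score predicted residue contacts against known C-alpha coordinates.

    coords: list of (x, y, z) C-alpha positions, one per residue.
    predicted: iterable of (i, j) residue index pairs predicted to be in contact.
    cutoff: distance threshold in Angstrom (hypothetical; not given in the abstract).
    """
    tp, fp = [], []
    for i, j in predicted:
        # Only pairs separated by at least `min_separation` residues are scored.
        if abs(i - j) < min_separation:
            continue
        d = math.dist(coords[i], coords[j])
        (tp if d <= cutoff else fp).append(d)

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    n = len(tp) + len(fp)
    return {
        "accuracy": len(tp) / n if n else float("nan"),
        "mean_tp": mean(tp),
        "mean_fp": mean(fp),
    }

# Toy chain: ten residues on a straight line, 2 Angstrom apart (illustrative only).
coords = [(2.0 * k, 0.0, 0.0) for k in range(10)]
predicted = [(0, 6), (1, 9), (2, 3), (0, 9)]  # (2, 3) is skipped: separation < 6
result = evaluate_contacts(coords, predicted, cutoff=13.0)
print(result)
```

In this toy case one of the three scored pairs falls within the cutoff, so the accuracy is one third; real evaluations would use coordinates from an experimental structure.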
Affiliation(s)
- R. Rajgaria
- Department of Chemical Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
- Y. Wei
- Department of Chemical Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
- C. A. Floudas
- Department of Chemical Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
11
Faraggi E, Yang Y, Zhang S, Zhou Y. Predicting continuous local structure and the effect of its substitution for secondary structure in fragment-free protein structure prediction. Structure 2010; 17:1515-27. [PMID: 19913486 DOI: 10.1016/j.str.2009.09.006] [Citation(s) in RCA: 91] [Impact Index Per Article: 6.5] [Received: 04/15/2009] [Revised: 09/01/2009] [Accepted: 09/03/2009] [Indexed: 11/30/2022]
Abstract
Local structures predicted from protein sequences are used extensively in every aspect of modeling and prediction of protein structure and function. For more than 50 years, they have been predicted at a low-resolution coarse-grained level (e.g., three-state secondary structure). Here, we combine a two-state classifier with a real-value predictor to predict local structure in a continuous representation by backbone torsion angles. The accuracy of the angles predicted by this approach is close to that derived from NMR chemical shifts. Their substitution for predicted secondary structure as restraints for ab initio structure prediction doubles the success rate. This result demonstrates the potential of predicted local structure for fragment-free tertiary-structure prediction. It further implies potentially significant benefits from using predicted real-valued torsion angles as a replacement for or supplement to the secondary-structure prediction tools used almost exclusively in many computational methods ranging from sequence alignment to function prediction.
Affiliation(s)
- Eshel Faraggi
- Indiana University School of Informatics, Indiana University-Purdue University and Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
12
Abia D, Bastolla U, Chacón P, Fábrega C, Gago F, Morreale A, Tramontano A. In memoriam. Proteins 2010; 78:iii-viii. [DOI: 10.1002/prot.22660] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
13
Aloy P, Oliva B. Splitting statistical potentials into meaningful scoring functions: testing the prediction of near-native structures from decoy conformations. BMC Struct Biol 2009; 9:71. [PMID: 19917096 PMCID: PMC2783033 DOI: 10.1186/1472-6807-9-71] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 03/24/2009] [Accepted: 11/16/2009] [Indexed: 11/20/2022]
Abstract
Background: Recent advances in high-throughput technologies have produced a vast number of protein sequences, while the number of high-resolution structures has seen a limited increase. This has impelled the production of many strategies to build protein structures from their sequences, generating a considerable number of alternative models. The selection of the closest model to the native conformation has thus become crucial for structure prediction. Several methods have been developed to score protein models by energies, knowledge-based potentials, and combinations of both. Results: Here, we present and demonstrate a theory to split the knowledge-based potentials into biologically meaningful scoring terms and to combine them in new scores to predict near-native structures. Our strategy allows circumventing the problem of defining the reference state. In this approach we give the proof for a simple and linear application that can be further improved by optimizing the combination of Zscores. Using the simplest composite score () we obtained predictions similar to state-of-the-art methods. Besides, our approach has the advantage of identifying the most relevant terms involved in the stability of the protein structure. Finally, we also use the composite Zscores to assess the conformation of models and to detect local errors. Conclusion: We have introduced a method to split knowledge-based potentials and to solve the problem of defining a reference state. The new scores have detected near-native structures as accurately as state-of-the-art methods and have successfully identified wrongly modeled regions of many near-native conformations.
Affiliation(s)
- Patrick Aloy
- Institut de Recerca Biomèdica and Barcelona Supercomputing Center, 10-12 08028 Barcelona, Catalonia, Spain.
14
Reconstruction and stability of secondary structure elements in the context of protein structure prediction. Biophys J 2009; 96:4399-408. [PMID: 19486664 DOI: 10.1016/j.bpj.2009.02.057] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 06/09/2008] [Revised: 01/28/2009] [Accepted: 02/19/2009] [Indexed: 11/20/2022] Open
Abstract
Efficient and accurate reconstruction of secondary structure elements in the context of protein structure prediction is the major focus of this work. We present a novel approach capable of reconstructing alpha-helices and beta-sheets in atomic detail. The method is based on Metropolis Monte Carlo simulations in a force field of empirical potentials that are designed to stabilize secondary structure elements in room-temperature simulations. Particular attention is paid to lateral side-chain interactions in beta-sheets and between the turns of alpha-helices, as well as backbone hydrogen bonding. The force constants are optimized using contrastive divergence, a novel machine learning technique, from a data set of known structures. Using this approach, we demonstrate the applicability of the framework to the problem of reconstructing the overall protein fold for a number of commonly studied small proteins, based on only predicted secondary structure and contact map. For protein G and chymotrypsin inhibitor 2, we are able to reconstruct the secondary structure elements in atomic detail and the overall protein folds with a root mean-square deviation of <10 Å. For cold-shock protein and the SH3 domain, we accurately reproduce the secondary structure elements and the topology of the 5-stranded beta-sheets, but not the barrel structure. The importance of high-quality secondary structure and contact map prediction is discussed.
15
Rajgaria R, McAllister SR, Floudas CA. Towards accurate residue-residue hydrophobic contact prediction for alpha helical proteins via integer linear optimization. Proteins 2009; 74:929-47. [PMID: 18767158 DOI: 10.1002/prot.22202] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 12/22/2022]
Abstract
A new optimization-based method is presented to predict the hydrophobic residue contacts in alpha-helical proteins. The proposed approach uses a high-resolution distance-dependent force field to calculate the interaction energy between different residues of a protein. The formulation predicts the hydrophobic contacts by minimizing the sum of these contact energies. These residue contacts are highly useful in narrowing down the conformational space searched by protein structure prediction algorithms. The proposed algorithm also offers the algorithmic advantage of producing a rank-ordered list of the best contact sets. This model was tested on four independent alpha-helical protein test sets and was found to perform very well. The average accuracy of the predictions (separated by at least six residues) obtained using the presented method was approximately 66% for single domain proteins. The average true positive and false positive distances were also calculated for each protein test set and they are 8.87 Å and 14.67 Å, respectively.
Affiliation(s)
- R Rajgaria
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA
16
Miller CS, Eisenberg D. Using inferred residue contacts to distinguish between correct and incorrect protein models. Bioinformatics 2008; 24:1575-82. [PMID: 18511466 PMCID: PMC2638260 DOI: 10.1093/bioinformatics/btn248] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Indexed: 11/17/2022]
Abstract
Motivation: The de novo prediction of 3D protein structure is enjoying a period of dramatic improvements. Often, a remaining difficulty is to select the model closest to the true structure from a group of low-energy candidates. To what extent can inter-residue contact predictions from multiple sequence alignments, information which is orthogonal to that used in most structure prediction algorithms, be used to identify those models most similar to the native protein structure? Results: We present a Bayesian inference procedure to identify residue pairs that are spatially proximal in a protein structure. The method takes as input a multiple sequence alignment, and outputs an accurate posterior probability of proximity for each residue pair. We exploit a recent metagenomic sequencing project to create large, diverse and informative multiple sequence alignments for a test set of 1656 known protein structures. The method infers spatially proximal residue pairs in this test set with good accuracy: top-ranked predictions achieve an average accuracy of 38% (for an average 21-fold improvement over random predictions) in cross-validation tests. Notably, the accuracy of predicted 3D models generated by a range of structure prediction algorithms strongly correlates with how well the models satisfy probable residue contacts inferred via our method. This correlation allows for confident rejection of incorrect structural models. Availability: An implementation of the method is freely available at http://www.doe-mbi.ucla.edu/services Contact: david@mbi.ucla.edu Supplementary information: Supplementary data are available at Bioinformatics online.
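As a rough reading of the two figures quoted in this abstract, a back-of-the-envelope calculation gives the random-prediction accuracy they imply. This is only a sketch: both numbers are reported as averages over proteins, so dividing one by the other is an approximation, and the function name `implied_random_accuracy` is introduced here purely for illustration.

```python
def implied_random_accuracy(accuracy, fold_improvement):
    """Approximate random-prediction accuracy implied by a reported accuracy
    and its fold improvement over random (assumes the averages can be divided)."""
    return accuracy / fold_improvement

# Figures taken from the abstract: 38% accuracy, 21-fold improvement over random.
random_baseline = implied_random_accuracy(0.38, 21)
print(f"{random_baseline:.3f}")  # prints 0.018
```

Under that approximation, a randomly chosen residue pair would be spatially proximal a little under 2% of the time, which conveys how sparse true contacts are among all residue pairs.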
Affiliation(s)
- Christopher S Miller
- UCLA-DOE Institute for Genomics & Proteomics, Molecular Biology Institute, Box 951570, UCLA, Los Angeles, CA 90095, USA
17
Vicatos S, Kaznessis YN. Separating true positive predicted residue contacts from false positive ones in mainly alpha proteins, using constrained Metropolis MC simulations. Proteins 2008; 70:539-52. [PMID: 17879348 DOI: 10.1002/prot.21553] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/10/2022]
Abstract
We present a method that significantly improves the accuracy of predicted proximal residue pairs in protein molecules. Computational methods for predicting pairs of amino acids that are distant in the protein sequence but close in the protein 3D structure can benefit attempts to recognize the fold of a protein molecule in silico. Unfortunately, currently available methods suffer from low predictive accuracy. In this work, we use Monte Carlo simulations to fold protein molecules with proximal pair predictions used as additional energy constraints. To test our methods, we study molecules with known tertiary structures. With Monte Carlo, we generate ensembles of structures for each set of residue constraints. The distribution of the root mean square deviation of the folded structures from the known native structure reveals clear information about the accuracy of the constraint sets used. With recursive substitutions of constraints, false positive predictions are identified and filtered out and significant improvements in accuracy are observed.
Affiliation(s)
- Spyridon Vicatos
- Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, Minnesota 55455, USA
|
18
|
Fuchs A, Martin-Galiano AJ, Kalman M, Fleishman S, Ben-Tal N, Frishman D. Co-evolving residues in membrane proteins. Bioinformatics 2007; 23:3312-9. [DOI: 10.1093/bioinformatics/btm515] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.4] [Indexed: 11/13/2022] Open
|
19
|
Frenkel-Morgenstern M, Magid R, Eyal E, Pietrokovski S. Refining intra-protein contact prediction by graph analysis. BMC Bioinformatics 2007; 8 Suppl 5:S6. [PMID: 17570865 PMCID: PMC1892094 DOI: 10.1186/1471-2105-8-s5-s6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022] Open
Abstract
Background: Accurate prediction of intra-protein residue contacts from sequence information will allow the prediction of protein structures. Basic predictions of such specific contacts can be further refined by jointly analyzing predicted contacts, and by adding information on the relative positions of contacts in the protein primary sequence. Results: We introduce a method for graph analysis refinement of intra-protein contacts, termed GARP. Our previously presented intra-contact prediction method by means of pair-to-pair substitution matrix (P2PConPred) was used to test the GARP method. In our approach, the top contact predictions obtained by a basic prediction method were used as edges to create a weighted graph. The edges were scored by a mutual clustering coefficient that identifies highly connected graph regions, and by the density of edges between the sequence regions of the edge nodes. A test set of 57 proteins with known structures was used to determine contacts. GARP improves the accuracy of the P2PConPred basic prediction method in whole proteins from 12% to 18%. Conclusion: Using a simple approach we increased the contact prediction accuracy of a basic method by 1.5 times. Our graph approach is simple to implement, can be used with various basic prediction methods, and can provide input for further downstream analyses.
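The abstract does not give the formula for the mutual clustering coefficient, so the following is only a sketch of one common variant, the Jaccard index of the two endpoint neighbourhoods. The function names and the toy graph are hypothetical and are not taken from GARP.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (u, v) edge pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def jaccard_mutual_clustering(adj, u, v):
    """Shared neighbours of u and v divided by all their neighbours.

    The endpoints themselves are excluded. This Jaccard form is an assumed
    stand-in for the coefficient mentioned in the abstract.
    """
    nu = adj[u] - {v}
    nv = adj[v] - {u}
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


# Toy graph: a densely connected region (1-4) and an isolated edge (5, 6).
graph = build_graph([(1, 2), (1, 3), (2, 3), (2, 4), (1, 4), (5, 6)])
dense = jaccard_mutual_clustering(graph, 1, 2)
partial = jaccard_mutual_clustering(graph, 1, 3)
isolated = jaccard_mutual_clustering(graph, 5, 6)
print(dense, partial, isolated)
```

Edges inside highly connected regions receive high scores, while an isolated predicted contact scores zero, which is the qualitative behaviour the abstract describes.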
Affiliation(s)
- Rachel Magid
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel
- Eran Eyal
- Department of Computational Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Shmuel Pietrokovski
- Department of Molecular Genetics, Weizmann Institute of Science, Rehovot, 76100, Israel
|
20
|
Eyal E, Frenkel-Morgenstern M, Sobolev V, Pietrokovski S. A pair-to-pair amino acids substitution matrix and its applications for protein structure prediction. Proteins 2007; 67:142-53. [PMID: 17243158 DOI: 10.1002/prot.21223] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Indexed: 11/11/2022]
Abstract
We present a new structurally derived pair-to-pair substitution matrix (P2PMAT). This matrix is constructed from a very large amount of integrated high quality multiple sequence alignments (Blocks) and protein structures. It evaluates the likelihoods of all 160,000 pair-to-pair substitutions. The P2PMAT matrix implicitly accounts for evolutionary conservation, correlated mutations, and residue-residue contact potentials. The usefulness of the matrix for structural predictions is shown in this article. Predicting protein residue-residue contacts from sequence information alone by our method (P2PConPred) is particularly accurate in the protein cores, where it performs better than other basic contact prediction methods (increasing accuracy by 25-60%). The method's mean accuracy for protein cores is 24% for 59 diverse families and 34% for a subset of proteins shorter than 100 residues. This is above the level that was recently shown to be sufficient to significantly improve ab initio protein structure prediction. We also demonstrate the ability of our approach to identify native structures within large sets of (300-2000) protein decoys. On the basis of evolutionary information alone our method ranks the native structure in the top 0.3% of the decoys in 4/10 of the sets, and in 8/10 of sets the native structure is ranked in the top 10% of the decoys. The method can, thus, be used to assist filtering wrong models, complementing traditional scoring functions.
Affiliation(s)
- Eran Eyal
- Department of Plant Sciences, Weizmann Institute of Science, Rehovot 76100, Israel.
|
21
|
Skolnick J, Kolinski A. Monte Carlo Approaches to the Protein Folding Problem. Advances in Chemical Physics 2007. [DOI: 10.1002/9780470141649.ch7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/27/2023]
|
22
|
Liu D, Xiong X, DasGupta B, Zhang H. Motif discoveries in unaligned molecular sequences using self-organizing neural networks. IEEE Transactions on Neural Networks 2006; 17:919-928. [PMID: 16856655 DOI: 10.1109/tnn.2006.875987] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.2] [Indexed: 11/06/2022]
Abstract
In this paper, we study the problem of motif discoveries in unaligned DNA and protein sequences. The problem of motif identification in DNA and protein sequences has been studied for many years in the literature. Major hurdles at this point include computational complexity and reliability of the search algorithms. We propose a self-organizing neural network structure for solving the problem of motif identification in DNA and protein sequences. Our network contains several layers, with each layer performing classifications at different levels. The top layer divides the input space into a small number of regions and the bottom layer classifies all input patterns into motifs and nonmotif patterns. Depending on the number of input patterns to be classified, several layers between the top layer and the bottom layer are needed to perform intermediate classifications. We maintain a low computational complexity through the use of the layered structure so that each pattern's classification is performed with respect to a small subspace of the whole input space. Our self-organizing neural network will grow as needed (e.g., when more motif patterns are classified). It will give the same amount of attention to each input pattern and will not omit any potential motif patterns. Finally, simulation results show that our algorithm outperforms existing algorithms in certain aspects. In particular, simulation results show that our algorithm can identify motifs with more mutations than existing algorithms. Our algorithm works well for long DNA sequences as well.
|
23
|
Perez-Jimenez R, Godoy-Ruiz R, Parody-Morreale A, Ibarra-Molero B, Sanchez-Ruiz JM. A simple tool to explore the distance distribution of correlated mutations in proteins. Biophys Chem 2005; 119:240-6. [PMID: 16239060 DOI: 10.1016/j.bpc.2005.09.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 06/13/2005] [Revised: 09/15/2005] [Accepted: 09/15/2005] [Indexed: 11/18/2022]
Abstract
The analysis of correlated mutations in protein sequence alignments is of considerable interest, since it may provide useful energetic and even structural information (ideally, residue contacts). However, a number of recent experimental studies support the existence of long-distance communication in proteins, a fact that may lead to correlation between distant residues. We introduce in this work a simple statistical procedure to describe the relation between structure and alignments, based on how the number of residue pairs exceeding given thresholds of a correlation measure (such as a covariance value) depends on the residue-residue distance. This procedure may lead to clear pictures of the distance distribution of correlated mutations and may provide a simple but efficient tool to explore the different structural features that are reflected in the sequence alignments.
Affiliation(s)
- Raul Perez-Jimenez
- Departamento de Quimica Fisica, Facultad de Ciencias, Universidad de Granada, Fuentenueva s/n, 18071-Granada, Spain
|
24
|
Vicatos S, Reddy BVB, Kaznessis Y. Prediction of distant residue contacts with the use of evolutionary information. Proteins 2005; 58:935-49. [PMID: 15645442 DOI: 10.1002/prot.20370] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.1] [Indexed: 11/09/2022]
Abstract
In this work we present a novel correlated mutations analysis (CMA) method that is significantly more accurate than previously reported CMA methods. Calculation of correlation coefficients is based on physicochemical properties of residues (predictors) and not on substitution matrices. This results in reliable prediction of pairs of residues that are distant in protein sequence but proximal in its three dimensional tertiary structure. Multiple sequence alignments (MSA) containing a sequence of known structure for 127 families from PFAM database have been selected so that all major protein architectures described in CATH classification database are represented. Protein sequences in the selected families were filtered so that only those evolutionarily close to the target protein remain in the MSA. The average accuracy obtained for the alpha beta class of proteins was 26.8% of predicted proximal pairs with average improvement over random accuracy (IOR) of 6.41. Average accuracy is 20.6% for the mainly beta class and 14.4% for the mainly alpha class. The optimum correlation coefficient cutoff (cc cutoff) was found to be around 0.65. The first predictor, which correlates to hydrophobicity, provides the most reliable results. The other two predictors give good predictions which can be used in conjunction with those of the first one. When a stricter cc cutoff is chosen, the average accuracy increases significantly (38.76% for alpha beta class), but the trade-off is a smaller number of predictions. The use of solvent accessible area estimations for filtering false positives out of the predictions is promising.
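The two figures of merit quoted above, accuracy and improvement over random (IOR), can be sketched as follows. The abstract does not spell out the formulas, so this assumes the usual definitions: accuracy is the fraction of predicted pairs that are true contacts, and IOR is that accuracy divided by the accuracy expected from picking candidate pairs at random. The function name and the toy numbers are hypothetical.

```python
def accuracy_and_ior(predicted, true_contacts, n_candidate_pairs):
    """Return (accuracy, improvement over random) for a set of predicted pairs.

    predicted, true_contacts: iterables of (i, j) pairs with i < j
    n_candidate_pairs: number of residue pairs a random predictor could pick from
    """
    predicted = set(predicted)
    true_contacts = set(true_contacts)
    if not predicted or not true_contacts or n_candidate_pairs <= 0:
        raise ValueError("need non-empty inputs and a positive pair count")
    accuracy = len(predicted & true_contacts) / len(predicted)
    # A random predictor hits a true contact with this probability.
    random_accuracy = len(true_contacts) / n_candidate_pairs
    return accuracy, accuracy / random_accuracy


# Toy example: 4 predictions, 1 of them correct, 5 true contacts among 100 pairs.
pred = [(1, 20), (2, 30), (3, 40), (4, 50)]
truth = [(1, 20), (5, 60), (6, 70), (7, 80), (8, 90)]
acc, ior = accuracy_and_ior(pred, truth, 100)
print(acc, round(ior, 2))
```

A stricter correlation cutoff shrinks the predicted set, which is why accuracy can rise while the number of predictions falls.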
Affiliation(s)
- Spyridon Vicatos
- Department of Chemical Engineering and Materials Science, University of Minnesota, Minneapolis, Minnesota 55455, USA
|
25
|
|
26
|
Ortiz AR, Strauss CEM, Olmea O. MAMMOTH (matching molecular models obtained from theory): an automated method for model comparison. Protein Sci 2002; 11:2606-21. [PMID: 12381844 PMCID: PMC2373724 DOI: 10.1110/ps.0215902] [Citation(s) in RCA: 320] [Impact Index Per Article: 14.5] [Indexed: 10/27/2022]
Abstract
Advances in structural genomics and protein structure prediction require the design of automatic, fast, objective, and well benchmarked methods capable of comparing and assessing the similarity of low-resolution three-dimensional structures, via experimental or theoretical approaches. Here, a new method for sequence-independent structural alignment is presented that allows comparison of an experimental protein structure with an arbitrary low-resolution protein tertiary model. The heuristic algorithm is given and then used to show that it can describe random structural alignments of proteins with different folds with good accuracy by an extreme value distribution. From this observation, a structural similarity score between two proteins or two different conformations of the same protein is derived from the likelihood of obtaining a given structural alignment by chance. The performance of the derived score is then compared with well established, consensus manual-based scores and data sets. We found that the new approach correlates better than other tools with the gold standard provided by a human evaluator. Timings indicate that the algorithm is fast enough for routine use with large databases of protein models. Overall, our results indicate that the new program (MAMMOTH) will be a good tool for protein structure comparisons in structural genomics applications. MAMMOTH is available from our web site at http://physbio.mssm.edu/~ortizg/.
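Deriving a similarity score from the likelihood of a chance alignment can be illustrated with the type I extreme value (Gumbel) distribution. This is a generic sketch, not MAMMOTH's code: the location `mu` and scale `sigma` are hypothetical placeholders that would have to be fitted to scores of random structural alignments, and the abstract does not state which parameterization the program uses.

```python
import math


def gumbel_p_value(score, mu, sigma):
    """Probability of a chance score >= `score` under a Gumbel distribution.

    Uses P(S >= s) = 1 - exp(-exp(-(s - mu) / sigma)).
    mu and sigma are assumed, fitted parameters (hypothetical here).
    """
    z = (score - mu) / sigma
    # expm1 keeps precision when the p-value is very small.
    return -math.expm1(-math.exp(-z))


# Hypothetical background parameters and three raw alignment scores.
mu, sigma = 10.0, 2.0
p_at_mode = gumbel_p_value(10.0, mu, sigma)
p_high = gumbel_p_value(30.0, mu, sigma)
p_low = gumbel_p_value(5.0, mu, sigma)
print(p_at_mode, p_high, p_low)
```

A smaller p-value means the alignment is less likely to have arisen by chance, so the two structures are more likely to be genuinely similar.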
Affiliation(s)
- Angel R Ortiz
- Department of Physiology and Biophysics, Mount Sinai School of Medicine, New York University, New York, New York 10029, USA.
|
27
|
Abstract
Predicting protein structures from their amino acid sequences is a problem of global optimization. Global optima (native structures) are often sought using stochastic sampling methods such as Monte Carlo or molecular dynamics, but these methods are slow. In contrast, there are fast deterministic methods that find near-optimal solutions of well-known global optimization problems such as the traveling salesman problem (TSP). But fast TSP strategies have yet to be applied to protein folding, because of fundamental differences in the two types of problems. Here, we show how protein folding can be framed in terms of the TSP, to which we apply a variation of the Durbin-Willshaw elastic net optimization strategy. We illustrate using a simple model of proteins with database-derived statistical potentials and predicted secondary structure restraints. This optimization strategy can be applied to many different models and potential functions, and can readily incorporate experimental restraint information. It is also fast; with the simple model used here, the method finds structures that are within 5-6 Å all-Cα-atom RMSD of the known native structures for 40-mers in about 8 s on a PC; 100-mers take about 20 s. The computer time τ scales as τ ~ n, where n is the number of amino acids. This method may prove to be useful for structure refinement and prediction.
Affiliation(s)
- Keith D Ball
- Department of Pharmaceutical Chemistry, University of California at San Francisco, 94118, USA.
|
28
|
Gouldson PR, Dean MK, Snell CR, Bywater RP, Gkoutos G, Reynolds CA. Lipid-facing correlated mutations and dimerization in G-protein coupled receptors. Protein Engineering 2001; 14:759-67. [PMID: 11739894 DOI: 10.1093/protein/14.10.759] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.0] [Indexed: 11/13/2022]
Abstract
A correlated mutation analysis has been performed on the aligned protein sequences of a number of class A G-protein coupled receptor families, including the chemokine, neurokinin, opioid, somatostatin, thyrotrophin and the whole biogenic amine family. Many of the correlated mutations are observed flanking or neighbouring conserved residues. The correlated residues have been plotted onto the transmembrane portion of the rhodopsin crystal structure. The structure shows that a significant proportion of the correlated mutations are located on the external (lipid-facing) region of the helices. The occurrence of these highly correlated patterns of change amongst the external residues suggests that they are sites for protein-protein interactions. In particular, it is suggested that the correlated residues may be involved in either large conformational changes, the formation of heterodimers or homodimers (which may be domain swapped) or oligomers required for activation or internalization. The results are discussed in the light of the subtype-specific heterodimerization observed for the chemokine, opioid and somatostatin receptors.
MESH Headings
- Amino Acid Sequence
- Dimerization
- GTP-Binding Proteins/chemistry
- GTP-Binding Proteins/genetics
- Lipids
- Models, Molecular
- Mutation
- Protein Binding
- Protein Structure, Quaternary/genetics
- Protein Structure, Tertiary/genetics
- Protein Structure, Tertiary/physiology
- Receptors, Cell Surface/chemistry
- Receptors, Cell Surface/genetics
- Receptors, Cell Surface/physiology
- Receptors, Opioid/chemistry
- Receptors, Opioid/genetics
- Receptors, Somatostatin/chemistry
- Receptors, Somatostatin/genetics
- Receptors, Thyrotropin/chemistry
- Receptors, Thyrotropin/genetics
- Receptors, Thyrotropin/physiology
Affiliation(s)
- P R Gouldson
- Department of Biological Sciences, University of Essex, Wivenhoe Park, Colchester, Essex CO4 3SQ, UK
|
29
|
|
30
|
Bonneau R, Baker D. Ab initio protein structure prediction: progress and prospects. Annual Review of Biophysics and Biomolecular Structure 2001; 30:173-89. [PMID: 11340057 DOI: 10.1146/annurev.biophys.30.1.173] [Citation(s) in RCA: 226] [Impact Index Per Article: 9.8] [Indexed: 11/09/2022]
Abstract
Considerable recent progress has been made in the field of ab initio protein structure prediction, as witnessed by the third Critical Assessment of Structure Prediction (CASP3). In spite of this progress, much work remains, for the field has yet to produce consistently reliable ab initio structure prediction protocols. In this work, we review the features of current ab initio protocols in an attempt to highlight the foundations of recent progress in the field and suggest promising directions for future work.
Affiliation(s)
- R Bonneau
- Department of Biochemistry, University of Washington, Seattle, Washington, Box 357350, 98195, USA.
|
31
|
Jacchieri SG. Stepwise assembling of polypeptide chain energy distributions. Computers & Chemistry 2001; 25:145-59. [PMID: 11219430 DOI: 10.1016/s0097-8485(00)00076-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/21/2022]
Abstract
The principles and application of conformational analysis software that makes use of a new algorithm are described. It is known that the existence of a local energy minimum in the energy landscape is in general related to the clustering of polypeptide chain conformations near that energy value or, in other words, to a high density of states. A criterion based on this principle is part of an algorithm employed to select subsets of polypeptide chain conformations in broad energy ranges. Chain fragments belonging to these subsets are then combined to build larger polypeptide chains and the corresponding energy distributions. The functionality of the various operations employed in the process is described and the FORTRAN 77 source code that defines the algorithm is listed. The methodology is illustrated with a calculation involving three chain fragments belonging to the cellular prion protein (PrP(C)).
|
32
|
Abstract
The location of protein subunits that form early during folding, constituted of consecutive secondary structure elements with some intrinsic stability and favorable tertiary interactions, is predicted using a combination of threading algorithms and local structure prediction methods. Two folding units are selected among the candidates identified in a database of known protein structures: the fragment 15-55 of 434 cro, an all-alpha protein, and the fragment 1-35 of ubiquitin, an alpha/beta protein. These units are further analyzed by means of Monte Carlo simulated annealing using several database-derived potentials describing different types of interactions. Our results suggest that the local interactions along the chain dominate in the first folding steps of both fragments, and that the formation of some of the secondary structures necessarily occurs before structure compaction. These findings led us to define a prediction protocol that improves the accuracy of the predicted structures. It involves a first simulation with a local interaction potential only, whose final conformation is used as the starting structure of a second simulation that uses a combination of local interaction and distance potentials. The root mean square deviations between the coordinates of predicted and native structures are as low as 2-4 Å in most trials. The possibility of extending this protocol to the prediction of full proteins is discussed. Proteins 2001;42:164-176.
Affiliation(s)
- D Gilis
- Ingénierie Biomoléculaire, Université Libre de Bruxelles, Bruxelles, Belgium.
|
33
|
Yue K, Dill KA. Constraint-based assembly of tertiary protein structures from secondary structure elements. Protein Sci 2000; 9:1935-46. [PMID: 11106167 PMCID: PMC2144474] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/18/2023]
Abstract
A challenge in computational protein folding is to assemble secondary structure elements (helices and strands) into well-packed tertiary structures. Particularly difficult is the formation of beta-sheets from strands, because they involve large conformational searches at the same time as precise packing and hydrogen bonding. Here we describe a method, called Geocore-2, that (1) grows chains one monomer or secondary structure at a time, then (2) disconnects the loops and performs a fast rigid-body docking step to achieve canonical packings, then (3) in the case of intrasheet strand packing, adjusts the side-chain rotamers; and finally (4) reattaches loops. Computational efficiency is enhanced by using a branch-and-bound search in which pruning rules aim to achieve a hydrophobic core and satisfactory hydrogen bonding patterns. We show that the pruning rules reduce computational time by 10^3- to 10^5-fold, and that this strategy is computationally practical at least for molecules up to about 100 amino acids long.
Affiliation(s)
- K Yue
- Department of Pharmaceutical Chemistry, University of California at San Francisco, 94143, USA
|
34
|
Simmerling C, Lee MR, Ortiz AR, Kolinski A, Skolnick J, Kollman PA. Combining MONSSTER and LES/PME to Predict Protein Structure from Amino Acid Sequence: Application to the Small Protein CMTI-1. J Am Chem Soc 2000. [DOI: 10.1021/ja993119k] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Affiliation(s)
- Carlos Simmerling
- Contribution from the Department of Pharmaceutical Chemistry, University of California, 513 Parnassus, San Francisco, California 94143-0446, Department of Molecular Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037
- Matthew R. Lee
- Contribution from the Department of Pharmaceutical Chemistry, University of California, 513 Parnassus, San Francisco, California 94143-0446, Department of Molecular Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037
- Angel R. Ortiz
- Contribution from the Department of Pharmaceutical Chemistry, University of California, 513 Parnassus, San Francisco, California 94143-0446, Department of Molecular Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037
- Andrzej Kolinski
- Contribution from the Department of Pharmaceutical Chemistry, University of California, 513 Parnassus, San Francisco, California 94143-0446, Department of Molecular Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037
- Jeffrey Skolnick
- Contribution from the Department of Pharmaceutical Chemistry, University of California, 513 Parnassus, San Francisco, California 94143-0446, Department of Molecular Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037
- Peter A. Kollman
- Contribution from the Department of Pharmaceutical Chemistry, University of California, 513 Parnassus, San Francisco, California 94143-0446, Department of Molecular Biology, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, California 92037
|
35
|
Abstract
The methodology for generating a homology model of the T1 TCR-PbCS-K(d) class I major histocompatibility complex (MHC) complex is presented. The resulting model provides a qualitative explanation of the effect of over 50 different mutations in the region of the complementarity determining region (CDR) loops of the T cell receptor (TCR), the peptide and the MHC's α1/α2 helices. The peptide is modified by an azido benzoic acid photoreactive group, which is part of the epitope recognized by the TCR. The construction of the model makes use of closely related homologs (the A6 TCR-Tax-HLA A2 complex, the 2C TCR, the 14.3.d TCR Vβ chain, the 1934.4 TCR Vα chain, and the H-2 K(b)-ovalbumin peptide), ab initio sampling of CDR loop conformations and experimental data to select from the set of possibilities. The model shows a complex arrangement of the CDR3α, CDR1β, CDR2β and CDR3β loops that leads to the highly specific recognition of the photoreactive group. The protocol can be applied systematically to a series of related sequences, permitting the analysis at the structural level of the large TCR repertoire specific for a given peptide-MHC complex.
MESH Headings
- Algorithms
- Amino Acid Sequence
- Amino Acid Substitution/genetics
- Binding Sites
- Computer Simulation
- Epitopes, T-Lymphocyte/chemistry
- Epitopes, T-Lymphocyte/immunology
- Epitopes, T-Lymphocyte/metabolism
- Histocompatibility Antigens Class I/chemistry
- Histocompatibility Antigens Class I/immunology
- Histocompatibility Antigens Class I/metabolism
- Humans
- Hydrogen Bonding
- Models, Molecular
- Molecular Sequence Data
- Mutation/genetics
- Peptide Fragments/chemistry
- Peptide Fragments/immunology
- Peptide Fragments/metabolism
- Protein Conformation
- Receptors, Antigen, T-Cell/chemistry
- Receptors, Antigen, T-Cell/genetics
- Receptors, Antigen, T-Cell/immunology
- Receptors, Antigen, T-Cell/metabolism
- Reproducibility of Results
- Sequence Alignment
- Software
- Static Electricity
- Substrate Specificity
- Thermodynamics
Affiliation(s)
- O Michielin
- Ludwig Institute for Cancer Research, Lausanne Branch, Epalinges, Switzerland
|
36
|
Abstract
We examine the interactions between amino acid residues in the context of their secondary structural environments (helix, strand, and coil) in proteins. Effective contact energies for an expanded 60-residue alphabet (20 aa x three secondary structural states) are estimated from the residue-residue contacts observed in known protein structures. Similar to the prototypical contact energies for 20 aa, the newly derived energy parameters reflect mainly the hydrophobic interactions; however, the relative strength of such interactions shows a strong dependence on the secondary structural environment, with nonlocal interactions in beta-sheet structures and alpha-helical structures dominating the energy table. Environment-dependent residue contact energies outperform existing residue pair potentials in both threading and three-dimensional contact prediction tests and should be generally applicable to protein structure prediction.
Affiliation(s)
- C Zhang
- Department of Chemistry and E. O. Lawrence Berkeley National Laboratory, University of California, Berkeley, CA 94720, USA
|
37
|
Olmea O, Rost B, Valencia A. Effective use of sequence correlation and conservation in fold recognition. J Mol Biol 1999; 293:1221-39. [PMID: 10547297 DOI: 10.1006/jmbi.1999.3208] [Citation(s) in RCA: 131] [Impact Index Per Article: 5.2] [Indexed: 11/22/2022]
Abstract
Protein families are a rich source of information; sequence conservation and sequence correlation are two of the main properties that can be derived from the analysis of multiple sequence alignments. Sequence conservation is related to the direct evolutionary pressure to retain the chemical characteristics of some positions in order to maintain a given function. Sequence correlation is attributed to the small sequence adjustments needed to maintain protein stability against constant mutational drift. Here, we showed that sequence conservation and correlation were each frequently informative enough to detect incorrectly folded proteins. Furthermore, combining conservation, correlation, and polarity, we achieved an almost perfect discrimination between native and incorrectly folded proteins. Thus, we made use of this information for threading by evaluating the models suggested by a threading method according to the degree of proximity of the corresponding correlated, conserved, and apolar residues. The results showed that the fold recognition capacity of a given threading approach could be improved almost fourfold by selecting the alignments that score best under the three different sequence-based approaches.
Affiliation(s)
- O Olmea
- Protein Design Group, CNB-CSIC, Cantoblanco, Madrid, E-28049, Spain
|
38
|
Debe DA, Carlson MJ, Sadanobu J, Chan SI, Goddard WA. Protein Fold Determination from Sparse Distance Restraints: The Restrained Generic Protein Direct Monte Carlo Method. J Phys Chem B 1999. [DOI: 10.1021/jp983429+] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
|
39
|
Application of Reduced Models to Protein Structure Prediction. ACTA ACUST UNITED AC 1999. [DOI: 10.1016/s1380-7323(99)80086-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
|
40
|
Ortiz AR, Kolinski A, Rotkiewicz P, Ilkowski B, Skolnick J. Ab initio folding of proteins using restraints derived from evolutionary information. Proteins 1999. [DOI: 10.1002/(sici)1097-0134(1999)37:3+<177::aid-prot22>3.0.co;2-e] [Citation(s) in RCA: 87] [Impact Index Per Article: 3.5] [Indexed: 11/11/2022]
|
41
|
Wolf YI, Brenner SE, Bash PA, Koonin EV. Distribution of Protein Folds in the Three Superkingdoms of Life. Genome Res 1999. [DOI: 10.1101/gr.9.1.17] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.1] [Indexed: 11/25/2022]
Abstract
A sensitive protein-fold recognition procedure was developed on the basis of iterative database search using the PSI-BLAST program. A collection of 1193 position-dependent weight matrices that can be used as fold identifiers was produced. In the completely sequenced genomes, folds could be automatically identified for 20%–30% of the proteins, with 3%–6% more detectable by additional analysis of conserved motifs. The distribution of the most common folds is very similar in bacteria and archaea but distinct in eukaryotes. Within the bacteria, this distribution differs between parasitic and free-living species. In all analyzed genomes, the P-loop NTPases are the most abundant fold. In bacteria and archaea, the next most common folds are ferredoxin-like domains, TIM-barrels, and methyltransferases, whereas in eukaryotes, the second to fourth places belong to protein kinases, β-propellers, and TIM-barrels. The observed diversity of protein folds in different proteomes is approximately twice as high as would be expected from a simple stochastic model describing a proteome as a finite sample from an infinite pool of proteins with an exponential distribution of the fold fractions. The distribution of the number of domains with different folds in one protein fits the geometric model, which is compatible with the evolution of multidomain proteins by random combination of domains. [Fold predictions for proteins from 14 proteomes are available on the World Wide Web at ftp://ncbi.nlm.nih.gov/pub/koonin/FOLDS/index.html. The FIDs are available by anonymous ftp at the same location.]
|
42
|
Pons T, Olmea O, Chinea G, Beldarraín A, Márquez G, Acosta N, Rodríguez L, Valencia A. Structural model for family 32 of glycosyl-hydrolase enzymes. Proteins 1998; 33:383-95. [PMID: 9829697 DOI: 10.1002/(sici)1097-0134(19981115)33:3<383::aid-prot7>3.0.co;2-r] [Citation(s) in RCA: 50] [Impact Index Per Article: 1.9] [Indexed: 11/08/2022]
Abstract
A structural model is presented for family 32 of the glycosyl-hydrolase enzymes based on the beta-propeller fold. The model is derived from the common prediction of two different threading methods, TOPITS and THREADER. In addition, we used a correlated mutation analysis and prediction of active-site residues to corroborate the proposed model. Physical techniques (circular dichroism and differential scanning calorimetry) confirmed two aspects of the prediction, the proposed all-beta fold and the multi-domain structure. The most reliable three-dimensional model was obtained using the structure of neuraminidase (1nscA) as template. The analysis of the position of the active site residues in this model is compatible with the catalytic mechanism proposed by Reddy and Maley (J. Biol. Chem. 271:13953-13958, 1996), which includes three conserved residues, Asp, Glu, and Cys. Based on this analysis, we propose the participation of one more conserved residue (Asp 162) in the catalytic mechanism. The model will facilitate further studies of the physical and biochemical characteristics of family 32 of the glycosyl-hydrolases.
Affiliation(s)
- T Pons
- Centro de Ingeniería Genética y Biotecnología, Havana, Cuba.
|
43
|
Mirny LA, Shakhnovich EI. Protein structure prediction by threading. Why it works and why it does not. J Mol Biol 1998; 283:507-26. [PMID: 9769221 DOI: 10.1006/jmbi.1998.2092] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.5] [Indexed: 11/22/2022]
Abstract
We developed a novel Monte Carlo threading algorithm which allows gaps and insertions both in the template structure and in the threaded sequence. The algorithm is able to find the optimal sequence-structure alignment and to sample suboptimal alignments. Using our algorithm we performed sequence-structure alignments for a number of examples for three protein folds (ubiquitin, immunoglobulin and globin) using both an "ideal" set of potentials (optimized to provide the best Z-score for a given protein) and more realistic knowledge-based potentials. Two physically different scenarios emerged. If a template structure is similar to the native one (within 2 Å RMS), then (i) the optimal threading alignment is correct and robust with respect to deviations of the potential from the "ideal" one; (ii) suboptimal alignments are very similar to the optimal one; (iii) as the Monte Carlo temperature decreases, a sharp cooperative transition to the optimal alignment is observed. In contrast, if the template structure is only moderately close to the native structure (RMS greater than 3.5 Å), then (i) the optimal alignment changes dramatically when an "ideal" potential is substituted by the real one; (ii) the structures of suboptimal alignments are very different from the optimal one, reducing the reliability of the alignment; (iii) the transition to the apparently optimal alignment is non-cooperative. In the intermediate cases, when the RMS between the template and the native conformations is in the range between 2 Å and 3.5 Å, the success of threading alignment may depend on the quality of the potentials used. These results are rationalized in terms of a threading free energy landscape. Possible ways to overcome the fundamental limitations of threading are discussed briefly.
Affiliation(s)
- L A Mirny
- Department of Chemistry and Chemical Biology, Harvard University, 12 Oxford Street, Cambridge, MA, 02138, USA
|