1
|
Statistical potentials from the Gaussian scaling behaviour of chain fragments buried within protein globules. PLoS One 2022; 17:e0254969. [PMID: 35085247 PMCID: PMC8794220 DOI: 10.1371/journal.pone.0254969] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/02/2021] [Accepted: 10/28/2021] [Indexed: 11/19/2022] Open
Abstract
Knowledge-based approaches use the statistics collected from protein data-bank structures to estimate effective interaction potentials between amino acid pairs. Empirical relations are typically employed that are based on the crucial choice of a reference state associated with the null interaction case. Despite their significant effectiveness, the physical interpretation of knowledge-based potentials has been repeatedly questioned, with no consensus on the choice of the reference state. Here we use the fact that the Flory theorem, originally derived for chains in a dense polymer melt, also holds for chain fragments within the core of globular proteins, if the average over buried fragments collected from different non-redundant native structures is considered. After verifying that the ensuing Gaussian statistics, a hallmark of effectively non-interacting polymer chains, holds for a wide range of fragment lengths, although with significant deviations at short spatial scales, we use it to define a ‘bona fide’ reference state. Notably, although the latter does depend on fragment length, deviations from it do not. This allows one to estimate an effective interaction potential which is not biased by the presence of correlations due to the connectivity of the protein chain. We show how different sequence-independent effective statistical potentials can be derived using this approach by coarse-graining the protein representation at varying levels. The possibility of defining sequence-dependent potentials is explored.
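As a rough illustration of the idea in this abstract, the sketch below uses the textbook ideal-chain (Gaussian) end-to-end distance distribution as a reference state and takes the negative log-ratio of an observed probability to it. The function names, the effective bond length `b = 3.8` Å and the unit `kT = 1` are assumptions made here for illustration only; the reference state and fitting procedure actually used in the paper may differ.

```python
import math

def gaussian_radial_pdf(r, n, b=3.8):
    """Ideal-chain probability density of the end-to-end distance r (in Å)
    for a fragment of n bonds with effective bond length b (assumed value)."""
    mean_sq = n * b * b  # <r^2> = n * b^2 for a Gaussian chain
    norm = (3.0 / (2.0 * math.pi * mean_sq)) ** 1.5
    # 4*pi*r^2 converts the 3D Gaussian density into a radial density
    return 4.0 * math.pi * r * r * norm * math.exp(-1.5 * r * r / mean_sq)

def effective_potential(p_observed, p_reference, kT=1.0):
    """Boltzmann-inversion style potential: -kT * ln(P_obs / P_ref)."""
    return -kT * math.log(p_observed / p_reference)
```

With this sign convention, a distance that is observed more often than the Gaussian reference predicts gets a negative (favourable) value, and a distance observed exactly as often as the reference gets zero.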
|
2
|
Abstract
Atom pairwise potential functions make up an essential part of many scoring functions for protein decoy detection. With the development of machine learning (ML) tools, there are multiple ways to combine potential functions to create novel ML models and methods. Potential function parameters can be easily extracted; however, it is usually hard to directly obtain the calculated atom pairwise energies from scoring functions. Amber, as one of the most popular suites of modeling programs, has an extensive history and library of force field potential functions. In this work, we directly used the force field parameters in ff94 and ff14SB from Amber and encoded them to calculate atom pairwise energies for different interactions. Two sets of structures (a single amino acid set and a dipeptide set) were used to evaluate the performance of our encoded Amber potentials. From the comparison results between energy terms obtained from our encoding and Amber, we find energy differences within ±0.06 kcal/mol for all tested structures. Previously we have shown that the Random Forest (RF) model can help to emphasize more important atom pairwise interactions and ignore insignificant ones [Pei, J.; Zheng, Z.; Merz, K. M. J. Chem. Inf. Model. 2019, 59, 1919-1929]. Here, as an example of combining ML methods with traditional potential functions, we followed the same workflow to combine the RF models with force field potential functions from Amber. To determine the performance of our RF models with force field potential functions, 224 different protein native-decoy systems were used as our training and testing sets. We find that the RF models with ff94 and ff14SB force field parameters outperformed all other scoring functions (RF models with KECSA2, RWplus, DFIRE, dDFIRE, and GOAP) considered in this work for native structure detection, and they performed similarly in detecting the best decoy. Through inclusion of best decoy to decoy comparisons in building our RF models, we were able to generate models that outperformed the score functions tested herein both on accuracy and best decoy detection, again showing the performance and flexibility of our RF models to tackle this problem. Finally, the importance of the RF algorithm and force field parameters was also tested, and the comparison results suggest that both the RF algorithm and force field potentials are important, with the ML scoring function achieving its best performance only by combining them together. All code and data used in this work are available at https://github.com/JunPei000/FFENCODER_for_Protein_Folding_Pose_Selection.
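To make "atom pairwise energies" concrete, here is a minimal sketch of the non-bonded part of an Amber-style functional form: a 12-6 Lennard-Jones term with Lorentz-Berthelot combining rules plus a Coulomb term. It is not the authors' FFENCODER code. The function names are hypothetical, the parameters in the usage note are made-up illustrative values rather than ff94 or ff14SB entries, and bonded terms and 1-4 scaling are deliberately left out.

```python
import math

# Electrostatic conversion factor in kcal*Å/(mol*e^2); Amber conventionally
# uses 18.2223^2, which is approximately 332.0522.
COULOMB_CONSTANT = 332.0522

def lennard_jones(r, rmin_half_i, eps_i, rmin_half_j, eps_j):
    """12-6 Lennard-Jones energy (kcal/mol) at distance r (Å).
    rmin_half_* are per-atom Rmin/2 values, eps_* are well depths."""
    rmin = rmin_half_i + rmin_half_j   # arithmetic mean of Rmin, written as a sum of halves
    eps = math.sqrt(eps_i * eps_j)     # geometric mean of well depths
    ratio6 = (rmin / r) ** 6
    return eps * (ratio6 * ratio6 - 2.0 * ratio6)

def coulomb(r, q_i, q_j, dielectric=1.0):
    """Coulomb energy (kcal/mol) for partial charges in units of e."""
    return COULOMB_CONSTANT * q_i * q_j / (dielectric * r)

def nonbonded_pair_energy(r, atom_i, atom_j):
    """Sum of both terms; atoms are (rmin_half, eps, charge) tuples."""
    rh_i, eps_i, q_i = atom_i
    rh_j, eps_j, q_j = atom_j
    return lennard_jones(r, rh_i, eps_i, rh_j, eps_j) + coulomb(r, q_i, q_j)
```

For example, `nonbonded_pair_energy(3.4, (1.7, 0.1, 0.0), (1.7, 0.1, 0.0))` evaluates two identical neutral atoms at their combined Rmin, where the Lennard-Jones term reaches its minimum of minus the well depth.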
Affiliation(s)
- Jun Pei
- Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Lin Frank Song
- Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Kenneth M Merz
- Department of Chemistry and the Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
|
3
|
Wallner B. Estimating local protein model quality: prospects for molecular replacement. Acta Crystallogr D Struct Biol 2020; 76:285-290. [PMID: 32133992 PMCID: PMC7057213 DOI: 10.1107/s2059798320000972] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/29/2019] [Accepted: 01/24/2020] [Indexed: 11/10/2022]
Abstract
Model quality assessment programs estimate the quality of protein models and can be used to estimate local error in protein models. ProQ3D is the most recent and most accurate version of our software. Here, it is demonstrated that it is possible to use local error estimates to substantially increase the quality of the models for molecular replacement (MR). Adjusting the B factors using ProQ3D improved the log-likelihood gain (LLG) score by over 50% on average, resulting in significantly more successful models in MR compared with not using error estimates. On a data set of 431 homology models to address difficult MR targets, models with error estimates from ProQ3D received an LLG of >50 for almost half of the models 209/431 (48.5%), compared with 175/431 (40.6%) for the previous version, ProQ2, and only 74/431 (17.2%) for models with no error estimates, clearly demonstrating the added value of using error estimates to enable MR for more targets. ProQ3D is available from http://proq3.bioinfo.se/ both as a server and as a standalone download.
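The abstract does not spell out how a local error estimate becomes a B factor. One common convention, assumed here and not taken from the source, treats the predicted per-residue error d (in Å) as an isotropic root-mean-square displacement and uses B = (8π²/3)·d². The function name below is hypothetical, and any capping or rescaling applied by ProQ3D or by the MR software is not modelled.

```python
import math

def error_to_b_factor(error_angstrom):
    """Convert an estimated RMS positional error (Å) into an isotropic
    B factor (Å^2) using B = (8 * pi^2 / 3) * d^2 (assumed convention)."""
    return (8.0 * math.pi ** 2 / 3.0) * error_angstrom ** 2

# Hypothetical per-residue error estimates in Å
predicted_errors = [0.5, 1.0, 2.5]
b_factors = [error_to_b_factor(d) for d in predicted_errors]
```

Because the relation is quadratic, residues predicted to be badly placed receive much larger B factors and are therefore down-weighted relative to well-modelled regions.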
Affiliation(s)
- Björn Wallner
- Division of Bioinformatics, Department of Physics, Chemistry and Biology, Linköping University, SE-581 83 Linköping, Sweden
|
4
|
Xu G, Ma T, Wang Q, Ma J. OPUS-SSF: A side-chain-inclusive scoring function for ranking protein structural models. Protein Sci 2019; 28:1157-1162. [PMID: 30919509 DOI: 10.1002/pro.3608] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/26/2018] [Revised: 03/21/2019] [Accepted: 03/27/2019] [Indexed: 12/21/2022]
Abstract
We introduce a side-chain-inclusive scoring function, named OPUS-SSF, for ranking protein structural models. The method builds a scoring function based on the native distributions of the coordinate components of certain anchoring points in a local molecular system for peptide segments of 5, 7, 9, and 11 residues in length. Differing from our previous OPUS-CSF [Xu et al., Protein Sci. 2018; 27: 286-292], which exclusively uses main chain information, OPUS-SSF employs anchoring points on side chains so that the effect of side chains is taken into account. The performance of OPUS-SSF was tested on 15 decoy sets containing a total of 603 proteins, and 571 of them had their native structures recognized from their decoys. Similar to OPUS-CSF, OPUS-SSF does not employ the Boltzmann formula in constructing scoring functions. The results indicate that OPUS-SSF has achieved a significant improvement on decoy recognition and it should be a very useful tool for protein structural prediction and modeling.
Affiliation(s)
- Gang Xu
- School of Life Sciences, Tsinghua University, Beijing 100084, People's Republic of China
- Tianqi Ma
- Applied Physics Program, Rice University, Houston, Texas 77005; Department of Bioengineering, Rice University, Houston, Texas 77005
- Qinghua Wang
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030
- Jianpeng Ma
- School of Life Sciences, Tsinghua University, Beijing 100084, People's Republic of China; Applied Physics Program, Rice University, Houston, Texas 77005; Department of Bioengineering, Rice University, Houston, Texas 77005; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas 77030
|
5
|
Pei J, Zheng Z, Merz KM. Random Forest Refinement of the KECSA2 Knowledge-Based Scoring Function for Protein Decoy Detection. J Chem Inf Model 2019; 59:1919-1929. [DOI: 10.1021/acs.jcim.8b00734] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 11/29/2022]
Affiliation(s)
- Jun Pei
- Department of Chemistry, Michigan State University, 578 S. Shaw Lane, East Lansing, Michigan 48824, United States
- Zheng Zheng
- Department of Chemistry, Michigan State University, 578 S. Shaw Lane, East Lansing, Michigan 48824, United States
- Kenneth M. Merz
- Department of Chemistry, Michigan State University, 578 S. Shaw Lane, East Lansing, Michigan 48824, United States
- Institute for Cyber Enabled Research, Michigan State University, 567 Wilson Road, East Lansing, Michigan 48824, United States
|
6
|
Uziela K, Menéndez Hurtado D, Shu N, Wallner B, Elofsson A. Improved protein model quality assessments by changing the target function. Proteins 2018. [PMID: 29524250 DOI: 10.1002/prot.25492] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Indexed: 12/27/2022]
Abstract
Protein modeling quality is an important part of protein structure prediction. We have for more than a decade developed a set of methods for this problem. We have used various types of description of the protein and different machine learning methodologies. However, common to all these methods has been the target function used for training. The target function in ProQ describes the local quality of a residue in a protein model. In all versions of ProQ the target function has been the S-score. However, other quality estimation functions also exist, which can be divided into superposition- and contact-based methods. The superposition-based methods, such as S-score, are based on a rigid body superposition of a protein model and the native structure, while the contact-based methods compare the local environment of each residue. Here, we examine the effects of retraining our latest predictor, ProQ3D, using identical inputs but different target functions. We find that the contact-based methods are easier to predict and that predictors trained on these measures provide some advantages when it comes to identifying the best model. One possible reason for this is that contact-based methods are better at estimating the quality of multi-domain targets. However, training on the S-score gives the best correlation with the GDT_TS score, which is commonly used in CASP to score the global model quality. To take advantage of both of these features we provide an updated version of ProQ3D that predicts local and global model quality estimates based on different quality estimates.
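The abstract names the S-score without defining it. A commonly used form, assumed here and not quoted from this entry, maps the per-residue distance d between model and native after superposition to S = 1 / (1 + (d/d0)²). The default `d0 = 3.0` Å and both function names are assumptions for illustration.

```python
import math

def s_score(distance, d0=3.0):
    """Map a per-residue model-to-native distance (Å) to a score in (0, 1].
    d0 is a distance threshold; 3.0 Å is an assumed default."""
    return 1.0 / (1.0 + (distance / d0) ** 2)

def s_score_to_distance(score, d0=3.0):
    """Invert the S-score back to a distance (Å); score must be in (0, 1]."""
    if not 0.0 < score <= 1.0:
        raise ValueError("score must be in (0, 1]")
    return d0 * math.sqrt(1.0 / score - 1.0)
```

A perfectly placed residue scores 1, a residue displaced by exactly d0 scores 0.5, and the inverse mapping recovers a distance-like local error estimate from a predicted score.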
Affiliation(s)
- Karolis Uziela
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Solna, Sweden
- David Menéndez Hurtado
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Solna, Sweden
- Nanjiang Shu
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Solna, Sweden; Bioinformatics Short-term Support and Infrastructure (BILS), Science for Life Laboratory, Solna, Sweden
- Björn Wallner
- Department of Physics, Chemistry and Biology (IFM)/Bioinformatics, Linköping University, Linköping, Sweden
- Arne Elofsson
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Solna, Sweden
|
7
|
Li B, Fooksa M, Heinze S, Meiler J. Finding the needle in the haystack: towards solving the protein-folding problem computationally. Crit Rev Biochem Mol Biol 2018; 53:1-28. [PMID: 28976219 PMCID: PMC6790072 DOI: 10.1080/10409238.2017.1380596] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 05/16/2017] [Revised: 08/22/2017] [Accepted: 09/13/2017] [Indexed: 12/22/2022]
Abstract
Prediction of protein tertiary structures from amino acid sequence and understanding the mechanisms of how proteins fold, collectively known as "the protein folding problem," has been a grand challenge in molecular biology for over half a century. Theories have been developed that provide us with an unprecedented understanding of protein folding mechanisms. However, computational simulation of protein folding is still difficult, and prediction of protein tertiary structure from amino acid sequence is an unsolved problem. Progress toward a satisfying solution has been slow due to challenges in sampling the vast conformational space and deriving sufficiently accurate energy functions. Nevertheless, several techniques and algorithms have been adopted to overcome these challenges, and the last two decades have seen exciting advances in enhanced sampling algorithms, computational power and tertiary structure prediction methodologies. This review aims at summarizing these computational techniques, specifically conformational sampling algorithms and energy approximations that have been frequently used to study protein-folding mechanisms or to de novo predict protein tertiary structures. We hope that this review can serve as an overview on how the protein-folding problem can be studied computationally and, in cases where experimental approaches are prohibitive, help the researcher choose the most relevant computational approach for the problem at hand. We conclude with a summary of current challenges faced and an outlook on potential future directions.
Affiliation(s)
- Bian Li
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Michaela Fooksa
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Chemical and Physical Biology Graduate Program, Vanderbilt University, Nashville, TN, USA
- Sten Heinze
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
- Jens Meiler
- Department of Chemistry, Vanderbilt University, Nashville, TN, USA
- Center for Structural Biology, Vanderbilt University, Nashville, TN, USA
|
8
|
Xu G, Ma T, Zang T, Wang Q, Ma J. OPUS-CSF: A C-atom-based scoring function for ranking protein structural models. Protein Sci 2017; 27:286-292. [PMID: 29047165 PMCID: PMC5734313 DOI: 10.1002/pro.3327] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 07/12/2017] [Revised: 10/14/2017] [Accepted: 10/16/2017] [Indexed: 12/12/2022]
Abstract
We report a C-atom-based scoring function, named OPUS-CSF, for ranking protein structural models. Rather than using the traditional Boltzmann formula, we built a scoring function (CSF score) based on the native distributions (derived from the entire PDB) of coordinate components of mainchain C (carbonyl) atoms on selected residues of peptide segments of 5, 7, 9, and 11 residues in length. In testing OPUS-CSF on decoy recognition, it maximally recognized 257 native structures out of 278 targets in 11 commonly used decoy sets, significantly outperforming other popular all-atom empirical potentials. The average correlation coefficient with TM-score was also comparable with those of other potentials. OPUS-CSF is a highly coarse-grained scoring function, which only requires input of partial mainchain information and is very fast. Thus, it is suitable for applications at an early stage of structural building.
Affiliation(s)
- Gang Xu
- School of Life Sciences, Tsinghua University, Beijing, China
- Tianqi Ma
- Applied Physics Program, Rice University, Houston, Texas; Department of Bioengineering, Rice University, Houston, Texas
- Tianwu Zang
- Applied Physics Program, Rice University, Houston, Texas; Department of Bioengineering, Rice University, Houston, Texas
- Qinghua Wang
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas
- Jianpeng Ma
- School of Life Sciences, Tsinghua University, Beijing, China; Applied Physics Program, Rice University, Houston, Texas; Department of Bioengineering, Rice University, Houston, Texas; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas
|
9
|
Xu G, Ma T, Zang T, Sun W, Wang Q, Ma J. OPUS-DOSP: A Distance- and Orientation-Dependent All-Atom Potential Derived from Side-Chain Packing. J Mol Biol 2017; 429:3113-3120. [PMID: 28864201 DOI: 10.1016/j.jmb.2017.08.013] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 03/04/2017] [Revised: 07/27/2017] [Accepted: 08/22/2017] [Indexed: 01/18/2023]
Abstract
We report a new distance- and orientation-dependent, all-atom statistical potential derived from side-chain packing, named OPUS-DOSP, for protein structure modeling. The framework of OPUS-DOSP is based on OPUS-PSP, previously developed by us [JMB (2008), 376, 288-301], with refinement and new features. In particular, distance or orientation contribution is considered depending on the range of contact distance. A new auxiliary function in the energy function is also introduced, in addition to the traditional Boltzmann term, in order to adjust the contributions of extreme cases. OPUS-DOSP was tested on 11 decoy sets commonly used for statistical potential benchmarking. Among 278 native structures, 239 and 249 native structures were recognized by OPUS-DOSP without and with the auxiliary function, respectively. The results show that OPUS-DOSP has an increased decoy recognition capability compared with those of other relevant potentials to date.
Affiliation(s)
- Gang Xu
- School of Life Sciences, Tsinghua University, Beijing 100084, China
- Tianqi Ma
- Applied Physics Program, Rice University, Houston, TX 77005, United States; Department of Bioengineering, Rice University, Houston, TX 77005, United States
- Tianwu Zang
- Applied Physics Program, Rice University, Houston, TX 77005, United States; Department of Bioengineering, Rice University, Houston, TX 77005, United States
- Weitao Sun
- Zhou Pei-Yuan Center for Applied Mathematics, Tsinghua University, Beijing 100084, China
- Qinghua Wang
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030, United States
- Jianpeng Ma
- School of Life Sciences, Tsinghua University, Beijing 100084, China; Applied Physics Program, Rice University, Houston, TX 77005, United States; Department of Bioengineering, Rice University, Houston, TX 77005, United States; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030, United States
|
10
|
ProQ3: Improved model quality assessments using Rosetta energy terms. Sci Rep 2016; 6:33509. [PMID: 27698390 PMCID: PMC5048106 DOI: 10.1038/srep33509] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Received: 06/23/2016] [Accepted: 08/26/2016] [Indexed: 01/17/2023] Open
Abstract
Quality assessment of protein models using no other information than the structure of the model itself has been shown to be useful for structure prediction. Here, we introduce two novel methods, ProQRosFA and ProQRosCen, inspired by the state-of-the-art method ProQ2, but using a completely different description of a protein model. ProQ2 uses contacts and other features calculated from a model, while the new predictors are based on Rosetta energies: ProQRosFA uses the full-atom energy function that takes into account all atoms, while ProQRosCen uses the coarse-grained centroid energy function. The two new predictors also include residue conservation and terms corresponding to the agreement of a model with predicted secondary structure and surface area, as in ProQ2. We show that the performance of these predictors is on par with ProQ2 and significantly better than all other model quality assessment programs. Furthermore, we show that combining the input features from all three predictors, the resulting predictor ProQ3 performs better than any of the individual methods. ProQ3, ProQRosFA and ProQRosCen are freely available both as a webserver and stand-alone programs at http://proq3.bioinfo.se/.
|
11
|
Zheng Z, Wang T, Li P, Merz KM. KECSA-Movable Type Implicit Solvation Model (KMTISM). J Chem Theory Comput 2016; 11:667-82. [PMID: 25691832 PMCID: PMC4325602 DOI: 10.1021/ct5007828] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 08/27/2014] [Indexed: 11/30/2022]
Abstract
Computation of the solvation free energy for chemical and biological processes has long been of significant interest. The key challenges to effective solvation modeling center on the choice of potential function and configurational sampling. Herein, an energy sampling approach termed the “Movable Type” (MT) method, and a statistical energy function for solvation modeling, “Knowledge-based and Empirical Combined Scoring Algorithm” (KECSA) are developed and utilized to create an implicit solvation model: KECSA-Movable Type Implicit Solvation Model (KMTISM) suitable for the study of chemical and biological systems. KMTISM is an implicit solvation model, but the MT method performs energy sampling at the atom pairwise level. For a specific molecular system, the MT method collects energies from prebuilt databases for the requisite atom pairs at all relevant distance ranges, which by its very construction encodes all possible molecular configurations simultaneously. Unlike traditional statistical energy functions, KECSA converts structural statistical information into categorized atom pairwise interaction energies as a function of the radial distance instead of a mean force energy function. Within the implicit solvent model approximation, aqueous solvation free energies are then obtained from the NVT ensemble partition function generated by the MT method. Validation is performed against several subsets selected from the Minnesota Solvation Database v2012. Results are compared with several solvation free energy calculation methods, including a one-to-one comparison against two commonly used classical implicit solvation models: MM-GBSA and MM-PBSA. Comparison against a quantum mechanics based polarizable continuum model is also discussed (Cramer and Truhlar’s Solvation Model 12).
Affiliation(s)
- Zheng Zheng
- Institute for Cyber Enabled Research, Department of Chemistry and Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824-1322, United States
|
12
|
Touw WG, Joosten RP, Vriend G. New Biological Insights from Better Structure Models. J Mol Biol 2016; 428:1375-1393. [PMID: 26869101 DOI: 10.1016/j.jmb.2016.02.002] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 06/16/2015] [Revised: 01/04/2016] [Accepted: 02/01/2016] [Indexed: 02/01/2023]
Abstract
Structure validation is a key component of all steps in the structure determination process, from structure building, refinement, deposition, and evaluation all the way to post-deposition optimisation of structures in the Protein Data Bank (PDB) by re-refinement and re-building. Today, many aspects of protein structures are understood better than 10 years ago, and combined with improved software and more computing power, the automated PDB_REDO procedure can significantly improve about 85% of all X-ray structures ever deposited in the PDB. We review structure validation, structure improvement, and a series of validation resources and facilities that give access to improved PDB files and to reports on the quality of the original and the improved structures. Post-deposition optimisation generally leads to improved protein structures and a series of examples will illustrate how that, in turn, leads to improved or even novel biological insights.
Affiliation(s)
- Wouter G Touw
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Center, Geert Grooteplein-Zuid 26-28, 6525 GA Nijmegen, The Netherlands
- Robbie P Joosten
- Department of Biochemistry, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands
- Gert Vriend
- Centre for Molecular and Biomolecular Informatics, Radboud University Medical Center, Geert Grooteplein-Zuid 26-28, 6525 GA Nijmegen, The Netherlands
|
13
|
|
14
|
Three-dimensional protein structure prediction: Methods and computational strategies. Comput Biol Chem 2014; 53PB:251-276. [DOI: 10.1016/j.compbiolchem.2014.10.001] [Citation(s) in RCA: 121] [Impact Index Per Article: 12.1] [Received: 06/12/2014] [Revised: 10/03/2014] [Accepted: 10/07/2014] [Indexed: 01/01/2023]
|
15
|
Joseph AP, de Brevern AG. From local structure to a global framework: recognition of protein folds. J R Soc Interface 2014; 11:20131147. [PMID: 24740960 DOI: 10.1098/rsif.2013.1147] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 12/21/2022] Open
Abstract
Protein folding has been a major area of research for many years. Nonetheless, the mechanisms leading to the formation of an active biological fold are still not fully apprehended. The huge amount of available sequence and structural information provides hints to identify the putative fold for a given sequence. Indeed, protein structures prefer a limited number of local backbone conformations, some being characterized by preferences for certain amino acids. These preferences largely depend on the local structural environment. The prediction of local backbone conformations has become an important factor to correctly identifying the global protein fold. Here, we review the developments in the field of local structure prediction and especially their implication in protein fold recognition.
Affiliation(s)
- Agnel Praveen Joseph
- Science and Technology Facilities Council, Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
|
16
|
Maurice KJ. SSThread: Template-free protein structure prediction by threading pairs of contacting secondary structures followed by assembly of overlapping pairs. J Comput Chem 2014; 35:644-56. [PMID: 24523210 DOI: 10.1002/jcc.23543] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 09/02/2013] [Revised: 11/15/2013] [Accepted: 01/05/2014] [Indexed: 11/12/2022]
Abstract
Acquiring the three-dimensional structure of a protein from its amino acid sequence alone, despite a great deal of work and significant progress on the subject, is still an unsolved problem. SSThread, a new template-free algorithm is described here that consists of making several predictions of contacting pairs of α-helices and β-strands derived from a database of experimental structures using a knowledge-based potential, secondary structure prediction, and contact map prediction followed by assembly of overlapping pair predictions to create an ensemble of core structure predictions whose loops are then predicted. In a set of seven CASP10 targets SSThread outperformed the two leading methods for two targets each. The targets were all β-strand containing structures and most of them have a high relative contact order which demonstrates the advantages of SSThread. The primary bottlenecks based on sets of 74 and 21 test cases are the pair prediction and loop prediction stages.
|
17
|
Dong GQ, Fan H, Schneidman-Duhovny D, Webb B, Sali A. Optimized atomic statistical potentials: assessment of protein interfaces and loops. Bioinformatics 2013; 29:3158-66. [PMID: 24078704 PMCID: PMC3842762 DOI: 10.1093/bioinformatics/btt560] [Citation(s) in RCA: 95] [Impact Index Per Article: 8.6] [Received: 06/19/2013] [Revised: 08/13/2013] [Accepted: 09/22/2013] [Indexed: 01/16/2023] Open
Abstract
MOTIVATION: Statistical potentials have been widely used for modeling whole proteins and their parts (e.g. sidechains and loops) as well as interactions between proteins, nucleic acids and small molecules. Here, we formulate the statistical potentials entirely within a statistical framework, avoiding questionable statistical mechanical assumptions and approximations, including a definition of the reference state. RESULTS: We derive a general Bayesian framework for inferring statistically optimized atomic potentials (SOAP) in which the reference state is replaced with data-driven 'recovery' functions. Moreover, we restrain the relative orientation between two covalent bonds instead of a simple distance between two atoms, in an effort to capture orientation-dependent interactions such as hydrogen bonds. To demonstrate this general approach, we computed statistical potentials for protein-protein docking (SOAP-PP) and loop modeling (SOAP-Loop). For docking, a near-native model is within the top 10 scoring models in 40% of the PatchDock benchmark cases, compared with 23 and 27% for the state-of-the-art ZDOCK and FireDock scoring functions, respectively. Similarly, for modeling 12-residue loops in the PLOP benchmark, the average main-chain root mean square deviation of the best scored conformations by SOAP-Loop is 1.5 Å, close to the average root mean square deviation of the best sampled conformations (1.2 Å) and significantly better than that selected by Rosetta (2.1 Å), DFIRE (2.3 Å), DOPE (2.5 Å) and PLOP scoring functions (3.0 Å). Our Bayesian framework may also result in more accurate statistical potentials for additional modeling applications, thus affording better leverage of the experimentally determined protein structures. AVAILABILITY AND IMPLEMENTATION: SOAP-PP and SOAP-Loop are available as part of MODELLER (http://salilab.org/modeller).
Affiliation(s)
- Guang Qiang Dong
- Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry and California Institute for Quantitative Biosciences (QB3), University of California, San Francisco, CA 94158, USA
|
18
|
Zheng Z, Ucisik MN, Merz KM. The Movable Type Method Applied to Protein-Ligand Binding. J Chem Theory Comput 2013; 9:5526-5538. [PMID: 24535920 DOI: 10.1021/ct4005992] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Indexed: 11/28/2022]
Abstract
Accurately computing the free energy for biological processes like protein folding or protein-ligand association remains a challenging problem. Both describing the complex intermolecular forces involved and sampling the requisite configuration space make understanding these processes innately difficult. Herein, we address the sampling problem using a novel methodology we term "movable type". Conceptually it can be understood by analogy with the evolution of printing and, hence, the name movable type. For example, a common approach to the study of protein-ligand complexation involves taking a database of intact drug-like molecules and exhaustively docking them into a binding pocket. This is reminiscent of early woodblock printing where each page had to be laboriously created prior to printing a book. However, printing evolved to an approach where a database of symbols (letters, numerals, etc.) was created and then assembled using a movable type system, which allowed for the creation of all possible combinations of symbols on a given page, thereby, revolutionizing the dissemination of knowledge. Our movable type (MT) method involves the identification of all atom pairs seen in protein-ligand complexes and then creating two databases: one with their associated pairwise distant dependent energies and another associated with the probability of how these pairs can combine in terms of bonds, angles, dihedrals and non-bonded interactions. Combining these two databases coupled with the principles of statistical mechanics allows us to accurately estimate binding free energies as well as the pose of a ligand in a receptor. This method, by its mathematical construction, samples all of configuration space of a selected region (the protein active site here) in one shot without resorting to brute force sampling schemes involving Monte Carlo, genetic algorithms or molecular dynamics simulations making the methodology extremely efficient. 
Importantly, this method explores the free energy surface eliminating the need to estimate the enthalpy and entropy components individually. Finally, low free energy structures can be obtained via a free energy minimization procedure yielding all low free energy poses on a given free energy surface. Besides revolutionizing the protein-ligand docking and scoring problem this approach can be utilized in a wide range of applications in computational biology which involve the computation of free energies for systems with extensive phase spaces including protein folding, protein-protein docking and protein design.
Affiliation(s)
- Zheng Zheng
- Department of Chemistry and the Quantum Theory Project, 2328 New Physics Building, P.O. Box 118435, University of Florida, Gainesville, Florida 32611-8435
- Melek N Ucisik
- Department of Chemistry and the Quantum Theory Project, 2328 New Physics Building, P.O. Box 118435, University of Florida, Gainesville, Florida 32611-8435
- Kenneth M Merz
- Department of Chemistry and the Quantum Theory Project, 2328 New Physics Building, P.O. Box 118435, University of Florida, Gainesville, Florida 32611-8435
|
19
|
Zheng Z, Merz KM. Development of the knowledge-based and empirical combined scoring algorithm (KECSA) to score protein-ligand interactions. J Chem Inf Model 2013; 53:1073-83. [PMID: 23560465 DOI: 10.1021/ci300619x] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Indexed: 01/18/2023]
Abstract
We describe a novel knowledge-based protein-ligand scoring function that employs a new definition for the reference state, allowing us to relate a statistical potential to a Lennard-Jones (LJ) potential. In this way, the LJ potential parameters were generated from protein-ligand complex structural data contained in the Protein Databank (PDB). Forty-nine (49) types of atomic pairwise interactions were derived using this method, which we call the knowledge-based and empirical combined scoring algorithm (KECSA). Two validation benchmarks were introduced to test the performance of KECSA. The first validation benchmark included two test sets that address the training set and enthalpy/entropy of KECSA. The second validation benchmark suite included two large-scale and five small-scale test sets, to compare the reproducibility of KECSA, with respect to two empirical score functions previously developed in our laboratory (LISA and LISA+), as well as to other well-known scoring methods. Validation results illustrate that KECSA shows improved performance in all test sets when compared with other scoring methods, especially in its ability to minimize the root mean square error (RMSE). LISA and LISA+ displayed similar performance using the correlation coefficient and Kendall τ as the metric of quality for some of the small test sets. Further pathways for improvement are discussed for which would allow KECSA to be more sensitive to subtle changes in ligand structure.
Affiliation(s)
- Zheng Zheng
- Department of Chemistry and the Quantum Theory Project, University of Florida, Gainesville, Florida 32611-8435, United States
|
20
|
Lee HW, Lee HC, Lee LK, Teber ET, Church WB. The use of soluble protein structures in modeling helical proteins in a layered membrane. J Biomol Struct Dyn 2013; 32:308-18. [PMID: 23527746 DOI: 10.1080/07391102.2013.765808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/27/2022]
Abstract
Major advances have been made in the prediction of soluble protein structures, led by the knowledge-based modeling methods that extract useful structural trends from known protein structures and incorporate them into scoring functions. The same cannot be reported for the class of transmembrane proteins, primarily due to the lack of high-resolution structural data for transmembrane proteins, which render many of the knowledge-based method unreliable or invalid. We have developed a method that harnesses the vast structural knowledge available in soluble protein data for use in the modeling of transmembrane proteins. At the core of the method, a set of transmembrane protein decoy sets that allow us to filter and train features recognized from soluble proteins for transmembrane protein modeling into a set of scoring functions. We have demonstrated that structures of soluble proteins can provide significant insight into transmembrane protein structures. A complementary novel two-stage modeling/selection process that mimics the two-stage helical membrane protein folding was developed. Combined with the scoring function, the method was successfully applied to model 5 transmembrane proteins. The root mean square deviations of the predicted models ranged from 5.0 to 8.8 Å to the native structures.
Affiliation(s)
- Hong Wing Lee
- Faculty of Pharmacy, Group in Biomolecular Structure and Informatics, University of Sydney, Sydney, NSW 2006, Australia
|
21
|
Zhao F, Xu J. A position-specific distance-dependent statistical potential for protein structure and functional study. Structure 2012; 20:1118-26. [PMID: 22608968 PMCID: PMC3372698 DOI: 10.1016/j.str.2012.04.003] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.3] [Received: 01/14/2012] [Revised: 04/09/2012] [Accepted: 04/10/2012] [Indexed: 10/28/2022]
Abstract
Although studied extensively, designing highly accurate protein energy potential is still challenging. A lot of knowledge-based statistical potentials are derived from the inverse of the Boltzmann law and consist of two major components: observed atomic interacting probability and reference state. These potentials mainly distinguish themselves in the reference state and use a similar simple counting method to estimate the observed probability, which is usually assumed to correlate with only atom types. This article takes a rather different view on the observed probability and parameterizes it by the protein sequence profile context of the atoms and the radius of the gyration, in addition to atom types. Experiments confirm that our position-specific statistical potential outperforms currently the popular ones in several decoy discrimination tests. Our results imply that, in addition to reference state, the observed probability also makes energy potentials different and evolutionary information greatly boost performance of energy potentials.
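The inverse-Boltzmann construction mentioned in this abstract, in which an energy is obtained from the ratio of an observed probability to a reference-state probability, can be sketched as follows. This is a minimal illustration only, not the position-specific potential of Zhao and Xu: the function name `inverse_boltzmann`, the pseudocount, the binned-count inputs and the choice kT = 1 are all assumptions made for the example.

```python
import math

def inverse_boltzmann(observed_counts, reference_counts, kT=1.0, pseudocount=1e-6):
    """Return per-bin energies E = -kT * ln(P_obs / P_ref).

    observed_counts / reference_counts: pair counts per distance bin
    (hypothetical inputs). The pseudocount avoids log(0) for empty bins.
    """
    if len(observed_counts) != len(reference_counts):
        raise ValueError("count lists must have the same length")
    obs_total = sum(observed_counts)
    ref_total = sum(reference_counts)
    energies = []
    for n_obs, n_ref in zip(observed_counts, reference_counts):
        # Normalise counts to probabilities before taking the ratio.
        p_obs = n_obs / obs_total + pseudocount
        p_ref = n_ref / ref_total + pseudocount
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies
```

With identical observed and reference counts every bin gets zero energy; bins that are over-represented relative to the reference come out negative (favourable), and under-represented bins come out positive.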
Affiliation(s)
- Feng Zhao
- Toyota Technological Institute at Chicago, Chicago IL, USA 60637
- Jinbo Xu
- Toyota Technological Institute at Chicago, Chicago IL, USA 60637
|
22
|
Zhou H, Skolnick J. GOAP: a generalized orientation-dependent, all-atom statistical potential for protein structure prediction. Biophys J 2012; 101:2043-52. [PMID: 22004759 DOI: 10.1016/j.bpj.2011.09.012] [Citation(s) in RCA: 197] [Impact Index Per Article: 16.4] [Received: 07/15/2011] [Revised: 09/07/2011] [Accepted: 09/09/2011] [Indexed: 12/18/2022] Open
Abstract
An accurate scoring function is a key component for successful protein structure prediction. To address this important unsolved problem, we develop a generalized orientation and distance-dependent all-atom statistical potential. The new statistical potential, generalized orientation-dependent all-atom potential (GOAP), depends on the relative orientation of the planes associated with each heavy atom in interacting pairs. GOAP is a generalization of previous orientation-dependent potentials that consider only representative atoms or blocks of side-chain or polar atoms. GOAP is decomposed into distance- and angle-dependent contributions. The DFIRE distance-scaled finite ideal gas reference state is employed for the distance-dependent component of GOAP. GOAP was tested on 11 commonly used decoy sets containing 278 targets, and recognized 226 native structures as best from the decoys, whereas DFIRE recognized 127 targets. The major improvement comes from decoy sets that have homology-modeled structures that are close to native (all within ∼4.0 Å) or from the ROSETTA ab initio decoy set. For these two kinds of decoys, orientation-independent DFIRE or only side-chain orientation-dependent RWplus performed poorly. Although the OPUS-PSP block-based orientation-dependent, side-chain atom contact potential performs much better (recognizing 196 targets) than DFIRE, RWplus, and dDFIRE, it is still ∼15% worse than GOAP. Thus, GOAP is a promising advance in knowledge-based, all-atom statistical potentials. GOAP is available for download at http://cssb.biology.gatech.edu/GOAP.
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia, USA
|
23
|
Bettella F, Rasinski D, Knapp EW. Protein Secondary Structure Prediction with SPARROW. J Chem Inf Model 2012; 52:545-56. [DOI: 10.1021/ci200321u] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 12/26/2022]
Affiliation(s)
- Francesco Bettella
- Freie Universität Berlin, Institut für Chemie, Fabeckstr. 36a, D-14195 Berlin, Germany
- deCODE genetics, Sturlugata 8, 101 Reykjavik, Iceland
- Dawid Rasinski
- Freie Universität Berlin, Institut für Chemie, Fabeckstr. 36a, D-14195 Berlin, Germany
- Ernst Walter Knapp
- Freie Universität Berlin, Institut für Chemie, Fabeckstr. 36a, D-14195 Berlin, Germany
|
24
|
Fan H, Schneidman-Duhovny D, Irwin JJ, Dong G, Shoichet BK, Sali A. Statistical potential for modeling and ranking of protein-ligand interactions. J Chem Inf Model 2011; 51:3078-92. [PMID: 22014038 DOI: 10.1021/ci200377u] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Indexed: 01/13/2023]
Abstract
Applications in structural biology and medicinal chemistry require protein-ligand scoring functions for two distinct tasks: (i) ranking different poses of a small molecule in a protein binding site and (ii) ranking different small molecules by their complementarity to a protein site. Using probability theory, we developed two atomic distance-dependent statistical scoring functions: PoseScore was optimized for recognizing native binding geometries of ligands from other poses and RankScore was optimized for distinguishing ligands from nonbinding molecules. Both scores are based on a set of 8,885 crystallographic structures of protein-ligand complexes but differ in the values of three key parameters. Factors influencing the accuracy of scoring were investigated, including the maximal atomic distance and non-native ligand geometries used for scoring, as well as the use of protein models instead of crystallographic structures for training and testing the scoring function. For the test set of 19 targets, RankScore improved the ligand enrichment (logAUC) and early enrichment (EF(1)) scores computed by DOCK 3.6 for 13 and 14 targets, respectively. In addition, RankScore performed better at rescoring than each of seven other scoring functions tested. Accepting both the crystal structure and decoy geometries with all-atom root-mean-square errors of up to 2 Å from the crystal structure as correct binding poses, PoseScore gave the best score to a correct binding pose among 100 decoys for 88% of all cases in a benchmark set containing 100 protein-ligand complexes. PoseScore accuracy is comparable to that of DrugScore(CSD) and ITScore/SE and superior to 12 other tested scoring functions. Therefore, RankScore can facilitate ligand discovery, by ranking complexes of the target with different small molecules; PoseScore can be used for protein-ligand complex structure prediction, by ranking different conformations of a given protein-ligand pair. 
The statistical potentials are available through the Integrative Modeling Platform (IMP) software package (http://salilab.org/imp) and the LigScore Web server (http://salilab.org/ligscore/).
Affiliation(s)
- Hao Fan
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, USA
|
25
|
Vishnepolsky B, Pirtskhalava M. CONTSOR--a new knowledge-based fold recognition potential, based on side chain orientation and contacts between residue terminal groups. Protein Sci 2011; 21:134-41. [PMID: 22057923 DOI: 10.1002/pro.763] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 07/26/2011] [Revised: 10/18/2011] [Accepted: 10/31/2011] [Indexed: 11/09/2022]
Abstract
Recognizing the structural similarity without significant sequence identity (fold recognition) is an effective method for protein structure prediction. Previously, we developed a fold recognition potential called SORDIS, which incorporated side chain orientation in relation to hydrophobic core centers, distance of the residues from the protein globule center and secondary structure terms. But this potential does not include terms, based on close contacts between residues. In this paper a new fold recognition potential CONTSOR was presented, which based on SORDIS terms and the term, based on contacts between amino acid terminal groups. The performance of this potential was evaluated on SABmark benchmark for alignment accuracy and on SABmark and Lindahl benchmarks for fold recognition. The results show that CONTSOR has the best performance among other potentials on SABmark benchmark both for alignment accuracy and fold recognition and one of the best performances on Lindahl benchmark. CONTSOR software package is available for download at http://www.lifescience.org.ge/downloads/contsor.zip.
Affiliation(s)
- Boris Vishnepolsky
- Life Science Research Centre, Laboratory of Bioinformatics, 14 Gotua Street, Tbilisi, Georgia.
|
26
|
Sun W, He J. From isotropic to anisotropic side chain representations: comparison of three models for residue contact estimation. PLoS One 2011; 6:e19238. [PMID: 21552527 PMCID: PMC3084275 DOI: 10.1371/journal.pone.0019238] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 11/10/2010] [Accepted: 03/29/2011] [Indexed: 11/19/2022] Open
Abstract
The criterion to determine residue contact is a fundamental problem in deriving knowledge-based mean-force potential energy calculations for protein structures. A frequently used criterion is to require the side chain center-to-center distance or the Cα-to-Cα atom distance to be within a pre-determined cutoff distance. However, the spatially anisotropic nature of the side chain determines that it is challenging to identify the contact pairs. This study compares three side chain contact models: the Atom Distance criteria (ADC) model, the Isotropic Sphere Side chain (ISS) model and the Anisotropic Ellipsoid Side chain (AES) model using 424 high resolution protein structures in the Protein Data Bank. The results indicate that the ADC model is the most accurate and ISS is the worst. The AES model eliminates about 95% of the incorrectly counted contact-pairs in the ISS model. Algorithm analysis shows that AES model is the most computational intensive while ADC model has moderate computational cost. We derived a dataset of the mis-estimated contact pairs by AES model. The most misjudged pairs are Arg-Glu, Arg-Asp and Arg-Tyr. Such a dataset can be useful for developing the improved AES model by incorporating the pair-specific information for the cutoff distance.
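The simplest distance-cutoff contact criterion referred to in this abstract can be sketched as below, applied here to one representative atom per residue (for instance C-alpha coordinates). This is only an illustration of the cutoff idea, not an implementation of the ADC, ISS or AES models: the function name `ca_contacts`, the 8.0 Å default cutoff and the minimum sequence separation are assumptions chosen for the example.

```python
import math

def ca_contacts(ca_coords, cutoff=8.0, min_separation=2):
    """List residue index pairs (i, j) whose representative atoms lie within `cutoff` Å.

    ca_coords: list of (x, y, z) tuples, one per residue (hypothetical input).
    cutoff and min_separation are illustrative defaults, not values from the study.
    """
    contacts = []
    n = len(ca_coords)
    for i in range(n):
        # Skip pairs closer than min_separation along the sequence.
        for j in range(i + min_separation, n):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                contacts.append((i, j))
    return contacts
```

A single isotropic cutoff like this ignores side chain shape and orientation, which is the limitation the anisotropic model in the abstract is meant to address.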
Affiliation(s)
- Weitao Sun
- Zhou Pei-Yuan Center for Applied Mathematics, Tsinghua University, Beijing, China.
|
27
|
Mittal A, Jayaram B. Backbones of Folded Proteins Reveal Novel Invariant Amino Acid Neighborhoods. J Biomol Struct Dyn 2011; 28:443-54. [DOI: 10.1080/073911011010524954] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Indexed: 01/08/2023]
|
28
|
Solis AD, Rackovsky SR. Information-theoretic analysis of the reference state in contact potentials used for protein structure prediction. Proteins 2010; 78:1382-97. [PMID: 20034109 DOI: 10.1002/prot.22652] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
Using information-theoretic concepts, we examine the role of the reference state, a crucial component of empirical potential functions, in protein fold recognition. We derive an information-based connection between the probability distribution functions of the reference state and those that characterize the decoy set used in threading. In examining commonly used contact reference states, we find that the quasi-chemical approximation is informatically superior to other variant models designed to include characteristics of real protein chains, such as finite length and variable amino acid composition from protein to protein. We observe that in these variant models, the total divergence, the operative function that quantifies discrimination, decreases along with threading performance. We find that any amount of nativeness encoded in the reference state model does not significantly improve threading performance. A promising avenue for the development of better potentials is suggested by our information-theoretic analysis of the action of contact potentials on individual protein sequences. Our results show that contact potentials perform better when the compositional properties of the data set used to derive the score function probabilities are similar to the properties of the sequence of interest. Results also suggest to use only sequences of similar composition in deriving contact potentials, to tailor the contact potential specifically for a test sequence.
Affiliation(s)
- Armando D Solis
- Department of Pharmacology and Systems Therapeutics, Mount Sinai School of Medicine, New York, New York 10029, USA.
|
29
|
Rykunov D, Fiser A. New statistical potential for quality assessment of protein models and a survey of energy functions. BMC Bioinformatics 2010; 11:128. [PMID: 20226048 PMCID: PMC2853469 DOI: 10.1186/1471-2105-11-128] [Citation(s) in RCA: 72] [Impact Index Per Article: 5.1] [Received: 10/06/2009] [Accepted: 03/12/2010] [Indexed: 11/30/2022] Open
Abstract
Background Scoring functions, such as molecular mechanic forcefields and statistical potentials are fundamentally important tools in protein structure modeling and quality assessment. Results The performances of a number of publicly available scoring functions are compared with a statistical rigor, with an emphasis on knowledge-based potentials. We explored the effect on accuracy of alternative choices for representing interaction center types and other features of scoring functions, such as using information on solvent accessibility, on torsion angles, accounting for secondary structure preferences and side chain orientation. Partially based on the observations made, we present a novel residue based statistical potential, which employs a shuffled reference state definition and takes into account the mutual orientation of residue side chains. Atom- and residue-level statistical potentials and Linux executables to calculate the energy of a given protein proposed in this work can be downloaded from http://www.fiserlab.org/potentials. Conclusions Among the most influential terms we observed a critical role of a proper reference state definition and the benefits of including information about the microenvironment of interaction centers. Molecular mechanical potentials were also tested and found to be over-sensitive to small local imperfections in a structure, requiring unfeasible long energy relaxation before energy scores started to correlate with model quality.
Affiliation(s)
- Dmitry Rykunov
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, 1300 Morris Park Ave, Bronx, NY 10461, USA
|
30
|
Mean-Force Scoring Functions for Protein–Ligand Binding. Annual Reports in Computational Chemistry 2010. [DOI: 10.1016/s1574-1400(10)06014-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
|
31
|
Buchholz M, Hamann A, Aust S, Brandt W, Böhme L, Hoffmann T, Schilling S, Demuth HU, Heiser U. Inhibitors for Human Glutaminyl Cyclase by Structure Based Design and Bioisosteric Replacement. J Med Chem 2009; 52:7069-80. [DOI: 10.1021/jm900969p] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.2] [Indexed: 11/29/2022]
Affiliation(s)
- Wolfgang Brandt
- Department of Bioorganic Chemistry, Leibniz Institute of Plant Biochemistry, Weinberg 3, D-06120 Halle, Germany
- Hans-Ulrich Demuth
- Department of Medicinal Chemistry
- Department of Enzymology
- Department of Preclinical Pharmacology
|
32
|
Lobanov MY, Finkel’shtein AV. Analogy-based protein structure prediction: II. Testing of substitution matrices and pseudopotentials used to align protein sequences with spatial structures. Mol Biol 2009. [DOI: 10.1134/s0026893309040207] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
|
33
|
Betancourt MR. Another look at the conditions for the extraction of protein knowledge-based potentials. Proteins 2009; 76:72-85. [PMID: 19089977 DOI: 10.1002/prot.22320] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 01/28/2023]
Abstract
Protein knowledge-based potentials are effective free energies obtained from databases of known protein structures. They are used to parameterize coarse-grained protein models in many folding simulation and structure prediction methods. Two common approaches are used in the derivation of knowledge-based potentials. One assumes that the energy parameters optimize the native structure stability. The other assumes that interaction events are related to their energies according to the Boltzmann distribution, and that they are distributed independently of other events, that is, the quasi-chemical approximation. Here, these assumptions are systematically tested by extracting contact energies from artificial databases of lattice proteins with predefined pairwise contact energies. Databases of protein sequences are designed to either satisfy the Boltzmann distribution at high or low temperatures, or to simultaneously optimize the native stability and folding kinetics. It is found that the quasi-chemical approximation, with the ideal reference state, accurately reproduce the true energies for high temperature Boltzmann distributed sequences (weakly interacting residues), but less accurately at low temperatures, where the sequences correspond to energy minima and the residues are strongly interacting. To overcome this problem, an iterative procedure for Boltzmann distributed sequences is introduced, which accounts for interacting residue correlations and eliminates the need for the quasi-chemical approximation. In this case, the energies are accurately reproduced at any ensemble temperature. However, when the database of sequences designed for optimal stability and kinetics is used, the energy correlation is less than optimal using either method, exhibiting random and systematic deviations from linearity. 
Therefore, the assumption that native structures are maximally stable or that sequences are determined according to the Boltzmann distribution seems to be inadequate for obtaining accurate energies. The limited number of sequences in the database and the inhomogeneous concentration of amino acids from one structure to another do not seem to be major obstacles for improving the quality of the extracted pairwise energies, with the exception of repulsive interactions.
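The quasi-chemical approximation discussed in this abstract treats contacts as independent events and compares observed contact counts with those expected under random mixing. A minimal sketch of that idea follows. It is not the lattice-protein procedure or the iterative scheme of the paper: the function name `quasi_chemical_energies`, the input format, the definition of type frequencies from contact partners and kT = 1 are assumptions made for illustration.

```python
import math
from collections import Counter

def quasi_chemical_energies(contacts, kT=1.0):
    """Contact energies e_ij = -kT * ln(N_ij_observed / N_ij_expected).

    contacts: iterable of (type_a, type_b) residue-type pairs (hypothetical input).
    Expected counts assume residue types pair at random in proportion to how often
    each type appears among all contact partners (random-mixing reference).
    """
    # Treat (a, b) and (b, a) as the same unordered pair.
    pair_counts = Counter(tuple(sorted(pair)) for pair in contacts)
    total = sum(pair_counts.values())
    # Count how many contact "ends" belong to each residue type.
    end_counts = Counter()
    for (a, b), n in pair_counts.items():
        end_counts[a] += n
        end_counts[b] += n
    energies = {}
    for (a, b), n_obs in pair_counts.items():
        x_a = end_counts[a] / (2 * total)
        x_b = end_counts[b] / (2 * total)
        multiplicity = 1 if a == b else 2  # unlike pairs can form in two orders
        n_exp = multiplicity * x_a * x_b * total
        energies[(a, b)] = -kT * math.log(n_obs / n_exp)
    return energies
```

When the observed counts match random mixing exactly, every energy is zero; pairs seen more often than expected receive negative energies. Only pair types that actually occur in the input appear in the result.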
Affiliation(s)
- Marcos R Betancourt
- Department of Physics, Indiana University Purdue University Indianapolis, Indianapolis, Indiana 46202, USA.
|
34
|
Organ-specific attenuation of murine hepatitis virus strain A59 by replacement of catalytic residues in the putative viral cyclic phosphodiesterase ns2. J Virol 2009; 83:3743-53. [PMID: 19176619 DOI: 10.1128/jvi.02203-08] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.5] [Indexed: 12/28/2022] Open
Abstract
The Murine hepatitis virus (MHV) strain A59 ns2 protein is a 30-kDa nonstructural protein that is expressed from a subgenomic mRNA in the cytoplasm of virus-infected cells. Its homologs are also encoded in other closely related group 2a coronaviruses and more distantly related toroviruses. Together, these proteins comprise a subset of a large superfamily of 2H phosphoesterase proteins that are distinguished by a pair of conserved His-x-Thr/Ser motifs encompassing catalytically important residues. We have used a vaccinia virus-based reverse genetic system to produce recombinant viruses encoding ns2 proteins with single-amino-acid substitutions in, or adjacent to, these conserved motifs, namely, inf-ns2 H46A, inf-ns2 S48A, inf-ns2-S120A, and inf-ns2-H126R. All of the mutant viruses replicate in mouse 17 clone 1 fibroblast cells and mouse embryonic cells to the same extent as the parental wild-type recombinant virus, inf-MHV-A59. However, compared to inf-MHV-A59, the inf-ns2 H46A and inf-ns2-H126R mutants are highly attenuated for replication in mouse liver following intrahepatic inoculation. Interestingly, none of the mutant viruses were attenuated for replication in mouse brain following intracranial inoculation. These results show that the ns2 protein of MHV-A59 has an important role in virus pathogenicity and that a substitution of the histidine residues of the MHV-A59 ns2 His-x-Thr/Ser motifs is critical for virus virulence in the liver but not in the brain. This novel phenotype suggests a strategy to investigate the function of the MHV-A59 ns2 protein involving the search for organ-specific proteins or RNAs that react differentially to wild-type and mutant ns2 proteins.
|
35
|
Hart SE, Howe CJ, Mizuguchi K, Fernandez-Recio J. Docking of cytochrome c6 and plastocyanin to the aa3-type cytochrome c oxidase in the cyanobacterium Phormidium laminosum. Protein Eng Des Sel 2008; 21:689-98. [DOI: 10.1093/protein/gzn051] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/12/2022] Open
|
36
|
Kulharia M, Goody RS, Jackson RM. Information Theory-Based Scoring Function for the Structure-Based Prediction of Protein−Ligand Binding Affinity. J Chem Inf Model 2008; 48:1990-8. [DOI: 10.1021/ci800125k] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/28/2022]
Affiliation(s)
- Mahesh Kulharia
- Department of Physical Biochemistry, Max Planck Institute of Molecular Physiology, Otto Hahn Strasse 11, Dortmund, Germany 44227, and Institute of Molecular and Cellular Biology, University of Leeds, Leeds, U.K. LS2 9JT
- Roger S. Goody
- Department of Physical Biochemistry, Max Planck Institute of Molecular Physiology, Otto Hahn Strasse 11, Dortmund, Germany 44227, and Institute of Molecular and Cellular Biology, University of Leeds, Leeds, U.K. LS2 9JT
- Richard M. Jackson
- Department of Physical Biochemistry, Max Planck Institute of Molecular Physiology, Otto Hahn Strasse 11, Dortmund, Germany 44227, and Institute of Molecular and Cellular Biology, University of Leeds, Leeds, U.K. LS2 9JT
|
37
|
Vishnepolsky B, Managadze G, Pirtskhalava M. Comparison of the efficiency of evolutionary change-based and side chain orientation-based fold recognition potentials. Proteins 2008; 71:1863-78. [PMID: 18175309 DOI: 10.1002/prot.21871] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
The present article describes SORDIS, a residue-level knowledge-based potential. SORDIS incorporates information on side-chain orientation relative to hydrophobic core centres, the distance of a residue from the globule centre, and secondary structure. SORDIS has been tested and compared with widely used evolutionary change-based substitution matrices (BLOSUM, PAM, GONNET, Johnson-Overington, BLAJ, HSDM, and STROMA) in fold recognition experiments within the zone of weak sequence similarity (<16%). The results show that the lower the amino acid similarity between homologous pairs, the better SORDIS performs in comparison with potentials based on information about evolutionary changes. Therefore, we propose that employing SORDIS in fold recognition can be useful.
Affiliation(s)
- Boris Vishnepolsky
- Institute of Molecular Biology and Biological Physics, Tbilisi 0160, Georgia
|
38
|
Feng Y, Kloczkowski A, Jernigan RL. Four-body contact potentials derived from two protein datasets to discriminate native structures from decoys. Proteins 2007; 68:57-66. [PMID: 17393455 DOI: 10.1002/prot.21362] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.6] [Indexed: 11/10/2022]
Abstract
Two-body inter-residue contact potentials for proteins have often been extracted and extensively used for threading. Here, we have developed a new scheme to derive four-body contact potentials as a way to consider protein interactions in a more cooperative model. We use several datasets of protein native structures to demonstrate that around 500 chains are sufficient to provide a good estimate of these four-body contact potentials by obtaining convergent threading results. We have also deliberately chosen two sets of protein native structures differing in resolution, one with all chains' resolution better than 1.5 Å and the other with 94.2% of the structures having a resolution worse than 1.5 Å, to investigate whether potentials from well-refined protein datasets perform better in threading. However, potentials from well-refined proteins did not generate statistically significantly better threading results. Our four-body contact potentials can discriminate well between native structures and partially unfolded or deliberately misfolded structures. Compared with another set of four-body contact potentials derived by using a Delaunay tessellation algorithm, our four-body contact potentials appear to offer a better characterization of the interactions between backbones and side chains and provide better threading results, somewhat complementary to those found using other potentials.
Affiliation(s)
- Yaping Feng
- Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, Iowa 50011-0320, USA
|
39
|
Zheng S, Robertson TA, Varani G. A knowledge-based potential function predicts the specificity and relative binding energy of RNA-binding proteins. FEBS J 2007; 274:6378-91. [PMID: 18005254 DOI: 10.1111/j.1742-4658.2007.06155.x] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.5] [Indexed: 11/28/2022]
Abstract
RNA-protein interactions are fundamental to gene expression. Thus, the molecular basis for the sequence dependence of protein-RNA recognition has been extensively studied experimentally. However, there have been very few computational studies of this problem, and no sustained attempt has been made towards using computational methods to predict or alter the sequence-specificity of these proteins. In the present study, we provide a distance-dependent statistical potential function derived from our previous work on protein-DNA interactions. This potential function discriminates native structures from decoys, successfully predicts the native sequences recognized by sequence-specific RNA-binding proteins, and recapitulates experimentally determined relative changes in binding energy due to mutations of individual amino acids at protein-RNA interfaces. Thus, this work demonstrates that statistical models allow the quantitative analysis of protein-RNA recognition based on their structure and can be applied to modeling protein-RNA interfaces for prediction and design purposes.
Affiliation(s)
- Suxin Zheng
- Department of Chemistry, University of Washington, Seattle, WA 98195, USA
|
40
|
Ferrada E, Vergara IA, Melo F. A knowledge-based potential with an accurate description of local interactions improves discrimination between native and near-native protein conformations. Cell Biochem Biophys 2007; 49:111-24. [PMID: 17906366 DOI: 10.1007/s12013-007-0050-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 03/07/2007] [Revised: 11/30/1999] [Accepted: 07/16/2007] [Indexed: 10/22/2022]
Abstract
The correct discrimination between native and near-native protein conformations is essential for achieving accurate computer-based protein structure prediction. However, this has proven to be a difficult task, since currently available physical energy functions, empirical potentials and statistical scoring functions are still limited in achieving this goal consistently. In this work, we assess and compare the ability of different full-atom knowledge-based potentials to discriminate between native protein structures and near-native protein conformations generated by comparative modeling. Using a benchmark of 152 near-native protein models and their corresponding native structures that encompass several different folds, we demonstrate that the incorporation of close non-bonded pairwise atom terms improves the discriminating power of the empirical potentials. Since the direct and unbiased derivation of close non-bonded terms from current experimental data is not possible, we obtained and used those terms from the corresponding pseudo-energy functions of a non-local knowledge-based potential. It is shown that this methodology significantly improves the discrimination between native and near-native protein conformations, suggesting that a proper description of close non-bonded terms is important to achieve a more complete and accurate description of native protein conformations. Some external knowledge-based energy functions that are widely used in model assessment performed poorly, indicating that the benchmark of models and the specific discrimination task tested in this work constitute a difficult challenge.
Affiliation(s)
- Evandro Ferrada
- Departamento de Genética Molecular y Microbiología, Facultad de Ciencias Biológicas, Pontificia Universidad Católica de Chile, Alameda 340, Santiago, Chile
|
41
|
Fitzgerald JE, Jha AK, Colubri A, Sosnick TR, Freed KF. Reduced C(beta) statistical potentials can outperform all-atom potentials in decoy identification. Protein Sci 2007; 16:2123-39. [PMID: 17893359 PMCID: PMC2204143 DOI: 10.1110/ps.072939707] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Indexed: 10/22/2022]
Abstract
We developed a series of statistical potentials to recognize the native protein from decoys, particularly when using only a reduced representation in which each side chain is treated as a single C(beta) atom. Beginning with a highly successful all-atom statistical potential, the Discrete Optimized Protein Energy function (DOPE), we considered the implications of including additional information in the all-atom statistical potential and subsequently reducing to the C(beta) representation. One of the potentials includes interaction energies conditional on backbone geometries. A second potential separates sequence local from sequence nonlocal interactions and introduces a novel reference state for the sequence local interactions. The resultant potentials perform better than the original DOPE statistical potential in decoy identification. Moreover, even upon passing to a reduced C(beta) representation, these statistical potentials outscore the original (all-atom) DOPE potential in identifying native states for sets of decoys. Interestingly, the backbone-dependent statistical potential is shown to retain nearly all of the information content of the all-atom representation in the C(beta) representation. In addition, these new statistical potentials are combined with existing potentials to model hydrogen bonding, torsion energies, and solvation energies to produce even better performing potentials. The ability of the C(beta) statistical potentials to accurately represent protein interactions bodes well for computational efficiency in protein folding calculations using reduced backbone representations, while the extensions to DOPE illustrate general principles for improving knowledge-based potentials.
Affiliation(s)
- James E Fitzgerald
- Department of Physics, The University of Chicago, Chicago, Illinois 60637, USA
|
42
|
Wu Y, Lu M, Chen M, Li J, Ma J. OPUS-Ca: a knowledge-based potential function requiring only Calpha positions. Protein Sci 2007; 16:1449-63. [PMID: 17586777 PMCID: PMC2206690 DOI: 10.1110/ps.072796107] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.1] [Indexed: 10/23/2022]
Abstract
In this paper, we report a knowledge-based potential function, named the OPUS-Ca potential, that requires only Calpha positions as input. The contributions from other atomic positions were established from pseudo-positions artificially built from a Calpha trace for auxiliary purposes. The potential function is formed based on seven major representative molecular interactions in proteins: distance-dependent pairwise energy with orientational preference, hydrogen bonding energy, short-range energy, packing energy, tri-peptide packing energy, three-body energy, and solvation energy. From the testing of decoy recognition on a number of commonly used decoy sets, it is shown that the new potential function outperforms all known Calpha-based potentials and most other coarse-grained ones that require more information than Calpha positions. We hope that this potential function adds a new tool for protein structural modeling.
Affiliation(s)
- Yinghao Wu
- Department of Bioengineering, Rice University, Houston, TX 77005, USA
|
43
|
Djurdjevic DP, Biggs MJ. Ab initio protein fold prediction using evolutionary algorithms: influence of design and control parameters on performance. J Comput Chem 2006; 27:1177-95. [PMID: 16752367 DOI: 10.1002/jcc.20440] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Indexed: 11/10/2022]
Abstract
True ab initio prediction of protein 3D structure requires only the protein primary structure, a physicochemical free energy model, and a search method for identifying the free energy global minimum. Various characteristics of evolutionary algorithms (EAs) mean they are in principle well suited to the latter. Studies to date have been less than encouraging, however. This is because of the limited consideration given to EA design and control parameter issues. A comprehensive study of these issues was, therefore, undertaken for ab initio protein fold prediction using a full atomistic protein model. The performance and optimal control parameter settings of twelve EA designs were first established using a 15-residue polyalanine molecule; the design aspects varied include the encoding alphabet, crossover operator, and replacement strategy. It can be concluded that real encoding and multipoint crossover are superior, while both generational and steady-state replacement strategies have merits. The scaling between the optimal control parameter settings and polyalanine size was also identified for both generational and steady-state designs based on real encoding and multipoint crossover. Application of the steady-state design to met-enkephalin indicated that these scalings are potentially transferable to real proteins. Comparison of the performance of the steady-state design for met-enkephalin with other ab initio methods indicates that EAs can be competitive provided the correct design and control parameter values are used.
Affiliation(s)
- Dusan P Djurdjevic
- Institute for Materials and Processes, University of Edinburgh, King's Buildings, Mayfield Road, Edinburgh EH9 3JL, United Kingdom
|
44
|
Rykunov D, Fiser A. Effects of amino acid composition, finite size of proteins, and sparse statistics on distance-dependent statistical pair potentials. Proteins 2007; 67:559-68. [PMID: 17335003 DOI: 10.1002/prot.21279] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.1] [Indexed: 11/06/2022]
Abstract
Statistical distance-dependent pair potentials are frequently used in a variety of folding, threading, and modeling studies of proteins. The applicability of these types of potentials is tightly connected to the reliability of statistical observations. We explored the possible origin and extent of false positive signals in statistical potentials by analyzing their distance dependence in a variety of randomized protein-like models. While on average potentials derived from such models are expected to equal zero at any distance, we demonstrate that systematic and significant distortions exist. These distortions originate from the limited statistical counts in local environments of proteins and from the limited size of protein structures at large distances. We suggest that these systematic errors in statistical potentials are connected to the dependence of amino acid composition on protein size and to variation in protein sizes. Additionally, atom-based potentials are dominated by a false positive signal that is due to correlation among distances measured from atoms of one residue to atoms of another residue. The significance of residue-based pairwise potentials at various spatial pair separations was assessed in this study, and it was found that as few as approximately 50% of potential values were statistically significant at distances below 4 Å, and at most approximately 80% of them were significant at larger pair separations. A new definition for the reference state, free of the observed systematic errors, is suggested. It has been demonstrated to generate statistical potentials that compare favorably to other publicly available ones.
Affiliation(s)
- Dmitry Rykunov
- Department of Biochemistry, Seaver Center for Bioinformatics, Albert Einstein College of Medicine, Bronx, New York 10461, USA
|
45
|
Wollacott AM, Merz KM. Assessment of Semiempirical Quantum Mechanical Methods for the Evaluation of Protein Structures. J Chem Theory Comput 2007; 3:1609-1619. [PMID: 18728758 DOI: 10.1021/ct600325q] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 11/30/2022]
Abstract
The ability to discriminate native structures from computer-generated misfolded ones is key to predicting the three-dimensional structure of a protein from its amino acid sequence. Here we describe an assessment of semiempirical methods for discriminating native protein structures from decoy models. The discrimination of decoys entails an analysis of a large number of protein structures, and provides a large-scale validation of quantum mechanical methods and their ability to accurately model proteins. We combine our analysis of semiempirical methods with a comparison of an AMBER force field to discriminate decoys in conjunction with a continuum solvent model. Protein decoys provide a rigorous and reliable benchmark for the evaluation of scoring functions, not only in their ability to accurately identify native structures but also to be computationally tractable to sample a large set of non-native models.
|
46
|
Anashkina A, Kuznetsov E, Esipova N, Tumanyan V. Comprehensive statistical analysis of residues interaction specificity at protein-protein interfaces. Proteins 2007; 67:1060-77. [PMID: 17357164 DOI: 10.1002/prot.21363] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022]
Abstract
We calculated interchain contacts on the atomic level for a nonredundant set of 4602 protein-protein interfaces using an unbiased Voronoi-Delaunay tessellation method, and made 20x20 residue contact matrices for both homodimers and heterocomplexes. The area of contacts and the distance distribution for these contacts were calculated on both the residue and the atomic levels. We analyzed the residue area distribution and showed the existence of two types of interresidue contacts: stochastic and specific. We also derived formulas describing the distribution of contact area for stochastic and specific interactions in parametric form. The maximum pairing preference index was found for Cys-Cys contacts and for oppositely charged interactions. A significant difference in residue contacts was observed between homodimers and heterocomplexes. Interfaces in homodimers were enriched with contacts between residues of the same type due to the effects of structure symmetry.
Affiliation(s)
- Anastasya Anashkina
- Laboratory of bioinformatics and system biology, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Moscow, Russia.
|
47
|
Cheng J, Pei J, Lai L. A free-rotating and self-avoiding chain model for deriving statistical potentials based on protein structures. Biophys J 2007; 92:3868-77. [PMID: 17351015 PMCID: PMC1868969 DOI: 10.1529/biophysj.106.102152] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022] Open
Abstract
Statistical potentials have been widely used in protein studies despite their much-debated theoretical basis. In this work, we have applied two physical reference states for deriving the statistical potentials based on protein structure features to achieve zero interaction and orthogonalization. The free-rotating chain-based potential applies a local free-rotating chain reference state, which could theoretically be described by the Gaussian distribution. The self-avoiding chain-based potential applies a reference state derived from a database of artificial self-avoiding backbones generated by Monte Carlo simulation. These physical reference states are independent of known protein structures and are based solely on the analytical formulation or simulation method. The new potentials performed better and yielded higher Z-scores and success rates compared to other statistical potentials. The end-to-end distance distribution produced by the self-avoiding chain model was similar to the distance distribution of protein atoms in the structure database. This fact may partly explain the basis of the reference states that depend on the atom pair frequency observed in the protein database. The current study showed that a more physical reference model improved the performance of statistical potentials in protein fold recognition, which could also be extended to other types of applications.
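As a rough illustration of the Gaussian reference state mentioned in this abstract, the following Python sketch evaluates the radial end-to-end distance distribution of an ideal Gaussian chain and applies a Boltzmann inversion against it. It is not the authors' implementation: the function names, the effective segment length of 3.8 Å, and the energy unit kT = 1 are assumptions chosen only for illustration.

```python
import math


def gaussian_end_to_end_pdf(r, n_segments, segment_length=3.8):
    """Radial probability density of the end-to-end distance r for an ideal
    Gaussian chain with <R^2> = n_segments * segment_length**2.
    The default segment length (3.8, in angstroms) is an assumed value."""
    mean_sq = n_segments * segment_length ** 2
    norm = (3.0 / (2.0 * math.pi * mean_sq)) ** 1.5
    return 4.0 * math.pi * r * r * norm * math.exp(-3.0 * r * r / (2.0 * mean_sq))


def inverse_boltzmann(p_observed, p_reference, kT=1.0):
    """Effective energy from the ratio of observed to reference probability.
    kT = 1.0 is an assumed unit; both probabilities must be positive."""
    return -kT * math.log(p_observed / p_reference)


# Hypothetical usage: a distance seen twice as often as the reference predicts.
p_ref = gaussian_end_to_end_pdf(12.0, n_segments=10)
energy = inverse_boltzmann(2.0 * p_ref, p_ref)  # negative, i.e. favourable
```

Under these assumptions, distances observed more often than the Gaussian reference predicts receive a negative effective energy, and distances that match the reference receive zero.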
Affiliation(s)
- Ji Cheng
- State Key Laboratory for Structural Chemistry of Stable and Unstable Species, College of Chemistry and Molecular Engineering, and Center for Theoretical Biology, Peking University, Beijing, China
|
48
|
Mukherjee A, Bhimalapuram P, Bagchi B. Orientation-dependent potential of mean force for protein folding. J Chem Phys 2005; 123:014901. [PMID: 16035863 DOI: 10.1063/1.1940058] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Indexed: 11/14/2022] Open
Abstract
We present a solvent-implicit minimalistic model potential among the amino acid residues of proteins, obtained by using the known native structures [deposited in the Protein Data Bank (PDB)]. In this model, the amino acid side chains are represented by a single ellipsoidal site, defined by the group of atoms about the center of mass of the side chain. These ellipsoidal sites interact with other sites through an orientation-dependent interaction potential which we construct in the following fashion. First, the site-site potential of mean force (PMF) between heavy atoms is calculated [following F. Melo and E. Feytmans, J. Mol. Biol. 267, 207 (1997)] from statistics of their distance separation obtained from crystal structures. These site-site potentials are then used to calculate the distance- and orientation-dependent potential between side chains of all the amino acid residues (AAR). The distance and orientation dependencies show several interesting results. For example, we find that the PMF between two hydrophobic AARs, such as phenylalanine, is strongly attractive at short distances (after the obvious repulsive region at very short separation) and is characterized by a deep minimum for specific orientations. For the interaction between two hydrophilic AARs, such a deep minimum is absent and, in addition, the potential interestingly reveals the combined effect of polar (charge) and hydrophobic interactions among some of these AARs. The effectiveness of our potential has been tested by calculating the Z-scores for a large set of proteins. The calculated Z-scores show high negative values for most of them, signifying the success of the potential in identifying the native structure from among a large number of its decoy states.
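The Z-score test mentioned in this abstract can be sketched as follows. This is a generic illustration, not the authors' code: the function name is hypothetical, and the use of the population standard deviation of the decoy energies is an assumed convention.

```python
import statistics


def z_score(native_energy, decoy_energies):
    """Z-score of the native energy relative to the decoy energy distribution.
    A large negative value means the native structure scores well below the
    decoys. The population standard deviation is an assumed convention."""
    mean = statistics.mean(decoy_energies)
    spread = statistics.pstdev(decoy_energies)
    if spread == 0:
        raise ValueError("decoy energies have zero spread")
    return (native_energy - mean) / spread


# Hypothetical energies in arbitrary units.
z = z_score(-10.0, [0.0, 2.0, 4.0])
```

With these made-up numbers the native energy lies far below the decoy mean, so the Z-score is strongly negative, which is the behaviour the abstract reports for most proteins.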
Affiliation(s)
- Arnab Mukherjee
- Solid State and Structural Chemistry Unit, Indian Institute of Science, Bangalore, India 560 012
|
49
|
Shen MY, Sali A. Statistical potential for assessment and prediction of protein structures. Protein Sci 2006; 15:2507-24. [PMID: 17075131 PMCID: PMC2242414 DOI: 10.1110/ps.062416606] [Citation(s) in RCA: 1758] [Impact Index Per Article: 103.4] [Indexed: 10/24/2022]
Abstract
Protein structures in the Protein Data Bank provide a wealth of data about the interactions that determine the native states of proteins. Using probability theory, we derive an atomic distance-dependent statistical potential from a sample of native structures that does not depend on any adjustable parameters (Discrete Optimized Protein Energy, or DOPE). DOPE is based on an improved reference state that corresponds to noninteracting atoms in a homogeneous sphere with the radius dependent on a sample native structure; it thus accounts for the finite and spherical shape of the native structures. The DOPE potential was extracted from a nonredundant set of 1472 crystallographic structures. We tested DOPE and five other scoring functions by the detection of the native state among six multiple target decoy sets, the correlation between the score and model error, and the identification of the most accurate non-native structure in the decoy set. For all decoy sets, DOPE is the best performing function in terms of all criteria, except for a tie in one criterion for one decoy set. To facilitate its use in various applications, such as model assessment, loop modeling, and fitting into cryo-electron microscopy mass density maps combined with comparative protein structure modeling, DOPE was incorporated into the modeling package MODELLER-8.
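To make the idea of a homogeneous-sphere reference state concrete, the sketch below evaluates the textbook distribution of distances between two points placed uniformly and independently inside a sphere of radius R, p(r) = (3r^2/R^3)(1 - 3r/(4R) + r^3/(16R^3)) for 0 <= r <= 2R. It only illustrates the geometric ingredient named in the abstract; it is not the DOPE implementation, and the function name is hypothetical.

```python
def sphere_pair_distance_pdf(r, radius):
    """Probability density of the distance r between two points drawn
    uniformly and independently inside a sphere of the given radius.
    Returns 0.0 outside the admissible range 0 <= r <= 2 * radius."""
    if r < 0.0 or r > 2.0 * radius:
        return 0.0
    x = r / radius  # reduced distance, between 0 and 2
    return (3.0 * x * x / radius) * (1.0 - 0.75 * x + x ** 3 / 16.0)


# Hypothetical usage: density at 8 angstroms for an assumed 15-angstrom sphere.
density = sphere_pair_distance_pdf(8.0, radius=15.0)
```

Unlike an ideal-gas reference that grows as r^2 without bound, this density falls to zero at r = 2R, which is how a finite, spherical protein shape enters the reference state.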
Affiliation(s)
- Min-Yi Shen
- Department of Biopharmaceutical Sciences, Department of Pharmaceutical Chemistry, University of California at San Francisco, San Francisco, California 94158, USA.
|
50
|
|