1
Adiyaman R, McGuffin LJ. Using Local Protein Model Quality Estimates to Guide a Molecular Dynamics-Based Refinement Strategy. Methods Mol Biol 2023; 2627:119-140. [PMID: 36959445 DOI: 10.1007/978-1-0716-2974-1_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2023]
Abstract
The refinement of predicted 3D models aims to bring them closer to the native structure by fixing errors including unusual bonds and torsion angles and irregular hydrogen bonding patterns. Refinement approaches based on molecular dynamics (MD) simulations using different types of restraints have performed well since CASP10. ReFOLD, developed by the McGuffin group, was one of the many MD-based refinement approaches tested in CASP12. When the performance of the ReFOLD method in CASP12 was evaluated, it was observed that ReFOLD suffered from the absence of a reliable guidance mechanism to achieve consistent improvement in the quality of predicted 3D models, particularly in the case of template-based modelling (TBM) targets. Therefore, here we propose to utilize the local quality assessment score produced by ModFOLD6 to guide the MD-based refinement approach to further increase the accuracy of the predicted 3D models. The relative performance of the new local quality assessment guided MD-based refinement protocol and the original MD-based protocol ReFOLD are compared utilizing many different official scoring methods. By using the per-residue accuracy (or local quality) score to guide the refinement process, we are able to prevent the refined models from undesired structural deviations, thereby leading to more consistent improvements. This chapter will include a detailed analysis of the performance of the local quality assessment guided MD-based protocol versus that deployed in the original ReFOLD method.
Affiliation(s)
- Recep Adiyaman
- School of Biological Sciences, University of Reading, Reading, UK
- Liam J McGuffin
- School of Biological Sciences, University of Reading, Reading, UK
2
Holland J, Grigoryan G. Structure‐conditioned amino‐acid couplings: how contact geometry affects pairwise sequence preferences. Protein Sci 2022; 31:900-917. [PMID: 35060221 PMCID: PMC8927866 DOI: 10.1002/pro.4280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2021] [Revised: 01/06/2022] [Accepted: 01/12/2022] [Indexed: 11/11/2022]
Abstract
Relating a protein's sequence to its conformation is a central challenge for both structure prediction and sequence design. Statistical contact potentials, as well as their more descriptive versions that account for side‐chain orientation and other geometric descriptors, have served as simplistic but useful means of representing second‐order contributions in sequence–structure relationships. Here we ask what happens when a pairwise potential is conditioned on the fully defined geometry of interacting backbone fragments. We show that the resulting structure‐conditioned coupling energies more accurately reflect pair preferences as a function of structural contexts. These structure‐conditioned energies more reliably encode native sequence information and more highly correlate with experimentally determined coupling energies. Clustering a database of interaction motifs by structure results in ensembles of similar energies and clustering them by energy results in ensembles of similar structures. By comparing many pairs of interaction motifs and showing that structural similarity and energetic similarity go hand‐in‐hand, we provide a tangible link between modular sequence and structure elements. This link is applicable to structural modeling, and we show that scoring CASP models with structure‐conditioned energies results in substantially higher correlation with structural quality than scoring the same models with a contact potential. We conclude that structure‐conditioned coupling energies are a good way to model the impact of interaction geometry on second‐order sequence preferences.
Affiliation(s)
- Jack Holland
- Department of Computer Science, Dartmouth College, Hanover, New Hampshire, USA
- Gevorg Grigoryan
- Department of Computer Science, Dartmouth College, Hanover, New Hampshire, USA
3
Simpkin AJ, Rodríguez FS, Mesdaghi S, Kryshtafovych A, Rigden DJ. Evaluation of model refinement in CASP14. Proteins 2021; 89:1852-1869. [PMID: 34288138 PMCID: PMC8616799 DOI: 10.1002/prot.26185] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 04/30/2021] [Revised: 06/19/2021] [Accepted: 07/11/2021] [Indexed: 12/15/2022]
Abstract
We report here an assessment of the model refinement category of the 14th round of Critical Assessment of Structure Prediction (CASP14). As before, predictors submitted up to five ranked refinements, along with associated residue-level error estimates, for targets that had a wide range of starting quality. The ability of groups to accurately rank their submissions and to predict coordinate error varied widely. Overall, only four groups out-performed a "naïve predictor" corresponding to the resubmission of the starting model. Among the top groups, there are interesting differences of approach and in the spread of improvements seen: some methods are more conservative, others more adventurous. Some targets were "double-barreled" for which predictors were offered a high-quality AlphaFold 2 (AF2)-derived prediction alongside another of lower quality. The AF2-derived models were largely unimprovable, many of their apparent errors being found to reside at domain and, especially, crystal lattice contacts. Refinement is shown to have a mixed impact overall on structure-based function annotation methods to predict nucleic acid binding, spot catalytic sites, and dock protein structures.
Affiliation(s)
- Adam J. Simpkin
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
- Filomeno Sánchez Rodríguez
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
- Life Science, Diamond Light Source, Harwell Science and Innovation Campus, Didcot, Oxfordshire OX11 0DE, England
- Shahram Mesdaghi
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
- Daniel J. Rigden
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, Liverpool L69 7ZB, England
4
Heo L, Janson G, Feig M. Physics-based protein structure refinement in the era of artificial intelligence. Proteins 2021; 89:1870-1887. [PMID: 34156124 PMCID: PMC8616793 DOI: 10.1002/prot.26161] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 05/07/2021] [Revised: 05/31/2021] [Accepted: 06/08/2021] [Indexed: 12/21/2022]
Abstract
Protein structure refinement is the last step in protein structure prediction pipelines. Physics-based refinement via molecular dynamics (MD) simulations has made significant progress during recent years. During CASP14, we tested a new refinement protocol based on an improved sampling strategy via MD simulations. MD simulations were carried out at an elevated temperature (360 K). An optimized use of biasing restraints and the use of multiple starting models led to enhanced sampling. The new protocol generally improved the model quality. In comparison with our previous protocols, the CASP14 protocol showed clear improvements. Our approach was successful with most initial models, many based on deep learning methods. However, we found that our approach was not able to refine machine-learning models from the AlphaFold2 group, often decreasing already high initial qualities. To better understand the role of refinement given new types of models based on machine-learning, a detailed analysis via MD simulations and Markov state modeling is presented here. We continue to find that MD-based refinement has the potential to improve AI predictions. We also identified several practical issues that make it difficult to realize that potential. Increasingly important is the consideration of inter-domain and oligomeric contacts in simulations; the presence of large kinetic barriers in refinement pathways also continues to present challenges. Finally, we provide a perspective on how physics-based refinement could continue to play a role in the future for improving initial predictions based on machine learning-based methods.
Affiliation(s)
- Lim Heo
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
- Giacomo Janson
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
- Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, MI, 48824, USA
5
Kryshtafovych A, Moult J, Billings WM, Della Corte D, Fidelis K, Kwon S, Olechnovič K, Seok C, Venclovas Č, Won J. Modeling SARS-CoV-2 proteins in the CASP-commons experiment. Proteins 2021; 89:1987-1996. [PMID: 34462960 PMCID: PMC8616790 DOI: 10.1002/prot.26231] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 05/05/2021] [Revised: 08/23/2021] [Accepted: 08/26/2021] [Indexed: 01/21/2023]
Abstract
Critical Assessment of Structure Prediction (CASP) is an organization aimed at advancing the state of the art in computing protein structure from sequence. In the spring of 2020, CASP launched a community project to compute the structures of the most structurally challenging proteins coded for in the SARS-CoV-2 genome. Forty-seven research groups submitted over 3000 three-dimensional models and 700 sets of accuracy estimates on 10 proteins. The resulting models were released to the public. CASP community members also worked together to provide estimates of local and global accuracy and identify structure-based domain boundaries for some proteins. Subsequently, two of these structures (ORF3a and ORF8) have been solved experimentally, allowing assessment of both model quality and the accuracy estimates. Models from the AlphaFold2 group were found to have good agreement with the experimental structures, with main chain GDT_TS accuracy scores ranging from 63 (a correct topology) to 87 (competitive with experiment).
Affiliation(s)
- John Moult
- Department of Cell Biology and Molecular Genetics, Institute for Bioscience and Biotechnology Research, University of Maryland, Rockville, Maryland, USA
- Wendy M Billings
- Department of Physics & Astronomy, Brigham Young University, Provo, Utah, USA
- Dennis Della Corte
- Department of Physics & Astronomy, Brigham Young University, Provo, Utah, USA
- Krzysztof Fidelis
- Genome Center, University of California, Davis, Davis, California, USA
- Sohee Kwon
- Department of Chemistry, Seoul National University, Seoul, South Korea
- Kliment Olechnovič
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Chaok Seok
- Department of Chemistry, Seoul National University, Seoul, South Korea
- Česlovas Venclovas
- Institute of Biotechnology, Life Sciences Center, Vilnius University, Vilnius, Lithuania
- Jonghun Won
- Department of Chemistry, Seoul National University, Seoul, South Korea
6
Adiyaman R, McGuffin LJ. ReFOLD3: refinement of 3D protein models with gradual restraints based on predicted local quality and residue contacts. Nucleic Acids Res 2021; 49:W589-W596. [PMID: 34009387 PMCID: PMC8218204 DOI: 10.1093/nar/gkab300] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/03/2021] [Revised: 03/23/2021] [Accepted: 04/16/2021] [Indexed: 12/16/2022] Open
Abstract
ReFOLD3 is unique in its application of gradual restraints, calculated from local model quality estimates and contact predictions, which are used to guide the refinement of theoretical 3D protein models towards the native structures. ReFOLD3 achieves improved performance by using an iterative refinement protocol to fix incorrect residue contacts and local errors, including unusual bonds and angles, which are identified in the submitted models by our leading ModFOLD8 model quality assessment method. Following refinement, the likely resulting improvements to the submitted models are recognized by ModFOLD8, which produces both global and local quality estimates. During the CASP14 prediction season (May-Aug 2020), we used the ReFOLD3 protocol to refine hundreds of 3D models, for both the refinement and the main tertiary structure prediction categories. Our group improved the global and local quality scores for numerous starting models in the refinement category, where we ranked in the top 10 according to the official assessment. The ReFOLD3 protocol was also used for the refinement of the SARS-CoV-2 targets as a part of the CASP Commons COVID-19 initiative, and we provided a significant number of the top 10 models. The ReFOLD3 web server is freely available at https://www.reading.ac.uk/bioinf/ReFOLD/.
Affiliation(s)
- Recep Adiyaman
- School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
- Liam J McGuffin
- School of Biological Sciences, University of Reading, Whiteknights, Reading RG6 6AS, UK
7
Kapla J, Rodríguez-Espigares I, Ballante F, Selent J, Carlsson J. Can molecular dynamics simulations improve the structural accuracy and virtual screening performance of GPCR models? PLoS Comput Biol 2021; 17:e1008936. [PMID: 33983933 PMCID: PMC8186765 DOI: 10.1371/journal.pcbi.1008936] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 11/03/2020] [Revised: 06/08/2021] [Accepted: 04/02/2021] [Indexed: 01/14/2023] Open
Abstract
The determination of G protein-coupled receptor (GPCR) structures at atomic resolution has improved understanding of cellular signaling and will accelerate the development of new drug candidates. However, experimental structures still remain unavailable for a majority of the GPCR family. GPCR structures and their interactions with ligands can also be modelled computationally, but such predictions have limited accuracy. In this work, we explored if molecular dynamics (MD) simulations could be used to refine the accuracy of in silico models of receptor-ligand complexes that were submitted to a community-wide assessment of GPCR structure prediction (GPCR Dock). Two simulation protocols were used to refine 30 models of the D3 dopamine receptor (D3R) in complex with an antagonist. Close to 60 μs of simulation time was generated and the resulting MD refined models were compared to a D3R crystal structure. In the MD simulations, the receptor models generally drifted further away from the crystal structure conformation. However, MD refinement was able to improve the accuracy of the ligand binding mode. The best refinement protocol improved agreement with the experimentally observed ligand binding mode for a majority of the models. Receptor structures with improved virtual screening performance, which was assessed by molecular docking of ligands and decoys, could also be identified among the MD refined models. Application of weak restraints to the transmembrane helixes in the MD simulations further improved predictions of the ligand binding mode and second extracellular loop. These results provide guidelines for application of MD refinement in prediction of GPCR-ligand complexes and directions for further method development.
Affiliation(s)
- Jon Kapla
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Ismael Rodríguez-Espigares
- Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences of Pompeu Fabra University (UPF), Hospital del Mar Medical Research Institute (IMIM), Barcelona, Spain
- Flavio Ballante
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- Jana Selent
- Research Programme on Biomedical Informatics (GRIB), Department of Experimental and Health Sciences of Pompeu Fabra University (UPF), Hospital del Mar Medical Research Institute (IMIM), Barcelona, Spain
- Jens Carlsson
- Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
8
Egbert M, Porter KA, Ghani U, Kotelnikov S, Nguyen T, Ashizawa R, Kozakov D, Vajda S. Conservation of binding properties in protein models. Comput Struct Biotechnol J 2021; 19:2549-2566. [PMID: 34025942 PMCID: PMC8114079 DOI: 10.1016/j.csbj.2021.04.048] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/12/2021] [Revised: 04/19/2021] [Accepted: 04/22/2021] [Indexed: 01/09/2023] Open
Abstract
We study the models submitted to round 12 of the Critical Assessment of protein Structure Prediction (CASP) experiment to assess how well the binding properties are conserved when the X-ray structures of the target proteins are replaced by their models. To explore small molecule binding we generate distributions of molecular probes - which are fragment-sized organic molecules of varying size, shape, and polarity - around the protein, and count the number of interactions between each residue and the probes, resulting in a vector of interactions we call a binding fingerprint. The similarity between two fingerprints, one for the X-ray structure and the other for a model of the protein, is determined by calculating the correlation coefficient between the two vectors. The resulting correlation coefficients are shown to correlate with global measures of accuracy established in CASP, and the relationship yields an accuracy threshold that has to be reached for meaningful binding surface conservation. The clusters formed by the probe molecules reliably predict binding hot spots and ligand binding sites in both X-ray structures and reasonably accurate models of the target, but ensembles of models may be needed for assessing the availability of proper binding pockets. We explored ligand docking to the few targets that had bound ligands in the X-ray structure. More targets were available to assess the ability of the models to reproduce protein-protein interactions by docking both the X-ray structures and models to their interaction partners in complexes. It was shown that this application is more difficult than finding small ligand binding sites, and the success rates heavily depend on the local structure in the potential interface. In particular, predicted conformations of flexible loops are frequently incorrect in otherwise highly accurate models, and may prevent predicting correct protein-protein interactions.
Affiliation(s)
- Megan Egbert
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, United States
- Kathryn A. Porter
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, United States
- Usman Ghani
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, United States
- Sergei Kotelnikov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794, United States
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794, United States
- Thu Nguyen
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794, United States
- Ryota Ashizawa
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794, United States
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794, United States
- Dima Kozakov
- Department of Applied Mathematics and Statistics, Stony Brook University, Stony Brook, NY 11794, United States
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, NY 11794, United States
- Sandor Vajda
- Department of Biomedical Engineering, Boston University, Boston, MA 02215, United States
- Department of Chemistry, Boston University, Boston, MA 02215, United States
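Illustrative note on the Egbert et al. entry: the abstract describes a binding fingerprint as a per-residue vector of probe interaction counts, and compares the X-ray structure with a model through the correlation coefficient of the two vectors. The following is a minimal Python sketch of that comparison, not the authors' implementation. It assumes the probe-residue interactions have already been identified by some upstream mapping step, and that the correlation is Pearson's (the abstract says only "correlation coefficient"). The function names `binding_fingerprint` and `pearson` and the toy contact lists are hypothetical.

```python
import math


def binding_fingerprint(contact_residues, n_residues):
    """Count probe interactions per residue.

    contact_residues: iterable of 0-based residue indices, one entry per
    probe-residue interaction (assumed to come from an upstream mapping step).
    """
    fingerprint = [0] * n_residues
    for residue_index in contact_residues:
        fingerprint[residue_index] += 1
    return fingerprint


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length of at least 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy data: interactions observed on a 4-residue X-ray structure and a model.
    xray = binding_fingerprint([0, 0, 1, 3, 3, 3], 4)
    model = binding_fingerprint([0, 1, 1, 3, 3], 4)
    print(pearson(xray, model))
```

A value near 1 would indicate that the model preserves the per-residue interaction pattern of the X-ray structure; the accuracy threshold mentioned in the abstract is not reproduced here.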
9
Protein Structure Refinement Using Multi-Objective Particle Swarm Optimization with Decomposition Strategy. Int J Mol Sci 2021; 22:ijms22094408. [PMID: 33922489 PMCID: PMC8122964 DOI: 10.3390/ijms22094408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2021] [Revised: 04/16/2021] [Accepted: 04/20/2021] [Indexed: 12/02/2022] Open
Abstract
Protein structure refinement is a crucial step for more accurate protein structure predictions. Most existing approaches treat it as an energy minimization problem to intuitively improve the quality of initial models by searching for structures with lower energy. Considering that a single energy function could not reflect the accurate energy landscape of all the proteins, our previous AIR 1.0 pipeline uses multiple energy functions to realize a multi-objective particle swarm optimization-based model refinement. It is expected to provide a general balanced conformation search protocol guided by different energy evaluations. However, AIR 1.0 solves the multi-objective optimization problem as a whole, which could not result in good solution diversity and convergence on some targets. In this study, we report a decomposition-based method AIR 2.0, which is an updated version of AIR, for protein structure refinement. AIR 2.0 decomposes a multi-objective optimization problem into a number of subproblems and optimizes them simultaneously using a particle swarm optimization algorithm. The solutions yielded by AIR 2.0 show better convergence and diversity compared to its previous version, which increases the possibilities of digging out better structure conformations. The experimental results on CASP13 refinement benchmark targets and blind tests in CASP14 demonstrate the efficacy of AIR 2.0.
10
Heo L, Arbour CF, Janson G, Feig M. Improved Sampling Strategies for Protein Model Refinement Based on Molecular Dynamics Simulation. J Chem Theory Comput 2021; 17:1931-1943. [PMID: 33562962 DOI: 10.1021/acs.jctc.0c01238] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 11/30/2022]
Abstract
Protein structures provide valuable information for understanding biological processes. Protein structures can be determined by experimental methods such as X-ray crystallography, nuclear magnetic resonance spectroscopy, or cryogenic electron microscopy. As an alternative, in silico methods can be used to predict protein structures. These methods utilize protein structure databases for structure prediction via template-based modeling or for training machine-learning models to generate predictions. Structure prediction for proteins distant from proteins with known structures often results in lower accuracy with respect to the true physiological structures. Physics-based protein model refinement methods can be applied to improve model accuracy in the predicted models. Refinement methods rely on conformational sampling around the predicted structures, and if structures closer to the native states are sampled, improvements in the model quality become possible. Molecular dynamics simulations have been especially successful for improving model qualities but although consistent refinement can be achieved, the improvements in model qualities are still moderate. To extend the refinement performance of a simulation-based protocol, we explored new schemes that focus on optimized use of biasing functions and the application of increased simulation temperatures. In addition, we tested the use of alternative initial models so that the simulations can explore the conformational space more broadly. Based on the insights of this analysis, we are proposing a new refinement protocol that significantly outperformed previous state-of-the-art molecular dynamics simulation-based protocols in the benchmark tests described here.
Affiliation(s)
- Lim Heo
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
- Collin F Arbour
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
- Giacomo Janson
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
- Michael Feig
- Department of Biochemistry and Molecular Biology, Michigan State University, East Lansing, Michigan 48824, United States
11
Runthala A. Probabilistic divergence of a template-based modelling methodology from the ideal protocol. J Mol Model 2021; 27:25. [PMID: 33411019 DOI: 10.1007/s00894-020-04640-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/19/2020] [Accepted: 12/09/2020] [Indexed: 12/27/2022]
Abstract
Protein structural information is essential for the detailed mapping of a functional protein network. For a higher modelling accuracy and quicker implementation, template-based algorithms have been extensively deployed and redefined. The methods only assess the predicted structure against its native state/template and do not estimate the accuracy for each modelling step. A divergence measure is therefore postulated to estimate the modelling accuracy against its theoretical optimal benchmark. By freezing the domain boundaries, the divergence measures are predicted for the most crucial steps of a modelling algorithm. To precisely refine the score using weighting constants, big data analysis could further be deployed.
Affiliation(s)
- Ashish Runthala
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, 522502, India
12
Bhattacharya D. refineD: improved protein structure refinement using machine learning based restrained relaxation. Bioinformatics 2020; 35:3320-3328. [PMID: 30759180 DOI: 10.1093/bioinformatics/btz101] [Citation(s) in RCA: 22] [Impact Index Per Article: 5.5] [Received: 06/13/2018] [Revised: 01/22/2019] [Accepted: 02/11/2019] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION: Protein structure refinement aims to bring moderately accurate template-based protein models closer to the native state through conformational sampling. However, guiding the sampling towards the native state by effectively using restraints remains a major issue in structure refinement. RESULTS: Here, we develop a machine learning based restrained relaxation protocol that uses deep discriminative learning based binary classifiers to predict multi-resolution probabilistic restraints from the starting structure and subsequently converts these restraints to be integrated into Rosetta all-atom energy function as additional scoring terms during structure refinement. We use four restraint resolutions as adopted in GDT-HA (0.5, 1, 2 and 4 Å), centered on the Cα atom of each residue, which are predicted by an ensemble of four deep discriminative classifiers trained using combinations of sequence and structure-derived features as well as several energy terms from Rosetta centroid scoring function. The proposed method, refineD, has been found to produce consistent and substantial structural refinement through the use of cumulative and non-cumulative restraints on 150 benchmarking targets. refineD outperforms an unrestrained relaxation strategy or relaxation that is restrained to starting structures using the FastRelax application of Rosetta or atomic-level energy minimization based ModRefiner method as well as molecular dynamics (MD) simulation based FG-MD protocol. Furthermore, by adjusting restraint resolutions, the method addresses the tradeoff that exists between degree and consistency of refinement. These results demonstrate a promising new avenue for improving accuracy of template-based protein models by effectively guiding conformational sampling during structure refinement through the use of machine learning based restraints. AVAILABILITY AND IMPLEMENTATION: http://watson.cse.eng.auburn.edu/refineD/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Debswapna Bhattacharya
- Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
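Illustrative note on the Bhattacharya entry: the abstract refers to the four distance cutoffs used by GDT-HA (0.5, 1, 2 and 4 Å on Cα atoms). As a rough sketch of how those cutoffs enter a score, the Python below averages the fraction of Cα atoms that fall within each cutoff. It is a simplification under stated assumptions: the model is taken to be already superposed on the native structure with residues in matching order, whereas the full GDT procedure searches over many superpositions to maximize each fraction. The function name `gdt_ha_presuperposed` and the toy coordinates are hypothetical and are not taken from refineD.

```python
import math

# Distance cutoffs in angstroms, as listed in the abstract for GDT-HA.
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)


def gdt_ha_presuperposed(model_ca, native_ca, cutoffs=GDT_HA_CUTOFFS):
    """Average percentage of C-alpha atoms within each cutoff.

    Assumes model_ca and native_ca are equal-length sequences of (x, y, z)
    tuples that are already superposed; no superposition search is done here.
    """
    if len(model_ca) != len(native_ca) or not model_ca:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    distances = [math.dist(m, n) for m, n in zip(model_ca, native_ca)]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    # Toy coordinates: four residues displaced by 0.4, 0.9, 1.5 and 5.0 angstroms.
    native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    model = [(0.0, 0.0, 0.4), (1.0, 0.0, 0.9), (2.0, 0.0, 1.5), (3.0, 0.0, 5.0)]
    print(gdt_ha_presuperposed(model, native))
```

Because no superposition search is performed, this sketch can only underestimate the score that a full GDT-HA calculation would report for the same pair of structures.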
13
Lee GR, Won J, Heo L, Seok C. GalaxyRefine2: simultaneous refinement of inaccurate local regions and overall protein structure. Nucleic Acids Res 2020; 47:W451-W455. [PMID: 31001635 PMCID: PMC6602442 DOI: 10.1093/nar/gkz288] [Citation(s) in RCA: 57] [Impact Index Per Article: 14.3] [Received: 01/31/2019] [Revised: 04/01/2019] [Accepted: 04/11/2019] [Indexed: 11/12/2022] Open
Abstract
The 3D structure of a protein can be predicted from its amino acid sequence with high accuracy for a large fraction of cases because of the availability of large quantities of experimental data and the advance of computational algorithms. Recently, deep learning methods exploiting the coevolution information obtained by comparing related protein sequences have been successfully used to generate highly accurate model structures even in the absence of template structure information. However, structures predicted based on either template structures or related sequences require further improvement in regions for which information is missing. Refining a predicted protein structure with insufficient information on certain regions is critical because these regions may be connected to functional specificity that is not conserved among related proteins. The GalaxyRefine2 web server, freely available via http://galaxy.seoklab.org/refine2, is an upgraded version of the GalaxyRefine protein structure refinement server and reflects recent developments successfully tested through CASP blind prediction experiments. This method adopts an iterative optimization approach involving various structure move sets to refine both local and global structures. The estimation of local error and hybridization of available homolog structures are also employed for effective conformation search.
Affiliation(s)
- Gyu Rie Lee
- Department of Chemistry, Seoul National University, Seoul 08826, Korea
- Jonghun Won
- Department of Chemistry, Seoul National University, Seoul 08826, Korea
- Lim Heo
- Department of Chemistry, Seoul National University, Seoul 08826, Korea
- Chaok Seok
- Department of Chemistry, Seoul National University, Seoul 08826, Korea

14. Piana S, Robustelli P, Tan D, Chen S, Shaw DE. Development of a Force Field for the Simulation of Single-Chain Proteins and Protein-Protein Complexes. J Chem Theory Comput 2020; 16:2494-2507. [PMID: 31914313 DOI: 10.1021/acs.jctc.9b00251] [Citation(s) in RCA: 82] [Impact Index Per Article: 20.5] [Indexed: 12/21/2022]
Abstract
The accuracy of atomistic physics-based force fields for the simulation of biological macromolecules has typically been benchmarked experimentally using biophysical data from simple, often single-chain systems. In the case of proteins, the careful refinement of force field parameters associated with torsion-angle potentials and the use of improved water models have enabled a great deal of progress toward the highly accurate simulation of such monomeric systems in both folded and, more recently, disordered states. In living organisms, however, proteins constantly interact with other macromolecules, such as proteins and nucleic acids, and these interactions are often essential for proper biological function. Here, we show that state-of-the-art force fields tuned to provide an accurate description of both ordered and disordered proteins can be limited in their ability to accurately describe protein-protein complexes. This observation prompted us to perform an extensive reparameterization of one variant of the Amber protein force field. Our objective involved refitting not only the parameters associated with torsion-angle potentials but also the parameters used to model nonbonded interactions, the specification of which is expected to be central to the accurate description of multicomponent systems. The resulting force field, which we call DES-Amber, allows for more accurate simulations of protein-protein complexes, while still providing a state-of-the-art description of both ordered and disordered single-chain proteins. Despite the improvements, calculated protein-protein association free energies still appear to deviate substantially from experiment, a result suggesting that more fundamental changes to the force field, such as the explicit treatment of polarization effects, may simultaneously further improve the modeling of single-chain proteins and protein-protein complexes.
Affiliation(s)
- Stefano Piana
- D. E. Shaw Research, New York, New York 10036, United States
- Paul Robustelli
- D. E. Shaw Research, New York, New York 10036, United States
- Dazhi Tan
- D. E. Shaw Research, New York, New York 10036, United States
- Songela Chen
- D. E. Shaw Research, New York, New York 10036, United States
- David E Shaw
- D. E. Shaw Research, New York, New York 10036, United States.,Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, United States

15. Badaczewska-Dawid AE, Kolinski A, Kmiecik S. Computational reconstruction of atomistic protein structures from coarse-grained models. Comput Struct Biotechnol J 2019; 18:162-176. [PMID: 31969975 PMCID: PMC6961067 DOI: 10.1016/j.csbj.2019.12.007] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 12/10/2019] [Accepted: 12/10/2019] [Indexed: 01/02/2023]
Abstract
Three-dimensional protein structures, whether determined experimentally or theoretically, are often too low resolution. In this mini-review, we outline the computational methods for protein structure reconstruction from incomplete coarse-grained to all atomistic models. Typical reconstruction schemes can be divided into four major steps. Usually, the first step is reconstruction of the protein backbone chain starting from the C-alpha trace. This is followed by side-chains rebuilding based on protein backbone geometry. Subsequently, hydrogen atoms can be reconstructed. Finally, the resulting all-atom models may require structure optimization. Many methods are available to perform each of these tasks. We discuss the available tools and their potential applications in integrative modeling pipelines that can transfer coarse-grained information from computational predictions, or experiment, to all atomistic structures.
Affiliation(s)
- Sebastian Kmiecik
- Faculty of Chemistry, Biological and Chemical Research Center, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland

16. Cai Y, Li X, Sun Z, Lu Y, Zhao H, Hanson J, Paliwal K, Litfin T, Zhou Y, Yang Y. SPOT-Fold: Fragment-Free Protein Structure Prediction Guided by Predicted Backbone Structure and Contact Map. J Comput Chem 2019; 41:745-750. [PMID: 31845383 DOI: 10.1002/jcc.26132] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 03/22/2019] [Revised: 10/07/2019] [Accepted: 12/01/2019] [Indexed: 02/01/2023]
Abstract
Protein structure determination has been one of the most challenging problems in molecular biology for the past 60 years. Here we present an ab initio protein tertiary-structure prediction method assisted by predicted contact maps from SPOT-Contact and predicted dihedral angles from SPIDER 3. These predicted properties were then fed to the crystallography and NMR system (CNS) for restrained structure modeling. The resulting structures are first evaluated by the potential energy calculated by CNS, followed by the dDFIRE energy function for model selection. The method, called SPOT-Fold, has been tested on 241 CASP targets between 67 and 670 amino acid residues and 60 randomly selected globular proteins under 100 amino acids. The method has a comparable accuracy to other contact-map-based modeling techniques.
Affiliation(s)
- Yufeng Cai
- School of Data and Computer Science, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou, 510006, China
- Xiongjun Li
- School of Data and Computer Science, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou, 510006, China
- Zhe Sun
- School of Data and Computer Science, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou, 510006, China
- Yutong Lu
- School of Data and Computer Science, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou, 510006, China
- Huiying Zhao
- Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, 510000, China
- Jack Hanson
- Signal Processing Laboratory, Griffith University, Brisbane, Queensland, 4122, Australia
- Kuldip Paliwal
- Signal Processing Laboratory, Griffith University, Brisbane, Queensland, 4122, Australia
- Thomas Litfin
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Queensland, 4222, Australia
- Yaoqi Zhou
- Institute for Glycomics and School of Information and Communication Technology, Griffith University, Southport, Queensland, 4222, Australia
- Yuedong Yang
- School of Data and Computer Science, Sun Yat-Sen University, 132 East Circle at University City, Guangzhou, 510006, China

17. Kryshtafovych A, Schwede T, Topf M, Fidelis K, Moult J. Critical assessment of methods of protein structure prediction (CASP)-Round XIII. Proteins 2019; 87:1011-1020. [PMID: 31589781 DOI: 10.1002/prot.25823] [Citation(s) in RCA: 269] [Impact Index Per Article: 53.8] [Received: 08/23/2019] [Revised: 09/25/2019] [Accepted: 09/27/2019] [Indexed: 12/24/2022]
Abstract
CASP (critical assessment of structure prediction) assesses the state of the art in modeling protein structure from amino acid sequence. The most recent experiment (CASP13 held in 2018) saw dramatic progress in structure modeling without use of structural templates (historically "ab initio" modeling). Progress was driven by the successful application of deep learning techniques to predict inter-residue distances. In turn, these results drove dramatic improvements in three-dimensional structure accuracy: With the proviso that there are an adequate number of sequences known for the protein family, the new methods essentially solve the long-standing problem of predicting the fold topology of monomeric proteins. Further, the number of sequences required in the alignment has fallen substantially. There is also substantial improvement in the accuracy of template-based models. Other areas (model refinement, accuracy estimation, and the structure of protein assemblies) have again yielded interesting results. CASP13 placed increased emphasis on the use of sparse data together with modeling, and chemical crosslinking, SAXS, and NMR all yielded more mature results. This paper summarizes the key outcomes of CASP13. The special issue of PROTEINS contains papers describing the CASP13 assessments in each modeling category and contributions from the participants.
Affiliation(s)
- Torsten Schwede
- Biozentrum & SIB Swiss Institute of Bioinformatics, University of Basel, Basel, Switzerland
- Maya Topf
- Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, UK
- John Moult
- Institute for Bioscience and Biotechnology Research, Rockville, Maryland.,Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland

18. Read RJ, Sammito MD, Kryshtafovych A, Croll TI. Evaluation of model refinement in CASP13. Proteins 2019; 87:1249-1262. [PMID: 31365160 PMCID: PMC6851427 DOI: 10.1002/prot.25794] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 05/09/2019] [Revised: 07/03/2019] [Accepted: 07/27/2019] [Indexed: 12/25/2022]
Abstract
Performance in the model refinement category of the 13th round of Critical Assessment of Structure Prediction (CASP13) is assessed, showing that some groups consistently improve most starting models whereas the majority of participants continue to degrade the starting model on average. Using the ranking formula developed for CASP12, it is shown that only 7 of 32 groups perform better than a “naïve predictor” who just submits the starting model. Common features in their approaches include a dependence on physics‐based force fields to judge alternative conformations and the use of molecular dynamics to relax models to local minima, usually with some restraints to prevent excessively large movements. In addition to the traditional CASP metrics that focus largely on the quality of the overall fold, alternative metrics are evaluated, including comparisons of the main‐chain and side‐chain torsion angles, and the utility of the models for solving crystal structures by the molecular replacement method. It is proposed that the introduction of these metrics, as well as consideration of the accuracy of coordinate error estimates, would improve the discrimination between good and very good models.
Affiliation(s)
- Randy J Read
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
- Massimo D Sammito
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK
- Tristan I Croll
- Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Cambridge, UK

19. Park H, Lee GR, Kim DE, Anishchenko I, Cong Q, Baker D. High-accuracy refinement using Rosetta in CASP13. Proteins 2019; 87:1276-1282. [PMID: 31325340 DOI: 10.1002/prot.25784] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 05/01/2019] [Revised: 07/11/2019] [Accepted: 07/12/2019] [Indexed: 11/06/2022]
Abstract
Because proteins generally fold to their lowest free energy states, energy-guided refinement in principle should be able to systematically improve the quality of protein structure models generated using homologous structure or co-evolution derived information. However, because of the high dimensionality of the search space, there are far more ways to degrade the quality of a near native model than to improve it, and hence, refinement methods are very sensitive to energy function errors. In the 13th Critical Assessment of techniques for protein Structure Prediction (CASP13), we sought to carry out a thorough search for low energy states in the neighborhood of a starting model using restraints to avoid straying too far. The approach was reasonably successful in improving both regions largely incorrect in the starting models as well as core regions that started out closer to the correct structure. Models with GDT-HA over 70 were obtained for five targets and for one of those, an accuracy of 0.5 Å backbone root-mean-square deviation (RMSD) was achieved. An important current challenge is to improve performance in refining oligomers and larger proteins, for which the search problem remains extremely difficult.
Affiliation(s)
- Hahnbeom Park
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington
- Gyu Rie Lee
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington
- David E Kim
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington.,Howard Hughes Medical Institute, University of Washington, Seattle, Washington
- Ivan Anishchenko
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington
- Qian Cong
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington
- David Baker
- Department of Biochemistry and Institute for Protein Design, University of Washington, Seattle, Washington.,Howard Hughes Medical Institute, University of Washington, Seattle, Washington

20. Geng H, Chen F, Ye J, Jiang F. Applications of Molecular Dynamics Simulation in Structure Prediction of Peptides and Proteins. Comput Struct Biotechnol J 2019; 17:1162-1170. [PMID: 31462972 PMCID: PMC6709365 DOI: 10.1016/j.csbj.2019.07.010] [Citation(s) in RCA: 43] [Impact Index Per Article: 8.6] [Received: 04/02/2019] [Revised: 07/07/2019] [Accepted: 07/23/2019] [Indexed: 12/21/2022]
Abstract
Compared with the rapid accumulation of protein sequences from high-throughput DNA sequencing, obtaining experimental 3D structures of proteins is still much more difficult, making protein structure prediction (PSP) potentially very useful. Currently, a vast majority of PSP efforts are based on data mining of known sequences, structures and their relationships (informatics-based). However, if a closely related template is not available, these methods are usually much less reliable than experiments. They may also be problematic in predicting the structures of naturally occurring or designed peptides. On the other hand, physics-based methods including molecular dynamics (MD) can utilize our understanding of detailed atomic interactions determining biomolecular structures. In this mini-review, we show that all-atom MD can predict structures of cyclic peptides and other peptide foldamers with accuracy similar to experiments. Then, some notable successes in reproducing experimental 3D structures of small proteins through MD simulations (some with replica-exchange) of the folding are summarized. We also describe advancements of MD-based refinement of structure models, and the integration of limited experimental or bioinformatics data into MD-based structure modeling.
Affiliation(s)
- Hao Geng
- Lab of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Fangfang Chen
- Guangdong and Shenzhen Key Laboratory of Male Reproductive Medicine and Genetics, Peking University Shenzhen Hospital, Shenzhen PKU-HKUST Medical Center, Shenzhen 518036, China
- Jing Ye
- Guangdong and Shenzhen Key Laboratory of Male Reproductive Medicine and Genetics, Peking University Shenzhen Hospital, Shenzhen PKU-HKUST Medical Center, Shenzhen 518036, China
- Fan Jiang
- Lab of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- NanoAI Biotech Co.,Ltd., Silicon Valley Compound, Longhua District, Shenzhen 518109, China
- Corresponding author at: Lab of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China.

21. Methods for the Refinement of Protein Structure 3D Models. Int J Mol Sci 2019; 20:ijms20092301. [PMID: 31075942 PMCID: PMC6539982 DOI: 10.3390/ijms20092301] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 04/02/2019] [Revised: 04/24/2019] [Accepted: 05/07/2019] [Indexed: 12/25/2022]
Abstract
The refinement of predicted 3D protein models is crucial in bringing them closer towards experimental accuracy for further computational studies. Refinement approaches can be divided into two main stages: the sampling and scoring stages. Sampling strategies, such as the popular Molecular Dynamics (MD)-based protocols, aim to generate improved 3D models. However, generating 3D models that are closer to the native structure than the initial model remains challenging, as structural deviations from the native basin can be encountered due to force-field inaccuracies. Therefore, different restraint strategies have been applied in order to avoid deviations away from the native structure. For example, the accurate prediction of local errors and/or contacts in the initial models can be used to guide restraints. MD-based protocols, using physics-based force fields and smart restraints, have made significant progress towards a more consistent refinement of 3D models. The scoring stage, including energy functions and Model Quality Assessment Programs (MQAPs), is used to discriminate near-native conformations from non-native conformations. Nevertheless, there are often very small differences among generated 3D models in refinement pipelines, which makes model discrimination and selection problematic. For this reason, the identification of the most native-like conformations remains a major challenge.

22. Treado JD, Mei Z, Regan L, O'Hern CS. Void distributions reveal structural link between jammed packings and protein cores. Phys Rev E 2019; 99:022416. [PMID: 30934238 DOI: 10.1103/physreve.99.022416] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 10/31/2018] [Indexed: 11/07/2022]
Abstract
Dense packing of hydrophobic residues in the cores of globular proteins determines their stability. Recently, we have shown that protein cores possess packing fraction ϕ≈0.56, which is the same as dense, random packing of amino-acid-shaped particles. In this article, we compare the structural properties of protein cores and jammed packings of amino-acid-shaped particles in much greater depth by measuring their local and connected void regions. We find that the distributions of surface Voronoi cell volumes and local porosities obey similar statistics in both systems. We also measure the probability that accessible, connected void regions percolate as a function of the size of a spherical probe particle and show that both systems possess the same critical probe size. We measure the critical exponent τ that characterizes the size distribution of connected void clusters at the onset of percolation. We find that the cluster size statistics are similar for void percolation in packings of amino-acid-shaped particles and randomly placed spheres, but different from that for void percolation in jammed sphere packings. We propose that the connected void regions are a defining structural feature of proteins and can be used to differentiate experimentally observed proteins from decoy structures that are generated using computational protein design software. This work emphasizes that jammed packings of amino-acid-shaped particles can serve as structural and mechanical analogs of protein cores, and could therefore be useful in modeling the response of protein cores to cavity-expanding and -reducing mutations.
Affiliation(s)
- John D Treado
- Department of Mechanical Engineering & Materials Science, Yale University, New Haven, Connecticut 06520, USA.,Integrated Graduate Program in Physical & Engineering Biology, Yale University, New Haven, Connecticut 06520, USA
- Zhe Mei
- Integrated Graduate Program in Physical & Engineering Biology, Yale University, New Haven, Connecticut 06520, USA.,Department of Chemistry, Yale University, New Haven, Connecticut 06520, USA
- Lynne Regan
- Integrated Graduate Program in Physical & Engineering Biology, Yale University, New Haven, Connecticut 06520, USA.,Department of Chemistry, Yale University, New Haven, Connecticut 06520, USA.,Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, Connecticut 06520, USA
- Corey S O'Hern
- Department of Mechanical Engineering & Materials Science, Yale University, New Haven, Connecticut 06520, USA.,Integrated Graduate Program in Physical & Engineering Biology, Yale University, New Haven, Connecticut 06520, USA.,Department of Physics, Yale University, New Haven, Connecticut 06520, USA.,Department of Applied Physics, Yale University, New Haven, Connecticut 06520, USA.,Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520, USA

23.
Abstract
Refining predicted protein structures with all-atom molecular dynamics simulations is one route to producing, entirely by computational means, structural models of proteins that rival in quality those that are determined by X-ray diffraction experiments. Slow rearrangements within the compact folded state, however, make routine refinement of predicted structures by unrestrained simulations infeasible. In this work, we draw inspiration from the fields of metallurgy and blacksmithing, where practitioners have worked out practical means of controlling equilibration by mechanically deforming their samples. We describe a two-step refinement procedure that involves identifying collective variables for mechanical deformations using a coarse-grained model and then sampling along these deformation modes in all-atom simulations. Identifying those low-frequency collective modes that change the contact map the most proves to be an effective strategy for choosing which deformations to use for sampling. The method is tested on 20 refinement targets from the CASP12 competition and is found to induce large structural rearrangements that drive the structures closer to the experimentally determined structures during relatively short all-atom simulations of 50 ns. By examining the accuracy of side-chain rotamer states in subensembles of structures that have varying degrees of similarity to the experimental structure, we identified the reorientation of aromatic side chains as a step that remains slow even when encouraging global mechanical deformations in the all-atom simulations. Reducing the side-chain rotamer isomerization barriers in the all-atom force field is found to further speed up refinement.

24. Jiang F, Wu HN, Kang W, Wu YD. Developments and Applications of Coil-Library-Based Residue-Specific Force Fields for Molecular Dynamics Simulations of Peptides and Proteins. J Chem Theory Comput 2019; 15:2761-2773. [DOI: 10.1021/acs.jctc.8b00794] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Affiliation(s)
- Fan Jiang
- Laboratory of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Hao-Nan Wu
- Laboratory of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- Wei Kang
- Laboratory of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China
- Yun-Dong Wu
- Laboratory of Computational Chemistry and Drug Design, State Key Laboratory of Chemical Oncogenomics, Peking University Shenzhen Graduate School, Shenzhen 518055, China
- College of Chemistry and Molecular Engineering, Peking University, Beijing 100871, China

25. Wang A, Zhang Z, Li G. Higher Accuracy Achieved in the Simulations of Protein Structure Refinement, Protein Folding, and Intrinsically Disordered Proteins Using Polarizable Force Fields. J Phys Chem Lett 2018; 9:7110-7116. [PMID: 30514082 DOI: 10.1021/acs.jpclett.8b03471] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Indexed: 06/09/2023]
Abstract
The accuracy of molecular mechanics force fields is of vital importance in biomolecular simulations. However, the admittedly more accurate polarizable force fields were recently reported to be less able to reproduce the experimental properties in comparison to additive force fields in some cases. Here, we perform long-time-scale molecular dynamics simulations to systematically evaluate the effect of explicit electronic polarization in polarizable force fields. The results show that the inclusion of electrostatic polarization effect in polarizable force fields can improve their accuracies in protein structure refinement and generate conformational ensembles more approximate to experiments for intrinsically disordered proteins. In contrast, it is difficult for polarizable force fields to approach the native structure, let alone to predict the native state when it is unknown a priori in the real protein structure predictions. We speculate that these effects might be attributed to the preference of protein-water interactions in polarizable force fields.
Affiliation(s)
- Anhui Wang
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China
- State Key Laboratory of Fine Chemicals, School of Chemistry, Dalian University of Technology, Dalian 116024, China
- Zhichao Zhang
- State Key Laboratory of Fine Chemicals, School of Chemistry, Dalian University of Technology, Dalian 116024, China
- Guohui Li
- Laboratory of Molecular Modeling and Design, State Key Laboratory of Molecular Reaction Dynamics, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian 116023, China

26. Ma T, Zang T, Wang Q, Ma J. Refining protein structures using enhanced sampling techniques with restraints derived from an ensemble-based model. Protein Sci 2018; 27:1842-1849. [PMID: 30098055 DOI: 10.1002/pro.3486] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/30/2018] [Revised: 07/05/2018] [Accepted: 07/18/2018] [Indexed: 12/12/2022]
Abstract
This paper reports a method for high-accuracy protein structural refinement, which is a direct extension of the method in our recent publication (Zang, J Chem Phys 2018; 149:072319). It combines a parallel continuous simulated tempering (PCST) method with a temperature-dependent restraint and a blind model selection scheme. In this work, the single-reference-based restraint of the previous work was changed to an ensemble-based model (EBM), in which the non-bonded Lennard-Jones term for each contacting atomic pair in the previous restraining potential was replaced by a multi-Gaussian function whose parameters are derived from an ensemble of structures such as the ones from various CASP participating groups. The purpose of EBM is to take advantage of partial "correctness" distributed among members of the structural ensemble. In total, 18 targets were refined from the refinement category of CASP10, CASP11 and CASP12. In the Top-1 group, 11 out of 18 targets had better models (greater GDT_TS scores) than the CASPR participants. In the Top-5 group, nine out of 18 were better. Our results show that the PCST-EBM method can considerably improve low-accuracy structures.
Affiliation(s)
- Tianqi Ma
- Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas, 77005
- Tianwu Zang
- Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas, 77005
- Qinghua Wang
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, 77030
- Jianpeng Ma
- Applied Physics Program and Department of Bioengineering, Rice University, Houston, Texas, 77005.,Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, 77030

27. Pfeiffenberger E, Bates PA. Predicting improved protein conformations with a temporal deep recurrent neural network. PLoS One 2018; 13:e0202652. [PMID: 30180164 PMCID: PMC6122789 DOI: 10.1371/journal.pone.0202652] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 04/10/2018] [Accepted: 08/07/2018] [Indexed: 02/03/2023]
Abstract
Accurate protein structure prediction from amino acid sequence is still an unsolved problem. The most reliable methods centre on template based modelling. However, the accuracy of these models entirely depends on the availability of experimentally resolved homologous template structures. In order to generate more accurate models, extensive physics based molecular dynamics (MD) refinement simulations are performed to sample many different conformations to find improved conformational states. In this study, we propose a deep recurrent network model, called DeepTrajectory, that is able to identify these improved conformational states, with high precision, from a variety of different MD based sampling protocols. The proposed model learns the temporal patterns of features computed from MD trajectory data in order to classify whether each recorded simulation snapshot is an improved quality conformational state, decreased quality conformational state or whether there is no perceivable change in state with respect to the starting conformation. The model was trained and tested on 904 trajectories from 42 different protein systems with a cumulative number of more than 1.7 million snapshots. We show that our model outperforms other state of the art machine-learning algorithms that do not consider temporal dependencies. To our knowledge, DeepTrajectory is the first implementation of a time-dependent deep-learning protocol that is re-trainable and able to adapt to any new MD based sampling procedure, thereby demonstrating how a neural network can be used to learn the latter part of the protein folding funnel.
Affiliation(s)
- Erik Pfeiffenberger
  - Biomolecular Modelling Laboratory, The Francis Crick Institute, 1 Midland Road, London NW1 1AT, United Kingdom
- Paul A. Bates
  - Biomolecular Modelling Laboratory, The Francis Crick Institute, 1 Midland Road, London NW1 1AT, United Kingdom
28
Delarue M, Koehl P. Combined approaches from physics, statistics, and computer science for ab initio protein structure prediction: ex unitate vires (unity is strength)? F1000Res 2018; 7. [PMID: 30079234 PMCID: PMC6058471 DOI: 10.12688/f1000research.14870.1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Accepted: 07/19/2018] [Indexed: 11/20/2022]
Abstract
Connecting the dots among the amino acid sequence of a protein, its structure, and its function remains a central theme in molecular biology, as it would have many applications in the treatment of illnesses related to misfolding or protein instability. As a result of high-throughput sequencing methods, biologists currently live in a protein sequence-rich world. However, our knowledge of protein structure based on experimental data remains comparatively limited. As a consequence, protein structure prediction has established itself as a very active field of research to fill in this gap. This field, once thought to be reserved for theoretical biophysicists, is constantly reinventing itself, borrowing ideas informed by an ever-increasing assembly of scientific domains, from biology, chemistry, (statistical) physics, mathematics, computer science, statistics, bioinformatics, and more recently data sciences. We review the recent progress arising from this integration of knowledge, from the development of specific computer architecture to allow for longer timescales in physics-based simulations of protein folding to the recent advances in predicting contacts in proteins based on detection of coevolution using very large data sets of aligned protein sequences.
Affiliation(s)
- Marc Delarue
  - Unité Dynamique Structurale des Macromolécules, Institut Pasteur, and UMR 3528 du CNRS, Paris, France
- Patrice Koehl
  - Department of Computer Science, Genome Center, University of California, Davis, Davis, California, USA
29
Park H, Ovchinnikov S, Kim DE, DiMaio F, Baker D. Protein homology model refinement by large-scale energy optimization. Proc Natl Acad Sci U S A 2018; 115:3054-3059. [PMID: 29507254 PMCID: PMC5866580 DOI: 10.1073/pnas.1719115115] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Indexed: 12/20/2022]
Abstract
Proteins fold to their lowest free-energy structures, and hence the most straightforward way to increase the accuracy of a partially incorrect protein structure model is to search for the lowest-energy nearby structure. This direct approach has met with little success for two reasons: first, energy function inaccuracies can lead to false energy minima, resulting in model degradation rather than improvement; and second, even with an accurate energy function, the search problem is formidable because the energy only drops considerably in the immediate vicinity of the global minimum, and there are a very large number of degrees of freedom. Here we describe a large-scale energy optimization-based refinement method that incorporates advances in both search and energy function accuracy that can substantially improve the accuracy of low-resolution homology models. The method refined low-resolution homology models into correct folds for 50 of 84 diverse protein families and generated improved models in recent blind structure prediction experiments. Analyses of the basis for these improvements reveal contributions from both the improvements in conformational sampling techniques and the energy function.
Affiliation(s)
- Hahnbeom Park
  - Department of Biochemistry, University of Washington, Seattle, WA 98105
  - Institute for Protein Design, University of Washington, Seattle, WA 98105
- Sergey Ovchinnikov
  - Department of Biochemistry, University of Washington, Seattle, WA 98105
  - Institute for Protein Design, University of Washington, Seattle, WA 98105
  - Molecular and Cellular Biology Program, University of Washington, Seattle, WA 98105
- David E Kim
  - Institute for Protein Design, University of Washington, Seattle, WA 98105
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98105
- Frank DiMaio
  - Department of Biochemistry, University of Washington, Seattle, WA 98105
  - Institute for Protein Design, University of Washington, Seattle, WA 98105
- David Baker
  - Department of Biochemistry, University of Washington, Seattle, WA 98105
  - Institute for Protein Design, University of Washington, Seattle, WA 98105
  - Howard Hughes Medical Institute, University of Washington, Seattle, WA 98105
30
Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A. Critical assessment of methods of protein structure prediction (CASP)-Round XII. Proteins 2018; 86 Suppl 1:7-15. [PMID: 29082672 PMCID: PMC5897042 DOI: 10.1002/prot.25415] [Citation(s) in RCA: 245] [Impact Index Per Article: 40.8] [Received: 08/21/2017] [Revised: 10/25/2017] [Accepted: 10/27/2017] [Indexed: 12/24/2022]
Abstract
This article reports the outcome of the 12th round of Critical Assessment of Structure Prediction (CASP12), held in 2016. CASP is a community experiment to determine the state of the art in modeling protein structure from amino acid sequence. Participants are provided sequence information and in turn provide protein structure models and related information. Analysis of the submitted structures by independent assessors provides a comprehensive picture of the capabilities of current methods, and allows progress to be identified. This was again an exciting round of CASP, with significant advances in 4 areas: (i) The use of new methods for predicting three-dimensional contacts led to a two-fold improvement in contact accuracy. (ii) As a consequence, model accuracy for proteins where no template was available improved dramatically. (iii) Models based on a structural template showed overall improvement in accuracy. (iv) Methods for estimating the accuracy of a model continued to improve. CASP continued to develop new areas: (i) Assessing methods for building quaternary structure models, including an expansion of the collaboration between CASP and CAPRI. (ii) Modeling with the aid of experimental data was extended to include SAXS data, as well as again using chemical cross-linking information. (iii) A team of assessors evaluated the suitability of models for a range of applications, including mutation interpretation, analysis of ligand binding properties, and identification of interfaces. This article describes the experiment and summarizes the results. The rest of this special issue of PROTEINS contains papers describing CASP12 results and assessments in more detail.
Affiliation(s)
- John Moult
  - Institute for Bioscience and Biotechnology Research and Department of Cell Biology and Molecular Genetics, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA
- Krzysztof Fidelis
  - Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, CA 95616, USA
- Andriy Kryshtafovych
  - Genome Center, University of California, Davis, 451 Health Sciences Drive, Davis, CA 95616, USA
- Torsten Schwede
  - University of Basel, Biozentrum & SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Anna Tramontano
  - Department of Physics and Istituto Pasteur - Fondazione Cenci Bolognetti, Sapienza University of Rome, P.le Aldo Moro, 5, 00185 Rome, Italy