1
|
Newton MH, Zaman R, Mataeimoghadam F, Rahman J, Sattar A. Constraint Guided Beta-Sheet Refinement for Protein Structure Prediction. Comput Biol Chem 2022; 101:107773. [DOI: 10.1016/j.compbiolchem.2022.107773] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2022] [Revised: 09/15/2022] [Accepted: 09/16/2022] [Indexed: 11/16/2022]
|
2
|
Abstract
Motivation. Predicting the native state of a protein has long been considered a gateway problem for understanding protein folding. Recent advances in structural modeling driven by deep learning have achieved unprecedented success at predicting a protein’s crystal structure, but it is not clear if these models are learning the physics of how proteins dynamically fold into their equilibrium structure or are just accurate knowledge-based predictors of the final state. Results. In this work, we compare the pathways generated by state-of-the-art protein structure prediction methods to experimental data about protein folding pathways. The methods considered were AlphaFold 2, RoseTTAFold, trRosetta, RaptorX, DMPfold, EVfold, SAINT2 and Rosetta. We find evidence that their simulated dynamics capture some information about the folding pathway, but their predictive ability is worse than a trivial classifier using sequence-agnostic features like chain length. The folding trajectories produced are also uncorrelated with experimental observables such as intermediate structures and the folding rate constant. These results suggest that recent advances in structure prediction do not yet provide an enhanced understanding of protein folding. Availability. The data underlying this article are available on GitHub at https://github.com/oxpig/structure-vs-folding/. Supplementary information. Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Carlos Outeiral: Department of Statistics, University of Oxford, Oxford OX1 3PB, UK
- Daniel A Nissley: Department of Statistics, University of Oxford, Oxford OX1 3PB, UK
|
3
|
A Peptides Prediction Methodology for Tertiary Structure Based on Simulated Annealing. Mathematical and Computational Applications 2021. [DOI: 10.3390/mca26020039] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
The Protein Folding Problem (PFP) is a major challenge that has remained unsolved for more than fifty years. The problem consists of obtaining the tertiary structure, or Native Structure (NS), of a protein from its amino acid sequence. The computational methodologies applied to this problem fall into two groups: Template-Based Modeling (TBM) and ab initio models. Ab initio models use only information from the primary structure of the target protein, whereas TBM algorithms use the target protein's primary structure together with information from similar or analogous proteins. In the literature, Hybrid Simulated Annealing (HSA) algorithms are among the best ab initio algorithms for PFP; Golden Ratio Simulated Annealing (GRSA) is a family of these algorithms designed for peptides. This paper presents the GRSA-SSP methodology, which uses a secondary structure prediction to build an initial model and refines it with HSA algorithms. Additionally, we compare the performance of the GRSAX-SSP algorithms against their corresponding GRSAX algorithms. Finally, our best GRSAX-SSP algorithm is compared with PEP-FOLD3, I-TASSER, QUARK, and Rosetta, showing that it is competitive on small peptides, except when predicting the largest peptides.
|
4
|
Liu J, Zhou XG, Zhang Y, Zhang GJ. CGLFold: a contact-assisted de novo protein structure prediction using global exploration and loop perturbation sampling algorithm. Bioinformatics 2020; 36:2443-2450. [PMID: 31860059 DOI: 10.1093/bioinformatics/btz943] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 07/03/2019] [Revised: 12/10/2019] [Accepted: 12/18/2019] [Indexed: 12/27/2022] Open
Abstract
MOTIVATION: Regions that connect secondary structure elements in a protein are known as loops; a slight change in a loop can dramatically affect the entire topology. This study investigates whether the accuracy of protein structure prediction can be improved using a loop-specific sampling strategy. RESULTS: A novel de novo protein structure prediction method that combines global exploration and loop perturbation is proposed in this study. In the global exploration phase, fragment recombination and assembly are used to explore the massive conformational space and generate native-like topologies. In the loop perturbation phase, a loop-specific local perturbation model is designed to improve the accuracy of the conformation and is solved by a differential evolution algorithm. These two phases enable cooperation between global exploration and local exploitation. The filtered contact information is used to construct the conformation selection model that guides the sampling. The proposed CGLFold is tested on 145 benchmark proteins, 14 free modeling (FM) targets of CASP13 and 29 FM targets of CASP12. The experimental results show that the loop-specific local perturbation can increase the structure diversity and the success rate of conformational updates and gradually improve conformation accuracy. CGLFold obtains models with a template modeling score ≥ 0.5 on 95 standard test proteins, 7 FM targets of CASP13 and 9 FM targets of CASP12. AVAILABILITY AND IMPLEMENTATION: The source code and executable versions are freely available at https://github.com/iobio-zjut/CGLFold. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jun Liu: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
- Xiao-Gen Zhou: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Yang Zhang: Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109-2218, USA
- Gui-Jun Zhang: College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
|
5
|
Zhang GJ, Ma LF, Wang XQ, Zhou XG. Secondary Structure and Contact Guided Differential Evolution for Protein Structure Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1068-1081. [PMID: 30295627 DOI: 10.1109/tcbb.2018.2873691] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Indexed: 06/08/2023]
Abstract
Ab initio protein tertiary structure prediction is one of the long-standing problems in structural bioinformatics. With the help of residue-residue contact and secondary structure prediction information, the accuracy of ab initio structure prediction can be enhanced. In this study, an improved differential evolution with secondary structure and residue-residue contact information referred to as SCDE is proposed for protein structure prediction. In SCDE, two score models based on secondary structure and contact information are proposed, and two selection strategies, namely, secondary structure-based selection strategy and contact-based selection strategy, are designed to guide conformation space search. A probability distribution function is designed to balance these two selection strategies. Experimental results on a benchmark dataset with 28 proteins and four free model targets in CASP12 demonstrate that the proposed SCDE is effective and efficient.
|
6
|
West CE, de Oliveira SHP, Deane CM. RFQAmodel: Random Forest Quality Assessment to identify a predicted protein structure in the correct fold. PLoS One 2019; 14:e0218149. [PMID: 31634369 PMCID: PMC6802825 DOI: 10.1371/journal.pone.0218149] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 05/23/2019] [Accepted: 10/02/2019] [Indexed: 02/06/2023] Open
Abstract
While template-free protein structure prediction protocols now produce good quality models for many targets, modelling failure remains common. For these methods to be useful it is important that users can both choose the best model from the hundreds to thousands of models that are commonly generated for a target, and determine whether this model is likely to be correct. We have developed Random Forest Quality Assessment (RFQAmodel), which assesses whether models produced by a protein structure prediction pipeline have the correct fold. RFQAmodel uses a combination of existing quality assessment scores with two predicted contact map alignment scores. These alignment scores are able to identify correct models for targets that are not otherwise captured. Our classifier was trained on a large set of protein domains that are structurally diverse and evenly balanced in terms of protein features known to have an effect on modelling success, and then tested on a second set of 244 protein domains with a similar spread of properties. When models for each target in this second set were ranked according to the RFQAmodel score, the highest-ranking model had a high-confidence RFQAmodel score for 67 modelling targets, of which 52 had the correct fold. At the other end of the scale RFQAmodel correctly predicted that for 59 targets the highest-ranked model was incorrect. In comparisons to other methods we found that RFQAmodel is better able to identify correct models for targets where only a few of the models are correct. We found that RFQAmodel achieved a similar performance on the model sets for CASP12 and CASP13 free-modelling targets. Finally, by iteratively generating models and running RFQAmodel until a model is produced that is predicted to be correct with high confidence, we demonstrate how such a protocol can be used to focus computational efforts on difficult modelling targets. RFQAmodel and the accompanying data can be downloaded from http://opig.stats.ox.ac.uk/resources.
Affiliation(s)
- Clare E. West: Department of Statistics, University of Oxford, Oxford, England, United Kingdom
- Saulo H. P. de Oliveira: SLAC National Accelerator Laboratory, Stanford University, Menlo Park, California, United States of America; Bioengineering, Stanford University, Stanford, California, United States of America
- Charlotte M. Deane: Department of Statistics, University of Oxford, Oxford, England, United Kingdom
|
7
|
Marks C, Deane CM. Increasing the accuracy of protein loop structure prediction with evolutionary constraints. Bioinformatics 2019; 35:2585-2592. [PMID: 30535347 DOI: 10.1093/bioinformatics/bty996] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 06/29/2018] [Revised: 09/28/2018] [Accepted: 12/07/2018] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION: Accurate prediction of loop structures remains challenging. This is especially true for long loops, where the large conformational space and limited coverage of experimentally determined structures often lead to low accuracy. Co-evolutionary contact predictors, which provide information about the proximity of pairs of residues, have been used to improve whole-protein models generated through de novo techniques. Here we investigate whether these evolutionary constraints can enhance the prediction of long loop structures. RESULTS: As a first stage, we assess the accuracy of predicted contacts that involve loop regions. We find that these are less accurate than contacts in general. We also observe that some incorrectly predicted contacts can be identified because they are never satisfied in any of our generated loop conformations. We examined two different strategies for incorporating contacts, and on a test set of long loops (10 residues or more), both approaches improve the accuracy of prediction. For a set of 135 loops, contacts were predicted, and hence our methods were applicable, in 97 cases. Both strategies result in an increase in the proportion of near-native decoys in the ensemble, leading to more accurate predictions and in some cases improving the root-mean-square deviation of the final model by more than 3 Å. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Claire Marks: Department of Statistics, University of Oxford, Oxford, UK
|
8
|
Kandathil SM, Garza-Fabre M, Handl J, Lovell SC. Improved fragment-based protein structure prediction by redesign of search heuristics. Sci Rep 2018; 8:13694. [PMID: 30209258 PMCID: PMC6135816 DOI: 10.1038/s41598-018-31891-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 03/27/2018] [Accepted: 08/22/2018] [Indexed: 11/09/2022] Open
Abstract
Difficulty in sampling large and complex conformational spaces remains a key limitation in fragment-based de novo prediction of protein structure. Our previous work has shown that even for small-to-medium-sized proteins, some current methods inadequately sample alternative structures. We have developed two new conformational sampling techniques, one employing a bilevel optimisation framework and the other employing iterated local search. We combine strategies of forced structural perturbation (where some fragment insertions are accepted regardless of their impact on scores) and greedy local optimisation, allowing greater exploration of the available conformational space. Comparisons against the Rosetta Abinitio method indicate that our protocols more frequently generate native-like predictions for many targets, even following the low-resolution phase, using a given set of fragment libraries. By contrasting results across two different fragment sets, we show that our methods are able to better take advantage of high-quality fragments. These improvements can also translate into more reliable identification of near-native structures in a simple clustering-based model selection procedure. We show that when fragment libraries are sufficiently well-constructed, improved breadth of exploration within runs improves prediction accuracy. Our results also suggest that in benchmarking scenarios, a total exclusion of fragments drawn from homologous templates can make performance differences between methods appear less pronounced.
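The combination of forced perturbation and greedy local optimisation described above follows the general pattern of iterated local search. The following is a minimal sketch of that generic pattern on a toy discrete problem, not the authors' protocol. The state encoding (one integer "fragment choice" per position), `TARGET`, `toy_score`, `neighbours`, and `perturb` are all hypothetical stand-ins.

```python
import random

# Hypothetical toy problem: one integer choice (0..9) per position.
TARGET = (4, 2, 6, 0, 8)


def toy_score(state):
    """Lower is better; the odd-value penalty creates many local minima."""
    return sum(abs(x - t) + (2 if x % 2 else 0) for x, t in zip(state, TARGET))


def neighbours(state):
    """All states that differ by +/-1 at a single position, within 0..9."""
    for i, x in enumerate(state):
        for step in (-1, 1):
            if 0 <= x + step <= 9:
                yield state[:i] + (x + step,) + state[i + 1:]


def perturb(state, rng):
    """Forced move: overwrite one random position with a random choice."""
    i = rng.randrange(len(state))
    return state[:i] + (rng.randrange(10),) + state[i + 1:]


def greedy_local_search(state, score, neighbours):
    """Take the first improving neighbour until none improves the score."""
    current, current_score = state, score(state)
    improved = True
    while improved:
        improved = False
        for candidate in neighbours(current):
            candidate_score = score(candidate)
            if candidate_score < current_score:
                current, current_score = candidate, candidate_score
                improved = True
                break
    return current, current_score


def iterated_local_search(start, score, neighbours, perturb,
                          iterations=50, seed=0):
    rng = random.Random(seed)
    current, current_score = greedy_local_search(start, score, neighbours)
    best, best_score = current, current_score
    for _ in range(iterations):
        # The perturbation is accepted regardless of its effect on the score.
        kicked = perturb(current, rng)
        current, current_score = greedy_local_search(kicked, score, neighbours)
        # The best state seen so far is tracked separately.
        if current_score < best_score:
            best, best_score = current, current_score
    return best, best_score


if __name__ == "__main__":
    print(iterated_local_search((0, 0, 0, 0, 0), toy_score, neighbours, perturb))
```

Accepting every perturbed-and-reoptimised state lets the search leave a local minimum that plain greedy descent would never escape, while the separately tracked best state ensures nothing found along the way is lost.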
Affiliation(s)
- Shaun M Kandathil: Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, M13 9PL, United Kingdom; Department of Computer Science, University College London, Gower Street, London, WC1E 6BT, United Kingdom
- Mario Garza-Fabre: Decision and Cognitive Sciences Research Centre, University of Manchester, Manchester, M13 9PL, United Kingdom; Center for Research and Advanced Studies of the National Polytechnic Institute (CINVESTAV-IPN), Km. 5.5 Carretera Cd. Victoria-Soto La Marina, Cd. Victoria, Tamaulipas, 87130, Mexico
- Julia Handl: Decision and Cognitive Sciences Research Centre, University of Manchester, Manchester, M13 9PL, United Kingdom
- Simon C Lovell: Division of Evolution and Genomic Sciences, School of Biological Sciences, Faculty of Biology, Medicine and Health, University of Manchester, Manchester, M13 9PL, United Kingdom
|