1
Manalastas-Cantos K, Adoni KR, Pfeifer M, Märtens B, Grünewald K, Thalassinos K, Topf M. Modeling Flexible Protein Structure With AlphaFold2 and Crosslinking Mass Spectrometry. Mol Cell Proteomics 2024; 23:100724. [PMID: 38266916 PMCID: PMC10884514 DOI: 10.1016/j.mcpro.2024.100724] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 09/13/2023] [Revised: 12/23/2023] [Accepted: 12/27/2023] [Indexed: 01/26/2024] Open
Abstract
We propose a pipeline that combines AlphaFold2 (AF2) and crosslinking mass spectrometry (XL-MS) to model the structure of proteins with multiple conformations. The pipeline consists of two main steps: ensemble generation using AF2 and conformer selection using XL-MS data. For conformer selection, we developed two scores, the monolink probability score (MP) and the crosslink probability score (XLP), both of which are based on residue depth from the protein surface. We benchmarked MP and XLP on a large dataset of decoy protein structures and showed that our scores outperform previously developed scores. We then tested our methodology on three proteins having an open and closed conformation in the Protein Data Bank: Complement component 3 (C3), luciferase, and glutamine-binding periplasmic protein, first generating ensembles using AF2, which were then screened for the open and closed conformations using experimental XL-MS data. In five out of six cases, the most accurate model within the AF2 ensembles (or a conformation within 1 Å of this model) was identified using crosslinks, as assessed through the XLP score. In the remaining case, only the monolinks (assessed through the MP score) successfully identified the open conformation of glutamine-binding periplasmic protein, and these results were further improved by including the "occupancy" of the monolinks. This serves as a compelling proof-of-concept for the effectiveness of monolinks. In contrast, the AF2 assessment score was only able to identify the most accurate conformation in two out of six cases. Our results highlight the complementarity of AF2 with experimental methods like XL-MS, with the MP and XLP scores providing reliable metrics to assess the quality of the predicted models. The MP and XLP scoring functions mentioned above are available at https://gitlab.com/topf-lab/xlms-tools.
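The two-step idea (generate an ensemble, then pick the conformer that best agrees with the crosslinks) can be illustrated with a deliberately simplified stand-in score. The sketch below does not implement MP or XLP, which the abstract says are based on residue depth; it swaps in a plain distance-cutoff check (fraction of crosslinks whose Cα-Cα distance falls under a threshold). The 30 Å cutoff, the function names, and the toy coordinates are all assumptions made for illustration.

```python
import math

# Assumed illustrative cutoff in angstroms; the right value depends on the crosslinker.
MAX_CA_DISTANCE = 30.0


def satisfied_fraction(ca_coords, crosslinks, cutoff=MAX_CA_DISTANCE):
    """Fraction of crosslinks whose C-alpha to C-alpha distance is within the cutoff."""
    if not crosslinks:
        return 0.0
    hits = sum(
        1
        for i, j in crosslinks
        if math.dist(ca_coords[i], ca_coords[j]) <= cutoff
    )
    return hits / len(crosslinks)


def select_conformer(ensemble, crosslinks):
    """Return the name of the model with the highest satisfied fraction."""
    return max(
        ensemble,
        key=lambda name: satisfied_fraction(ensemble[name], crosslinks),
    )


# Toy ensemble: residue number -> C-alpha coordinates (hypothetical values).
ensemble = {
    "closed": {10: (0.0, 0.0, 0.0), 50: (12.0, 0.0, 0.0), 90: (0.0, 20.0, 0.0)},
    "open": {10: (0.0, 0.0, 0.0), 50: (45.0, 0.0, 0.0), 90: (0.0, 20.0, 0.0)},
}
crosslinks = [(10, 50), (10, 90)]  # residue pairs observed as crosslinked

best = select_conformer(ensemble, crosslinks)
print(best)
```

A cutoff-based count ignores whether the linked residues are buried, which is the kind of information the depth-based scores described above are meant to capture.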
Affiliation(s)
- Karen Manalastas-Cantos
- Center for Data and Computing in Natural Sciences, Universität Hamburg, Hamburg, Germany; Department of Integrative Virology, Leibniz-Institut für Virologie (LIV), Centre for Structural Systems Biology (CSSB), Hamburg, Germany
- Kish R Adoni
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, UK; Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, United Kingdom
- Matthias Pfeifer
- Department of Integrative Virology, Leibniz-Institut für Virologie (LIV), Centre for Structural Systems Biology (CSSB), Hamburg, Germany; Universitätsklinikum Hamburg Eppendorf (UKE), Hamburg, Germany
- Birgit Märtens
- Department of Integrative Virology, Leibniz-Institut für Virologie (LIV), Centre for Structural Systems Biology (CSSB), Hamburg, Germany; Universitätsklinikum Hamburg Eppendorf (UKE), Hamburg, Germany
- Kay Grünewald
- Department of Integrative Virology, Leibniz-Institut für Virologie (LIV), Centre for Structural Systems Biology (CSSB), Hamburg, Germany; Department of Chemistry, Universität Hamburg, Hamburg, Germany
- Konstantinos Thalassinos
- Institute of Structural and Molecular Biology, Division of Biosciences, University College London, London, UK; Institute of Structural and Molecular Biology, Birkbeck College, University of London, London, United Kingdom
- Maya Topf
- Department of Integrative Virology, Leibniz-Institut für Virologie (LIV), Centre for Structural Systems Biology (CSSB), Hamburg, Germany; Universitätsklinikum Hamburg Eppendorf (UKE), Hamburg, Germany.
2
Salmas R, Borysik AJ. Deep Learning Enables Automatic Correction of Experimental HDX-MS Data with Applications in Protein Modeling. J Am Soc Mass Spectrom 2024; 35:197-204. [PMID: 38262924 PMCID: PMC10853964 DOI: 10.1021/jasms.3c00285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/08/2023] [Revised: 12/24/2023] [Accepted: 01/04/2024] [Indexed: 01/25/2024]
Abstract
Observed mass shifts associated with deuterium incorporation in hydrogen-deuterium exchange mass spectrometry (HDX-MS) frequently deviate from the initial signals due to back and forward exchange. In typical HDX-MS experiments, the impact of these disparities on data interpretation is generally low because relative and not absolute mass changes are investigated. However, for more advanced data processing including optimization, experimental error correction is imperative for accurate results. Here the potential for automatic HDX-MS data correction using models generated by deep neural networks is demonstrated. A multilayer perceptron (MLP) is used to learn a mapping between uncorrected HDX-MS data and data with mass shifts corrected for back and forward exchange. The model is rigorously tested at various levels including peptide level mass changes, residue level protection factors following optimization, and ability to correctly identify native protein folds using HDX-MS guided protein modeling. AI is shown to demonstrate considerable potential for amending HDX-MS data and improving fidelity across all levels. With access to big data, online tools may eventually be able to predict corrected mass shifts in HDX-MS profiles. This should improve throughput in workflows that require the reporting of real mass changes as well as allow retrospective correction of historic profiles to facilitate new discoveries with these data.
Affiliation(s)
- Antoni J. Borysik
- Department of Chemistry, King’s College London, Britannia House, London SE1 1DB, U.K.
3
Khrustalev VV, Stojarov AN, Akunevich AA, Baranov OE, Popinako AV, Samoilovich EO, Yermalovich MA, Semeiko GV, Sapon EG, Cheprasova VI, Shalygo NV, Poboinev VV, Khrustaleva TA, Khrustaleva OV. Structural Shifts of the Parvovirus B19 Capsid Receptor-binding Domain: A Peptide Study. Protein Pept Lett 2024; 31:128-140. [PMID: 38053353 DOI: 10.2174/0109298665272845231121064717] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/07/2023] [Revised: 09/25/2023] [Accepted: 11/07/2023] [Indexed: 12/07/2023]
Abstract
BACKGROUND Binding appropriate cellular receptors is a crucial step of a lifecycle for any virus. Structure of receptor-binding domain for a viral surface protein has to be determined before the start of future drug design projects. OBJECTIVES Investigation of pH-induced changes in the secondary structure for a capsid peptide with loss of function mutation can shed some light on the mechanism of entrance. METHODS Spectroscopic methods were accompanied by electrophoresis, ultrafiltration, and computational biochemistry. RESULTS In this study, we showed that a peptide from the receptor-binding domain of Parvovirus B19 VP1 capsid (residues 13-31) is beta-structural at pH=7.4 in 0.01 M phosphate buffer, but alpha-helical at pH=5.0, according to the circular dichroism (CD) spectroscopy results. Results of infra-red (IR) spectroscopy showed that the same peptide exists in both alpha-helical and beta-structural conformations in partial dehydration conditions both at pH=7.4 and pH=5.0. In contrast, the peptide with Y20W mutation, which is known to block the internalization of the virus, forms mostly alpha-helical conformation in partial dehydration conditions at pH=7.4. According to our hypothesis, an intermolecular antiparallel beta structure formed by the wild-type peptide in its tetramers at pH=7.4 is the prototype of the similar intermolecular antiparallel beta structure formed by the corresponding part of Parvovirus B19 receptor-binding domain with its cellular receptor (AXL). CONCLUSION Loss of function Y20W substitution in VP1 capsid protein prevents the shift into the beta-structural state by the way of alpha helix stabilization and the decrease of its ability to turn into the disordered state.
Affiliation(s)
- Oleg Evgenyevich Baranov
- Bach Institute of Biochemistry, Shared-Access Equipment Centre "Industrial Biotechnology" of Russian Academy of Science, Leninskiy prospect, 33/2, Moscow, 119071, Russian Federation
- Anna Vladimirovna Popinako
- Bach Institute of Biochemistry, Research Center of Biotechnology of the Russian Academy of Sciences, Leninskiy prospect, 33/2, Moscow, 119071, Russian Federation
- Elena Olegovna Samoilovich
- Laboratory of Vaccine-controlled Infections, Republican Research and Practical Center for Epidemiology and Microbiology, Filimonova 23, Minsk, 220114, Belarus
- Marina Anatolyevna Yermalovich
- Laboratory of Vaccine-controlled Infections, Republican Research and Practical Center for Epidemiology and Microbiology, Filimonova 23, Minsk, 220114, Belarus
- Galina Valeryevna Semeiko
- Laboratory of Vaccine-controlled Infections, Republican Research and Practical Center for Epidemiology and Microbiology, Filimonova 23, Minsk, 220114, Belarus
- Egor Gennadyevich Sapon
- Laboratory of infra-red spectroscopy and infra-red microscopy, Belarusian State Technological University, Sverdlova 13a, Minsk, 220006, Belarus
- Victoria Igorevna Cheprasova
- Laboratory of infra-red spectroscopy and infra-red microscopy, Belarusian State Technological University, Sverdlova 13a, Minsk, 220006, Belarus
- Victor Vitoldovich Poboinev
- Department of General Chemistry, Belarusian State Medical University, Dzerzhinskogo 83, Minsk, 220045, Belarus
- Tatyana Aleksandrovna Khrustaleva
- Laboratory of Biomedical Technologies and Medical Rehabilitation, Institute of Physiology of the National Academy of Sciences of Belarus, Academicheskaya 28, Minsk, 220072, Belarus
- Olga Victorovna Khrustaleva
- Department of General Chemistry, Belarusian State Medical University, Dzerzhinskogo 83, Minsk, 220045, Belarus
4
Klukowski P, Riek R, Güntert P. Time-optimized protein NMR assignment with an integrative deep learning approach using AlphaFold and chemical shift prediction. Sci Adv 2023; 9:eadi9323. [PMID: 37992167 PMCID: PMC10664993 DOI: 10.1126/sciadv.adi9323] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/29/2023] [Accepted: 10/20/2023] [Indexed: 11/24/2023]
Abstract
Chemical shift assignment is vital for nuclear magnetic resonance (NMR)-based studies of protein structures, dynamics, and interactions, providing crucial atomic-level insight. However, obtaining chemical shift assignments is labor intensive and requires extensive measurement time. To address this limitation, we previously proposed ARTINA, a deep learning method for automatic assignment of two-dimensional (2D)-4D NMR spectra. Here, we present an integrative approach that combines ARTINA with AlphaFold and UCBShift, enabling chemical shift assignment with reduced experimental data, increased accuracy, and enhanced robustness for larger systems, as presented in a comprehensive study with more than 5000 automated assignment calculations on 89 proteins. We demonstrate that five 3D spectra yield more accurate assignments (92.59%) than pure ARTINA runs using all experimentally available NMR data (on average 10 3D spectra per protein, 91.37%), considerably reducing the required measurement time. We also showcase automated assignments of only 15N-labeled samples, and report improved assignment accuracy in larger synthetic systems of up to 500 residues.
Affiliation(s)
- Piotr Klukowski
- Institute of Molecular Physical Science, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Roland Riek
- Institute of Molecular Physical Science, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Peter Güntert
- Institute of Molecular Physical Science, ETH Zurich, Vladimir-Prelog-Weg 2, 8093 Zurich, Switzerland
- Institute of Biophysical Chemistry, Goethe University Frankfurt, Max-von-Laue-Str. 9, 60438 Frankfurt am Main, Germany
- Department of Chemistry, Tokyo Metropolitan University, 1-1 Minami-Osawa, Hachioji, 192-0397 Tokyo, Japan
5
Pearce R, Huang X, Omenn GS, Zhang Y. De novo protein fold design through sequence-independent fragment assembly simulations. Proc Natl Acad Sci U S A 2023; 120:e2208275120. [PMID: 36656852 PMCID: PMC9942881 DOI: 10.1073/pnas.2208275120] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/17/2022] [Accepted: 12/22/2022] [Indexed: 01/20/2023] Open
Abstract
De novo protein design generally consists of two steps, including structure and sequence design. Many protein design studies have focused on sequence design with scaffolds adapted from native structures in the PDB, which renders novel areas of protein structure and function space unexplored. We developed FoldDesign to create novel protein folds from specific secondary structure (SS) assignments through sequence-independent replica-exchange Monte Carlo (REMC) simulations. The method was tested on 354 non-redundant topologies, where FoldDesign consistently created stable structural folds, while recapitulating on average 87.7% of the SS elements. Meanwhile, the FoldDesign scaffolds had well-formed structures with buried residues and solvent-exposed areas closely matching their native counterparts. Despite the high fidelity to the input SS restraints and local structural characteristics of native proteins, a large portion of the designed scaffolds possessed global folds completely different from natural proteins in the PDB, highlighting the ability of FoldDesign to explore novel areas of protein fold space. Detailed data analyses revealed that the major contributions to the successful structure design lay in the optimal energy force field, which contains a balanced set of SS packing terms, and REMC simulations, which were coupled with multiple auxiliary movements to efficiently search the conformational space. Additionally, the ability to recognize and assemble uncommon super-SS geometries, rather than the unique arrangement of common SS motifs, was the key to generating novel folds. These results demonstrate a strong potential to explore both structural and functional spaces through computational design simulations that natural proteins have not reached through evolution.
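For readers unfamiliar with replica-exchange Monte Carlo, the sketch below shows only the standard swap-acceptance step between neighbouring temperatures, min(1, exp[(β_i − β_j)(E_i − E_j)]). It is not FoldDesign code: the energy function, the auxiliary movements, and the temperature ladder used in the paper are not reproduced, and the function names and toy values are assumptions.

```python
import math
import random


def swap_probability(energy_i, energy_j, temp_i, temp_j, k_b=1.0):
    """Standard replica-exchange acceptance probability for swapping two replicas."""
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_swaps(energies, temps, rng):
    """One sweep of neighbour swap attempts.

    energies[c] is the energy of configuration c; the returned list gives,
    for each temperature slot, the index of the configuration it now holds.
    """
    order = list(range(len(temps)))
    for k in range(len(temps) - 1):
        a, b = order[k], order[k + 1]
        p = swap_probability(energies[a], energies[b], temps[k], temps[k + 1])
        if rng.random() < p:
            order[k], order[k + 1] = b, a
    return order


# Toy case: the colder slot holds the higher-energy configuration,
# so the swap is always accepted.
rng = random.Random(0)
new_order = attempt_swaps(energies=[-5.0, -10.0], temps=[1.0, 2.0], rng=rng)
print(new_order)
```

Swaps let low-energy configurations found at high temperature migrate to the cold replicas, which is what makes the scheme useful for searching rugged conformational landscapes.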
Affiliation(s)
- Robin Pearce
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
- Xiaoqiang Huang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
- Gilbert S. Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI 48109
- Department of Human Genetics, University of Michigan, Ann Arbor, MI 48109
- School of Public Health, University of Michigan, Ann Arbor, MI 48109
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109
- Department of Computer Science, School of Computing, National University of Singapore, Singapore 117417
- Cancer Science Institute of Singapore, National University of Singapore, Singapore 117599
6
Kaushik R, Zhang KY. An Integrated Protein Structure Fitness Scoring Approach for Identifying Native-Like Model Structures. Comput Struct Biotechnol J 2022; 20:6467-6472. [DOI: 10.1016/j.csbj.2022.11.032] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2022] [Revised: 11/14/2022] [Accepted: 11/14/2022] [Indexed: 11/18/2022] Open
7
Yang H, Xiong Z, Zonta F. Construction of a Deep Neural Network Energy Function for Protein Physics. J Chem Theory Comput 2022; 18:5649-5658. [PMID: 35939398 PMCID: PMC9476656 DOI: 10.1021/acs.jctc.2c00069] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/30/2022]
Abstract
The traditional approach of computational biology consists of calculating molecule properties by using approximate classical potentials. Interactions between atoms are described by an energy function derived from physical principles or fitted to experimental data. Their functional form is usually limited to pairwise interactions between atoms and does not consider complex multibody effects. More recently, neural networks have emerged as an alternative way of describing the interactions between biomolecules. In this approach, the energy function does not have an explicit functional form and is learned bottom-up from simulations at the atomistic or quantum level. In this study, we attempt a top-down approach and use deep learning methods to obtain an energy function by exploiting the large amount of experimental data acquired over the years in the field of structural biology. The energy function is represented by a probability density model learned from a large repertoire of building blocks representing local clusters of amino acids paired with their sequence signature. We demonstrated the feasibility of this approach by generating a neural network energy function and testing its validity on several applications such as discriminating decoys, assessing qualities of structural models, sampling structural conformations, and designing new protein sequences. We foresee that, in the future, our methodology could exploit the continuously increasing availability of experimental data and simulations and provide a new method for the parametrization of protein energy functions.
Affiliation(s)
- Huan Yang
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, China
- Zhaoping Xiong
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, China
- Francesco Zonta
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai 201210, China
8
Equilibrium Between Dimeric and Monomeric Forms of Human Epidermal Growth Factor is Shifted Towards Dimers in a Solution. Protein J 2022; 41:245-259. [PMID: 35348971 DOI: 10.1007/s10930-022-10051-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 03/22/2022] [Indexed: 10/18/2022]
Abstract
An interplay between monomeric and dimeric forms of human epidermal growth factor (EGF) affecting its interaction with EGF receptor (EGFR) is poorly understood. While EGF dimeric structure was resolved at pH 8.1, the possibility of EGF dimerization under physiological conditions is still unclear. This study aimed to describe the oligomeric state of EGF in a solution at physiological pH value. With centrifugal ultrafiltration followed by blue native gel electrophoresis, we showed that synthetic human EGF in a solution at a concentration of 0.1 mg/ml exists mainly in the dimeric form at pH 7.4 and temperature of 37 °C, although a small fraction of its monomers was also observed. Based on bioinformatics predictions, we introduced the D46G substitution to examine if EGF C-terminal part is directly involved in the intermolecular interface formation of the observed dimers. We found a reduced ability of the resulting EGF D46G dimers to dissociate at temperatures up to 50 °C. The D46G substitution also increased the intermolecular antiparallel β-structure content within the EGF peptide in a solution according to the CD spectra analysis that was confirmed by HATR-FTIR results. Additionally, the energy transfer between Tyr and Trp residues was detected by fluorescence spectroscopy for the EGF D46G mutant, but not for the native EGF. This allowed us to suggest the elongation and rearrangement of the intermolecular β-structure that leads to the observed stabilization of EGF D46G dimers. The results imply EGF dimerization under physiological pH value and temperature and the involvement of EGF C-terminal part in this process.
9
A Benchmark Dataset for Evaluating Practical Performance of Model Quality Assessment of Homology Models. Bioengineering (Basel) 2022; 9:bioengineering9030118. [PMID: 35324806 PMCID: PMC8945737 DOI: 10.3390/bioengineering9030118] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/09/2022] [Revised: 03/08/2022] [Accepted: 03/11/2022] [Indexed: 11/25/2022] Open
Abstract
Protein structure prediction is an important issue in structural bioinformatics. In this process, model quality assessment (MQA), which estimates the accuracy of the predicted structure, is also practically important. Currently, the most commonly used dataset to evaluate the performance of MQA is the critical assessment of the protein structure prediction (CASP) dataset. However, the CASP dataset does not contain enough targets with high-quality models, and thus cannot sufficiently evaluate the MQA performance in practical use. Additionally, most application studies employ homology modeling because of its reliability. However, the CASP dataset includes models generated by de novo methods, which may lead to the mis-estimation of MQA performance. In this study, we created new benchmark datasets, named a homology models dataset for model quality assessment (HMDM), that contain targets with high-quality models derived using homology modeling. We then benchmarked the performance of the MQA methods using the new datasets and compared their performance to that of the classical selection based on the sequence identity of the template proteins. The results showed that model selection by the latest MQA methods using deep learning is better than selection by template sequence identity and classical statistical potentials. Using HMDM, it is possible to verify the MQA performance for high-accuracy homology models.
10
Kaushik R, Zhang KYJ. ProFitFun: a protein tertiary structure fitness function for quantifying the accuracies of model structures. Bioinformatics 2022; 38:369-376. [PMID: 34542606 DOI: 10.1093/bioinformatics/btab666] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/23/2021] [Revised: 09/06/2021] [Accepted: 09/16/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION An accurate estimation of the quality of protein model structures is a cornerstone of protein structure prediction. Despite the recent groundbreaking success in the field of protein structure prediction, there are certain prospects for the improvement in model quality estimation at multiple stages of protein structure prediction and thus, to further push the prediction accuracy. Here, a novel approach, named ProFitFun, for assessing the quality of protein models is proposed by harnessing the sequence and structural features of experimental protein structures in terms of the preferences of backbone dihedral angles and relative surface accessibility of their amino acid residues at the tripeptide level. The proposed approach leverages the backbone dihedral angle and surface accessibility preferences of the residues by accounting for their N-terminal and C-terminal neighbors in the protein structure. These preferences are used to evaluate protein structures through a machine learning approach and tested on an extensive dataset of diverse proteins. RESULTS The approach was extensively validated on a large test dataset (n = 25 005) of protein structures, comprising 23 661 models of 82 non-homologous proteins and 1344 non-homologous experimental structures. In addition, an external dataset of 40 000 models of 200 non-homologous proteins was also used for the validation of the proposed method. Both datasets were further used for benchmarking the proposed method with four different state-of-the-art methods for protein structure quality assessment. In the benchmarking, the proposed method outperformed some state-of-the-art methods in terms of Spearman's and Pearson's correlation coefficients, average GDT-TS loss, sum of z-scores and average absolute difference of predictions over corresponding observed values. The high accuracy of the proposed approach promises a potential use of the sequence and structural features in computational protein design. AVAILABILITY AND IMPLEMENTATION http://github.com/KYZ-LSB/ProTerS-FitFun. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
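Two of the benchmarking metrics named above are easy to state concretely. The sketch below computes GDT-TS loss under its usual definition (true GDT-TS of the best model in a pool minus the true GDT-TS of the model the scoring method ranks first) and a plain Pearson correlation. It is an illustrative sketch, not the ProFitFun evaluation code; the function names and the toy numbers are assumptions.

```python
def gdt_ts_loss(true_scores, predicted_scores):
    """True GDT-TS of the best model minus that of the top-ranked model."""
    top = max(range(len(predicted_scores)), key=predicted_scores.__getitem__)
    return max(true_scores) - true_scores[top]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical pool of four models for one target.
true_gdt = [62.0, 71.5, 80.0, 55.0]   # measured against the native structure
predicted = [0.60, 0.82, 0.78, 0.40]  # quality scores from some MQA method

loss = gdt_ts_loss(true_gdt, predicted)
correlation = pearson(true_gdt, predicted)
print(loss, correlation)
```

A loss of zero means the method picked the best available model; the correlation instead measures how well the whole ranking tracks true quality, so the two metrics can disagree.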
Affiliation(s)
- Rahul Kaushik
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, Yokohama, Kanagawa 230-0045, Japan
- Kam Y J Zhang
- Laboratory for Structural Bioinformatics, Center for Biosystems Dynamics Research, RIKEN, Yokohama, Kanagawa 230-0045, Japan
11
Greener JG, Jones DT. Differentiable molecular simulation can learn all the parameters in a coarse-grained force field for proteins. PLoS One 2021; 16:e0256990. [PMID: 34473813 PMCID: PMC8412298 DOI: 10.1371/journal.pone.0256990] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 04/19/2021] [Accepted: 08/19/2021] [Indexed: 11/26/2022] Open
Abstract
Finding optimal parameters for force fields used in molecular simulation is a challenging and time-consuming task, partly due to the difficulty of tuning multiple parameters at once. Automatic differentiation presents a general solution: run a simulation, obtain gradients of a loss function with respect to all the parameters, and use these to improve the force field. This approach takes advantage of the deep learning revolution whilst retaining the interpretability and efficiency of existing force fields. We demonstrate that this is possible by parameterising a simple coarse-grained force field for proteins, based on training simulations of up to 2,000 steps learning to keep the native structure stable. The learned potential matches chemical knowledge and PDB data, can fold and reproduce the dynamics of small proteins, and shows ability in protein design and model scoring applications. Problems in applying differentiable molecular simulation to all-atom models of proteins are discussed along with possible solutions and the variety of available loss functions. The learned potential, simulation scripts and training code are made available at https://github.com/psipred/cgdms.
Affiliation(s)
- Joe G. Greener
- Department of Computer Science, University College London, London, United Kingdom
- David T. Jones
- Department of Computer Science, University College London, London, United Kingdom
12
Postic G, Janel N, Moroy G. Representations of protein structure for exploring the conformational space: A speed-accuracy trade-off. Comput Struct Biotechnol J 2021; 19:2618-2625. [PMID: 34025948 PMCID: PMC8120936 DOI: 10.1016/j.csbj.2021.04.049] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 02/03/2021] [Revised: 04/19/2021] [Accepted: 04/20/2021] [Indexed: 11/25/2022] Open
Abstract
We compare ten structural representations, either atomistic or coarse-grained. Thus, ten distance-dependent statistical potentials of mean force (PMF) were built. The Cβ-only and Cα + Cβ representations provide the best speed–accuracy trade-off. Including glycines through Cα, in a Cβ-only representation, yields a higher accuracy. We generalize the conclusions to the total information gain (TIG) scoring function.
The recent breakthrough in the field of protein structure prediction shows the relevance of using knowledge-based scoring functions in combination with a low-resolution 3D representation of protein macromolecules. The choice of not using all atoms is barely supported by any data in the literature, and is mostly motivated by empirical and practical reasons, such as the computational cost of assessing the numerous folds of the protein conformational space. Here, we present a comprehensive study, carried out on a large and balanced benchmark of predicted protein structures, to see how different types of structural representations rank in either accuracy or calculation speed, and which ones offer the best compromise between these two criteria. We tested ten representations, including low-resolution, high-resolution, and coarse-grained approaches. We also investigated the generalization of the findings to other formalisms than the widely-used “potential of mean force” (PMF) method. Thus, we observed that representing protein structures by their β carbons, combined or not with Cα, provides the best speed–accuracy trade-off, when using a “total information gain” scoring function. For statistical PMFs, using MARTINI backbone and side-chains beads is the best option. Finally, we also demonstrated the necessity of training the reference state on all atom types, and of including the Cα atoms of glycine residues, in a Cβ-based representation.
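As a reminder of what a distance-dependent statistical PMF is, the sketch below applies the textbook inverse-Boltzmann form, E(r) = −kT ln[p_obs(r) / p_ref(r)], to binned pair-distance counts. The abstract does not specify the binning, reference state, or any smoothing, so the pseudocount, the function name, and the toy histograms here are assumptions rather than the authors' settings.

```python
import math


def pmf_from_counts(observed, reference, kt=1.0, pseudocount=1.0):
    """Inverse-Boltzmann energies per distance bin from two count histograms.

    observed:  pair counts per bin for one atom-type pair
    reference: pair counts per bin for the reference state
    The pseudocount (an assumed smoothing choice) avoids log(0) in empty bins.
    """
    n_bins = len(observed)
    obs_total = sum(observed) + pseudocount * n_bins
    ref_total = sum(reference) + pseudocount * n_bins
    energies = []
    for obs, ref in zip(observed, reference):
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        energies.append(-kt * math.log(p_obs / p_ref))
    return energies


# Toy two-bin example: the first bin is over-represented relative to the
# reference, so it receives a favourable (negative) energy.
energies = pmf_from_counts([30, 10], [20, 20], pseudocount=0.0)
print(energies)
```

Scoring a model then amounts to summing these bin energies over all atom pairs in the chosen representation, which is why the number of represented atoms drives the speed side of the trade-off.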
Affiliation(s)
- Guillaume Postic
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
- Corresponding author.
- Nathalie Janel
- Université de Paris, BFA, UMR 8251, CNRS, F-75013 Paris, France
- Gautier Moroy
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
13
Cao X, Tian P. Molecular free energy optimization on a computational graph. RSC Adv 2021; 11:12929-12937. [PMID: 35423805 PMCID: PMC8697515 DOI: 10.1039/d1ra01455b] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/23/2021] [Accepted: 03/26/2021] [Indexed: 11/21/2022] Open
Abstract
Free energy is arguably the most important property of molecular systems. Despite great progress in both its efficient estimation by scoring functions/potentials and more rigorous computation based on extensive sampling, we remain far from accurately predicting and manipulating biomolecular structures and their interactions. There are fundamental limitations, including accuracy of interaction description and difficulty of sampling in high dimensional space, to be tackled. Computational graph underlies major artificial intelligence platforms and is proven to facilitate training, optimization and learning. Combining autodifferentiation, coordinates transformation and generalized solvation free energy theory, we construct a computational graph infrastructure to realize seamless integration of fully trainable local free energy landscape with end to end differentiable iterative free energy optimization. This new framework drastically improves efficiency by replacing local sampling with differentiation. Its specific implementation in protein structure refinement achieves superb efficiency and competitive accuracy when compared with state of the art all-atom mainstream methods.
Affiliation(s)
- Xiaoyong Cao
  - School of Life Sciences, Jilin University, Changchun 130012, China, +86 431 85155287
- Pu Tian
  - School of Life Sciences, Jilin University, Changchun 130012, China, +86 431 85155287
  - School of Artificial Intelligence, Jilin University, Changchun 130012, China
|
14
|
Stam MJ, Wood CW. DE-STRESS: a user-friendly web application for the evaluation of protein designs. Protein Eng Des Sel 2021; 34:gzab029. [PMID: 34908138 PMCID: PMC8672653 DOI: 10.1093/protein/gzab029] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 05/07/2021] [Revised: 10/11/2021] [Accepted: 10/25/2021] [Indexed: 11/16/2022] Open
Abstract
De novo protein design is a rapidly growing field, and there are now many interesting and useful examples of designed proteins in the literature. However, most designs could be classed as failures when characterised in the lab, usually as a result of low expression, misfolding, aggregation or lack of function. This high attrition rate makes protein design unreliable and costly. It is possible that some of these failures could be caught earlier in the design process if it were quick and easy to generate information and a set of high-quality metrics regarding designs, which could be used to make reproducible and data-driven decisions about which designs to characterise experimentally. We present DE-STRESS (DEsigned STRucture Evaluation ServiceS), a web application for evaluating structural models of designed and engineered proteins. DE-STRESS has been designed to be simple, intuitive to use and responsive. It provides a wealth of information regarding designs, as well as tools to help contextualise the results and formally describe the properties that a design requires to be fit for purpose.
Affiliation(s)
- Michael J Stam
  - School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK
- Christopher W Wood
  - School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3FF, UK
|
15
|
Bhattacharya S, Banerjee A, Ray S. Development of new vaccine target against SARS-CoV2 using envelope (E) protein: An evolutionary, molecular modeling and docking based study. Int J Biol Macromol 2020; 172:74-81. [PMID: 33385461 PMCID: PMC7833863 DOI: 10.1016/j.ijbiomac.2020.12.192] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 08/24/2020] [Revised: 12/24/2020] [Accepted: 12/25/2020] [Indexed: 02/04/2023]
Abstract
COVID-19 is a fatal pandemic affecting the whole world. For cellular fusion, its antigenic peptides are presented by the major histocompatibility complex (MHC) in humans. Exploring the residual interaction details of CoV2 with MHCs is therefore a promising starting point for vaccine development. Envelope (E) protein, the smallest outer surface protein from the SARS-CoV2 genome, was found to possess the highest antigenicity and was therefore used to identify B-cell and T-cell epitopes. Four novel mutations (T55S, V56F, E69R and G70del) were observed in the E-protein of SARS-CoV2 after evolutionary analysis. It showed a coil➔helix transition in the protein conformation. Antigenic variability of the epitopes was also checked to explore the novel mutations in the epitope region. More interactions were found when the SARS-CoV2 E-protein interacted with MHC-I than with MHC-II, through several ionic and H-bonds. Tyr42 and Tyr57 played a predominant role in the interaction with MHC-I. The higher ΔG values with lesser dissociation constant values also affirm the stronger and spontaneous interaction of SARS-CoV2 proteins with MHCs. In comparison with the consensus E-protein, the SARS-CoV2 E-protein showed stronger interaction with the MHCs with lesser solvent accessibility. The E-protein can therefore be targeted as a potential vaccine target against SARS-CoV2.
Affiliation(s)
- Shreya Bhattacharya
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Guwahati, Guwahati, Assam, India
- Arundhati Banerjee
  - Department of Biochemistry and Biophysics, University of Kalyani, Kalyani, Nadia, India
- Sujay Ray
  - Amity Institute of Biotechnology, Amity University, Kolkata, India
|
16
|
Fowler NJ, Sljoka A, Williamson MP. A method for validating the accuracy of NMR protein structures. Nat Commun 2020; 11:6321. [PMID: 33339822 PMCID: PMC7749147 DOI: 10.1038/s41467-020-20177-1] [Citation(s) in RCA: 37] [Impact Index Per Article: 7.4] [Received: 09/25/2020] [Accepted: 11/13/2020] [Indexed: 01/13/2023] Open
Abstract
We present a method that measures the accuracy of NMR protein structures. It compares random coil index [RCI] against local rigidity predicted by mathematical rigidity theory, calculated from NMR structures [FIRST], using a correlation score (which assesses secondary structure), and an RMSD score (which measures overall rigidity). We test its performance using: structures refined in explicit solvent, which are much better than unrefined structures; decoy structures generated for 89 NMR structures; and conventional predictors of accuracy such as number of restraints per residue, restraint violations, energy of structure, ensemble RMSD, Ramachandran distribution, and clashscore. Restraint violations and RMSD are poor measures of accuracy. Comparisons of NMR to crystal structures show that secondary structure is equally accurate, but crystal structures are typically too rigid in loops, whereas NMR structures are typically too floppy overall. We show that the method is a useful addition to existing measures of accuracy.
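As a rough illustration of the two scores named in this abstract, the sketch below compares a per-residue flexibility profile derived from chemical shifts (RCI) with one predicted from the structure (FIRST). It is a minimal sketch, not the authors' implementation: the function names, the choice of a Pearson coefficient for the correlation score, and the assumption that both profiles are already on the same scale are illustrative assumptions not stated in the abstract.

```python
import math


def correlation_score(rci, rigidity):
    """Pearson correlation between two per-residue profiles (assumed form)."""
    n = len(rci)
    if n != len(rigidity) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x = sum(rci) / n
    mean_y = sum(rigidity) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(rci, rigidity))
    var_x = sum((x - mean_x) ** 2 for x in rci)
    var_y = sum((y - mean_y) ** 2 for y in rigidity)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / math.sqrt(var_x * var_y)


def rmsd_score(rci, rigidity):
    """Root-mean-square difference between the two profiles (assumed form)."""
    if len(rci) != len(rigidity) or not rci:
        raise ValueError("profiles must be non-empty and of equal length")
    squared = sum((x - y) ** 2 for x, y in zip(rci, rigidity))
    return math.sqrt(squared / len(rci))


# Hypothetical four-residue profiles, for illustration only.
observed = [0.1, 0.2, 0.3, 0.4]
predicted = [0.2, 0.3, 0.4, 0.5]
print(correlation_score(observed, predicted), rmsd_score(observed, predicted))
```

In this toy case the two profiles have the same shape but a constant offset, so the correlation is high while the RMSD is non-zero, which is why the two scores carry different information.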
Affiliation(s)
- Nicholas J Fowler
  - Dept of Molecular Biology and Biotechnology, University of Sheffield, Sheffield, UK
- Adnan Sljoka
  - RIKEN Center for Advanced Intelligence Project, RIKEN, 1-4-1 Nihombashi, Chuo-ku, Tokyo, 103-0027, Japan
  - Dept of Chemistry, University of Toronto, UTM, 3359 Mississauga Road North, Mississauga, ON, L5L 1C6, Canada
- Mike P Williamson
  - Dept of Molecular Biology and Biotechnology, University of Sheffield, Sheffield, UK
|
17
|
Grigas AT, Mei Z, Treado JD, Levine ZA, Regan L, O'Hern CS. Using physical features of protein core packing to distinguish real proteins from decoys. Protein Sci 2020; 29:1931-1944. [PMID: 32710566 PMCID: PMC7454528 DOI: 10.1002/pro.3914] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 04/05/2020] [Revised: 07/10/2020] [Accepted: 07/20/2020] [Indexed: 01/06/2023]
Abstract
The ability to consistently distinguish real protein structures from computationally generated model decoys is not yet a solved problem. One route to distinguish real protein structures from decoys is to delineate the important physical features that specify a real protein. For example, it has long been appreciated that the hydrophobic cores of proteins contribute significantly to their stability. We used two sources to obtain datasets of decoys to compare with real protein structures: submissions to the biennial Critical Assessment of protein Structure Prediction competition, in which researchers attempt to predict the structure of a protein only knowing its amino acid sequence, and also decoys generated by 3DRobot, which have user-specified global root-mean-squared deviations from experimentally determined structures. Our analysis revealed that both sets of decoys possess cores that do not recapitulate the key features that define real protein cores. In particular, the model structures appear more densely packed (because of energetically unfavorable atomic overlaps), contain too few residues in the core, and have improper distributions of hydrophobic residues throughout the structure. Based on these observations, we developed a feed-forward neural network, which incorporates key physical features of protein cores, to predict how well a computational model recapitulates the real protein structure without knowledge of the structure of the target sequence. By identifying the important features of protein structure, our method is able to rank decoy structures with similar accuracy to that obtained by state-of-the-art methods that incorporate many additional features. The small number of physical features makes our model interpretable, emphasizing the importance of protein packing and hydrophobicity in protein structure prediction.
Affiliation(s)
- Alex T. Grigas
  - Graduate Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
  - Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, USA
- Zhe Mei
  - Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, USA
  - Department of Chemistry, Yale University, New Haven, Connecticut, USA
- John D. Treado
  - Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, USA
  - Department of Mechanical Engineering and Materials Science, Yale University, New Haven, Connecticut, USA
- Zachary A. Levine
  - Department of Pathology, Yale University, New Haven, Connecticut, USA
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, USA
- Lynne Regan
  - Institute of Quantitative Biology, Biochemistry and Biotechnology, Centre for Synthetic and Systems Biology, School of Biological Sciences, University of Edinburgh, Edinburgh, UK
- Corey S. O'Hern
  - Graduate Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, USA
  - Integrated Graduate Program in Physical and Engineering Biology, Yale University, New Haven, Connecticut, USA
  - Department of Mechanical Engineering and Materials Science, Yale University, New Haven, Connecticut, USA
  - Department of Physics, Yale University, New Haven, Connecticut, USA
  - Department of Applied Physics, Yale University, New Haven, Connecticut, USA
|
18
|
Sinnott M, Malhotra S, Madhusudhan MS, Thalassinos K, Topf M. Combining Information from Crosslinks and Monolinks in the Modeling of Protein Structures. Structure 2020; 28:1061-1070.e3. [DOI: 10.1016/j.str.2020.05.012] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 03/14/2020] [Revised: 05/08/2020] [Accepted: 05/22/2020] [Indexed: 11/30/2022]
|
19
|
Postic G, Janel N, Tufféry P, Moroy G. An information gain-based approach for evaluating protein structure models. Comput Struct Biotechnol J 2020; 18:2228-2236. [PMID: 32837711 PMCID: PMC7431362 DOI: 10.1016/j.csbj.2020.08.013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/01/2020] [Revised: 08/06/2020] [Accepted: 08/07/2020] [Indexed: 12/23/2022] Open
Abstract
For three decades now, knowledge-based scoring functions that operate through the "potential of mean force" (PMF) approach have continuously proven useful for studying protein structures. Although these statistical potentials are not to be confused with their physics-based counterparts of the same name (i.e., PMFs obtained by molecular dynamics simulations), their particular success in assessing the native-like character of protein structure predictions has led authors to consider the computed scores as approximations of the free energy. However, this physical justification has been a matter of controversy from the beginning. Alternative interpretations based on Bayes' theorem have been proposed, but the misleading formalism that invokes the inverse Boltzmann law remains recurrent in the literature. In this article, we present a conceptually new method for ranking protein structure models by quality, which is (i) independent of any physics-based explanation and (ii) relevant to statistics and to a general definition of information gain. The theoretical development described in this study provides new insights into how statistical PMFs work, in comparison with our approach. To prove the concept, we have built interatomic distance-dependent scoring functions, based on the former and new equations, and compared their performance on an independent benchmark of 60,000 protein structures. The results demonstrate that our new formalism outperforms statistical PMFs in evaluating the quality of protein structural decoys. Therefore, this original type of score offers a possibility to improve the success of statistical PMFs in the various fields of structural biology where they are applied. The open-source code is available for download at https://gitlab.rpbs.univ-paris-diderot.fr/src/ig-score.
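For readers unfamiliar with the conventional formalism this abstract argues against, the sketch below shows the textbook inverse-Boltzmann form of a distance-dependent statistical PMF, in which each distance bin is scored as -ln(f_obs / f_ref) in units of kT. It does not reproduce the information-gain score of the paper, whose equation is not given in the abstract. The function name, the histogram inputs, and the pseudocount used to avoid taking the logarithm of zero are illustrative assumptions.

```python
import math


def statistical_pmf(observed_counts, reference_counts, pseudocount=1e-6):
    """Score each distance bin as -ln(f_obs / f_ref), in units of kT.

    observed_counts: pair counts per distance bin in native structures.
    reference_counts: pair counts per bin in the chosen reference state.
    pseudocount: small constant (an assumption) that keeps empty bins finite.
    """
    if len(observed_counts) != len(reference_counts):
        raise ValueError("histograms must have the same number of bins")
    total_obs = sum(observed_counts)
    total_ref = sum(reference_counts)
    if total_obs <= 0 or total_ref <= 0:
        raise ValueError("histograms must contain at least one count")
    scores = []
    for obs, ref in zip(observed_counts, reference_counts):
        f_obs = obs / total_obs + pseudocount
        f_ref = ref / total_ref + pseudocount
        scores.append(-math.log(f_obs / f_ref))
    return scores


# Hypothetical three-bin histograms, for illustration only.
print(statistical_pmf([10, 30, 60], [30, 30, 40]))
```

Bins that are under-represented relative to the reference receive a positive (unfavourable) score, and over-represented bins a negative one. A model is then scored by summing the bin scores of all its atom pairs.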
Affiliation(s)
- Guillaume Postic
  - Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
  - Université de Paris, BFA, UMR 8251, CNRS, F-75013 Paris, France
  - Institut Français de Bioinformatique (IFB), UMS 3601-CNRS, Université Paris-Saclay, Orsay, France
  - Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
- Nathalie Janel
  - Université de Paris, BFA, UMR 8251, CNRS, F-75013 Paris, France
- Pierre Tufféry
  - Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
  - Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
- Gautier Moroy
  - Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
|
20
|
Bhattacharya D. refineD: improved protein structure refinement using machine learning based restrained relaxation. Bioinformatics 2020; 35:3320-3328. [PMID: 30759180 DOI: 10.1093/bioinformatics/btz101] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 06/13/2018] [Revised: 01/22/2019] [Accepted: 02/11/2019] [Indexed: 12/20/2022] Open
Abstract
MOTIVATION Protein structure refinement aims to bring moderately accurate template-based protein models closer to the native state through conformational sampling. However, guiding the sampling towards the native state by effectively using restraints remains a major issue in structure refinement. RESULTS Here, we develop a machine learning based restrained relaxation protocol that uses deep discriminative learning based binary classifiers to predict multi-resolution probabilistic restraints from the starting structure and subsequently converts these restraints into additional scoring terms that are integrated into the Rosetta all-atom energy function during structure refinement. We use four restraint resolutions as adopted in GDT-HA (0.5, 1, 2 and 4 Å), centered on the Cα atom of each residue, which are predicted by an ensemble of four deep discriminative classifiers trained using combinations of sequence and structure-derived features as well as several energy terms from the Rosetta centroid scoring function. The proposed method, refineD, has been found to produce consistent and substantial structural refinement through the use of cumulative and non-cumulative restraints on 150 benchmarking targets. refineD outperforms an unrestrained relaxation strategy, relaxation restrained to the starting structure using the FastRelax application of Rosetta, the atomic-level energy minimization based ModRefiner method, and the molecular dynamics (MD) simulation based FG-MD protocol. Furthermore, by adjusting restraint resolutions, the method addresses the tradeoff that exists between degree and consistency of refinement. These results demonstrate a promising new avenue for improving the accuracy of template-based protein models by effectively guiding conformational sampling during structure refinement through the use of machine learning based restraints. AVAILABILITY AND IMPLEMENTATION http://watson.cse.eng.auburn.edu/refineD/.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
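The four restraint resolutions quoted in this abstract are the distance cutoffs of the GDT-HA measure. As a minimal sketch of how those cutoffs are used, the code below averages, over 0.5, 1, 2 and 4 Å, the fraction of Cα atoms lying within each cutoff of their native positions. It assumes the model has already been superimposed on the native structure and uses that single superposition for all cutoffs, which is a simplification of the full GDT procedure; the function name and input layout are illustrative and not taken from refineD.

```python
import math

# Cutoffs in angstroms, as listed in the abstract for GDT-HA.
GDT_HA_CUTOFFS = (0.5, 1.0, 2.0, 4.0)


def gdt_ha(model_ca, native_ca, cutoffs=GDT_HA_CUTOFFS):
    """Simplified GDT-HA (0-100) for pre-superimposed C-alpha coordinates.

    model_ca, native_ca: equal-length sequences of (x, y, z) tuples.
    """
    if len(model_ca) != len(native_ca) or not model_ca:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    distances = [math.dist(m, n) for m, n in zip(model_ca, native_ca)]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(fractions)


# Hypothetical four-residue example: deviations of 0, 0.75, 1.5 and 5 angstroms.
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (1.0, 0.75, 0.0), (2.0, 1.5, 0.0), (3.0, 5.0, 0.0)]
print(gdt_ha(model, native))
```

Because the tightest cutoff is 0.5 Å, the score rewards near-exact Cα placement, which is the regime that structure refinement targets.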
Affiliation(s)
- Debswapna Bhattacharya
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, AL, USA
|
21
|
Xu G, Wang Q, Ma J. OPUS-Fold: An Open-Source Protein Folding Framework Based on Torsion-Angle Sampling. J Chem Theory Comput 2020; 16:3970-3976. [DOI: 10.1021/acs.jctc.0c00186] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 01/15/2023]
Affiliation(s)
- Gang Xu
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
- Qinghua Wang
  - Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States
- Jianpeng Ma
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
  - Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States
  - Department of Bioengineering, Rice University, Houston, Texas 77005, United States
|
22
|
Chen J, Siu SWI. Machine Learning Approaches for Quality Assessment of Protein Structures. Biomolecules 2020; 10:biom10040626. [PMID: 32316682 PMCID: PMC7226485 DOI: 10.3390/biom10040626] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 03/03/2020] [Revised: 04/07/2020] [Accepted: 04/09/2020] [Indexed: 11/16/2022] Open
Abstract
Protein structures play a very important role in biomedical research, especially in drug discovery and design, which require accurate protein structures in advance. However, experimental determination of protein structure is prohibitively costly and time-consuming, and computational predictions of protein structures have not been perfected. Methods that assess the quality of protein models can help in selecting the most accurate candidates for further work. Driven by this demand, many structural bioinformatics laboratories have developed methods for estimating model accuracy (EMA). In recent years, EMA methods based on machine learning (ML) have consistently ranked among the top-performing methods in the community-wide CASP challenge. Accordingly, we systematically review all the major ML-based EMA methods developed within the past ten years. The methods are grouped by the ML approach they employ (support vector machine, artificial neural networks, ensemble learning, or Bayesian learning), and their significance is discussed from a methodological viewpoint. To orient the reader, we also briefly describe the background of EMA, including the CASP challenge and its evaluation metrics, and introduce the major ML/DL techniques. Overall, this review provides an introductory guide to modern research on protein quality assessment and directions for future research in this area.
|
23
|
Liu S, Xiang X, Gao X, Liu H. Neighborhood Preference of Amino Acids in Protein Structures and its Applications in Protein Structure Assessment. Sci Rep 2020; 10:4371. [PMID: 32152349 PMCID: PMC7062742 DOI: 10.1038/s41598-020-61205-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/19/2019] [Accepted: 02/24/2020] [Indexed: 12/02/2022] Open
Abstract
Amino acids form protein 3D structures in unique manners such that the folded structure is stable and functional under physiological conditions. Non-specific and non-covalent interactions between amino acids exhibit neighborhood preferences. Based on structural information from the Protein Data Bank, a statistical energy function was derived to quantify amino acid neighborhood preferences. The neighborhood of an amino acid is defined by its contacting residues, and the energy function is determined by the neighboring residue types and relative positions. The neighborhood preference of amino acids was exploited to facilitate structural quality assessment, which was implemented in the neighborhood preference program NEPRE. The source code is available via https://github.com/LiuLab-CSRC/NePre.
Affiliation(s)
- Siyuan Liu
  - Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China
  - School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Xilun Xiang
  - Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China
  - School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Xiang Gao
  - Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China
  - School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Haiguang Liu
  - Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China
  - Physics Department, Beijing Normal University, Haidian, Beijing, 100875, China
|
24
|
Li J, Bennett KC, Liu Y, Martin MV, Head-Gordon T. Accurate prediction of chemical shifts for aqueous protein structure on "Real World" data. Chem Sci 2020; 11:3180-3191. [PMID: 34122823 PMCID: PMC8152569 DOI: 10.1039/c9sc06561j] [Citation(s) in RCA: 26] [Impact Index Per Article: 5.2] [Received: 12/29/2019] [Accepted: 03/02/2020] [Indexed: 02/04/2023] Open
Abstract
Here we report a new machine learning algorithm for protein chemical shift prediction that outperforms existing chemical shift calculators on realistic data that are not heavily curated and from which test predictions are not eliminated ad hoc. Our UCBShift predictor implements two modules: a transfer prediction module that employs both sequence and structural alignment to select reference candidates for experimental chemical shift replication, and a redesigned machine learning module based on random forest regression which utilizes more, and more carefully curated, feature-extracted data. When combined, this new predictor achieves state-of-the-art accuracy for predicting chemical shifts on a randomly selected dataset without careful curation, with root-mean-square errors of 0.31 ppm for amide hydrogens, 0.19 ppm for Hα, 0.84 ppm for C', 0.81 ppm for Cα, 1.00 ppm for Cβ, and 1.81 ppm for N. When similar sequences or structurally related proteins are available, UCBShift shows superior native state selection from misfolded decoy sets compared to SPARTA+ and SHIFTX2, and even without homology we exceed the current prediction accuracy of all other popular chemical shift predictors.
Affiliation(s)
- Jie Li
  - Pitzer Center for Theoretical Chemistry, University of California, Berkeley, CA 94720, USA
  - Department of Chemistry, University of California, Berkeley, CA 94720, USA
- Kochise C Bennett
  - Pitzer Center for Theoretical Chemistry, University of California, Berkeley, CA 94720, USA
  - Department of Chemistry, University of California, Berkeley, CA 94720, USA
- Yuchen Liu
  - Pitzer Center for Theoretical Chemistry, University of California, Berkeley, CA 94720, USA
  - Department of Chemistry, University of California, Berkeley, CA 94720, USA
- Michael V Martin
  - Department of Bioengineering, University of California, Berkeley, CA 94720, USA
- Teresa Head-Gordon
  - Pitzer Center for Theoretical Chemistry, University of California, Berkeley, CA 94720, USA
  - Department of Chemistry, University of California, Berkeley, CA 94720, USA
  - Department of Bioengineering, University of California, Berkeley, CA 94720, USA
  - Department of Chemical and Biomolecular Engineering, University of California, Berkeley, CA 94720, USA
|
25
|
Alapati R, Shuvo MH, Bhattacharya D. SPECS: Integration of side-chain orientation and global distance-based measures for improved evaluation of protein structural models. PLoS One 2020; 15:e0228245. [PMID: 32053611 PMCID: PMC7018003 DOI: 10.1371/journal.pone.0228245] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/22/2019] [Accepted: 01/11/2020] [Indexed: 12/23/2022] Open
Abstract
Significant advancements in the field of protein structure prediction have created a need for objective and robust evaluation of protein structural models by comparing predicted models against the experimentally determined native structures to quantitate their structural similarities. Existing protein model versus native similarity metrics consider the distances between either alpha carbon (Cα) atoms or side-chain atoms when computing the similarity. However, the side-chain orientation of a protein plays a critical role in defining its conformation at the atomic level. Despite its importance, inclusion of side-chain orientation in structural similarity evaluation has not yet been addressed. Here, we present SPECS, a side-chain-orientation-included protein model-native similarity metric for improved evaluation of protein structural models. SPECS combines side-chain orientation and global distance based measures in an integrated framework using the united-residue model of polypeptide conformation for computing model-native similarity. Experimental results demonstrate that SPECS is a reliable measure for evaluating structural similarity at the global level including and beyond the accuracy of Cα positioning. Moreover, SPECS delivers superior performance in capturing local quality aspects compared to popular global Cα positioning-based metrics, for models ranging from near-experimental accuracy to those with correct overall folds, making it a robust measure suitable for both high- and moderate-resolution models. Finally, SPECS is sensitive to minute variations in side-chain χ angles even for models with a perfect Cα trace, revealing the power of including side-chain orientation. Collectively, SPECS is a versatile evaluation metric that covers a wide spectrum of protein modeling scenarios and simultaneously captures complementary aspects of structural similarity at multiple levels of granularity. SPECS is freely available at http://watson.cse.eng.auburn.edu/SPECS/.
Affiliation(s)
- Rahul Alapati
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama, United States of America
- Md. Hossain Shuvo
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama, United States of America
- Debswapna Bhattacharya
  - Department of Computer Science and Software Engineering, Auburn University, Auburn, Alabama, United States of America
  - Department of Biological Sciences, Auburn University, Auburn, Alabama, United States of America
|
26
|
Xu G, Wang Q, Ma J. OPUS-Refine: A Fast Sampling-Based Framework for Refining Protein Backbone Torsion Angles and Global Conformation. J Chem Theory Comput 2020; 16:1359-1366. [DOI: 10.1021/acs.jctc.9b01054] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/22/2022]
Affiliation(s)
- Gang Xu
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
- Qinghua Wang
  - Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States
- Jianpeng Ma
  - Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
  - Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States
  - Department of Bioengineering, Rice University, Houston, Texas 77005, United States
|
27
|
Mirzaie M. Identification of native protein structures captured by principal interactions. BMC Bioinformatics 2019; 20:604. [PMID: 31752663 PMCID: PMC6873546 DOI: 10.1186/s12859-019-3186-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2019] [Accepted: 11/01/2019] [Indexed: 11/20/2022] Open
Abstract
Background Evaluation of protein structure is based on a trustworthy potential function. The total potential of a protein structure is approximated as the summation of all pair-wise interaction potentials. Knowledge-based potentials (KBPs) are one type of potential function, derived from known experimentally determined protein structures. Although several KBP functions with different methods have been introduced, the key interactions that capture the total potential have not yet been studied. Results In this study, we seek the interaction types that preserve as much of the total potential as possible. We employ a procedure based on principal component analysis (PCA) to extract the significant and key interactions in native protein structures. We call these principal interactions and show that, in protein fold recognition, the results of a model that considers only these interactions are very close to those of the full interaction model that considers all interactions. In fact, the principal interactions maintain the discriminative power of the full interaction model. This method was evaluated on 3 KBPs with different contact definitions and distance thresholds and revealed that their corresponding principal interactions are very similar and have a lot in common. Additionally, the principal interactions made up 20% of the full interactions on average, and they are between residues that are considered important in protein folding. Conclusions This work shows that not all interaction types are equally important in the discrimination of native structures. Because the results of the reduced model based on principal interactions were very close to those of the full interaction model, a new strategy is needed to capture the role of the remaining (non-principal) interactions and improve the power of knowledge-based potential functions.
Affiliation(s)
- Mehdi Mirzaie
  - Department of Applied Mathematics, Faculty of Mathematical Sciences, Tarbiat Modares University, Jalal Ale Ahmad Highway, P.O.Box: 14115-134, Tehran, Iran
|
28
|
Long S, Tian P. A simple neural network implementation of generalized solvation free energy for assessment of protein structural models. RSC Adv 2019; 9:36227-36233. [PMID: 35540566 PMCID: PMC9074945 DOI: 10.1039/c9ra05168f] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 07/08/2019] [Accepted: 10/14/2019] [Indexed: 11/21/2022] Open
Abstract
Rapid and accurate assessment of protein structural models is essential for protein structure prediction and design. Great progress has been made in this regard, especially through the recent application of "knowledge-based" potentials. Various machine learning based protein structural model quality assessment methods are also quite successful. However, traditional "physics-based" models have not been as effective. Based on our analysis of the fundamental computational limitation behind the unsatisfactory performance of "physics-based" models, we propose a generalized solvation free energy (GSFE) framework, which is intrinsically flexible for multi-scale treatments and is amenable to machine learning implementation. Finally, we implemented a simple example of backbone-based residue-level GSFE with a neural network, which was found to have competitive performance when compared with the latest highly complex "knowledge-based" atomic potentials in distinguishing native structures from decoys.
Affiliation(s)
- Shiyang Long
- School of Chemistry, Jilin University, Changchun, China
- Pu Tian
- School of Life Science and School of Artificial Intelligence, Jilin University, 2699 Qianjin Street, Changchun 130012, China
29
Sato R, Ishida T. Protein model accuracy estimation based on local structure quality assessment using 3D convolutional neural network. PLoS One 2019; 14:e0221347. [PMID: 31487288 PMCID: PMC6728020 DOI: 10.1371/journal.pone.0221347] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.7] [Received: 02/12/2019] [Accepted: 08/05/2019] [Indexed: 11/23/2022] Open
Abstract
In protein tertiary structure prediction, model quality assessment programs (MQAPs) are often used to select the final structural models from a pool of candidate models generated by multiple templates and prediction methods. The 3-dimensional convolutional neural network (3DCNN) is an expansion of the 2DCNN and has been applied in several fields, including object recognition. The 3DCNN has also been used for MQA tasks, but its performance is low due to several technical limitations related to protein tertiary structures, such as orientation alignment. We proposed a novel single-model MQA method based on local structure quality evaluation using a deep neural network containing 3DCNN layers. The proposed method first assesses the quality of the local structure around each residue and then evaluates the quality of the whole structure by integrating the estimated local qualities. We analyzed the model using the CASP11, CASP12, and 3D-Robot datasets and compared its performance with that of the previous 3DCNN method based on whole protein structures. The proposed method showed a significant improvement over the previous 3DCNN method on multiple evaluation measures. We also compared the proposed method to other state-of-the-art methods. Our method performed better than the previous 3DCNN-based method and reached accuracy comparable to that of the current best single-model methods; in particular, in CASP11 stage 2, our method showed a Pearson coefficient of 0.486, which was better than those of the best single-model methods (0.366–0.405). A standalone version of the proposed method and data files are available at https://github.com/ishidalab-titech/3DCNN_MQA.
Affiliation(s)
- Rin Sato
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo, Japan
- Takashi Ishida
- Department of Computer Science, School of Computing, Tokyo Institute of Technology, Ookayama, Meguro-ku, Tokyo, Japan
30
Xu G, Ma T, Du J, Wang Q, Ma J. OPUS-Rota2: An Improved Fast and Accurate Side-Chain Modeling Method. J Chem Theory Comput 2019; 15:5154-5160. [PMID: 31412199 DOI: 10.1021/acs.jctc.9b00309] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 12/20/2022]
Abstract
Side-chain modeling plays a critical role in protein structure prediction. However, in many current methods, balancing the speed and accuracy is still challenging. In this paper, on the basis of our previous work OPUS-Rota (Protein Sci. 2008, 17, 1576-1585), we introduce a new side-chain modeling method, OPUS-Rota2, which is tested on both a 65-protein test set (DB65) in the OPUS-Rota paper and a 379-protein test set (DB379) in the SCWRL4 paper. If the main chain is native, OPUS-Rota2 is more accurate than OPUS-Rota, SCWRL4, and OSCAR-star but slightly less accurate than OSCAR-o. Also, if the main chain is non-native, OPUS-Rota2 is more accurate than any other method. Moreover, OPUS-Rota2 is significantly faster than any other method, in particular, 2 orders of magnitude faster than OSCAR-o. Thus, the combination of higher accuracy and speed of OPUS-Rota2 in modeling side chains on both the native and non-native main chains makes OPUS-Rota2 a very useful tool in protein structure modeling.
Affiliation(s)
- Gang Xu
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; School of Life Sciences, Tsinghua University, Beijing 100084, China
- Junqing Du
- Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States
- Qinghua Wang
- Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States
- Jianpeng Ma
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China; School of Life Sciences, Tsinghua University, Beijing 100084, China; Verna and Marrs Mclean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States; School of Life Sciences, Fudan University, Shanghai 200433, China
31
Wang Y, Virtanen J, Xue Z, Zhang Y. I-TASSER-MR: automated molecular replacement for distant-homology proteins using iterative fragment assembly and progressive sequence truncation. Nucleic Acids Res 2017; 45:W429-W434. [PMID: 28472524 PMCID: PMC5793832 DOI: 10.1093/nar/gkx349] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 02/04/2017] [Accepted: 04/20/2017] [Indexed: 11/16/2022] Open
Abstract
Molecular replacement (MR) is one of the most common techniques used for solving the phase problem in X-ray crystal diffraction. The success rate of MR, however, drops quickly when the sequence identity between query and templates is reduced. The I-TASSER-MR server is designed to solve the phase problem for proteins that lack close homologous templates. Starting from a sequence, it first generates full-length models using I-TASSER by iterative structural fragment reassembly. A progressive sequence truncation procedure is then used for editing the models based on local variations of the structural assembly simulations. Next, the edited models are submitted to MR-REX to search for optimal placements in the crystal unit cells through replica-exchange Monte Carlo simulations, with the phasing results used by CNS for final atomic model refinement and selection. The I-TASSER-MR algorithm was tested on large-scale benchmark datasets and solved 36% more targets than the best threading templates did. The server takes the primary sequence and raw crystal diffraction data as input, with output containing annotated phase information and refined structure models. It also allows users to choose between different methods for setting B-factors and the number of models used for phasing. The online server is freely available at http://zhanglab.ccmb.med.umich.edu/I-TASSER-MR.
Affiliation(s)
- Yan Wang
- Key Laboratory of Molecular Biophysics of the Ministry of Education, School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Jouko Virtanen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Zhidong Xue
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA; Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
32
Yu Z, Yao Y, Deng H, Yi M. ANDIS: an atomic angle- and distance-dependent statistical potential for protein structure quality assessment. BMC Bioinformatics 2019; 20:299. [PMID: 31159742 PMCID: PMC6547486 DOI: 10.1186/s12859-019-2898-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 11/28/2018] [Accepted: 05/13/2019] [Indexed: 01/05/2023] Open
Abstract
Background: Knowledge-based statistical potentials have been widely used in protein structure modeling and model quality assessment. They are commonly evaluated on their ability to recognize native structures and to discriminate decoys. However, these two abilities are found to be mutually exclusive in many statistical potentials. Results: We developed an atomic ANgle- and DIStance-dependent (ANDIS) statistical potential for protein structure quality assessment, with the distance cutoff as a tunable parameter. When the distance cutoff is ≤9.0 Å, “effective atomic interaction” is employed to enhance the ability of native recognition. For a distance cutoff of ≥10 Å, the distance-dependent atom-pair potential with a random-walk reference state is combined to strengthen the ability of decoy discrimination. Benchmark tests on 632 structural decoy sets from diverse sources demonstrate that ANDIS outperforms other state-of-the-art potentials in both native recognition and decoy discrimination. Conclusions: The distance cutoff is a crucial parameter for distance-dependent statistical potentials. A lower distance cutoff is better for native recognition, while a higher one favors decoy discrimination. The ANDIS potential is freely available as a standalone application at http://qbp.hzau.edu.cn/ANDIS/. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2898-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Zhongwang Yu
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070, China
- Yuangen Yao
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070, China
- Haiyou Deng
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070, China; Institute of Applied Physics, Huazhong Agricultural University, Wuhan, 430070, China
- Ming Yi
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070, China; Institute of Applied Physics, Huazhong Agricultural University, Wuhan, 430070, China
33
|
Conover M, Staples M, Si D, Sun M, Cao R. AngularQA: Protein Model Quality Assessment with LSTM Networks. COMPUTATIONAL AND MATHEMATICAL BIOPHYSICS 2019. [DOI: 10.1515/cmb-2019-0001] [Citation(s) in RCA: 27] [Impact Index Per Article: 4.5] [Indexed: 01/04/2023] Open
Abstract
Quality assessment (QA) plays an important role in protein structure prediction. Traditional multi-model QA methods usually rely on searching databases or comparing with other models to make predictions, and they often fail when poor-quality models dominate the model pool. We propose a novel single-model protein QA method built on a new representation that converts raw atom information into a series of carbon-alpha (Cα) atoms with side-chain information, defined by their dihedral angles and bond lengths to the prior residue. An LSTM network predicts the quality by treating each amino acid as a time step and taking the final value returned by the LSTM cells. To the best of our knowledge, this is the first attempt to use an LSTM model on the QA problem; furthermore, we use a new representation that has not been studied for QA. In addition to angles, we use sequence properties such as secondary structure, parsed from the protein structure at each time step without using any database, which differs from all existing QA methods. Our model achieves an overall correlation of 0.651 on the CASP12 testing dataset. Our experiment points to new directions for the QA problem, and our method could be widely used for protein structure prediction. The software is freely available at GitHub: https://github.com/caorenzhi/AngularQA
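To make the angle-and-distance representation in this abstract concrete, the sketch below computes, for a trace of Cα coordinates, the distance to the previous residue and the dihedral angle over four consecutive Cα atoms, giving one feature tuple per residue (one time step for a sequence model). It is only an illustration of this kind of encoding: the exact features, their ordering, and the side-chain terms used by AngularQA are not given in the abstract, so the function names and the toy coordinates are assumptions.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle, in degrees, defined by four points."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.degrees(np.arctan2(y, x))

def ca_trace_features(ca_coords):
    """Return (distance to previous CA, CA dihedral) from the 4th residue on."""
    coords = np.asarray(ca_coords, dtype=float)
    features = []
    for i in range(3, len(coords)):
        dist = np.linalg.norm(coords[i] - coords[i - 1])
        angle = dihedral(coords[i - 3], coords[i - 2], coords[i - 1], coords[i])
        features.append((dist, angle))
    return features

# Toy coordinates (hypothetical, unit spacing; not a real protein).
toy_trace = [[0, 1, 0], [0, 0, 0], [1, 0, 0], [1, 0, 1], [2, 0, 1]]
steps = ca_trace_features(toy_trace)
```

Each tuple in `steps` would correspond to one time step; how such tuples are normalised or combined with secondary-structure labels is a design choice the abstract leaves open.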
Affiliation(s)
- Matthew Conover
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, USA
- Max Staples
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, USA
- Dong Si
- Division of Computing and Software Systems, University of Washington-Bothell, Bothell, WA 98011, USA
- Miao Sun
- JingChi, Sunnyvale, CA 94089, USA
- Renzhi Cao
- Department of Computer Science, Pacific Lutheran University, Tacoma, WA 98447, USA
34
|
Discrimination power of knowledge-based potential dictated by the dominant energies in native protein structures. Amino Acids 2019; 51:1029-1038. [PMID: 31098784 DOI: 10.1007/s00726-019-02743-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/20/2018] [Accepted: 05/08/2019] [Indexed: 01/20/2023]
Abstract
Extracting a well-designed energy function is important for protein structure evaluation. Knowledge-based potential functions are one type of energy function that can be obtained from known protein structures. The pairwise potential between atom types is approximated using Boltzmann's law, which relates the frequency of atom types to their potential. The total energy is approximated as the sum of the pairwise potentials between atomic pairs. In the present study, the performance of a knowledge-based potential function was assessed based on the strength of interaction between groups of amino acids. The dominant energies involved in the pairwise potentials were revealed by eigenvalue analysis of the matrix whose elements represent the energy between amino acids. For this purpose, the matrix of mean energies of the residue-residue interaction types was constructed using 500 native protein structures. The matrix has a dominant eigenvalue, with LEU, VAL, ILE, PHE, TYR, ALA and TRP having high values along the dominant eigenvector. The results show that the ranking of amino acids is consistent with the power of amino acids in discriminating native structures using the K-alphabet reduced model. In the reduced interactions, only a subset of the 20 amino acids, along with their interactions, is considered when assessing the energy. In the K-alphabet reduced model, the reduced structures are constructed from only K amino acid types. The dominant K-alphabet reduced model, derived from the first K amino acids in the list [LEU, VAL, PHE, ILE, TYR, ALA, TRP], has the best discrimination of native structure among all possible K-alphabet reduced models. Knowledge-based potentials might be improved with a new strategy.
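The two steps this abstract describes, Boltzmann inversion of observed frequencies and eigenvalue analysis of the resulting residue-residue energy matrix, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the toy 3×3 count matrix, the composition-based reference state, and kT = 1 are all assumptions made for the example.

```python
import numpy as np

def boltzmann_inversion(observed_counts, kT=1.0):
    """Convert a symmetric matrix of contact counts into pairwise energies.

    Assumed reference state (not taken from the paper): the expected pair
    frequency is the product of the two residue types' marginal frequencies.
    """
    counts = np.asarray(observed_counts, dtype=float)
    freq = counts / counts.sum()
    marginal = freq.sum(axis=1)
    expected = np.outer(marginal, marginal)
    return -kT * np.log(freq / expected)

def dominant_eigenpair(energy_matrix):
    """Return the eigenvalue of largest magnitude and its eigenvector."""
    values, vectors = np.linalg.eigh(energy_matrix)  # input must be symmetric
    idx = np.argmax(np.abs(values))
    return values[idx], vectors[:, idx]

# Made-up contact counts for three hypothetical residue types.
toy_counts = np.array([[40.0, 10.0, 5.0],
                       [10.0, 20.0, 5.0],
                       [5.0, 5.0, 10.0]])
energies = boltzmann_inversion(toy_counts)
eigenvalue, eigenvector = dominant_eigenpair(energies)
# Rank residue types by the magnitude of their component along the
# dominant eigenvector.
ranking = np.argsort(-np.abs(eigenvector))
```

With a real 20×20 matrix of mean residue-residue energies, `ranking` would order the amino acids by their weight along the dominant eigenvector, which is the kind of ordering the abstract compares against discrimination power.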
35
Wang X, Huang SY. Integrating Bonded and Nonbonded Potentials in the Knowledge-Based Scoring Function for Protein Structure Prediction. J Chem Inf Model 2019; 59:3080-3090. [DOI: 10.1021/acs.jcim.9b00057] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/06/2023]
Affiliation(s)
- Xinxiang Wang
- Institute of Biophysics, School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
- Sheng-You Huang
- Institute of Biophysics, School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
36
Pearce R, Huang X, Setiawan D, Zhang Y. EvoDesign: Designing Protein-Protein Binding Interactions Using Evolutionary Interface Profiles in Conjunction with an Optimized Physical Energy Function. J Mol Biol 2019; 431:2467-2476. [PMID: 30851277 DOI: 10.1016/j.jmb.2019.02.028] [Citation(s) in RCA: 47] [Impact Index Per Article: 7.8] [Received: 11/03/2018] [Revised: 02/10/2019] [Accepted: 02/26/2019] [Indexed: 01/19/2023]
Abstract
EvoDesign (https://zhanglab.ccmb.med.umich.edu/EvoDesign) is an online server system for protein design. The method uses evolutionary profiles to guide the sequence search simulation and demonstrated significant advantages over physics-based approaches in terms of more accurately designing proteins that adopt desired target folds. Despite the success, the previous EvoDesign program focused only on monomer protein design, which limited its ability and usefulness in terms of designing functional proteins. In this work, we propose a new EvoDesign server, which extends the principles of evolution-based design to design protein-protein interactions. Starting from a two-chain complex structure, structurally similar interfaces are identified from known protein-protein interaction databases. An interface evolutionary profile is then constructed from a multiple sequence alignment of the interface analogies, which is combined with a newly developed, atomic-level physical energy function to guide the replica-exchange Monte Carlo simulation search. The purpose of the server is to redesign the specified complex chain to increase its stability and binding affinity for the other chain in the complex. With the improved scope and accuracy of the methodology, the new EvoDesign pipeline should become a useful online tool for functional protein design and drug discovery studies.
Affiliation(s)
- Robin Pearce
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Xiaoqiang Huang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Dani Setiawan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Zhang
- Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA; Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
37
Khare S, Bhasin M, Sahoo A, Varadarajan R. Protein model discrimination attempts using mutational sensitivity, predicted secondary structure, and model quality information. Proteins 2019; 87:326-336. [PMID: 30615225 DOI: 10.1002/prot.25654] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 07/13/2018] [Revised: 12/22/2018] [Accepted: 01/02/2019] [Indexed: 01/02/2023]
Abstract
Structure prediction methods often generate a large number of models for a target sequence. Even if the correct fold for the target sequence is sampled in this dataset, it is difficult to distinguish it from other decoy structures. An attempt to solve this problem using experimental mutational sensitivity data for the CcdB protein was described previously by exploiting the correlation of residue depth with mutational sensitivity (r ~ 0.6). We now show that such a correlation extends to four other proteins with localized active sites, and for which saturation mutagenesis datasets exist. We also examine whether incorporation of predicted secondary structure information and the DOPE model quality assessment score, in addition to mutational sensitivity, improves the accuracy of model discrimination using a decoy dataset of 163 targets from CASP. Although most CASP models would have been subjected to model quality assessment prior to submission, we find that the DOPE score makes a substantial contribution to the observed improvement. We therefore also applied the approach to CcdB and four other proteins for which reliable experimental mutational data exist and observe that inclusion of experimental mutational data results in a small qualitative improvement in model discrimination relative to that seen with just the DOPE score. This is largely because of our limited ability to quantitatively predict effects of point mutations on in vivo protein activity. Further improvements in the methodology are required to facilitate improved utilization of single mutant data.
Affiliation(s)
- Shruti Khare
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Munmun Bhasin
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Anusmita Sahoo
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Raghavan Varadarajan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India; Chemical Biology Unit, Jawaharlal Nehru Center for Advanced Scientific Research, Bangalore, India
38
López-Blanco JR, Chacón P. KORP: knowledge-based 6D potential for fast protein and loop modeling. Bioinformatics 2019; 35:3013-3019. [DOI: 10.1093/bioinformatics/btz026] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 08/03/2018] [Revised: 01/03/2019] [Accepted: 01/08/2019] [Indexed: 12/18/2022] Open
Abstract
Motivation
Knowledge-based statistical potentials constitute a simpler and easier alternative to physics-based potentials in many applications, including folding, docking and protein modeling. Here, to improve the effectiveness of the current approximations, we attempt to capture the six-dimensional nature of residue–residue interactions from known protein structures using a simple backbone-based representation.
Results
We have developed KORP, a knowledge-based pairwise potential for proteins that depends on the relative position and orientation between residues. Using a minimalist representation of only three backbone atoms per residue, KORP utilizes a six-dimensional joint probability distribution to outperform state-of-the-art statistical potentials for native structure recognition and best model selection in recent critical assessment of protein structure prediction and loop-modeling benchmarks. Compared with the existing methods, our side-chain independent potential has a lower complexity and better efficiency. The superior accuracy and robustness of KORP represent a promising advance for protein modeling and refinement applications that require a fast but highly discriminative energy function.
Availability and implementation
http://chaconlab.org/modeling/korp.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- José Ramón López-Blanco
- Department of Biological Chemical Physics, Rocasolano Institute of Physical Chemistry C.S.I.C, Madrid, Spain
- Pablo Chacón
- Department of Biological Chemical Physics, Rocasolano Institute of Physical Chemistry C.S.I.C, Madrid, Spain
39
Harris MJ, Raghavan D, Borysik AJ. Quantitative Evaluation of Native Protein Folds and Assemblies by Hydrogen Deuterium Exchange Mass Spectrometry (HDX-MS). JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY 2019; 30:58-66. [PMID: 30280315 PMCID: PMC6318237 DOI: 10.1007/s13361-018-2070-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 03/15/2018] [Revised: 09/14/2018] [Accepted: 09/14/2018] [Indexed: 06/08/2023]
Abstract
Hydrogen deuterium exchange mass spectrometry (HDX-MS) has significant potential for protein structure initiatives but its relationship with protein conformations is unclear. We report on the efficacy of HDX-MS to distinguish between native and non-native proteins using a popular approach to calculate HDX protection factors (PFs) from protein structures. The ability of HDX-MS to identify native protein conformations is quantified by binary structural classification such that merits of the approach for protein modelling can be quantified and better understood. We show that highly accurate PF calculations are not a prerequisite for HDX-MS simulations that are capable of effectively discriminating between native and non-native protein folds. The simulations can also be performed directly on unique structures facilitating high-throughput evaluation of many alternate conformations. The ability of HDX-MS to classify the conformations of homo-protein assemblies is also investigated. In contrast to protein monomers, we show a significant lack of correspondence between the simulated and experimental HDX-MS data for these systems with a subsequent decrease in the ability of HDX-MS to identify native states. However, we demonstrate surprisingly high diagnostic ability of the simulated data for assemblies in which a significant proportion of the individual chains occupy protein-protein interfaces. We relate this to the number of peptides that can sample alternate subunit orientations and discuss these observations within the larger context of applying HDX-MS to evaluate protein structures. Graphical Abstract.
Affiliation(s)
- Matthew J Harris
- Department of Chemistry, King's College London, Britannia House, London, SE1 1DB, UK
- Deepika Raghavan
- Department of Chemistry, King's College London, Britannia House, London, SE1 1DB, UK
- Antoni J Borysik
- Department of Chemistry, King's College London, Britannia House, London, SE1 1DB, UK
40
Mulnaes D, Gohlke H. TopScore: Using Deep Neural Networks and Large Diverse Data Sets for Accurate Protein Model Quality Assessment. J Chem Theory Comput 2018; 14:6117-6126. [DOI: 10.1021/acs.jctc.8b00690] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Indexed: 01/15/2023]
Affiliation(s)
- Daniel Mulnaes
- Department of Mathematics and Natural Sciences, Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, Universitätsstrasse 1, 40225 Düsseldorf, Germany
- Holger Gohlke
- Department of Mathematics and Natural Sciences, Institute for Pharmaceutical and Medicinal Chemistry, Heinrich Heine University Düsseldorf, Universitätsstrasse 1, 40225 Düsseldorf, Germany
- John von Neumann Institute for Computing (NIC), Jülich Supercomputing Centre (JSC) & Institute for Complex Systems - Structural Biochemistry (ICS 6), Forschungszentrum Jülich GmbH, Jülich, Germany
41
Hayashi T, Inoue M, Yasuda S, Petretto E, Škrbić T, Giacometti A, Kinoshita M. Universal effects of solvent species on the stabilized structure of a protein. J Chem Phys 2018; 149:045105. [PMID: 30068177 DOI: 10.1063/1.5042111] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 11/14/2022] Open
Abstract
We investigate the effects of solvent specificities on the stability of the native structure (NS) of a protein on the basis of our free-energy function (FEF). We use CBP-bromodomain (CBP-BD) and apoplastocyanin (apoPC) as representatives of the protein universe and water, methanol, ethanol, and cyclohexane as solvents. The NSs of CBP-BD and apoPC consist of 66% α-helices and of 35% β-sheets and 4% α-helices, respectively. In order to assess the structural stability of a given protein immersed in each solvent, we contrast the FEF of its NS against that of a number of artificially created, misfolded decoys possessing the same amino-acid sequence but significantly different topology and α-helix and β-sheet contents. In the FEF, we compute the solvation entropy using the morphometric approach combined with the integral equation theories, and the change in electrostatic (ES) energy upon folding is obtained by an explicit atomistic but simplified calculation. The ES energy change is represented by the breaking of protein-solvent hydrogen bonds (HBs), the formation of protein intramolecular HBs, and the recovery of solvent-solvent HBs. Protein-solvent and solvent-solvent HBs are absent in cyclohexane. We are thus able to separately evaluate the contributions to the structural stability from the entropic and energetic components. We find that for both CBP-BD and apoPC, the energetic component dominates in methanol, ethanol, and cyclohexane, with the most stable structures in these solvents sharing the same characteristics described as an association of α-helices. In particular, those in the two alcohols are identical. In water, the entropic component is as strong as or even stronger than the energetic one, with a large gain of translational, configurational entropy of water becoming crucially important so that the relative contents of α-helix and β-sheet and the content of total secondary structures are carefully selected to achieve sufficiently close packing of side chains. If the energetic component is excluded for a protein in water, priority is given to the closest side-chain packing, giving rise to the formation of a structure with very low α-helix and β-sheet contents. Our analysis, which requires minimal computational effort, can be applied to any protein immersed in any solvent and provides robust predictions that are quite consistent with the experimental observations for proteins in different solvent environments, thus paving the way toward a more detailed understanding of the folding process.
Affiliation(s)
- Tomohiko Hayashi
- Institute of Advanced Energy, Kyoto University, Uji, Kyoto 611-0011, Japan
- Masao Inoue
- Institute of Advanced Energy, Kyoto University, Uji, Kyoto 611-0011, Japan
- Satoshi Yasuda
- Institute of Advanced Energy, Kyoto University, Uji, Kyoto 611-0011, Japan
- Emanuele Petretto
- Dipartimento di Scienze Molecolari e Nanosistemi, Università Ca' Foscari Venezia, Edificio Alfa Campus Scientifico, Via Torino 155, Venezia-Mestre I-3010, Italy
- Tatjana Škrbić
- Dipartimento di Scienze Molecolari e Nanosistemi, Università Ca' Foscari Venezia, Edificio Alfa Campus Scientifico, Via Torino 155, Venezia-Mestre I-3010, Italy
- Achille Giacometti
- Dipartimento di Scienze Molecolari e Nanosistemi, Università Ca' Foscari Venezia, Edificio Alfa Campus Scientifico, Via Torino 155, Venezia-Mestre I-3010, Italy
- Masahiro Kinoshita
- Institute of Advanced Energy, Kyoto University, Uji, Kyoto 611-0011, Japan
42
Deng H, Jia Y, Zhang Y. Protein structure prediction. INTERNATIONAL JOURNAL OF MODERN PHYSICS. B 2018; 32:1840009. [PMID: 30853739 PMCID: PMC6407873 DOI: 10.1142/s021797921840009x] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Indexed: 05/30/2023]
Abstract
Predicting the 3D structure of a protein from its amino acid sequence is one of the most important unsolved problems in biophysics and computational biology. This paper attempts to give a comprehensive introduction to the most recent efforts and progress in protein structure prediction. Following the general flowchart of structure prediction, related concepts and methods are presented and discussed. Moreover, brief introductions are given to several widely used prediction methods and to the community-wide critical assessment of protein structure prediction (CASP) experiments.
Collapse
Affiliation(s)
- Haiyou Deng
- College of Science, Huazhong Agricultural University, Wuhan 430070, P. R. China
| | - Ya Jia
- College of Physical Science and Technology, Central China Normal University, Wuhan 430079, P. R. China
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
|
43
|
Virtanen JJ, Zhang Y. MR-REX: molecular replacement by cooperative conformational search and occupancy optimization on low-accuracy protein models. Acta Crystallogr D Struct Biol 2018; 74:606-620. [PMID: 29968671 PMCID: PMC6038387 DOI: 10.1107/s2059798318005612] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/17/2017] [Accepted: 04/10/2018] [Indexed: 11/10/2022] Open
Abstract
Molecular replacement (MR) has commonly been employed to derive the phase information in protein crystal X-ray diffraction, but its success rate decreases rapidly when the search model is dissimilar to the target. MR-REX has been developed to perform an MR search by replica-exchange Monte Carlo simulations, which enables cooperative rotation and translation searches and simultaneous clash and occupancy optimization. MR-REX was tested on a set of 1303 protein structures of different accuracies and successfully placed 699 structures at positions that have an r.m.s.d. of below 2 Å to the target position, which is 10% higher than was obtained by Phaser. However, case studies show that many of the models for which Phaser failed and MR-REX succeeded can be solved by Phaser by pruning them and using nondefault parameters. The factors affecting success and the parts of the methodology which lead to success are studied. The results demonstrate a new avenue for molecular replacement which outperforms (and has results that are complementary to) the state-of-the-art MR methods, in particular for distantly homologous proteins.
Affiliation(s)
- Jouko J. Virtanen
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
| | - Yang Zhang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
|
44
|
Derevyanko G, Grudinin S, Bengio Y, Lamoureux G. Deep convolutional networks for quality assessment of protein folds. Bioinformatics 2018; 34:4046-4053. [DOI: 10.1093/bioinformatics/bty494] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Received: 10/05/2017] [Accepted: 06/15/2018] [Indexed: 11/14/2022] Open
Affiliation(s)
- Georgy Derevyanko
- Department of Chemistry and Biochemistry and Centre for Research in Molecular Modeling (CERMM), Concordia University, Montréal, Québec, Canada
| | - Sergei Grudinin
- Inria, Université Grenoble Alpes, CNRS, Grenoble INP, LJK, Grenoble, France
| | - Yoshua Bengio
- Department of Computer Science and Operations Research, Université de Montréal, Montréal, Québec, Canada
| | - Guillaume Lamoureux
- Department of Chemistry and Biochemistry and Centre for Research in Molecular Modeling (CERMM), Concordia University, Montréal, Québec, Canada
|
45
|
Postic G, Hamelryck T, Chomilier J, Stratmann D. MyPMFs: a simple tool for creating statistical potentials to assess protein structural models. Biochimie 2018; 151:37-41. [PMID: 29857183 DOI: 10.1016/j.biochi.2018.05.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/28/2018] [Accepted: 05/25/2018] [Indexed: 01/18/2023]
Abstract
Evaluating the model quality of protein structures that evolve in environments with particular physicochemical properties requires scoring functions that are adapted to their specific residue compositions and/or structural characteristics. Thus, computational methods developed for structures from the cytosol cannot work properly on membrane or secreted proteins. Here, we present MyPMFs, an easy-to-use tool that allows users to train statistical potentials of mean force (PMFs) on the protein structures of their choice, with all parameters being adjustable. We demonstrate its use by creating an accurate statistical potential for transmembrane protein domains. We also show its usefulness to study the influence of the physical environment on residue interactions within protein structures. Our open-source software is freely available for download at https://github.com/bibip-impmc/mypmfs.
Affiliation(s)
- Guillaume Postic
- Sorbonne Université, UMR 7590 CNRS, MNHN, IRD, Institut de Minéralogie de Physique des Matériaux et de Cosmochimie (IMPMC), Paris, France
| | - Thomas Hamelryck
- Bioinformatics Centre, Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark; Image Section, Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
| | - Jacques Chomilier
- Sorbonne Université, UMR 7590 CNRS, MNHN, IRD, Institut de Minéralogie de Physique des Matériaux et de Cosmochimie (IMPMC), Paris, France
| | - Dirk Stratmann
- Sorbonne Université, UMR 7590 CNRS, MNHN, IRD, Institut de Minéralogie de Physique des Matériaux et de Cosmochimie (IMPMC), Paris, France
|
46
|
Manavalan B, Lee J. SVMQA: support-vector-machine-based protein single-model quality assessment. Bioinformatics 2018; 33:2496-2503. [PMID: 28419290 DOI: 10.1093/bioinformatics/btx222] [Citation(s) in RCA: 130] [Impact Index Per Article: 18.6] [Received: 12/15/2016] [Accepted: 04/12/2017] [Indexed: 01/03/2023] Open
Abstract
Motivation: The accurate ranking of predicted structural models and selecting the best model from a given candidate pool remain as open problems in the field of structural bioinformatics. The quality assessment (QA) methods used to address these problems can be grouped into two categories: consensus methods and single-model methods. Consensus methods in general perform better and attain higher correlation between predicted and true quality measures. However, these methods frequently fail to generate proper quality scores for native-like structures which are distinct from the rest of the pool. Conversely, single-model methods do not suffer from this drawback and are better suited for real-life applications where many models from various sources may not be readily available. Results: In this study, we developed a support-vector-machine-based single-model global quality assessment (SVMQA) method. For a given protein model, the SVMQA method predicts TM-score and GDT_TS score based on a feature vector containing statistical potential energy terms and consistency-based terms between the actual structural features (extracted from the three-dimensional coordinates) and predicted values (from primary sequence). We trained SVMQA using CASP8, CASP9 and CASP10 targets and determined the machine parameters by 10-fold cross-validation. We evaluated the performance of our SVMQA method on various benchmarking datasets. Results show that SVMQA outperformed the existing best single-model QA methods both in ranking provided protein models and in selecting the best model from the pool. According to the CASP12 assessment, SVMQA was the best method in selecting good-quality models from decoys in terms of GDTloss. Availability and implementation: SVMQA method can be freely downloaded from http://lee.kias.re.kr/SVMQA/SVMQA_eval.tar.gz. Contact: jlee@kias.re.kr. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Balachandran Manavalan
- Center for In Silico Protein Science and School of Computational Sciences, Korea Institute for Advanced Study, Seoul 130-722, Korea
| | - Jooyoung Lee
- Center for In Silico Protein Science and School of Computational Sciences, Korea Institute for Advanced Study, Seoul 130-722, Korea
|
47
|
Wang X, Zhang D, Huang SY. New Knowledge-Based Scoring Function with Inclusion of Backbone Conformational Entropies from Protein Structures. J Chem Inf Model 2018; 58:724-732. [DOI: 10.1021/acs.jcim.7b00601] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 01/04/2023]
Affiliation(s)
- Xinxiang Wang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
| | - Di Zhang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
| | - Sheng-You Huang
- School of Physics, Huazhong University of Science and Technology, Wuhan, Hubei 430074, P. R. China
|
48
|
dos Santos RN, Ferrari AJR, de Jesus HCR, Gozzo FC, Morcos F, Martínez L. Enhancing protein fold determination by exploring the complementary information of chemical cross-linking and coevolutionary signals. Bioinformatics 2018; 34:2201-2208. [DOI: 10.1093/bioinformatics/bty074] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 09/19/2017] [Accepted: 02/10/2018] [Indexed: 11/13/2022] Open
Affiliation(s)
- Ricardo N dos Santos
- Institute of Chemistry, University of Campinas, Campinas, Brazil
- Center for Computational Engineering and Sciences, University of Campinas, Campinas, Brazil
| | - Fábio C Gozzo
- Institute of Chemistry, University of Campinas, Campinas, Brazil
| | - Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, USA
| | - Leandro Martínez
- Institute of Chemistry, University of Campinas, Campinas, Brazil
- Center for Computational Engineering and Sciences, University of Campinas, Campinas, Brazil
|
49
|
Mirzaie M. Hydrophobic residues can identify native protein structures. Proteins 2018; 86:467-474. [DOI: 10.1002/prot.25466] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 07/26/2017] [Revised: 12/28/2017] [Accepted: 01/23/2018] [Indexed: 11/06/2022]
Affiliation(s)
- Mehdi Mirzaie
- Department of Applied Mathematics, Faculty of Mathematical Sciences, Tarbiat Modares University, Jalal Ale Ahmad Highway, Tehran, Iran
- School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
|
50
|
Yao Y, Gui R, Liu Q, Yi M, Deng H. Diverse effects of distance cutoff and residue interval on the performance of distance-dependent atom-pair potential in protein structure prediction. BMC Bioinformatics 2017; 18:542. [PMID: 29221443 PMCID: PMC5723101 DOI: 10.1186/s12859-017-1983-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/10/2017] [Accepted: 12/04/2017] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND: As one of the most successful knowledge-based energy functions, the distance-dependent atom-pair potential is widely used in all aspects of protein structure prediction, including conformational search, model refinement, and model assessment. During the last two decades, great efforts have been made to improve the reference state of the potential, while other factors that also strongly affect the performance of the potential have been relatively less investigated. RESULTS: Based on different distance cutoffs (from 5 to 22 Å) and residue intervals (from 0 to 15) as well as six different reference states, we constructed a series of distance-dependent atom-pair potentials and tested them on several groups of structural decoy sets collected from diverse sources. A comprehensive investigation has been performed to clarify the effects of distance cutoff and residue interval on the potential's performance. Our results provide a new perspective as well as a practical guidance for optimizing distance-dependent statistical potentials. CONCLUSIONS: The optimal distance cutoff and residue interval are highly related with the reference state that the potential is based on, the measurements of the potential's performance, and the decoy sets that the potential is applied to. The performance of distance-dependent statistical potential can be significantly improved when the best statistical parameters for the specific application environment are adopted.
Affiliation(s)
- Yuangen Yao
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070 China
| | - Rong Gui
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070 China
| | - Quan Liu
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070 China
| | - Ming Yi
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070 China
| | - Haiyou Deng
- Department of Physics, College of Science, Huazhong Agricultural University, Wuhan, 430070 China
- Institute of Applied Physics, Huazhong Agricultural University, Wuhan, 430070 China
|