1. Fasoulis R, Rigo MM, Lizée G, Antunes DA, Kavraki LE. APE-Gen2.0: Expanding Rapid Class I Peptide-Major Histocompatibility Complex Modeling to Post-Translational Modifications and Noncanonical Peptide Geometries. J Chem Inf Model 2024; 64:1730-1750. [PMID: 38415656 PMCID: PMC10936522 DOI: 10.1021/acs.jcim.3c01667] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/15/2023] [Revised: 02/14/2024] [Accepted: 02/15/2024] [Indexed: 02/29/2024]
Abstract
The recognition of peptides bound to class I major histocompatibility complex (MHC-I) receptors by T-cell receptors (TCRs) is a determinant of triggering the adaptive immune response. While the exact molecular features that drive TCR recognition are still unknown, studies have suggested that the geometry of the joint peptide-MHC (pMHC) structure plays an important role. As such, there is a definite need for methods and tools that accurately predict the structure of the peptide bound to the MHC-I receptor. In the past few years, many pMHC structural modeling tools have emerged that provide high-quality modeled structures in the general case. However, there are numerous non-canonical cases in the immunopeptidome that most pMHC modeling tools do not address, most notably peptides that contain non-standard amino acids and post-translational modifications (PTMs) or peptides that assume non-canonical geometries in the MHC binding cleft. Such chemical and structural properties have been shown to be present in neoantigens; therefore, accurate structural modeling of these instances can be vital for cancer immunotherapy. To this end, we have developed APE-Gen2.0, a tool that improves upon its predecessor and other pMHC modeling tools, both in terms of modeling accuracy and the range of non-canonical peptide cases it can model. Some of the improvements include (i) the ability to model peptides that have different types of PTMs such as phosphorylation, nitration, and citrullination; (ii) a new and improved anchor identification routine in order to identify and model peptides that exhibit a non-canonical anchor conformation; and (iii) a web server that provides a platform for easy and accessible pMHC modeling. We further show that structures predicted by APE-Gen2.0 can be used to assess the effects of PTMs on binding affinity more accurately than using the peptide sequence alone.
APE-Gen2.0 is freely available at https://apegen.kavrakilab.org.
Affiliation(s)
- Romanos Fasoulis
- Department of Computer Science, Rice University, Houston, Texas 77005, United States
- Mauricio M. Rigo
- Department of Computer Science, Rice University, Houston, Texas 77005, United States
- Gregory Lizée
- Department of Melanoma Medical Oncology—Research, The University of Texas MD Anderson Cancer Center, Houston, Texas 77054, United States
- Dinler A. Antunes
- Department of Biology and Biochemistry, University of Houston, Houston, Texas 77004, United States
- Lydia E. Kavraki
- Department of Computer Science, Rice University, Houston, Texas 77005, United States
2. Liang S, Zhang C, Zhu M. Ab Initio Prediction of 3-D Conformations for Protein Long Loops with High Accuracy and Applications to Antibody CDRH3 Modeling. J Chem Inf Model 2023; 63:7568-7577. [PMID: 38018130 DOI: 10.1021/acs.jcim.3c01051] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2023]
Abstract
Residue-level potentials of mean force have been widely used for protein backbone refinement to avoid simultaneous sampling of side-chain conformations, but the interaction energy between the reduced side chains and backbone atoms has not been considered explicitly. In this study, we developed novel methods to calculate the residue-atom interaction energy in combination with atomic and residue-level terms. The parameters were optimized step by step to remove overcounting, or overlap, between different energy terms. The mixed energy functions were then used to evaluate the backbone conformations generated at the initial sampling stage of protein loop modeling (OSCAR-loop), including the interaction energy between the reduced loop residues and the full atoms of the protein framework. The accuracies of top-ranked decoys were 1.18 and 2.81 Å for 8-residue and 12-residue loops, respectively. We then selected diverse decoys for side-chain modeling, backbone refinement, and energy minimization. The procedure was repeated multiple times to select one prediction with the lowest energy. Consequently, we obtained an accuracy of 0.74 Å for a widely used test set of 12-residue loops, compared with >1.4 Å reported by other researchers. OSCAR-loop was also effective for modeling the H3 loops of antibody complementarity-determining regions (CDRs) in the crystal environment. The prediction accuracy of OSCAR-loop (1.74 Å) was better than that of the Rosetta NGK method (3.11 Å) or of deep learning methods (>2.2 Å) for the CDRH3 loops of 49 targets in the Rosetta antibody benchmark. The performance of OSCAR-loop in a model environment is also discussed.
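The loop accuracies quoted above (for example 1.18 Å and 0.74 Å) are root-mean-square deviations (RMSD) between predicted and native loop coordinates. A minimal Python sketch of that measure follows, assuming the model and native atoms are already paired one-to-one and expressed in a common frame (loop RMSD is commonly reported after superimposing the rest of the protein). The atom selection and superposition protocol used for OSCAR-loop are not stated in the abstract, and `loop_rmsd` is a hypothetical helper name.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def loop_rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """RMSD (in the coordinates' units, e.g. Å) between paired atoms.

    Assumes the two structures are already in a common frame, so no
    superposition is performed here.
    """
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    sq_sum = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, native):
        sq_sum += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(sq_sum / len(model))


# Toy example: every model atom is shifted 1 Å along z, so the RMSD is 1 Å.
native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model = [(x, y, z + 1.0) for x, y, z in native]
print(loop_rmsd(model, native))  # 1.0
```

Reporting RMSD only after fitting the framework, rather than the loop itself, is what makes the number reflect loop placement as well as loop shape.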
Affiliation(s)
- Shide Liang
- Department of Computational Biology, 20n Bio Limited, Hangzhou 310018, P. R. China
- Department of Research and Development, Bio-Thera Solutions, Guangzhou 510530, P. R. China
- Chi Zhang
- School of Biological Sciences, University of Nebraska, Lincoln, Nebraska 68588, United States
- Mingfu Zhu
- Department of Computational Biology, 20n Bio Limited, Hangzhou 310018, P. R. China
3. Wang T, Wang L, Zhang X, Shen C, Zhang O, Wang J, Wu J, Jin R, Zhou D, Chen S, Liu L, Wang X, Hsieh CY, Chen G, Pan P, Kang Y, Hou T. Comprehensive assessment of protein loop modeling programs on large-scale datasets: prediction accuracy and efficiency. Brief Bioinform 2023; 25:bbad486. [PMID: 38171930 PMCID: PMC10764206 DOI: 10.1093/bib/bbad486] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 09/20/2023] [Revised: 12/04/2023] [Accepted: 12/05/2023] [Indexed: 01/05/2024]
Abstract
Protein loops play a critical role in protein dynamics and are essential for numerous biological functions, and various computational approaches to loop modeling have been proposed over the past decades. However, a comprehensive understanding of the strengths and weaknesses of each method is lacking. In this work, we constructed two high-quality datasets (i.e. the General dataset and the CASP dataset) and systematically evaluated the accuracy and efficiency of 13 commonly used loop modeling approaches from the perspective of loop length, protein class and residue type. The results indicate that the knowledge-based method FREAD generally outperforms the other tested programs, but it struggles to predict loops longer than 15 and 30 residues on the CASP and General datasets, respectively. The ab initio method Rosetta NGK demonstrated exceptional modeling accuracy for short loops of four to eight residues and achieved the highest success rate on the CASP dataset. The well-known AlphaFold2 and RoseTTAFold require more resources for better performance, but they show promise for predicting loops longer than 16 and 30 residues on the CASP and General datasets, respectively. These observations can provide valuable insights for selecting suitable methods for specific loop modeling tasks and contribute to future advancements in the field.
Affiliation(s)
- Tianyue Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Langcheng Wang
- Department of Pathology, New York University Medical Center, 550 First Avenue, New York, NY 10016, USA
- Xujun Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Chao Shen
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Odin Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jike Wang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Jialu Wu
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Ruofan Jin
- College of Life Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Donghao Zhou
- Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, Guangdong, China
- Shicheng Chen
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Liwei Liu
- Advanced Computing and Storage Laboratory, Central Research Institute, 2012 Laboratories, Huawei Technologies Co., Ltd., Shenzhen 518129, Guangdong, China
- Xiaorui Wang
- State Key Laboratory of Quality Research in Chinese Medicines, Macau University of Science and Technology, Macao, China
- Chang-Yu Hsieh
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Guangyong Chen
- Zhejiang Lab, Zhejiang University, Hangzhou 311121, Zhejiang, China
- Peichen Pan
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Yu Kang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
- Tingjun Hou
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, Zhejiang, China
4. Lopez-Robles C, Scaramuzza S, Astorga-Simon EN, Ishida M, Williamson CD, Baños-Mateos S, Gil-Carton D, Romero-Durana M, Vidaurrazaga A, Fernandez-Recio J, Rojas AL, Bonifacino JS, Castaño-Díez D, Hierro A. Architecture of the ESCPE-1 membrane coat. Nat Struct Mol Biol 2023; 30:958-969. [PMID: 37322239 PMCID: PMC10352136 DOI: 10.1038/s41594-023-01014-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 10/13/2022] [Accepted: 05/05/2023] [Indexed: 06/17/2023]
Abstract
Recycling of membrane proteins enables the reuse of receptors, ion channels and transporters. A key component of the recycling machinery is the endosomal sorting complex for promoting exit 1 (ESCPE-1), which rescues transmembrane proteins from the endolysosomal pathway for transport to the trans-Golgi network and the plasma membrane. This rescue entails the formation of recycling tubules through ESCPE-1 recruitment, cargo capture, coat assembly and membrane sculpting by mechanisms that remain largely unknown. Herein, we show that ESCPE-1 has a single-layer coat organization and suggest how synergistic interactions between ESCPE-1 protomers, phosphoinositides and cargo molecules result in a global arrangement of amphipathic helices to drive tubule formation. Our results thus define a key process of tubule-based endosomal sorting.
Affiliation(s)
- Morié Ishida
- Neurosciences and Cellular and Structural Biology Division, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
- Chad D Williamson
- Neurosciences and Cellular and Structural Biology Division, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
- David Gil-Carton
- CIC bioGUNE, Derio, Spain
- Ikerbasque, Basque Foundation for Science, Bilbao, Spain
- BREM Basque Resource for Electron Microscopy, Leioa, Spain
- Miguel Romero-Durana
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Instituto de Ciencias de la Vid y del Vino (ICVV), CSIC-Universidad de La Rioja-Gobierno de La Rioja, Logroño, Spain
- Juan Fernandez-Recio
- Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Instituto de Ciencias de la Vid y del Vino (ICVV), CSIC-Universidad de La Rioja-Gobierno de La Rioja, Logroño, Spain
- Juan S Bonifacino
- Neurosciences and Cellular and Structural Biology Division, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, MD, USA
- Daniel Castaño-Díez
- BioEM Lab, Biozentrum, University of Basel, Basel, Switzerland
- Instituto Biofisika (UPV/EHU, CSIC), University of the Basque Country, Leioa, Spain
- Aitor Hierro
- CIC bioGUNE, Derio, Spain
- Ikerbasque, Basque Foundation for Science, Bilbao, Spain
5. Hernández IM, Dehouck Y, Bastolla U, López-Blanco JR, Chacón P. Predicting protein stability changes upon mutation using a simple orientational potential. Bioinformatics 2023; 39:6984713. [PMID: 36629451 PMCID: PMC9850275 DOI: 10.1093/bioinformatics/btad011] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/26/2022] [Revised: 11/17/2022] [Accepted: 01/10/2023] [Indexed: 01/12/2023]
Abstract
MOTIVATION Structure-based stability prediction upon mutation is crucial for protein engineering and design, and for understanding genetic diseases or drug resistance events. For this task, we adopted a simple residue-based orientational potential that considers only three backbone atoms and was previously applied in protein modeling. Its application to stability prediction only requires parametrizing 12 amino acid-dependent weights using cross-validation strategies on a curated dataset in which we tried to reduce the number of mutations located at protein-protein or protein-ligand interfaces or measured under extreme conditions, as well as the over-representation of alanine mutations. RESULTS Our method, called KORPM, accurately predicts mutational effects on an independent benchmark dataset, whether the wild-type or mutated structure is used as the starting point. Compared with state-of-the-art methods on this balanced dataset, our approach obtained the lowest root mean square error (RMSE) and the highest correlation between predicted and experimental ΔΔG measures, as well as better receiver operating characteristic and precision-recall curves. Our method is almost anti-symmetric by construction, and thus performs similarly for the direct and reverse mutations with the corresponding wild-type and mutated structures. Despite the strong limitations of the available experimental mutation data in terms of size, variability, and heterogeneity, we show competitive results with a simple sum of energy terms, which is more efficient and less prone to overfitting. AVAILABILITY AND IMPLEMENTATION https://github.com/chaconlab/korpm. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
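The evaluation quantities named in this abstract can be made concrete with a short sketch. It computes the RMSE and Pearson correlation between predicted and experimental ΔΔG values, plus a simple anti-symmetry check: for a perfectly anti-symmetric predictor, ΔΔG(direct) + ΔΔG(reverse) = 0 for every mutation pair. This is not the KORPM code (linked above); the function names are hypothetical, and the exact metrics and sign conventions used by the authors may differ.

```python
import math
from typing import Sequence


def rmse(pred: Sequence[float], exp: Sequence[float]) -> float:
    """Root-mean-square error between predicted and experimental ΔΔG values."""
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, exp)) / len(pred))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def antisymmetry_bias(direct: Sequence[float], reverse: Sequence[float]) -> float:
    """Mean of ΔΔG(direct) + ΔΔG(reverse) over paired mutations.

    0.0 for a perfectly anti-symmetric predictor; its magnitude measures the bias.
    """
    return sum(d + r for d, r in zip(direct, reverse)) / len(direct)


# Toy values (not data from the paper): reverse predictions mirror the direct ones.
direct = [1.0, -0.5, 2.0]
reverse = [-1.0, 0.5, -2.0]
print(antisymmetry_bias(direct, reverse))  # 0.0
print(round(pearson(direct, reverse), 6))  # -1.0
```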
Affiliation(s)
- Iván Martín Hernández
- Department of Biological Physical Chemistry, Rocasolano Institute of Physical Chemistry, CSIC, 28006 Madrid, Spain
- Yves Dehouck
- Bioinformatic Unit, Centro de Biología Molecular “Severo Ochoa,” CSIC-UAM Cantoblanco, Madrid 28049, Spain
- Ugo Bastolla
- Bioinformatic Unit, Centro de Biología Molecular “Severo Ochoa,” CSIC-UAM Cantoblanco, Madrid 28049, Spain
- José Ramón López-Blanco
- Department of Biological Physical Chemistry, Rocasolano Institute of Physical Chemistry, CSIC, 28006 Madrid, Spain
6. Holland J, Grigoryan G. Structure‐conditioned amino‐acid couplings: how contact geometry affects pairwise sequence preferences. Protein Sci 2022; 31:900-917. [PMID: 35060221 PMCID: PMC8927866 DOI: 10.1002/pro.4280] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2021] [Revised: 01/06/2022] [Accepted: 01/12/2022] [Indexed: 11/11/2022]
Abstract
Relating a protein's sequence to its conformation is a central challenge for both structure prediction and sequence design. Statistical contact potentials, as well as their more descriptive versions that account for side‐chain orientation and other geometric descriptors, have served as simplistic but useful means of representing second‐order contributions in sequence–structure relationships. Here we ask what happens when a pairwise potential is conditioned on the fully defined geometry of interacting backbone fragments. We show that the resulting structure‐conditioned coupling energies more accurately reflect pair preferences as a function of structural context. These structure‐conditioned energies more reliably encode native sequence information and correlate more highly with experimentally determined coupling energies. Clustering a database of interaction motifs by structure results in ensembles of similar energies, and clustering them by energy results in ensembles of similar structures. By comparing many pairs of interaction motifs and showing that structural similarity and energetic similarity go hand‐in‐hand, we provide a tangible link between modular sequence and structure elements. This link is applicable to structural modeling, and we show that scoring CASP models with structure‐conditioned energies results in substantially higher correlation with structural quality than scoring the same models with a contact potential. We conclude that structure‐conditioned coupling energies are a good way to model the impact of interaction geometry on second‐order sequence preferences.
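Classical statistical contact potentials, which this abstract uses as a baseline, are commonly derived by inverse Boltzmann statistics: E(a, b) = -ln[P_obs(a, b) / P_exp(a, b)], in units of kT. A minimal sketch follows, assuming a quasi-chemical reference state built from residue-type frequencies and a pseudocount of 1. Conditioning on structure is illustrated only by passing in the contacts from one cluster of similar backbone geometries, which is a simplification and not the authors' procedure; `contact_potential` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def contact_potential(contacts: Iterable[Pair], pseudocount: float = 1.0) -> Dict[Pair, float]:
    """Inverse-Boltzmann pair energies in kT units: E(a, b) = -ln(P_obs / P_exp).

    `contacts` lists residue-type pairs observed in contact. Feeding all contacts
    gives a plain contact potential; feeding only the contacts from one cluster
    of similar backbone geometries gives a (simplified) structure-conditioned one.
    """
    # Count unordered pairs, so ("K", "E") and ("E", "K") share one key.
    pair_counts = Counter(tuple(sorted(p)) for p in contacts)
    total = sum(pair_counts.values())
    # Residue-type frequencies, counting both ends of every contact.
    single: Counter = Counter()
    for (a, b), n in pair_counts.items():
        single[a] += n
        single[b] += n
    types = sorted(single)
    n_pair_types = len(types) * (len(types) + 1) // 2
    energies: Dict[Pair, float] = {}
    for i, a in enumerate(types):
        for b in types[i:]:
            p_obs = (pair_counts[(a, b)] + pseudocount) / (total + pseudocount * n_pair_types)
            p_a, p_b = single[a] / (2 * total), single[b] / (2 * total)
            # Reference state: independent residue types; unordered a != b pairs count twice.
            p_exp = p_a * p_b * (1 if a == b else 2)
            energies[(a, b)] = -math.log(p_obs / p_exp)
    return energies


# Toy contacts: K-E salt bridges and L-L packing, never E-L.
contacts = [("K", "E")] * 5 + [("L", "L")] * 5
energies = contact_potential(contacts)
print(round(energies[("E", "K")], 3))  # -1.099
```

Enriched pairs get negative (favorable) energies and unobserved pairs positive ones; the structure-conditioned energies described above replace the single global table with one table per geometric context.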
Affiliation(s)
- Jack Holland
- Department of Computer Science, Dartmouth College, Hanover, New Hampshire, USA
- Gevorg Grigoryan
- Department of Computer Science, Dartmouth College, Hanover, New Hampshire, USA
7. Bueno-Carrasco MT, Cuéllar J, Flydal MI, Santiago C, Kråkenes TA, Kleppe R, López-Blanco JR, Marcilla M, Teigen K, Alvira S, Chacón P, Martinez A, Valpuesta JM. Structural mechanism for tyrosine hydroxylase inhibition by dopamine and reactivation by Ser40 phosphorylation. Nat Commun 2022; 13:74. [PMID: 35013193 PMCID: PMC8748767 DOI: 10.1038/s41467-021-27657-y] [Citation(s) in RCA: 46] [Impact Index Per Article: 15.3] [Received: 09/06/2020] [Accepted: 12/03/2021] [Indexed: 12/15/2022]
Abstract
Tyrosine hydroxylase (TH) catalyzes the rate-limiting step in the biosynthesis of dopamine (DA) and other catecholamines, and its dysfunction leads to DA deficiency and parkinsonisms. Inhibition by catecholamines and reactivation by S40 phosphorylation are key regulatory mechanisms of TH activity and conformational stability. We used cryo-EM to determine the structures of full-length human TH without and with DA, and the structure of S40 phosphorylated TH, complemented with biophysical and biochemical characterizations and molecular dynamics simulations. TH presents a tetrameric structure with dimerized regulatory domains that are separated by 15 Å from the catalytic domains. Upon DA binding, a 20-residue α-helix in the flexible N-terminal tail of the regulatory domain is fixed in the active site, blocking it, while S40-phosphorylation forces its egress. The structures reveal the molecular basis of the inhibitory and stabilizing effects of DA and its counteraction by S40-phosphorylation, key regulatory mechanisms for homeostasis of DA and TH. Tyrosine hydroxylase (TH) catalyzes the rate-limiting step in the synthesis of the catecholamine neurotransmitters and hormones dopamine (DA), adrenaline and noradrenaline. Here, the authors present the cryo-EM structures of full-length human TH in the apo form and bound with DA, as well as the structure of Ser40 phosphorylated TH, and discuss the inhibitory and stabilizing effects of DA on TH and its counteraction by Ser40-phosphorylation.
Affiliation(s)
- Jorge Cuéllar
- Centro Nacional de Biotecnología (CNB-CSIC), Madrid, Spain
- Marte I Flydal
- Department of Biomedicine, University of Bergen, Bergen, Norway
- César Santiago
- Centro Nacional de Biotecnología (CNB-CSIC), Madrid, Spain
- Rune Kleppe
- Norwegian Centre for Maritime and Diving Medicine, Department of Occupational Medicine, Haukeland University Hospital, Bergen, Norway
- Knut Teigen
- Department of Biomedicine, University of Bergen, Bergen, Norway
- Sara Alvira
- Centro Nacional de Biotecnología (CNB-CSIC), Madrid, Spain
- School of Biochemistry, University of Bristol, Bristol, BS8 1TD, UK
- Pablo Chacón
- Instituto de Química Física Rocasolano (IQFR-CSIC), Madrid, Spain
- Aurora Martinez
- Department of Biomedicine, University of Bergen, Bergen, Norway
8. Barozet A, Chacón P, Cortés J. Current approaches to flexible loop modeling. Curr Res Struct Biol 2021; 3:187-191. [PMID: 34409304 PMCID: PMC8361254 DOI: 10.1016/j.crstbi.2021.07.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 09/30/2020] [Revised: 06/30/2021] [Accepted: 07/25/2021] [Indexed: 01/14/2023]
Abstract
Loops are key components of protein structures, involved in many biological functions. Due to their conformational variability, the structural investigation of loops is difficult and requires a combination of experimental and computational methods. This paper provides a brief overview of current computational approaches to flexible loop modeling, and presents the main ingredients of the most standard protocols. Despite great progress in recent years, accurately modeling the conformational variability of long flexible loops remains a challenging problem. Future advances in this field will likely come from a tight coupling of experimental and computational techniques, which would enable a better understanding of the relationships between loop sequence, structural flexibility, and functional roles. Ultimately, accurate loop modeling will open the road to loop design problems of interest for applications in biomedicine and biotechnology.
Affiliation(s)
- Amélie Barozet
- LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
- Pablo Chacón
- Department of Biological Physical Chemistry, Rocasolano Physical Chemistry Institute C.S.I.C., Madrid, Spain
- Juan Cortés
- LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
9. Kadukova M, Machado KDS, Chacón P, Grudinin S. KORP-PL: a coarse-grained knowledge-based scoring function for protein-ligand interactions. Bioinformatics 2021; 37:943-950. [PMID: 32840574 DOI: 10.1093/bioinformatics/btaa748] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 03/23/2020] [Revised: 07/27/2020] [Accepted: 08/18/2020] [Indexed: 11/12/2022]
Abstract
MOTIVATION Despite the progress made in studying protein-ligand interactions and the widespread application of docking and affinity prediction tools, improving their precision and efficiency remains a challenge. Computational approaches based on scoring docking conformations with statistical potentials constitute a popular alternative to more accurate but costly physics-based thermodynamic sampling methods. In this context, a minimalist and fast sidechain-free knowledge-based potential with high docking and screening power can be very useful when screening a large number of putative docking conformations. RESULTS Here, we present a novel coarse-grained potential defined by a 3D joint probability distribution function that depends only on the pairwise orientation and position between protein backbone and ligand atoms. Despite its extreme simplicity, our approach yields results very competitive with state-of-the-art scoring functions, especially in docking and screening tasks. For example, we observed a twofold improvement in the median 5% enrichment factor on the DUD-E benchmark compared to AutoDock Vina results. Moreover, our results prove that a coarse sidechain-free potential is sufficient for very successful docking pose prediction. AVAILABILITY AND IMPLEMENTATION The standalone version of KORP-PL, with the corresponding tests and benchmarks, is available at https://team.inria.fr/nano-d/korp-pl/ and https://chaconlab.org/modeling/korp-pl. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
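The 5% enrichment factor (EF) quoted above measures how much more often actives appear in the top 5% of a ranked library than they would under random selection: EF = (actives in top slice / slice size) / (all actives / library size). A minimal sketch, assuming lower scores rank better (as with most docking scores) and rounding the slice to the nearest whole compound; benchmark suites differ in these conventions, and the function name is hypothetical.

```python
from typing import Sequence


def enrichment_factor(scores: Sequence[float], is_active: Sequence[bool], fraction: float = 0.05) -> float:
    """Enrichment factor at the top `fraction` of a ranked library.

    Lower scores are treated as better. The slice size is rounded to the
    nearest whole compound (at least one).
    """
    n = len(scores)
    n_top = max(1, round(fraction * n))
    ranked = sorted(range(n), key=lambda i: scores[i])
    actives_top = sum(1 for i in ranked[:n_top] if is_active[i])
    actives_all = sum(1 for a in is_active if a)
    if actives_all == 0:
        raise ValueError("library contains no actives")
    return (actives_top / n_top) / (actives_all / n)


# Toy library: 100 compounds, the 5 actives receive the 5 best scores.
scores = [float(i) for i in range(100)]
is_active = [i < 5 for i in range(100)]
print(round(enrichment_factor(scores, is_active), 6))  # 20.0
```

With 100 compounds and 5 actives, ranking every active first gives the maximum EF of 20 at 5%, while random ranking gives an EF of 1 on average.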
Affiliation(s)
- Maria Kadukova
- Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000 Grenoble, France
- Research Center for Molecular Mechanisms of Aging and Age-Related Diseases, Moscow Institute of Physics and Technology, 141701 Dolgoprudniy, Russia
- Karina Dos Santos Machado
- Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000 Grenoble, France
- Computational Biology Laboratory, Centro de Ciências Computacionais, Universidade Federal do Rio Grande - FURG, Rio Grande, RS 96201-090, Brazil
- Pablo Chacón
- Department of Biological Physical Chemistry, Rocasolano Institute of Physical Chemistry C.S.I.C, Madrid 28006, Spain
- Sergei Grudinin
- Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LJK, 38000 Grenoble, France
10. Robustification of RosettaAntibody and Rosetta SnugDock. PLoS One 2021; 16:e0234282. [PMID: 33764990 PMCID: PMC7993800 DOI: 10.1371/journal.pone.0234282] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/18/2020] [Accepted: 01/11/2021] [Indexed: 11/19/2022]
Abstract
In recent years, the observed antibody sequence space has grown exponentially due to advances in high-throughput sequencing of immune receptors. The rise in sequences has not been mirrored by a rise in structures, as experimental structure determination techniques have remained low-throughput. Computational modeling, however, has the potential to close the sequence–structure gap. To achieve this goal, computational methods must be robust, fast, easy to use, and accurate. Here we report on the latest advances made in RosettaAntibody and Rosetta SnugDock—methods for antibody structure prediction and antibody–antigen docking. We simplified the user interface, expanded and automated the template database, generalized the kinematics of antibody–antigen docking (which enabled modeling of single-domain antibodies) and incorporated new loop modeling techniques. To evaluate the effects of our updates on modeling accuracy, we developed rigorous tests under a new scientific benchmarking framework within Rosetta. Benchmarking revealed that more structurally similar templates could be identified in the updated database and that SnugDock broadened its applicability without losing accuracy. However, there are further advances to be made, including increasing the accuracy and speed of CDR-H3 loop modeling, before computational approaches can accurately model any antibody.
11. Ruffolo JA, Guerra C, Mahajan SP, Sulam J, Gray JJ. Geometric potentials from deep learning improve prediction of CDR H3 loop structures. Bioinformatics 2021; 36:i268-i275. [PMID: 32657412 PMCID: PMC7355305 DOI: 10.1093/bioinformatics/btaa457] [Citation(s) in RCA: 49] [Impact Index Per Article: 12.3] [Indexed: 12/02/2022]
Abstract
Motivation Antibody structure is largely conserved, except for a complementarity-determining region featuring six variable loops. Five of these loops adopt canonical folds which can typically be predicted with existing methods, while the remaining loop (CDR H3) remains a challenge due to its highly diverse set of observed conformations. In recent years, deep neural networks have proven to be effective at capturing the complex patterns of protein structure. This work proposes DeepH3, a deep residual neural network that learns to predict inter-residue distances and orientations from antibody heavy and light chain sequence. The output of DeepH3 is a set of probability distributions over distances and orientation angles between pairs of residues. These distributions are converted to geometric potentials and used to discriminate between decoy structures produced by RosettaAntibody and predict new CDR H3 loop structures de novo. Results When evaluated on the Rosetta antibody benchmark dataset of 49 targets, DeepH3-predicted potentials identified better, same and worse structures [measured by root-mean-squared distance (RMSD) from the experimental CDR H3 loop structure] than the standard Rosetta energy function for 33, 6 and 10 targets, respectively, and improved the average RMSD of predictions by 32.1% (1.4 Å). Analysis of individual geometric potentials revealed that inter-residue orientations were more effective than inter-residue distances for discriminating near-native CDR H3 loops. When applied to de novo prediction of CDR H3 loop structures, DeepH3 achieves an average RMSD of 2.2 ± 1.1 Å on the Rosetta antibody benchmark. Availability and Implementation DeepH3 source code and pre-trained model parameters are freely available at https://github.com/Graylab/deepH3-distances-orientations. Supplementary information Supplementary data are available at Bioinformatics online.
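DeepH3 turns predicted probability distributions into geometric potentials that rank RosettaAntibody decoys. A minimal sketch of the distance part of that idea follows, assuming each residue pair has a predicted histogram over fixed distance bins and that the potential is the negative log-probability of the bin into which a decoy's distance falls. The bin layout, the `eps` floor, and the omission of the orientation terms (which the abstract reports were more informative) are simplifications, not the published DeepH3 procedure; all names are hypothetical.

```python
import bisect
import math
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[int, int]  # (residue i, residue j) indices


def bin_index(d: float, edges: Sequence[float]) -> int:
    """Return the bin containing distance d; edges are ascending bin boundaries.

    Distances outside [edges[0], edges[-1]) are clamped into the first or last bin.
    """
    i = bisect.bisect_right(edges, d) - 1
    return min(max(i, 0), len(edges) - 2)


def distance_potential_score(
    decoy_distances: Dict[Pair, float],
    predicted_probs: Dict[Pair, List[float]],
    edges: Sequence[float],
    eps: float = 1e-6,
) -> float:
    """Sum over residue pairs of -ln p(bin of the decoy's distance); lower is better.

    `predicted_probs[pair]` holds one probability per bin (len(edges) - 1 entries);
    `eps` avoids taking log(0) for empty bins.
    """
    score = 0.0
    for pair, d in decoy_distances.items():
        p = predicted_probs[pair][bin_index(d, edges)]
        score += -math.log(p + eps)
    return score


edges = [0.0, 4.0, 8.0, 12.0]  # three hypothetical distance bins, in Å
probs = {(0, 5): [0.1, 0.8, 0.1]}  # toy predicted histogram for one residue pair
near = distance_potential_score({(0, 5): 6.0}, probs, edges)
far = distance_potential_score({(0, 5): 10.0}, probs, edges)
print(near < far)  # True
```

Decoys would then be ranked by this score, lowest first.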
Affiliation(s)
- Jeffrey A Ruffolo
- Program in Molecular Biophysics, The Johns Hopkins University, Baltimore, MD 21218, USA
- Carlos Guerra
- Department of Computer Science, George Mason University, Fairfax, VA 22030, USA
- Sai Pooja Mahajan
- Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, MD 21218, USA
- Jeremias Sulam
- Department of Biomedical Engineering, The Johns Hopkins University, Baltimore, MD 21218, USA
- Mathematical Institute for Data Science, The Johns Hopkins University, Baltimore, MD 21218, USA
- Jeffrey J Gray
- Program in Molecular Biophysics, The Johns Hopkins University, Baltimore, MD 21218, USA
- Department of Chemical and Biomolecular Engineering, The Johns Hopkins University, Baltimore, MD 21218, USA
12. Aguirre-Plans J, Meseguer A, Molina-Fernandez R, Marín-López MA, Jumde G, Casanova K, Bonet J, Fornes O, Fernandez-Fuentes N, Oliva B. SPServer: split-statistical potentials for the analysis of protein structures and protein-protein interactions. BMC Bioinformatics 2021; 22:4. [PMID: 33407073 PMCID: PMC7788957 DOI: 10.1186/s12859-020-03770-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 04/30/2020] [Accepted: 09/20/2020] [Indexed: 12/13/2022]
Abstract
BACKGROUND Statistical potentials, also named knowledge-based potentials, are scoring functions derived from empirical data that can be used to evaluate the quality of protein folds and protein-protein interaction (PPI) structures. In previous works we decomposed the statistical potentials into different terms, named Split-Statistical Potentials, accounting for the type of amino acid pairs, their hydrophobicity, solvent accessibility and type of secondary structure. These potentials have been successfully used to identify near-native structures in protein structure prediction, rank protein docking poses, and predict PPI binding affinities. RESULTS Here, we present the SPServer, a web server that applies the Split-Statistical Potentials to analyze protein folds and protein interfaces. SPServer provides global scores as well as residue/residue-pair profiles presented as score plots and maps. This level of detail allows users to: (1) identify potentially problematic regions on protein structures; (2) identify disrupting amino acid pairs in protein interfaces; and (3) compare and analyze the quality of tertiary and quaternary structural models. CONCLUSIONS While there are many web servers that provide scoring functions to assess the quality of either protein folds or PPI structures, SPServer integrates both aspects in a unique easy-to-use web server. Moreover, the server allows users to assess the quality of structures and interfaces locally, at the residue level, and provides tools to compare the local assessment between structures. SERVER ADDRESS: https://sbi.upf.edu/spserver/ .
Grants
- BIO2017-85329-R (FEDER, UE) Ministerio de Economía, Industria y Competitividad, Gobierno de España
- BIO2017-83591-R (FEDER, UE) Ministerio de Economía, Industria y Competitividad, Gobierno de España
- RYC-2015-17519 Ministerio de Economía, Industria y Competitividad, Gobierno de España
- MDM-2014-0370 Ministerio de Economía, Industria y Competitividad, Gobierno de España
- FI Agència de Gestió d'Ajuts Universitaris i de Recerca
- 2017 SGR 01020 Agència de Gestió d'Ajuts Universitaris i de Recerca
- PT13/0001/0023 Instituto de Salud Carlos III
- Agència de Gestió d’Ajuts Universitaris i de Recerca
Affiliation(s)
- Joaquim Aguirre-Plans
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain
- Alberto Meseguer
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain
- Ruben Molina-Fernandez
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain
- Manuel Alejandro Marín-López
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain
- Gaurav Jumde
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain
- Kevin Casanova
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain
- Jaume Bonet
- Laboratory of Protein Design and Immuno-Engineering, School of Engineering, Ecole Polytechnique Federale de Lausanne, 1015, Lausanne, Vaud, Switzerland
- Oriol Fornes
- Centre for Molecular Medicine and Therapeutics, Department of Medical Genetics, BC Children's Hospital Research Institute, University of British Columbia, Vancouver, BC, V5Z 4H4, Canada
- Narcis Fernandez-Fuentes
- Department of Biosciences, U Science Tech, Universitat de Vic-Universitat Central de Catalunya, Vic 08500, Barcelona, Catalonia, Spain
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, SY23 3EB, UK
- Baldo Oliva
- Structural Bioinformatics Lab, Department of Experimental and Health Science, Universitat Pompeu Fabra, 08003, Barcelona, Catalonia, Spain
13
Barozet A, Bianciotto M, Vaisset M, Siméon T, Minoux H, Cortés J. Protein loops with multiple meta-stable conformations: A challenge for sampling and scoring methods. Proteins 2020; 89:218-231. [PMID: 32920900 DOI: 10.1002/prot.26008] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/29/2019] [Revised: 08/10/2020] [Accepted: 08/25/2020] [Indexed: 12/25/2022]
Abstract
Flexible regions in proteins, such as loops, cannot be represented by a single conformation. Instead, conformational ensembles are needed to provide a more global picture. In this context, identifying statistically meaningful conformations within an ensemble generated by loop sampling techniques remains an open problem. The difficulty stems primarily from the lack of structural data about these flexible regions: most structural data come from X-ray crystallography, which largely ignores plasticity, so designing and evaluating loop scoring methods is challenging. In this work, we compare the performance of various scoring methods on a set of eight protein loops that are known to be flexible. The ability of each method to identify and select all of the known conformations is assessed, and the underlying energy landscapes are produced and projected to visualize the qualitative differences between the methods. Statistical potentials are found to be considerably reliable despite being designed to trade off accuracy for lower computational cost. On a large pool of loop models, they can filter out statistically improbable states while retaining those that resemble known (and thus likely) conformations. However, computationally expensive methods are still required for more precise assessment and structural refinement. The results also highlight the importance of employing several scaffolds for the protein, because small structural rearrangements in the rest of the protein strongly influence the modeled energy landscape of the loop.
Affiliation(s)
- Amélie Barozet
- LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
- Sanofi Recherche & Développement, Integrated Drug Discovery, Molecular Design Sciences, Vitry-sur-Seine, France
- Marc Bianciotto
- Sanofi Recherche & Développement, Integrated Drug Discovery, Molecular Design Sciences, Vitry-sur-Seine, France
- Marc Vaisset
- LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
- Thierry Siméon
- LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
- Hervé Minoux
- Sanofi Recherche & Développement, Integrated Drug Discovery, Molecular Design Sciences, Vitry-sur-Seine, France
- Juan Cortés
- LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
14
Postic G, Janel N, Tufféry P, Moroy G. An information gain-based approach for evaluating protein structure models. Comput Struct Biotechnol J 2020; 18:2228-2236. [PMID: 32837711 PMCID: PMC7431362 DOI: 10.1016/j.csbj.2020.08.013] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/01/2020] [Revised: 08/06/2020] [Accepted: 08/07/2020] [Indexed: 12/23/2022]
Abstract
For three decades now, knowledge-based scoring functions that operate through the "potential of mean force" (PMF) approach have continuously proven useful for studying protein structures. Although these statistical potentials are not to be confused with their physics-based namesakes, i.e., PMFs obtained by molecular dynamics simulations, their particular success in assessing the native-like character of protein structure predictions has led authors to consider the computed scores as approximations of the free energy. However, this physical justification has been controversial from the beginning. Alternative interpretations based on Bayes' theorem have been proposed, but the misleading formalism that invokes the inverse Boltzmann law remains recurrent in the literature. In this article, we present a conceptually new method for ranking protein structure models by quality, which is (i) independent of any physics-based explanation and (ii) relevant to statistics and to a general definition of information gain. The theoretical development described in this study provides new insights into how statistical PMFs work, in comparison with our approach. As a proof of concept, we built interatomic distance-dependent scoring functions based on both the former and the new equations, and compared their performance on an independent benchmark of 60,000 protein structures. The results demonstrate that our new formalism outperforms statistical PMFs in evaluating the quality of protein structural decoys. Therefore, this original type of score offers a way to improve on the success of statistical PMFs in the various fields of structural biology where they are applied. The open-source code is available for download at https://gitlab.rpbs.univ-paris-diderot.fr/src/ig-score.
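For readers unfamiliar with the "former" equations, the sketch below computes a classic distance-dependent statistical PMF, E(r) = -kT ln(P_obs(r) / P_ref(r)), from binned distance counts, together with the Kullback-Leibler divergence, the standard definition of the information gain of P_obs relative to P_ref. It does not reproduce the authors' new score; the histograms, the binning and the function names are assumptions made for illustration. It does show one link between the two quantities: averaged over P_obs, the PMF (in kT units) equals minus the information gain.

```python
import math


def pmf_from_histograms(obs_counts, ref_counts, kt=1.0):
    """Classic statistical PMF per distance bin: E(r) = -kT ln(P_obs(r) / P_ref(r)).

    Bins with no observed or reference counts get 0.0 (a crude cap; real
    potentials usually smooth or add pseudocounts instead).
    """
    n_obs, n_ref = sum(obs_counts), sum(ref_counts)
    energies = []
    for o, r in zip(obs_counts, ref_counts):
        if o == 0 or r == 0:
            energies.append(0.0)
        else:
            energies.append(-kt * math.log((o / n_obs) / (r / n_ref)))
    return energies


def information_gain(obs_counts, ref_counts):
    """Kullback-Leibler divergence D(P_obs || P_ref), in nats."""
    n_obs, n_ref = sum(obs_counts), sum(ref_counts)
    total = 0.0
    for o, r in zip(obs_counts, ref_counts):
        if o == 0:
            continue  # 0 * log(0 / q) is taken as 0
        if r == 0:
            raise ValueError("reference bin is empty where observations exist")
        p, q = o / n_obs, r / n_ref
        total += p * math.log(p / q)
    return total


def score_model(pair_bins, energies):
    """Score a model by summing bin energies over its atom-pair distance bins."""
    return sum(energies[b] for b in pair_bins)


obs = [1, 3, 6]  # hypothetical observed counts in three distance bins
ref = [2, 4, 4]  # hypothetical reference-state counts
energies = pmf_from_histograms(obs, ref)
mean_energy = sum(o / sum(obs) * e for o, e in zip(obs, energies))
gain = information_gain(obs, ref)
```

Because that identity holds for any pair of histograms, a decoy score built from the PMF and one built from information gain are closely related on average; see the article and its repository for the authors' own formulation.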
Affiliation(s)
- Guillaume Postic
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
- Université de Paris, BFA, UMR 8251, CNRS, F-75013 Paris, France
- Institut Français de Bioinformatique (IFB), UMS 3601-CNRS, Université Paris-Saclay, Orsay, France
- Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
- Nathalie Janel
- Université de Paris, BFA, UMR 8251, CNRS, F-75013 Paris, France
- Pierre Tufféry
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
- Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
- Gautier Moroy
- Université de Paris, BFA, UMR 8251, CNRS, ERL U1133, Inserm, F-75013 Paris, France
15
Karami Y, Rey J, Postic G, Murail S, Tufféry P, de Vries SJ. DaReUS-Loop: a web server to model multiple loops in homology models. Nucleic Acids Res 2020; 47:W423-W428. [PMID: 31114872 PMCID: PMC6602439 DOI: 10.1093/nar/gkz403] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 03/06/2019] [Revised: 04/20/2019] [Accepted: 05/06/2019] [Indexed: 02/07/2023]
Abstract
Loop regions in protein structures often have crucial roles, and they are much more variable in sequence and structure than other regions. In homology modeling, this leads to larger deviations from the homologous templates, and loop modeling of homology models remains an open problem. To address this issue, we have previously developed the DaReUS-Loop protocol, leading to significant improvement over existing methods. Here, a DaReUS-Loop web server is presented, providing an automated platform for modeling or remodeling loops in the context of homology models. This is the first web server accepting a protein with up to 20 loop regions, and modeling them all in parallel. It also provides a prediction confidence level that corresponds to the expected accuracy of the loops. DaReUS-Loop facilitates the analysis of the results through its interactive graphical interface and is freely available at http://bioserv.rpbs.univ-paris-diderot.fr/services/DaReUS-Loop/.
Affiliation(s)
- Yasaman Karami
- Sorbonne Paris Cité, Université Paris Diderot, CNRS UMR 8251, INSERM ERL U1133, Paris, France
- Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
- Julien Rey
- Sorbonne Paris Cité, Université Paris Diderot, CNRS UMR 8251, INSERM ERL U1133, Paris, France
- Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
- Guillaume Postic
- Sorbonne Paris Cité, Université Paris Diderot, CNRS UMR 8251, INSERM ERL U1133, Paris, France
- Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
- Institut Français de Bioinformatique (IFB), UMS 3601-CNRS, Université Paris-Saclay, Orsay, France
- Samuel Murail
- Sorbonne Paris Cité, Université Paris Diderot, CNRS UMR 8251, INSERM ERL U1133, Paris, France
- Pierre Tufféry
- Sorbonne Paris Cité, Université Paris Diderot, CNRS UMR 8251, INSERM ERL U1133, Paris, France
- Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
- Sjoerd J de Vries
- Sorbonne Paris Cité, Université Paris Diderot, CNRS UMR 8251, INSERM ERL U1133, Paris, France
- Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
16
Xu G, Wang Q, Ma J. OPUS-Fold: An Open-Source Protein Folding Framework Based on Torsion-Angle Sampling. J Chem Theory Comput 2020; 16:3970-3976. [DOI: 10.1021/acs.jctc.0c00186] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 01/15/2023]
Affiliation(s)
- Gang Xu
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
- Qinghua Wang
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States
- Jianpeng Ma
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States
- Department of Bioengineering, Rice University, Houston, Texas 77005, United States
17
Liu S, Xiang X, Gao X, Liu H. Neighborhood Preference of Amino Acids in Protein Structures and its Applications in Protein Structure Assessment. Sci Rep 2020; 10:4371. [PMID: 32152349 PMCID: PMC7062742 DOI: 10.1038/s41598-020-61205-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/19/2019] [Accepted: 02/24/2020] [Indexed: 12/02/2022]
Abstract
Amino acids form protein 3D structures in unique ways such that the folded structure is stable and functional under physiological conditions. Non-specific, non-covalent interactions between amino acids exhibit neighborhood preferences. Based on structural information from the Protein Data Bank, a statistical energy function was derived to quantify amino acid neighborhood preferences. The neighborhood of an amino acid is defined by its contacting residues, and the energy function is determined by the neighboring residue types and their relative positions. The neighborhood preference of amino acids was exploited to facilitate structural quality assessment, which was implemented in the neighborhood preference program NEPRE. The source code is available at https://github.com/LiuLab-CSRC/NePre.
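A minimal sketch of the neighborhood idea follows, assuming that contacts are defined by a C-alpha distance cutoff (8 Å here, an arbitrary choice) and that a table of per-pair log-odds terms is already available. NEPRE's actual energy also depends on the relative positions of the neighbors, which this sketch ignores; `neighborhoods`, `neighborhood_energy` and the example values are hypothetical.

```python
import math
from collections import Counter


def neighborhoods(residues, cutoff=8.0):
    """For each residue, count the residue types within `cutoff` angstroms.

    residues: list of (aa_type, (x, y, z)), e.g. one C-alpha per residue.
    Returns a list of Counters aligned with `residues` (needs Python 3.8+
    for math.dist).
    """
    result = []
    for i, (_, xi) in enumerate(residues):
        result.append(Counter(
            aa for j, (aa, xj) in enumerate(residues)
            if j != i and math.dist(xi, xj) <= cutoff
        ))
    return result


def neighborhood_energy(residues, log_odds, cutoff=8.0):
    """Sum log_odds[(center, neighbor)] over every neighbor of every residue.

    log_odds is a hypothetical lookup table; missing pairs contribute 0.
    """
    total = 0.0
    for (center, _), counts in zip(residues, neighborhoods(residues, cutoff)):
        for neighbor, n in counts.items():
            total += n * log_odds.get((center, neighbor), 0.0)
    return total


model = [("LEU", (0.0, 0.0, 0.0)),
         ("VAL", (3.8, 0.0, 0.0)),
         ("LYS", (20.0, 0.0, 0.0))]
table = {("LEU", "VAL"): -0.5, ("VAL", "LEU"): -0.4}
```

The example table is asymmetric on purpose: a preference of a center residue for a given neighbor type need not equal the reverse preference.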
Affiliation(s)
- Siyuan Liu
- Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China
- School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Xilun Xiang
- Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China
- School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Xiang Gao
- Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China
- School of Software Engineering, University of Science and Technology of China, Hefei, Anhui, 230026, China
- Haiguang Liu
- Complex Systems Division, Beijing Computational Science Research Center, Haidian, Beijing, 100193, China
- Physics Department, Beijing Normal University, Haidian, Beijing, 100875, China
18
Xu G, Wang Q, Ma J. OPUS-Refine: A Fast Sampling-Based Framework for Refining Protein Backbone Torsion Angles and Global Conformation. J Chem Theory Comput 2020; 16:1359-1366. [DOI: 10.1021/acs.jctc.9b01054] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 12/22/2022]
Affiliation(s)
- Gang Xu
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
- Qinghua Wang
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States
- Jianpeng Ma
- Multiscale Research Institute of Complex Systems, Fudan University, Shanghai 200433, China
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, One Baylor Plaza, BCM-125, Houston, Texas 77030, United States
- Department of Bioengineering, Rice University, Houston, Texas 77005, United States
19
Serafimova K, Mihaylov I, Vassilev D, Avdjieva I, Zielenkiewicz P, Kaczanowski S. Using Machine Learning in Accuracy Assessment of Knowledge-Based Energy and Frequency Base Likelihood in Protein Structures. Lecture Notes in Computer Science 2020. [PMCID: PMC7304015 DOI: 10.1007/978-3-030-50420-5_43] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022]
Abstract
Many aspects of the study of protein folding and dynamics have been affected by the accumulation of data about native protein structures and by recent advances in machine learning. Computational methods for predicting protein structures from their sequences now rely heavily on machine learning tools and on approaches that extract knowledge and rules from data using probabilistic models. Many of these methods use scoring functions to determine which structure best fits a native protein sequence. Using computational approaches, we obtained two scoring functions, knowledge-based energy and likelihood of base frequency, and compared their accuracy in measuring sequence-structure fit. To validate our results, we compared the prediction accuracy of machine learning models based on knowledge-based energy values with that of models based on likelihood values, showing that likelihood is a more accurate scoring function than knowledge-based energy.
20
Mirzaie M. Identification of native protein structures captured by principal interactions. BMC Bioinformatics 2019; 20:604. [PMID: 31752663 PMCID: PMC6873546 DOI: 10.1186/s12859-019-3186-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2019] [Accepted: 11/01/2019] [Indexed: 11/20/2022]
Abstract
Background Evaluation of protein structure relies on a trustworthy potential function. The total potential of a protein structure is approximated as the sum of all pairwise interaction potentials. Knowledge-based potentials (KBPs) are one type of potential function, derived from experimentally determined protein structures. Although several KBP functions based on different methods have been introduced, the key interactions that capture the total potential have not yet been studied. Results In this study, we seek the interaction types that preserve as much of the total potential as possible. We employ a procedure based on principal component analysis (PCA) to extract the significant, key interactions in native protein structures. We call these interactions principal interactions and show that, in protein fold recognition, the results of a model that considers only these interactions are very close to those of the full interaction model that considers all interactions. In fact, the principal interactions maintain the discriminative power of the full interaction model. The method was evaluated on 3 KBPs with different contact definitions and distance thresholds, and revealed that their corresponding principal interactions are very similar and have much in common. Additionally, the principal interactions made up 20% of the full interactions on average, and they occur between residues that are considered important in protein folding. Conclusions This work shows that not all interaction types are equally important in discriminating native structures. The results of the reduced model based on principal interactions, which were very close to those of the full interaction model, suggest that a new strategy is needed to capture the role of the remaining (non-principal) interactions in order to improve the power of knowledge-based potential functions.
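The comparison between the full and the reduced model can be sketched as follows, assuming that the per-interaction-type energies and the set of principal interactions are already known (the PCA step that selects them is not reproduced here). Each structure is represented simply as the list of interaction types of its contacts, and fold recognition succeeds when the native structure scores lower than every decoy. All names and numbers are hypothetical.

```python
def total_energy(contact_types, potential, keep=None):
    """Sum pairwise energies over a structure's contacts.

    If `keep` is given, only interaction types in that set contribute,
    which turns the full model into a reduced (principal-interaction) model.
    """
    return sum(potential[t] for t in contact_types
               if keep is None or t in keep)


def recognizes_native(native, decoys, potential, keep=None):
    """True if the native structure scores strictly lower than every decoy."""
    e_native = total_energy(native, potential, keep)
    return all(e_native < total_energy(d, potential, keep) for d in decoys)


potential = {"LEU-VAL": -1.0, "ILE-PHE": -0.8, "GLU-LYS": -0.3,
             "SER-THR": 0.1, "GLY-PRO": 0.2}
principal = {"LEU-VAL", "ILE-PHE"}  # hypothetical principal interactions

native = ["LEU-VAL"] * 3 + ["ILE-PHE"] * 2 + ["SER-THR"]
decoys = [
    ["LEU-VAL", "ILE-PHE", "GLU-LYS", "GLU-LYS", "SER-THR", "GLY-PRO"],
    ["LEU-VAL", "LEU-VAL", "GLY-PRO", "GLY-PRO", "SER-THR"],
]
full_ok = recognizes_native(native, decoys, potential)
reduced_ok = recognizes_native(native, decoys, potential, keep=principal)
```

In this toy set both models rank the native structure first (full model: -4.5 versus -2.1 and -1.5; reduced model: -4.6 versus -1.8 and -2.0), which is the kind of agreement the abstract reports at scale.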
Collapse
Affiliation(s)
- Mehdi Mirzaie
- Department of Applied Mathematics, Faculty of Mathematical Sciences, Tarbiat Modares University, Jalal Ale Ahmad Highway, P.O.Box: 14115-134, Tehran, Iran.
21
Discrimination power of knowledge-based potential dictated by the dominant energies in native protein structures. Amino Acids 2019; 51:1029-1038. [PMID: 31098784 DOI: 10.1007/s00726-019-02743-0] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/20/2018] [Accepted: 05/08/2019] [Indexed: 01/20/2023]
Abstract
Extracting a well-designed energy function is important for protein structure evaluation. Knowledge-based potential functions are one type of energy function, obtained from known protein structures. The pairwise potential between atom types is approximated using Boltzmann's law, which relates the frequency of atom types to their potential. The total energy is approximated as the sum of the pairwise potentials over all atom pairs. In the present study, the performance of a knowledge-based potential function was assessed based on the strength of interaction between groups of amino acids. The dominant energies involved in the pairwise potentials were revealed by eigenvalue analysis of a matrix whose elements represent the energies between amino acids. For this purpose, a matrix containing the mean energies of the residue-residue interaction types was constructed from 500 native protein structures. The matrix has a dominant eigenvalue, and LEU, VAL, ILE, PHE, TYR, ALA and TRP have high values along the dominant eigenvector. The results show that this ranking of amino acids is consistent with the power of amino acids to discriminate native structures in a K-alphabet reduced model. In such a reduced model, only a subset of the 20 amino acids, along with their interactions, is considered when assessing the energy; the reduced structures are constructed from only K amino acid types. The K-alphabet reduced model built from the first K amino acids in the list [LEU, VAL, PHE, ILE, TYR, ALA, TRP] discriminates native structures best among all possible K-alphabet reduced models. Knowledge-based potentials might be improved with a new strategy.
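The eigenvalue analysis can be illustrated with power iteration on a small symmetric matrix of mean residue-residue energies. This is a sketch under assumptions: the 3×3 matrix is invented (the study used all 20 amino acids and 500 structures), power iteration assumes the dominant eigenvalue is strictly larger in magnitude than the others, and ranking by the sign-normalized eigenvector components is one reasonable reading of "high values along the dominant eigenvector". The hypothetical `k_alphabet` helper keeps the first K ranked residue types, mirroring the K-alphabet idea.

```python
import math


def dominant_eigenpair(matrix, iterations=500):
    """Power iteration for the largest-magnitude eigenvalue of a square matrix.

    Assumes that eigenvalue is strictly dominant in magnitude.
    Returns (eigenvalue, unit eigenvector), with the vector's sign chosen so
    that its largest-magnitude component is positive.
    """
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("start vector was mapped to zero")
        v = [x / norm for x in w]
    # Rayleigh quotient of a unit vector gives the eigenvalue, sign included.
    mv = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    if max(v, key=abs) < 0:
        v = [-x for x in v]
    return eigenvalue, v


def rank_residues(names, matrix):
    """Order residue names by their component in the dominant eigenvector."""
    _, vec = dominant_eigenpair(matrix)
    return [name for _, name in sorted(zip(vec, names), reverse=True)]


def k_alphabet(ranking, k):
    """Residue types kept by a K-alphabet reduced model built from a ranking."""
    return set(ranking[:k])


names = ["SER", "VAL", "LEU"]
mean_energy = [[-0.05, -0.10, -0.20],  # hypothetical mean pair energies
               [-0.10, -1.20, -1.50],
               [-0.20, -1.50, -2.00]]
lam, _ = dominant_eigenpair(mean_energy)
ranking = rank_residues(names, mean_energy)
```

Because favorable contacts have negative energies, the dominant eigenvalue of such a matrix is negative (about -3.17 here), which is why the Rayleigh quotient, rather than the vector norm, is used to recover it with its sign.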