1
H S S, V G, T M N, Setlur AS, K C, Kumar J, Niranjan V. Comprehending interaction mechanism of natural actives of Colchicum autumnale L. for rheumatoid arthritis using integrative chemoinformatic approaches. J Biomol Struct Dyn 2023:1-20. [PMID: 38116745 DOI: 10.1080/07391102.2023.2294177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/10/2023] [Accepted: 12/01/2023] [Indexed: 12/21/2023]
Abstract
This research examines the therapeutic potential of natural compounds derived from Colchicum autumnale L., taking a holistic perspective on medications used in human therapy. Rather than confining the study to their primary actions, the research seeks molecular targets for these natural compounds, with a specific focus on their potential applicability in the treatment of rheumatoid arthritis (RA). The study focuses on understanding interactions between specific natural actives that target RA. Fifteen RA target proteins were identified from OMIM, GeneScan and PharmaGKB. Their structures were downloaded from RCSB PDB. Two active components of C. autumnale L. were chosen for mass spectrometry investigation. Ligand characteristics were determined using the ADMETlab and SwissADME software tools. Molecular docking was performed, and the top three complexes were simulated for 200 ns, along with identification of free binding energies. The complexes β-sitosterol-IL-10 (-6.50 kcal/mol), colchicine-IL-10 (-6.01 kcal/mol) and linoleic acid-IL-10 (-7.22 kcal/mol) exhibited the best binding energies. β-Sitosterol and colchicine showed the highest stability in simulations, confirmed by molecular mechanics free energy binding calculations. This work provides insights into the molecular interaction of natural compounds against RA targets, offering potential therapeutic anti-RA medications.
Affiliation(s)
- Sowmya H S: Bangalore Bio-innovation Centre (BBC), Helix Biotech Park, Electronic City Phase-I, Bangalore, Karnataka, India
- Guruprasad V: Homeopathic Medical College and Hospital Bangalore, Bangalore, Karnataka, India
- Ningaraju T M: University of Agricultural Science Bangalore, Bangalore, Karnataka, India
- Anagha S Setlur: Department of Biotechnology, RV College of Engineering, Bangalore, Karnataka, India
- Chandrashekar K: Department of Biotechnology, RV College of Engineering, Bangalore, Karnataka, India
- Jitendra Kumar: Biotechnology Industry Research Assistance Council (BIRAC), CGO Complex, Lodhi Road, New Delhi, India
- Vidya Niranjan: Department of Biotechnology, RV College of Engineering, Bangalore, Karnataka, India
2
Abstract
Genome sequencing projects have resulted in a rapid increase in the number of known protein sequences. In contrast, only about one-hundredth of these sequences have been characterized at atomic resolution using experimental structure determination methods. Computational protein structure modeling techniques have the potential to bridge this sequence-structure gap. In the following chapter, we present an example that illustrates the use of MODELLER to construct a comparative model for a protein with unknown structure. Automation of a similar protocol has resulted in models of useful accuracy for domains in more than half of all known protein sequences.
3
Karami Y, Rey J, Postic G, Murail S, Tufféry P, de Vries SJ. DaReUS-Loop: a web server to model multiple loops in homology models. Nucleic Acids Res 2020; 47:W423-W428. [PMID: 31114872 PMCID: PMC6602439 DOI: 10.1093/nar/gkz403] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 03/06/2019] [Revised: 04/20/2019] [Accepted: 05/06/2019] [Indexed: 02/07/2023]
Abstract
Loop regions in protein structures often have crucial roles, and they are much more variable in sequence and structure than other regions. In homology modeling, this leads to larger deviations from the homologous templates, and loop modeling of homology models remains an open problem. To address this issue, we have previously developed the DaReUS-Loop protocol, leading to significant improvement over existing methods. Here, a DaReUS-Loop web server is presented, providing an automated platform for modeling or remodeling loops in the context of homology models. This is the first web server accepting a protein with up to 20 loop regions, and modeling them all in parallel. It also provides a prediction confidence level that corresponds to the expected accuracy of the loops. DaReUS-Loop facilitates the analysis of the results through its interactive graphical interface and is freely available at http://bioserv.rpbs.univ-paris-diderot.fr/services/DaReUS-Loop/.
Affiliation(s)
- Yasaman Karami: Sorbonne Paris Cité, Université Paris Diderot, CNRS UMR 8251, INSERM ERL U1133, Paris, France; Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
- Julien Rey: Sorbonne Paris Cité, Université Paris Diderot, CNRS UMR 8251, INSERM ERL U1133, Paris, France; Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
- Guillaume Postic: Sorbonne Paris Cité, Université Paris Diderot, CNRS UMR 8251, INSERM ERL U1133, Paris, France; Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France; Institut Français de Bioinformatique (IFB), UMS 3601-CNRS, Université Paris-Saclay, Orsay, France
- Samuel Murail: Sorbonne Paris Cité, Université Paris Diderot, CNRS UMR 8251, INSERM ERL U1133, Paris, France
- Pierre Tufféry: Sorbonne Paris Cité, Université Paris Diderot, CNRS UMR 8251, INSERM ERL U1133, Paris, France; Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
- Sjoerd J de Vries: Sorbonne Paris Cité, Université Paris Diderot, CNRS UMR 8251, INSERM ERL U1133, Paris, France; Ressource Parisienne en Bioinformatique Structurale (RPBS), Paris, France
4
Investigation of machine learning techniques on proteomics: A comprehensive survey. Progress in Biophysics and Molecular Biology 2019; 149:54-69. [PMID: 31568792 DOI: 10.1016/j.pbiomolbio.2019.09.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 06/05/2019] [Revised: 09/16/2019] [Accepted: 09/23/2019] [Indexed: 11/21/2022]
Abstract
Proteomics is the large-scale study of proteins, which has enabled the identification of ever-increasing numbers of proteins. Proteins are an essential part of living organisms and have numerous functions. The proteome is the complete set of proteins produced or modified by an organism or system. The proteome varies with time and with the distinct requirements, or stresses, that a cell or organism experiences. Proteomics is an interdisciplinary field that has drawn on the genetic information from various genome projects. Much proteomics data is gathered with high-throughput techniques such as mass spectrometry and microarrays. Analyzing these data and performing comparisons by hand would typically take weeks or months. Biologists and chemists are therefore collaborating with computer scientists and mathematicians to build programs and pipelines that analyze protein data computationally. Using bioinformatics techniques, researchers can analyze and store protein data faster. This paper gives a brief review of machine learning techniques and their application in proteomics.
5
Karami Y, Guyon F, De Vries S, Tufféry P. DaReUS-Loop: accurate loop modeling using fragments from remote or unrelated proteins. Sci Rep 2018; 8:13673. [PMID: 30209260 PMCID: PMC6135855 DOI: 10.1038/s41598-018-32079-w] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 06/20/2018] [Accepted: 08/31/2018] [Indexed: 11/08/2022]
Abstract
Despite efforts during the past decades, loop modeling remains a difficult part of protein structure modeling. Several approaches have been developed in the framework of crystal structures. However, for homology models, the modeling of loops is still far from being solved. We propose DaReUS-Loop, a data-based approach that identifies loop candidates mining the complete set of experimental structures available in the Protein Data Bank. Candidate filtering relies on local conformation profile-profile comparison, together with physico-chemical scoring. Applied to three different template-based test sets, DaReUS-Loop shows significant increase in the number of high-accuracy loops, and significant enhancement for modeling long loops. A special advantage is that our method proposes a prediction confidence score that correlates well with the expected accuracy of the loops. Strikingly, over 50% of successful loop models are derived from unrelated proteins, indicating that fragments under similar constraints tend to adopt similar structure, beyond mere homology.
Affiliation(s)
- Yasaman Karami: Molécules Thérapeutiques in silico, UMR-S973, Institut National de la Santé et de la Recherche Médicale (INSERM), Université Paris Diderot, Sorbonne Paris Cité, RPBS, 75013, Paris, France
- Frédéric Guyon: Molécules Thérapeutiques in silico, UMR-S973, Institut National de la Santé et de la Recherche Médicale (INSERM), Université Paris Diderot, Sorbonne Paris Cité, RPBS, 75013, Paris, France
- Sjoerd De Vries: Molécules Thérapeutiques in silico, UMR-S973, Institut National de la Santé et de la Recherche Médicale (INSERM), Université Paris Diderot, Sorbonne Paris Cité, RPBS, 75013, Paris, France
- Pierre Tufféry: Molécules Thérapeutiques in silico, UMR-S973, Institut National de la Santé et de la Recherche Médicale (INSERM), Université Paris Diderot, Sorbonne Paris Cité, RPBS, 75013, Paris, France
6
Won J, Lee GR, Park H, Seok C. GalaxyGPCRloop: Template-Based and Ab Initio Structure Sampling of the Extracellular Loops of G-Protein-Coupled Receptors. J Chem Inf Model 2018; 58:1234-1243. [DOI: 10.1021/acs.jcim.8b00148] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 02/06/2023]
Affiliation(s)
- Jonghun Won: Department of Chemistry, Seoul National University, Seoul 08826, Republic of Korea
- Gyu Rie Lee: Department of Chemistry, Seoul National University, Seoul 08826, Republic of Korea
- Hahnbeom Park: Department of Chemistry, Seoul National University, Seoul 08826, Republic of Korea
- Chaok Seok: Department of Chemistry, Seoul National University, Seoul 08826, Republic of Korea
7
Bansal N, Zheng Z, Song LF, Pei J, Merz KM. The Role of the Active Site Flap in Streptavidin/Biotin Complex Formation. J Am Chem Soc 2018; 140:5434-5446. [PMID: 29607642 DOI: 10.1021/jacs.8b00743] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Indexed: 12/20/2022]
Abstract
Obtaining a detailed description of how active site flap motion affects substrate or ligand binding will advance structure-based drug design (SBDD) efforts on systems including the kinases, HSP90, HIV protease, ureases, etc. Through this understanding, we will be able to design better inhibitors and better proteins that have desired functions. Herein we address this issue by generating the relevant configurational states of a protein flap on the molecular energy landscape using an approach we call MTFlex-b and then following this with a procedure to estimate the free energy associated with the motion of the flap region. To illustrate our overall workflow, we explored the free energy changes in the streptavidin/biotin system upon introducing conformational flexibility in loop3-4 in the biotin unbound (apo) and bound (holo) state. The free energy surfaces were created using the Movable Type free energy method, and for further validation, we compared them to potential of mean force (PMF) generated free energy surfaces using MD simulations employing the FF99SBILDN and FF14SB force fields. We also estimated the free energy thermodynamic cycle using an ensemble of closed-like and open-like end states for the ligand unbound and bound states and estimated the binding free energy to be approximately -16.2 kcal/mol (experimental -18.3 kcal/mol). The good agreement between MTFlex-b in combination with the MT method with experiment and MD simulations supports the effectiveness of our strategy in obtaining unique insights into the motions in proteins that can then be used in a range of biological and biomedical applications.
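For orientation, a binding free energy can be related to a dissociation constant through the standard relation ΔG = RT ln Kd (1 M standard state). The sketch below applies that relation to the two values quoted in the abstract above. It is not part of the MTFlex-b or Movable Type workflow; the temperature of 298.15 K and the function name are assumptions made purely for illustration.

```python
import math

# Gas constant in kcal/(mol*K).
GAS_CONSTANT_KCAL = 0.001987204


def dissociation_constant(delta_g_kcal_per_mol, temperature_k=298.15):
    """Convert a binding free energy (kcal/mol) to Kd in mol/L.

    Uses delta_G = R * T * ln(Kd) with a 1 M standard state.
    The default temperature is an assumed value, not one taken
    from the abstract.
    """
    return math.exp(delta_g_kcal_per_mol / (GAS_CONSTANT_KCAL * temperature_k))


if __name__ == "__main__":
    # Values quoted in the abstract: computed and experimental estimates.
    for label, delta_g in (("computed", -16.2), ("experimental", -18.3)):
        print(f"{label}: dG = {delta_g} kcal/mol, Kd = {dissociation_constant(delta_g):.2e} M")
```

Because Kd depends exponentially on ΔG, the roughly 2 kcal/mol gap between the two estimates corresponds to a difference of more than an order of magnitude in Kd at the assumed temperature.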
Affiliation(s)
- Nupur Bansal: Department of Chemistry and Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Zheng Zheng: Department of Chemistry and Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Lin Frank Song: Department of Chemistry and Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Jun Pei: Department of Chemistry and Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States
- Kenneth M Merz: Department of Chemistry and Department of Biochemistry and Molecular Biology, Michigan State University, 578 South Shaw Lane, East Lansing, Michigan 48824, United States; Institute for Cyber Enabled Research, Michigan State University, 567 Wilson Road, East Lansing, Michigan 48824, United States
8
Abstract
Genome sequencing projects have resulted in a rapid increase in the number of known protein sequences. In contrast, only about one-hundredth of these sequences have been characterized at atomic resolution using experimental structure determination methods. Computational protein structure modeling techniques have the potential to bridge this sequence-structure gap. In the following chapter, we present an example that illustrates the use of MODELLER to construct a comparative model for a protein with unknown structure. Automation of a similar protocol has resulted in models of useful accuracy for domains in more than half of all known protein sequences.
Affiliation(s)
- Benjamin Webb: Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences (QB3), University of California San Francisco, San Francisco, CA, 94143, USA
- Andrej Sali: Department of Bioengineering and Therapeutic Sciences, Department of Pharmaceutical Chemistry, and California Institute for Quantitative Biosciences (QB3), University of California San Francisco, San Francisco, CA, 94143, USA
9
Tang K, Zhang J, Liang J. Distance-Guided Forward and Backward Chain-Growth Monte Carlo Method for Conformational Sampling and Structural Prediction of Antibody CDR-H3 Loops. J Chem Theory Comput 2016; 13:380-388. [PMID: 27996262 DOI: 10.1021/acs.jctc.6b00845] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
Abstract
Antibodies recognize antigens through the complementary determining regions (CDR) formed by six-loop hypervariable regions crucial for the diversity of antigen specificities. Among the six CDR loops, the H3 loop is the most challenging to predict because of its much higher variation in sequence length and identity, resulting in much larger and complex structural space, compared to the other five loops. We developed a novel method based on a chain-growth sequential Monte Carlo method, called distance-guided sequential chain-growth Monte Carlo for H3 loops (DiSGro-H3). The new method samples protein chains in both forward and backward directions. It can efficiently generate low energy, near-native H3 loop structures using the conformation types predicted from the sequences of H3 loops. DiSGro-H3 performs significantly better than another ab initio method, RosettaAntibody, in both sampling and prediction, while taking less computational time. It performs comparably to template-based methods. As an ab initio method, DiSGro-H3 offers satisfactory accuracy while being able to predict any H3 loops without templates.
Affiliation(s)
- Ke Tang: Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60607, United States
- Jinfeng Zhang: Department of Statistics, Florida State University, Tallahassee, Florida 32306, United States
- Jie Liang: Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60607, United States
10
Abstract
Comparative protein structure modeling predicts the three-dimensional structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and how to use the ModBase database of such models, and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. © 2016 by John Wiley & Sons, Inc.
Affiliation(s)
- Benjamin Webb: University of California at San Francisco, San Francisco, California
- Andrej Sali: University of California at San Francisco, San Francisco, California
11
Kolodny R, Guibas L, Levitt M, Koehl P. Inverse Kinematics in Biology: The Protein Loop Closure Problem. Int J Rob Res 2016. [DOI: 10.1177/0278364905050352] [Citation(s) in RCA: 52] [Impact Index Per Article: 5.8] [Indexed: 11/15/2022]
Abstract
Assembling fragments from known protein structures is a widely used approach to construct structural models for new proteins. We describe an application of this idea to an important inverse kinematics problem in structural biology: the loop closure problem. We have developed an algorithm for generating the conformations of candidate loops that fit in a gap of given length in a protein structure framework. Our method proceeds by concatenating small fragments of protein chosen from small libraries of representative fragments. Our approach has the advantages of ab initio methods since we are able to enumerate all candidate loops in the discrete approximation of the conformational space accessible to the loop, as well as the advantages of database search approach since the use of fragments of known protein structures guarantees that the backbone conformations are physically reasonable. We test our approach on a set of 427 loops, varying in length from four residues to 14 residues. The quality of the candidate loops is evaluated in terms of global coordinate root mean square (cRMS). The top predictions vary between 0.3 and 4.2 Å for four-residue loops and between 1.5 and 3.1 Å for 14-residue loops, respectively.
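The coordinate RMS used above to score candidate loops can be illustrated with a short sketch. It assumes the predicted and reference loops are given as equal-length lists of (x, y, z) atom coordinates already expressed in the same protein frame, so no superposition is performed. The function name and this input format are illustrative choices, not taken from the paper.

```python
import math


def coordinate_rmsd(predicted, reference):
    """Root mean square deviation between two matched coordinate lists.

    Both arguments are sequences of (x, y, z) tuples in Angstroms, listed
    in the same atom order and in the same coordinate frame (an assumption
    of this sketch; no fitting or alignment is done here).
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and of equal length")

    squared_sum = 0.0
    for (px, py, pz), (rx, ry, rz) in zip(predicted, reference):
        squared_sum += (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2

    return math.sqrt(squared_sum / len(predicted))


if __name__ == "__main__":
    # Hypothetical four-atom loop fragment and a slightly displaced prediction.
    native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.5, 0.0), (4.5, 0.5, 0.5)]
    model = [(0.1, 0.0, 0.0), (1.4, 0.2, 0.0), (3.0, 0.4, 0.3), (4.7, 0.5, 0.5)]
    print(f"cRMS = {coordinate_rmsd(model, native):.3f} A")
```

Keeping both loops in the frame of the surrounding protein means the score penalizes a loop that has the right shape but sits in the wrong place, which is the sense of a global measure.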
Affiliation(s)
- Rachel Kolodny: Department of Structural Biology and Computer Science Department, Stanford University, Stanford, CA 94305, USA
- Leonidas Guibas: Computer Science Department, Stanford University, Stanford, CA 94305, USA
- Michael Levitt: Department of Structural Biology, Stanford University, Stanford, CA 94305, USA
- Patrice Koehl: Department of Structural Biology, Stanford University, Stanford, CA 94305, USA
12
Webb B, Sali A. Comparative Protein Structure Modeling Using MODELLER. Current Protocols in Bioinformatics 2016; 54:5.6.1-5.6.37. [PMID: 27322406 PMCID: PMC5031415 DOI: 10.1002/cpbi.3] [Citation(s) in RCA: 2088] [Impact Index Per Article: 232.0] [Indexed: 12/22/2022]
Abstract
Comparative protein structure modeling predicts the three-dimensional structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and how to use the ModBase database of such models, and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. © 2016 by John Wiley & Sons, Inc.
Affiliation(s)
- Benjamin Webb: University of California at San Francisco, San Francisco, California
- Andrej Sali: University of California at San Francisco, San Francisco, California
13
Ismer J, Rose AS, Tiemann JKS, Goede A, Preissner R, Hildebrand PW. SL2: an interactive webtool for modeling of missing segments in proteins. Nucleic Acids Res 2016; 44:W390-4. [PMID: 27105847 PMCID: PMC4987885 DOI: 10.1093/nar/gkw297] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.3] [Received: 01/29/2016] [Accepted: 04/11/2016] [Indexed: 11/22/2022]
Abstract
SuperLooper2 (SL2) (http://proteinformatics.charite.de/sl2) is the updated version of our previous web-server SuperLooper, a fragment based tool for the prediction and interactive placement of loop structures into globular and helical membrane proteins. In comparison to our previous version, SL2 benefits from both a considerably enlarged database of fragments derived from high-resolution 3D protein structures of globular and helical membrane proteins, and the integration of a new protein viewer. The database, now with double the content, significantly improved the coverage of fragment conformations and prediction quality. The employment of the NGL viewer for visualization of the protein under investigation and interactive selection of appropriate loops makes SL2 independent of third-party plug-ins and additional installations.
Affiliation(s)
- Jochen Ismer: Institute of Medical Physics and Biophysics, University Medicine, Berlin, 10117 Berlin, Germany
- Alexander S Rose: Institute of Medical Physics and Biophysics, University Medicine, Berlin, 10117 Berlin, Germany
- Johanna K S Tiemann: Institute of Medical Physics and Biophysics, University Medicine, Berlin, 10117 Berlin, Germany
- Andrean Goede: Institute of Physiology & Experimental Clinical Research Center, University Medicine, Berlin, 13125, Germany
- Robert Preissner: Institute of Physiology & Experimental Clinical Research Center, University Medicine, Berlin, 13125, Germany
- Peter W Hildebrand: Institute of Medical Physics and Biophysics, University Medicine, Berlin, 10117 Berlin, Germany
14
Peng X, He J, Niemi AJ. Clustering and percolation in protein loop structures. BMC Structural Biology 2015; 15:22. [PMID: 26510704 PMCID: PMC4625449 DOI: 10.1186/s12900-015-0049-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 04/07/2015] [Accepted: 10/13/2015] [Indexed: 11/24/2022]
Abstract
Background: High precision protein loop modelling remains a challenge, both in template based and template independent approaches to protein structure prediction.
Method: We introduce the concepts of protein loop clustering and percolation, to develop a quantitative approach to systematically classify the modular building blocks of loops in crystallographic folded proteins. These fragments are all different parameterisations of a unique kink solution to a generalised discrete nonlinear Schrödinger (DNLS) equation. Accordingly, the fragments are also local energy minima of the ensuing energy function.
Results: We show how the loop fragments cover practically all ultrahigh resolution crystallographic protein structures in Protein Data Bank (PDB), with a 0.2 Ångström root-mean-square (RMS) precision. We find that no more than 12 different loop fragments are needed, to describe around 38% of ultrahigh resolution loops in PDB. But there is also a large number of loop fragments that are either unique, or very rare, and examples of unique fragments are found even in the structure of a myoglobin.
Conclusions: Protein loops are built in a modular fashion. The loops are composed of fragments that can be modelled by the kink of the DNLS equation. The majority of loop fragments are also common, which are shared by many proteins. These common fragments are probably important for supporting the overall protein conformation. But there are also several fragments that are either unique to a given protein, or very rare. Such fragments are probably related to the function of the protein. Furthermore, we have found that the amino acid sequence does not determine the structure in a unique fashion. There are many examples of loop fragments with an identical amino acid sequence, but with a very different structure.
Affiliation(s)
- Xubiao Peng: Department of Physics and Astronomy, Uppsala University, P.O. Box 803, Uppsala, S-75108, Sweden
- Jianfeng He: School of Physics, Beijing Institute of Technology, Beijing, 100081, People's Republic of China
- Antti J Niemi: Department of Physics and Astronomy, Uppsala University, P.O. Box 803, Uppsala, S-75108, Sweden; Laboratoire de Mathematiques et Physique Theorique CNRS UMR 6083, Fédération Denis Poisson, Université de Tours, Parc de Grandmont, Tours, F37200, France
15
Tang K, Wong SWK, Liu JS, Zhang J, Liang J. Conformational sampling and structure prediction of multiple interacting loops in soluble and β-barrel membrane proteins using multi-loop distance-guided chain-growth Monte Carlo method. Bioinformatics 2015; 31:2646-52. [PMID: 25861965 DOI: 10.1093/bioinformatics/btv198] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 11/11/2014] [Accepted: 04/03/2015] [Indexed: 11/13/2022]
Abstract
Motivation: Loops in proteins are often involved in biochemical functions. Their irregularity and flexibility make experimental structure determination and computational modeling challenging. Most current loop modeling methods focus on modeling single loops. In protein structure prediction, multiple loops often need to be modeled simultaneously. As interactions among loops in spatial proximity can be rather complex, sampling the conformations of multiple interacting loops is a challenging task.
Results: In this study, we report a new method called multi-loop Distance-guided Sequential chain-Growth Monte Carlo (M-DiSGro) for prediction of the conformations of multiple interacting loops in proteins. Our method achieves an average RMSD of 1.93 Å for lowest energy conformations of 36 pairs of interacting protein loops with the total length ranging from 12 to 24 residues. We further constructed a data set containing proteins with 2, 3 and 4 interacting loops. For the most challenging target proteins with four loops, the average RMSD of the lowest energy conformations is 2.35 Å. Our method is also tested for predicting multiple loops in β-barrel membrane proteins. For outer-membrane protein G, the lowest energy conformation has a RMSD of 2.62 Å for the three extracellular interacting loops with a total length of 34 residues (12, 12 and 10 residues in each loop).
Availability and implementation: The software is freely available at: tanto.bioe.uic.edu/m-DiSGro.
Contact: jinfeng@stat.fsu.edu or jliang@uic.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ke Tang: Richard and Loan Hill Department of Bioengineering, University of Illinois at Chicago, Chicago, IL
- Samuel W K Wong: Department of Statistics, University of Florida, Gainesville, FL
- Jun S Liu: Department of Statistics, Harvard University, Science Center, Cambridge, MA
- Jinfeng Zhang: Department of Statistics, Florida State University, Tallahassee, FL, USA
- Jie Liang: Richard and Loan Hill Department of Bioengineering, University of Illinois at Chicago, Chicago, IL
16
The origin of CDR H3 structural diversity. Structure 2015; 23:302-11. [PMID: 25579815 DOI: 10.1016/j.str.2014.11.010] [Citation(s) in RCA: 69] [Impact Index Per Article: 6.9] [Received: 06/18/2014] [Revised: 11/03/2014] [Accepted: 11/05/2014] [Indexed: 01/15/2023]
Abstract
Antibody complementarity determining region (CDR) H3 loops are critical for adaptive immunological functions. Although the other five CDR loops adopt predictable canonical structures, H3 conformations have proven unclassifiable, other than an unusual C-terminal "kink" present in most antibodies. To determine why the majority of H3 loops are kinked and to learn whether non-antibody proteins have loop structures similar to those of H3, we searched a set of 15,679 high-quality non-antibody structures for regions geometrically similar to the residues immediately surrounding the loop. By incorporating the kink into our search, we identified 1,030 H3-like loops from 632 protein families. Some protein families, including PDZ domains, appear to use the identified region for recognition and binding. Our results suggest that the kink is conserved in the immunoglobulin heavy chain fold because it disrupts the β-strand pairing at the base of the loop. Thus, the kink is a critical driver of the observed structural diversity in CDR H3.
17
Abstract
Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software are also described.
Affiliation(s)
- Benjamin Webb
- University of California at San Francisco, San Francisco, California
|
18
|
Rysavy SJ, Beck DAC, Daggett V. Dynameomics: data-driven methods and models for utilizing large-scale protein structure repositories for improving fragment-based loop prediction. Protein Sci 2014; 23:1584-95. [PMID: 25142412 DOI: 10.1002/pro.2537] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/21/2014] [Revised: 07/30/2014] [Accepted: 08/17/2014] [Indexed: 12/26/2022]
Abstract
Protein function is intimately linked to protein structure and dynamics, yet experimentally determined structures frequently omit regions within a protein due to indeterminate data, which is often due to protein dynamics. We propose that atomistic molecular dynamics simulations provide a diverse sampling of biologically relevant structures for these missing segments (and beyond) to improve structural modeling and structure prediction. Here we make use of the Dynameomics data warehouse, which contains simulations of representatives of essentially all known protein folds. We developed novel computational methods to efficiently identify, rank and retrieve small peptide structures, or fragments, from this database. We also created a novel data model to analyze and compare large repositories of structural data, such as those contained within the Protein Data Bank and the Dynameomics data warehouse. Our evaluation compares these structural repositories for improving loop predictions and analyzes the utility of our methods and models. Using a standard set of loop structures, containing 510 loops, 30 for each loop length from 4 to 20 residues, we find that the inclusion of Dynameomics structures in fragment-based methods improves the quality of the loop predictions without being dependent on sequence homology. Depending on loop length, ~25-75% of the best predictions came from the Dynameomics set, resulting in lower main chain root-mean-square deviations for all fragment lengths using the combined fragment library. We also provide specific cases where Dynameomics fragments provide better predictions for NMR loop structures than fragments from crystal structures. Online access to these fragment libraries is available at http://www.dynameomics.org/fragments.
Affiliation(s)
- Steven J Rysavy
- Division of Biomedical and Health Informatics, University of Washington, Seattle, Washington
|
19
|
Tang K, Zhang J, Liang J. Fast protein loop sampling and structure prediction using distance-guided sequential chain-growth Monte Carlo method. PLoS Comput Biol 2014; 10:e1003539. [PMID: 24763317 PMCID: PMC3998890 DOI: 10.1371/journal.pcbi.1003539] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2013] [Accepted: 02/01/2014] [Indexed: 11/18/2022] Open
Abstract
Loops in proteins are flexible regions connecting regular secondary structures. They are often involved in protein functions through interacting with other molecules. The irregularity and flexibility of loops make their structures difficult to determine experimentally and challenging to model computationally. Conformation sampling and energy evaluation are the two key components in loop modeling. We have developed a new method for loop conformation sampling and prediction based on a chain growth sequential Monte Carlo sampling strategy, called Distance-guided Sequential chain-Growth Monte Carlo (DISGRO). With an energy function designed specifically for loops, our method can efficiently generate high quality loop conformations with low energy that are enriched with near-native loop structures. The average minimum global backbone RMSD for 1,000 conformations of 12-residue loops is 1.53 Å, with a lowest energy RMSD of 2.99 Å, and an average ensemble RMSD of 5.23 Å. A novel geometric criterion is applied to speed up calculations. The computational cost of generating 1,000 conformations for each of the x loops in a benchmark dataset is only about 10 cpu minutes for 12-residue loops, compared to ca 180 cpu minutes using the FALCm method. Test results on benchmark datasets show that DISGRO performs comparably to or better than previous successful methods, while requiring far less computing time. DISGRO is especially effective in modeling longer loops (10-17 residues).
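The three RMSD figures quoted in this abstract are different summaries of the same sampled ensemble. The following is a minimal sketch of how such per-loop summaries could be computed, assuming that each sampled conformation already has an RMSD to the native loop and an energy value; the function name, the dictionary keys, and the toy numbers are hypothetical and are not taken from DISGRO. In a benchmark, these per-loop values would then be averaged over all loops of a given length.

```python
from statistics import mean


def ensemble_rmsd_summary(rmsds, energies):
    """Summarize one loop ensemble (hypothetical helper, not part of DISGRO).

    rmsds[i]    : RMSD (in Å) of conformation i to the native loop
    energies[i] : energy score of conformation i (lower is better)
    """
    if not rmsds or len(rmsds) != len(energies):
        raise ValueError("rmsds and energies must be non-empty and of equal length")

    # Index of the conformation that the energy function ranks first.
    lowest_energy_index = min(range(len(energies)), key=energies.__getitem__)

    return {
        # Best conformation present in the ensemble: measures sampling quality.
        "min_rmsd": min(rmsds),
        # Conformation an energy-based selection would pick: measures scoring quality.
        "lowest_energy_rmsd": rmsds[lowest_energy_index],
        # Mean over all sampled conformations: measures near-native enrichment.
        "mean_rmsd": mean(rmsds),
    }


# Toy ensemble of four conformations (made-up values for illustration only).
summary = ensemble_rmsd_summary(
    rmsds=[1.5, 3.0, 5.5, 6.0],
    energies=[-10.0, -12.5, -8.0, -9.0],
)
print(summary)
```

Separating the minimum RMSD from the lowest-energy RMSD makes it visible whether a method is limited by sampling or by its energy function.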
Affiliation(s)
- Ke Tang
- Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois, United States of America
- Jinfeng Zhang
- Department of Statistics, Florida State University, Tallahassee, Florida, United States of America
- Jie Liang
- Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois, United States of America
|
20
|
Liang S, Zhang C, Zhou Y. LEAP: highly accurate prediction of protein loop conformations by integrating coarse-grained sampling and optimized energy scores with all-atom refinement of backbone and side chains. J Comput Chem 2014; 35:335-41. [PMID: 24327406 PMCID: PMC4125323 DOI: 10.1002/jcc.23509] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2013] [Revised: 10/06/2013] [Accepted: 11/24/2013] [Indexed: 11/11/2022]
Abstract
Prediction of protein loop conformations without any prior knowledge (ab initio prediction) is an unsolved problem. Its solution will significantly impact protein homology and template-based modeling as well as ab initio protein-structure prediction. Here, we developed a coarse-grained, optimized scoring function for initial sampling and ranking of loop decoys. The resulting decoys are then further optimized in backbone and side-chain conformations and ranked by all-atom energy scoring functions. The final integrated technique called loop prediction by energy-assisted protocol achieved a median value of 2.1 Å root mean square deviation (RMSD) for 325 12-residue test loops and 2.0 Å RMSD for 45 12-residue loops from critical assessment of structure-prediction techniques (CASP) 10 target proteins with native core structures (backbone and side chains). If all side-chain conformations in protein cores were predicted in the absence of the target loop, loop-prediction accuracy only reduces slightly (0.2 Å difference in RMSD for 12-residue loops in the CASP target proteins). The accuracy obtained is about 1 Å RMSD or more improvement over other methods we tested. The executable file for a Linux system is freely available for academic users at http://sparks-lab.org.
Affiliation(s)
- Shide Liang
- Systems Immunology Lab, Immunology Frontier Research Center, Osaka University, Suita, Osaka, 565-0871, Japan
- Chi Zhang
- School of Biological Sciences, Center for Plant Science and Innovation, University of Nebraska, Lincoln, NE, 68588, USA
- Yaoqi Zhou
- School of Informatics, Indiana University Purdue University at Indianapolis, Indianapolis, IN 46202, Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Institute for Glycomics and School of Informatics and Communication Technology, Griffith University, Parklands Drive, Southport Qld 4222, Australia
|
21
|
Abstract
Structural proteomics aims to understand the structural basis of protein interactions and functions. A prerequisite for this is the availability of 3D protein structures that mediate the biochemical interactions. The explosion in the number of available gene sequences set the stage for the next step in genome-scale projects -- to obtain 3D structures for each protein. To achieve this ambitious goal, the slow and costly structure determination experiments are supplemented with theoretical approaches. The current state and recent advances in structure modeling approaches are reviewed here, with special emphasis on comparative protein structure modeling techniques.
Affiliation(s)
- András Fiser
- Department of Biochemistry, Seaver Foundation Center for Bioinformatics, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA.
|
22
|
Abstract
Genome sequencing projects have resulted in a rapid increase in the number of known protein sequences. In contrast, only about one-hundredth of these sequences have been characterized at atomic resolution using experimental structure determination methods. Computational protein structure modeling techniques have the potential to bridge this sequence-structure gap. In this chapter, we present an example that illustrates the use of MODELLER to construct a comparative model for a protein with unknown structure. Automation of a similar protocol has resulted in models of useful accuracy for domains in more than half of all known protein sequences.
Affiliation(s)
- Benjamin Webb
- Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco, CA, USA
|
23
|
Webb B, Eswar N, Fan H, Khuri N, Pieper U, Dong G, Sali A. Comparative Modeling of Drug Target Proteins☆. REFERENCE MODULE IN CHEMISTRY, MOLECULAR SCIENCES AND CHEMICAL ENGINEERING 2014. [PMCID: PMC7157477 DOI: 10.1016/b978-0-12-409547-2.11133-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/26/2022]
Abstract
In this perspective, we begin by describing the comparative protein structure modeling technique and the accuracy of the corresponding models. We then discuss the significant role that comparative prediction plays in drug discovery. We focus on virtual ligand screening against comparative models and illustrate the state-of-the-art by a number of specific examples.
|
24
|
Gipson B, Hsu D, Kavraki LE, Latombe JC. Computational models of protein kinematics and dynamics: beyond simulation. ANNUAL REVIEW OF ANALYTICAL CHEMISTRY (PALO ALTO, CALIF.) 2012; 5:273-91. [PMID: 22524225 PMCID: PMC4866812 DOI: 10.1146/annurev-anchem-062011-143024] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]
Abstract
Physics-based simulation represents a powerful method for investigating the time-varying behavior of dynamic protein systems at high spatial and temporal resolution. Such simulations, however, can be prohibitively difficult or lengthy for large proteins or when probing the lower-resolution, long-timescale behaviors of proteins generally. Importantly, not all questions about a protein system require full space and time resolution to produce an informative answer. For instance, by avoiding the simulation of uncorrelated, high-frequency atomic movements, a larger, domain-level picture of protein dynamics can be revealed. The purpose of this review is to highlight the growing body of complementary work that goes beyond simulation. In particular, this review focuses on methods that address kinematics and dynamics, as well as those that address larger organizational questions and can quickly yield useful information about the long-timescale behavior of a protein.
Affiliation(s)
- Bryant Gipson
- Computer Science Department, Rice University, Houston, Texas 77005, USA.
|
25
|
|
26
|
Tripathy C, Zeng J, Zhou P, Donald BR. Protein loop closure using orientational restraints from NMR data. Proteins 2012; 80:433-53. [PMID: 22161780 PMCID: PMC3305838 DOI: 10.1002/prot.23207] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2011] [Revised: 08/23/2011] [Accepted: 09/06/2011] [Indexed: 11/12/2022]
Abstract
Protein loops often play important roles in biological functions. Modeling loops accurately is crucial to determining the functional specificity of a protein. Despite the recent progress in loop prediction approaches, which led to a number of algorithms over the past decade, few rigorous algorithmic approaches exist to model protein loops using global orientational restraints, such as those obtained from residual dipolar coupling (RDC) data in solution nuclear magnetic resonance (NMR) spectroscopy. In this article, we present a novel, sparse data, RDC-based algorithm, which exploits the mathematical interplay between RDC-derived sphero-conics and protein kinematics, and formulates the loop structure determination problem as a system of low-degree polynomial equations that can be solved exactly, in closed-form. The polynomial roots, which encode the candidate conformations, are searched systematically, using provable pruning strategies that triage the vast majority of conformations, to enumerate or prune all possible loop conformations consistent with the data; therefore, completeness is ensured. Results on experimental RDC datasets for four proteins, including human ubiquitin, FF2, DinI, and GB3, demonstrate that our algorithm can compute loops with higher accuracy, a three- to six-fold improvement in backbone RMSD, versus those obtained by traditional structure determination protocols on the same data. Excellent results were also obtained on synthetic RDC datasets for protein loops of length 4, 8, and 12 used in previous studies. These results suggest that our algorithm can be successfully applied to determine protein loop conformations, and hence, will be useful in high-resolution protein backbone structure determination, including loops, from sparse NMR data. Proteins 2012. © 2011 Wiley Periodicals, Inc.
Affiliation(s)
- Jianyang Zeng
- Department of Computer Science, Duke University, Durham, NC 27708, USA
- Pei Zhou
- Department of Biochemistry, Duke University Medical Center, Durham, NC 27710, USA
- Bruce Randall Donald
- Department of Computer Science, Duke University, Durham, NC 27708, USA
- Department of Biochemistry, Duke University Medical Center, Durham, NC 27710, USA
|
27
|
Sacan A, Ekins S, Kortagere S. Applications and limitations of in silico models in drug discovery. Methods Mol Biol 2012; 910:87-124. [PMID: 22821594 DOI: 10.1007/978-1-61779-965-5_6] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023]
Abstract
Drug discovery in the late twentieth and early twenty-first century has witnessed a myriad of changes that were adopted to predict whether a compound is likely to be successful, or conversely to enable identification of molecules with liabilities as early as possible. These changes include integration of in silico strategies for lead design and optimization that perform complementary roles to that of the traditional in vitro and in vivo approaches. The in silico models are facilitated by the availability of large datasets associated with high-throughput screening, bioinformatics algorithms to mine and annotate the data from a target perspective, and chemoinformatics methods to integrate chemistry methods into the lead design process. This chapter highlights the applications of some of these methods and their limitations. We hope this serves as an introduction to in silico drug discovery.
Affiliation(s)
- Ahmet Sacan
- School of Biomedical Engineering, Drexel University, Philadelphia, PA, USA
|
28
|
Joo H, Chavan AG, Day R, Lennox KP, Sukhanov P, Dahl DB, Vannucci M, Tsai J. Near-native protein loop sampling using nonparametric density estimation accommodating sparcity. PLoS Comput Biol 2011; 7:e1002234. [PMID: 22028638 PMCID: PMC3197639 DOI: 10.1371/journal.pcbi.1002234] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/03/2011] [Accepted: 09/01/2011] [Indexed: 11/29/2022] Open
Abstract
Unlike for the core structural elements of a protein, such as regular secondary structure, template-based modeling (TBM) has difficulty with loop regions due to their variability in sequence and structure as well as the sparse sampling from a limited number of homologous templates. We present a novel, knowledge-based method for loop sampling that leverages homologous torsion angle information to estimate a continuous joint backbone dihedral angle density at each loop position. The φ,ψ distributions are estimated via a Dirichlet process mixture of hidden Markov models (DPM-HMM). Models are quickly generated based on samples from these distributions and are enriched using an end-to-end distance filter. The performance of the DPM-HMM method was evaluated against a diverse test set in a leave-one-out approach. Candidates as low as 0.45 Å RMSD and with a worst case of 3.66 Å were produced. For the canonical loops like the immunoglobulin complementarity-determining regions (mean RMSD <2.0 Å), the DPM-HMM method performs as well as or better than the best templates, demonstrating that our automated method recaptures these canonical loops without inclusion of any IgG specific terms or manual intervention. In cases with poor or few good templates (mean RMSD >7.0 Å), this sampling method produces a population of loop structures to around 3.66 Å for loops up to 17 residues. In a direct comparison of sampling with the Loopy algorithm, our method demonstrates the ability to sample nearer native structures for both the canonical CDRH1 and non-canonical CDRH3 loops. Lastly, in the realistic test conditions of the CASP9 experiment, successful application of DPM-HMM for 90 loops from 45 TBM targets shows the general applicability of our sampling method to the loop modeling problem. These results demonstrate that our DPM-HMM produces an advantage by consistently sampling near native loop structure.
The software used in this analysis is available for download at http://www.stat.tamu.edu/~dahl/software/cortorgles/. A protein's structure consists of elements of regular secondary structure connected by less regular stretches of loop segments. The irregularity of the loop structure makes loop modeling quite challenging. More accurate sampling of these loop conformations has a direct impact on protein modeling, design, function classification, as well as protein interactions. A method has been developed that extends a more comprehensive knowledge-based approach to producing models of the loop regions of protein structure. Most physical models cannot adequately sample the large conformational space, while the more discrete knowledge based libraries are conformationally limited. To address both of these problems, we introduce a novel statistical method that produces a continuous yet weighted estimation of loop conformational space from a discrete library of structures by using a Dirichlet process mixture of hidden Markov models (DPM-HMM). Applied to loop structure sampling, the results of a number of tests demonstrate that our approach quickly generates large numbers of candidates with near native loop conformations. Most significantly, in the cases where the template sampling is sparse and/or far from native conformations, the DPM-HMM method samples close to the native space and produces a population of accurate loop structures.
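The abstract mentions that sampled loop models are enriched with an end-to-end distance filter. The sketch below illustrates the general idea only: a candidate loop is kept if the distance between its first and last points is close to the distance that the fixed anchor residues require. The function names, the 1.0 Å tolerance, and the toy coordinates are assumptions made for illustration and do not reproduce the published DPM-HMM software.

```python
import math


def end_to_end_distance(coords):
    """Distance between the first and last point of one candidate loop."""
    return math.dist(coords[0], coords[-1])


def filter_by_end_to_end(candidates, target_distance, tolerance=1.0):
    """Keep candidates whose end-to-end distance matches the anchor spacing.

    candidates      : list of loops, each a list of (x, y, z) tuples in Å
    target_distance : distance between the two anchor points in the template
    tolerance       : allowed deviation in Å (assumed value, not from the paper)
    """
    return [
        loop
        for loop in candidates
        if abs(end_to_end_distance(loop) - target_distance) <= tolerance
    ]


# Two toy candidates: the first spans 3.0 Å, the second spans 8.0 Å.
candidates = [
    [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (3.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 2.0, 0.0), (8.0, 0.0, 0.0)],
]
kept = filter_by_end_to_end(candidates, target_distance=3.5)
print(len(kept))
```

A cheap geometric check of this kind discards candidates that cannot close the gap between the anchors before any more expensive scoring is applied.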
Affiliation(s)
- Hyun Joo
- Department of Chemistry, University of the Pacific, Stockton, California, United States of America
- Archana G. Chavan
- Department of Chemistry, University of the Pacific, Stockton, California, United States of America
- Ryan Day
- Department of Chemistry, University of the Pacific, Stockton, California, United States of America
- Kristin P. Lennox
- Department of Statistics, Texas A&M University, College Station, Texas, United States of America
- Paul Sukhanov
- Department of Chemistry, University of the Pacific, Stockton, California, United States of America
- David B. Dahl
- Department of Statistics, Texas A&M University, College Station, Texas, United States of America
- Marina Vannucci
- Department of Statistics, Rice University, Houston, Texas, United States of America
- Jerry Tsai
- Department of Chemistry, University of the Pacific, Stockton, California, United States of America
|
29
|
Agarwal G, Mahajan S, Srinivasan N, de Brevern AG. Identification of local conformational similarity in structurally variable regions of homologous proteins using protein blocks. PLoS One 2011; 6:e17826. [PMID: 21445259 PMCID: PMC3060819 DOI: 10.1371/journal.pone.0017826] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2010] [Accepted: 02/15/2011] [Indexed: 11/18/2022] Open
Abstract
Structure comparison tools can be used to align related protein structures to identify structurally conserved and variable regions and to infer functional and evolutionary relationships. While the conserved regions often superimpose well, the variable regions appear non-superimposable. Differences in homologous protein structures are thought to be due to evolutionary plasticity to accommodate diverged sequences during evolution. One kind of difference between 3-D structures of homologous proteins is rigid body displacement. A glaring example is equivalent α-helical regions of homologous proteins that adopt different spatial orientations and therefore do not superimpose well. In a rigid body superimposition, these regions would appear variable although they may contain local similarity. Also, due to high spatial deviation in the variable region, one-to-one correspondence at the residue level cannot be determined accurately. Another kind of difference is conformational variability, and the most common example is topologically equivalent loops of two homologues with different conformations. In the current study, we present a refined view of the “structurally variable” regions which may contain local similarity obscured in global alignment of homologous protein structures. Because a structural alphabet can describe local structures of proteins precisely through the Protein Blocks approach, conformational similarity has been identified in a substantial number of ‘variable’ regions in a large data set of protein structural alignments; optimal residue-residue equivalences could be achieved on the basis of Protein Blocks, which led to improved local alignments. Also, through an example, we have demonstrated how the additional information on local backbone structures through protein blocks can aid in comparative modeling of a loop region. In addition, understanding of sequence-structure relationships can be enhanced through our approach. This has been illustrated through examples where the equivalent regions in homologous protein structures share sequence similarity to varying extents but do not preserve local structure.
Affiliation(s)
- Garima Agarwal
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Swapnil Mahajan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, UAS-GKVK Campus, Bangalore, India
- Alexandre G. de Brevern
- Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), INSERM, U665, Paris, France
- Université Paris Diderot - Paris 7, UMR-S665, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
|
30
|
di Luccio E, Koehl P. A quality metric for homology modeling: the H-factor. BMC Bioinformatics 2011; 12:48. [PMID: 21291572 PMCID: PMC3213331 DOI: 10.1186/1471-2105-12-48] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2010] [Accepted: 02/04/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The analysis of protein structures provides fundamental insight into most biochemical functions and consequently into the cause and possible treatment of diseases. As the structures of most known proteins cannot be solved experimentally because of technical or sometimes simply time constraints, in silico protein structure prediction is expected to step in and generate a more complete picture of the protein structure universe. Molecular modeling of protein structures is a fast growing field and a tremendous amount of work has been done since the publication of the very first model. The growth of modeling techniques, and more specifically of those that rely on the existing experimental knowledge of protein structures, is intimately linked to the development of high resolution experimental techniques such as NMR, X-ray crystallography and electron microscopy. This strong connection between experimental and in silico methods is, however, not devoid of criticisms and concerns among modelers as well as among experimentalists. RESULTS In this paper, we focus on homology modeling and, more specifically, we review how it is perceived by the structural biology community and what can be done to impress on the experimentalists that it can be a valuable resource to them. We review the common practices and provide a set of guidelines for building better models. For that purpose, we introduce the H-factor, a new indicator for assessing the quality of homology models, mimicking the R-factor in X-ray crystallography. The method for computing the H-factor is fully described and validated on a series of test cases. CONCLUSIONS We have developed a web service for computing the H-factor for models of a protein structure. This service is freely accessible at http://koehllab.genomecenter.ucdavis.edu/toolkit/h-factor.
Affiliation(s)
- Eric di Luccio
- Computer Science Department, Room 4337, Genome Center, GBSF University of California Davis 451 East Health Sciences Drive Davis, CA 95616, USA.
|
31
|
Arnautova YA, Abagyan RA, Totrov M. Development of a new physics-based internal coordinate mechanics force field and its application to protein loop modeling. Proteins 2011; 79:477-98. [PMID: 21069716 PMCID: PMC3057902 DOI: 10.1002/prot.22896] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]
Abstract
We report the development of the internal coordinate mechanics force field (ICMFF), a new force field parameterized using a combination of experimental data for crystals of small molecules and quantum mechanics calculations. The main features of ICMFF include: (a) parameterization for the dielectric constant relevant to the condensed state (ε = 2) instead of vacuum, (b) an improved description of hydrogen-bond interactions using duplicate sets of van der Waals parameters for heavy atom-hydrogen interactions, and (c) improved backbone covalent geometry and energetics achieved using novel backbone torsional potentials and inclusion of the bond angles at the C(α) atoms into the internal variable set. The performance of ICMFF was evaluated through loop modeling simulations for 4-13 residue loops. ICMFF was combined with a solvent-accessible surface area solvation model optimized using a large set of loop decoys. Conformational sampling was carried out using the biased probability Monte Carlo method. Average/median backbone root-mean-square deviations of the lowest energy conformations from the native structures were 0.25/0.21 Å for four-residue loops, 0.84/0.46 Å for eight-residue loops, and 1.16/0.73 Å for 12-residue loops. To our knowledge, these results are significantly better than or comparable with those reported to date for any loop modeling method that does not take crystal packing into account. Moreover, the accuracy of our method is on par with the best previously reported results obtained considering the crystal environment. We attribute this success to the high accuracy of the new ICM force field achieved by meticulous parameterization, to the optimized solvent model, and to the efficiency of the search method.
Affiliation(s)
- Yelena A Arnautova
- Molsoft LLC, 3366 North Torrey Pines Court, Suite 300, La Jolla, California 92037, USA
|
32
|
Abstract
Loop modeling is crucial for high-quality homology model construction outside conserved secondary structure elements. Dozens of loop modeling protocols involving a range of database and ab initio search algorithms and a variety of scoring functions have been proposed. Knowledge-based loop modeling methods are very fast and some can successfully and reliably predict loops up to about eight residues long. Several recent ab initio loop simulation methods can be used to construct accurate models of loops up to 12-13 residues long, albeit at a substantial computational cost. Major current challenges are the simulations of loops longer than 12-13 residues, the modeling of multiple interacting flexible loops, and the sensitivity of the loop predictions to the accuracy of the loop environment.
|
33
|
Ramya L, Nehru Viji S, Arun Prasad P, Kanagasabai V, Gautham N. MOLS sampling and its applications in structural biophysics. Biophys Rev 2010; 2:169-179. [PMID: 28510038 DOI: 10.1007/s12551-010-0039-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2010] [Accepted: 10/19/2010] [Indexed: 12/01/2022] Open
Abstract
This review describes the MOLS method and its applications. This computational method has been developed in our laboratory primarily to explore the conformational space of small peptides and identify features of interest, particularly the minima, i.e., the low energy conformations. A systematic "brute-force" search through the vast conformational space for such features faces the insurmountable problem of combinatorial explosion, whilst other techniques, e.g., Monte Carlo searches, are somewhat limited in their region of exploration and may be considered inexhaustive. The MOLS method, on the other hand, uses a sampling technique commonly employed in experimental design theory to identify a small sample of the conformational space that nevertheless retains information about the entire space. The information is extracted using a technique that is a variant of the self-consistent mean field technique, which has been used to identify, for example, the optimal set of side-chain conformations in a protein. Applications of the MOLS method to understand peptide structure, predict the structures of loops in proteins, predict three-dimensional structures of small proteins, and arrive at the best conformation, orientation, and positions of a small molecule ligand in a protein receptor site have all yielded satisfactory results.
Affiliation(s)
- L Ramya
- Centre of Advanced Study in Crystallography and Biophysics, University of Madras, Chennai, 600025, India
- Shankaran Nehru Viji
- Centre of Advanced Study in Crystallography and Biophysics, University of Madras, Chennai, 600025, India
- Pandurangan Arun Prasad
- Institute of Structural and Molecular Biology and Crystallography, Department of Biological Sciences, Birkbeck College, University of London, London, UK
- Vadivel Kanagasabai
- Department of Orthopaedic Surgery, David Geffen School of Medicine, University of California, Los Angeles, CA, USA
- Namasivayam Gautham
- Centre of Advanced Study in Crystallography and Biophysics, University of Madras, Chennai, 600025, India.
|
34
|
Sellers BD, Nilmeier JP, Jacobson MP. Antibodies as a model system for comparative model refinement. Proteins 2010; 78:2490-505. [PMID: 20602354 DOI: 10.1002/prot.22757] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 12/29/2022]
Abstract
Predicting the conformations of loops is a critical aspect of protein comparative (homology) modeling. Despite considerable advances in developing loop prediction algorithms, refining loops in homology models remains challenging. In this work, we use antibodies as a model system to investigate strategies for more robustly predicting loop conformations when the protein model contains errors in the conformations of side chains and protein backbone surrounding the loop in question. Specifically, our test system consists of partial models of antibodies in which the "scaffold" (i.e., the portion other than the complementarity determining region, CDR, loops) retains native backbone conformation, whereas the CDR loops are predicted using a combination of knowledge-based modeling (H1, H2, L1, L2, and L3) and ab initio loop prediction (H3). H3 is the most variable of the CDRs. Using a previously published method, a test set of 10 shorter H3 loops (5-7 residues) are predicted to an average backbone (N-Cα-C-O) RMSD of 2.7 Å while 11 longer loops (8-9 residues) are predicted to 5.1 Å, thus recapitulating the difficulties in refining loops in models. By contrast, in control calculations predicting the same loops in crystal structures, the same method reconstructs the loops to an average of 0.5 and 1.4 Å for the shorter and longer loops, respectively. We modify the loop prediction method to improve the ability to sample near-native loop conformations in the models, primarily by reducing the sensitivity of the sampling to the loop surroundings, and allowing the other CDR loops to optimize with the H3 loop. The new method improves the average accuracy significantly to 1.3 Å RMSD and 3.1 Å RMSD for the shorter and longer loops, respectively. Finally, we present results predicting 8-10 residue loops within complete comparative models of five nonantibody proteins.
While anecdotal, these mixed, full-model results suggest our approach is a promising step toward more accurately predicting loops in homology models. Furthermore, while significant challenges remain, our method is a potentially useful tool for predicting antibody structures based on a known Fv scaffold.
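The accuracy figures in this abstract are backbone RMSD values. As a minimal sketch of how such a number can be computed, assuming both loops are already expressed in the same coordinate frame (no superposition step is included) and using made-up toy coordinates rather than data from the paper:

```python
import math

def backbone_rmsd(predicted, reference):
    """RMSD between two equally ordered lists of (x, y, z) atom coordinates.

    No superposition is performed: both coordinate sets are assumed to be
    in the same frame already (for example after aligning the scaffold).
    """
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(predicted, reference)
    )
    return math.sqrt(squared / len(predicted))

# Hypothetical toy coordinates in angstroms: every atom is shifted by 1 along x.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (2.5, 1.0, 0.0), (3.0, 2.0, 0.5)]
predicted = [(x + 1.0, y, z) for x, y, z in reference]
print(round(backbone_rmsd(predicted, reference), 3))  # prints 1.0
```

A uniform 1 Å shift gives an RMSD of exactly 1 Å, which makes the toy case easy to check by hand.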
Affiliation(s)
- Benjamin D Sellers
- Department of Pharmaceutical Chemistry, University of California, San Francisco, California 94158-2517, USA
|
35
|
Skliros A, Jernigan RL, Kloczkowski A. Models to Approximate the Motions of Protein Loops. J Chem Theory Comput 2010; 6:3249-3258. [PMID: 21031141 DOI: 10.1021/ct1001413] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
Abstract
We approximate the loop motions of various proteins by using a coarse-grained model and the theory of rubberlike elasticity of polymer chains. The loops are considered as chains where only the first and the last residues thereof are tethered by their connections to the main structure; while within the loop, the loop residues are connected only to their sequence neighbors. We applied these approximate models to five proteins. Our approximation shows that the loop motions can usually be computed locally which shows these motions are robust and not random. But most interestingly, the new method presented here can be used to compute the likely motions of loops that are missing in the structures.
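One simple reading of this description, offered here only as an illustrative sketch and not as the authors' actual model, is a chain of beads joined by identical springs, with the first and last beads each tied by one extra spring to a fixed anchor. Under that assumption the connectivity matrix is tridiagonal, and the relative mean-square fluctuation of each residue is the matching diagonal element of its inverse. The function names below are hypothetical.

```python
def solve_tridiagonal(rhs):
    """Thomas algorithm for the system tridiag(-1, 2, -1) x = rhs."""
    n = len(rhs)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    c_prime[0] = -0.5
    d_prime[0] = rhs[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c_prime[i - 1]
        c_prime[i] = -1.0 / denom
        d_prime[i] = (rhs[i] + d_prime[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x

def tethered_chain_fluctuations(n_residues):
    """Relative mean-square fluctuation of each bead in the assumed model.

    Each value is a diagonal element of the inverse connectivity matrix,
    obtained by solving against the corresponding unit vector.
    """
    fluctuations = []
    for i in range(n_residues):
        unit = [0.0] * n_residues
        unit[i] = 1.0
        fluctuations.append(solve_tridiagonal(unit)[i])
    return fluctuations

profile = tethered_chain_fluctuations(5)
print([round(value, 3) for value in profile])
```

In this toy model the profile is symmetric and peaks at the middle of the loop, matching the closed form i(n+1-i)/(n+1) for residue i of an n-residue chain.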
Affiliation(s)
- Aris Skliros
- L. H. Baker Center for Bioinformatics and Biological Statistics, Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
|
36
|
Application of biasing-potential replica-exchange simulations for loop modeling and refinement of proteins in explicit solvent. Proteins 2010; 78:2809-19. [DOI: 10.1002/prot.22796] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Indexed: 12/25/2022]
|
37
|
Choi Y, Deane CM. FREAD revisited: Accurate loop structure prediction using a database search algorithm. Proteins 2010; 78:1431-40. [PMID: 20034110 DOI: 10.1002/prot.22658] [Citation(s) in RCA: 121] [Impact Index Per Article: 8.1] [Indexed: 12/25/2022]
Abstract
Loops are the most variable regions of protein structure and are, in general, the least accurately predicted. Their prediction has been approached in two ways, ab initio and database search. In recent years, it has been thought that ab initio methods are more powerful. In light of the continued rapid expansion in the number of known protein structures, we have re-evaluated FREAD, a database search method, and demonstrate that the power of database search methods may have been underestimated. We found that sequence similarity as quantified by environment specific substitution scores can be used to significantly improve prediction. In fact, FREAD performs appreciably better for an identifiable subset of loops (two thirds of shorter loops and half of the longer loops tested) than the ab initio methods of MODELLER, PLOP, and RAPPER. Within this subset, FREAD's predictive ability is length independent, in general producing results within 2 Å RMSD, compared to an average of over 10 Å for loop length 20 for any of the other tested methods. We also benchmarked the prediction protocols on a set of 212 loops from the model structures in CASP 7 and 8. An extended version of FREAD is able to make predictions for 127 of these; it gives the best prediction of the methods tested in 61 of these cases. In examining FREAD's ability to predict in the model environment, we found that whole structure quality did not affect the quality of loop predictions.
Affiliation(s)
- Yoonjoo Choi
- Department of Statistics, Oxford University, United Kingdom.
|
38
|
Pierri CL, Parisi G, Porcelli V. Computational approaches for protein function prediction: a combined strategy from multiple sequence alignment to molecular docking-based virtual screening. Biochim Biophys Acta Proteins Proteom 2010; 1804:1695-712. [PMID: 20433957 DOI: 10.1016/j.bbapap.2010.04.008] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.9] [Received: 01/19/2010] [Revised: 03/04/2010] [Accepted: 04/14/2010] [Indexed: 12/12/2022]
Abstract
The functional characterization of proteins represents a daily challenge for biochemical, medical and computational sciences. Although finally proved on the bench, the function of a protein can be successfully predicted by computational approaches that drive the further experimental assays. Current methods for comparative modeling allow the construction of accurate 3D models for proteins of unknown structure, provided that a crystal structure of a homologous protein is available. Binding regions can be proposed by using binding site predictors, data inferred from homologous crystal structures, and data provided from a careful interpretation of the multiple sequence alignment of the investigated protein and its homologs. Once the location of a binding site has been proposed, chemical ligands that have a high likelihood of binding can be identified by using ligand docking and structure-based virtual screening of chemical libraries. Most docking algorithms allow building a list sorted by energy of the lowest energy docking configuration for each ligand of the library. In this review the state-of-the-art of computational approaches in 3D protein comparative modeling and in the study of protein-ligand interactions is provided. Furthermore a possible combined/concerted multistep strategy for protein function prediction, based on multiple sequence alignment, comparative modeling, binding region prediction, and structure-based virtual screening of chemical libraries, is described by using suitable examples. As practical examples, Abl-kinase molecular modeling studies, HPV-E6 protein multiple sequence alignment analysis, and some other model docking-based characterization reports are briefly described to highlight the importance of computational approaches in protein function prediction.
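The ranking step mentioned in this abstract (a list of ligands sorted by the energy of each ligand's lowest-energy docked configuration) can be sketched as follows. The ligand names and energies are hypothetical placeholders, and `rank_ligands` is an illustrative helper, not part of any docking program.

```python
def rank_ligands(poses):
    """Rank ligands by the energy of their best (lowest-energy) pose.

    poses: iterable of (ligand_name, energy) pairs, one per docked pose.
    Returns a list of (ligand_name, best_energy), most favorable first.
    """
    best = {}
    for ligand, energy in poses:
        if ligand not in best or energy < best[ligand]:
            best[ligand] = energy
    return sorted(best.items(), key=lambda item: item[1])

# Hypothetical docking output: several poses per ligand, energies in kcal/mol.
poses = [
    ("ligand_A", -6.1),
    ("ligand_B", -7.4),
    ("ligand_A", -6.9),
    ("ligand_C", -5.2),
    ("ligand_B", -7.0),
]
ranking = rank_ligands(poses)
for name, energy in ranking:
    print(name, energy)
```

Keeping only the best pose per ligand before sorting means a ligand with many mediocre poses cannot crowd out one with a single strong pose.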
Affiliation(s)
- Ciro Leonardo Pierri
- Department of Pharmaco-Biology, Laboratory of Biochemistry and Molecular Biology, University of Bari, Via E. Orabona, 4 - 70125 Bari, Italy.
|
39
|
Al Nasr K, Sun W, He J. Structure prediction for the helical skeletons detected from the low resolution protein density map. BMC Bioinformatics 2010; 11 Suppl 1:S44. [PMID: 20122218 PMCID: PMC3009517 DOI: 10.1186/1471-2105-11-s1-s44] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 11/19/2022] Open
Abstract
Background: The current advances in electron cryo-microscopy technique have made it possible to obtain protein density maps at about 6-10 Å resolution. Although it is hard to derive the protein chain directly from such a low resolution map, the location of the secondary structures such as helices and strands can be computationally detected. It has been demonstrated that such a low-resolution map can be used during the protein structure prediction process to enhance the structure prediction. Results: We have developed an approach to predict the 3-dimensional structure for the helical skeletons that can be detected from the low resolution protein density map. This approach does not require the construction of the entire chain and distinguishes the structures based on the conformation of the helices. A test with 35 low resolution density maps shows that the highest ranked structure with the correct topology can be found within the top 1% of the list ranked by the effective energy formed by the helices. Conclusion: The results in this paper suggest that it is possible to eliminate the great majority of the bad conformations of the helices even without the construction of the entire chain of the protein. For many proteins, the effective contact energy formed by the secondary structures alone can distinguish a small set of likely structures from the pool.
Affiliation(s)
- Kamal Al Nasr
- Department of Computer Science, Old Dominion University, Norfolk, VA 23529, USA.
|
40
|
Abstract
Functional characterization of a protein is often facilitated by its 3D structure. However, the fraction of experimentally known 3D models is currently less than 1% due to the inherently time-consuming and complicated nature of structure determination techniques. Computational approaches are employed to bridge the gap between the number of known sequences and that of 3D models. Template-based protein structure modeling techniques rely on the study of principles that dictate the 3D structure of natural proteins from the theory of evolution viewpoint. Strategies for template-based structure modeling will be discussed with a focus on comparative modeling, by reviewing techniques available for all the major steps involved in the comparative modeling pipeline.
Affiliation(s)
- Andras Fiser
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, USA
|
41
|
Hvidsten TR, Kryshtafovych A, Fidelis K. Local descriptors of protein structure: a systematic analysis of the sequence-structure relationship in proteins using short- and long-range interactions. Proteins 2009; 75:870-84. [PMID: 19025980 DOI: 10.1002/prot.22296] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
Local protein structure representations that incorporate long-range contacts between residues are often considered in protein structure comparison but have found relatively little use in structure prediction where assembly from single backbone fragments dominates. Here, we introduce the concept of local descriptors of protein structure to characterize local neighborhoods of amino acids including short- and long-range interactions. We build a library of recurring local descriptors and show that this library is general enough to allow assembly of unseen protein structures. The library could on average re-assemble 83% of 119 unseen structures, and showed little or no performance decrease between homologous targets and targets with folds not represented among domains used to build it. We then systematically evaluate the descriptor library to establish the level of the sequence signal in sets of protein fragments of similar geometrical conformation. In particular, we test whether that signal is strong enough to facilitate correct assignment and alignment of these local geometries to new sequences. We use the signal to assign descriptors to a test set of 479 sequences with less than 40% sequence identity to any domain used to build the library, and show that on average more than 50% of the backbone fragments constituting descriptors can be correctly aligned. We also use the assigned descriptors to infer SCOP folds, and show that correct predictions can be made in many of the 151 cases where PSI-BLAST was unable to detect significant sequence similarity to proteins in the library. Although the combinatorial problem of simultaneously aligning several fragments to sequence is a major bottleneck compared with single fragment methods, the advantage of the current approach is that correct alignments imply correct long range distance constraints. 
The lack of these constraints is most likely the major reason why structure prediction methods fail to consistently produce adequate models when good templates are unavailable or undetectable. Thus, we believe that the current study offers new and valuable insight into the prediction of sequence-structure relationships in proteins.
|
42
|
Liu P, Zhu F, Rassokhin DN, Agrafiotis DK. A self-organizing algorithm for modeling protein loops. PLoS Comput Biol 2009; 5:e1000478. [PMID: 19696883 PMCID: PMC2719875 DOI: 10.1371/journal.pcbi.1000478] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Received: 03/18/2009] [Accepted: 07/20/2009] [Indexed: 11/19/2022] Open
Abstract
Protein loops, the flexible short segments connecting two stable secondary structural units in proteins, play a critical role in protein structure and function. Constructing chemically sensible conformations of protein loops that seamlessly bridge the gap between the anchor points without introducing any steric collisions remains an open challenge. A variety of algorithms have been developed to tackle the loop closure problem, ranging from inverse kinematics to knowledge-based approaches that utilize pre-existing fragments extracted from known protein structures. However, many of these approaches focus on the generation of conformations that mainly satisfy the fixed end point condition, leaving the steric constraints to be resolved in subsequent post-processing steps. In the present work, we describe a simple solution that simultaneously satisfies not only the end point and steric conditions, but also chirality and planarity constraints. Starting from random initial atomic coordinates, each individual conformation is generated independently by using a simple alternating scheme of pairwise distance adjustments of randomly chosen atoms, followed by fast geometric matching of the conformationally rigid components of the constituent amino acids. The method is conceptually simple, numerically stable and computationally efficient. Very importantly, additional constraints, such as those derived from NMR experiments, hydrogen bonds or salt bridges, can be incorporated into the algorithm in a straightforward and inexpensive way, making the method ideal for solving more complex multi-loop problems. The remarkable performance and robustness of the algorithm are demonstrated on a set of protein loops of length 4, 8, and 12 that have been used in previous studies.
Protein loops play an important role in protein function, such as ligand binding, recognition, and allosteric regulation. However, due to their flexibility, it is notoriously difficult to determine their 3D structures using traditional experimental techniques. As a result, one can often find protein structures with missing loops in the Protein Data Bank. Their sequence variability also presents a particular challenge for homology modeling methods, which can only yield good overall structures given sufficient sequence identity and good experimental reference structures. Despite extensive research, the construction of protein loop 3D structures remains an open problem, since a sensible conformation should seamlessly bridge the anchor points without introducing steric clashes within the loop itself or between the loop and its surrounding environment. Here, we present a conceptually simple, mathematically straightforward, numerically robust and computationally efficient approach for building protein loop conformations that simultaneously satisfy end-point, steric, planar and chiral constraints. More importantly, additional constraints derived from experimental sources can be incorporated in a straightforward manner, allowing the processing of more complex structures involving multiple interlocking loops.
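The "pairwise distance adjustments" mentioned in the abstract can be illustrated with a single update step. This is a minimal sketch of one generic adjustment, assuming both atoms are moved symmetrically along the line joining them; the function name, the learning-rate parameter and the toy coordinates are illustrative choices, not details taken from the paper.

```python
import math

def adjust_pair(p, q, target, learning_rate=1.0, eps=1e-9):
    """Move points p and q along their connecting line toward a target distance.

    With learning_rate=1.0 the pair ends up (almost exactly) at the target
    distance; smaller values take a partial step. eps guards against
    division by zero when the two points coincide.
    """
    current = math.dist(p, q)
    scale = learning_rate * 0.5 * (target - current) / (current + eps)
    shift = [scale * (pi - qi) for pi, qi in zip(p, q)]
    new_p = [pi + si for pi, si in zip(p, shift)]
    new_q = [qi - si for qi, si in zip(q, shift)]
    return new_p, new_q

# Two points 3.0 apart, pushed toward 3.8 (roughly a consecutive C-alpha spacing in angstroms).
new_p, new_q = adjust_pair([0.0, 0.0, 0.0], [3.0, 0.0, 0.0], target=3.8)
print(round(math.dist(new_p, new_q), 3))  # prints 3.8
```

Repeating such steps over many randomly chosen pairs, each with its own target or bound, is the general idea; how the pairs, bounds and step sizes are chosen in the published method is not described in the abstract.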
Affiliation(s)
- Pu Liu
- Johnson & Johnson Pharmaceutical Research and Development, Exton, Pennsylvania, United States of America
- * E-mail: (PL); (DKA)
- Fangqiang Zhu
- Johnson & Johnson Pharmaceutical Research and Development, Exton, Pennsylvania, United States of America
- Dmitrii N. Rassokhin
- Johnson & Johnson Pharmaceutical Research and Development, Exton, Pennsylvania, United States of America
- Dimitris K. Agrafiotis
- Johnson & Johnson Pharmaceutical Research and Development, Exton, Pennsylvania, United States of America
- * E-mail: (PL); (DKA)
|
43
|
Cui M, Mezei M, Osman R. Prediction of protein loop structures using a local move Monte Carlo approach and a grid-based force field. Protein Eng Des Sel 2008; 21:729-35. [PMID: 18957407 PMCID: PMC2597363 DOI: 10.1093/protein/gzn056] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Received: 05/14/2008] [Revised: 09/18/2008] [Accepted: 09/23/2008] [Indexed: 11/14/2022] Open
Abstract
We have developed an improved local move Monte Carlo (LMMC) loop sampling approach for loop predictions. The method generates loop conformations based on simple moves of the torsion angles of side chains and local moves of the loop backbone. To reduce the computational costs for energy evaluations, we developed a grid-based force field to represent the protein environment and solvation effect. Simulated annealing has been used to enhance the efficiency of the LMMC loop sampling and identify low-energy loop conformations. The prediction quality is evaluated on a set of protein loops with known crystal structures that has been previously used by others to test different loop prediction methods. The results show that this approach can reproduce the experimental results with a root mean square deviation within 1.8 Å for all the test cases. The LMMC loop prediction approach developed here could be useful for improving the quality of the loop regions in homology models and in flexible protein-ligand and protein-protein docking studies.
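Monte Carlo sampling with simulated annealing generally rests on two small pieces: an acceptance test for each trial move and a temperature schedule that cools over time. The sketch below shows the standard Metropolis criterion and, as an assumption, a geometric cooling schedule; the abstract does not state which schedule or temperatures were used, so all numbers here are placeholders.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def metropolis_accept(delta_energy, temperature, rng=random):
    """Standard Metropolis test: always accept downhill moves, accept
    uphill moves with probability exp(-delta_energy / (k_B * T))."""
    if delta_energy <= 0.0:
        return True
    return rng.random() < math.exp(-delta_energy / (K_B * temperature))

def annealing_schedule(t_start, t_end, steps):
    """Temperatures decreasing geometrically from t_start to t_end
    (an assumed schedule, chosen only for illustration)."""
    if steps < 2:
        raise ValueError("need at least two temperature steps")
    ratio = (t_end / t_start) ** (1.0 / (steps - 1))
    return [t_start * ratio ** k for k in range(steps)]

# Hypothetical schedule: five temperatures from 1000 K down to 300 K.
schedule = annealing_schedule(1000.0, 300.0, 5)
print([round(t, 1) for t in schedule])
```

At high temperature most uphill moves pass the test, which lets the loop escape local minima; as the temperature falls, the search settles into low-energy conformations.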
Affiliation(s)
- Meng Cui
- Department of Structural and Chemical Biology, Mount Sinai School of Medicine, NYU, Box 1218, New York, NY 10029
- Department of Physiology and Biophysics, Virginia Commonwealth University, 1101 East Marshall Street, PO Box 980551, Richmond, VA 23298, USA
- Mihaly Mezei
- Department of Structural and Chemical Biology, Mount Sinai School of Medicine, NYU, Box 1218, New York, NY 10029
- Roman Osman
- Department of Structural and Chemical Biology, Mount Sinai School of Medicine, NYU, Box 1218, New York, NY 10029
|
44
|
Yao P, Dhanik A, Marz N, Propper R, Kou C, Liu G, van den Bedem H, Latombe JC, Halperin-Landsberg I, Altman RB. Efficient algorithms to explore conformation spaces of flexible protein loops. IEEE/ACM Trans Comput Biol Bioinform 2008; 5:534-45. [PMID: 18989041 PMCID: PMC2794838 DOI: 10.1109/tcbb.2008.96] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 05/27/2023]
Abstract
Several applications in biology - e.g., incorporation of protein flexibility in ligand docking algorithms, interpretation of fuzzy X-ray crystallographic data, and homology modeling - require computing the internal parameters of a flexible fragment (usually, a loop) of a protein in order to connect its termini to the rest of the protein without causing any steric clash. One must often sample many such conformations in order to explore and adequately represent the conformational range of the studied loop. While sampling must be fast, it is made difficult by the fact that two conflicting constraints - kinematic closure and clash avoidance - must be satisfied concurrently. This paper describes two efficient and complementary sampling algorithms to explore the space of closed clash-free conformations of a flexible protein loop. The "seed sampling" algorithm samples broadly from this space, while the "deformation sampling" algorithm uses seed conformations as starting points to explore the conformation space around them at a finer grain. Computational results are presented for various loops ranging from 5 to 25 residues. More specific results also show that the combination of the sampling algorithms with a functional site prediction software (FEATURE) makes it possible to compute and recognize calcium-binding loop conformations. The sampling algorithms are implemented in a toolkit (LoopTK), which is available at https://simtk.org/home/looptk.
Affiliation(s)
- Peggy Yao
- The Computer Science and Biomedical Informatics Departments, Stanford University, S240 Clark Center, 318 Campus Drive, Stanford, CA 94305.
- Ankur Dhanik
- The Computer Science and Mechanical Engineering Departments, Stanford University, S245 Clark Center, 318 Campus Drive, Stanford, CA 94305.
- Nathan Marz
- The Computer Science Department, Stanford University, S245 Clark Center, 318 Campus Drive, Stanford, CA 94305.
- Ryan Propper
- The Computer Science Department, Stanford University, S245 Clark Center, 318 Campus Drive, Stanford, CA 94305.
- Charles Kou
- The Computer Science Department, Stanford University, S245 Clark Center, 318 Campus Drive, Stanford, CA 94305.
- Guanfeng Liu
- The Computer Science Department, Stanford University, S245 Clark Center, 318 Campus Drive, Stanford, CA 94305.
- Henry van den Bedem
- The Stanford Linear Accelerator Center, SSRL/Joint Center for Structural Genomics, MS 69, 2575 Sand Hill Road, Menlo Park, CA 94025.
- Jean-Claude Latombe
- The Computer Science Department, Stanford University, S245 Clark Center, 318 Campus Drive, Stanford, CA 94305.
- Inbal Halperin-Landsberg
- The Department of Genetics, Stanford University, S240 Clark Center, 318 Campus Drive, Stanford, CA 94305.
- Russ Biagio Altman
- The Department of Bioengineering, Stanford University, 318 Campus Drive S172, Stanford, CA 94305-5444.
|
45
|
Sellers BD, Zhu K, Zhao S, Friesner RA, Jacobson MP. Toward better refinement of comparative models: predicting loops in inexact environments. Proteins 2008; 72:959-71. [PMID: 18300241 DOI: 10.1002/prot.21990] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.4] [Indexed: 11/12/2022]
Abstract
Achieving atomic-level accuracy in comparative protein models is limited by our ability to refine the initial, homolog-derived model closer to the native state. Despite considerable effort, progress in developing a generalized refinement method has been limited. In contrast, methods have been described that can accurately reconstruct loop conformations in native protein structures. We hypothesize that loop refinement in homology models is much more difficult than loop reconstruction in crystal structures, in part, because side-chain, backbone, and other structural inaccuracies surrounding the loop create a challenging sampling problem; the loop cannot be refined without simultaneously refining adjacent portions. In this work, we single out one sampling issue in an artificial but useful test set and examine how loop refinement accuracy is affected by errors in surrounding side-chains. In 80 high-resolution crystal structures, we first perturbed 6-12 residue loops away from the crystal conformation, and placed all protein side chains in non-native but low energy conformations. Even these relatively small perturbations in the surroundings made the loop prediction problem much more challenging. Using a previously published loop prediction method, median backbone (N-Cα-C-O) RMSDs for groups of 6, 8, 10, and 12 residue loops are 0.3/0.6/0.4/0.6 Å, respectively, on native structures and increase to 1.1/2.2/1.5/2.3 Å on the perturbed cases. We then augmented our previous loop prediction method to simultaneously optimize the rotamer states of side chains surrounding the loop. Our results show that this augmented loop prediction method can recover the native state in many perturbed structures where the previous method failed; the median RMSDs for the 6, 8, 10, and 12 residue perturbed loops improve to 0.4/0.8/1.1/1.2 Å.
Finally, we highlight three comparative models from blind tests, in which our new method predicted loops closer to the native conformation than first modeled using the homolog template, a task generally understood to be difficult. Although many challenges remain in refining full comparative models to high accuracy, this work offers a methodical step toward that goal.
Affiliation(s)
- Benjamin D Sellers
- Graduate Group in Biophysics, University of California, San Francisco, California 94158-2517, USA
|
46
|
Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A. Comparative protein structure modeling using MODELLER. Curr Protoc Protein Sci 2008; Chapter 2:Unit 2.9. [PMID: 18429317 DOI: 10.1002/0471140864.ps0209s50] [Citation(s) in RCA: 758] [Impact Index Per Article: 44.6] [Indexed: 12/25/2022]
Abstract
Functional characterization of a protein sequence is a common goal in biology, and is usually facilitated by having an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.
Affiliation(s)
- Narayanan Eswar
- University of California at San Francisco, San Francisco, California, USA
|
47
|
Olson MA, Feig M, Brooks CL. Prediction of protein loop conformations using multiscale modeling methods with physical energy scoring functions. J Comput Chem 2008; 29:820-31. [PMID: 17876760 DOI: 10.1002/jcc.20827] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.1] [Indexed: 12/25/2022]
Abstract
This article examines ab initio methods for the prediction of protein loops by a computational strategy of multiscale conformational sampling and physical energy scoring functions. Our approach consists of initial sampling of loop conformations from lattice-based low-resolution models followed by refinement using all-atom simulations. To allow enhanced conformational sampling, the replica exchange method was implemented. Physical energy functions based on CHARMM19 and CHARMM22 parameterizations with generalized Born (GB) solvent models were applied in scoring loop conformations extracted from the lattice simulations and, in the case of all-atom simulations, the ensemble of conformations were generated and scored with these models. Predictions are reported for 25 loop segments, each eight residues long and taken from a diverse set of 22 protein structures. We find that the simulations generally sampled conformations with low global root-mean-square-deviation (RMSD) for loop backbone coordinates from the known structures, whereas clustering conformations in RMSD space and scoring detected less favorable loop structures. Specifically, the lattice simulations sampled basins that exhibited an average global RMSD of 2.21 ± 1.42 Å, whereas clustering and scoring the loop conformations determined an RMSD of 3.72 ± 1.91 Å. Using CHARMM19/GB to refine the lattice conformations improved the sampling RMSD to 1.57 ± 0.98 Å and detection to 2.58 ± 1.48 Å. We found that further improvement could be gained from extending the upper temperature in the all-atom refinement from 400 to 800 K, where the results typically yield a reduction of approximately 1 Å or greater in the RMSD of the detected loop.
Overall, CHARMM19 with a simple pairwise GB solvent model is more efficient at sampling low-RMSD loop basins than CHARMM22 with a higher-resolution modified analytical GB model; however, the latter simulation method provides a more accurate description of the all-atom energy surface, yet demands a much greater computational cost.
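The replica exchange method referred to above runs copies of the system at different temperatures and periodically attempts to swap neighboring replicas. A minimal sketch of the standard swap acceptance probability follows, together with a geometric temperature ladder. The ladder spacing, the 300 K lower bound, the replica count and the example energies are assumptions for illustration; only the 800 K upper temperature appears in the abstract.

```python
import math

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Standard replica-exchange acceptance probability for swapping the
    configurations held at temperatures temp_i and temp_j:
    min(1, exp[(beta_i - beta_j) * (energy_i - energy_j)])."""
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    exponent = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if exponent >= 0.0 else math.exp(exponent)

def temperature_ladder(t_min, t_max, n_replicas):
    """Geometrically spaced temperatures (an assumed, commonly used spacing)."""
    if n_replicas < 2:
        raise ValueError("need at least two replicas")
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** k for k in range(n_replicas)]

ladder = temperature_ladder(300.0, 800.0, 6)
print([round(t, 1) for t in ladder])

# The cold replica holds the higher energy, so the swap is always accepted.
print(swap_probability(-100.0, 300.0, -110.0, 400.0))  # prints 1.0
```

Raising the top of the ladder widens the range of conformations the hottest replica can reach, at the cost of more replicas to keep neighboring swap probabilities reasonable.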
Affiliation(s)
- Mark A Olson
- Department of Cell Biology and Biochemistry, U.S. Army Medical Research Institute of Infectious Diseases, Frederick, Maryland 21702, USA.
|
48
|
Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A. Comparative protein structure modeling using Modeller. ACTA ACUST UNITED AC 2008; Chapter 5:Unit-5.6. [PMID: 18428767 DOI: 10.1002/0471250953.bi0506s15] [Citation(s) in RCA: 1805] [Impact Index Per Article: 106.2] [Indexed: 12/28/2022]
Abstract
Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software are also described.
Affiliation(s)
- Narayanan Eswar
- University of California at San Francisco, San Francisco, California
- Ben Webb
- University of California at San Francisco, San Francisco, California
- M S Madhusudhan
- University of California at San Francisco, San Francisco, California
- David Eramian
- University of California at San Francisco, San Francisco, California
- Min-Yi Shen
- University of California at San Francisco, San Francisco, California
- Ursula Pieper
- University of California at San Francisco, San Francisco, California
- Andrej Sali
- University of California at San Francisco, San Francisco, California
|
49
|
A historical perspective of template-based protein structure prediction. METHODS IN MOLECULAR BIOLOGY (CLIFTON, N.J.) 2008; 413:3-42. [PMID: 18075160 DOI: 10.1007/978-1-59745-574-9_1] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 12/12/2022]
Abstract
This chapter presents a broad, historical overview of the problem of protein structure prediction. Different structure prediction methods, including homology modeling, fold recognition (FR)/protein threading, ab initio/de novo approaches, and hybrid techniques involving multiple types of approaches, are introduced in a historical context. The progress of the field as a whole, especially in the threading/FR area, as reflected by the CASP/CAFASP contests, is reviewed. At the end of the chapter, we discuss the challenging issues ahead in the field of protein structure prediction.
|
50
|
Abstract
Genome sequencing projects have resulted in a rapid increase in the number of known protein sequences. In contrast, only about one-hundredth of these sequences have been characterized using experimental structure determination methods. Computational protein structure modeling techniques have the potential to bridge this sequence-structure gap. This chapter presents an example that illustrates the use of MODELLER to construct a comparative model for a protein with unknown structure. Automation of similar protocols has resulted in models of useful accuracy for domains in more than half of all known protein sequences.
Affiliation(s)
- Narayanan Eswar
- Department of Biopharmaceutical Sciences and California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, CA, USA
|