1
|
Kryshtafovych A, Montelione GT, Rigden DJ, Mesdaghi S, Karaca E, Moult J. Breaking the conformational ensemble barrier: Ensemble structure modeling challenges in CASP15. Proteins 2023; 91:1903-1911. [PMID: 37872703 PMCID: PMC10840738 DOI: 10.1002/prot.26584] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/12/2023] [Accepted: 08/14/2023] [Indexed: 10/25/2023]
Abstract
For the first time, the 2022 CASP (Critical Assessment of Structure Prediction) community experiment included a section on computing multiple conformations for protein and RNA structures. There was full or partial success in reproducing the ensembles for four of the nine targets, an encouraging result. For protein structures, enhanced sampling with variations of the AlphaFold2 deep learning method was by far the most effective approach. One substantial conformational change caused by a single mutation across a complex interface was accurately reproduced. In two other assembly modeling cases, methods succeeded in sampling conformations near to the experimental ones even though environmental factors were not included in the calculations. An experimentally derived flexibility ensemble allowed a single accurate RNA structure model to be identified. Difficulties included how to handle sparse or low-resolution experimental data and the current lack of effective methods for modeling RNA/protein complexes. However, these and other obstacles appear addressable.
Affiliation(s)
- Gaetano T Montelione
  - Department of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, New York, USA
- Daniel J Rigden
  - Institute of Systems, Molecular, and Integrative Biology, University of Liverpool, Liverpool, UK
- Shahram Mesdaghi
  - Institute of Systems, Molecular, and Integrative Biology, University of Liverpool, Liverpool, UK
  - Computational Biology Facility, MerseyBio, University of Liverpool, Liverpool, UK
- Ezgi Karaca
  - Izmir Biomedicine and Genome Center, Izmir, Turkey
  - Izmir International Biomedicine and Genome Institute, Dokuz Eylul University, Izmir, Turkey
- John Moult
  - Institute for Bioscience and Biotechnology Research, Rockville, Maryland, USA
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, Maryland, USA
|
2
|
Zhang D, Chen SJ, Zhou R. Modeling Noncanonical RNA Base Pairs by a Coarse-Grained IsRNA2 Model. J Phys Chem B 2021; 125:11907-11915. [PMID: 34694128 DOI: 10.1021/acs.jpcb.1c07288] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022]
Abstract
Noncanonical base pairs contribute crucially to the three-dimensional architecture of large RNA molecules; however, how to accurately model them remains an open challenge in RNA 3D structure prediction. Here, we report a promising coarse-grained (CG) IsRNA2 model to predict noncanonical base pairs in large RNAs through molecular dynamics simulations. By introducing a five-bead per nucleotide CG representation to reserve the three interacting edges of nucleobases, IsRNA2 accurately models various base-pairing interactions, including both canonical and noncanonical base pairs. A benchmark test indicated that IsRNA2 achieves a comparable performance to the atomic model in de novo modeling of noncanonical RNA structures. In addition, IsRNA2 was able to refine the 3D structure predictions for large RNAs in RNA-puzzle challenges. Finally, the graphics processing unit acceleration was introduced to speed up the sampling efficiency in IsRNA2 for very large RNA molecules. Therefore, the CG IsRNA2 model reported here offers a reliable approach to predict the structures and dynamics of large RNAs.
Affiliation(s)
- Dong Zhang
  - College of Life Sciences and Institute of Quantitative Biology, Zhejiang University, Hangzhou 310058, China
- Shi-Jie Chen
  - Department of Physics, Department of Biochemistry, and Institute of Data Science and Informatics, University of Missouri, Columbia, Missouri 65211, United States
- Ruhong Zhou
  - College of Life Sciences and Institute of Quantitative Biology, Zhejiang University, Hangzhou 310058, China
|
3
|
Structural Modeling and Ligand-Binding Prediction for Analysis of Structure-Unknown and Function-Unknown Proteins Using FORTE Alignment and PoSSuM Pocket Search. Methods Mol Biol 2020. [PMID: 32621216 DOI: 10.1007/978-1-0716-0708-4_1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/30/2023]
Abstract
Structural data of biomolecules, such as those of proteins and nucleic acids, provide much information for estimation of their functions. For structure-unknown proteins, structure information is obtainable by modeling their structures based on sequence similarity of proteins. Moreover, information related to ligands or ligand-binding sites is necessary to elucidate protein functions because the binding of ligands can engender not only the activation and inactivation of the proteins but also the modification of protein functions. This chapter presents methods using our profile-profile alignment server FORTE and the PoSSuM ligand-binding site database for prediction of the structure and potential ligand-binding sites of structure-unknown and function-unknown proteins, aimed at protein function prediction.
|
4
|
Gadzała M, Kalinowska B, Banach M, Konieczny L, Roterman I. Determining protein similarity by comparing hydrophobic core structure. Heliyon 2017; 3:e00235. [PMID: 28217749 PMCID: PMC5300504 DOI: 10.1016/j.heliyon.2017.e00235] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 10/01/2016] [Revised: 12/06/2016] [Accepted: 01/19/2017] [Indexed: 12/19/2022] Open
Abstract
Formal assessment of structural similarity is - next to protein structure prediction - arguably the most important unsolved problem in proteomics. In this paper we propose a similarity criterion based on commonalities between the proteins' hydrophobic cores. The hydrophobic core emerges as a result of conformational changes through which each residue reaches its intended position in the protein body. A quantitative criterion based on this phenomenon has been proposed in the framework of the CASP challenge. The structure of the hydrophobic core - including the placement and scope of any deviations from the idealized model - may indirectly point to areas of importance from the point of view of the protein's biological function. Our analysis focuses on an arbitrarily selected target from the CASP11 challenge. The proposed measure, while compliant with CASP criteria (70-80% correlation), involves certain adjustments which acknowledge the presence of factors other than simple spatial arrangement of solids.
Affiliation(s)
- M. Gadzała
  - AGH - Academic Computer Center − Cyfronet, Nawojki 11, Kraków 30-950, Poland
- B. Kalinowska
  - Faculty of Physics, Astronomy, Applied Computer Science − Jagiellonian University, Łojasiewicza 11, Kraków 30-348, Poland
- M. Banach
  - Department of Bioinformatics and Telemedicine, Jagiellonian University − Medical College, Łazarza 16, Krakow 31-530, Poland
- L. Konieczny
  - Chair of Medical Biochemistry, Jagiellonian University − Medical College, Kopernika 7, Kraków 31-034, Poland
- I. Roterman
  - Department of Bioinformatics and Telemedicine, Jagiellonian University − Medical College, Łazarza 16, Krakow 31-530, Poland
|
5
|
Shahlaei M, Mousavi A. A Conformational Analysis Study on the Melanocortin 4 Receptor Using Multiple Molecular Dynamics Simulations. Chem Biol Drug Des 2015; 86:309-21. [DOI: 10.1111/cbdd.12495] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 04/06/2014] [Revised: 05/29/2014] [Accepted: 06/13/2014] [Indexed: 12/28/2022]
Affiliation(s)
- Mohsen Shahlaei
  - Novel Drug Delivery Research Center; School of Pharmacy; Kermanshah University of Medical Sciences; Parastar Bolvar 6734667149 Kermanshah Iran
- Atefeh Mousavi
  - Student Research Committee; School of Pharmacy; Kermanshah University of Medical Sciences; Parastar Bolvar 6734667149 Kermanshah Iran
|
6
|
Khoury GA, Liwo A, Khatib F, Zhou H, Chopra G, Bacardit J, Bortot LO, Faccioli RA, Deng X, He Y, Krupa P, Li J, Mozolewska MA, Sieradzan AK, Smadbeck J, Wirecki T, Cooper S, Flatten J, Xu K, Baker D, Cheng J, Delbem ACB, Floudas CA, Keasar C, Levitt M, Popović Z, Scheraga HA, Skolnick J, Crivelli SN, Players F. WeFold: a coopetition for protein structure prediction. Proteins 2014; 82:1850-68. [PMID: 24677212 PMCID: PMC4249725 DOI: 10.1002/prot.24538] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Received: 08/23/2013] [Revised: 01/25/2014] [Accepted: 02/08/2014] [Indexed: 12/19/2022]
Abstract
The protein structure prediction problem continues to elude scientists. Despite the introduction of many methods, only modest gains were made over the last decade for certain classes of prediction targets. To address this challenge, a social-media based worldwide collaborative effort, named WeFold, was undertaken by 13 labs. During the collaboration, the laboratories were simultaneously competing with each other. Here, we present the first attempt at "coopetition" in scientific research applied to the protein structure prediction and refinement problems. The coopetition was possible by allowing the participating labs to contribute different components of their protein structure prediction pipelines and create new hybrid pipelines that they tested during CASP10. This manuscript describes both successes and areas needing improvement as identified throughout the first WeFold experiment and discusses the efforts that are underway to advance this initiative. A footprint of all contributions and structures are publicly accessible at http://www.wefold.org.
Affiliation(s)
- George A. Khoury
  - Department of Chemical and Biological Engineering, Princeton University, USA
- Adam Liwo
  - Faculty of Chemistry, University of Gdansk, Poland
- Firas Khatib
  - Department of Biochemistry, University of Washington, USA
- Hongyi Zhou
  - Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, USA
- Gaurav Chopra
  - Department of Structural Biology, School of Medicine, Stanford University, USA
  - Diabetes Center, School of Medicine, University of California San Francisco (UCSF), USA
- Jaume Bacardit
  - School of Computing Science, Newcastle University, United Kingdom
- Leandro O. Bortot
  - Laboratory of Biological Physics, Faculty of Pharmaceutical Sciences at Ribeirão Preto, University of São Paulo, Brazil
- Rodrigo A. Faccioli
  - Institute of Mathematical and Computer Sciences, University of São Paulo, Brazil
- Xin Deng
  - Department of Computer Science, University of Missouri, USA
- Yi He
  - Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301, USA
- Pawel Krupa
  - Faculty of Chemistry, University of Gdansk, Poland
  - Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301, USA
- Jilong Li
  - Department of Computer Science, University of Missouri, USA
- Magdalena A. Mozolewska
  - Faculty of Chemistry, University of Gdansk, Poland
  - Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301, USA
- James Smadbeck
  - Department of Chemical and Biological Engineering, Princeton University, USA
- Tomasz Wirecki
  - Faculty of Chemistry, University of Gdansk, Poland
  - Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301, USA
- Seth Cooper
  - Center for Game Science, Department of Computer Science & Engineering, University of Washington, USA
- Jeff Flatten
  - Center for Game Science, Department of Computer Science & Engineering, University of Washington, USA
- Kefan Xu
  - Center for Game Science, Department of Computer Science & Engineering, University of Washington, USA
- David Baker
  - Department of Biochemistry, University of Washington, USA
- Jianlin Cheng
  - Department of Computer Science, University of Missouri, USA
- Chen Keasar
  - Departments of Computer Science and Life Sciences, Ben Gurion University of the Negev, Israel
- Michael Levitt
  - Department of Structural Biology, School of Medicine, Stanford University, USA
- Zoran Popović
  - Center for Game Science, Department of Computer Science & Engineering, University of Washington, USA
- Harold A. Scheraga
  - Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301, USA
- Jeffrey Skolnick
  - Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, USA
|
7
|
Moult J, Fidelis K, Kryshtafovych A, Schwede T, Tramontano A. Critical assessment of methods of protein structure prediction (CASP)--round x. Proteins 2014; 82 Suppl 2:1-6. [PMID: 24344053 PMCID: PMC4394854 DOI: 10.1002/prot.24452] [Citation(s) in RCA: 312] [Impact Index Per Article: 31.2] [Received: 10/16/2013] [Accepted: 10/21/2013] [Indexed: 12/28/2022]
Abstract
This article is an introduction to the special issue of the journal PROTEINS, dedicated to the tenth Critical Assessment of Structure Prediction (CASP) experiment to assess the state of the art in protein structure modeling. The article describes the conduct of the experiment, the categories of prediction included, and outlines the evaluation and assessment procedures. The 10 CASP experiments span almost 20 years of progress in the field of protein structure modeling, and there have been enormous advances in methods and model accuracy in that period. Notable in this round is the first sustained improvement of models with refinement methods, using molecular dynamics. For the first time, we tested the ability of modeling methods to make use of sparse experimental three-dimensional contact information, such as may be obtained from new experimental techniques, with encouraging results. On the other hand, new contact prediction methods, though holding considerable promise, have yet to make an impact in CASP testing. The nature of CASP targets has been changing in recent CASPs, reflecting shifts in experimental structural biology, with more irregular structures, more multi-domain and multi-subunit structures, and less standard versions of known folds. When allowance is made for these factors, we continue to see steady progress in the overall accuracy of models, particularly resulting from improvement of non-template regions.
Affiliation(s)
- John Moult
  - Institute for Bioscience and Biotechnology Research, and Department of Cell Biology and Molecular Genetics, University of Maryland, Rockville, Maryland 20850
- Torsten Schwede
  - University of Basel, Biozentrum & SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Anna Tramontano
  - Department of Physics and Istituto Pasteur-Fondazione Cenci Bolognetti, Sapienza University of Rome, 00185 Rome, Italy
|
9
|
Roy A, Taddese B, Vohra S, Thimmaraju PK, Illingworth CJR, Simpson LM, Mukherjee K, Reynolds CA, Chintapalli SV. Identifying subset errors in multiple sequence alignments. J Biomol Struct Dyn 2013; 32:364-71. [PMID: 23527867 DOI: 10.1080/07391102.2013.770371] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/27/2022]
Abstract
Multiple sequence alignment (MSA) accuracy is important, but there is no widely accepted method of judging the accuracy that different alignment algorithms give. We present a simple approach to detecting two types of error, namely block shifts and the misplacement of residues within a gap. Given a MSA, subsets of very similar sequences are generated through the use of a redundancy filter, typically using a 70-90% sequence identity cut-off. Subsets thus produced are typically small and degenerate, and errors can be easily detected even by manual examination. The errors, albeit minor, are inevitably associated with gaps in the alignment, and so the procedure is particularly relevant to homology modelling of protein loop regions. The usefulness of the approach is illustrated in the context of the universal but little known [K/R]KLH motif that occurs in intracellular loop 1 of G protein coupled receptors (GPCR); other issues relevant to GPCR modelling are also discussed.
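The subset-generation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `pairwise_identity` and `similar_subsets`, the greedy grouping against each subset's first member, the gap-skipping identity definition, and the toy sequences are all assumptions made for illustration. Only the idea of filtering on a 70-90% sequence identity cut-off comes from the abstract.

```python
def pairwise_identity(a: str, b: str, gap: str = "-") -> float:
    """Fraction of identical residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns where either row is gapped
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def similar_subsets(msa: dict[str, str], cutoff: float = 0.8) -> list[list[str]]:
    """Greedily group rows whose identity to a subset's first member is >= cutoff.

    Assumed grouping rule (hypothetical); the paper may use a different filter.
    """
    subsets: list[list[str]] = []
    for name, seq in msa.items():
        for subset in subsets:
            if pairwise_identity(seq, msa[subset[0]]) >= cutoff:
                subset.append(name)
                break
        else:
            # no existing subset is similar enough, so start a new one
            subsets.append([name])
    return subsets


# Toy alignment (made up): three near-identical rows and one unrelated row.
msa = {
    "seqA": "MKLHAAG-TW",
    "seqB": "MKLHAAGSTW",
    "seqC": "MKLH-AGSTW",
    "seqD": "QRPEVVLDNC",
}
groups = similar_subsets(msa, cutoff=0.8)
print(groups)
```

Within each resulting subset the rows are nearly identical, so a residue misplaced inside a gap or a shifted block would stand out on inspection, which is the property the abstract relies on.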
Affiliation(s)
- Aparna Roy
  - School of Biological Sciences, University of Essex, Wivenhoe Park, Colchester, CO4 3SQ, UK
|
10
|
Kryshtafovych A, Fidelis K, Moult J. CASP9 results compared to those of previous CASP experiments. Proteins 2011; 79 Suppl 10:196-207. [PMID: 21997643 PMCID: PMC4180080 DOI: 10.1002/prot.23182] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.7] [Received: 05/01/2011] [Revised: 07/13/2011] [Accepted: 08/13/2011] [Indexed: 01/07/2023]
Abstract
The quality of structure models submitted to CASP9 is analyzed in the context of previous CASPs. Comparison methods are similar to those used in previous articles in this series, with the addition of new methods looking at model quality in regions not covered by a single best structural template, alignment accuracy, and progress for template-free models. Progress in this CASP was again modest and statistically hard to validate. Nevertheless, there are several positive trends. There is an indication of improvement in overall model quality for the midrange of template-based modeling difficulty, methods for identifying the best model from a set generated have improved, and there are strong indications of progress in the quality of template-free models of short proteins. In addition, the new examination of a model quality in regions of model not covered by the best available template reveals better performance than had previously been apparent.
Affiliation(s)
- Andriy Kryshtafovych
  - Genome Center, University of California-Davis, 451 Health Sciences Drive, Davis, CA 95616, USA.
|
11
|
Berrondo M, Gray JJ, Schleif R. Computational predictions of the mutant behavior of AraC. J Mol Biol 2010; 398:462-70. [PMID: 20338183 DOI: 10.1016/j.jmb.2010.03.021] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 09/28/2009] [Revised: 02/16/2010] [Accepted: 03/11/2010] [Indexed: 11/29/2022]
Abstract
An algorithm implemented in Rosetta correctly predicts the folding capabilities of the 17-residue N-terminal arm of the AraC gene regulatory protein when arabinose is bound to the protein and the dramatically different structure of this arm when arabinose is absent. The transcriptional activity of 43 mutant AraC proteins with alterations in the arm sequences was measured in vivo and compared with their predicted folding properties. Seventeen of the mutants possessed regulatory properties that could be directly compared with folding predictions. Sixteen of the 17 mutants were correctly predicted. The algorithm predicts that the N-terminal arm sequences of AraC homologs fold to the Escherichia coli AraC arm structure. In contrast, it predicts that random sequences of the same length and many partially randomized E. coli arm sequences do not fold to the E. coli arm structure. The high level of success shows that relatively "simple" computational methods can in some cases predict the behavior of mutant proteins with good reliability.
Affiliation(s)
- Monica Berrondo
  - Chemical and Biomolecular Engineering, Johns Hopkins University, 3400 North Charles Street, Baltimore, MD 21218, USA
|
12
|
Ilinkin I, Ye J, Janardan R. Multiple structure alignment and consensus identification for proteins. BMC Bioinformatics 2010; 11:71. [PMID: 20122279 PMCID: PMC2829528 DOI: 10.1186/1471-2105-11-71] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.9] [Received: 08/24/2009] [Accepted: 02/02/2010] [Indexed: 11/20/2022] Open
Abstract
Background: An algorithm is presented to compute a multiple structure alignment for a set of proteins and to generate a consensus (pseudo) protein which captures common substructures present in the given proteins. The algorithm represents each protein as a sequence of triples of coordinates of the alpha-carbon atoms along the backbone. It then computes iteratively a sequence of transformation matrices (i.e., translations and rotations) to align the proteins in space and generate the consensus. The algorithm is a heuristic in that it computes an approximation to the optimal alignment that minimizes the sum of the pairwise distances between the consensus and the transformed proteins. Results: Experimental results show that the algorithm converges quite rapidly and generates consensus structures that are visually similar to the input proteins. A comparison with other coordinate-based alignment algorithms (MAMMOTH and MATT) shows that the proposed algorithm is competitive in terms of speed and the sizes of the conserved regions discovered in an extensive benchmark dataset derived from the HOMSTRAD and SABmark databases. The algorithm has been implemented in C++ and can be downloaded from the project's web page. Alternatively, the algorithm can be used via a web server which makes it possible to align protein structures by uploading files from local disk or by downloading protein data from the RCSB Protein Data Bank. Conclusions: An algorithm is presented to compute a multiple structure alignment for a set of proteins, together with their consensus structure. Experimental results show its effectiveness in terms of the quality of the alignment and computational cost.
Affiliation(s)
- Ivaylo Ilinkin
  - Department of Computer Science, Gettysburg College, Gettysburg, PA, USA.
|
13
|
Kryshtafovych A, Fidelis K, Moult J. CASP8 results in context of previous experiments. Proteins 2010; 77 Suppl 9:217-28. [PMID: 19722266 DOI: 10.1002/prot.22562] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.4] [Indexed: 12/27/2022]
Abstract
The quality of structure models submitted to CASP8 is analyzed in the context of previous CASPs. To compare models from the latest experiment with their predecessors, we use the approaches consistent with the previous articles in this series. Using the basic evaluation measures accepted in CASP, there were no noticeable advances in the quality of the methods in any of the target difficulty categories. At the same time, there were three positive developments: (1) for set of the best models on each target, CASP8 registered the highest number of cases from all CASPs where alignment accuracy exceeded the maximum possible from the best template; (2) modeling accuracy of regions not present in the best template has improved; and (3) the loss in modeling quality from selection of nonoptimal models as the best ones submitted on the target has decreased.
|
14
|
McCauley MJ, Shokri L, Sefcikova J, Venclovas Č, Beuning PJ, Williams MC. Distinct double- and single-stranded DNA binding of E. coli replicative DNA polymerase III alpha subunit. ACS Chem Biol 2008; 3:577-87. [PMID: 18652472 PMCID: PMC2665888 DOI: 10.1021/cb8001107] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 11/29/2022]
Abstract
The α subunit of the replicative DNA polymerase III of Escherichia coli is the active polymerase of the 10-subunit bacterial replicase. The C-terminal region of the α subunit is predicted to contain an oligonucleotide binding (OB-fold) domain. In a series of optical tweezers experiments, the α subunit is shown to have an affinity for both double- and single-stranded DNA, in distinct subdomains of the protein. The portion of the protein that binds to double-stranded DNA stabilizes the DNA helix, because protein binding must be at least partially disrupted with increasing force to melt DNA. Upon relaxation, the DNA fails to fully reanneal, because bound protein interferes with the reformation of the double helix. In addition, the single-stranded DNA binding component appears to be passive, as the protein does not facilitate melting but instead binds to single-stranded regions already separated by force. From DNA stretching measurements we determine equilibrium association constants for the binding of α and several fragments to dsDNA and ssDNA. The results demonstrate that ssDNA binding is localized to the C-terminal region that contains the OB-fold domain, while a tandem helix-hairpin-helix (HhH)2 motif contributes significantly to dsDNA binding.
Affiliation(s)
- Micah J. McCauley
  - Department of Physics, Northeastern University, Boston, Massachusetts, 02115
- Leila Shokri
  - Department of Physics, Northeastern University, Boston, Massachusetts, 02115
  - Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts, 02115
- Jana Sefcikova
  - Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts, 02115
- Česlovas Venclovas
  - Laboratory of Bioinformatics, Institute of Biotechnology, Vilnius LT-02241, Lithuania
- Penny J. Beuning
  - Department of Chemistry and Chemical Biology, Northeastern University, Boston, Massachusetts, 02115
  - Center for Interdisciplinary Research on Complex Systems, Northeastern University, Boston, Massachusetts 02115
- Mark C. Williams
  - Department of Physics, Northeastern University, Boston, Massachusetts, 02115
  - Center for Interdisciplinary Research on Complex Systems, Northeastern University, Boston, Massachusetts 02115
|
15
|
Han R, Leo-Macias A, Zerbino D, Bastolla U, Contreras-Moreira B, Ortiz AR. An efficient conformational sampling method for homology modeling. Proteins 2008; 71:175-88. [PMID: 17985353 DOI: 10.1002/prot.21672] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Indexed: 11/07/2022]
Abstract
The structural refinement of protein models is a challenging problem in protein structure prediction (Moult et al., Proteins 2003;53(Suppl 6):334-339). Most attempts to refine comparative models lead to degradation rather than improvement in model quality, so most current comparative modeling procedures omit the refinement step. However, it has been shown that even in the absence of alignment errors and using optimal templates, methods based on a single template have intrinsic limitations, and that refinement is needed to improve model accuracy. It is thought that failure of current methods originates on one hand from the inaccuracy of the effective free energy functions adopted, which do not represent properly the energetic balance in the native state, and on the other hand from the difficulty to sample the high dimensional and rugged free energy landscape of protein folding, in the search for the global minimum. Here, we address this second issue. We define the evolutionary and vibrational harmonics subspace (EVA), a reduced sampling subspace that consists of a combination of evolutionarily favored directions, defined by the principal components of the structural variation within a homologous family, plus topologically favored directions, derived from the low frequency normal modes of the vibrational dynamics, up to 50 dimensions. This subspace is accurate enough so that the cores of most proteins can be represented within 1 Å accuracy, and reduced enough so that Replica Exchange Monte Carlo (Hukushima and Nemoto, J Phys Soc Jpn 1996;65:1604-1608; Hukushima et al., Int J Mod Phys C: Phys Comput 1996;7:337-344; Mitsutake et al., J Chem Phys 2003;118:6664-6675; Mitsutake et al., J Chem Phys 2003;118:6676-6688) (REMC) can be applied. REMC is one of the best sampling methods currently available, but its applicability is restricted to spaces of small dimensionality. We show that the combination of the EVA subspace and REMC can essentially solve the optimization problem for backbone atoms in the reduced sampling subspace, even for rather rugged free energy landscapes. Applications and limitations of this methodology are finally discussed.
Affiliation(s)
- Rongsheng Han
  - Bioinformatics Unit, Centro de Biología Molecular "Severo Ochoa" (CSIC-UAM), Universidad Autónoma de Madrid, Cantoblanco, Madrid, Spain
|
16
|
Abstract
This article describes the general quality of models of three dimensional structure submitted to CASP7 and analyzes progress since the previous experiment, primarily using measures that were used in earlier analyses. Overall improvement in model accuracy compared to CASP6 is modest, but there are two developments of note: server performance has moved closer to that of humans, and there has been a significant improvement in the fraction of targets for which the best model is superior to that obtainable using knowledge of a single best template structure.
|
17
|
Adamczak R, Meller J. On the transferability of folding and threading potentials and sequence-independent filters for protein folding simulations. Mol Phys 2007. [DOI: 10.1080/00268970410001728636] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/26/2022]
Affiliation(s)
- Rafal Adamczak
- Division of Biomedical Informatics, Children's Hospital Research Foundation, 3333 Burnet Avenue, Cincinnati, OH 45229, USA
- Jaroslaw Meller
- Division of Biomedical Informatics, Children's Hospital Research Foundation, 3333 Burnet Avenue, Cincinnati, OH 45229, USA
- Department of Informatics, Nicholas Copernicus University, 87-100 Toruń, Poland
|
18
|
Paiva ACM, Oliveira L, Horn F, Bywater RP, Vriend G. Modeling GPCRs. Ernst Schering Foundation Symposium Proceedings 2007:23-47. [PMID: 17703576 DOI: 10.1007/2789_2006_002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 01/26/2023]
Abstract
Many GPCR models have been built over the years for many different purposes, of which drug-design undoubtedly has been the most frequent one. The release of the structure of bovine rhodopsin in August 2000 enabled us to analyze models built before that period to learn things for the models we build today. We conclude that the GPCR modeling field is riddled with "common knowledge". Several characteristics of the bovine rhodopsin structure came as a big surprise, and had obviously not been predicted, which led to large errors in the models. Some of these surprises, however, could have been predicted if the modelers had more rigidly stuck to the rule that holds for all models, namely that a model should explain all experimental facts, and not just those facts that agree with the modeler's preconceptions.
Affiliation(s)
- A C M Paiva
- CMBI NCMLS, UMC, Geert Grooteplein 28, 6525 GA Nijmegen, The Netherlands
|
19
|
Saha RP, Chakrabarti P. Molecular modeling and characterization of Vibrio cholerae transcription regulator HlyU. BMC Structural Biology 2006; 6:24. [PMID: 17116251 PMCID: PMC1665450 DOI: 10.1186/1472-6807-6-24] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 08/17/2006] [Accepted: 11/20/2006] [Indexed: 11/15/2022]
Abstract
Background: The SmtB/ArsR family of prokaryotic metal-regulatory transcriptional repressors represses the expression of operons linked to stress-inducing concentrations of heavy metal ions, while derepression results from direct binding of metal ions by these 'metal-sensor' proteins. The HlyU protein from Vibrio cholerae is the positive regulator of the haemolysin gene; it also plays an important role in regulating the expression of the virulence genes. Although its biochemical properties are understood, its structure and relationship to other protein families remain unknown. Results: We find that HlyU exhibits structural features common to the SmtB/ArsR family of transcriptional repressors. Analysis of the modeled structure of HlyU reveals that it does not have the key metal-sensing residues which are unique to the SmtB/ArsR family of repressors, yet the tertiary structure is very similar to that of the family members. HlyU is the only member that has a positive control on transcription, while all the other members in the family are repressors. An evolutionary analysis with other SmtB/ArsR family members suggests that HlyU probably arose by gene duplication and mutational events that led to the emergence of this protein from an ancestral transcriptional repressor by the loss of the metal-binding sites. Conclusion: The study indicates that the same protein family can contain both a positive regulator of transcription and repressors, with the exact function being controlled by the absence or presence of metal-binding sites.
Affiliation(s)
- Rudra P Saha
- Department of Biochemistry, Bose Institute, P-1/12 CIT Scheme VIIM, Calcutta 700 054, India
- Pinak Chakrabarti
- Department of Biochemistry, Bose Institute, P-1/12 CIT Scheme VIIM, Calcutta 700 054, India
|
20
|
De Mori GMS, Colombo G, Micheletti C. Study of the Villin headpiece folding dynamics by combining coarse-grained Monte Carlo evolution and all-atom molecular dynamics. Proteins 2006; 58:459-71. [PMID: 15521059 DOI: 10.1002/prot.20313] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.9] [Indexed: 11/10/2022]
Abstract
The folding mechanism of the Villin headpiece (HP36) is studied by means of a novel approach which entails an initial coarse-grained Monte Carlo (MC) scheme followed by all-atom molecular dynamics (MD) simulations in explicit solvent. The MC evolution occurs in a simplified free-energy landscape and allows an efficient selection of marginally-compact structures which are taken as viable initial conformations for the MD. The coarse-grained MC structural representation is connected to the one with atomic resolution through a "fine-graining" reconstruction algorithm. This two-stage strategy is used to select and follow the dynamics of seven different unrelated conformations of HP36. In a notable case the MD trajectory rapidly evolves towards the folded state, yielding a typical root-mean-square deviation (RMSD) of the core region of only 2.4 Å from the closest NMR model (the typical RMSD over the whole structure being 4.0 Å). The analysis of the various MC-MD trajectories provides valuable insight into the details of the folding and mis-folding mechanisms and particularly about the delicate influence of local and nonlocal interactions in steering the folding process.
|
21
|
Adamczak R, Porollo A, Meller J. Combining prediction of secondary structure and solvent accessibility in proteins. Proteins 2006; 59:467-75. [PMID: 15768403 DOI: 10.1002/prot.20441] [Citation(s) in RCA: 222] [Impact Index Per Article: 12.3] [Indexed: 11/06/2022]
Abstract
Owing to the use of evolutionary information and advanced machine learning protocols, secondary structures of amino acid residues in proteins can be predicted from the primary sequence with more than 75% per-residue accuracy for the 3-state (i.e., helix, beta-strand, and coil) classification problem. In this work we investigate whether further progress may be achieved by incorporating the relative solvent accessibility (RSA) of an amino acid residue as a fingerprint of the overall topology of the protein. Toward that goal, we developed a novel method for secondary structure prediction that uses predicted RSA in addition to attributes derived from evolutionary profiles. Our general approach follows the 2-stage protocol of Rost and Sander, with a number of Elman-type recurrent neural networks (NNs) combined into a consensus predictor. The RSA is predicted using our recently developed regression-based method that provides real-valued RSA, with the overall correlation coefficients between the actual and predicted RSA of about 0.66 in rigorous tests on independent control sets. Using the predicted RSA, we were able to improve the performance of our secondary structure prediction by up to 1.4% and achieved the overall per-residue accuracy between 77.0% and 78.4% for the 3-state classification problem on different control sets comprising, together, 603 proteins without homology to proteins included in the training. The effects of including solvent accessibility depend on the quality of RSA prediction. In the limit of perfect prediction (i.e., when using the actual RSA values derived from known protein structures), the accuracy of secondary structure prediction increases by up to 4%. We also observed that projecting real-valued RSA into 2 discrete classes with the commonly used threshold of 25% RSA decreases the classification accuracy for secondary structure prediction. 
While the level of improvement of secondary structure prediction may be different for prediction protocols that implicitly account for RSA in other ways, we conclude that an increase in the 3-state classification accuracy may be achieved when combining RSA with a state-of-the-art protocol utilizing evolutionary profiles. The new method is available through a Web server at http://sable.cchmc.org.
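The per-residue accuracy figures quoted in this abstract are of the kind usually computed as the fraction of residues whose predicted state matches the observed one. A minimal sketch, assuming each structure is given as a string over the three states H (helix), E (beta-strand), and C (coil); the function name and the string encoding are assumptions for illustration, not the authors' evaluation code.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted 3-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)
```

For instance, `q3_accuracy("HHHEECCC", "HHHEEECC")` gives 0.875, because seven of the eight positions agree.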
Affiliation(s)
- Rafał Adamczak
- Biomedical Informatics, Children's Hospital Research Foundation, Cincinnati, Ohio 45229, USA
|
22
|
Jaubert S, Milac AL, Petrescu AJ, de Almeida-Engler J, Abad P, Rosso MN. In planta secretion of a calreticulin by migratory and sedentary stages of root-knot nematode. Molecular Plant-Microbe Interactions 2005; 18:1277-84. [PMID: 16478047 DOI: 10.1094/mpmi-18-1277] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Esophageal secretions from endoparasitic sedentary nematodes are thought to play key roles throughout plant parasitism, in particular during the invasion of the root tissue and the initiation and maintenance of the nematode feeding site (NFS) essential for nematode development. The secretion in planta of esophageal cell-wall-degrading enzymes by migratory juveniles has been shown, suggesting a role for these enzymes in the invasion phase. Nevertheless, the secretion of an esophageal gland protein into the NFS by nematode sedentary stages has never been demonstrated. The calreticulin Mi-CRT is a protein synthesized in the esophageal glands of the root-knot nematode Meloidogyne incognita. After three-dimensional modeling of the Mi-CRT protein, a surface peptide was selected to raise specific antibodies. In planta immunolocalization showed that Mi-CRT is secreted by migratory and sedentary stage nematodes, suggesting a role for Mi-CRT throughout parasitism. During the maintenance of the NFS, the secreted Mi-CRT was localized outside the nematode at the tip of the stylet. In addition, Mi-CRT accumulation was observed along the cell wall of the giant cells that compose the feeding site, providing evidence for a nematode esophageal protein secretion into the NFS.
Affiliation(s)
- Stéphanie Jaubert
- INRA-CNRS-UNSA, Plant-Microbe Interactions and Plant Health, 400 route des Chappes BP 167, 06903 Sophia Antipolis, France
|
23
|
Kudla U, Qin L, Milac A, Kielak A, Maissen C, Overmars H, Popeijus H, Roze E, Petrescu A, Smant G, Bakker J, Helder J. Origin, distribution and 3D-modeling of Gr-EXPB1, an expansin from the potato cyst nematode Globodera rostochiensis. FEBS Lett 2005; 579:2451-7. [PMID: 15848187 DOI: 10.1016/j.febslet.2005.03.047] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.7] [Received: 12/28/2004] [Revised: 02/24/2005] [Accepted: 03/13/2005] [Indexed: 10/25/2022]
Abstract
Southern analysis showed that Gr-EXPB1, a functional expansin from the potato cyst nematode Globodera rostochiensis, is a member of a multigene family, and EST data suggest that expansins are present in other plant parasitic nematodes as well. Homology modeling predicted that Gr-EXPB1 domain 1 (D1) has a flat beta-barrel structure with surface-exposed aromatic rings, whereas the 3D structure of Gr-EXPB1-D2 was remarkably similar to plant expansins. Gr-EXPB1 shows highest sequence similarity to two extracellular proteins from saprophytic soil-inhabiting Actinobacteria, and includes a bacterial type II carbohydrate-binding module. These results support the hypothesis that a number of pathogenicity factors of cyst nematodes are of procaryotic origin and were acquired by horizontal gene transfer.
Affiliation(s)
- Urszula Kudla
- Laboratory of Nematology, Graduate School for Experimental Plant Sciences, Wageningen University, The Netherlands
|
24
|
Wagner M, Adamczak R, Porollo A, Meller J. Linear Regression Models for Solvent Accessibility Prediction in Proteins. J Comput Biol 2005; 12:355-69. [PMID: 15857247 DOI: 10.1089/cmb.2005.12.355] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.6] [Indexed: 11/13/2022]
Abstract
The relative solvent accessibility (RSA) of an amino acid residue in a protein structure is a real number that represents the solvent exposed surface area of this residue in relative terms. The problem of predicting the RSA from the primary amino acid sequence can therefore be cast as a regression problem. Nevertheless, RSA prediction has so far typically been cast as a classification problem. Consequently, various machine learning techniques have been used within the classification framework to predict whether a given amino acid exceeds some (arbitrary) RSA threshold and would thus be predicted to be "exposed," as opposed to "buried." We have recently developed novel methods for RSA prediction using nonlinear regression techniques which provide accurate estimates of the real-valued RSA and outperform classification-based approaches with respect to commonly used two-class projections. However, while their performance seems to provide a significant improvement over previously published approaches, these Neural Network (NN) based methods are computationally expensive to train and involve several thousand parameters. In this work, we develop alternative regression models for RSA prediction which are computationally much less expensive, involve orders-of-magnitude fewer parameters, and are still competitive in terms of prediction quality. In particular, we investigate several regression models for RSA prediction using linear L1-support vector regression (SVR) approaches as well as standard linear least squares (LS) regression. Using rigorously derived validation sets of protein structures and extensive cross-validation analysis, we compare the performance of the SVR with that of LS regression and NN-based methods. In particular, we show that the flexibility of the SVR (as encoded by metaparameters such as the error insensitivity and the error penalization terms) can be very beneficial to optimize the prediction accuracy for buried residues. 
We conclude that the simple and computationally much more efficient linear SVR performs comparably to nonlinear models and thus can be used in order to facilitate further attempts to design more accurate RSA prediction methods, with applications to fold recognition and de novo protein structure prediction methods.
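The "error insensitivity" metaparameter mentioned in this abstract refers to the epsilon-insensitive loss used in support vector regression, which ignores residuals smaller than a tolerance and penalizes larger ones linearly. The sketch below contrasts it with the squared loss of least squares regression. It illustrates the loss terms only, under the assumption that RSA values are given in percent; it does not include regularization or a solver, and it is not the authors' implementation.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Sum of max(0, |residual| - epsilon): residuals inside the tube cost nothing."""
    return sum(max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred))

def squared_loss(y_true, y_pred):
    """Sum of squared residuals, as minimized by ordinary least squares."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
```

With observed values `[10, 50, 90]`, predictions `[15, 50, 60]`, and `epsilon=10`, the residuals are 5, 0, and 30: the epsilon-insensitive loss is 20.0 (only the third residual contributes), while the squared loss is 925, dominated by that same outlier.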
Affiliation(s)
- Michael Wagner
- Division of Biomedical Informatics, Cincinnati Children's Hospital Research Foundation, 3333 Burnet Avenue, Cincinnati, OH 45229, USA
|
25
|
Adamczak R, Porollo A, Meller J. Accurate prediction of solvent accessibility using neural networks-based regression. Proteins 2004; 56:753-67. [PMID: 15281128 DOI: 10.1002/prot.20176] [Citation(s) in RCA: 212] [Impact Index Per Article: 10.6] [Indexed: 11/07/2022]
Abstract
Accurate prediction of relative solvent accessibilities (RSAs) of amino acid residues in proteins may be used to facilitate protein structure prediction and functional annotation. Toward that goal we developed a novel method for improved prediction of RSAs. Contrary to other machine learning-based methods from the literature, we do not impose a classification problem with arbitrary boundaries between the classes. Instead, we seek a continuous approximation of the real-valued RSA using nonlinear regression, with several feed-forward and recurrent neural networks, which are then combined into a consensus predictor. A set of 860 protein structures derived from the PFAM database was used for training, whereas validation of the results was carefully performed on several nonredundant control sets comprising a total of 603 structures that were derived from new Protein Data Bank structures and had no homology to proteins included in the training. Two classes of alternative predictors were developed for comparison with the regression-based approach: one based on the standard classification approach and the other based on a semicontinuous approximation with the so-called thermometer encoding. Furthermore, a weighted approximation, with errors being scaled by the observed levels of variability in RSA for equivalent residues in families of homologous structures, was applied in order to improve the results. The effects of including evolutionary profiles and the growth of sequence databases were assessed. In accord with the observed levels of variability in RSA for different ranges of RSA values, the regression accuracy is higher for buried than for exposed residues, with overall 15.3-15.8% mean absolute errors and correlation coefficients between the predicted and experimental values of 0.64-0.67 on different control sets.
The new method outperforms classification-based algorithms when the real value predictions are projected onto two-class classification problems with several commonly used thresholds to separate exposed and buried residues. For example, classification accuracy of about 77% is consistently achieved on all control sets with a threshold of 25% RSA. A web server that enables RSA prediction using the new method and provides customizable graphical representation of the results is available at http://sable.cchmc.org.
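The two evaluation measures named in this abstract, the mean absolute error of real-valued RSA predictions and the accuracy after projecting onto two classes at a threshold such as 25% RSA, can be sketched as follows. The function names are assumptions, RSA is assumed to be in percent, and treating a residue as exposed when its RSA is strictly greater than the threshold is also an assumption; the abstract does not state how ties are handled.

```python
def mean_absolute_error(predicted, observed):
    """Average absolute difference between predicted and observed RSA (in percent)."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)

def two_class_accuracy(predicted, observed, threshold=25.0):
    """Fraction of residues placed in the same buried/exposed class by both series.

    A residue counts as exposed when RSA > threshold (assumed convention).
    """
    agree = sum((p > threshold) == (o > threshold) for p, o in zip(predicted, observed))
    return agree / len(observed)
```

For predictions `[10, 30, 60, 20]` against observations `[5, 20, 70, 40]`, the mean absolute error is 11.25 and the two-class accuracy at 25% is 0.5, which shows how a modest real-valued error can still flip residues that sit near the threshold.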
Affiliation(s)
- Rafał Adamczak
- Children's Hospital Research Foundation, Cincinnati, Ohio, USA
|
26
|
Marabotti A, D'Auria S, Rossi M, Facchiano AM. Theoretical model of the three-dimensional structure of a sugar-binding protein from Pyrococcus horikoshii: structural analysis and sugar-binding simulations. Biochem J 2004; 380:677-84. [PMID: 15015939 PMCID: PMC1224218 DOI: 10.1042/bj20031876] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Received: 12/04/2003] [Revised: 03/11/2004] [Accepted: 03/12/2004] [Indexed: 11/17/2022]
Abstract
The three-dimensional structure of a sugar-binding protein from the thermophilic archaeon Pyrococcus horikoshii has been predicted by a homology modelling procedure and investigated for its stability and its ability to bind different sugars. The model was created by using as templates the three-dimensional structures of a maltodextrin-binding protein from Pyrococcus furiosus, a trehalose-maltose-binding protein from Thermococcus litoralis and a maltodextrin-binding protein from Escherichia coli. According to the suggestions from the CASP (Critical Assessment of Structure Prediction) meetings, the homology modelling strategy was applied by assessing an accurate multiple sequence alignment, based on the high structural conservation in the family of ATP-binding cassette transporters to which all these proteins belong. The model has been deposited in the Protein Data Bank with the code 1R25. In keeping with the origin of the protein, several characteristics in the organization of the secondary-structure elements and in the distribution of polar and non-polar amino acids are very similar to those of thermophilic proteins, compared with proteins from mesophilic organisms, and are analysed in detail. Finally, a simulation of the binding of several sugars in the binding site of this protein is presented, and interactions with amino acids are highlighted in detail.
Affiliation(s)
- Anna Marabotti
- Laboratory of Bioinformatics, Institute of Food Science, Italian National Research Council, Via Roma 52A/C, 83100 Avellino, Italy
|
27
|
Abstract
The accuracy of an alignment between two protein sequences can be improved by including other detectably related sequences in the comparison. We optimize and benchmark such an approach that relies on aligning two multiple sequence alignments, each one including one of the two protein sequences. Thirteen different protocols for creating and comparing profiles corresponding to the multiple sequence alignments are implemented in the SALIGN command of MODELLER. A test set of 200 pairwise, structure-based alignments with sequence identities below 40% is used to benchmark the 13 protocols as well as a number of previously described sequence alignment methods, including heuristic pairwise sequence alignment by BLAST, pairwise sequence alignment by global dynamic programming with an affine gap penalty function by the ALIGN command of MODELLER, sequence-profile alignment by PSI-BLAST, Hidden Markov Model methods implemented in SAM and LOBSTER, pairwise sequence alignment relying on predicted local structure by SEA, and multiple sequence alignment by CLUSTALW and COMPASS. The alignment accuracies of the best new protocols were significantly better than those of the other tested methods. For example, the fraction of the correctly aligned residues relative to the structure-based alignment by the best protocol is 56%, which can be compared with the accuracies of 26%, 42%, 43%, 48%, 50%, 49%, 43%, and 43% for the other methods, respectively. The new method is currently applied to large-scale comparative protein structure modeling of all known sequences.
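The benchmark measure described in this abstract, the fraction of correctly aligned residues relative to a structure-based reference alignment, can be sketched for a pairwise alignment given as two gapped rows. The representation, the gap character, and the function names are assumptions for illustration; this is not the SALIGN or MODELLER code.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (index_in_a, index_in_b) residue pairs placed in the same column."""
    pairs = set()
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs

def fraction_correct(test_pairs, reference_pairs):
    """Share of reference residue pairs that the test alignment reproduces."""
    return len(test_pairs & reference_pairs) / len(reference_pairs)
```

Taking `("ACDEF", "ACDEF")` as the reference and `("ACD-EF", "AC-DEF")` as the test alignment, four of the five reference pairs are reproduced, so the fraction correct is 0.8.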
Affiliation(s)
- Marc A Marti-Renom
- Mission Bay Genentech Hall, University of California, San Francisco, San Francisco, CA 94143, USA.
|
28
|
Gardner PP, Giegerich R. A comprehensive comparison of comparative RNA structure prediction approaches. BMC Bioinformatics 2004; 5:140. [PMID: 15458580 PMCID: PMC526219 DOI: 10.1186/1471-2105-5-140] [Citation(s) in RCA: 260] [Impact Index Per Article: 13.0] [Received: 08/12/2004] [Accepted: 09/30/2004] [Indexed: 11/10/2022]
Abstract
BACKGROUND: An increasing number of researchers have released novel RNA structure analysis and prediction algorithms for comparative approaches to structure prediction. Yet, independent benchmarking of these algorithms is rarely performed, as is now common practice for protein-folding, gene-finding and multiple-sequence-alignment algorithms. RESULTS: Here we evaluate a number of RNA folding algorithms using reliable RNA data-sets and compare their relative performance. CONCLUSIONS: We conclude that comparative data can enhance structure prediction but structure-prediction algorithms vary widely in terms of both sensitivity and selectivity across different lengths and homologies. Furthermore, we outline some directions for future research.
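Sensitivity and selectivity for RNA secondary structure prediction are commonly computed over base pairs. The sketch below uses the plain definitions (true positives over reference pairs, and true positives over predicted pairs); the benchmark itself may treat some false positives differently, so this is an assumption-laden illustration rather than the paper's exact scoring. The function name is hypothetical.

```python
def sensitivity_and_selectivity(predicted_pairs, reference_pairs):
    """Return (sensitivity, selectivity) for two collections of base pairs.

    Each base pair is an (i, j) tuple of sequence positions; order within a pair is ignored.
    """
    predicted = {tuple(sorted(p)) for p in predicted_pairs}
    reference = {tuple(sorted(p)) for p in reference_pairs}
    true_positives = len(predicted & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0
    selectivity = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, selectivity
```

With reference pairs `[(1, 20), (2, 19), (3, 18), (4, 17)]` and predicted pairs `[(1, 20), (19, 2), (5, 16)]`, two pairs are shared, giving a sensitivity of 0.5 and a selectivity of 2/3.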
Affiliation(s)
- Paul P Gardner
- Department of Evolutionary Biology, University of Copenhagen, Universitetsparken 15, 2100 Copenhagen Ø, Denmark
- Robert Giegerich
- Faculty of Technology, University of Bielefeld, PO Box 10 01 31, 33501 Bielefeld, Germany
|
29
|
Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A. The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure information on human protein variants. Hum Mutat 2004; 23:464-70. [PMID: 15108278 DOI: 10.1002/humu.20021] [Citation(s) in RCA: 104] [Impact Index Per Article: 5.2] [Indexed: 11/12/2022]
Abstract
Missense mutation leading to single amino acid polymorphism (SAP) is the type of mutation most frequently related to human diseases. The Swiss-Prot protein knowledgebase records information on such mutations in various sections of a protein entry, namely in the "feature," "comment," and "reference" fields. To facilitate users in obtaining the most relevant information about each human SAP recorded in the knowledgebase, the Swiss-Prot Variant web pages were created to provide a summary of available sequence information, as well as additional structural information on each variant. In particular, the ModSNP database was set up to store information related to SAPs and to manage the modeling of SAPs onto protein structures via an automatic homology modeling pipeline. Currently, among the 16,566 human SAPs recorded in the Swiss-Prot knowledgebase (release 42.5, 21 November 2003), more than 25% have corresponding 3D-models. Of these variants, 47% are related to disease, 26% are polymorphisms, and 27% are not yet clearly classified. The ModSNP database is updated and the subsequent model construction pipeline is launched with each weekly Swiss-Prot release. Thus, the ModSNP database represents a valuable resource for the structural analysis of protein variation. The Swiss-Prot variant pages are accessible from the NiceProt view of a Swiss-Prot entry on the ExPASy server (www.expasy.org/), via a hyperlink created for the stable and unique identifier FTId of each human SAP.
Affiliation(s)
- Yum L Yip
- Swiss-Prot Group, Swiss Institute of Bioinformatics, Centre Médical Universitaire, Geneva, Switzerland.
|
30
|
Abstract
Empirical force field-based studies of biological macromolecules are becoming a common tool for investigating their structure-activity relationships at an atomic level of detail. Such studies facilitate interpretation of experimental data and allow for information not readily accessible to experimental methods to be obtained. A large part of the success of empirical force field-based methods is due to the quality of the force fields combined with the algorithmic advances that allow for more accurate reproduction of experimental observables. Presented is an overview of the issues associated with the development and application of empirical force fields to biomolecular systems. This is followed by a summary of the force fields commonly applied to the different classes of biomolecules: proteins, nucleic acids, lipids, and carbohydrates. In addition, issues associated with computational studies on "heterogeneous" biomolecular systems and the transferability of force fields to a wide range of organic molecules of pharmacological interest are discussed.
Affiliation(s)
- Alexander D Mackerell
- Department of Pharmaceutical Sciences, School of Pharmacy, University of Maryland, 20 Penn Street, Baltimore, Maryland 21201, USA.
|
31
|
Das R, Gerstein M. A method using active-site sequence conservation to find functional shifts in protein families: application to the enzymes of central metabolism, leading to the identification of an anomalous isocitrate dehydrogenase in pathogens. Proteins 2004; 55:455-63. [PMID: 15048835 DOI: 10.1002/prot.10639] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
We have introduced a method to identify functional shifts in protein families. Our method is based on the calculation of an active-site conservation ratio, which we call the "ASC ratio." For a structurally based alignment of a protein family, this ratio is the average sequence similarity of the active-site region compared to the full-length protein. The active-site region is defined as all the residues within a certain radius of the known functionally important groups. Using our method, we have analyzed enzymes of central metabolism from a large number of genomes (35). We found that for most of the enzymes, the active-site region is more highly conserved than the full-length sequence. However, for three tricarboxylic acid (TCA)-cycle enzymes, active-site sequences are considerably more diverged (than full-length ones). In particular, we were able to identify in six pathogens a novel isocitrate dehydrogenase that has very low sequence similarity around the active site. Detailed sequence-structure analysis indicates that while the active-site structure of isocitrate dehydrogenase is most likely similar between pathogens and nonpathogens, the unusual sequence divergence could result from an extra domain added at the N-terminus. This domain has a leucine-rich motif similar to one in the Yersinia pestis cytotoxin and may therefore confer additional pathogenic functions.
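One way to read the ASC ratio definition is as the mean pairwise similarity over the active-site columns of an alignment divided by the mean pairwise similarity over all columns. The sketch below follows that reading and makes several assumptions that the abstract does not confirm: percent identity stands in for "sequence similarity", the alignment is gap-free, and the active-site columns are supplied directly rather than derived from a distance cutoff in a structure. The function names are hypothetical.

```python
from itertools import combinations

def mean_pairwise_identity(alignment, columns):
    """Average fraction of identical residues over all sequence pairs, restricted to columns."""
    cols = list(columns)
    scores = []
    for seq_a, seq_b in combinations(alignment, 2):
        same = sum(seq_a[c] == seq_b[c] for c in cols)
        scores.append(same / len(cols))
    return sum(scores) / len(scores)

def asc_ratio(alignment, active_site_columns):
    """Active-site identity divided by full-length identity (assumed reading of the ASC ratio)."""
    full = mean_pairwise_identity(alignment, range(len(alignment[0])))
    site = mean_pairwise_identity(alignment, active_site_columns)
    return site / full  # raises ZeroDivisionError if the sequences share no residues
```

For the toy alignment `["ACDEFG", "ACDQFH", "TCDEYG"]` with active-site columns `[1, 2]`, the active-site identity is 1.0 and the full-length identity is 10/18, so the ratio is about 1.8. A value above 1 corresponds to the usual case described in the abstract, where the active site is more conserved than the sequence as a whole.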
Affiliation(s)
- Rajdeep Das
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520, USA
|
32
|
Oliveira L, Hulsen T, Lutje Hulsik D, Paiva ACM, Vriend G. Heavier-than-air flying machines are impossible. FEBS Lett 2004; 564:269-73. [PMID: 15111108 DOI: 10.1016/s0014-5793(04)00320-5] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.1] [Received: 12/17/2003] [Accepted: 02/23/2004] [Indexed: 02/08/2023]
Abstract
Many G protein-coupled receptor (GPCR) models have been built over the years. The release of the structure of bovine rhodopsin in August 2000 enabled us to analyze models built before that period to learn more about the models we build today. We conclude that the GPCR modelling field is riddled with 'common knowledge' similar to Lord Kelvin's remark in 1895 that "heavier-than-air flying machines are impossible", and we summarize what we think are the (im)possibilities of modelling GPCRs using the coordinates of bovine rhodopsin as a template. Associated WWW pages: www.gpcr.org/articles/2003_mod
Affiliation(s)
- L Oliveira
- Escola Paulista de Medicina, Sao Paulo, Brazil
|
33
|
Lau AY, Chasman DI. Functional classification of proteins and protein variants. Proc Natl Acad Sci U S A 2004; 101:6576-81. [PMID: 15087495 PMCID: PMC404087 DOI: 10.1073/pnas.0305043101] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Indexed: 11/18/2022]
Abstract
To help characterize the diversity in biological function of proteins emerging from the analysis of whole genomes, we present an operational definition of biological function that provides an explicit link between the functional classification of proteins and the effects of genetic variation or mutation on protein function. Using phylogenetic information, we establish definite criteria for functional relatedness among proteins and a companion procedure for predicting deleterious alleles or mutations. Applied to the functional classification of sequences similar to 13 human tumor suppressor proteins, our methods predict there are functional properties unique to mammals for three of them, BRCA1, BRCA2, and WT1. We examine protein variants caused by nonsynonymous single-nucleotide polymorphisms in a set of clinically important genes and estimate the magnitude of a disproportionate propensity for disruption of function among the nonsynonymous single-nucleotide polymorphisms that are maintained at low frequency in the human population.
Affiliation(s)
- Albert Y Lau
- Variagenics, Incorporated, 60 Hampshire Street, Cambridge, MA 02139, USA.
|
34
|
Kosinski J, Cymerman IA, Feder M, Kurowski MA, Sasin JM, Bujnicki JM. A "FRankenstein's monster" approach to comparative modeling: merging the finest fragments of Fold-Recognition models and iterative model refinement aided by 3D structure evaluation. Proteins 2004; 53 Suppl 6:369-79. [PMID: 14579325 DOI: 10.1002/prot.10545] [Citation(s) in RCA: 138] [Impact Index Per Article: 6.9] [Indexed: 11/09/2022]
Abstract
We applied a new multi-step protocol to predict the structures of all targets during CASP5, regardless of their potential category. 1) We used diverse fold-recognition (FR) methods to generate initial target-template alignments, which were converted into preliminary full-atom models by comparative modeling. All preliminary models were evaluated (scored) by VERIFY3D to identify well- and poorly-folded fragments. 2) Preliminary models with similar 3D folds were superimposed, poorly-scoring regions were deleted and the "average model" structure was created by merging the remaining segments. All template structures reported by FR were superimposed and a composite multiple-structure template was created from the most conserved fragments. 3) The average model was superimposed onto the composite template and the structure-based target-template alignment was inferred. This alignment was used to build a new (intermediate) comparative model of the target, again scored with VERIFY3D. 4) For all poorly scoring regions, a series of alternative alignments was generated by progressively shifting the "unfit" sequence fragment in either direction. Here, we considered additional information, such as secondary structure, placement of insertions and deletions in loops, conservation of putative catalytic residues, and the necessity to obtain a compact, well-folded structure. For all alternative alignments, new models were built and evaluated. 5) All models were superimposed and the "FRankenstein's monster" (FR, fold recognition) model was built from best-scoring segments. The final model was obtained after limited energy minimization to remove steric clashes between sidechains from different fragments. The novelty of this approach is in the focus on "vertical" recombination of structure fragments, typical for the ab initio field, rather than "horizontal" sequence alignment typical for comparative modeling.
We tested the usefulness of the "FRankenstein" approach for non-expert predictors: only the leader of our team had considerable experience in protein modeling - he registered as a separate group (020) and submitted models built only by himself. At the onset of CASP5, the other five members of the team (students) had very little or no experience with modeling. They followed the same protocol in a deliberately naïve way. In the fourth step they used solely the VERIFY3D criterion to compare their models and the leader's model (the latter regarded only as one of the many alternatives) and generated the hybrid or selected only one model for submission (group 517). In order to compare our protocol with the traditional "one target-one template-one alignment" approach, we submitted (as a separate group 242) models selected from those automatically generated by all CAFASP servers (i.e. obtained without any human intervention). Here, we compare the results obtained by the three "groups", describe successes and failures of the "FRankenstein" approach and discuss future developments of comparative modeling. The automatic version of our multi-step protocol is being developed as a meta-server; the prototype is freely available at http://genesilico.pl/meta/.
Affiliation(s)
- Jan Kosinski
- Bioinformatics Laboratory, International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw, Poland
|
35
|
Moult J, Fidelis K, Zemla A, Hubbard T. Critical assessment of methods of protein structure prediction (CASP)-round V. Proteins 2004; 53 Suppl 6:334-9. [PMID: 14579322 DOI: 10.1002/prot.10556] [Citation(s) in RCA: 184] [Impact Index Per Article: 9.2] [Indexed: 11/08/2022]
Abstract
This article provides an introduction to the special issue of the journal Proteins dedicated to the fifth CASP experiment to assess the state of the art in protein structure prediction. The article describes the conduct, the categories of prediction, and the evaluation and assessment procedures of the experiment. A brief summary of progress over the five CASP experiments is provided. Related developments in the field are also described.
Affiliation(s)
- John Moult
- Center for Advanced Research in Biotechnology, University of Maryland Biotechnology Institute, Rockville, Maryland 20850, USA.
|
36
|
Venclovas C, Zemla A, Fidelis K, Moult J. Assessment of progress over the CASP experiments. Proteins 2004; 53 Suppl 6:585-95. [PMID: 14579350 DOI: 10.1002/prot.10530] [Citation(s) in RCA: 93] [Impact Index Per Article: 4.7] [Indexed: 11/09/2022]
Abstract
The quality of structure models produced in the CASP5 experiment has been compared with that in earlier CASPs. The most significant progress is in the fold recognition regime, where the development of meta-servers has allowed more accurate consensus models to be generated. In contrast to this, there is little evidence of progress in producing more accurate comparative models, particularly those based on sequence identities > 30%. For comparative models based on low-sequence identity and for fold recognition models, accuracy depends primarily on the fraction of the target structure that is similar to an available template, and the quality of the alignment. Overall, these results indicate that there are still no effective methods of improving model quality beyond that obtained by successfully copying a template structure. For models of proteins with previously unknown folds, there appears to be a pause in the previous consistent improvement. There is some evidence that more groups are producing top-quality models, however. Although specific progress between successive experiments is sometimes difficult to identify, over the history of all the CASPs there has been steady, if sometimes slow, progress in all modeling regimes.
Affiliation(s)
- Ceslovas Venclovas
- Biology and Biotechnology Research Program, Lawrence Livermore National Laboratory, Livermore, California, USA
|
37
|
Fan H, Mark AE. Refinement of homology-based protein structures by molecular dynamics simulation techniques. Protein Sci 2004; 13:211-20. [PMID: 14691236 PMCID: PMC2286528 DOI: 10.1110/ps.03381404] [Citation(s) in RCA: 124] [Impact Index Per Article: 6.2] [Received: 08/20/2003] [Revised: 09/10/2003] [Accepted: 09/10/2003] [Indexed: 10/26/2022]
Abstract
The use of classical molecular dynamics simulations, performed in explicit water, for the refinement of structural models of proteins generated ab initio or based on homology has been investigated. The study involved a test set of 15 proteins that were previously used by Baker and coworkers to assess the efficiency of the ROSETTA method for ab initio protein structure prediction. For each protein, four models generated using the ROSETTA procedure were simulated for periods of between 5 and 400 nsec in explicit solvent, under identical conditions. In addition, the experimentally determined structure and the experimentally derived structure in which the side chains of all residues had been deleted and then regenerated using the WHATIF program were simulated and used as controls. A significant improvement in the deviation of the model structures from the experimentally determined structures was observed in several cases. In addition, it was found that in certain cases in which the experimental structure deviated rapidly from the initial structure in the simulations, indicating internal strain, the structures were more stable after regenerating the side-chain positions. Overall, the results indicate that molecular dynamics simulations on a tens to hundreds of nanoseconds time scale are useful for the refinement of homology or ab initio models of small to medium-size proteins.
Affiliation(s)
- Hao Fan
- Groningen Biomolecular Sciences and Biotechnology Institute (GBB), Department of Biophysical Chemistry, University of Groningen, 9747 AG Groningen, The Netherlands
|
38
|
Evers A, Gohlke H, Klebe G. Ligand-supported homology modelling of protein binding-sites using knowledge-based potentials. J Mol Biol 2003; 334:327-45. [PMID: 14607122 DOI: 10.1016/j.jmb.2003.09.032] [Citation(s) in RCA: 61] [Impact Index Per Article: 2.9] [Indexed: 10/26/2022]
Abstract
A new approach, MOBILE, is presented that models protein binding-sites including bound ligand molecules as restraints. Initially generated homology models of the target protein are refined iteratively by including information about bioactive ligands as spatial restraints and optimising the mutual interactions between the ligands and the binding-sites. Thus optimised models can be used for structure-based drug design and virtual screening. In a first step, ligands are docked into an averaged ensemble of crude homology models of the target protein. In the next step, improved homology models are generated, considering explicitly the previously placed ligands by defining restraints between protein and ligand atoms. These restraints are expressed in terms of knowledge-based distance-dependent pair potentials, which were compiled from crystallographically determined protein-ligand complexes. Subsequently, the most favourable models are selected by ranking the interactions between the ligands and the generated pockets using these potentials. Final models are obtained by selecting the best-ranked side-chain conformers from various models, followed by an energy optimisation of the entire complex using a common force-field. Application of the knowledge-based pair potentials proved efficient to restrain the homology modelling process and to score and optimise the modelled protein-ligand complexes. For a test set of 46 protein-ligand complexes, taken from the Protein Data Bank (PDB), the success rate of producing near-native binding-site geometries (rmsd < 2.0 Å) with MODELLER is 70% when the ligand restrains the homology modelling process in its native orientation. Scoring these complexes with the knowledge-based potentials, in 66% of the cases a pose with rmsd < 2.0 Å is found on rank 1. Finally, MOBILE has been applied to two case studies modelling factor Xa based on trypsin and aldose reductase based on aldehyde reductase.
Affiliation(s)
- Andreas Evers
- Institute of Pharmaceutical Chemistry, University of Marburg, Marbacher Weg 6, D-35032 Marburg, Germany
|
39
|
Venclovas C. Comparative modeling in CASP5: Progress is evident, but alignment errors remain a significant hindrance. Proteins 2003; 53 Suppl 6:380-8. [PMID: 14579326 DOI: 10.1002/prot.10591] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.0] [Indexed: 11/06/2022]
Abstract
Models for 20 comparative modeling targets were submitted for the fifth round of the "blind" test of protein structure prediction methods (CASP5; http://predictioncenter.llnl.gov/casp5). The modeling approach used in CASP5 was similar to that used 2 years ago in CASP4 (Venclovas, Proteins 2001; Suppl 5:47-54). The main features of this approach include use of multiple templates, initial assessment of alignment reliability in a region-specific manner, and structure-based selection of alignment variants in unreliable regions. The CASP5 modeling results presented here show significant improvement in comparison to CASP4, especially in the area of distant homology. The improvements include more effective use of multiple templates and better alignments. However, a number of structurally conserved regions in submitted distant homology models were misaligned. Analysis of these errors indicates that the absolute majority of them occurred in regions deemed unreliable in the course of model building. Most of these error-prone regions can be characterized by their peripheral location and a lack of conserved sequence patterns. For a few of the error-prone regions, all methods evaluated during CASP5 proved ineffective, pointing to the need for more sensitive energy-based methods. Despite these remaining issues, the applicability of comparative modeling continues to expand into more distant evolutionary relationships, providing a means to structurally characterize a significant number of currently available protein sequences.
Affiliation(s)
- Ceslovas Venclovas
- Biology and Biotechnology Research Program, Lawrence Livermore National Laboratory, Livermore, California 94551, USA.
|
40
|
Singh SM, Murray D. Molecular modeling of the membrane targeting of phospholipase C pleckstrin homology domains. Protein Sci 2003; 12:1934-53. [PMID: 12930993 PMCID: PMC2323991 DOI: 10.1110/ps.0358803] [Citation(s) in RCA: 57] [Impact Index Per Article: 2.7] [Indexed: 12/29/2022]
Abstract
Phospholipases C (PLCs) reversibly associate with membranes to hydrolyze phosphatidylinositol-4,5-bisphosphate (PI[4,5]P2) and comprise four main classes: beta, gamma, delta, and epsilon. Most eukaryotic PLCs contain a single, N-terminal pleckstrin homology (PH) domain, which is thought to play an important role in membrane targeting. The structure of a single PLC PH domain, that from PLCdelta1, has been determined; this PH domain binds PI(4,5)P2 with high affinity and stereospecificity and has served as a paradigm for PH domain functionality. However, experimental studies demonstrate that PH domains from different PLC classes exhibit diverse modes of membrane interaction, reflecting the dissimilarity in their amino acid sequences. To elucidate the structural basis for their differential membrane-binding specificities, we modeled the three-dimensional structures of all mammalian PLC PH domains by using bioinformatic tools and calculated their biophysical properties by using continuum electrostatic approaches. Our computational analysis accounts for a large body of experimental data, provides predictions for those PH domains with unknown functions, and indicates functional roles for regions other than the canonical lipid-binding site identified in the PLCdelta1-PH structure. In particular, our calculations predict that (1) members from each of the four PLC classes exhibit strikingly different electrostatic profiles than those ordinarily observed for PH domains in general, (2) nonspecific electrostatic interactions contribute to the membrane localization of PLCdelta-, PLCgamma-, and PLCbeta-PH domains, and (3) phosphorylation regulates the interaction of PLCbeta-PH with its effectors through electrostatic repulsion. Our molecular models for PH domains from all of the PLC classes clearly demonstrate how a common structural fold can serve as a scaffold for a wide range of surface features and biophysical properties that support distinctive functional roles.
Affiliation(s)
- Shaneen M Singh
- Department of Microbiology and Immunology, Weill Medical College of Cornell University, New York, New York 10021, USA
|
41
|
Abstract
Protein residues that are critical for structure and function are expected to be conserved throughout evolution. Here, we investigate the extent to which these conserved residues are clustered in three-dimensional protein structures. In 92% of the proteins in a data set of 79 proteins, the most conserved positions in multiple sequence alignments are significantly more clustered than randomly selected sets of positions. The comparison to random subsets is not necessarily appropriate, however, because the signal could be the result of differences in the amino acid composition of sets of conserved residues compared to random subsets (hydrophobic residues tend to be close together in the protein core), or differences in sequence separation of the residues in the different sets. In order to overcome these limits, we compare the degree of clustering of the conserved positions on the native structure and on alternative conformations generated by the de novo structure prediction method Rosetta. For 65% of the 79 proteins, the conserved residues are significantly more clustered in the native structure than in the alternative conformations, indicating that the clustering of conserved residues in protein structures goes beyond that expected purely from sequence locality and composition effects. The differences in the spatial distribution of conserved residues can be utilized in de novo protein structure prediction: We find that for 79% of the proteins, selection of the Rosetta generated conformations with the greatest clustering of the conserved residues significantly enriches the fraction of close-to-native structures.
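The random-subset baseline described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the clustering score (mean pairwise distance), the function names, the number of random trials, and the toy coordinates are all assumptions made for illustration. It covers only the random-subset comparison, which the abstract itself flags as potentially confounded, and not the comparison against Rosetta-generated conformations.

```python
import math
import random
from itertools import combinations


def mean_pairwise_distance(coords, positions):
    """Average distance between all pairs of the selected residue positions."""
    pairs = list(combinations(positions, 2))
    return sum(math.dist(coords[i], coords[j]) for i, j in pairs) / len(pairs)


def clustering_p_value(coords, conserved, n_trials=1000, seed=0):
    """Empirical probability that a random subset of the same size is at
    least as tightly clustered as the conserved positions."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(coords, conserved)
    all_positions = range(len(coords))
    hits = 0
    for _ in range(n_trials):
        subset = rng.sample(all_positions, len(conserved))
        if mean_pairwise_distance(coords, subset) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_trials + 1)


# Toy data (hypothetical): 30 residues spaced 3.8 units apart on a line,
# with four adjacent positions treated as the conserved set.
toy_coords = [(3.8 * i, 0.0, 0.0) for i in range(30)]
toy_conserved = [10, 11, 12, 13]
p = clustering_p_value(toy_coords, toy_conserved)
print(f"empirical p-value: {p:.4f}")
```

A small p-value here only says the chosen positions are closer together than random picks of the same size; it does not control for amino acid composition or sequence separation.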
Affiliation(s)
- Ora Schueler-Furman
- Department of Biochemistry, University of Washington, Seattle, Washington 98195, USA
|
42
|
John B, Sali A. Comparative protein structure modeling by iterative alignment, model building and model assessment. Nucleic Acids Res 2003; 31:3982-92. [PMID: 12853614 PMCID: PMC165975 DOI: 10.1093/nar/gkg460] [Citation(s) in RCA: 264] [Impact Index Per Article: 12.6] [Indexed: 11/12/2022] Open
Abstract
Comparative or homology protein structure modeling is severely limited by errors in the alignment of a modeled sequence with related proteins of known three-dimensional structure. To ameliorate this problem, we have developed an automated method that optimizes both the alignment and the model implied by it. This task is achieved by a genetic algorithm protocol that starts with a set of initial alignments and then iterates through re-alignment, model building and model assessment to optimize a model assessment score. During this iterative process: (i) new alignments are constructed by application of a number of operators, such as alignment mutations and cross-overs; (ii) comparative models corresponding to these alignments are built by satisfaction of spatial restraints, as implemented in our program MODELLER; (iii) the models are assessed by a variety of criteria, partly depending on an atomic statistical potential. When testing the procedure on a very difficult set of 19 modeling targets sharing only 4-27% sequence identity with their template structures, the average final alignment accuracy increased from 37 to 45% relative to the initial alignment (the alignment accuracy was measured as the percentage of positions in the tested alignment that were identical to the reference structure-based alignment). Correspondingly, the average model accuracy increased from 43 to 54% (the model accuracy was measured as the percentage of the Cα atoms of the model that were within 5 Å of the corresponding Cα atoms in the superposed native structure). The present method also compares favorably with two of the most successful previously described methods, PSI-BLAST and SAM. The accuracy of the final models would be increased further if a better method for ranking of the models were available.
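The two accuracy measures defined in this abstract can be sketched as follows. This is an illustrative reading rather than the authors' code: the function names, the representation of an alignment as a set of (target, template) position pairs, the choice of the tested alignment as the denominator, and the inclusive 5 Å cutoff are assumptions. The model is assumed to be already superposed on the native structure.

```python
import math


def alignment_accuracy(tested_pairs, reference_pairs):
    """Percentage of aligned (target, template) position pairs in the tested
    alignment that are also present in the reference alignment."""
    tested = set(tested_pairs)
    if not tested:
        return 0.0
    return 100.0 * len(tested & set(reference_pairs)) / len(tested)


def model_accuracy(model_ca, native_ca, cutoff=5.0):
    """Percentage of model C-alpha atoms lying within `cutoff` angstroms of
    the corresponding native C-alpha atoms (structures assumed superposed)."""
    if len(model_ca) != len(native_ca):
        raise ValueError("model and native must have the same number of atoms")
    close = sum(1 for m, n in zip(model_ca, native_ca) if math.dist(m, n) <= cutoff)
    return 100.0 * close / len(model_ca)


# Toy inputs (hypothetical): two of four aligned pairs agree with the
# reference, and two of three C-alpha atoms fall within the cutoff.
tested = [(1, 1), (2, 2), (3, 4), (4, 5)]
reference = [(1, 1), (2, 2), (3, 3), (4, 4)]
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
native = [(0.0, 0.0, 0.0), (1.0, 3.0, 0.0), (0.0, 0.0, 0.0)]

print(f"alignment accuracy: {alignment_accuracy(tested, reference):.1f}%")
print(f"model accuracy: {model_accuracy(model, native):.1f}%")
```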
Affiliation(s)
- Bino John
- Laboratory of Molecular Biophysics, Pels Family Center for Biochemistry and Structural Biology, The Rockefeller University, New York, NY 10021, USA
|
43
|
Janin J, Henrick K, Moult J, Eyck LT, Sternberg MJE, Vajda S, Vakser I, Wodak SJ. CAPRI: a Critical Assessment of PRedicted Interactions. Proteins 2003; 52:2-9. [PMID: 12784359 DOI: 10.1002/prot.10381] [Citation(s) in RCA: 470] [Impact Index Per Article: 22.4] [Indexed: 11/07/2022]
Abstract
CAPRI is a communitywide experiment to assess the capacity of protein-docking methods to predict protein-protein interactions. Nineteen groups participated in rounds 1 and 2 of CAPRI and submitted blind structure predictions for seven protein-protein complexes based on the known structure of the component proteins. The predictions were compared to the unpublished X-ray structures of the complexes. We describe here the motivations for launching CAPRI, the rules that we applied to select targets and run the experiment, and some conclusions that can already be drawn. The results stress the need for new scoring functions and for methods handling the conformation changes that were observed in some of the target systems. CAPRI has already been a powerful drive for the community of computational biologists who develop docking algorithms. We hope that this issue of Proteins will also be of interest to the community of structural biologists, which we call upon to provide new targets for future rounds of CAPRI, and to all molecular biologists who view protein-protein recognition as an essential process.
Affiliation(s)
- Joël Janin
- Laboratoire d'Enzymologie et Biochimie Structurales, CNRS, Gif-sur-Yvette, France.
|
44
|
Swalla BM, Gumport RI, Gardner JF. Conservation of structure and function among tyrosine recombinases: homology-based modeling of the lambda integrase core-binding domain. Nucleic Acids Res 2003; 31:805-18. [PMID: 12560475 PMCID: PMC149183 DOI: 10.1093/nar/gkg142] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022] Open
Abstract
Tyrosine recombinases participate in diverse biological processes by catalyzing recombination between specific DNA sites. Although a conserved protein fold has been described for the catalytic (CAT) domains of five recombinases, structural relationships between their core-binding (CB) domains remain unclear. Despite differences in the specificity and affinity of core-type DNA recognition, a conserved binding mechanism is suggested by the shared two-domain motif in crystal structure models of the recombinases Cre, XerD and Flp. We have found additional evidence for conservation of the CB domain fold. Comparison of XerD and Cre crystal structures showed that their CB domains are closely related; the three central alpha-helices of these domains are superposable to within 1.44 Å. A structure-based multiple sequence alignment containing 25 diverse CB domain sequences provided evidence for widespread conservation of both structural and functional elements in this fold. Based upon the Cre and XerD crystal structures, we employed homology modeling to construct a three-dimensional structure for the lambda integrase CB domain. The model provides a conceptual framework within which many previously identified, functionally important amino acid residues were investigated. In addition, the model predicts new residues that may participate in core-type DNA binding or dimerization, thereby providing hypotheses for future genetic and biochemical experiments.
|
45
|
Wallin S, Farwer J, Bastolla U. Testing similarity measures with continuous and discrete protein models. Proteins 2003; 50:144-57. [PMID: 12471607 DOI: 10.1002/prot.10271] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.1] [Indexed: 11/07/2022]
Abstract
There are many ways to define the distance between two protein structures, thus assessing their similarity. Here, we investigate and compare the properties of five different distance measures, including the standard root-mean-square deviation (cRMSD). The performance of these measures is studied from different perspectives with two different protein models, one continuous and the other discrete. Using the continuous model, we examine the correlation between energy and native distance, and the ability of the different measures to discriminate between the two possible topologies of a three-helix bundle. Using the discrete model, we perform fits to real protein structures by minimizing different distance measures. The properties of the fitted structures are found to depend strongly on the distance measure used and the scale considered. We find that the cRMSD measure very effectively describes long-range features but is less effective with short-range features, and it correlates weakly with energy. A stronger correlation with energy and a better description of short-range properties is obtained when we use measures based on intramolecular distances.
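The contrast between a coordinate-based and an intramolecular-distance-based measure can be sketched as below. The abstract does not list its five measures, so this is only an illustration under assumptions: `crmsd` here takes coordinates that are already superposed (no optimal fitting is performed), and `drmsd` is one common distance-based measure, not necessarily one the authors used. Function names and toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def crmsd(a, b):
    """Coordinate RMSD between two equally sized point sets, taken as given
    (any superposition must be done beforehand)."""
    if len(a) != len(b):
        raise ValueError("structures must have the same number of points")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def drmsd(a, b):
    """RMSD over differences of intramolecular pair distances; needs no
    superposition because internal distances ignore rigid-body motion."""
    if len(a) != len(b):
        raise ValueError("structures must have the same number of points")
    pairs = list(combinations(range(len(a)), 2))
    total = sum(
        (math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2 for i, j in pairs
    )
    return math.sqrt(total / len(pairs))


# Toy structures (hypothetical): the second is the first shifted by 5 units.
struct_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
struct_b = [(x + 5.0, y, z) for x, y, z in struct_a]

print(f"cRMSD without superposition: {crmsd(struct_a, struct_b):.2f}")
print(f"dRMSD: {drmsd(struct_a, struct_b):.2f}")
```

The shifted copy gives a large unsuperposed coordinate deviation but no distance-based deviation, which is why coordinate RMSD is normally reported after optimal superposition.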
Affiliation(s)
- Stefan Wallin
- Complex Systems Division, Department of Theoretical Physics, Lund University, Sölvegatan 14A, SE-223 62 Lund, Sweden.
|
46
|
Betz SF, Baxter SM, Fetrow JS. Function first: a powerful approach to post-genomic drug discovery. Drug Discov Today 2002; 7:865-71. [PMID: 12546953 DOI: 10.1016/s1359-6446(02)02398-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.7] [Indexed: 11/19/2022]
Abstract
In the post-genomic era, pharmaceutical researchers must evaluate vast numbers of protein sequences and formulate novel, intelligent strategies for identifying valid targets and discovering leads against them. The identification of small molecules that selectively target proteins or protein families will be aided by knowing the function and/or the structure of the target(s). By identifying protein function first, efficiencies are gained that allow subsequent focus of resources on particular protein families of interest. This article reviews current proteomic-scale approaches to identifying function as a way of accelerating lead discovery.
Affiliation(s)
- Stephen F Betz
- GeneFormatics, 5830 Oberlin Drive, Suite 200, San Diego, CA 92121, USA
|
47
|
Abstract
Central issues concerning protein structure prediction have been highlighted by the recently published summary of the fourth community-wide protein structure prediction experiment (CASP4). Although sequence/structure alignment remains the bottleneck in comparative modeling, there has been substantial progress in fully automated remote homolog detection and in de novo structure prediction. Significant further progress will probably require improvements in high-resolution modeling.
Affiliation(s)
- Jack Schonbrun
- Howard Hughes Medical Institute and Department of Biochemistry, Box 357350, University of Washington, Seattle, Washington 98165, USA
|