1
Grønbæk C, Hamelryck T, Røgen P. GISA: using Gauss Integrals to identify rare conformations in protein structures. PeerJ 2020; 8:e9159. [PMID: 32566389 PMCID: PMC7293858 DOI: 10.7717/peerj.9159] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 09/30/2019] [Accepted: 04/18/2020] [Indexed: 12/13/2022]
Abstract
The native structure of a protein is important for its function, and therefore methods for exploring protein structures have attracted much research. However, rather few methods are sensitive to topological-geometric features, examples being knots, slipknots, lassos, links, and pokes, and each such method is aimed only at a specific set of these configurations. We here propose a general method which transforms a structure into a “fingerprint of topological-geometric values” consisting of a series of real-valued descriptors from mathematical Knot Theory. The extent to which a structure contains unusual configurations can then be judged from this fingerprint. The method is not confined to a particular pre-defined topology or geometry (like a knot or a poke), and so, unlike existing methods, it is general. To achieve this, our new algorithm, GISA, as a key novelty produces the descriptors, so-called Gauss integrals, not only for the full chains of a protein but for all its sub-chains. This allows fingerprinting on any scale from local to global. The Gauss integrals are known to be effective descriptors of global protein folds. Applying GISA to sets of several thousand high-resolution structures, we first show how the most basic Gauss integral, the writhe, enables swift identification of pre-defined geometries such as pokes and links. We then apply GISA with no restrictions on geometry to show how it allows identifying rare conformations by finding rare invariant values only. In this unrestricted search, pokes and links are still found, but so are knotted conformations and more highly entangled configurations not previously described. Thus, an application of the basic scan method in GISA’s toolbox revealed 10 known cases of knots as the top positive-writhe cases, while the top negative-writhe cases were 14 cases in cis-trans isomerases sharing a spatial motif of little secondary structure content, which may have gone unnoticed. 
Possible general applications of GISA are fold classification and structural alignment based on local Gauss integrals. Others include finding errors in protein models and identifying unusual conformations that might be important for protein folding and function. Given its broad potential, we believe that GISA will be of general benefit to the structural bioinformatics community. GISA is coded in C and comes as a command-line tool. Source and compiled code for GISA, plus a README and examples, are publicly available at GitHub (https://github.com).
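The writhe mentioned above is the Gauss double integral Wr = (1/4π) ∬ (r₁ − r₂)·(dr₁ × dr₂)/|r₁ − r₂|³, taken with both points running over the same chain. For a polygonal chain such as a Cα trace, each pair of non-adjacent segments contributes a signed solid angle that has a closed form. The following is a minimal Python sketch under that assumption; it is not GISA's C implementation, all function names are hypothetical, and it uses the segment-pair formula published by Klenin and Langowski for polygonal curves, with a brute-force quadrature of the same integral included as a cross-check. It computes the writhe of one chain only; GISA's scan over all sub-chains is not reproduced, although `writhe(chain[a:b + 1])` gives the writhe of the sub-chain from vertex a to vertex b.

```python
import math
from typing import Optional, Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def unit(v: Vec) -> Optional[Vec]:
    n = math.sqrt(dot(v, v))
    return None if n < 1e-12 else (v[0] / n, v[1] / n, v[2] / n)


def segment_pair_omega(p1: Vec, p2: Vec, p3: Vec, p4: Vec) -> float:
    """Signed solid angle of segment p1->p2 against segment p3->p4 (closed form
    after Klenin & Langowski); equals the Gauss double integral of
    (r1 - r2).(dr1 x dr2) / |r1 - r2|^3 over the two segments."""
    r13, r14 = sub(p3, p1), sub(p4, p1)
    r23, r24 = sub(p3, p2), sub(p4, p2)
    normals = [unit(cross(r13, r14)), unit(cross(r14, r24)),
               unit(cross(r24, r23)), unit(cross(r23, r13))]
    if any(n is None for n in normals):
        return 0.0  # degenerate (collinear) configuration contributes nothing
    omega_star = 0.0
    for k in range(4):
        d = dot(normals[k], normals[(k + 1) % 4])
        omega_star += math.asin(max(-1.0, min(1.0, d)))  # clamp rounding error
    s = dot(cross(sub(p4, p3), sub(p2, p1)), r13)
    if s > 0:
        return omega_star
    if s < 0:
        return -omega_star
    return 0.0  # coplanar segments


def writhe(chain: Sequence[Vec]) -> float:
    """Writhe of an open polygonal chain: sum of Omega_ij / (2*pi) over
    non-adjacent segment pairs i < j (each unordered pair counted once)."""
    n_seg = len(chain) - 1
    total = 0.0
    for i in range(n_seg):
        for j in range(i + 2, n_seg):
            total += segment_pair_omega(chain[i], chain[i + 1], chain[j], chain[j + 1])
    return total / (2.0 * math.pi)


def segment_pair_quadrature(p1: Vec, p2: Vec, p3: Vec, p4: Vec, n: int = 200) -> float:
    """Midpoint-rule evaluation of the same double integral, as a cross-check."""
    a, b = sub(p2, p1), sub(p4, p3)
    axb = cross(a, b)  # dr1 x dr2 = (a x b) ds dt
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        r1 = (p1[0] + s * a[0], p1[1] + s * a[1], p1[2] + s * a[2])
        for j in range(n):
            t = (j + 0.5) * h
            r2 = (p3[0] + t * b[0], p3[1] + t * b[1], p3[2] + t * b[2])
            d = sub(r1, r2)
            total += dot(d, axb) / dot(d, d) ** 1.5
    return total * h * h
```

For example, the four-point chain (−1, 0, 0), (1, 0, 0), (0, −1, 1), (0, 1, 1) has a single non-adjacent segment pair whose solid angle has magnitude 2π/3, so its writhe is approximately −1/3; mirroring the z coordinate flips the sign, as expected for a chirality-sensitive descriptor.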
Affiliation(s)
- Christian Grønbæk
- Department of Biology, University of Copenhagen, Copenhagen, Denmark; current affiliation: Department of Biology, Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
- Thomas Hamelryck
- Department of Biology, Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Peter Røgen
- DTU COMPUTE, Technical University of Denmark, Kgs. Lyngby, Denmark
2
Orengo C, Velankar S, Wodak S, Zoete V, Bonvin AMJJ, Elofsson A, Feenstra KA, Gerloff DL, Hamelryck T, Hancock JM, Helmer-Citterich M, Hospital A, Orozco M, Perrakis A, Rarey M, Soares C, Sussman JL, Thornton JM, Tuffery P, Tusnady G, Wierenga R, Salminen T, Schneider B. A community proposal to integrate structural bioinformatics activities in ELIXIR (3D-Bioinfo Community). F1000Res 2020; 9. [PMID: 32566135 PMCID: PMC7284151 DOI: 10.12688/f1000research.20559.1] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Accepted: 03/05/2020] [Indexed: 12/11/2022]
Abstract
Structural bioinformatics provides the scientific methods and tools to analyse, archive, validate, and present the biomolecular structure data generated by the structural biology community. It also provides an important link with the genomics community, as structural bioinformaticians also use the extensive sequence data to predict protein structures and their functional sites. A very broad and active community of structural bioinformaticians exists across Europe, and 3D-Bioinfo will establish formal platforms to address their needs and better integrate their activities and initiatives. Our mission will be to strengthen the ties with the structural biology research communities in Europe covering life sciences, as well as chemistry and physics, and to bridge the gap between these researchers in order to fully realize the potential of structural bioinformatics. Our Community will also undertake dedicated educational, training and outreach efforts to facilitate this, bringing new insights and thus facilitating the development of much-needed innovative applications, e.g. for human health, drug and protein design. Our combined efforts will be of critical importance to keep the European research efforts competitive in this respect. Here we highlight the major European contributions to the field of structural bioinformatics, the most pressing challenges remaining and how Europe-wide interactions, enabled by ELIXIR and its platforms, will help in addressing these challenges and in coordinating structural bioinformatics resources across Europe. In particular, we present recent activities and future plans to consolidate an ELIXIR 3D-Bioinfo Community in structural bioinformatics and propose means to develop better links across the community. These include building new consortia, organising workshops to establish data standards and seeking community agreement on benchmark data sets and strategies. 
We also highlight existing and planned collaborations with other ELIXIR Communities and other European infrastructures, such as the structural biology community supported by Instruct-ERIC, with whom we have synergies and overlapping common interests.
Affiliation(s)
- Christine Orengo
- Structural and Molecular Biology Department, University College, London, UK
- Sameer Velankar
- Protein Data Bank in Europe, European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, CB10 1SD, UK
- Shoshana Wodak
- VIB-VUB Center for Structural Biology, Brussels, Belgium
- Vincent Zoete
- Department of Oncology, Lausanne University, Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Alexandre M J J Bonvin
- Bijvoet Center, Faculty of Science - Chemistry, Utrecht University, Utrecht, 3584CH, The Netherlands
- Arne Elofsson
- Science for Life Laboratory, Stockholm University, Solna, S-17121, Sweden
- K Anton Feenstra
- Dept. Computer Science, Center for Integrative Bioinformatics VU (IBIVU), Vrije Universiteit, Amsterdam, 1081 HV, The Netherlands
- Dietland L Gerloff
- Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Belvaux, L-4367, Luxembourg
- Thomas Hamelryck
- Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen, DK-2200, Denmark
- Adam Hospital
- Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, 08028, Spain
- Modesto Orozco
- Institute for Research in Biomedicine, The Barcelona Institute of Science and Technology, Barcelona, 08028, Spain
- Matthias Rarey
- ZBH - Center for Bioinformatics, Universität Hamburg, Hamburg, D-20146, Germany
- Claudio Soares
- Instituto de Tecnologia Química e Biológica Antonio Xavier, Universidade Nova de Lisboa, Lisbon, Portugal
- Joel L Sussman
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, 76100, Israel
- Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, CB10 1SD, UK
- Pierre Tuffery
- Ressource Parisienne en Bioinformatique Structurale, Université de Paris, Paris, F-75205, France
- Gabor Tusnady
- Membrane Bioinformatics Research Group, Institute of Enzymology, Budapest, H-1117, Hungary
- Tiina Salminen
- Structural Bioinformatics Laboratory, Åbo Akademi University, Turku, FI-20500, Finland
- Bohdan Schneider
- Institute of Biotechnology of the Czech Academy of Sciences, Vestec, CZ-25250, Czech Republic
3
Moreta LS, Al-Sibahi AS, Theobald D, Bullock W, Rommes BN, Manoukian A, Hamelryck T. A Probabilistic Programming Approach to Protein Structure Superposition. Proc IEEE Symp Comput Intell Bioinforma Comput Biol 2019; 2019:10.1109/cibcb.2019.8791469. [PMID: 34661202 PMCID: PMC8515897 DOI: 10.1109/cibcb.2019.8791469] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/28/2022]
Abstract
Optimal superposition of protein structures or other biological molecules is crucial for understanding their structure, function, dynamics and evolution. Here, we investigate the use of probabilistic programming to superimpose protein structures guided by a Bayesian model. Our model, THESEUS-PP, is based on the THESEUS model, a probabilistic model of protein superposition based on rotation, translation and perturbation of an underlying, latent mean structure. The model was implemented in the probabilistic programming language Pyro. Unlike conventional methods that minimize the sum of the squared distances, THESEUS takes into account correlated atom positions and heteroscedasticity (i.e., atom positions can have different variances). THESEUS performs maximum likelihood estimation using iterative expectation-maximization. In contrast, THESEUS-PP allows automated maximum a posteriori (MAP) estimation using suitable priors over rotation, translation, variances and latent mean structure. The results indicate that probabilistic programming is a powerful new paradigm for the formulation of Bayesian probabilistic models concerning biomolecular structure. Specifically, we envision the use of the THESEUS-PP model as a suitable error model or likelihood in Bayesian protein structure prediction using deep probabilistic programming.
Affiliation(s)
- Lys Sanz Moreta
- Department of Computer Science, University of Copenhagen, Denmark
- Douglas Theobald
- Department of Biochemistry, Brandeis University, Waltham, MA 02452, USA
- William Bullock
- The Bioinformatics Centre, Section for Computational and RNA Biology, University of Copenhagen, Copenhagen, Denmark
- Basile Nicolas Rommes
- The Bioinformatics Centre, Section for Computational and RNA Biology, University of Copenhagen, Copenhagen, Denmark
- Andreas Manoukian
- The Bioinformatics Centre, Section for Computational and RNA Biology, University of Copenhagen, Copenhagen, Denmark
- Thomas Hamelryck
- Department of Computer Science, University of Copenhagen, Denmark
- The Bioinformatics Centre, Section for Computational and RNA Biology, University of Copenhagen, Copenhagen, Denmark
4
Postic G, Hamelryck T, Chomilier J, Stratmann D. MyPMFs: a simple tool for creating statistical potentials to assess protein structural models. Biochimie 2018; 151:37-41. [PMID: 29857183 DOI: 10.1016/j.biochi.2018.05.013] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 01/28/2018] [Accepted: 05/25/2018] [Indexed: 01/18/2023]
Abstract
Evaluating the model quality of protein structures that evolve in environments with particular physicochemical properties requires scoring functions that are adapted to their specific residue compositions and/or structural characteristics. Thus, computational methods developed for structures from the cytosol cannot work properly on membrane or secreted proteins. Here, we present MyPMFs, an easy-to-use tool that allows users to train statistical potentials of mean force (PMFs) on the protein structures of their choice, with all parameters being adjustable. We demonstrate its use by creating an accurate statistical potential for transmembrane protein domains. We also show its usefulness to study the influence of the physical environment on residue interactions within protein structures. Our open-source software is freely available for download at https://github.com/bibip-impmc/mypmfs.
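Statistical potentials of mean force of this kind are commonly derived by inverse Boltzmann statistics: for one residue-pair type, the energy of a distance bin is −kT ln(P_obs/P_ref), where P_obs is the observed distance distribution for that pair type and P_ref a reference distribution. The sketch below is a generic illustration of that construction, not MyPMFs' code or its options; the function names, the bin layout (0 to 15 Å in 1 Å bins), the unit pseudocount and kT = 1 are all assumptions.

```python
import math
from typing import List, Sequence


def histogram(values: Sequence[float], r_min: float, r_max: float, n_bins: int) -> List[int]:
    """Count values in n_bins equal-width bins on [r_min, r_max)."""
    counts = [0] * n_bins
    width = (r_max - r_min) / n_bins
    for v in values:
        if r_min <= v < r_max:
            # min() guards against float rounding pushing v into bin n_bins
            counts[min(int((v - r_min) / width), n_bins - 1)] += 1
    return counts


def distance_pmf(observed: Sequence[float], reference: Sequence[float],
                 r_min: float = 0.0, r_max: float = 15.0, n_bins: int = 15,
                 pseudocount: float = 1.0, kT: float = 1.0) -> List[float]:
    """Inverse-Boltzmann energy per distance bin: E = -kT * ln(P_obs / P_ref).
    The pseudocount keeps bins that are empty in either histogram finite."""
    obs = histogram(observed, r_min, r_max, n_bins)
    ref = histogram(reference, r_min, r_max, n_bins)
    n_obs = sum(obs) + pseudocount * n_bins
    n_ref = sum(ref) + pseudocount * n_bins
    energies = []
    for o, r in zip(obs, ref):
        p_obs = (o + pseudocount) / n_obs
        p_ref = (r + pseudocount) / n_ref
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies
```

Bins that are enriched among the observed pairs relative to the reference receive negative (favorable) energies and depleted bins positive ones; training on, for example, transmembrane structures only amounts to changing which distances go into `observed` and `reference`.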
Affiliation(s)
- Guillaume Postic
- Sorbonne Université, UMR 7590 CNRS, MNHN, IRD, Institut de Minéralogie de Physique des Matériaux et de Cosmochimie (IMPMC), Paris, France
- Thomas Hamelryck
- Bioinformatics Centre, Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark; Image Section, Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Jacques Chomilier
- Sorbonne Université, UMR 7590 CNRS, MNHN, IRD, Institut de Minéralogie de Physique des Matériaux et de Cosmochimie (IMPMC), Paris, France
- Dirk Stratmann
- Sorbonne Université, UMR 7590 CNRS, MNHN, IRD, Institut de Minéralogie de Physique des Matériaux et de Cosmochimie (IMPMC), Paris, France
5
Abstract
Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both “smooth” conformational changes and “catastrophic” conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence–structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof.
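The abstract treats changes in a dihedral angle as a diffusion on the circle, and on the two-dimensional torus for a (φ, ψ) pair. As a simplified illustration only, the sketch below gives the transition density of the simplest such process, drift-free Brownian motion on the circle, which is a wrapped normal distribution. The process in the paper is richer (among other things, it is coupled to amino acid changes); treating φ and ψ as independent in `torus_transition` is an assumption, and all names are hypothetical.

```python
import math


def wrapped_normal_pdf(theta: float, mu: float, var: float, n_wind: int = 10) -> float:
    """Density of a normal N(mu, var) wrapped onto the circle: the sum over
    windings k of the normal density at theta - mu + 2*pi*k (truncated to |k| <= n_wind)."""
    norm = 1.0 / math.sqrt(2.0 * math.pi * var)
    total = 0.0
    for k in range(-n_wind, n_wind + 1):
        d = theta - mu + 2.0 * math.pi * k
        total += math.exp(-d * d / (2.0 * var))
    return norm * total


def circular_bm_transition(theta_t: float, theta_0: float, t: float, sigma2: float) -> float:
    """Transition density of drift-free Brownian motion on the circle,
    d(theta) = sigma dW, after time t: wrapped normal with variance sigma2 * t."""
    return wrapped_normal_pdf(theta_t, theta_0, sigma2 * t)


def torus_transition(phi_t: float, psi_t: float, phi_0: float, psi_0: float,
                     t: float, sigma2_phi: float, sigma2_psi: float) -> float:
    """Hypothetical independent diffusion of a (phi, psi) pair on the torus."""
    return (circular_bm_transition(phi_t, phi_0, t, sigma2_phi)
            * circular_bm_transition(psi_t, psi_0, t, sigma2_psi))
```

For short times the density is concentrated around the starting angle (a "smooth" change); as t grows it flattens towards the uniform density 1/(2π), so a drift-free process alone cannot express the preference for particular backbone conformations that a realistic model needs.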
Affiliation(s)
- Michael Golden
- Department of Statistics, University of Oxford, Oxford, United Kingdom
- Eduardo García-Portugués
- Department of Statistics, Carlos III University of Madrid, Madrid, Spain; Department of Mathematical Sciences, University of Copenhagen, Copenhagen, Denmark; Bioinformatics Centre, Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Michael Sørensen
- Department of Mathematical Sciences, University of Copenhagen, Copenhagen, Denmark
- Kanti V Mardia
- Department of Statistics, University of Oxford, Oxford, United Kingdom; Department of Mathematics, University of Leeds, Leeds, United Kingdom
- Thomas Hamelryck
- Bioinformatics Centre, Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark; Image Section, Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Jotun Hein
- Department of Statistics, University of Oxford, Oxford, United Kingdom
6
Abstract
The inherent flexibility of intrinsically disordered proteins (IDPs) and multi-domain proteins with intrinsically disordered regions (IDRs) presents challenges to structural analysis. These macromolecules need to be represented by an ensemble of conformations, rather than a single structure. Small-angle X-ray scattering (SAXS) experiments capture ensemble-averaged data for the set of conformations. We present a Bayesian approach to ensemble inference from SAXS data, called Bayesian ensemble SAXS (BE-SAXS). We address two issues with existing methods: the use of a finite ensemble of structures to represent the underlying distribution, and the selection of that ensemble as a subset of an initial pool of structures. This is achieved through the formulation of a Bayesian posterior of the conformational space. BE-SAXS modifies a structural prior distribution in accordance with the experimental data. It uses multi-step expectation maximization, with alternating rounds of Markov-chain Monte Carlo simulation and empirical Bayes optimization. We demonstrate the method by employing it to obtain a conformational ensemble of the antitoxin PaaA2 and comparing the results to a published ensemble.
Affiliation(s)
- L D Antonov
- Bioinformatics Centre, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
- S Olsson
- Laboratory of Physical Chemistry, Swiss Federal Institute of Technology, ETH-Hönggerberg, Vladimir-Prelog-Weg 2, CH-8093 Zürich, Switzerland; Institute for Research in Biomedicine, Università della Svizzera Italiana, Via Vincenzo Vela 6, CH-6500 Bellinzona, Switzerland
- W Boomsma
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
- T Hamelryck
- Bioinformatics Centre, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
7
Johansson KE, Tidemand Johansen N, Christensen S, Horowitz S, Bardwell JCA, Olsen JG, Willemoës M, Lindorff-Larsen K, Ferkinghoff-Borg J, Hamelryck T, Winther JR. Computational Redesign of Thioredoxin Is Hypersensitive toward Minor Conformational Changes in the Backbone Template. J Mol Biol 2016; 428:4361-4377. [PMID: 27659562 DOI: 10.1016/j.jmb.2016.09.013] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 06/05/2016] [Revised: 09/08/2016] [Accepted: 09/14/2016] [Indexed: 01/26/2023]
Abstract
Despite the development of powerful computational tools, the full-sequence design of proteins remains a challenging task. To investigate the limits and capabilities of computational tools, we studied the ability of the program Rosetta to predict sequences that recreate the authentic fold of thioredoxin. Focusing on the influence of conformational details in the template structures, we based our study on eight experimentally determined template structures and generated 120 designs from each. For experimental evaluation, we chose six sequences from each of the eight templates by objective criteria. The 48 selected sequences were evaluated on their progressive ability to (1) produce soluble protein in Escherichia coli, (2) yield stable monomeric protein, and (3) for the stable, soluble proteins, adopt the target fold. Of the 48 designs, we were able to synthesize 32, 20 of which resulted in soluble protein. Of these, only two were sufficiently stable to be purified. An X-ray crystal structure was solved for one of the designs, revealing a close resemblance to the target structure. Despite their high structural similarity, the eight template structures differed significantly in their ability to meet the above three criteria. Thus, to improve the success rate of computational full-sequence design methods, we recommend that multiple template structures be used. Furthermore, this study shows that special care should be taken when optimizing the geometry of a structure prior to computational design with a method that is based on rigid conformations.
Affiliation(s)
- Kristoffer E Johansson
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen DK-2200, Denmark
- Nicolai Tidemand Johansen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen DK-2200, Denmark
- Signe Christensen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen DK-2200, Denmark
- Scott Horowitz
- Howard Hughes Medical Institute, Department of Molecular, Cellular and Developmental Biology, University of Michigan, 109 Zina Pitcher Place, Ann Arbor, MI 48109, USA
- James C A Bardwell
- Howard Hughes Medical Institute, Department of Molecular, Cellular and Developmental Biology, University of Michigan, 109 Zina Pitcher Place, Ann Arbor, MI 48109, USA
- Johan G Olsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen DK-2200, Denmark
- Martin Willemoës
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen DK-2200, Denmark
- Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen DK-2200, Denmark
- Jesper Ferkinghoff-Borg
- Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen DK-2200, Denmark
- Thomas Hamelryck
- Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen DK-2200, Denmark
- Jakob R Winther
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen DK-2200, Denmark
8
Bratholm LA, Christensen AS, Hamelryck T, Jensen JH. Bayesian inference of protein structure from chemical shift data. PeerJ 2015; 3:e861. [PMID: 25825683 PMCID: PMC4375973 DOI: 10.7717/peerj.861] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 12/20/2014] [Accepted: 03/06/2015] [Indexed: 12/15/2022]
Abstract
Protein chemical shifts are routinely used to augment molecular mechanics force fields in protein structure simulations, with the weights of the chemical shift restraints determined empirically. These weights, however, might not be an optimal descriptor of a given protein structure and predictive model, and they introduce a bias that might result in incorrect structures. In the inferential structure determination framework, both the unknown structure and the disagreement between experimental and back-calculated data are formulated as a joint probability distribution, thus utilizing the full information content of the data. Here, we present the formulation of such a probability distribution, where the error in chemical shift prediction is described by either a Gaussian or a Cauchy distribution. The methodology is demonstrated and compared to a set of empirically weighted potentials through Markov chain Monte Carlo simulations of three small proteins (ENHD, Protein G and the SMN Tudor Domain) using the PROFASI force field and the chemical shift predictor CamShift. Using a clustering criterion for identifying the best structure, together with the addition of a solvent exposure scoring term, the simulations suggest that sampling both the structure and the uncertainties in chemical shift prediction leads to more accurate structures than conventional methods using empirically determined weights. The Cauchy distribution, using either sampled uncertainties or predetermined weights, did, however, result in overall better convergence to the native fold, suggesting that both types of distribution might be useful in different aspects of protein structure prediction.
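The practical difference between the two error models is how strongly a single badly predicted shift is penalized. The sketch below evaluates the negative log-likelihood of a set of experimental shifts given back-calculated ones under each model; it assumes a fixed scale parameter (the paper also samples the uncertainties, which is not shown), the function names are hypothetical, and the predicted values would in practice come from a predictor such as CamShift.

```python
import math
from typing import Sequence


def gaussian_nll(observed: Sequence[float], predicted: Sequence[float], sigma: float) -> float:
    """Negative log-likelihood of observed shifts under independent Gaussian
    errors with standard deviation sigma."""
    sq = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return len(observed) * math.log(sigma * math.sqrt(2.0 * math.pi)) + sq / (2.0 * sigma ** 2)


def cauchy_nll(observed: Sequence[float], predicted: Sequence[float], gamma: float) -> float:
    """Negative log-likelihood under independent Cauchy errors with scale gamma,
    density 1 / (pi * gamma * (1 + ((o - p) / gamma)^2))."""
    return sum(math.log(math.pi * gamma * (1.0 + ((o - p) / gamma) ** 2))
               for o, p in zip(observed, predicted))
```

With unit scale, increasing one residual from 1 to 10 raises the Gaussian negative log-likelihood by 49.5 but the Cauchy one only by ln(50.5) ≈ 3.9, so the heavy-tailed Cauchy model tolerates occasional large prediction errors instead of letting them dominate the sampled structures.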
Affiliation(s)
- Lars A. Bratholm
- Department of Chemistry, University of Copenhagen, Copenhagen, Denmark
- Thomas Hamelryck
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jan H. Jensen
- Department of Chemistry, University of Copenhagen, Copenhagen, Denmark
9
Olsson S, Vögeli BR, Cavalli A, Boomsma W, Ferkinghoff-Borg J, Lindorff-Larsen K, Hamelryck T. Probabilistic Determination of Native State Ensembles of Proteins. J Chem Theory Comput 2014; 10:3484-91. [DOI: 10.1021/ct5001236] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Indexed: 11/29/2022]
Affiliation(s)
- Simon Olsson
- Bioinformatics Centre, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Institute for Research in Biomedicine, CH-6500 Bellinzona, Switzerland
- Beat Rolf Vögeli
- Laboratory of Physical Chemistry, Eidgenössische Technische Hochschule Zürich, 8093 Zürich, Switzerland
- Andrea Cavalli
- Institute for Research in Biomedicine, CH-6500 Bellinzona, Switzerland
- Wouter Boomsma
- Structural Biology and NMR Laboratory, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Jesper Ferkinghoff-Borg
- Cellular Signal Integration Group, Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
- Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Thomas Hamelryck
- Bioinformatics Centre, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
10
Christensen AS, Hamelryck T, Jensen JH. FragBuilder: an efficient Python library to setup quantum chemistry calculations on peptides models. PeerJ 2014; 2:e277. [PMID: 24688855 PMCID: PMC3961104 DOI: 10.7717/peerj.277] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 12/23/2013] [Accepted: 01/27/2014] [Indexed: 11/20/2022]
Abstract
We present a powerful Python library to quickly and efficiently generate realistic peptide model structures. The library makes it possible to quickly set up quantum mechanical calculations on model peptide structures. A specific conformation of the peptide can be specified manually. Additionally, the library offers sampling of backbone conformations and side chain rotamer conformations from continuous distributions. The generated peptides can then be geometry-optimized with the MMFF94 molecular mechanics force field via convenient functions inside the library. Finally, the resulting structures can be written directly to files in a variety of useful formats, such as XYZ or PDB, or directly as input files for a quantum chemistry program. FragBuilder is freely available at https://github.com/jensengroup/fragbuilder/ under the terms of the BSD open source license.
Affiliation(s)
- Thomas Hamelryck
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jan H Jensen
- Department of Chemistry, University of Copenhagen, Copenhagen, Denmark
11
Christensen AS, Linnet TE, Borg M, Boomsma W, Lindorff-Larsen K, Hamelryck T, Jensen JH. Protein structure validation and refinement using amide proton chemical shifts derived from quantum mechanics. PLoS One 2013; 8:e84123. [PMID: 24391900 PMCID: PMC3877219 DOI: 10.1371/journal.pone.0084123] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 07/24/2013] [Accepted: 11/11/2013] [Indexed: 11/18/2022]
Abstract
We present the ProCS method for the rapid and accurate prediction of protein backbone amide proton chemical shifts, which are sensitive probes of the geometry of the key hydrogen bonds that determine protein structure. ProCS is parameterized against quantum mechanical (QM) calculations and reproduces high-level QM results obtained for a small protein with an RMSD of 0.25 ppm (r = 0.94). ProCS is interfaced with the PHAISTOS protein simulation program and is used to infer statistical protein ensembles that reflect experimentally measured amide proton chemical shift values. Such chemical shift-based structural refinements, starting from high-resolution X-ray structures of Protein G, ubiquitin, and SMN Tudor Domain, result in average chemical shifts, hydrogen bond geometries, and trans-hydrogen bond (³ʰJNC′) spin-spin coupling constants that are in excellent agreement with experiment. We show that the structural sensitivity of the QM-based amide proton chemical shift predictions is needed to obtain this agreement. The ProCS method thus offers a powerful new tool for refining the structures of hydrogen bonding networks to high accuracy, with many potential applications such as protein flexibility in ligand binding.
Affiliation(s)
- Troels E. Linnet
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Mikael Borg
- Structural Bioinformatics Group, Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Wouter Boomsma
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Thomas Hamelryck
- Structural Bioinformatics Group, Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jan H. Jensen
- Department of Chemistry, University of Copenhagen, Copenhagen, Denmark
12
Olsson S, Frellsen J, Boomsma W, Mardia KV, Hamelryck T. Inference of structure ensembles of flexible biomolecules from sparse, averaged data. PLoS One 2013; 8:e79439. [PMID: 24244505 PMCID: PMC3820694 DOI: 10.1371/journal.pone.0079439] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Received: 03/27/2013] [Accepted: 09/24/2013] [Indexed: 11/21/2022]
Abstract
We present the theoretical foundations of a general principle to infer structure ensembles of flexible biomolecules from spatially and temporally averaged data obtained in biophysical experiments. The central idea is to compute the Kullback-Leibler optimal modification of a given prior distribution with respect to the experimental data and its uncertainty. This principle generalizes the successful inferential structure determination method and recently proposed maximum entropy methods. Tractability of the protocol is demonstrated through the analysis of simulated nuclear magnetic resonance spectroscopy data of a small peptide.
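In its simplest special case, the KL-optimal modification of a prior can be illustrated with samples drawn from that prior: if the ensemble average of a single observable f must match a measured value exactly, the reweighting of the samples that minimizes the Kullback-Leibler divergence from the prior is exponential in f (the familiar maximum entropy solution). The sketch below assumes exactly that case, with one observable, no experimental uncertainty and a bisection search for the Lagrange multiplier; the principle in the paper is more general, since it also accounts for the uncertainty of the data. Function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def maxent_reweight(values: Sequence[float], target: float,
                    lam_lo: float = -50.0, lam_hi: float = 50.0,
                    n_iter: int = 200) -> Tuple[float, List[float]]:
    """Weights w_i proportional to exp(-lam * f(x_i)) on prior samples x_i, with lam
    chosen so that the weighted mean of f equals target (exact constraint).
    Assumes [lam_lo, lam_hi] brackets the solution."""
    if not (min(values) < target < max(values)):
        raise ValueError("target must lie strictly between the smallest and largest value")

    def weights(lam: float) -> List[float]:
        logw = [-lam * v for v in values]
        m = max(logw)  # subtract the maximum for numerical stability
        w = [math.exp(x - m) for x in logw]
        z = sum(w)
        return [x / z for x in w]

    def mean(lam: float) -> float:
        return sum(w * v for w, v in zip(weights(lam), values))

    # d(mean)/d(lam) = -Var_w(f) <= 0, so the reweighted mean is monotone in lam
    # and bisection on lam finds the multiplier that satisfies the constraint.
    lo, hi = lam_lo, lam_hi
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if mean(mid) > target:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    return lam, weights(lam)
```

If the prior already reproduces the measurement, the multiplier is zero and the prior is left unchanged; otherwise the weights tilt the prior as little as possible, in the KL sense, towards samples consistent with the data.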
Affiliation(s)
- Simon Olsson
- Bioinformatics Centre, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- * E-mail: (SO); (TH)
- Jes Frellsen
- Bioinformatics Centre, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Wouter Boomsma
- Structural Biology and NMR Laboratory, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Kanti V. Mardia
- Department of Statistics, School of Mathematics, University of Leeds, Leeds, United Kingdom
- Thomas Hamelryck
- Bioinformatics Centre, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- * E-mail: (SO); (TH)
13
Valentin JB, Andreetta C, Boomsma W, Bottaro S, Ferkinghoff-Borg J, Frellsen J, Mardia KV, Tian P, Hamelryck T. Formulation of probabilistic models of protein structure in atomic detail using the reference ratio method. Proteins 2013; 82:288-99. [PMID: 23934827 DOI: 10.1002/prot.24386] [Received: 05/28/2013] [Revised: 07/02/2013] [Accepted: 07/18/2013]
Abstract
We propose a method to formulate probabilistic models of protein structure in atomic detail, for a given amino acid sequence, based on Bayesian principles, while retaining a close link to physics. We start from two previously developed probabilistic models of protein structure on a local length scale, which concern the dihedral angles in main chain and side chains, respectively. Conceptually, this constitutes a probabilistic and continuous alternative to the use of discrete fragment and rotamer libraries. The local model is combined with a nonlocal model that involves a small number of energy terms according to a physical force field, and some information on the overall secondary structure content. In this initial study we focus on the formulation of the joint model and the evaluation of the use of an energy vector as a descriptor of a protein's nonlocal structure; hence, we derive the parameters of the nonlocal model from the native structure without loss of generality. The local and nonlocal models are combined using the reference ratio method, which is a well-justified probabilistic construction. For evaluation, we use the resulting joint models to predict the structure of four proteins. The results indicate that the proposed method and the probabilistic models show considerable promise for probabilistic protein structure prediction and related applications.
Affiliation(s)
- Jan B Valentin
- The Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen, Denmark
14
Boomsma W, Frellsen J, Harder T, Bottaro S, Johansson KE, Tian P, Stovgaard K, Andreetta C, Olsson S, Valentin JB, Antonov LD, Christensen AS, Borg M, Jensen JH, Lindorff-Larsen K, Ferkinghoff-Borg J, Hamelryck T. PHAISTOS: a framework for Markov chain Monte Carlo simulation and inference of protein structure. J Comput Chem 2013; 34:1697-705. [PMID: 23619610 DOI: 10.1002/jcc.23292] [Received: 12/06/2012] [Revised: 03/14/2013] [Accepted: 03/20/2013]
Abstract
We present a new software framework for Markov chain Monte Carlo sampling for simulation, prediction, and inference of protein structure. The software package contains implementations of recent advances in Monte Carlo methodology, such as efficient local updates and sampling from probabilistic models of local protein structure. These models form a probabilistic alternative to the widely used fragment and rotamer libraries. Combined with an easily extensible software architecture, this makes PHAISTOS well suited for Bayesian inference of protein structure from sequence and/or experimental data. Currently, two force-fields are available within the framework: PROFASI and OPLS-AA/L, the latter including the generalized Born surface area solvent model. A flexible command-line and configuration-file interface allows users to quickly set up simulations with the desired configuration. PHAISTOS is released under the GNU General Public License v3.0. Source code and documentation are freely available from http://phaistos.sourceforge.net. The software is implemented in C++ and has been tested on Linux and OSX platforms.
Affiliation(s)
- Wouter Boomsma
- Department of Biology, University of Copenhagen, Copenhagen, 2200, Denmark
15
Johansson KE, Hamelryck T. A simple probabilistic model of multibody interactions in proteins. Proteins 2013; 81:1340-50. [DOI: 10.1002/prot.24277] [Received: 10/13/2012] [Revised: 01/31/2013] [Accepted: 02/18/2013]
Affiliation(s)
- Kristoffer Enøe Johansson
- Section for Biomolecular Sciences; Department of Biology, University of Copenhagen; Ole Maaløes Vej 5, DK-2200 Copenhagen N Denmark
- Thomas Hamelryck
- Section for Computational and RNA biology; Department of Biology, University of Copenhagen; Room 1.2.22, Ole Maaløes Vej 5 DK-2200 Copenhagen N Denmark
16
Mardia KV, Kent JT, Zhang Z, Taylor CC, Hamelryck T. Mixtures of concentrated multivariate sine distributions with applications to bioinformatics. J Appl Stat 2012. [DOI: 10.1080/02664763.2012.719221]
17
Harder T, Borg M, Bottaro S, Boomsma W, Olsson S, Ferkinghoff-Borg J, Hamelryck T. An Efficient Null Model for Conformational Fluctuations in Proteins. Structure 2012; 20:1028-39. [DOI: 10.1016/j.str.2012.03.020] [Received: 06/27/2011] [Revised: 03/08/2012] [Accepted: 03/12/2012]
18
Bottaro S, Boomsma W, Johansson KE, Andreetta C, Hamelryck T, Ferkinghoff-Borg J. Subtle Monte Carlo Updates in Dense Molecular Systems. J Chem Theory Comput 2012; 8:695-702. [DOI: 10.1021/ct200641m]
Affiliation(s)
- Sandro Bottaro
- Department of Electrical Engineering, Technical University of Denmark, Kgs. Lyngby, Denmark
- Wouter Boomsma
- Department of Electrical Engineering, Technical University of Denmark, Kgs. Lyngby, Denmark
- Department of Astronomy and Theoretical Physics, Lund University, Lund, Sweden
- Thomas Hamelryck
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
19
20
Harder T, Borg M, Boomsma W, Røgen P, Hamelryck T. Fast large-scale clustering of protein structures using Gauss integrals. Bioinformatics 2011; 28:510-5. [DOI: 10.1093/bioinformatics/btr692]
21
Olsson S, Boomsma W, Frellsen J, Bottaro S, Harder T, Ferkinghoff-Borg J, Hamelryck T. Generative probabilistic models extend the scope of inferential structure determination. J Magn Reson 2011; 213:182-186. [PMID: 21993764 DOI: 10.1016/j.jmr.2011.08.039] [Received: 06/20/2011] [Revised: 08/19/2011] [Accepted: 08/30/2011]
Abstract
Conventional methods for protein structure determination from NMR data rely on the ad hoc combination of physical forcefields and experimental data, along with heuristic determination of free parameters such as weight of experimental data relative to a physical forcefield. Recently, a theoretically rigorous approach was developed which treats structure determination as a problem of Bayesian inference. In this case, the forcefields are brought in as a prior distribution in the form of a Boltzmann factor. Due to high computational cost, the approach has been only sparsely applied in practice. Here, we demonstrate that the use of generative probabilistic models instead of physical forcefields in the Bayesian formalism is not only conceptually attractive, but also improves precision and efficiency. Our results open new vistas for the use of sophisticated probabilistic models of biomolecular structure in structure determination from experimental data.
Affiliation(s)
- Simon Olsson
- Bioinformatics Center, University of Copenhagen, Department of Biology, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark.
22
Hamelryck T, Borg M, Paluszewski M, Paulsen J, Frellsen J, Andreetta C, Boomsma W, Bottaro S, Ferkinghoff-Borg J. Potentials of mean force for protein structure prediction vindicated, formalized and generalized. PLoS One 2010; 5:e13714. [PMID: 21103041 PMCID: PMC2978081 DOI: 10.1371/journal.pone.0013714] [Received: 07/07/2010] [Accepted: 10/04/2010]
Abstract
Understanding protein structure is of crucial importance in science, medicine and biotechnology. For about two decades, knowledge-based potentials based on pairwise distances – so-called “potentials of mean force” (PMFs) – have been center stage in the prediction and design of protein structure and the simulation of protein folding. However, the validity, scope and limitations of these potentials are still vigorously debated and disputed, and the optimal choice of the reference state – a necessary component of these potentials – is an unsolved problem. PMFs are loosely justified by analogy to the reversible work theorem in statistical physics, or by a statistical argument based on a likelihood function. Both justifications are insightful but leave many questions unanswered. Here, we show for the first time that PMFs can be seen as approximations to quantities that do have a rigorous probabilistic justification: they naturally arise when probability distributions over different features of proteins need to be combined. We call these quantities “reference ratio distributions” deriving from the application of the “reference ratio method.” This new view is not only of theoretical relevance but leads to many insights that are of direct practical use: the reference state is uniquely defined and does not require external physical insights; the approach can be generalized beyond pairwise distances to arbitrary features of protein structure; and it becomes clear for which purposes the use of these quantities is justified. We illustrate these insights with two applications, involving the radius of gyration and hydrogen bonding. In the latter case, we also show how the reference ratio method can be iteratively applied to sculpt an energy funnel. Our results considerably increase the understanding and scope of energy functions derived from known biomolecular structures.
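The reference ratio construction described here can be illustrated on a toy, fully enumerable state space. Assume a reference model q(x) over conformations x, a coarse feature y(x) (for instance a radius-of-gyration bin), and a desired distribution p_Y(y) over that feature. The combined model is p(x) = q(x) p_Y(y(x)) / q_Y(y(x)), where q_Y is the marginal of y under q. Its feature marginal is exactly p_Y, and −log(p_Y/q_Y) plays the role of a PMF-like energy in units of kT. The states, feature map and numbers below are invented for illustration. This is a sketch of the construction, not the authors' implementation.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Hashable


def feature_marginal(q: Dict[Hashable, float],
                     feature: Callable[[Hashable], Hashable]) -> Dict[Hashable, float]:
    """q_Y(y): total reference probability of the states that map to feature value y."""
    q_y: Dict[Hashable, float] = defaultdict(float)
    for x, qx in q.items():
        q_y[feature(x)] += qx
    return dict(q_y)


def reference_ratio(q: Dict[Hashable, float],
                    feature: Callable[[Hashable], Hashable],
                    p_y: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """p(x) = q(x) * p_Y(y(x)) / q_Y(y(x)); the feature marginal of p is p_Y."""
    q_y = feature_marginal(q, feature)
    return {x: qx * p_y[feature(x)] / q_y[feature(x)] for x, qx in q.items()}


def ratio_energy(q: Dict[Hashable, float],
                 feature: Callable[[Hashable], Hashable],
                 p_y: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """PMF-like energy -log(p_Y(y) / q_Y(y)) for each feature value, in units of kT."""
    q_y = feature_marginal(q, feature)
    return {y: -math.log(p_y[y] / q_y[y]) for y in q_y}


# Toy example (invented numbers): four states, two feature values.
q = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
feature = {"a": "compact", "b": "compact", "c": "extended", "d": "extended"}.get
p_y = {"compact": 0.5, "extended": 0.5}
p = reference_ratio(q, feature, p_y)
```

In the toy example the reference model puts 0.7 of its mass on "compact" states. After the correction each feature value carries 0.5, while the relative weights of states that share a feature value (here a versus b) are inherited unchanged from q.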
Affiliation(s)
- Thomas Hamelryck
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- * E-mail: (TH); (JFB)
- Mikael Borg
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Martin Paluszewski
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jonas Paulsen
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jes Frellsen
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Christian Andreetta
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Wouter Boomsma
- Biomedical Engineering, Technical University of Denmark (DTU) Elektro, Technical University of Denmark, Lyngby, Denmark
- Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- Sandro Bottaro
- Biomedical Engineering, Technical University of Denmark (DTU) Elektro, Technical University of Denmark, Lyngby, Denmark
- Jesper Ferkinghoff-Borg
- Biomedical Engineering, Technical University of Denmark (DTU) Elektro, Technical University of Denmark, Lyngby, Denmark
- * E-mail: (TH); (JFB)
23
Stovgaard K, Andreetta C, Ferkinghoff-Borg J, Hamelryck T. Calculation of accurate small angle X-ray scattering curves from coarse-grained protein models. BMC Bioinformatics 2010; 11:429. [PMID: 20718956 PMCID: PMC2931518 DOI: 10.1186/1471-2105-11-429] [Received: 04/08/2010] [Accepted: 08/18/2010]
Abstract
BACKGROUND Genome sequencing projects have expanded the gap between the number of known protein sequences and structures. The limitations of current high resolution structure determination methods make it unlikely that this gap will disappear in the near future. Small angle X-ray scattering (SAXS) is an established low resolution method for routinely determining the structure of proteins in solution. The purpose of this study is to develop a method for the efficient calculation of accurate SAXS curves from coarse-grained protein models. Such a method can for example be used to construct a likelihood function, which is paramount for structure determination based on statistical inference. RESULTS We present a method for the efficient calculation of accurate SAXS curves based on the Debye formula and a set of scattering form factors for dummy atom representations of amino acids. Such a method avoids the computationally costly iteration over all atoms. We estimated the form factors using generated data from a set of high quality protein structures. No ad hoc scaling or correction factors are applied in the calculation of the curves. Two coarse-grained representations of protein structure were investigated; two scattering bodies per amino acid led to significantly better results than a single scattering body. CONCLUSION We show that the obtained point estimates allow the calculation of accurate SAXS curves from coarse-grained protein models. The resulting curves are on par with the current state-of-the-art program CRYSOL, which requires full atomic detail. Our method was also comparable to CRYSOL in recognizing native structures among native-like decoys. As a proof-of-concept, we combined the coarse-grained Debye calculation with a previously described probabilistic model of protein structure, TorusDBN. This resulted in a significant improvement in the decoy recognition performance. 
In conclusion, the presented method shows great promise for use in statistical inference of protein structures from SAXS data.
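The Debye formula mentioned in this abstract computes scattering intensity from pairwise distances between scattering bodies: I(q) = Σ_i Σ_j F_i(q) F_j(q) sin(q r_ij)/(q r_ij), with the sinc term taken as 1 when q r_ij = 0. The minimal sketch below assumes constant, q-independent form factors supplied by the caller. The paper instead estimates form factors for dummy-atom representations of amino acids from high-quality structures, and those are not reproduced here. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def debye_intensity(q: float, coords: Sequence[Vec3],
                    form_factors: Sequence[float]) -> float:
    """Debye formula I(q) = sum_i sum_j F_i F_j sin(q r_ij) / (q r_ij).

    Assumes q-independent form factors F_i (a simplification for this sketch).
    The double sum is evaluated over i < j and doubled, plus the diagonal."""
    n = len(coords)
    total = 0.0
    for i in range(n):
        total += form_factors[i] ** 2  # i == j: r_ij = 0, so sinc = 1
        for j in range(i + 1, n):
            qr = q * math.dist(coords[i], coords[j])
            sinc = 1.0 if qr == 0.0 else math.sin(qr) / qr
            total += 2.0 * form_factors[i] * form_factors[j] * sinc
    return total
```

A curve is then `[debye_intensity(q, coords, ff) for q in q_values]`. Because the pair loop is quadratic in the number of scattering bodies, using one or two bodies per residue instead of every atom is what makes the coarse-grained calculation cheap.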
Affiliation(s)
- Kasper Stovgaard
- Department of Biology, University of Copenhagen, The Bioinformatics Centre, Denmark
24
Harder T, Boomsma W, Paluszewski M, Frellsen J, Johansson KE, Hamelryck T. Beyond rotamers: a generative, probabilistic model of side chains in proteins. BMC Bioinformatics 2010; 11:306. [PMID: 20525384 PMCID: PMC2902450 DOI: 10.1186/1471-2105-11-306] [Received: 03/03/2010] [Accepted: 06/05/2010]
Abstract
Background Accurately covering the conformational space of amino acid side chains is essential for important applications such as protein design, docking and high resolution structure prediction. Today, the most common way to capture this conformational space is through rotamer libraries - discrete collections of side chain conformations derived from experimentally determined protein structures. The discretization can be exploited to efficiently search the conformational space. However, discretizing this naturally continuous space comes at the cost of losing detailed information that is crucial for certain applications. For example, rigorously combining rotamers with physical force fields is associated with numerous problems. Results In this work we present BASILISK: a generative, probabilistic model of the conformational space of side chains that makes it possible to sample in continuous space. In addition, sampling can be conditional upon the protein's detailed backbone conformation, again in continuous space - without involving discretization. Conclusions A careful analysis of the model and a comparison with various rotamer libraries indicates that the model forms an excellent, fully continuous model of side chain conformational space. We also illustrate how the model can be used for rigorous, unbiased sampling with a physical force field, and how it improves side chain prediction when used as a pseudo-energy term. In conclusion, BASILISK is an important step forward on the way to a rigorous probabilistic description of protein structure in continuous space and in atomic detail.
Affiliation(s)
- Tim Harder
- The Bioinformatics Section, Department of Biology, University of Copenhagen, Copenhagen, Denmark
25
Paluszewski M, Hamelryck T. Mocapy++--a toolkit for inference and learning in dynamic Bayesian networks. BMC Bioinformatics 2010; 11:126. [PMID: 20226024 PMCID: PMC2848649 DOI: 10.1186/1471-2105-11-126] [Received: 10/02/2009] [Accepted: 03/12/2010]
Abstract
BACKGROUND Mocapy++ is a toolkit for parameter learning and inference in dynamic Bayesian networks (DBNs). It supports a wide range of DBN architectures and probability distributions, including distributions from directional statistics (the statistics of angles, directions and orientations). RESULTS The program package is freely available under the GNU General Public Licence (GPL) from SourceForge http://sourceforge.net/projects/mocapy. The package contains the source for building the Mocapy++ library, several usage examples and the user manual. CONCLUSIONS Mocapy++ is especially suitable for constructing probabilistic models of biomolecular structure, due to its support for directional statistics. In particular, it supports the Kent distribution on the sphere and the bivariate von Mises distribution on the torus. These distributions have proven useful to formulate probabilistic models of protein and RNA structure in atomic detail.
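Bivariate von Mises distributions on the torus are used for pairs of dihedral angles such as (φ, ψ). As an illustration, the sketch below evaluates the unnormalized log-density of one common bivariate von Mises variant, the sine model, f(φ, ψ) ∝ exp(κ1 cos(φ − μ) + κ2 cos(ψ − ν) + λ sin(φ − μ) sin(ψ − ν)). This is the bivariate case of the sine distributions named in entry 16. The sketch then estimates the normalizing constant numerically on a grid. This entry does not say whether Mocapy++ uses this exact variant, and the function names are hypothetical; this is not Mocapy++'s API.

```python
import math


def sine_logpdf_unnorm(phi: float, psi: float, mu: float, nu: float,
                       kappa1: float, kappa2: float, lam: float) -> float:
    """Unnormalized log-density of the bivariate von Mises sine model (radians)."""
    return (kappa1 * math.cos(phi - mu) + kappa2 * math.cos(psi - nu)
            + lam * math.sin(phi - mu) * math.sin(psi - nu))


def sine_normalizer(mu: float, nu: float, kappa1: float, kappa2: float,
                    lam: float, n: int = 256) -> float:
    """Normalizing constant by the midpoint rule on an n x n grid over the torus.

    The integrand is smooth and 2*pi-periodic in both angles, so this simple
    rule converges very quickly as n grows."""
    h = 2.0 * math.pi / n
    total = 0.0
    for i in range(n):
        phi = (i + 0.5) * h
        for j in range(n):
            psi = (j + 0.5) * h
            total += math.exp(sine_logpdf_unnorm(phi, psi, mu, nu, kappa1, kappa2, lam))
    return total * h * h


def sine_logpdf(phi: float, psi: float, mu: float, nu: float,
                kappa1: float, kappa2: float, lam: float) -> float:
    """Normalized log-density; recomputes the constant, so cache it in real use."""
    return (sine_logpdf_unnorm(phi, psi, mu, nu, kappa1, kappa2, lam)
            - math.log(sine_normalizer(mu, nu, kappa1, kappa2, lam)))
```

When λ = 0 the two angles are independent von Mises variables, and the constant factorizes into (2π I0(κ1))(2π I0(κ2)). This gives a convenient check on the numerical integration.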
26
Frellsen J, Moltke I, Thiim M, Mardia KV, Ferkinghoff-Borg J, Hamelryck T. A probabilistic model of RNA conformational space. PLoS Comput Biol 2009; 5:e1000406. [PMID: 19543381 PMCID: PMC2691987 DOI: 10.1371/journal.pcbi.1000406] [Received: 02/17/2009] [Accepted: 05/06/2009]
Abstract
The increasing importance of non-coding RNA in biology and medicine has led to a growing interest in the problem of RNA 3-D structure prediction. As is the case for proteins, RNA 3-D structure prediction methods require two key ingredients: an accurate energy function and a conformational sampling procedure. Both are only partly solved problems. Here, we focus on the problem of conformational sampling. The current state-of-the-art solution is based on fragment assembly methods, which construct plausible conformations by stringing together short fragments obtained from experimental structures. However, the discrete nature of the fragments necessitates the use of carefully tuned, unphysical energy functions, and their non-probabilistic nature impairs unbiased sampling. We offer a solution to the sampling problem that removes these important limitations: a probabilistic model of RNA structure that allows efficient sampling of RNA conformations in continuous space, and with associated probabilities. We show that the model captures several key features of RNA structure, such as its rotameric nature and the distribution of the helix lengths. Furthermore, the model readily generates native-like 3-D conformations for 9 out of 10 test structures, solely using coarse-grained base-pairing information. In conclusion, the method provides a theoretical and practical solution for a major bottleneck on the way to routine prediction and simulation of RNA structure and dynamics in atomic detail.
Affiliation(s)
- Jes Frellsen
- The Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Ida Moltke
- The Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Martin Thiim
- The Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kanti V. Mardia
- Department of Statistics, University of Leeds, Leeds, United Kingdom
- Thomas Hamelryck
- The Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
27
Cock PJA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B, de Hoon MJL. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 2009; 25:1422-3. [PMID: 19304878 PMCID: PMC2682512 DOI: 10.1093/bioinformatics/btp163]
Abstract
SUMMARY The Biopython project is a mature open source international collaboration of volunteer developers, providing Python libraries for a wide range of bioinformatics problems. Biopython includes modules for reading and writing different sequence file formats and multiple sequence alignments, dealing with 3D macromolecular structures, interacting with common tools such as BLAST, ClustalW and EMBOSS, accessing key online databases, as well as providing numerical methods for statistical learning. AVAILABILITY Biopython is freely available, with documentation and source code at (www.biopython.org) under the Biopython license.
28
Abstract
Structural bioinformatics is concerned with the molecular structure of biomacromolecules on a genomic scale, using computational methods. Classic problems in structural bioinformatics include the prediction of protein and RNA structure from sequence, the design of artificial proteins or enzymes, and the automated analysis and comparison of biomacromolecules in atomic detail. The determination of macromolecular structure from experimental data (for example coming from nuclear magnetic resonance, X-ray crystallography or small angle X-ray scattering) has close ties with the field of structural bioinformatics. Recently, probabilistic models and machine learning methods based on Bayesian principles are providing efficient and rigorous solutions to challenging problems that were long regarded as intractable. In this review, I will highlight some important recent developments in the prediction, analysis and experimental determination of macromolecular structure that are based on such methods. These developments include generative models of protein structure, the estimation of the parameters of energy functions that are used in structure prediction, the superposition of macromolecules and structure determination methods that are based on inference. Although this review is not exhaustive, I believe the selected topics give a good impression of the exciting new, probabilistic road the field of structural bioinformatics is taking.
Affiliation(s)
- Thomas Hamelryck
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen N, Denmark.
29
Won KJ, Hamelryck T, Prügel-Bennett A, Krogh A. An evolutionary method for learning HMM structure: prediction of protein secondary structure. BMC Bioinformatics 2007; 8:357. [PMID: 17888163 PMCID: PMC2072961 DOI: 10.1186/1471-2105-8-357] [Received: 04/28/2007] [Accepted: 09/21/2007]
Abstract
Background The prediction of the secondary structure of proteins is one of the most studied problems in bioinformatics. Despite their success in many problems of biological sequence analysis, Hidden Markov Models (HMMs) have not been used much for this problem, as the complexity of the task makes manual design of HMMs difficult. Therefore, we have developed a method for evolving the structure of HMMs automatically, using Genetic Algorithms (GAs). Results In the GA procedure, populations of HMMs are assembled from biologically meaningful building blocks. Mutation and crossover operators were designed to explore the space of such Block-HMMs. After each step of the GA, the standard HMM estimation algorithm (the Baum-Welch algorithm) was used to update model parameters. The final HMM captures several features of protein sequence and structure, with its own HMM grammar. In contrast to neural network based predictors, the evolved HMM also calculates the probabilities associated with the predictions. We carefully examined the performance of the HMM based predictor under both the multiple- and single-sequence conditions. Conclusion We have shown that the proposed evolutionary method can automatically design the topology of HMMs. The method reads the grammar of protein sequences and converts it into the grammar of an HMM. It improves on previously suggested evolutionary methods and increases prediction quality. In particular, it performs well under the single-sequence condition and provides probabilistic information on the prediction result. The protein secondary structure predictor using HMMs (P.S.HMM) is available online at http://www.binf.ku.dk/~won/pshmm.htm. It runs under the single-sequence condition.
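The abstract notes that, unlike neural-network predictors, the HMM also returns probabilities for its predictions. One standard way an HMM yields such per-residue probabilities is posterior decoding with the forward-backward algorithm. The sketch below shows this for an arbitrary discrete HMM given as dictionaries; a hypothetical two-state (helix/coil) model with made-up parameters would be one use. It does not reproduce P.S.HMM or its evolved topology, and the function name is an assumption.

```python
from typing import Dict, List, Sequence


def posterior_decode(obs: Sequence[str], states: Sequence[str],
                     start: Dict[str, float],
                     trans: Dict[str, Dict[str, float]],
                     emit: Dict[str, Dict[str, float]]) -> List[Dict[str, float]]:
    """Forward-backward posterior P(state at position t | whole sequence)."""
    n = len(obs)
    if n == 0:
        return []
    # Forward pass, rescaled at each position to avoid numerical underflow.
    fwd: List[Dict[str, float]] = []
    scales: List[float] = []
    for t, o in enumerate(obs):
        cur = {}
        for s in states:
            prior = start[s] if t == 0 else sum(fwd[t - 1][r] * trans[r][s] for r in states)
            cur[s] = prior * emit[s][o]
        c = sum(cur.values())
        scales.append(c)
        fwd.append({s: v / c for s, v in cur.items()})
    # Backward pass, divided by the same scale factors.
    bwd: List[Dict[str, float]] = [{} for _ in range(n)]
    bwd[n - 1] = {s: 1.0 for s in states}
    for t in range(n - 2, -1, -1):
        bwd[t] = {r: sum(trans[r][s] * emit[s][obs[t + 1]] * bwd[t + 1][s]
                         for s in states) / scales[t + 1]
                  for r in states}
    # Combine; with this scaling the product is already normalized, and the
    # explicit renormalization only guards against rounding.
    post = []
    for t in range(n):
        prod = {s: fwd[t][s] * bwd[t][s] for s in states}
        z = sum(prod.values())
        post.append({s: v / z for s, v in prod.items()})
    return post
```

Picking the most probable state at each position gives a prediction, and the posterior value itself serves as the per-residue confidence.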
Affiliation(s)
- Kyoung-Jae Won
- Bioinformatics Centre, Department of Molecular Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen, Denmark
- School of Electronics and Computer Science, University of Southampton, SO17 1BJ, UK
- Department of Chemistry & Biochemistry, UCSD, 9500 Gilman Drive, Mail Code 0359, La Jolla, CA, 92093-0359, USA
- Thomas Hamelryck
- Bioinformatics Centre, Department of Molecular Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen, Denmark
- Adam Prügel-Bennett
- School of Electronics and Computer Science, University of Southampton, SO17 1BJ, UK
- Anders Krogh
- Bioinformatics Centre, Department of Molecular Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen, Denmark
30
Paluszewski M, Hamelryck T, Winter P. Reconstructing protein structure from solvent exposure using tabu search. Algorithms Mol Biol 2006; 1:20. [PMID: 17069644 PMCID: PMC1635054 DOI: 10.1186/1748-7188-1-20] [Received: 03/30/2006] [Accepted: 10/27/2006]
Abstract
Background A new, promising solvent exposure measure, called half-sphere-exposure (HSE), has recently been proposed. Here, we study the reconstruction of a protein's Cα trace solely from structure-derived HSE information. This problem is of relevance for de novo structure prediction using predicted HSE measures. For comparison, we also consider the well-established contact number (CN) measure. We define energy functions based on the HSE- or CN-vectors and minimize them using two conformational search heuristics: Monte Carlo simulation (MCS) and tabu search (TS). While MCS has been the dominant conformational search heuristic in the literature, TS has been applied only a few times. To discretize the conformational space, we use lattice models of varying complexity. Results The proposed TS heuristic with a novel tabu definition generally performs better than MCS for this problem. Our experiments show that, at least for small proteins (up to 35 amino acids), it is possible to reconstruct the protein backbone solely from the HSE or CN information. In general, the HSE measure leads to better models than the CN measure, as judged by the RMSD and the angle correlation with the native structure. The angle correlation, a measure of structural similarity, evaluates whether equivalent residues in two structures have the same general orientation. Our results indicate that the HSE measure is potentially very useful to represent solvent exposure in protein structure prediction, design and simulation.
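For readers unfamiliar with the two measures compared here: the contact number (CN) of a residue counts the Cα atoms within a sphere of fixed radius around its Cα. Half-sphere exposure splits that sphere into two halves with the plane perpendicular to a direction vector at the residue (in the Cβ-based variant, the Cα→Cβ vector) and counts the Cα atoms in each half (HSE-up and HSE-down). The sketch below assumes the direction vector is supplied by the caller and uses 13 Å only as an illustrative default radius, not a value taken from this abstract. The function name is hypothetical. Biopython's `Bio.PDB.HSExposure` module (`HSExposureCA`, `HSExposureCB`, `ExposureCN`) provides maintained implementations.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def half_sphere_exposure(ca: Sequence[Vec3], i: int, direction: Vec3,
                         radius: float = 13.0) -> Tuple[int, int, int]:
    """Hypothetical helper returning (hse_up, hse_down, cn) for residue i.

    ca        -- C-alpha coordinates of all residues in the chain
    direction -- vector at residue i (e.g. C-alpha -> C-beta) defining 'up'
    radius    -- sphere radius in angstrom (illustrative default)
    """
    center = ca[i]
    up = down = 0
    for j, other in enumerate(ca):
        if j == i or math.dist(center, other) > radius:
            continue  # skip the residue itself and atoms outside the sphere
        v = (other[0] - center[0], other[1] - center[1], other[2] - center[2])
        projection = v[0] * direction[0] + v[1] * direction[1] + v[2] * direction[2]
        if projection > 0.0:
            up += 1
        else:
            down += 1  # atoms exactly on the dividing plane count as 'down'
    return up, down, up + down  # CN is the total count inside the full sphere
```

CN discards the orientation information that HSE keeps. The abstract's finding that HSE yields better reconstructions than CN is consistent with that extra directional signal.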
Affiliation(s)
- Martin Paluszewski
- Department of Computer Science, University of Copenhagen, Universitetsparken 1, 2100 Copenhagen, Denmark
- Thomas Hamelryck
- Bioinformatics Center, Institute of Molecular Biology, University of Copenhagen, Universitetsparken 15 building 10, 2100 Copenhagen, Denmark
- Pawel Winter
- Department of Computer Science, University of Copenhagen, Universitetsparken 1, 2100 Copenhagen, Denmark
31
Baranov PV, Vestergaard B, Hamelryck T, Gesteland RF, Nyborg J, Atkins JF. Diverse bacterial genomes encode an operon of two genes, one of which is an unusual class-I release factor that potentially recognizes atypical mRNA signals other than normal stop codons. Biol Direct 2006; 1:28. [PMID: 16970810 PMCID: PMC1586002 DOI: 10.1186/1745-6150-1-28] [Received: 08/31/2006] [Accepted: 09/13/2006]
Abstract
BACKGROUND While all codons that specify amino acids are universally recognized by tRNA molecules, codons signaling termination of translation are recognized by proteins known as class-I release factors (RF). In most eukaryotes and archaea a single RF accomplishes termination at all three stop codons. In most bacteria, there are two RFs with overlapping specificity: RF1 recognizes UA(A/G) and RF2 recognizes U(A/G)A. THE HYPOTHESIS First, we hypothesize that orthologues of the E. coli K12 pseudogene prfH encode a third class-I RF that we designate RFH. Second, it is likely that RFH responds to signals other than conventional stop codons. Supporting evidence comes from the following facts: (i) a number of bacterial genomes contain prfH orthologues with no discernible interruptions in their ORFs; (ii) RFH shares strong sequence similarity with other class-I bacterial RFs; (iii) RFH contains a highly conserved GGQ motif associated with peptidyl hydrolysis activity; (iv) residues located in the areas supposedly interacting with mRNA and the ribosomal decoding center are highly conserved in RFH, but differ from those in other RFs. RFH lacks the functional but non-essential domain 1. Moreover, RFH-encoding genes are invariably accompanied by a highly conserved gene of unknown function, which is absent in genomes that lack a gene for RFH. The accompanying gene is always located upstream of the RFH gene and has the same orientation. The proximity of the 3' end of the former to the 5' end of the RFH gene makes it likely that their expression is co-regulated via translational coupling. In summary, RFH has the characteristics expected for a class-I RF, but likely with a specificity different from that of RF1 and RF2. TESTING THE HYPOTHESIS The most puzzling question is which signals RFH recognizes to trigger its release function. Genetic swapping of RFH mRNA recognition components with their RF1 or RF2 counterparts may reveal the nature of RFH signals.
IMPLICATIONS OF THE HYPOTHESIS The hypothesis implies a greater versatility of release-factor-like activity in the ribosomal A-site than previously appreciated. A closer study of RFH may provide insight into the evolution of the genetic code and of the translational machinery responsible for termination of translation. REVIEWERS This article was reviewed by Daniel Wilson (nominated by Eugene Koonin), Warren Tate (nominated by Eugene Koonin), Yoshikazu Nakamura (nominated by Eugene Koonin) and Eugene Koonin.
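The bracket notation for RF specificity can be read as a set of allowed bases at each codon position. The sketch below is a hypothetical helper, not something from the paper, that makes the overlap explicit: UAA is read by both RF1 and RF2, UAG only by RF1, and UGA only by RF2. RFH is deliberately left out, because the signals it recognizes are exactly what the hypothesis leaves open.

```python
# Per-position allowed bases, transcribed from the abstract:
# RF1 recognizes UA(A/G), RF2 recognizes U(A/G)A.
RF_PATTERNS = {
    "RF1": ("U", "A", "AG"),
    "RF2": ("U", "AG", "A"),
}


def release_factors(codon):
    """Return the sorted list of bacterial class-I RFs whose pattern matches `codon`."""
    codon = codon.upper().replace("T", "U")  # accept DNA or RNA spelling
    if len(codon) != 3:
        raise ValueError("a codon has exactly three bases")
    return sorted(
        rf
        for rf, pattern in RF_PATTERNS.items()
        if all(base in allowed for base, allowed in zip(codon, pattern))
    )
```

For instance, `release_factors("UAA")` returns `["RF1", "RF2"]`, while the tryptophan codon `release_factors("UGG")` returns `[]`.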
Affiliation(s)
- Pavel V Baranov
- Bioscience Institute, University College Cork, Cork, Ireland
- Department of Human Genetics, University of Utah, 15N 2030E, Salt Lake City, UT 84112-5330, USA
- Bente Vestergaard
- Department of Molecular Biology, University of Aarhus, Gustav Wieds Vej 10C, DK-8000 Aarhus C, Denmark
- Department of Medicinal Chemistry, Danish University of Pharmaceutical Sciences, Universitetsparken 2, DK-2100 Copenhagen, Denmark
- Thomas Hamelryck
- Bioinformatics center, Institute of Molecular Biology and Physiology, University of Copenhagen, Universitetsparken 15, Building 10, 2100 Copenhagen, Denmark
- Jens Nyborg
- Department of Molecular Biology, University of Aarhus, Gustav Wieds Vej 10C, DK-8000 Aarhus C, Denmark
- John F Atkins
- Bioscience Institute, University College Cork, Cork, Ireland
- Department of Human Genetics, University of Utah, 15N 2030E, Salt Lake City, UT 84112-5330, USA
32
Abstract
The prediction of protein structure from sequence remains a major unsolved problem in biology. The most successful protein structure prediction methods use a divide-and-conquer strategy: a conformational sampling method generates plausible candidate structures, which are subsequently accepted or rejected using an energy function. Conceptually, this often corresponds to separating local structural bias from the long-range interactions that stabilize the compact, native state. However, sampling protein conformations that are compatible with the local structural bias encoded in a given protein sequence is a long-standing open problem, especially in continuous space. We describe an elegant and mathematically rigorous method to do this, and show that it readily generates native-like protein conformations simply by enforcing compactness. Our results have far-reaching implications for protein structure prediction, determination, simulation, and design.

Protein structure prediction is one of the main unsolved problems in computational biology today. A common way to tackle the problem is to generate plausible protein conformations using a fairly inaccurate but fast method, and to evaluate the conformations using an accurate but slow method. The main bottleneck lies in the first step, that is, efficiently exploring protein conformational space. Currently, the best way to do this is to construct plausible structures by stringing together fragments from experimentally determined protein structures, a method called fragment assembly. Hamelryck, Kent, and Krogh present a new method that can efficiently generate protein conformations that are compatible with a given protein sequence. Unlike the conformations produced by existing methods, the generated conformations cover a continuous range and come with an associated probability. The method shows great promise for use in protein structure prediction, determination, simulation, and design.
Affiliation(s)
- Thomas Hamelryck
- Bioinformatics Center, Institute of Molecular Biology and Physiology, University of Copenhagen, Copenhagen, Denmark.
33
Boomsma W, Hamelryck T. Full cyclic coordinate descent: solving the protein loop closure problem in Calpha space. BMC Bioinformatics 2005; 6:159. [PMID: 15985178 PMCID: PMC1192790 DOI: 10.1186/1471-2105-6-159] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Received: 04/25/2005] [Accepted: 06/28/2005] [Indexed: 11/10/2022]
Abstract
BACKGROUND Various forms of the so-called loop closure problem are crucial to protein structure prediction methods. Given an N- and a C-terminal end, the problem consists of finding a suitable segment of a certain length that bridges the ends seamlessly. In homology modelling, the problem arises in predicting loop regions. In de novo protein structure prediction, the problem is encountered when implementing local moves for Markov chain Monte Carlo simulations. Most loop closure algorithms keep the bond angles fixed or semi-fixed, and only vary the dihedral angles. This is appropriate for a full-atom protein backbone, since the bond angles can be considered fixed, while the (phi, psi) dihedral angles are variable. However, many de novo structure prediction methods use protein models that consist only of Calpha atoms, or otherwise do not make use of all backbone atoms. Such models require a method that alters both bond and dihedral angles, since the pseudo bond angle between three consecutive Calpha atoms also varies considerably. RESULTS Here we present a method that solves the loop closure problem for Calpha-only protein models. We developed a variant of Cyclic Coordinate Descent (CCD), an inverse kinematics method from the field of robotics, which was recently applied to the loop closure problem. Since the method alters both bond and dihedral angles, which is equivalent to applying a full rotation matrix, we call our method Full CCD (FCCD). FCCD replaces CCD's vector-based optimization of a rotation around an axis with a singular value decomposition-based optimization of a general rotation matrix. The method is easy to implement and numerically stable. CONCLUSION We tested the method's performance on sets of random protein Calpha segments between 5 and 30 amino acids long, and on a number of loops of length 4, 8 and 12. FCCD is fast, has a high success rate and readily generates conformations close to those of real loops.
The presence of constraints on the angles has only a small effect on the performance. A reference implementation of FCCD in Python is available as supplementary information.
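The core FCCD step can be sketched as follows, assuming NumPy. At each pivot atom, the whole downstream part of the chain is rotated rigidly about the pivot by the rotation that best superimposes its last three Calpha atoms on their three target positions; that optimal rotation comes from a singular value decomposition (the Kabsch construction). Because the rotation is a full 3x3 matrix rather than a rotation about a bond axis, it changes the pseudo bond angle at the pivot as well as the dihedrals, while every Calpha-Calpha distance is preserved. The function names, pivot order and stopping tolerance are illustrative assumptions; the authors' own implementation is the one provided as supplementary information.

```python
import numpy as np


def optimal_rotation(moving, target):
    """Rotation R minimising sum_k ||R @ moving[k] - target[k]||^2 (no translation).

    Both arguments are (n, 3) arrays of vectors measured from the same pivot.
    """
    h = moving.T @ target  # 3x3 correlation matrix, sum of outer products
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so we return a proper rotation, not a reflection.
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T


def rmsd_to_target(chain, target):
    """RMSD between the last three atoms of `chain` and the (3, 3) `target`."""
    return float(np.sqrt(np.mean(np.sum((chain[-3:] - target) ** 2, axis=1))))


def fccd(chain, target, n_sweeps=100, tol=1e-3):
    """Move the C-terminal three atoms of a C-alpha chain towards `target`.

    The first atom acts as the fixed N-terminal anchor. Each pivot step picks
    the best rotation (identity included), so the deviation never increases.
    """
    chain = np.array(chain, dtype=float)  # work on a copy
    for _ in range(n_sweeps):
        for i in range(len(chain) - 3):
            pivot = chain[i].copy()
            rot = optimal_rotation(chain[-3:] - pivot, target - pivot)
            # Rigidly rotate every atom after the pivot about the pivot.
            chain[i + 1:] = pivot + (chain[i + 1:] - pivot) @ rot.T
        if rmsd_to_target(chain, target) < tol:
            break
    return chain
```

Using SVD here is what lets a single step optimise over all rotations at once, instead of the one-parameter rotation about a bond axis used by classic CCD.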
Affiliation(s)
- Wouter Boomsma
- Bioinformatics center, Institute of Molecular Biology and Physiology, University of Copenhagen, Universitetsparken 15, Building 10, DK-2100 Copenhagen, Denmark
- Thomas Hamelryck
- Bioinformatics center, Institute of Molecular Biology and Physiology, University of Copenhagen, Universitetsparken 15, Building 10, DK-2100 Copenhagen, Denmark
34
35
Abstract
The Biopython project provides a set of bioinformatics tools implemented in Python. Recently, Biopython was extended with a set of modules that deal with macromolecular structure. Biopython now contains a parser for PDB files that makes the atomic information available in an easy-to-use but powerful data structure. The parser and data structure deal with features that are often left out or handled inadequately by other packages, e.g. atom and residue disorder (the latter occurs if point mutants are present in the crystal), anisotropic B factors, multiple models and insertion codes. In addition, the parser performs some sanity checking to detect obvious errors. AVAILABILITY The Biopython distribution (including source code and documentation) is freely available (under the Biopython license) from http://www.biopython.org
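Two of the features listed above, insertion codes and atom disorder, can be seen in a short example using the `Bio.PDB` module of current Biopython releases; the API may differ in details from the version described in this paper. Two residues share sequence number 52 and are told apart by an insertion code, and residue 53 has a Calpha atom with two alternative locations, which the parser exposes as a disordered atom. The `atom_line` helper, the residues and the coordinates are made up for illustration.

```python
from io import StringIO

from Bio.PDB import PDBParser


def atom_line(serial, name, altloc, resname, chain, resseq, icode, xyz, occupancy, element):
    """Format one fixed-column PDB ATOM record (hypothetical helper)."""
    x, y, z = xyz
    return (
        f"ATOM  {serial:5d} {name:<4}{altloc:1}{resname:>3} {chain:1}"
        f"{resseq:4d}{icode:1}   {x:8.3f}{y:8.3f}{z:8.3f}"
        f"{occupancy:6.2f}{20.0:6.2f}          {element:>2}"
    )


def parse_demo():
    lines = [
        atom_line(1, " CA ", "", "GLY", "A", 52, "", (0.0, 0.0, 0.0), 1.0, "C"),
        # Same sequence number, distinguished by insertion code "A".
        atom_line(2, " CA ", "", "GLY", "A", 52, "A", (3.8, 0.0, 0.0), 1.0, "C"),
        # One C-alpha atom with two alternative locations (atom disorder).
        atom_line(3, " CA ", "A", "SER", "A", 53, "", (7.6, 0.0, 0.0), 0.5, "C"),
        atom_line(4, " CA ", "B", "SER", "A", 53, "", (7.6, 0.3, 0.0), 0.5, "C"),
    ]
    parser = PDBParser(QUIET=True)  # suppress warnings about the minimal file
    return parser.get_structure("demo", StringIO("\n".join(lines) + "\n"))


structure = parse_demo()
chain = structure[0]["A"]  # model 0, chain A
for residue in chain:
    # Residue ids are (hetero flag, sequence number, insertion code) tuples.
    print(residue.id, residue.get_resname())
ser_ca = chain[53]["CA"]  # an integer key is shorthand for (" ", 53, " ")
print(ser_ca.is_disordered(), sorted(ser_ca.disordered_get_id_list()))
```

Keeping the insertion code inside the residue id is what allows residues 52 and 52A to coexist in one chain without either overwriting the other.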
Affiliation(s)
- Thomas Hamelryck
- Department of Cellular and Molecular Interactions, Vlaams Interuniversitair Instituut voor Biotechnologie and Computational Modeling Lab, Department of Computer Science, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium.
36
Abstract
Convergent evolution often produces similar functional sites in nonhomologous proteins. The identification of these sites can make it possible to infer function from structure, to pinpoint the location of a functional site, to identify enzymes with similar enzymatic mechanisms, or to discover putative functional sites. In this article, a novel method is presented that (a) queries a database of protein structures for the occurrence of a given side-chain pattern and (b) identifies interesting side-chain patterns in a given structure. For efficiency, and to make a robust statistical evaluation of the significance of a similarity possible, patterns of three residues (or triads) are considered. Each triad is encoded as a high-dimensional vector and stored in an SR (Sphere/Rectangle) tree, an efficient multidimensional index tree. Identifying similar triads can then be reformulated as identifying neighboring vectors. The method deals with many features that otherwise complicate the identification of meaningful patterns: shifted backbone positions, conservative substitutions, various atom label ambiguities and mirror-imaged geometries. The combined treatment of these features leads to the identification of patterns that were previously missed. In particular, the identification of mirror-imaged side-chain patterns is unique to the method described here. Interesting triads in a given structure can be identified by extracting all triads and comparing them with a database of triads involved in ligand binding. The approach was tested by an all-against-all comparison of unique representatives of all SCOP superfamilies. New findings include mirror-imaged metal binding and active sites, and a putative active site in bacterial luciferase.
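The idea of turning a side-chain pattern into a point in a vector space, so that similar patterns become neighbouring points, can be sketched in a few lines. This is not the paper's encoding: the sketch assumes that each residue is represented by a fixed, ordered list of atoms, encodes a triad as all pairwise inter-atom distances, and replaces the SR-tree with a brute-force range query. The function names are hypothetical.

```python
import itertools
import math


def triad_vector(residues):
    """Encode a triad (three residues, each an ordered list of (x, y, z) atoms)
    as the vector of all pairwise inter-atom distances."""
    atoms = [atom for residue in residues for atom in residue]
    return [math.dist(p, q) for p, q in itertools.combinations(atoms, 2)]


def range_query(query, database, eps):
    """Brute-force stand-in for an SR-tree query: names of all stored vectors
    within Euclidean distance `eps` of `query`."""
    return [name for name, vector in database.items() if math.dist(query, vector) <= eps]


def mirror(residues):
    """Reflect a triad through the yz-plane (x -> -x)."""
    return [[(-x, y, z) for x, y, z in residue] for residue in residues]
```

With two atoms per residue, a triad becomes a 15-dimensional vector. Because distances are unchanged by reflection, `mirror(triad)` maps to exactly the same point as `triad`, so a range query returns mirror-imaged matches alongside ordinary ones; telling the two apart would need an extra chirality term, for example the sign of a triple product of three inter-atom vectors.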
Affiliation(s)
- Thomas Hamelryck
- ULTR Department, Vrije Universiteit Brussel (VUB), Vlaams Interuniversitair Instituut voor Biotechnologie (VIB), Brussel, Belgium.
37
Buts L, Dao-Thi MH, Loris R, Wyns L, Etzler M, Hamelryck T. Weak protein-protein interactions in lectins: the crystal structure of a vegetative lectin from the legume Dolichos biflorus. J Mol Biol 2001; 309:193-201. [PMID: 11491289 DOI: 10.1006/jmbi.2001.4639] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.4] [Indexed: 11/22/2022]
Abstract
The legume lectins are widely used as a model system for studying protein-carbohydrate and protein-protein interactions. They exhibit a fascinating quaternary structure variation, which becomes important when they interact with multivalent glycoconjugates, for instance those on cell surfaces. Recently, it has become clear that certain lectins form weakly associated oligomers. This phenomenon may play a role in the regulation of receptor crosslinking and subsequent signal transduction. The crystal structure of DB58, a dimeric lectin from the legume Dolichos biflorus, reveals a separate dimer of a previously unobserved type, in addition to a tetramer consisting of two such dimers. This tetramer resembles that formed by DBL, the seed lectin from the same plant. A single amino acid substitution in DB58 affects the conformation and flexibility of a loop in the canonical dimer interface. This disrupts the formation of a stable DBL-like tetramer in solution, but does not prohibit its formation under suitable conditions, which greatly increases the possibilities for the crosslinking of multivalent ligands. The non-canonical DB58 dimer has a buried symmetrical alpha helix, which can be present in the crystal in either of two antiparallel orientations. Two existing structures and datasets for lectins with similar quaternary structures were reconsidered. A central alpha helix could be observed in the soybean lectin, but not in the leucoagglutinating lectin from Phaseolus vulgaris. The relative position and orientation of the carbohydrate-binding sites in the DB58 dimer may affect its ability to crosslink multivalent ligands, compared to the other legume lectin dimers.
Affiliation(s)
- L Buts
- ULTR-Ultrastructure Department, Vrije Universiteit Brussel, Sint-Genesius-Rode, Belgium.
38
Abstract
Several novel structures of legume lectins have led to a thorough understanding of monosaccharide and oligosaccharide specificity, to the determination of novel and surprising quaternary structures and, most importantly, to the structural identification of the binding site for adenine and plant hormones. This deepening of our understanding of the structure/function relationships among the legume lectins is paralleled by advances in two other plant lectin families - the monocot lectins and the jacalin family. As the number of available crystal structures increases, more parallels between plant and animal lectins become apparent.
Affiliation(s)
- J Bouckaert
- Laboratorium voor Ultrastruktuur, Vlaams Interuniversitair Instituut voor Biotechnologie, Vrije Universiteit Brussel, Paardenstraat 65, B-1640, Sint-Genesius-Rode, Belgium
39
Abstract
The legume lectins are a large family of homologous carbohydrate binding proteins that are found mainly in the seeds of most legume plants. Despite their strong similarity on the level of their amino acid sequences and tertiary structures, their carbohydrate specificities and quaternary structures vary widely. In this review we will focus on the structural features of legume lectins and their complexes with carbohydrates. These will be discussed in the light of recent mutagenesis results when appropriate. Monosaccharide specificity seems to be achieved by the use of a conserved core of residues that hydrogen bond to the sugar, and a variable loop that determines the exact shape of the monosaccharide binding site. The higher affinity for particular oligosaccharides and monosaccharides containing a hydrophobic aglycon results mainly from a few distinct subsites next to the monosaccharide binding site. These subsites consist of a small number of variable residues and are found in both the mannose and galactose specificity groups. The quaternary structures of these proteins form the basis of a higher level of specificity, where the spacing between individual epitopes of multivalent carbohydrates becomes important. This results in homogeneous cross-linked lattices even in mixed precipitation systems, and is of relevance for their effects on the biological activities of cells such as mitogenic responses. Quaternary structure is also thought to play an important role in the high affinity interaction between some legume lectins and adenine and a series of adenine-derived plant hormones. The molecular basis of the variation in quaternary structure in this group of proteins is poorly understood.
Affiliation(s)
- R Loris
- Laboratorium voor Ultrastruktuur, Vlaams Interuniversitair Instituut voor Biotechnologie, Vrije Universiteit Brussel, Sint-Genesius-Rode, Belgium.
40
Casset F, Hamelryck T, Loris R, Brisson JR, Tellier C, Dao-Thi MH, Wyns L, Poortmans F, Pérez S, Imberty A. NMR, molecular modeling, and crystallographic studies of lentil lectin-sucrose interaction. J Biol Chem 1995; 270:25619-28. [PMID: 7592736 DOI: 10.1074/jbc.270.43.25619] [Citation(s) in RCA: 62] [Impact Index Per Article: 2.1] [Indexed: 01/26/2023]
Abstract
The conformational features of sucrose in the combining site of lentil lectin have been characterized through elucidation of a crystalline complex at 1.9 Å resolution, transferred nuclear Overhauser effect experiments performed at 600 MHz, and molecular modeling. In the crystal, the lentil lectin dimer binds one sucrose molecule per monomer. The locations of 229 water molecules have been identified. NMR experiments have provided 11 transferred NOEs. In parallel, the docking study and conformational analysis of sucrose in the combining site of lentil lectin indicate that three different conformations can be accommodated. Of these, the orientation with the lowest energy is identical to the one observed in the crystalline complex and agrees well with the observed transferred NOEs. These structural investigations indicate that the bound sucrose has a unique conformation for the glycosidic linkage, close to the one observed in crystalline sucrose, whereas the fructofuranose ring remains relatively flexible and does not exhibit any strong interaction with the protein. Major differences in the hydrogen bonding network of sucrose are found. Neither of the two inter-residue hydrogen bonds in crystalline sucrose is conserved in the complex with the lectin. Instead, a water molecule bridges hydroxyl groups O2-g and O3-f of sucrose.
Affiliation(s)
- F Casset
- Institut National de la Recherche Agronomique, Nantes, France