1
Hatzakis N, Kaestel-Hansen J, de Sautu M, Saminathan A, Scanavachi G, Correia R, Nielsen AJ, Bleshoey S, Boomsma W, Kirchhausen T. Deep learning assisted single particle tracking for automated correlation between diffusion and function. Res Sq 2024:rs.3.rs-3716053. [PMID: 38352328 PMCID: PMC10862944 DOI: 10.21203/rs.3.rs-3716053/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/21/2024]
Abstract
Sub-cellular diffusion in living systems reflects cellular processes and interactions. Recent advances in optical microscopy allow the tracking of this nanoscale diffusion of individual objects with an unprecedented level of precision. However, the agnostic and automated extraction of functional information from the diffusion of molecules and organelles within the sub-cellular environment is labor-intensive and poses a significant challenge. Here we introduce DeepSPT, a deep learning framework to interpret the diffusional 2D or 3D temporal behavior of objects in a rapid and efficient manner, agnostically. Demonstrating its versatility, we have applied DeepSPT to automated mapping of the early events of viral infections, identifying distinct types of endosomal organelles, and clathrin-coated pits and vesicles with up to 95% accuracy and within seconds instead of weeks. The fact that DeepSPT effectively extracts biological information from diffusion alone illustrates that besides structure, motion encodes function at the molecular and sub-cellular level.
2
Kæstel-Hansen J, de Sautu M, Saminathan A, Scanavachi G, Da Cunha Correia RFB, Nielsen AJ, Bleshøy SV, Boomsma W, Kirchhausen T, Hatzakis NS. Deep learning assisted single particle tracking for automated correlation between diffusion and function. bioRxiv 2023:2023.11.16.567393. [PMID: 38014323 PMCID: PMC10680793 DOI: 10.1101/2023.11.16.567393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Sub-cellular diffusion in living systems reflects cellular processes and interactions. Recent advances in optical microscopy allow the tracking of this nanoscale diffusion of individual objects with an unprecedented level of precision. However, the agnostic and automated extraction of functional information from the diffusion of molecules and organelles within the sub-cellular environment is labor-intensive and poses a significant challenge. Here we introduce DeepSPT, a deep learning framework to interpret the diffusional 2D or 3D temporal behavior of objects in a rapid and efficient manner, agnostically. Demonstrating its versatility, we have applied DeepSPT to automated mapping of the early events of viral infections, identifying distinct types of endosomal organelles, and clathrin-coated pits and vesicles with up to 95% accuracy and within seconds instead of weeks. The fact that DeepSPT effectively extracts biological information from diffusion alone indicates that besides structure, motion encodes function at the molecular and sub-cellular level.
Affiliation(s)
- Jacob Kæstel-Hansen
  - Department of Chemistry, University of Copenhagen
  - Center for 4D cellular dynamics, Department of Chemistry, University of Copenhagen
  - Novo Nordisk Center for Optimised Oligo Escape
  - Novo Nordisk Foundation Center for Protein Research
- Marilina de Sautu
  - Biological Chemistry and Molecular Pharmaceutics, Harvard Medical School
  - Laboratory of Molecular Medicine, Boston Children's Hospital
- Anand Saminathan
  - Department of Cell Biology, Harvard Medical School
  - Department of Pediatrics, Harvard Medical School
  - Program in Cellular and Molecular Medicine, Boston Children's Hospital
- Gustavo Scanavachi
  - Department of Cell Biology, Harvard Medical School
  - Department of Pediatrics, Harvard Medical School
  - Program in Cellular and Molecular Medicine, Boston Children's Hospital
- Ricardo F Bango Da Cunha Correia
  - Department of Cell Biology, Harvard Medical School
  - Department of Pediatrics, Harvard Medical School
  - Program in Cellular and Molecular Medicine, Boston Children's Hospital
- Annette Juma Nielsen
  - Department of Chemistry, University of Copenhagen
  - Center for 4D cellular dynamics, Department of Chemistry, University of Copenhagen
  - Novo Nordisk Center for Optimised Oligo Escape
  - Novo Nordisk Foundation Center for Protein Research
- Sara Vogt Bleshøy
  - Department of Chemistry, University of Copenhagen
  - Center for 4D cellular dynamics, Department of Chemistry, University of Copenhagen
  - Novo Nordisk Center for Optimised Oligo Escape
  - Novo Nordisk Foundation Center for Protein Research
- Tom Kirchhausen
  - Department of Cell Biology, Harvard Medical School
  - Department of Pediatrics, Harvard Medical School
  - Program in Cellular and Molecular Medicine, Boston Children's Hospital
- Nikos S Hatzakis
  - Department of Chemistry, University of Copenhagen
  - Center for 4D cellular dynamics, Department of Chemistry, University of Copenhagen
  - Novo Nordisk Center for Optimised Oligo Escape
  - Novo Nordisk Foundation Center for Protein Research
3
Jacobsen NL, Bloch M, Millard PS, Ruidiaz SF, Elsborg JD, Boomsma W, Hendus‐Altenburger R, Hartmann‐Petersen R, Kragelund BB. Phosphorylation of Schizosaccharomyces pombe Dss1 mediates direct binding to the ubiquitin-ligase Dma1 in vitro. Protein Sci 2023; 32:e4733. [PMID: 37463013 PMCID: PMC10443397 DOI: 10.1002/pro.4733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2023] [Revised: 07/12/2023] [Accepted: 07/14/2023] [Indexed: 08/25/2023]
Abstract
Intrinsically disordered proteins (IDPs) are often multifunctional and frequently posttranslationally modified. Deleted in split hand/split foot 1 (Dss1-Sem1 in budding yeast) is a highly multifunctional IDP associated with a range of protein complexes. However, it remains unknown if the different functions relate to different modified states. In this work, we show that Schizosaccharomyces pombe Dss1 is a substrate for casein kinase 2 in vitro, and we identify three phosphorylated threonines in its linker region separating two known disordered ubiquitin-binding motifs. Phosphorylations of the threonines had no effect on ubiquitin-binding but caused a slight destabilization of the C-terminal α-helix and mediated a direct interaction with the forkhead-associated (FHA) domain of the RING-FHA E3-ubiquitin ligase defective in mitosis 1 (Dma1). The phosphorylation sites are not conserved and are absent in human Dss1. Sequence analyses revealed that the Txx(E/D) motif, which is important for phosphorylation and Dma1 binding, is not linked to certain branches of the evolutionary tree. Instead, we find that the motif appears randomly, supporting the mechanism of ex nihilo evolution of novel motifs. In support of this, other threonine-based motifs, although frequent, are nonconserved in the linker, pointing to additional functions connected to this region. We suggest that Dss1 acts as an adaptor protein that docks to Dma1 via the phosphorylated FHA-binding motifs, while the C-terminal α-helix is free to bind mitotic septins, thereby stabilizing the complex. The presence of Txx(D/E) motifs in the disordered regions of certain septin subunits may be of further relevance to the formation and stabilization of these complexes.
Affiliation(s)
- Nina L. Jacobsen
  - Structural Biology and NMR Laboratory, University of Copenhagen, Copenhagen N, Denmark
  - REPIN, University of Copenhagen, Copenhagen N, Denmark
  - The Linderstrøm Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
- Magnus Bloch
  - Structural Biology and NMR Laboratory, University of Copenhagen, Copenhagen N, Denmark
- Peter S. Millard
  - REPIN, University of Copenhagen, Copenhagen N, Denmark
  - The Linderstrøm Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
- Sarah F. Ruidiaz
  - Structural Biology and NMR Laboratory, University of Copenhagen, Copenhagen N, Denmark
  - REPIN, University of Copenhagen, Copenhagen N, Denmark
- Jonas D. Elsborg
  - Structural Biology and NMR Laboratory, University of Copenhagen, Copenhagen N, Denmark
- Wouter Boomsma
  - Department of Computer Science, University of Copenhagen, Copenhagen Ø, Denmark
- Rasmus Hartmann‐Petersen
  - REPIN, University of Copenhagen, Copenhagen N, Denmark
  - The Linderstrøm Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
- Birthe B. Kragelund
  - Structural Biology and NMR Laboratory, University of Copenhagen, Copenhagen N, Denmark
  - REPIN, University of Copenhagen, Copenhagen N, Denmark
  - The Linderstrøm Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen N, Denmark
4
Blaabjerg LM, Kassem MM, Good LL, Jonsson N, Cagiada M, Johansson KE, Boomsma W, Stein A, Lindorff-Larsen K. Rapid protein stability prediction using deep learning representations. eLife 2023; 12:82593. [PMID: 37184062 DOI: 10.7554/elife.82593] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 08/10/2022] [Accepted: 05/12/2023] [Indexed: 05/16/2023]
Abstract
Predicting the thermodynamic stability of proteins is a common and widely used step in protein engineering, and when elucidating the molecular mechanisms behind evolution and disease. Here, we present RaSP, a method for making rapid and accurate predictions of changes in protein stability by leveraging deep learning representations. RaSP performs on par with biophysics-based methods and enables saturation mutagenesis stability predictions in less than a second per residue. We use RaSP to calculate ~300 million stability changes for nearly all single amino acid changes in the human proteome, and examine variants observed in the human population. We find that variants that are common in the population are substantially depleted for severe destabilization, and that there are substantial differences between benign and pathogenic variants, highlighting the role of protein stability in genetic diseases. RaSP is freely available, including via a web interface, and enables large-scale analyses of stability in experimental and predicted protein structures.
Affiliation(s)
- Lasse M Blaabjerg
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Maher M Kassem
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Lydia L Good
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Nicolas Jonsson
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Matteo Cagiada
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Wouter Boomsma
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Amelie Stein
  - Department of Biology, University of Copenhagen, Copenhagen, Denmark
5
Kampmeyer C, Grønbæk-Thygesen M, Oelerich N, Tatham MH, Cagiada M, Lindorff-Larsen K, Boomsma W, Hofmann K, Hartmann-Petersen R. Lysine deserts prevent adventitious ubiquitylation of ubiquitin-proteasome components. Cell Mol Life Sci 2023; 80:143. [PMID: 37160462 PMCID: PMC10169902 DOI: 10.1007/s00018-023-04782-z] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 12/21/2022] [Revised: 03/15/2023] [Accepted: 04/17/2023] [Indexed: 05/11/2023]
Abstract
In terms of its relative frequency, lysine is a common amino acid in the human proteome. However, by bioinformatics we find hundreds of proteins that contain long and evolutionarily conserved stretches completely devoid of lysine residues. These so-called lysine deserts show a high prevalence in intrinsically disordered proteins with known or predicted functions within the ubiquitin-proteasome system (UPS), including many E3 ubiquitin-protein ligases and UBL domain proteasome substrate shuttles, such as BAG6, RAD23A, UBQLN1 and UBQLN2. We show that introduction of lysine residues into the deserts leads to a striking increase in ubiquitylation of some of these proteins. In the case of BAG6, we show that ubiquitylation is catalyzed by the E3 RNF126, while RAD23A is ubiquitylated by E6AP. Despite the elevated ubiquitylation, mutant RAD23A appears stable, but displays a partial loss-of-function phenotype in fission yeast. In the case of UBQLN1 and BAG6, introducing lysine leads to a reduced abundance due to proteasomal degradation of the proteins. For UBQLN1 we show that arginine residues within the lysine-depleted region are critical for its ability to form cytosolic speckles/inclusions. We propose that selective pressure to avoid lysine residues may be a common evolutionary mechanism to prevent unwarranted ubiquitylation and/or perhaps other lysine post-translational modifications. This may be particularly relevant for UPS components as they closely and frequently encounter the ubiquitylation machinery and are thus more susceptible to nonspecific ubiquitylation.
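As a rough illustration of what a "stretch completely devoid of lysine residues" means computationally, the sketch below finds the longest lysine-free run in a one-letter amino acid sequence. It is a minimal sketch only: the function name `longest_lysine_free_stretch` and the toy sequence are hypothetical, the abstract does not state the length or conservation criteria the authors used to call a desert, and none of this is taken from their pipeline.

```python
from __future__ import annotations


def longest_lysine_free_stretch(sequence: str) -> tuple[int, int]:
    """Return (start, length) of the longest run without lysine ('K').

    `start` is a 0-based index into `sequence`. An empty or all-lysine
    sequence yields (0, 0). Hypothetical helper for illustration only.
    """
    best_start, best_len = 0, 0
    run_start = 0
    # A sentinel 'K' is appended so the final run is closed like any other.
    for i, residue in enumerate(sequence.upper() + "K"):
        if residue == "K":
            run_len = i - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = i + 1
    return best_start, best_len


if __name__ == "__main__":
    # Toy sequence, not a real protein.
    print(longest_lysine_free_stretch("MKAAAKGGGGGK"))  # (6, 5)
```

A real analysis would additionally need a minimum-length cutoff and a conservation check across orthologs, both of which are left out here because the abstract does not specify them.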
Affiliation(s)
- Caroline Kampmeyer
  - Department of Biology, The Linderstrøm-Lang Centre for Protein Science, University of Copenhagen, Copenhagen, Denmark
- Martin Grønbæk-Thygesen
  - Department of Biology, The Linderstrøm-Lang Centre for Protein Science, University of Copenhagen, Copenhagen, Denmark
- Nicole Oelerich
  - Institute for Genetics, University of Cologne, Cologne, Germany
- Michael H Tatham
  - Centre for Gene Regulation and Expression, Sir James Black Centre, School of Life Sciences, University of Dundee, Dundee, UK
- Matteo Cagiada
  - Department of Biology, The Linderstrøm-Lang Centre for Protein Science, University of Copenhagen, Copenhagen, Denmark
- Kresten Lindorff-Larsen
  - Department of Biology, The Linderstrøm-Lang Centre for Protein Science, University of Copenhagen, Copenhagen, Denmark
- Wouter Boomsma
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Kay Hofmann
  - Institute for Genetics, University of Cologne, Cologne, Germany
- Rasmus Hartmann-Petersen
  - Department of Biology, The Linderstrøm-Lang Centre for Protein Science, University of Copenhagen, Copenhagen, Denmark
6
Dreier JE, Prestel A, Martins JM, Brøndum SS, Nielsen O, Garbers AE, Suga H, Boomsma W, Rogers JM, Hartmann-Petersen R, Kragelund BB. A context-dependent and disordered ubiquitin-binding motif. Cell Mol Life Sci 2022; 79:484. [PMID: 35974206 PMCID: PMC9381478 DOI: 10.1007/s00018-022-04486-w] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/25/2022] [Revised: 07/06/2022] [Accepted: 07/14/2022] [Indexed: 02/07/2023]
Abstract
Ubiquitin is a small, globular protein that is conjugated to other proteins as a posttranslational event. A palette of small, folded domains recognizes and binds ubiquitin to translate and effectuate this posttranslational signal. Recent computational studies have suggested that protein regions can recognize ubiquitin via a process of folding upon binding. Using peptide binding arrays, bioinformatics, and NMR spectroscopy, we have uncovered a disordered ubiquitin-binding motif that likely remains disordered when bound and thus expands the palette of ubiquitin-binding proteins. We term this motif Disordered Ubiquitin-Binding Motif (DisUBM) and find it to be present in many proteins with known or predicted functions in degradation and transcription. We decompose the determinants of the motif, showing it to rely on features of aromatic and negatively charged residues, and less so on distinct sequence positions, in line with its disordered nature. We show that the affinity of the motif is low and moldable by the surrounding disordered chain, allowing for an enhanced interaction surface with ubiquitin, whereby the affinity increases approximately tenfold. Further affinity optimization using peptide arrays pushed the affinity into the low micromolar range, but compromised context dependence. Finally, we find that DisUBMs can emerge from unbiased screening of randomized peptide libraries, featuring in de novo cyclic peptides selected to bind ubiquitin chains. We suggest that naturally occurring DisUBMs can recognize ubiquitin as a posttranslational signal to act as affinity enhancers in IDPs that bind to folded and ubiquitylated binding partners.
Affiliation(s)
- Jesper E Dreier
  - Structural Biology and NMR Laboratory, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
  - REPIN, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
- Andreas Prestel
  - Structural Biology and NMR Laboratory, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
- João M Martins
  - Department of Computer Science, University of Copenhagen, Universitetsparken 1, 2100, Copenhagen Ø, Denmark
- Sebastian S Brøndum
  - Structural Biology and NMR Laboratory, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
- Olaf Nielsen
  - Functional Genomics, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
- Anna E Garbers
  - Structural Biology and NMR Laboratory, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
  - REPIN, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
- Hiroaki Suga
  - Department of Chemistry, Graduate School of Science, The University of Tokyo, Tokyo, 113-0033, Japan
- Wouter Boomsma
  - Department of Computer Science, University of Copenhagen, Universitetsparken 1, 2100, Copenhagen Ø, Denmark
- Joseph M Rogers
  - Department of Drug Design and Pharmacology, University of Copenhagen, Jagtvej 160, 2100, Copenhagen Ø, Denmark
- Rasmus Hartmann-Petersen
  - REPIN, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
  - The Linderstrøm Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
- Birthe B Kragelund
  - Structural Biology and NMR Laboratory, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
  - REPIN, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
  - The Linderstrøm Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200, Copenhagen N, Denmark
7
Abstract
How we choose to represent our data has a fundamental impact on our ability to subsequently extract information from them. Machine learning promises to automatically determine efficient representations from large unstructured datasets, such as those arising in biology. However, empirical evidence suggests that seemingly minor changes to these machine learning models yield drastically different data representations that result in different biological interpretations of data. This raises the question of what even constitutes the most meaningful representation. Here, we approach this question for representations of protein sequences, which have received considerable attention in the recent literature. We explore two key contexts in which representations naturally arise: transfer learning and interpretable learning. In the first context, we demonstrate that several contemporary practices yield suboptimal performance, and in the latter we demonstrate that taking representation geometry into account significantly improves interpretability and lets the models reveal biological information that is otherwise obscured. "Representation learning plays an increasing role in protein sequence analysis. This paper seeks to clarify how to ensure that such representations are meaningful, proposing best practices both for the choice of methods and the subsequent analysis."
Affiliation(s)
- Søren Hauberg
  - Section for Cognitive Systems, Technical University of Denmark, Kgs. Lyngby, Denmark
- Wouter Boomsma
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
8
Jimenez-Solem E, Petersen TS, Hansen C, Hansen C, Lioma C, Igel C, Boomsma W, Krause O, Lorenzen S, Selvan R, Petersen J, Nyeland ME, Ankarfeldt MZ, Virenfeldt GM, Winther-Jensen M, Linneberg A, Ghazi MM, Detlefsen N, Lauritzen AD, Smith AG, de Bruijne M, Ibragimov B, Petersen J, Lillholm M, Middleton J, Mogensen SH, Thorsen-Meyer HC, Perner A, Helleberg M, Kaas-Hansen BS, Bonde M, Bonde A, Pai A, Nielsen M, Sillesen M. Developing and validating COVID-19 adverse outcome risk prediction models from a bi-national European cohort of 5594 patients. Sci Rep 2021; 11:3246. [PMID: 33547335 PMCID: PMC7864944 DOI: 10.1038/s41598-021-81844-x] [Citation(s) in RCA: 45] [Impact Index Per Article: 15.0] [Received: 10/07/2020] [Accepted: 01/12/2021] [Indexed: 12/15/2022]
Abstract
Patients with severe COVID-19 have overwhelmed healthcare systems worldwide. We hypothesized that machine learning (ML) models could be used to predict risks at different stages of management and thereby provide insights into drivers and prognostic markers of disease progression and death. From a cohort of approx. 2.6 million citizens in Denmark, SARS-CoV-2 PCR tests were performed on subjects suspected of having COVID-19; 3944 cases had at least one positive test and were subjected to further analysis. SARS-CoV-2 positive cases from the United Kingdom Biobank were used for external validation. The ML models predicted the risk of death with a receiver operating characteristic area under the curve (ROC-AUC) of 0.906 at diagnosis, 0.818 at hospital admission and 0.721 at Intensive Care Unit (ICU) admission. Similar metrics were achieved for predicted risks of hospital and ICU admission and use of mechanical ventilation. Common risk factors included age, body mass index and hypertension, although the top risk features shifted towards markers of shock and organ dysfunction in ICU patients. The external validation indicated fair predictive performance for mortality prediction, but suboptimal performance for predicting ICU admission. ML may be used to identify drivers of progression to more severe disease and for prognostication in patients with COVID-19. We provide access to an online risk calculator based on these findings.
Affiliation(s)
- Espen Jimenez-Solem
  - Department of Clinical Pharmacology, Copenhagen University Hospital, Bispebjerg and Frederiksberg, Copenhagen, Denmark
  - Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
  - Copenhagen Phase IV Unit (Phase4CPH), Department of Clinical Pharmacology and Center for Clinical Research and Prevention, Copenhagen University Hospital, Bispebjerg and Frederiksberg, Copenhagen, Denmark
- Tonny S Petersen
  - Department of Clinical Pharmacology, Copenhagen University Hospital, Bispebjerg and Frederiksberg, Copenhagen, Denmark
  - Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
- Casper Hansen
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Christian Hansen
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Christina Lioma
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Christian Igel
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Wouter Boomsma
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Oswin Krause
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Stephan Lorenzen
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Raghavendra Selvan
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Janne Petersen
  - Center for Clinical Research and Prevention, Copenhagen University Hospital, Bispebjerg and Frederiksberg, Copenhagen, Denmark
  - Section of Biostatistics, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
  - Copenhagen Phase IV Unit (Phase4CPH), Department of Clinical Pharmacology and Center for Clinical Research and Prevention, Copenhagen University Hospital, Bispebjerg and Frederiksberg, Copenhagen, Denmark
- Martin Erik Nyeland
  - Department of Clinical Pharmacology, Copenhagen University Hospital, Bispebjerg and Frederiksberg, Copenhagen, Denmark
- Mikkel Zöllner Ankarfeldt
  - Center for Clinical Research and Prevention, Copenhagen University Hospital, Bispebjerg and Frederiksberg, Copenhagen, Denmark
  - Copenhagen Phase IV Unit (Phase4CPH), Department of Clinical Pharmacology and Center for Clinical Research and Prevention, Copenhagen University Hospital, Bispebjerg and Frederiksberg, Copenhagen, Denmark
- Gert Mehl Virenfeldt
  - Center for Clinical Research and Prevention, Copenhagen University Hospital, Bispebjerg and Frederiksberg, Copenhagen, Denmark
- Matilde Winther-Jensen
  - Center for Clinical Research and Prevention, Copenhagen University Hospital, Bispebjerg and Frederiksberg, Copenhagen, Denmark
- Allan Linneberg
  - Center for Clinical Research and Prevention, Copenhagen University Hospital, Bispebjerg and Frederiksberg, Copenhagen, Denmark
- Nicki Detlefsen
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
  - DTU Compute, Denmarks Technical University, Lyngby, Denmark
- Marleen de Bruijne
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
  - Department of Radiology and Nuclear Medicine, Erasmus MC - University Medical Center Rotterdam, Rotterdam, The Netherlands
- Bulat Ibragimov
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Jens Petersen
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Martin Lillholm
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Jon Middleton
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Anders Perner
  - Department of Intensive Care Medicine, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
- Marie Helleberg
  - Department of Infectious Diseases, Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
- Mikkel Bonde
  - Center for Surgical Translational and Artificial Intelligence Research (CSTAR), Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
- Alexander Bonde
  - Department of Surgical Gastroenterology, Copenhagen University Hospital, Rigshospitalet, Blegdamsvej 9, 2100, Copenhagen Ø, Denmark
  - Center for Surgical Translational and Artificial Intelligence Research (CSTAR), Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
- Akshay Pai
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
  - Cerebriu A/S, Copenhagen, Denmark
- Mads Nielsen
  - Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
- Martin Sillesen
  - Department of Surgical Gastroenterology, Copenhagen University Hospital, Rigshospitalet, Blegdamsvej 9, 2100, Copenhagen Ø, Denmark
  - Center for Surgical Translational and Artificial Intelligence Research (CSTAR), Copenhagen University Hospital, Rigshospitalet, Copenhagen, Denmark
  - Department of Clinical Medicine, University of Copenhagen, Copenhagen, Denmark
9
Seiffert P, Bugge K, Nygaard M, Haxholm GW, Martinsen JH, Pedersen MN, Arleth L, Boomsma W, Kragelund BB. Orchestration of signaling by structural disorder in class 1 cytokine receptors. Cell Commun Signal 2020; 18:132. [PMID: 32831102 PMCID: PMC7444064 DOI: 10.1186/s12964-020-00626-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/29/2020] [Accepted: 07/08/2020] [Indexed: 12/13/2022]
Abstract
BACKGROUND: Class 1 cytokine receptors (C1CRs) are single-pass transmembrane proteins responsible for transmitting signals between the outside and the inside of cells. Remarkably, they orchestrate key biological processes such as proliferation, differentiation, immunity and growth through long disordered intracellular domains (ICDs), but without having intrinsic kinase activity. Despite these key roles, their characteristics remain poorly understood.
METHODS: The current paper asks why disorder has evolved to govern signaling of C1CRs by reviewing the literature in combination with new sequence and biophysical analyses of chain properties across the family.
RESULTS: We uncover that the C1CR-ICDs are fully disordered and brimming with short linear motifs (SLiMs). Many of these SLiMs are overlapping, jointly signifying a complex regulation of interactions, including network rewiring by isoforms. The C1CR-ICDs have unique properties that distinguish them from most IDPs, and we put forward the view that the C1CR-ICDs are far from simple strings with constitutively bound kinases. Rather, they carry both organizational and operational features left uncovered within their disorder, including mechanisms and complexities of regulatory functions.
CONCLUSIONS: Critically, the understanding of the fascinating ability of these long, completely disordered chains to orchestrate complex cellular signaling pathways is still in its infancy, and we urge a perceptional shift away from the current simplistic view towards uncovering their full functionalities and potential.
Affiliation(s)
- Pernille Seiffert
  - REPIN, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
  - Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
- Katrine Bugge
  - REPIN, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
  - Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
- Mads Nygaard
  - REPIN, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
  - Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
- Gitte W. Haxholm
  - REPIN, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
  - Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
- Jacob H. Martinsen
  - REPIN, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
  - Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
- Martin N. Pedersen
  - Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17, 2100 Copenhagen Ø, Denmark
- Lise Arleth
  - Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17, 2100 Copenhagen Ø, Denmark
- Wouter Boomsma
  - Department of Computer Science, University of Copenhagen, Universitetsparken 1, 2100 Copenhagen Ø, Denmark
- Birthe B. Kragelund
  - REPIN, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
  - Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
|
10
|
Kragelund BB, Prestel A, Wickmann N, Martins J, Boomsma W, Staby L, Hendus-Altenburger R, Skriver K. Context Matters in Disorder Based Protein Communication. Biophys J 2020. [DOI: 10.1016/j.bpj.2019.11.2720] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
|
11
|
Millard PS, Bugge K, Marabini R, Boomsma W, Burow M, Kragelund BB. IDDomainSpotter: Compositional bias reveals domains in long disordered protein regions-Insights from transcription factors. Protein Sci 2020; 29:169-183. [PMID: 31642121 PMCID: PMC6933863 DOI: 10.1002/pro.3754] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 08/09/2019] [Revised: 10/16/2019] [Accepted: 10/16/2019] [Indexed: 12/12/2022]
Abstract
Protein domains constitute regions of distinct structural properties and molecular functions that are retained when removed from the rest of the protein. However, due to the lack of tertiary structure, the identification of domains has been largely neglected for long (>50 residues) intrinsically disordered regions. Here we present a sequence-based approach to assess and visualize domain organization in long intrinsically disordered regions based on compositional sequence biases. An online tool to find putative intrinsically disordered domains (IDDomainSpotter) in any protein sequence or sequence alignment using any particular sequence trait is available at http://www.bio.ku.dk/sbinlab/IDDomainSpotter. Using this tool, we have identified a putative domain enriched in hydrophilic and disorder-promoting residues (Pro, Ser, and Thr) and depleted in positive charges (Arg and Lys) bordering the folded DNA-binding domains of several transcription factors (p53, GCR, NAC46, MYB28, and MYB29). This domain, from two different MYB transcription factors, was characterized biophysically to determine its properties. Our analyses show the domain to be extended, dynamic and highly disordered. It connects the DNA-binding domain to other disordered domains and is present and conserved in several transcription factors from different families and domains of life. This example illustrates the potential of IDDomainSpotter to predict, from sequence alone, putative domains of functional interest in otherwise uncharacterized disordered proteins.
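The idea of scanning a sequence for compositional bias can be illustrated with a minimal Python sketch. This is not the IDDomainSpotter implementation: the function name `composition_profile`, the window length, and the scoring rule (fraction of enriched residues minus fraction of depleted residues per window) are assumptions chosen for illustration. The default residue sets follow the trait named in the abstract (enriched in Pro, Ser, Thr; depleted in Arg, Lys).

```python
def composition_profile(seq, plus="PST", minus="RK", window=15):
    """Sliding-window compositional bias score (illustrative only).

    For each window, returns the fraction of residues in `plus`
    minus the fraction of residues in `minus`. Window length and
    scoring rule are hypothetical, not taken from IDDomainSpotter.
    """
    seq = seq.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    scores = []
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        n_plus = sum(chunk.count(aa) for aa in plus)
        n_minus = sum(chunk.count(aa) for aa in minus)
        scores.append((n_plus - n_minus) / window)
    return scores
```

Stretches where such a profile stays consistently high or low would be candidates for putative domains under this simplified scoring; the actual tool accepts arbitrary sequence traits and alignments.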
Affiliation(s)
- Peter S. Millard
- DynaMo Center, Department of Plant and Environmental Sciences, University of Copenhagen, Copenhagen, Denmark
- Copenhagen Plant Science Centre, Department of Plant and Environmental Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Katrine Bugge
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Riccardo Marabini
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Wouter Boomsma
- Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
| | - Meike Burow
- DynaMo Center, Department of Plant and Environmental Sciences, University of Copenhagen, Copenhagen, Denmark
- Copenhagen Plant Science Centre, Department of Plant and Environmental Sciences, University of Copenhagen, Copenhagen, Denmark
| | - Birthe B. Kragelund
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
|
12
|
Hendus-Altenburger R, Fernandes CB, Bugge K, Kunze MBA, Boomsma W, Kragelund BB. Random coil chemical shifts for serine, threonine and tyrosine phosphorylation over a broad pH range. J Biomol NMR 2019; 73:713-725. [PMID: 31598803 PMCID: PMC6875518 DOI: 10.1007/s10858-019-00283-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 05/17/2019] [Accepted: 09/30/2019] [Indexed: 05/26/2023]
Abstract
Phosphorylation is one of the main regulators of cellular signaling typically occurring in flexible parts of folded proteins and in intrinsically disordered regions. It can have distinct effects on the chemical environment as well as on the structural properties near the modification site. Secondary chemical shift analysis is the main NMR method for detection of transiently formed secondary structure in intrinsically disordered proteins (IDPs) and the reliability of the analysis depends on an appropriate choice of random coil model. Random coil chemical shifts and sequence correction factors were previously determined for an Ac-QQXQQ-NH2-peptide series with X being any of the 20 common amino acids. However, a matching dataset on the phosphorylated states has so far only been incompletely determined or determined only at a single pH value. Here we extend the database by the addition of the random coil chemical shifts of the phosphorylated states of serine, threonine and tyrosine measured over a range of pH values covering the pKas of the phosphates and at several temperatures (www.bio.ku.dk/sbinlab/randomcoil). The combined results allow for accurate random coil chemical shift determination of phosphorylated regions at any pH and temperature, minimizing systematic biases of the secondary chemical shifts. Comparison of chemical shifts using random coil sets with and without inclusion of the phosphoryl group, revealed under/over estimations of helicity of up to 33%. The expanded set of random coil values will improve the reliability in detection and quantification of transient secondary structure in phosphorylation-modified IDPs.
Affiliation(s)
- Ruth Hendus-Altenburger
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200, Copenhagen N, Denmark
| | - Catarina B Fernandes
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200, Copenhagen N, Denmark
| | - Katrine Bugge
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200, Copenhagen N, Denmark
| | - Micha B A Kunze
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200, Copenhagen N, Denmark
| | - Wouter Boomsma
- Department of Computer Science, University of Copenhagen, Universitetsparken 1, 2100, Copenhagen Ø, Denmark
| | - Birthe B Kragelund
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, 2200, Copenhagen N, Denmark.
|
13
|
Bottaro S, Bussi G, Pinamonti G, Reißer S, Boomsma W, Lindorff-Larsen K. Barnaba: software for analysis of nucleic acid structures and trajectories. RNA 2019; 25:219-231. [PMID: 30420522 PMCID: PMC6348988 DOI: 10.1261/rna.067678.118] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Received: 06/26/2018] [Accepted: 11/06/2018] [Indexed: 06/09/2023]
Abstract
RNA molecules are highly dynamic systems characterized by a complex interplay between sequence, structure, dynamics, and function. Molecular simulations can potentially provide powerful insights into the nature of these relationships. The analysis of structures and molecular trajectories of nucleic acids can be nontrivial because it requires processing very high-dimensional data that are not easy to visualize and interpret. Here we introduce Barnaba, a Python library aimed at facilitating the analysis of nucleic acid structures and molecular simulations. The software consists of a variety of analysis tools that allow the user to (i) calculate distances between three-dimensional structures using different metrics, (ii) back-calculate experimental data from three-dimensional structures, (iii) perform cluster analysis and dimensionality reductions, (iv) search three-dimensional motifs in PDB structures and trajectories, and (v) construct elastic network models for nucleic acids and nucleic acids-protein complexes. In addition, Barnaba makes it possible to calculate torsion angles, pucker conformations, and to detect base-pairing/base-stacking interactions. Barnaba produces graphics that conveniently visualize both extended secondary structure and dynamics for a set of molecular conformations. The software is available as a command-line tool as well as a library, and supports a variety of file formats such as PDB, dcd, and xtc files. Source code, documentation, and examples are freely available at https://github.com/srnas/barnaba under GNU GPLv3 license.
Affiliation(s)
- Sandro Bottaro
- Structural Biology and NMR Laboratory and Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen 2200, Denmark
- International School for Advanced Studies, 34136 Trieste, Italy
| | - Giovanni Bussi
- International School for Advanced Studies, 34136 Trieste, Italy
| | - Giovanni Pinamonti
- International School for Advanced Studies, 34136 Trieste, Italy
- Department of Mathematics and Computer Science, Freie Universität, 14195 Berlin, Germany
| | - Sabine Reißer
- International School for Advanced Studies, 34136 Trieste, Italy
| | - Wouter Boomsma
- Department of Computer Science, University of Copenhagen, Copenhagen 2200, Denmark
| | - Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory and Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen 2200, Denmark
|
14
|
Wang Y, Tian P, Boomsma W, Lindorff-Larsen K. Monte Carlo Sampling of Protein Folding by Combining an All-Atom Physics-Based Model with a Native State Bias. J Phys Chem B 2018; 122:11174-11185. [DOI: 10.1021/acs.jpcb.8b06335] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Affiliation(s)
- Yong Wang
- Structural Biology and NMR Laboratory, Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark
| | - Pengfei Tian
- Structural Biology and NMR Laboratory, Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, United States
| | - Wouter Boomsma
- Department of Computer Science, University of Copenhagen, 2100 Copenhagen Ø, Denmark
| | - Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory, Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark
|
15
|
Kurut A, Fonseca R, Boomsma W. Driving Structural Transitions in Molecular Simulations Using the Nonequilibrium Candidate Monte Carlo. J Phys Chem B 2018; 122:1195-1204. [PMID: 29260565 DOI: 10.1021/acs.jpcb.7b11426] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/29/2022]
Abstract
Hybrid simulation procedures which combine molecular dynamics with Monte Carlo are attracting increasing attention as tools for improving the sampling efficiency in molecular simulations. In particular, encouraging results have been reported for nonequilibrium candidate protocols, in which a Monte Carlo move is applied gradually, and interleaved with a process that equilibrates the remaining degrees of freedom. Although initial studies have uncovered a substantial potential of the method, its practical applicability for sampling structural transitions in macromolecules remains incompletely understood. Here, we address this issue by systematically investigating the efficiency of the nonequilibrium candidate Monte Carlo on the sampling of rotameric distributions of two peptide systems at atomistic resolution both in vacuum and explicit solvent. The studied systems allow us to directly probe the efficiency with which a single or a few slow degrees of freedom can be driven between well-separated free-energy minima and to explore the sensitivity of the method toward the involved free parameters. In line with results on other systems, our study suggests that order-of-magnitude gains can be obtained in certain scenarios but also identifies challenges that arise when applying the procedure in explicit solvent.
Affiliation(s)
- Anıl Kurut
- Department of Computer Science, University of Copenhagen, 2100 Copenhagen Ø, Denmark
| | - Rasmus Fonseca
- Department of Molecular and Cellular Physiology, Stanford University, Stanford, California 94305, United States
| | - Wouter Boomsma
- Department of Computer Science, University of Copenhagen, 2100 Copenhagen Ø, Denmark
|
16
|
Kassem MM, Wang Y, Boomsma W, Lindorff-Larsen K. Structure of the Bacterial Cytoskeleton Protein Bactofilin by NMR Chemical Shifts and Sequence Variation. Biophys J 2017; 110:2342-2348. [PMID: 27276252 DOI: 10.1016/j.bpj.2016.04.039] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 10/07/2015] [Revised: 04/19/2016] [Accepted: 04/21/2016] [Indexed: 12/28/2022]
Abstract
Bactofilins constitute a recently discovered class of bacterial proteins that form cytoskeletal filaments. They share a highly conserved domain (DUF583) of which the structure remains unknown, in part due to the large size and noncrystalline nature of the filaments. Here, we describe the atomic structure of a bactofilin domain from Caulobacter crescentus. To determine the structure, we developed an approach that combines a biophysical model for proteins with recently obtained solid-state NMR spectroscopy data and amino acid contacts predicted from a detailed analysis of the evolutionary history of bactofilins. Our structure reveals a triangular β-helical (solenoid) conformation with conserved residues forming the tightly packed core and polar residues lining the surface. The repetitive structure explains the presence of internal repeats as well as strongly conserved positions, and is reminiscent of other fibrillar proteins. Our work provides a structural basis for future studies of bactofilin biology and for designing molecules that target them, as well as a starting point for determining the organization of the entire bactofilin filament. Finally, our approach presents new avenues for determining structures that are difficult to obtain by traditional means.
Affiliation(s)
- Maher M Kassem
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Yong Wang
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Wouter Boomsma
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark.
|
17
|
Abstract
The inherent flexibility of intrinsically disordered proteins (IDPs) and multi-domain proteins with intrinsically disordered regions (IDRs) presents challenges to structural analysis. These macromolecules need to be represented by an ensemble of conformations, rather than a single structure. Small-angle X-ray scattering (SAXS) experiments capture ensemble-averaged data for the set of conformations. We present a Bayesian approach to ensemble inference from SAXS data, called Bayesian ensemble SAXS (BE-SAXS). We address two issues with existing methods: the use of a finite ensemble of structures to represent the underlying distribution, and the selection of that ensemble as a subset of an initial pool of structures. This is achieved through the formulation of a Bayesian posterior of the conformational space. BE-SAXS modifies a structural prior distribution in accordance with the experimental data. It uses multi-step expectation maximization, with alternating rounds of Markov-chain Monte Carlo simulation and empirical Bayes optimization. We demonstrate the method by employing it to obtain a conformational ensemble of the antitoxin PaaA2 and comparing the results to a published ensemble.
Affiliation(s)
- L D Antonov
- Bioinformatics Centre, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark.
| | - S Olsson
- Laboratory of Physical Chemistry, Swiss Federal Institute of Technology, ETH-Hönggerberg, Vladimir-Prelog-Weg 2, CH-8093 Zürich, Switzerland and Institute for Research in Biomedicine, Università della Svizzera Italiana, Via Vincenzo Vela 6, CH-6500 Bellinzona, Switzerland
| | - W Boomsma
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark
| | - T Hamelryck
- Bioinformatics Centre, Department of Biology, University of Copenhagen, Ole Maaloes Vej 5, DK-2200 Copenhagen N, Denmark.
|
18
|
Boomsma W, Nielsen SV, Lindorff-Larsen K, Hartmann-Petersen R, Ellgaard L. Bioinformatics analysis identifies several intrinsically disordered human E3 ubiquitin-protein ligases. PeerJ 2016; 4:e1725. [PMID: 26966660 PMCID: PMC4782732 DOI: 10.7717/peerj.1725] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 11/17/2015] [Accepted: 02/02/2016] [Indexed: 12/28/2022]
Abstract
The ubiquitin-proteasome system targets misfolded proteins for degradation. Since the accumulation of such proteins is potentially harmful for the cell, their prompt removal is important. E3 ubiquitin-protein ligases mediate substrate ubiquitination by bringing together the substrate with an E2 ubiquitin-conjugating enzyme, which transfers ubiquitin to the substrate. For misfolded proteins, substrate recognition is generally delegated to molecular chaperones that subsequently interact with specific E3 ligases. An important exception is San1, a yeast E3 ligase. San1 harbors extensive regions of intrinsic disorder, which provide both conformational flexibility and sites for direct recognition of misfolded targets of vastly different conformations. So far, no mammalian ortholog of San1 is known, nor is it clear whether other E3 ligases utilize disordered regions for substrate recognition. Here, we conduct a bioinformatics analysis to examine >600 human and S. cerevisiae E3 ligases to identify enzymes that are similar to San1 in terms of function and/or mechanism of substrate recognition. An initial sequence-based database search was found to detect candidates primarily based on the homology of their ordered regions, and did not capture the unique disorder patterns that encode the functional mechanism of San1. However, by searching specifically for key features of the San1 sequence, such as long regions of intrinsic disorder embedded with short stretches predicted to be suitable for substrate interaction, we identified several E3 ligases with these characteristics. Our initial analysis revealed that another remarkable trait of San1 is shared with several candidate E3 ligases: long stretches of complete lysine suppression, which in San1 limits auto-ubiquitination. We encode these characteristic features into a San1 similarity-score, and present a set of proteins that are plausible candidates as San1 counterparts in humans. 
In conclusion, our work indicates that San1 is not a unique case, and that several other yeast and human E3 ligases have sequence properties that may allow them to recognize substrates by a similar mechanism as San1.
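One of the sequence features named above, long stretches of complete lysine suppression, is simple to measure. The following Python sketch finds the longest lysine-free stretch in a sequence. It is an illustration only: the function name `longest_residue_free_stretch` is hypothetical, and this is not the San1 similarity score described in the abstract, which combines several features.

```python
def longest_residue_free_stretch(seq, residue="K"):
    """Return (start, length) of the longest run lacking `residue`.

    Illustrative helper, not the published San1 similarity score.
    Positions are 0-based; ties resolve to the earliest run.
    """
    seq = seq.upper()
    residue = residue.upper()
    best_start, best_len = 0, 0
    run_start = 0
    # A sentinel residue appended at the end closes the final run.
    for i, aa in enumerate(seq + residue):
        if aa == residue:
            run_len = i - run_start
            if run_len > best_len:
                best_start, best_len = run_start, run_len
            run_start = i + 1
    return best_start, best_len
```

Comparing the returned length with the total sequence length gives a rough measure of how strongly lysines are excluded from a region.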
Affiliation(s)
- Wouter Boomsma
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Sofie V Nielsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Rasmus Hartmann-Petersen
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Lars Ellgaard
- Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Copenhagen, Denmark
|
19
|
Tian P, Lindorff-Larsen K, Boomsma W, Jensen MH, Otzen DE. A Monte Carlo Study of the Early Steps of Functional Amyloid Formation. PLoS One 2016; 11:e0146096. [PMID: 26745180 PMCID: PMC4706413 DOI: 10.1371/journal.pone.0146096] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 09/03/2015] [Accepted: 12/14/2015] [Indexed: 11/18/2022]
Abstract
In addition to their well-known roles in neurodegenerative diseases and amyloidoses, amyloid structures also assume important functional roles in the cell. Although functional amyloid shares many physicochemical properties with its pathogenic counterpart, it is evolutionarily optimized to avoid cytotoxicity. This makes it an interesting study case for aggregation phenomena in general. One of the most well-known examples of a functional amyloid, E. coli curli, is an essential component in the formation of bacterial biofilm, and is primarily formed by aggregates of the protein CsgA. Previous studies have shown that the minor sequence variations observed in the five different subrepeats (R1-R5), which comprise the CsgA primary sequence, have a substantial influence on their individual aggregation propensities. Using a recently described diffusion-optimized enhanced sampling approach for Monte Carlo simulations, we here investigate the equilibrium properties of the monomeric and dimeric states of these subrepeats, to probe whether structural properties observed in these early stage oligomers are decisive for the characteristics of the resulting aggregate. We show that the dimerization propensities of these peptides have strong correlations with their propensity for amyloid formation, and provide structural insights into the inter- and intramolecular contacts that appear to be essential in this process.
Affiliation(s)
- Pengfei Tian
- Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17, 2100, Copenhagen, Denmark
- Linderstrøm-Lang Centre for Protein Science and Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200, Copenhagen N, Denmark
| | - Kresten Lindorff-Larsen
- Linderstrøm-Lang Centre for Protein Science and Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200, Copenhagen N, Denmark
| | - Wouter Boomsma
- Linderstrøm-Lang Centre for Protein Science and Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200, Copenhagen N, Denmark
| | - Mogens Høgh Jensen
- Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17, 2100, Copenhagen, Denmark
| | - Daniel Erik Otzen
- Interdisciplinary Nanoscience Center (iNANO), Centre for Insoluble Protein Structures (inSPIN), Department of Molecular Biology, Aarhus University, Gustav Wieds Vej 10, 8000, Aarhus C, Denmark
|
20
|
Martín-García F, Papaleo E, Gomez-Puertas P, Boomsma W, Lindorff-Larsen K. Comparing molecular dynamics force fields in the essential subspace. PLoS One 2015; 10:e0121114. [PMID: 25811178 PMCID: PMC4374674 DOI: 10.1371/journal.pone.0121114] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Received: 09/11/2014] [Accepted: 02/10/2015] [Indexed: 12/11/2022]
Abstract
The continued development and utility of molecular dynamics simulations requires improvements in both the physical models used (force fields) and in our ability to sample the Boltzmann distribution of these models. Recent developments in both areas have made available multi-microsecond simulations of two proteins, ubiquitin and Protein G, using a number of different force fields. Although these force fields mostly share a common mathematical form, they differ in their parameters and in the philosophy by which these were derived, and previous analyses showed varying levels of agreement with experimental NMR data. To complement the comparison to experiments, we have performed a structural analysis of and comparison between these simulations, thereby providing insight into the relationship between force-field parameterization, the resulting ensemble of conformations and the agreement with experiments. In particular, our results show that, at a coarse level, many of the motional properties are preserved across several, though not all, force fields. At a finer level of detail, however, there are distinct differences in both the structure and dynamics of the two proteins, which can, together with comparison with experimental data, help to select force fields for simulations of proteins. A noteworthy observation is that force fields that have been reparameterized and improved to provide a more accurate energetic description of the balance between helical and coil structures are difficult to distinguish from their "unbalanced" counterparts in these simulations. This observation implies that simulations of stable, folded proteins, even those reaching 10 microseconds in length, may provide relatively little information that can be used to modify torsion parameters to achieve an accurate balance between different secondary structural elements.
Affiliation(s)
- Fernando Martín-García
- Molecular Modelling Group, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), C/Nicolás Cabrera 1, Cantoblanco, Madrid, Spain
- Biomol-Informatics SL, Parque Científico de Madrid, Cantoblanco, Madrid, Spain
| | - Elena Papaleo
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Paulino Gomez-Puertas
- Molecular Modelling Group, Centro de Biología Molecular Severo Ochoa (CSIC-UAM), C/Nicolás Cabrera 1, Cantoblanco, Madrid, Spain
| | - Wouter Boomsma
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
| | - Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
|
21
|
Tian P, Boomsma W, Wang Y, Otzen DE, Jensen MH, Lindorff-Larsen K. What does Evolution Tell us about the Structure of a Functional Amyloid Protein? Biophys J 2015. [DOI: 10.1016/j.bpj.2014.11.1253] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/24/2022]
|
22
|
Tian P, Boomsma W, Wang Y, Otzen DE, Jensen MH, Lindorff-Larsen K. Structure of a Functional Amyloid Protein Subunit Computed Using Sequence Variation. J Am Chem Soc 2014; 137:22-5. [DOI: 10.1021/ja5093634] [Citation(s) in RCA: 82] [Impact Index Per Article: 8.2] [Indexed: 11/30/2022]
Affiliation(s)
- Pengfei Tian
- Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17, 2100 Copenhagen, Denmark
| | - Wouter Boomsma
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark
| | - Yong Wang
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark
| | - Daniel E. Otzen
- Interdisciplinary Nanoscience Center (iNANO), Centre for Insoluble Protein Structures (inSPIN), Department of Molecular Biology and Genetics, Aarhus University, Gustav Wieds Vej 14, 8000 Aarhus C, Denmark
| | - Mogens H. Jensen
- Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17, 2100 Copenhagen, Denmark
| | - Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark
|
23
|
Petrlova J, Bhattacherjee A, Boomsma W, Wallin S, Lagerstedt JO, Irbäck A. Conformational and aggregation properties of the 1-93 fragment of apolipoprotein A-I. Protein Sci 2014; 23:1559-71. [PMID: 25131953 DOI: 10.1002/pro.2534] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 05/23/2014] [Revised: 07/11/2014] [Accepted: 08/04/2014] [Indexed: 11/12/2022]
Abstract
Several disease-linked mutations of apolipoprotein A-I, the major protein in high-density lipoprotein (HDL), are known to be amyloidogenic, and the fibrils often contain N-terminal fragments of the protein. Here, we present a combined computational and experimental study of the fibril-associated disordered 1-93 fragment of this protein, in wild-type and mutated (G26R, S36A, K40L, W50R) forms. In atomic-level Monte Carlo simulations of the free monomer, validated by circular dichroism spectroscopy, we observe changes in the position-dependent β-strand probability induced by mutations. We find that these conformational shifts match well with the effects of these mutations in thioflavin T fluorescence and transmission electron microscopy experiments. Together, our results point to molecular mechanisms that may have a key role in disease-linked aggregation of apolipoprotein A-I.
Affiliation(s)
- Jitka Petrlova
- Department of Experimental Medical Science, Lund University, BMC Floor C12, SE-221 84, Lund, Sweden
|
24
|
Olsson S, Vögeli BR, Cavalli A, Boomsma W, Ferkinghoff-Borg J, Lindorff-Larsen K, Hamelryck T. Probabilistic Determination of Native State Ensembles of Proteins. J Chem Theory Comput 2014; 10:3484-91. [DOI: 10.1021/ct5001236] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.6] [Indexed: 11/29/2022]
Affiliation(s)
- Simon Olsson
- Bioinformatics Centre, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Institute for Research in Biomedicine, CH-6500 Bellinzona, Switzerland
- Beat Rolf Vögeli
- Laboratory of Physical Chemistry, Eidgenössische Technische Hochschule Zürich, 8093 Zürich, Switzerland
- Andrea Cavalli
- Institute for Research in Biomedicine, CH-6500 Bellinzona, Switzerland
- Wouter Boomsma
- Structural Biology and NMR Laboratory, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Jesper Ferkinghoff-Borg
- Cellular Signal Integration Group, Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
- Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Thomas Hamelryck
- Bioinformatics Centre, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
|
25
|
Abstract
A key component of computational biology is to compare the results of computer modelling with experimental measurements. Despite substantial progress in the models and algorithms used in many areas of computational biology, such comparisons sometimes reveal that the computations are not in quantitative agreement with experimental data. The principle of maximum entropy is a general procedure for constructing probability distributions in the light of new data, making it a natural tool in cases when an initial model provides results that are at odds with experiments. The number of maximum entropy applications in our field has grown steadily in recent years, in areas as diverse as sequence analysis, structural modelling, and neurobiology. In this Perspectives article, we give a broad introduction to the method, in an attempt to encourage its further adoption. The general procedure is explained in the context of a simple example, after which we proceed with a real-world application in the field of molecular simulations, where the maximum entropy procedure has recently provided new insight. Given the limited accuracy of force fields, macromolecular simulations sometimes produce results that are not in complete and quantitative accordance with experiments. A common solution to this problem is to explicitly ensure agreement between the two by perturbing the potential energy function towards the experimental data. So far, a general consensus for how such perturbations should be implemented has been lacking. Three very recent papers have explored this problem using the maximum entropy approach, providing both new theoretical and practical insights to the problem. We highlight each of these contributions in turn and conclude with a discussion on remaining challenges.
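As an illustration of the maximum entropy idea summarized in this abstract, the following is a minimal sketch, not the article's own example or code. It assumes a uniform prior over a handful of discrete states and a single constraint on the mean of one observable; the function name `maxent_reweight`, the bisection bounds, and the die-like values are all hypothetical choices made for this sketch. Under those assumptions the maximum entropy weights take the exponential form w_i ∝ exp(λ·f_i), and λ is found by bisection because the reweighted mean increases monotonically with λ.

```python
import math

def maxent_reweight(values, target, lo=-50.0, hi=50.0, tol=1e-10):
    """Return weights w_i ∝ exp(lam * f_i) whose mean of f matches `target`.

    Assumes a uniform prior and min(values) < target < max(values).
    """
    def weights(lam):
        shift = max(lam * v for v in values)  # subtract max to avoid overflow
        raw = [math.exp(lam * v - shift) for v in values]
        norm = sum(raw)
        return [r / norm for r in raw]

    def mean(lam):
        return sum(w * v for w, v in zip(weights(lam), values))

    # The reweighted mean is non-decreasing in lam, so bisection applies.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return weights(0.5 * (lo + hi))

# Hypothetical toy data: six states with values 1..6 and a required mean of 4.5.
values = [1, 2, 3, 4, 5, 6]
weights = maxent_reweight(values, 4.5)
reweighted_mean = sum(w * v for w, v in zip(weights, values))
```

With a non-uniform prior, each raw weight would additionally be multiplied by the prior probability of its state; several simultaneous constraints would need one multiplier per constraint and a multidimensional solver instead of bisection.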
Affiliation(s)
- Wouter Boomsma
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- * E-mail: (WB); (JFB); (KLL)
- Jesper Ferkinghoff-Borg
- Cellular Signal Integration Group, Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark
- * E-mail: (WB); (JFB); (KLL)
- Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- * E-mail: (WB); (JFB); (KLL)
|
26
|
Tian P, Jónsson SÆ, Ferkinghoff-Borg J, Krivov SV, Lindorff-Larsen K, Irbäck A, Boomsma W. Robust Estimation of Diffusion-Optimized Ensembles for Enhanced Sampling. J Chem Theory Comput 2014; 10:543-53. [DOI: 10.1021/ct400844x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 01/18/2023]
Affiliation(s)
- Pengfei Tian
- Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17, 2100 Copenhagen, Denmark
- Sigurdur Æ. Jónsson
- Computational Biology and Biological Physics, Department of Astronomy and Theoretical Physics, Lund University, Sölvegatan 14A, SE-223 62 Lund, Sweden
- Sergei V. Krivov
- Astbury Center for Structural Molecular Biology, University of Leeds, Leeds LS2 9JT, United Kingdom
- Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5 DK-2200 Copenhagen N, Denmark
- Anders Irbäck
- Computational Biology and Biological Physics, Department of Astronomy and Theoretical Physics, Lund University, Sölvegatan 14A, SE-223 62 Lund, Sweden
- Wouter Boomsma
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5 DK-2200 Copenhagen N, Denmark
|
27
|
Christensen AS, Linnet TE, Borg M, Boomsma W, Lindorff-Larsen K, Hamelryck T, Jensen JH. Protein structure validation and refinement using amide proton chemical shifts derived from quantum mechanics. PLoS One 2013; 8:e84123. [PMID: 24391900 PMCID: PMC3877219 DOI: 10.1371/journal.pone.0084123] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 07/24/2013] [Accepted: 11/11/2013] [Indexed: 11/18/2022] Open
Abstract
We present the ProCS method for the rapid and accurate prediction of protein backbone amide proton chemical shifts--sensitive probes of the geometry of key hydrogen bonds that determine protein structure. ProCS is parameterized against quantum mechanical (QM) calculations and reproduces high level QM results obtained for a small protein with an RMSD of 0.25 ppm (r = 0.94). ProCS is interfaced with the PHAISTOS protein simulation program and is used to infer statistical protein ensembles that reflect experimentally measured amide proton chemical shift values. Such chemical shift-based structural refinements, starting from high-resolution X-ray structures of Protein G, ubiquitin, and SMN Tudor Domain, result in average chemical shifts, hydrogen bond geometries, and trans-hydrogen bond ((h3)J(NC')) spin-spin coupling constants that are in excellent agreement with experiment. We show that the structural sensitivity of the QM-based amide proton chemical shift predictions is needed to obtain this agreement. The ProCS method thus offers a powerful new tool for refining the structures of hydrogen bonding networks to high accuracy with many potential applications such as protein flexibility in ligand binding.
Affiliation(s)
- Troels E. Linnet
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Mikael Borg
- Structural Bioinformatics Group, Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Wouter Boomsma
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Kresten Lindorff-Larsen
- Structural Biology and NMR Laboratory, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Thomas Hamelryck
- Structural Bioinformatics Group, Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jan H. Jensen
- Department of Chemistry, University of Copenhagen, Copenhagen, Denmark
|
28
|
Olsson S, Frellsen J, Boomsma W, Mardia KV, Hamelryck T. Inference of structure ensembles of flexible biomolecules from sparse, averaged data. PLoS One 2013; 8:e79439. [PMID: 24244505 PMCID: PMC3820694 DOI: 10.1371/journal.pone.0079439] [Citation(s) in RCA: 43] [Impact Index Per Article: 3.9] [Received: 03/27/2013] [Accepted: 09/24/2013] [Indexed: 11/21/2022] Open
Abstract
We present the theoretical foundations of a general principle to infer structure ensembles of flexible biomolecules from spatially and temporally averaged data obtained in biophysical experiments. The central idea is to compute the Kullback-Leibler optimal modification of a given prior distribution with respect to the experimental data and its uncertainty. This principle generalizes the successful inferential structure determination method and recently proposed maximum entropy methods. Tractability of the protocol is demonstrated through the analysis of simulated nuclear magnetic resonance spectroscopy data of a small peptide.
Affiliation(s)
- Simon Olsson
- Bioinformatics Centre, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- * E-mail: (SO); (TH)
- Jes Frellsen
- Bioinformatics Centre, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Wouter Boomsma
- Structural Biology and NMR Laboratory, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- Kanti V. Mardia
- Department of Statistics, School of Mathematics, University of Leeds, Leeds, United Kingdom
- Thomas Hamelryck
- Bioinformatics Centre, Department of Biology, Faculty of Science, University of Copenhagen, Copenhagen, Denmark
- * E-mail: (SO); (TH)
|
29
|
Valentin JB, Andreetta C, Boomsma W, Bottaro S, Ferkinghoff-Borg J, Frellsen J, Mardia KV, Tian P, Hamelryck T. Formulation of probabilistic models of protein structure in atomic detail using the reference ratio method. Proteins 2013; 82:288-99. [PMID: 23934827 DOI: 10.1002/prot.24386] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 05/28/2013] [Revised: 07/02/2013] [Accepted: 07/18/2013] [Indexed: 01/10/2023]
Abstract
We propose a method to formulate probabilistic models of protein structure in atomic detail, for a given amino acid sequence, based on Bayesian principles, while retaining a close link to physics. We start from two previously developed probabilistic models of protein structure on a local length scale, which concern the dihedral angles in main chain and side chains, respectively. Conceptually, this constitutes a probabilistic and continuous alternative to the use of discrete fragment and rotamer libraries. The local model is combined with a nonlocal model that involves a small number of energy terms according to a physical force field, and some information on the overall secondary structure content. In this initial study we focus on the formulation of the joint model and the evaluation of the use of an energy vector as a descriptor of a protein's nonlocal structure; hence, we derive the parameters of the nonlocal model from the native structure without loss of generality. The local and nonlocal models are combined using the reference ratio method, which is a well-justified probabilistic construction. For evaluation, we use the resulting joint models to predict the structure of four proteins. The results indicate that the proposed method and the probabilistic models show considerable promise for probabilistic protein structure prediction and related applications.
Affiliation(s)
- Jan B Valentin
- The Bioinformatics Centre, Department of Biology, University of Copenhagen, Copenhagen, Denmark
|
30
|
Boomsma W, Frellsen J, Harder T, Bottaro S, Johansson KE, Tian P, Stovgaard K, Andreetta C, Olsson S, Valentin JB, Antonov LD, Christensen AS, Borg M, Jensen JH, Lindorff-Larsen K, Ferkinghoff-Borg J, Hamelryck T. PHAISTOS: a framework for Markov chain Monte Carlo simulation and inference of protein structure. J Comput Chem 2013; 34:1697-705. [PMID: 23619610 DOI: 10.1002/jcc.23292] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 12/06/2012] [Revised: 03/14/2013] [Accepted: 03/20/2013] [Indexed: 11/10/2022]
Abstract
We present a new software framework for Markov chain Monte Carlo sampling for simulation, prediction, and inference of protein structure. The software package contains implementations of recent advances in Monte Carlo methodology, such as efficient local updates and sampling from probabilistic models of local protein structure. These models form a probabilistic alternative to the widely used fragment and rotamer libraries. Combined with an easily extendible software architecture, this makes PHAISTOS well suited for Bayesian inference of protein structure from sequence and/or experimental data. Currently, two force-fields are available within the framework: PROFASI and OPLS-AA/L, the latter including the generalized Born surface area solvent model. A flexible command-line and configuration-file interface allows users quickly to set up simulations with the desired configuration. PHAISTOS is released under the GNU General Public License v3.0. Source code and documentation are freely available from http://phaistos.sourceforge.net. The software is implemented in C++ and has been tested on Linux and OSX platforms.
Affiliation(s)
- Wouter Boomsma
- Department of Biology, University of Copenhagen, Copenhagen, 2200, Denmark
|
31
|
Harder T, Borg M, Bottaro S, Boomsma W, Olsson S, Ferkinghoff-Borg J, Hamelryck T. An Efficient Null Model for Conformational Fluctuations in Proteins. Structure 2012; 20:1028-39. [DOI: 10.1016/j.str.2012.03.020] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 06/27/2011] [Revised: 03/08/2012] [Accepted: 03/12/2012] [Indexed: 10/28/2022]
|
32
|
Bottaro S, Boomsma W, Johansson KE, Andreetta C, Hamelryck T, Ferkinghoff-Borg J. Subtle Monte Carlo Updates in Dense Molecular Systems. J Chem Theory Comput 2012; 8:695-702. [DOI: 10.1021/ct200641m] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022]
Affiliation(s)
- Sandro Bottaro
- Department of Electrical Engineering, Technical University of Denmark, Kgs. Lyngby, Denmark
- Wouter Boomsma
- Department of Electrical Engineering, Technical University of Denmark, Kgs. Lyngby, Denmark
- Department of Astronomy and Theoretical Physics, Lund University, Lund, Sweden
- Thomas Hamelryck
- Department of Biology, University of Copenhagen, Copenhagen, Denmark
|
33
|
Harder T, Borg M, Boomsma W, Røgen P, Hamelryck T. Fast large-scale clustering of protein structures using Gauss integrals. Bioinformatics 2011; 28:510-5. [DOI: 10.1093/bioinformatics/btr692] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022] Open
|
34
|
Olsson S, Boomsma W, Frellsen J, Bottaro S, Harder T, Ferkinghoff-Borg J, Hamelryck T. Generative probabilistic models extend the scope of inferential structure determination. J Magn Reson 2011; 213:182-186. [PMID: 21993764 DOI: 10.1016/j.jmr.2011.08.039] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 06/20/2011] [Revised: 08/19/2011] [Accepted: 08/30/2011] [Indexed: 05/31/2023]
Abstract
Conventional methods for protein structure determination from NMR data rely on the ad hoc combination of physical forcefields and experimental data, along with heuristic determination of free parameters such as weight of experimental data relative to a physical forcefield. Recently, a theoretically rigorous approach was developed which treats structure determination as a problem of Bayesian inference. In this case, the forcefields are brought in as a prior distribution in the form of a Boltzmann factor. Due to high computational cost, the approach has been only sparsely applied in practice. Here, we demonstrate that the use of generative probabilistic models instead of physical forcefields in the Bayesian formalism is not only conceptually attractive, but also improves precision and efficiency. Our results open new vistas for the use of sophisticated probabilistic models of biomolecular structure in structure determination from experimental data.
Affiliation(s)
- Simon Olsson
- Bioinformatics Center, University of Copenhagen, Department of Biology, Ole Maaløes Vej 5, DK-2200 Copenhagen N, Denmark.
|
35
|
Hamelryck T, Borg M, Paluszewski M, Paulsen J, Frellsen J, Andreetta C, Boomsma W, Bottaro S, Ferkinghoff-Borg J. Potentials of mean force for protein structure prediction vindicated, formalized and generalized. PLoS One 2010; 5:e13714. [PMID: 21103041 PMCID: PMC2978081 DOI: 10.1371/journal.pone.0013714] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Received: 07/07/2010] [Accepted: 10/04/2010] [Indexed: 11/26/2022] Open
Abstract
Understanding protein structure is of crucial importance in science, medicine and biotechnology. For about two decades, knowledge-based potentials based on pairwise distances – so-called “potentials of mean force” (PMFs) – have been center stage in the prediction and design of protein structure and the simulation of protein folding. However, the validity, scope and limitations of these potentials are still vigorously debated and disputed, and the optimal choice of the reference state – a necessary component of these potentials – is an unsolved problem. PMFs are loosely justified by analogy to the reversible work theorem in statistical physics, or by a statistical argument based on a likelihood function. Both justifications are insightful but leave many questions unanswered. Here, we show for the first time that PMFs can be seen as approximations to quantities that do have a rigorous probabilistic justification: they naturally arise when probability distributions over different features of proteins need to be combined. We call these quantities “reference ratio distributions” deriving from the application of the “reference ratio method.” This new view is not only of theoretical relevance but leads to many insights that are of direct practical use: the reference state is uniquely defined and does not require external physical insights; the approach can be generalized beyond pairwise distances to arbitrary features of protein structure; and it becomes clear for which purposes the use of these quantities is justified. We illustrate these insights with two applications, involving the radius of gyration and hydrogen bonding. In the latter case, we also show how the reference ratio method can be iteratively applied to sculpt an energy funnel. Our results considerably increase the understanding and scope of energy functions derived from known biomolecular structures.
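The combination step described in this abstract can be illustrated on a toy discrete system. This is a minimal sketch under stated assumptions, not code from the article: it assumes a fine-grained model q(x), a coarse feature y = f(x), and a desired distribution over that feature, and it reweights each state by the ratio of the desired to the model-implied feature probability. The names `reference_ratio`, `parity`, and the numbers are hypothetical.

```python
from collections import defaultdict

def reference_ratio(q, feature, target):
    """Reweight q(x) so that the feature y = feature(x) follows target(y).

    Assumes every feature value reached under q has an entry in `target`.
    """
    # Reference distribution: the marginal of the feature under q.
    q_feature = defaultdict(float)
    for x, prob in q.items():
        q_feature[feature(x)] += prob
    # p(x) = q(x) * target(y) / q_feature(y), with y = feature(x).
    return {
        x: prob * target[feature(x)] / q_feature[feature(x)]
        for x, prob in q.items()
    }

# Hypothetical toy model: four states, with the parity of the state as the feature.
q = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
parity = lambda x: x % 2
target = {0: 0.8, 1: 0.2}
p = reference_ratio(q, parity, target)
```

In this sketch the resulting distribution reproduces the target over the feature while leaving the relative probabilities of states that share a feature value unchanged, and the reference state is simply the feature marginal of q rather than an externally chosen quantity.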
Affiliation(s)
- Thomas Hamelryck
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- * E-mail: (TH); (JFB)
- Mikael Borg
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Martin Paluszewski
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jonas Paulsen
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Jes Frellsen
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Christian Andreetta
- Bioinformatics Center, Department of Biology, University of Copenhagen, Copenhagen, Denmark
- Wouter Boomsma
- Biomedical Engineering, Technical University of Denmark (DTU) Elektro, Technical University of Denmark, Lyngby, Denmark
- Department of Chemistry, University of Cambridge, Cambridge, United Kingdom
- Sandro Bottaro
- Biomedical Engineering, Technical University of Denmark (DTU) Elektro, Technical University of Denmark, Lyngby, Denmark
- Jesper Ferkinghoff-Borg
- Biomedical Engineering, Technical University of Denmark (DTU) Elektro, Technical University of Denmark, Lyngby, Denmark
- * E-mail: (TH); (JFB)
|
36
|
Harder T, Boomsma W, Paluszewski M, Frellsen J, Johansson KE, Hamelryck T. Beyond rotamers: a generative, probabilistic model of side chains in proteins. BMC Bioinformatics 2010; 11:306. [PMID: 20525384 PMCID: PMC2902450 DOI: 10.1186/1471-2105-11-306] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.4] [Received: 03/03/2010] [Accepted: 06/05/2010] [Indexed: 11/21/2022] Open
Abstract
Background: Accurately covering the conformational space of amino acid side chains is essential for important applications such as protein design, docking and high resolution structure prediction. Today, the most common way to capture this conformational space is through rotamer libraries - discrete collections of side chain conformations derived from experimentally determined protein structures. The discretization can be exploited to efficiently search the conformational space. However, discretizing this naturally continuous space comes at the cost of losing detailed information that is crucial for certain applications. For example, rigorously combining rotamers with physical force fields is associated with numerous problems. Results: In this work we present BASILISK: a generative, probabilistic model of the conformational space of side chains that makes it possible to sample in continuous space. In addition, sampling can be conditional upon the protein's detailed backbone conformation, again in continuous space - without involving discretization. Conclusions: A careful analysis of the model and a comparison with various rotamer libraries indicates that the model forms an excellent, fully continuous model of side chain conformational space. We also illustrate how the model can be used for rigorous, unbiased sampling with a physical force field, and how it improves side chain prediction when used as a pseudo-energy term. In conclusion, BASILISK is an important step forward on the way to a rigorous probabilistic description of protein structure in continuous space and in atomic detail.
Affiliation(s)
- Tim Harder
- The Bioinformatics Section, Department of Biology, University of Copenhagen, Copenhagen, Denmark
|
37
|
Abstract
We present a heuristic approach to the DNA assignment problem based on phylogenetic inferences using constrained neighbour joining and non-parametric bootstrapping. We show that this method performs as well as the more computationally intensive full Bayesian approach in an analysis of 500 insect DNA sequences obtained from GenBank. We also analyse a previously published dataset of environmental DNA sequences from soil from New Zealand and Siberia, and use these data to illustrate the fact that statistical approaches to the DNA assignment problem allow for more appropriate criteria for determining the taxonomic level at which a particular DNA sequence can be assigned.
Affiliation(s)
- Kasper Munch
- Department of Integrative Biology, University of California, Berkeley, CA 94720-3140, USA.
|
38
|
Abstract
We provide a new automated statistical method for DNA barcoding based on a Bayesian phylogenetic analysis. The method is based on automated database sequence retrieval, alignment, and phylogenetic analysis using a custom-built program for Bayesian phylogenetic analysis. We show on real data that the method outperforms Blast searches as a measure of confidence and can help eliminate 80% of all false assignment based on best Blast hit. However, the most important advance of the method is that it provides statistically meaningful measures of confidence. We apply the method to a re-analysis of previously published ancient DNA data and show that, with high statistical confidence, most of the published sequences are in fact of Neanderthal origin. However, there are several cases of chimeric sequences that are comprised of a combination of both Neanderthal and modern human DNA.
Affiliation(s)
- Kasper Munch
- Department of Integrative Biology, University of California, Berkeley, California 94720-3140, USA
|
39
|
Willerslev E, Cappellini E, Boomsma W, Nielsen R, Hebsgaard MB, Brand TB, Hofreiter M, Bunce M, Poinar HN, Dahl-Jensen D, Johnsen S, Steffensen JP, Bennike O, Schwenninger JL, Nathan R, Armitage S, de Hoog CJ, Alfimov V, Christl M, Beer J, Muscheler R, Barker J, Sharp M, Penkman KEH, Haile J, Taberlet P, Gilbert MTP, Casoli A, Campani E, Collins MJ. Ancient biomolecules from deep ice cores reveal a forested southern Greenland. Science 2007; 317:111-4. [PMID: 17615355 PMCID: PMC2694912 DOI: 10.1126/science.1141758] [Citation(s) in RCA: 333] [Impact Index Per Article: 19.6] [Indexed: 01/10/2023]
Abstract
It is difficult to obtain fossil data from the 10% of Earth's terrestrial surface that is covered by thick glaciers and ice sheets, and hence, knowledge of the paleoenvironments of these regions has remained limited. We show that DNA and amino acids from buried organisms can be recovered from the basal sections of deep ice cores, enabling reconstructions of past flora and fauna. We show that high-altitude southern Greenland, currently lying below more than 2 kilometers of ice, was inhabited by a diverse array of conifer trees and insects within the past million years. The results provide direct evidence in support of a forested southern Greenland and suggest that many deep ice cores may contain genetic records of paleoenvironments in their basal sections.
Affiliation(s)
- Eske Willerslev
- Centre for Ancient Genetics, University of Copenhagen, Denmark.
|
40
|
Boomsma W, Hamelryck T. Full cyclic coordinate descent: solving the protein loop closure problem in Calpha space. BMC Bioinformatics 2005; 6:159. [PMID: 15985178 PMCID: PMC1192790 DOI: 10.1186/1471-2105-6-159] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Received: 04/25/2005] [Accepted: 06/28/2005] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Various forms of the so-called loop closure problem are crucial to protein structure prediction methods. Given an N- and a C-terminal end, the problem consists of finding a suitable segment of a certain length that bridges the ends seamlessly. In homology modelling, the problem arises in predicting loop regions. In de novo protein structure prediction, the problem is encountered when implementing local moves for Markov Chain Monte Carlo simulations. Most loop closure algorithms keep the bond angles fixed or semi-fixed, and only vary the dihedral angles. This is appropriate for a full-atom protein backbone, since the bond angles can be considered as fixed, while the (phi, psi) dihedral angles are variable. However, many de novo structure prediction methods use protein models that only consist of Calpha atoms, or otherwise do not make use of all backbone atoms. These methods require a method that alters both bond and dihedral angles, since the pseudo bond angle between three consecutive Calpha atoms also varies considerably. RESULTS: Here we present a method that solves the loop closure problem for Calpha only protein models. We developed a variant of Cyclic Coordinate Descent (CCD), an inverse kinematics method from the field of robotics, which was recently applied to the loop closure problem. Since the method alters both bond and dihedral angles, which is equivalent to applying a full rotation matrix, we call our method Full CCD (FCCD). FCCD replaces CCD's vector-based optimization of a rotation around an axis with a singular value decomposition-based optimization of a general rotation matrix. The method is easy to implement and numerically stable. CONCLUSION: We tested the method's performance on sets of random protein Calpha segments between 5 and 30 amino acids long, and a number of loops of length 4, 8 and 12. FCCD is fast, has a high success rate and readily generates conformations close to those of real loops. The presence of constraints on the angles only has a small effect on the performance. A reference implementation of FCCD in Python is available as supplementary information.
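To make the loop closure idea concrete, the sketch below shows the classic cyclic coordinate descent scheme in two dimensions. It is explicitly not the authors' FCCD and not their reference implementation: FCCD, as described above, optimizes a general rotation matrix via singular value decomposition in three dimensions, whereas this simplified planar version rotates about one pivot at a time using a closed-form angle. The function name `ccd_2d`, the chain, and the target point are hypothetical.

```python
import math

def ccd_2d(points, target, iterations=500, tol=1e-9):
    """Planar cyclic coordinate descent: move the last point of a chain to `target`.

    The first point stays fixed and segment lengths are preserved, because each
    step is a rigid rotation of the downstream part of the chain about one pivot.
    """
    pts = [tuple(p) for p in points]
    for _ in range(iterations):
        # Sweep pivots from the one nearest the chain end back to the fixed start.
        for i in range(len(pts) - 2, -1, -1):
            px, py = pts[i]
            ex, ey = pts[-1]
            # Rotation that aligns pivot->end with pivot->target.
            theta = (math.atan2(target[1] - py, target[0] - px)
                     - math.atan2(ey - py, ex - px))
            c, s = math.cos(theta), math.sin(theta)
            for j in range(i + 1, len(pts)):
                dx, dy = pts[j][0] - px, pts[j][1] - py
                pts[j] = (px + c * dx - s * dy, py + s * dx + c * dy)
        if math.hypot(pts[-1][0] - target[0], pts[-1][1] - target[1]) < tol:
            break
    return pts

# Hypothetical example: a straight four-point chain bent towards a reachable target.
chain = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
target = (1.5, 1.5)
closed = ccd_2d(chain, target)
```

Each sweep can only reduce the distance between the chain end and the target, which is what makes the coordinate-wise scheme simple and stable; a target farther from the fixed start than the total chain length cannot be reached.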
Affiliation(s)
- Wouter Boomsma
- Bioinformatics center, Institute of Molecular Biology and Physiology, University of Copenhagen, Universitetsparken 15, Building 10, DK-2100 Copenhagen, Denmark
- Thomas Hamelryck
- Bioinformatics center, Institute of Molecular Biology and Physiology, University of Copenhagen, Universitetsparken 15, Building 10, DK-2100 Copenhagen, Denmark
|