1
|
Paloncýová M, Valério M, Dos Santos RN, Kührová P, Šrejber M, Čechová P, Dobchev DA, Balsubramani A, Banáš P, Agarwal V, Souza PCT, Otyepka M. Computational Methods for Modeling Lipid-Mediated Active Pharmaceutical Ingredient Delivery. Mol Pharm 2025; 22:1110-1141. [PMID: 39879096 PMCID: PMC11881150 DOI: 10.1021/acs.molpharmaceut.4c00744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2024] [Revised: 01/06/2025] [Accepted: 01/06/2025] [Indexed: 01/31/2025]
Abstract
Lipid-mediated delivery of active pharmaceutical ingredients (API) opened new possibilities in advanced therapies. By encapsulating an API into a lipid nanocarrier (LNC), one can safely deliver APIs not soluble in water, those with otherwise strong adverse effects, or very fragile ones such as nucleic acids. However, for the rational design of LNCs, a detailed understanding of the composition-structure-function relationships is missing. This review presents currently available computational methods for LNC investigation, screening, and design. The state-of-the-art physics-based approaches are described, with the focus on molecular dynamics simulations in all-atom and coarse-grained resolution. Their strengths and weaknesses are discussed, highlighting the aspects necessary for obtaining reliable results in the simulations. Furthermore, a machine learning, i.e., data-based learning, approach to the design of lipid-mediated API delivery is introduced. The data produced by the experimental and theoretical approaches provide valuable insights. Processing these data can help optimize the design of LNCs for better performance. In the final section of this Review, state-of-the-art of computer simulations of LNCs are reviewed, specifically addressing the compatibility of experimental and computational insights.
Collapse
Affiliation(s)
- Markéta Paloncýová
- Regional
Center of Advanced Technologies and Materials, Czech Advanced Technology and Research Institute (CATRIN), Palacký
University Olomouc, Šlechtitelů 27, 779 00 Olomouc, Czech Republic
| | - Mariana Valério
- Laboratoire
de Biologie et Modélisation de la Cellule, CNRS, UMR 5239,
Inserm, U1293, Université Claude Bernard Lyon 1, Ecole Normale
Supérieure de Lyon, 46 Allée d’Italie, 69364 Lyon, France
- Centre Blaise
Pascal de Simulation et de Modélisation Numérique, Ecole Normale Supérieure de Lyon, 46 Allée d’Italie, 69364 Lyon, France
| | | | - Petra Kührová
- Regional
Center of Advanced Technologies and Materials, Czech Advanced Technology and Research Institute (CATRIN), Palacký
University Olomouc, Šlechtitelů 27, 779 00 Olomouc, Czech Republic
| | - Martin Šrejber
- Regional
Center of Advanced Technologies and Materials, Czech Advanced Technology and Research Institute (CATRIN), Palacký
University Olomouc, Šlechtitelů 27, 779 00 Olomouc, Czech Republic
| | - Petra Čechová
- Regional
Center of Advanced Technologies and Materials, Czech Advanced Technology and Research Institute (CATRIN), Palacký
University Olomouc, Šlechtitelů 27, 779 00 Olomouc, Czech Republic
| | | | - Akshay Balsubramani
- mRNA Center
of Excellence, Sanofi, Waltham, Massachusetts 02451, United States
| | - Pavel Banáš
- Regional
Center of Advanced Technologies and Materials, Czech Advanced Technology and Research Institute (CATRIN), Palacký
University Olomouc, Šlechtitelů 27, 779 00 Olomouc, Czech Republic
| | - Vikram Agarwal
- mRNA Center
of Excellence, Sanofi, Waltham, Massachusetts 02451, United States
| | - Paulo C. T. Souza
- Laboratoire
de Biologie et Modélisation de la Cellule, CNRS, UMR 5239,
Inserm, U1293, Université Claude Bernard Lyon 1, Ecole Normale
Supérieure de Lyon, 46 Allée d’Italie, 69364 Lyon, France
- Centre Blaise
Pascal de Simulation et de Modélisation Numérique, Ecole Normale Supérieure de Lyon, 46 Allée d’Italie, 69364 Lyon, France
| | - Michal Otyepka
- Regional
Center of Advanced Technologies and Materials, Czech Advanced Technology and Research Institute (CATRIN), Palacký
University Olomouc, Šlechtitelů 27, 779 00 Olomouc, Czech Republic
- IT4Innovations,
VŠB − Technical University of Ostrava, 17. listopadu 2172/15, 708 00 Ostrava-Poruba, Czech Republic
| |
Collapse
|
2
|
Upadhyay U, Pucci F, Herold J, Schug A. NucleoSeeker-precision filtering of RNA databases to curate high-quality datasets. NAR Genom Bioinform 2025; 7:lqaf021. [PMID: 40104673 PMCID: PMC11915511 DOI: 10.1093/nargab/lqaf021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2024] [Revised: 01/28/2025] [Accepted: 02/24/2025] [Indexed: 03/20/2025] Open
Abstract
The structural prediction of biomolecules via computational methods complements the often involved wet-lab experiments. Unlike protein structure prediction, RNA structure prediction remains a significant challenge in bioinformatics, primarily due to the scarcity of annotated RNA structure data and its varying quality. Many methods have used this limited data to train deep learning models but redundancy, data leakage and bad data quality hampers their performance. In this work, we present NucleoSeeker, a tool designed to curate high-quality, tailored datasets from the Protein Data Bank (PDB) database. It is a unified framework that combines multiple tools and streamlines an otherwise complicated process of data curation. It offers multiple filters at structure, sequence, and annotation levels, giving researchers full control over data curation. Further, we present several use cases. In particular, we demonstrate how NucleoSeeker allows the creation of a nonredundant RNA structure dataset to assess AlphaFold3's performance for RNA structure prediction. This demonstrates NucleoSeeker's effectiveness in curating valuable nonredundant tailored datasets to both train novel and judge existing methods. NucleoSeeker is very easy to use, highly flexible, and can significantly increase the quality of RNA structure datasets.
Collapse
Affiliation(s)
- Utkarsh Upadhyay
- John von Neumann Institute for Computing, Jülich Supercomputing Centre, 52428 Jülich, Germany
| | - Fabrizio Pucci
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Interuniversity Institute of Bioinformatics, 1050 Brussels, Belgium
| | - Julian Herold
- Scientific Computing Center, Karlsruhe Institute for Technology, 76344 Karlsruhe, Germany
| | - Alexander Schug
- John von Neumann Institute for Computing, Jülich Supercomputing Centre, 52428 Jülich, Germany
- Department of Biology, University of Duisburg-Essen, D-45141 Essen, Germany
| |
Collapse
|
3
|
Lupo U, Sgarbossa D, Milighetti M, Bitbol AF. DiffPaSS-high-performance differentiable pairing of protein sequences using soft scores. Bioinformatics 2024; 41:btae738. [PMID: 39672677 PMCID: PMC11676329 DOI: 10.1093/bioinformatics/btae738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/20/2024] [Revised: 12/05/2024] [Accepted: 12/11/2024] [Indexed: 12/15/2024] Open
Abstract
MOTIVATION Identifying interacting partners from two sets of protein sequences has important applications in computational biology. Interacting partners share similarities across species due to their common evolutionary history, and feature correlations in amino acid usage due to the need to maintain complementary interaction interfaces. Thus, the problem of finding interacting pairs can be formulated as searching for a pairing of sequences that maximizes a sequence similarity or a coevolution score. Several methods have been developed to address this problem, applying different approximate optimization methods to different scores. RESULTS We introduce Differentiable Pairing using Soft Scores (DiffPaSS), a differentiable framework for flexible, fast, and hyperparameter-free optimization for pairing interacting biological sequences, which can be applied to a wide variety of scores. We apply it to a benchmark prokaryotic dataset, using mutual information and neighbor graph alignment scores. DiffPaSS outperforms existing algorithms for optimizing the same scores. We demonstrate the usefulness of our paired alignments for the prediction of protein complex structure. DiffPaSS does not require sequences to be aligned, and we also apply it to nonaligned sequences from T-cell receptors. AVAILABILITY AND IMPLEMENTATION A PyTorch implementation and installable Python package are available at https://github.com/Bitbol-Lab/DiffPaSS.
Collapse
Affiliation(s)
- Umberto Lupo
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne CH-1015, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
| | - Damiano Sgarbossa
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne CH-1015, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
| | - Martina Milighetti
- Division of Infection and Immunity, University College London, London WC1E 6BT, United Kingdom
- Cancer Institute, University College London, London WC1E 6DD, United Kingdom
| | - Anne-Florence Bitbol
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne CH-1015, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne CH-1015, Switzerland
| |
Collapse
|
4
|
Kansari M, Idiris F, Szurmant H, Kubař T, Schug A. Mechanism of activation and autophosphorylation of a histidine kinase. Commun Chem 2024; 7:196. [PMID: 39227740 PMCID: PMC11371814 DOI: 10.1038/s42004-024-01272-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2023] [Accepted: 08/06/2024] [Indexed: 09/05/2024] Open
Abstract
Histidine kinases (HK) are one of the main prokaryotic signaling systems. Two structurally conserved catalytic domains inside the HK enable autokinase, phosphotransfer, and phosphatase activities. Here, we focus on a detailed mechanistic understanding of the functional cycle of the WalK HK by a multi-scale simulation approach, consisting of classical as well as hybrid QM/MM molecular dynamics simulation. Strikingly, a conformational transition induced solely in DHp leads to the correct activated conformation in CA crucial for autophosphorylation. This finding explains how variable sensor domains induce the transition from inactive to active state. The subsequent autophosphorylation inside DHp proceeds via a penta-coordinated transition state to a protonated phosphohistidine intermediate. This intermediate is consequently deprotonated by a suitable nearby base. The reaction energetics are controlled by the final proton acceptor and presence of a magnesium cation. The slow rates of the process result from the high energy barrier of the conformational transition between inactive and active states. The phosphorylation step exhibits a lower barrier and down-the-hill energetics. Thus, our work suggests a detailed mechanistic model for HK autophosphorylation.
Collapse
Affiliation(s)
- Mayukh Kansari
- Institute of Physical Chemistry, Karlsruhe Institute of Technology, Karlsruhe, Germany
| | - Fathia Idiris
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Karlsruhe, Germany
| | - Hendrik Szurmant
- College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona, CA, USA
| | - Tomáš Kubař
- Institute of Physical Chemistry, Karlsruhe Institute of Technology, Karlsruhe, Germany
| | - Alexander Schug
- Jülich Supercomputing Centre, Forschungszentrum Jülich, Jülich, Germany.
- Faculty of Biology, University of Duisburg/Essen, Essen, Germany.
| |
Collapse
|
5
|
Schug A. Residue coevolution and mutational landscape for OmpR and NarL: You can teach old dogs new tricks. Biophys J 2024; 123:653-654. [PMID: 38379283 PMCID: PMC10995386 DOI: 10.1016/j.bpj.2024.02.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2024] [Revised: 02/19/2024] [Accepted: 02/19/2024] [Indexed: 02/22/2024] Open
Affiliation(s)
- Alexander Schug
- Jülich Supercomputing Centre, Forschungszentrum Jülich, Jülich, Germany; Faculty of Biology, University of Duisburg-Essen, Essen, Germany.
| |
Collapse
|
6
|
Shibata M, Lin X, Onuchic JN, Yura K, Cheng RR. Residue coevolution and mutational landscape for OmpR and NarL response regulator subfamilies. Biophys J 2024; 123:681-692. [PMID: 38291753 PMCID: PMC10995415 DOI: 10.1016/j.bpj.2024.01.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/10/2023] [Revised: 12/31/2023] [Accepted: 01/24/2024] [Indexed: 02/01/2024] Open
Abstract
DNA-binding response regulators (DBRRs) are a broad class of proteins that operate in tandem with their partner kinase proteins to form two-component signal transduction systems in bacteria. Typical DBRRs are composed of two domains where the conserved N-terminal domain accepts transduced signals and the evolutionarily diverse C-terminal domain binds to DNA. These domains are assumed to be functionally independent, and hence recombination of the two domains should yield novel DBRRs of arbitrary input/output response, which can be used as biosensors. This idea has been proved to be successful in some cases; yet, the error rate is not trivial. Improvement of the success rate of this technique requires a deeper understanding of the linker-domain and inter-domain residue interactions, which have not yet been thoroughly examined. Here, we studied residue coevolution of DBRRs of the two main subfamilies (OmpR and NarL) using large collections of bacterial amino acid sequences to extensively investigate the evolutionary signatures of linker-domain and inter-domain residue interactions. Coevolutionary analysis uncovered evolutionarily selected linker-domain and inter-domain residue interactions of known experimental structures, as well as previously unknown inter-domain residue interactions. We examined the possibility of these inter-domain residue interactions as contacts that stabilize an inactive conformation of the DBRR where DNA binding is inhibited for both subfamilies. The newly gained insights on linker-domain/inter-domain residue interactions and shared inactivation mechanisms improve the understanding of the functional mechanism of DBRRs, providing clues to efficiently create functional DBRR-based biosensors. Additionally, we show the feasibility of applying coevolutionary landscape models to predict the functionality of domain-swapped DBRR proteins. The presented result demonstrates that sequence information can be used to filter out bioengineered DBRR proteins that are predicted to be nonfunctional due to a high negative predictive value.
Collapse
Affiliation(s)
- Mayu Shibata
- Graduate School of Humanities and Sciences, Ochanomizu University, Bunkyo, Tokyo, Japan; Center for Theoretical Biological Physics, Rice University, Houston Texas
| | - Xingcheng Lin
- Department of Physics, North Carolina State University, Raleigh, North Carolina; Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina
| | - José N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston Texas; Department of Physics and Astronomy, Chemistry, and Biosciences, Rice University, Houston, Texas
| | - Kei Yura
- Graduate School of Humanities and Sciences, Ochanomizu University, Bunkyo, Tokyo, Japan; Center for Interdisciplinary AI and Data Science, Ochanomizu University, Bunkyo, Tokyo, Japan; Graduate School of Advanced Science and Engineering, Waseda University, Shinjuku, Tokyo, Japan
| | - Ryan R Cheng
- Department of Chemistry, University of Kentucky, Lexington, Kentucky.
| |
Collapse
|
7
|
Pucci F, Zerihun MB, Rooman M, Schug A. pycofitness-Evaluating the fitness landscape of RNA and protein sequences. Bioinformatics 2024; 40:btae074. [PMID: 38335928 PMCID: PMC10881095 DOI: 10.1093/bioinformatics/btae074] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/29/2023] [Revised: 01/25/2024] [Accepted: 02/06/2024] [Indexed: 02/12/2024] Open
Abstract
MOTIVATION The accurate prediction of how mutations change biophysical properties of proteins or RNA is a major goal in computational biology with tremendous impacts on protein design and genetic variant interpretation. Evolutionary approaches such as coevolution can help solving this issue. RESULTS We present pycofitness, a standalone Python-based software package for the in silico mutagenesis of protein and RNA sequences. It is based on coevolution and, more specifically, on a popular inverse statistical approach, namely direct coupling analysis by pseudo-likelihood maximization. Its efficient implementation and user-friendly command line interface make it an easy-to-use tool even for researchers with no bioinformatics background. To illustrate its strengths, we present three applications in which pycofitness efficiently predicts the deleteriousness of genetic variants and the effect of mutations on protein fitness and thermodynamic stability. AVAILABILITY AND IMPLEMENTATION https://github.com/KIT-MBS/pycofitness.
Collapse
Affiliation(s)
- Fabrizio Pucci
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, 1050 Brussels, Belgium
| | - Mehari B Zerihun
- John von Neumann Institute for Computing, Jülich Supercomputer Centre, 52428 Jülich, Germany
| | - Marianne Rooman
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, 1050 Brussels, Belgium
| | - Alexander Schug
- John von Neumann Institute for Computing, Jülich Supercomputer Centre, 52428 Jülich, Germany
- Department of Biology, University of Duisburg-Essen, D-45141 Essen, Germany
| |
Collapse
|
8
|
Yan Z, Wang J. Evolution shapes interaction patterns for epistasis and specific protein binding in a two-component signaling system. Commun Chem 2024; 7:13. [PMID: 38233668 PMCID: PMC10794238 DOI: 10.1038/s42004-024-01098-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2023] [Accepted: 01/05/2024] [Indexed: 01/19/2024] Open
Abstract
The elegant design of protein sequence/structure/function relationships arises from the interaction patterns between amino acid positions. A central question is how evolutionary forces shape the interaction patterns that encode long-range epistasis and binding specificity. Here, we combined family-wide evolutionary analysis of natural homologous sequences and structure-oriented evolution simulation for two-component signaling (TCS) system. The magnitude-frequency relationship of coupling conservation between positions manifests a power-law-like distribution and the positions with highly coupling conservation are sparse but distributed intensely on the binding surfaces and hydrophobic core. The structure-specific interaction pattern involves further optimization of local frustrations at or near the binding surface to adapt the binding partner. The construction of family-wide conserved interaction patterns and structure-specific ones demonstrates that binding specificity is modulated by both direct intermolecular interactions and long-range epistasis across the binding complex. Evolution sculpts the interaction patterns via sequence variations at both family-wide and structure-specific levels for TCS system.
Collapse
Affiliation(s)
- Zhiqiang Yan
- Center for Theoretical Interdisciplinary Sciences, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang, 325001, PR China
| | - Jin Wang
- Department of Chemistry and Physics, State University of New York at Stony Brook, Stony Brook, NY, 11790, USA.
| |
Collapse
|
9
|
Taubert O, von der Lehr F, Bazarova A, Faber C, Knechtges P, Weiel M, Debus C, Coquelin D, Basermann A, Streit A, Kesselheim S, Götz M, Schug A. RNA contact prediction by data efficient deep learning. Commun Biol 2023; 6:913. [PMID: 37674020 PMCID: PMC10482910 DOI: 10.1038/s42003-023-05244-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/13/2023] [Accepted: 08/14/2023] [Indexed: 09/08/2023] Open
Abstract
On the path to full understanding of the structure-function relationship or even design of RNA, structure prediction would offer an intriguing complement to experimental efforts. Any deep learning on RNA structure, however, is hampered by the sparsity of labeled training data. Utilizing the limited data available, we here focus on predicting spatial adjacencies ("contact maps") as a proxy for 3D structure. Our model, BARNACLE, combines the utilization of unlabeled data through self-supervised pre-training and efficient use of the sparse labeled data through an XGBoost classifier. BARNACLE shows a considerable improvement over both the established classical baseline and a deep neural network. In order to demonstrate that our approach can be applied to tasks with similar data constraints, we show that our findings generalize to the related setting of accessible surface area prediction.
Collapse
Affiliation(s)
- Oskar Taubert
- Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
| | - Fabrice von der Lehr
- Institute for Software Technology (SC), German Aerospace Centre (DLR), 51147, Köln, Germany
| | - Alina Bazarova
- Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428, Jülich, Germany
- Helmholtz AI, 81675, Munich, Germany
| | - Christian Faber
- Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428, Jülich, Germany
| | - Philipp Knechtges
- Institute for Software Technology (SC), German Aerospace Centre (DLR), 51147, Köln, Germany
- Helmholtz AI, 81675, Munich, Germany
| | - Marie Weiel
- Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
- Helmholtz AI, 81675, Munich, Germany
| | - Charlotte Debus
- Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
- Helmholtz AI, 81675, Munich, Germany
| | - Daniel Coquelin
- Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
- Helmholtz AI, 81675, Munich, Germany
| | - Achim Basermann
- Institute for Software Technology (SC), German Aerospace Centre (DLR), 51147, Köln, Germany
| | - Achim Streit
- Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany
| | - Stefan Kesselheim
- Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428, Jülich, Germany
- Helmholtz AI, 81675, Munich, Germany
| | - Markus Götz
- Steinbuch Centre for Computing (SCC), Karlsruhe Institute of Technology, 76344, Eggenstein-Leopoldshafen, Germany.
- Helmholtz AI, 81675, Munich, Germany.
| | - Alexander Schug
- Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428, Jülich, Germany.
- Faculty of Biology, University of Duisburg-Essen, 45117, Essen, Germany.
| |
Collapse
|
10
|
Rouzine IM, Rozhnova G. Evolutionary implications of SARS-CoV-2 vaccination for the future design of vaccination strategies. COMMUNICATIONS MEDICINE 2023; 3:86. [PMID: 37336956 PMCID: PMC10279745 DOI: 10.1038/s43856-023-00320-x] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/04/2022] [Accepted: 06/07/2023] [Indexed: 06/21/2023] Open
Abstract
Once the first SARS-CoV-2 vaccine became available, mass vaccination was the main pillar of the public health response to the COVID-19 pandemic. It was very effective in reducing hospitalizations and deaths. Here, we discuss the possibility that mass vaccination might accelerate SARS-CoV-2 evolution in antibody-binding regions compared to natural infection at the population level. Using the evidence of strong genetic variation in antibody-binding regions and taking advantage of the similarity between the envelope proteins of SARS-CoV-2 and influenza, we assume that immune selection pressure acting on these regions of the two viruses is similar. We discuss the consequences of this assumption for SARS-CoV-2 evolution in light of mathematical models developed previously for influenza. We further outline the implications of this phenomenon, if our assumptions are confirmed, for the future design of SARS-CoV-2 vaccination strategies.
Collapse
Affiliation(s)
- Igor M Rouzine
- Immunogenetics, Sechenov Institute of Evolutionary Physiology and Biochemistry of Russian Academy of Sciences, Saint-Petersburg, Russia.
| | - Ganna Rozhnova
- Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Utrecht University, Utrecht, The Netherlands.
- BioISI - Biosystems & Integrative Sciences Institute, Faculdade de Ciências, Universidade de Lisboa, Lisboa, Portugal.
- Center for Complex Systems Studies (CCSS), Utrecht University, Utrecht, The Netherlands.
| |
Collapse
|
11
|
Contucci P, Osabutey G, Vernia C. Inverse problem beyond two-body interaction: The cubic mean-field Ising model. Phys Rev E 2023; 107:054124. [PMID: 37329041 DOI: 10.1103/physreve.107.054124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2022] [Accepted: 04/25/2023] [Indexed: 06/18/2023]
Abstract
In this paper, we solve the inverse problem for the cubic mean-field Ising model. Starting from configuration data generated according to the distribution of the model, we reconstruct the free parameters of the system. We test the robustness of this inversion procedure both in the region of uniqueness of the solutions and in the region where multiple thermodynamics phases are present.
Collapse
Affiliation(s)
| | - Godwin Osabutey
- Dipartimento di Matematica, Università di Bologna, Bologna 40127, Italy
| | - Cecilia Vernia
- Dipartimento di Scienze Fisiche Informatiche e Matematiche, Università di Modena e Reggio Emilia, Modena 41125, Italy
| |
Collapse
|
12
|
Huh E, Agosto MA, Wensel TG, Lichtarge O. Coevolutionary signals in metabotropic glutamate receptors capture residue contacts and long-range functional interactions. J Biol Chem 2023; 299:103030. [PMID: 36806686 PMCID: PMC10060750 DOI: 10.1016/j.jbc.2023.103030] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/30/2022] [Revised: 02/09/2023] [Accepted: 02/10/2023] [Indexed: 02/18/2023] Open
Abstract
Upon ligand binding to a G protein-coupled receptor, extracellular signals are transmitted into a cell through sets of residue interactions that translate ligand binding into structural rearrangements. These interactions needed for functions impose evolutionary constraints so that, on occasion, mutations in one position may be compensated by other mutations at functionally coupled positions. To quantify the impact of amino acid substitutions in the context of major evolutionary divergence in the G protein-coupled receptor subfamily of metabotropic glutamate receptors (mGluRs), we combined two phylogenetic-based algorithms, Evolutionary Trace and covariation Evolutionary Trace, to infer potential structure-function couplings and roles in mGluRs. We found a subset of evolutionarily important residues at known functional sites and evidence of coupling among distinct structural clusters in mGluR. In addition, experimental mutagenesis and functional assays confirmed that some highly covariant residues are coupled, revealing their synergy. Collectively, these findings inform a critical step toward understanding the molecular and structural basis of amino acid variation patterns within mGluRs and provide insight for drug development, protein engineering, and analysis of naturally occurring variants.
Collapse
Affiliation(s)
- Eunna Huh
- Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, Texas, USA
| | - Melina A Agosto
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA; Retina and Optic Nerve Research Laboratory, Department of Physiology and Biophysics, Dalhousie University, Halifax, Canada
| | - Theodore G Wensel
- Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, Texas, USA
| | - Olivier Lichtarge
- Department of Pharmacology and Chemical Biology, Baylor College of Medicine, Houston, Texas, USA; Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, USA.
| |
Collapse
|
13
|
Magi Meconi G, Sasselli IR, Bianco V, Onuchic JN, Coluzza I. Key aspects of the past 30 years of protein design. REPORTS ON PROGRESS IN PHYSICS. PHYSICAL SOCIETY (GREAT BRITAIN) 2022; 85:086601. [PMID: 35704983 DOI: 10.1088/1361-6633/ac78ef] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 09/03/2021] [Accepted: 06/15/2022] [Indexed: 06/15/2023]
Abstract
Proteins are the workhorse of life. They are the building infrastructure of living systems; they are the most efficient molecular machines known, and their enzymatic activity is still unmatched in versatility by any artificial system. Perhaps proteins' most remarkable feature is their modularity. The large amount of information required to specify each protein's function is analogically encoded with an alphabet of just ∼20 letters. The protein folding problem is how to encode all such information in a sequence of 20 letters. In this review, we go through the last 30 years of research to summarize the state of the art and highlight some applications related to fundamental problems of protein evolution.
Collapse
Affiliation(s)
- Giulia Magi Meconi
- Computational Biophysics Lab, Center for Cooperative Research in Biomaterials (CIC biomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014, Donostia-San Sebastián, Spain
| | - Ivan R Sasselli
- Computational Biophysics Lab, Center for Cooperative Research in Biomaterials (CIC biomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014, Donostia-San Sebastián, Spain
| | | | - Jose N Onuchic
- Center for Theoretical Biological Physics, Department of Physics & Astronomy, Department of Chemistry, Department of Biosciences, Rice University, Houston, TX 77251, United States of America
| | - Ivan Coluzza
- BCMaterials, Basque Center for Materials, Applications and Nanostructures, Bld. Martina Casiano, UPV/EHU Science Park, Barrio Sarriena s/n, 48940 Leioa, Spain
- Basque Foundation for Science, Ikerbasque, 48009, Bilbao, Spain
| |
Collapse
|
14
|
Singh J, Paliwal K, Litfin T, Singh J, Zhou Y. Predicting RNA distance-based contact maps by integrated deep learning on physics-inferred secondary structure and evolutionary-derived mutational coupling. Bioinformatics 2022; 38:3900-3910. [PMID: 35751593 PMCID: PMC9364379 DOI: 10.1093/bioinformatics/btac421] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/24/2021] [Revised: 04/30/2022] [Accepted: 06/28/2022] [Indexed: 12/24/2022] Open
Abstract
MOTIVATION Recently, AlphaFold2 achieved high experimental accuracy for the majority of proteins in Critical Assessment of Structure Prediction (CASP 14). This raises the hope that one day, we may achieve the same feat for RNA structure prediction for those structured RNAs, which is as fundamentally and practically important similar to protein structure prediction. One major factor in the recent advancement of protein structure prediction is the highly accurate prediction of distance-based contact maps of proteins. RESULTS Here, we showed that by integrated deep learning with physics-inferred secondary structures, co-evolutionary information and multiple sequence-alignment sampling, we can achieve RNA contact-map prediction at a level of accuracy similar to that in protein contact-map prediction. More importantly, highly accurate prediction for top L long-range contacts can be assured for those RNAs with a high effective number of homologous sequences (Neff > 50). The initial use of the predicted contact map as distance-based restraints confirmed its usefulness in 3D structure prediction. AVAILABILITY AND IMPLEMENTATION SPOT-RNA-2D is available as a web server at https://sparks-lab.org/server/spot-rna-2d/ and as a standalone program at https://github.com/jaswindersingh2/SPOT-RNA-2D. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
| | | | - Thomas Litfin
- Institute for Glycomics, Griffith University, Parklands Dr. Southport, QLD 4222, Australia
| | - Jaspreet Singh
- Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia
| | - Yaoqi Zhou
- To whom correspondence should be addressed. or or
| |
Collapse
|
15
|
Chi H, Zhou Q, Tutol JN, Phelps SM, Lee J, Kapadia P, Morcos F, Dodani SC. Coupling a Live Cell Directed Evolution Assay with Coevolutionary Landscapes to Engineer an Improved Fluorescent Rhodopsin Chloride Sensor. ACS Synth Biol 2022; 11:1627-1638. [PMID: 35389621 PMCID: PMC9184236 DOI: 10.1021/acssynbio.2c00033] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
Abstract
Our understanding of chloride in biology has been accelerated through the application of fluorescent protein-based sensors in living cells. These sensors can be generated and diversified to have a range of properties using laboratory-guided evolution. Recently, we established that the fluorescent proton-pumping rhodopsin wtGR from Gloeobacter violaceus can be converted into a fluorescent sensor for chloride. To unlock this non-natural function, a single point mutation at the Schiff counterion position (D121V) was introduced into wtGR fused to cyan fluorescent protein (CFP) resulting in GR1-CFP. Here, we have integrated coevolutionary analysis with directed evolution to understand how the rhodopsin sequence space can be explored and engineered to improve this starting point. We first show how evolutionary couplings are predictive of functional sites in the rhodopsin family and how a fitness metric based on a sequence can be used to quantify the known proton-pumping activities of GR-CFP variants. Then, we couple this ability to predict potential functional outcomes with a screening and selection assay in live Escherichia coli to reduce the mutational search space of five residues along the proton-pumping pathway in GR1-CFP. This iterative selection process results in GR2-CFP with four additional mutations: E132K, A84K, T125C, and V245I. Finally, bulk and single fluorescence measurements in live E. coli reveal that GR2-CFP is a reversible, ratiometric fluorescent sensor for extracellular chloride with an improved dynamic range. We anticipate that our framework will be applicable to other systems, providing a more efficient methodology to engineer fluorescent protein-based sensors with desired properties.
Collapse
|
16
|
Hayes RL, Vilseck JZ, Brooks CL. Addressing Intersite Coupling Unlocks Large Combinatorial Chemical Spaces for Alchemical Free Energy Methods. J Chem Theory Comput 2022; 18:2114-2123. [PMID: 35255214 PMCID: PMC9700482 DOI: 10.1021/acs.jctc.1c00948] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Abstract
Alchemical free energy methods are playing a growing role in molecular design, both for computer-aided drug design of small molecules and for computational protein design. Multisite λ dynamics (MSλD) is a uniquely scalable alchemical free energy method that enables more efficient exploration of combinatorial alchemical spaces encountered in molecular design, but simulations have typically been limited to a few hundred ligands or sequences. Here, we focus on coupling between sites to enable scaling to larger alchemical spaces. We first discuss updates to the biasing potentials that facilitate MSλD sampling to include coupling terms and show that this can provide more thorough sampling of alchemical states. We then harness coupling between sites by developing a new free energy estimator based on the Potts models underlying direct coupling analysis, a method for predicting contacts from sequence coevolution, and find it yields more accurate free energies than previous estimators. The sampling requirements of the Potts model estimator scale with the square of the number of sites, a substantial improvement over the exponential scaling of the standard estimator. This opens up exploration of much larger alchemical spaces with MSλD for molecular design.
Collapse
Affiliation(s)
- Ryan L Hayes
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Biophysics Program, University of Michigan, Ann Arbor, Michigan 48109, United States
| | - Jonah Z Vilseck
- Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
- Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, Indiana 46202, United States
| | - Charles L Brooks
- Department of Chemistry, University of Michigan, Ann Arbor, Michigan 48109, United States
- Biophysics Program, University of Michigan, Ann Arbor, Michigan 48109, United States
| |
Collapse
|
17
|
Voronin A, Schug A. Selection of representative structures from large biomolecular ensembles. J Chem Phys 2022; 156:144102. [DOI: 10.1063/5.0082444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
Despite the incredible progress of experimental techniques, protein structure determination still remains a challenging task. Due to the rapid improvements of computer technology, simulations are often used to complement or interpret experimental data, in particular for sparse or low-resolution data. Many such in silico methods allow to obtain highly accurate models of a protein structure either de novo or via refinement of a physical model with experimental restraints. One crucial question is how to select a representative member or ensemble out of vast number of computationally generated structures. Here, we introduce such a method. As a representative task, we add co-evolutionary contact pairs as distance restraints to a physical force field and want to select a good characterization of the resulting native-like ensemble. To generate large ensembles, we run replica-exchange molecular dynamics (REMD) on five mid-sized test proteins and over a wide temperature range. High temperatures allow overcoming energetic barriers while low temperatures perform local searches of native-like conformations. The integrated bias is based on co-evolutionary contact pairs derived from a deep residual neural network to guide the simulation towards native-like conformations. We shortly compare and discuss the achieved model precision of contact-guided REMD for mid-sized proteins. Lastly, we discuss four robust ensemble-selection algorithms in great detail which are capable to extract the representative structure models with a high certainty. To assess the performance of the selection algorithms we exemplarily mimic a "blind scenario', i.e. where the target structure is unknown, and select a representative structural ensemble of native-like folds.
Collapse
Affiliation(s)
| | - Alexander Schug
- Forschungszentrum Jülich, Forschungszentrum Jülich Jülich Supercomputing Centre, Germany
| |
Collapse
|
18
|
Zerihun MB, Pucci F, Schug A. CoCoNet-boosting RNA contact prediction by convolutional neural networks. Nucleic Acids Res 2021; 49:12661-12672. [PMID: 34871451 PMCID: PMC8682773 DOI: 10.1093/nar/gkab1144] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2020] [Revised: 10/27/2021] [Accepted: 11/05/2021] [Indexed: 11/24/2022] Open
Abstract
Co-evolutionary models such as direct coupling analysis (DCA) in combination with machine learning (ML) techniques based on deep neural networks are able to predict accurate protein contact or distance maps. Such information can be used as constraints in structure prediction and massively increase prediction accuracy. Unfortunately, the same ML methods cannot readily be applied to RNA as they rely on large structural datasets only available for proteins. Here, we demonstrate how the available smaller data for RNA can be used to improve prediction of RNA contact maps. We introduce an algorithm called CoCoNet that is based on a combination of a Coevolutionary model and a shallow Convolutional Neural Network. Despite its simplicity and the small number of trained parameters, the method boosts the positive predictive value (PPV) of predicted contacts by about 70% with respect to DCA as tested by cross-validation of about eighty RNA structures. However, the direct inclusion of the CoCoNet contacts in 3D modeling tools does not result in a proportional increase of the 3D RNA structure prediction accuracy. Therefore, we suggest that the field develops, in addition to contact PPV, metrics which estimate the expected impact for 3D structure modeling tools better. CoCoNet is freely available and can be found at https://github.com/KIT-MBS/coconet.
Collapse
Affiliation(s)
- Mehari B Zerihun
- John von Neumann Institute for Computing, Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428 Jülich, Germany.,Steinbuch Centre for Computing, Karlsruhe Institute of Technology, 76344 Eggenstein-Leopoldshafen, Germany
| | - Fabrizio Pucci
- John von Neumann Institute for Computing, Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428 Jülich, Germany.,Computational Biology and Bioinformatics, Université Libre de Bruxelles 1050, Brussels, Belgium
| | - Alexander Schug
- John von Neumann Institute for Computing, Jülich Supercomputing Centre, Forschungszentrum Jülich, 52428 Jülich, Germany.,Faculty of Biology, University of Duisburg-Essen, 45117 Essen, Germany
| |
Collapse
|
19
|
Pozzati G, Zhu W, Bassot C, Lamb J, Kundrotas P, Elofsson A. Limits and potential of combined folding and docking. Bioinformatics 2021; 38:954-961. [PMID: 34788800 PMCID: PMC8796369 DOI: 10.1093/bioinformatics/btab760] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2021] [Revised: 09/23/2021] [Accepted: 11/02/2021] [Indexed: 02/03/2023] Open
Abstract
MOTIVATION In the last decade, de novo protein structure prediction accuracy for individual proteins has improved significantly by utilising deep learning (DL) methods for harvesting the co-evolution information from large multiple sequence alignments (MSAs). The same approach can, in principle, also be used to extract information about evolutionary-based contacts across protein-protein interfaces. However, most earlier studies have not used the latest DL methods for inter-chain contact distance prediction. This article introduces a fold-and-dock method based on predicted residue-residue distances with trRosetta. RESULTS The method can simultaneously predict the tertiary and quaternary structure of a protein pair, even when the structures of the monomers are not known. The straightforward application of this method to a standard dataset for protein-protein docking yielded limited success. However, using alternative methods for generating MSAs allowed us to dock accurately significantly more proteins. We also introduced a novel scoring function, PconsDock, that accurately separates 98% of correctly and incorrectly folded and docked proteins. The average performance of the method is comparable to the use of traditional, template-based or ab initio shape-complementarity-only docking methods. Moreover, the results of conventional and fold-and-dock approaches are complementary, and thus a combined docking pipeline could increase overall docking success significantly. This methodology contributed to the best model for one of the CASP14 oligomeric targets, H1065. AVAILABILITY AND IMPLEMENTATION All scripts for predictions and analysis are available from https://github.com/ElofssonLab/bioinfo-toolbox/ and https://gitlab.com/ElofssonLab/benchmark5/. All models joined alignments, and evaluation results are available from the following figshare repository https://doi.org/10.6084/m9.figshare.14654886.v2. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
| | | | | | - John Lamb
- Science for Life Laboratory and Department of Biochemistry and Biophysics, Stockholm University, 171 21 Solna, Sweden
| | - Petras Kundrotas
- Science for Life Laboratory and Department of Biochemistry and Biophysics, Stockholm University, 171 21 Solna, Sweden,Center for Computational Biology, The University of Kansas, Lawrence, KS 66047, USA
| | | |
Collapse
|
20
|
Mehrabiani KM, Cheng RR, Onuchic JN. Expanding Direct Coupling Analysis to Identify Heterodimeric Interfaces from Limited Protein Sequence Data. J Phys Chem B 2021; 125:11408-11417. [PMID: 34618469 DOI: 10.1021/acs.jpcb.1c07145] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
Direct coupling analysis (DCA) is a global statistical approach that uses information encoded in protein sequence data to predict spatial contacts in a three-dimensional structure of a folded protein. DCA has been widely used to predict the monomeric fold at amino acid resolution and to identify biologically relevant interaction sites within a folded protein. Going beyond single proteins, DCA has also been used to identify spatial contacts that stabilize the interaction in protein complex formation. However, extracting this higher order information necessary to predict dimer contacts presents a significant challenge. A DCA evolutionary signal is much stronger at the single protein level (intraprotein contacts) than at the protein-protein interface (interprotein contacts). Therefore, if DCA-derived information is to be used to predict the structure of these complexes, there is a need to identify statistically significant DCA predictions. We propose a simple Z-score measure that can filter good predictions despite noisy, limited data. This new methodology not only improves our prediction ability but also provides a quantitative measure for the validity of the prediction.
Collapse
Affiliation(s)
- Kareem M Mehrabiani
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States.,Systems, Synthetic, and Physical Biology, Rice University, Houston, Texas 77005, United States
| | - Ryan R Cheng
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
| | - José N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States.,Systems, Synthetic, and Physical Biology, Rice University, Houston, Texas 77005, United States.,Department of Physics & Astronomy, Rice University, Houston, Texas 77005, United States.,Department of Chemistry, Rice University, Houston, Texas 77005, United States.,Department of Biosciences, Rice University, Houston, Texas 77005, United States
| |
Collapse
|
21
|
Laine E, Eismann S, Elofsson A, Grudinin S. Protein sequence-to-structure learning: Is this the end(-to-end revolution)? Proteins 2021; 89:1770-1786. [PMID: 34519095 DOI: 10.1002/prot.26235] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2021] [Revised: 08/16/2021] [Accepted: 09/03/2021] [Indexed: 01/08/2023]
Abstract
The potential of deep learning has been recognized in the protein structure prediction community for some time, and became indisputable after CASP13. In CASP14, deep learning has boosted the field to unanticipated levels reaching near-experimental accuracy. This success comes from advances transferred from other machine learning areas, as well as methods specifically designed to deal with protein sequences and structures, and their abstractions. Novel emerging approaches include (i) geometric learning, that is, learning on representations such as graphs, three-dimensional (3D) Voronoi tessellations, and point clouds; (ii) pretrained protein language models leveraging attention; (iii) equivariant architectures preserving the symmetry of 3D space; (iv) use of large meta-genome databases; (v) combinations of protein representations; and (vi) finally truly end-to-end architectures, that is, differentiable models starting from a sequence and returning a 3D structure. Here, we provide an overview and our opinion of the novel deep learning approaches developed in the last 2 years and widely used in CASP14.
Collapse
Affiliation(s)
- Elodie Laine
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, France
| | - Stephan Eismann
- Department of Computer Science and Applied Physics, Stanford University, Stanford, California, USA
| | - Arne Elofsson
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Solna, Sweden
| | - Sergei Grudinin
- Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK, Grenoble, France
| |
Collapse
|
22
|
Elofsson A. Toward Characterising the Cellular 3D-Proteome. FRONTIERS IN BIOINFORMATICS 2021; 1:598878. [PMID: 36353353 PMCID: PMC9638702 DOI: 10.3389/fbinf.2021.598878] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2020] [Accepted: 01/27/2021] [Indexed: 11/30/2022] Open
|
23
|
ELIHKSIR Web Server: Evolutionary Links Inferred for Histidine Kinase Sensors Interacting with Response Regulators. ENTROPY 2021; 23:e23020170. [PMID: 33573110 PMCID: PMC7911359 DOI: 10.3390/e23020170] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/17/2020] [Revised: 01/21/2021] [Accepted: 01/26/2021] [Indexed: 12/03/2022]
Abstract
Two-component systems (TCS) are signaling machinery that consist of a histidine kinases (HK) and response regulator (RR). When an environmental change is detected, the HK phosphorylates its cognate response regulator (RR). While cognate interactions were considered orthogonal, experimental evidence shows the prevalence of crosstalk interactions between non-cognate HK–RR pairs. Currently, crosstalk interactions have been demonstrated for TCS proteins in a limited number of organisms. By providing specificity predictions across entire TCS networks for a large variety of organisms, the ELIHKSIR web server assists users in identifying interactions for TCS proteins and their mutants. To generate specificity scores, a global probabilistic model was used to identify interfacial couplings and local fields from sequence information. These couplings and local fields were then used to construct Hamiltonian scores for positions with encoded specificity, resulting in the specificity score. These methods were applied to 6676 organisms available on the ELIHKSIR web server. Due to the ability to mutate proteins and display the resulting network changes, there are nearly endless combinations of TCS networks to analyze using ELIHKSIR. The functionality of ELIHKSIR allows users to perform a variety of TCS network analyses and visualizations to support TCS research efforts.
Collapse
|
24
|
Voronin A, Weiel M, Schug A. Including residual contact information into replica-exchange MD simulations significantly enriches native-like conformations. PLoS One 2020; 15:e0242072. [PMID: 33196676 PMCID: PMC7668583 DOI: 10.1371/journal.pone.0242072] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2020] [Accepted: 10/27/2020] [Indexed: 11/19/2022] Open
Abstract
Proteins are complex biomolecules which perform critical tasks in living organisms. Knowledge of a protein's structure is essential for understanding its physiological function in detail. Despite the incredible progress in experimental techniques, protein structure determination is still expensive, time-consuming, and arduous. That is why computer simulations are often used to complement or interpret experimental data. Here, we explore how in silico protein structure determination based on replica-exchange molecular dynamics (REMD) can benefit from including contact information derived from theoretical and experimental sources, such as direct coupling analysis or NMR spectroscopy. To reflect the influence from erroneous and noisy data we probe how false-positive contacts influence the simulated ensemble. Specifically, we integrate varying numbers of randomly selected native and non-native contacts and explore how such a bias can guide simulations towards the native state. We investigate the number of contacts needed for a significant enrichment of native-like conformations and show the capabilities and limitations of this method. Adhering to a threshold of approximately 75% true-positive contacts within a simulation, we obtain an ensemble with native-like conformations of high quality. We find that contact-guided REMD is capable of delivering physically reasonable models of a protein's structure.
Collapse
Affiliation(s)
- Arthur Voronin
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
- Department of Physics, Karlsruhe Institute of Technology, Karlsruhe, Germany
| | - Marie Weiel
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
- Department of Physics, Karlsruhe Institute of Technology, Karlsruhe, Germany
| | - Alexander Schug
- Institute for Advanced Simulation, Jülich Supercomputing Center, Jülich, Germany
- Faculty of Biology, University of Duisburg-Essen, Duisburg, Germany
| |
Collapse
|
25
|
Tian P, Best RB. Exploring the sequence fitness landscape of a bridge between protein folds. PLoS Comput Biol 2020; 16:e1008285. [PMID: 33048928 PMCID: PMC7553338 DOI: 10.1371/journal.pcbi.1008285] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2020] [Accepted: 08/24/2020] [Indexed: 12/15/2022] Open
Abstract
Most foldable protein sequences adopt only a single native fold. Recent protein design studies have, however, created protein sequences which fold into different structures apon changes of environment, or single point mutation, the best characterized example being the switch between the folds of the GA and GB binding domains of streptococcal protein G. To obtain further insight into the design of sequences which can switch folds, we have used a computational model for the fitness landscape of a single fold, built from the observed sequence variation of protein homologues. We have recently shown that such coevolutionary models can be used to design novel foldable sequences. By appropriately combining two of these models to describe the joint fitness landscape of GA and GB, we are able to describe the propensity of a given sequence for each of the two folds. We have successfully tested the combined model against the known series of designed GA/GB hybrids. Using Monte Carlo simulations on this landscape, we are able to identify pathways of mutations connecting the two folds. In the absence of a requirement for domain stability, the most frequent paths go via sequences in which neither domain is stably folded, reminiscent of the propensity for certain intrinsically disordered proteins to fold into different structures according to context. Even if the folded state is required to be stable, we find that there is nonetheless still a wide range of sequences which are close to the transition region and therefore likely fold switches, consistent with recent estimates that fold switching may be more widespread than had been thought.
Collapse
Affiliation(s)
- Pengfei Tian
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, U.S.A
| | - Robert B. Best
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, U.S.A
| |
Collapse
|
26
|
Dokholyan NV. Experimentally-driven protein structure modeling. J Proteomics 2020; 220:103777. [PMID: 32268219 PMCID: PMC7214187 DOI: 10.1016/j.jprot.2020.103777] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/18/2019] [Revised: 03/17/2020] [Accepted: 04/02/2020] [Indexed: 11/25/2022]
Abstract
Revolutions in natural and exact sciences started at the dawn of last century have led to the explosion of theoretical, experimental, and computational approaches to determine structures of molecules, complexes, as well as their rich conformational dynamics. Since different experimental methods produce information that is attributed to specific time and length scales, corresponding computational methods have to be tailored to these scales and experiments. These methods can be then combined and integrated in scales, hence producing a fuller picture of molecular structure and motion from the "puzzle pieces" offered by various experiments. Here, we describe a number of computational approaches to utilize experimental data to glance into structure of proteins and understand their dynamics. We will also discuss the limitations and the resolution of the constraints-based modeling approaches. SIGNIFICANCE: Experimentally-driven computational structure modeling and determination is a rapidly evolving alternative to traditional approaches for molecular structure determination. These new hybrid experimental-computational approaches are proving to be a powerful microscope to glance into the structural features of intrinsically or partially disordered proteins, dynamics of molecules and complexes. In this review, we describe various approaches in the field of experimentally-driven computational structure modeling.
Collapse
Affiliation(s)
- Nikolay V Dokholyan
- Department of Pharmacology, Penn State University College of Medicine, Hershey, PA 17033, USA; Department of Biochemistry & Molecular Biology, Penn State College of Medicine, Hershey, PA 17033, USA.; Department of Chemistry, Pennsylvania State University, University Park, PA 16802, USA.; Department of Biomedical Engineering, Pennsylvania State University, University Park, PA 16802, USA.
| |
Collapse
|
27
|
Andreani J, Quignot C, Guerois R. Structural prediction of protein interactions and docking using conservation and coevolution. WILEY INTERDISCIPLINARY REVIEWS-COMPUTATIONAL MOLECULAR SCIENCE 2020. [DOI: 10.1002/wcms.1470] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Affiliation(s)
- Jessica Andreani
- Université Paris‐Saclay CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC) Gif‐sur‐Yvette France
| | - Chloé Quignot
- Université Paris‐Saclay CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC) Gif‐sur‐Yvette France
| | - Raphael Guerois
- Université Paris‐Saclay CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC) Gif‐sur‐Yvette France
| |
Collapse
|
28
|
Epistatic contributions promote the unification of incompatible models of neutral molecular evolution. Proc Natl Acad Sci U S A 2020; 117:5873-5882. [PMID: 32123092 PMCID: PMC7084075 DOI: 10.1073/pnas.1913071117] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open
Abstract
Mathematical models of evolution help us understand mechanisms driving protein-sequence change. Previous models recapitulate a disjoint subset of statistical features of natural sequences. We present a neutral evolution model that unifies features including extreme variance of the molecular clock’s tick rate and the observation of an evolutionary Stokes shift, an irreversible effect of mutations in the fitness landscape during sequence evolution. We show that interactions between amino acid sites, which inform our fitness metric, are required to observe these features. These interactions are inferred by using direct coupling analysis, which has been successfully utilized to predict protein structures, dynamics, and complexes from coevolutionary information. We anticipate our model will have applications in phylogenetics, ancestral reconstruction of sequences, and protein design. We introduce a model of amino acid sequence evolution that accounts for the statistical behavior of real sequences induced by epistatic interactions. We base the model dynamics on parameters derived from multiple sequence alignments analyzed by using direct coupling analysis methodology. Known statistical properties such as overdispersion, heterotachy, and gamma-distributed rate-across-sites are shown to be emergent properties of this model while being consistent with neutral evolution theory, thereby unifying observations from previously disjointed evolutionary models of sequences. The relationship between site restriction and heterotachy is characterized by tracking the effective alphabet dynamics of sites. We also observe an evolutionary Stokes shift in the fitness of sequences that have undergone evolution under our simulation. By analyzing the structural information of some proteins, we corroborate that the strongest Stokes shifts derive from sites that physically interact in networks near biochemically important regions. Perspectives on the implementation of our model in the context of the molecular clock are discussed.
Collapse
|
29
|
Nerattini F, Figliuzzi M, Cardelli C, Tubiana L, Bianco V, Dellago C, Coluzza I. Identification of Protein Functional Regions. Chemphyschem 2020; 21:335-347. [PMID: 31944517 DOI: 10.1002/cphc.201900898] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2019] [Revised: 11/01/2019] [Indexed: 11/12/2022]
Abstract
Protein sequence stores the information relative to both functionality and stability, thus making it difficult to disentangle the two contributions. However, the identification of critical residues for function and stability has important implications for the mapping of the proteome interactions, as well as for many pharmaceutical applications, e. g. the identification of ligand binding regions for targeted pharmaceutical protein design. In this work, we propose a computational method to identify critical residues for protein functionality and stability and to further categorise them in strictly functional, structural and intermediate. We evaluate single site conservation and use Direct Coupling Analysis (DCA) to identify co-evolved residues both in natural and artificial evolution processes. We reproduce artificial evolution using protein design and base our approach on the hypothesis that artificial evolution in the absence of any functional constraint would exclusively lead to site conservation and co-evolution events of the structural type. Conversely, natural evolution intrinsically embeds both functional and structural information. By comparing the lists of conserved and co-evolved residues, outcomes of the analysis on natural and artificial evolution, we identify the functional residues without the need of any a priori knowledge of the biological role of the analysed protein.
Collapse
Affiliation(s)
- Francesca Nerattini
- Faculty of Physics, University of Vienna, Boltzmanngasse 5, 1090, Vienna, Austria
| | - Matteo Figliuzzi
- Sorbonne Universites, UPMC, Institut de Biologie Paris-Seine, CNRS, Laboratoire de Biologie Computationnelle et Quantitative UMR, 7238, Paris, France
| | - Chiara Cardelli
- Faculty of Physics, University of Vienna, Boltzmanngasse 5, 1090, Vienna, Austria
| | - Luca Tubiana
- Physics Department, Universitá degli studi di Trento, via Sommarive 14, 38123, Trento, IT
| | - Valentino Bianco
- Faculty of Physics, University of Vienna, Boltzmanngasse 5, 1090, Vienna, Austria.,Faculty of Chemistry, Chemical Physics Department, Universidad Complutense de Madrid, Plaza de las Ciencias, Ciudad Universitaria, Madrid, 28040, Spain
| | - Christoph Dellago
- Faculty of Physics, University of Vienna, Boltzmanngasse 5, 1090, Vienna, Austria
| | - Ivan Coluzza
- CIC biomaGUNE, Paseo Miramon 182, 20014 San Sebastian, Spain, and IKERBASQUE, Basque Foundation for Science, 48013, Bilbao, Spain
| |
Collapse
|
30
|
Sala D, Cerofolini L, Fragai M, Giachetti A, Luchinat C, Rosato A. A protocol to automatically calculate homo-oligomeric protein structures through the integration of evolutionary constraints and NMR ambiguous contacts. Comput Struct Biotechnol J 2019; 18:114-124. [PMID: 31969972 PMCID: PMC6961069 DOI: 10.1016/j.csbj.2019.12.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2019] [Revised: 11/20/2019] [Accepted: 12/06/2019] [Indexed: 12/15/2022] Open
Abstract
Protein assemblies are involved in many important biological processes. Solid-state NMR (SSNMR) spectroscopy is a technique suitable for the structural characterization of samples with high molecular weight and thus can be applied to such assemblies. A significant bottleneck in terms of both effort and time required is the manual identification of unambiguous intermolecular contacts. This is particularly challenging for homo-oligomeric complexes, where simple uniform labeling may not be effective. We tackled this challenge by exploiting coevolution analysis to extract information on homo-oligomeric interfaces from NMR-derived ambiguous contacts. After removing the evolutionary couplings (ECs) that are already satisfied by the 3D structure of the monomer, the predicted ECs are matched with the automatically generated list of experimental contacts. This approach provides a selection of potential interface residues that is used directly in monomer-monomer docking calculations. We validated the protocol on tetrameric L-asparaginase II and dimeric Sod1.
Collapse
Affiliation(s)
- Davide Sala
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
| | - Linda Cerofolini
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
| | - Marco Fragai
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
| | - Andrea Giachetti
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
| | - Claudio Luchinat
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
| | - Antonio Rosato
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
| |
Collapse
|
31
|
Taubert O, Reinartz I, Meyerhenke H, Schug A. diSTruct v1.0: generating biomolecular structures from distance constraints. Bioinformatics 2019; 35:5337-5338. [PMID: 31329240 DOI: 10.1093/bioinformatics/btz578] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2019] [Revised: 07/10/2019] [Accepted: 07/17/2019] [Indexed: 11/13/2022] Open
Abstract
SUMMARY The distance geometry problem is often encountered in molecular biology and the life sciences at large, as a host of experimental methods produce ambiguous and noisy distance data. In this note, we present diSTruct; an adaptation of the generic MaxEnt-Stress graph drawing algorithm to the domain of biological macromolecules. diSTruct is fast, provides reliable structural models even from incomplete or noisy distance data and integrates access to graph analysis tools. AVAILABILITY AND IMPLEMENTATION diSTruct is written in C++, Cython and Python 3. It is available from https://github.com/KIT-MBS/distruct.git or in the Python package index under the MIT license. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Oskar Taubert
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology (KIT), Eggenstein-Leopoldshafen, Germany
| | - Ines Reinartz
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology (KIT), Eggenstein-Leopoldshafen, Germany
| | - Henning Meyerhenke
- Department of Computer Science, Humbold-Universität zu Berlin, Berlin, Germany
| | - Alexander Schug
- John von Neumann Institute for Computing, Jülich Supercomputing Centre, Forschungszentrum Jülich, Jülich 52425, Germany
| |
Collapse
|
32
|
Zerihun MB, Pucci F, Peter EK, Schug A. pydca v1.0: a comprehensive software for direct coupling analysis of RNA and protein sequences. Bioinformatics 2019; 36:2264-2265. [DOI: 10.1093/bioinformatics/btz892] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2019] [Revised: 10/14/2019] [Accepted: 11/26/2019] [Indexed: 12/20/2022] Open
Abstract
Abstract
Motivation
The ongoing advances in sequencing technologies have provided a massive increase in the availability of sequence data. This made it possible to study the patterns of correlated substitution between residues in families of homologous proteins or RNAs and to retrieve structural and stability information. Direct coupling analysis (DCA) infers coevolutionary couplings between pairs of residues indicating their spatial proximity, making such information a valuable input for subsequent structure prediction.
Results
Here, we present pydca, a standalone Python-based software package for the DCA of protein- and RNA-homologous families. It is based on two popular inverse statistical approaches, namely, the mean-field and the pseudo-likelihood maximization and is equipped with a series of functionalities that range from multiple sequence alignment trimming to contact map visualization. Thanks to its efficient implementation, features and user-friendly command line interface, pydca is a modular and easy-to-use tool that can be used by researchers with a wide range of backgrounds.
Availability and implementation
pydca can be obtained from https://github.com/KIT-MBS/pydca or from the Python Package Index under the MIT License.
Supplementary information
Supplementary data are available at Bioinformatics online.
Collapse
Affiliation(s)
- Mehari B Zerihun
- Steinbuch Centre for Computing, Eggenstein-Leopoldshafen 76344
- Department of Physics, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen 76344
| | - Fabrizio Pucci
- John von Neumann Institute for Computing, Jülich Supercomputer Centre, Forschungszentrum Jülich, Jülich 52428, Germany
| | - Emanuel K Peter
- John von Neumann Institute for Computing, Jülich Supercomputer Centre, Forschungszentrum Jülich, Jülich 52428, Germany
| | - Alexander Schug
- John von Neumann Institute for Computing, Jülich Supercomputer Centre, Forschungszentrum Jülich, Jülich 52428, Germany
| |
Collapse
|
33
|
Peng M, Maier M, Esch J, Schug A, Rabe KS. Direct coupling analysis improves the identification of beneficial amino acid mutations for the functional thermostabilization of a delicate decarboxylase. Biol Chem 2019; 400:1519-1527. [PMID: 31472057 DOI: 10.1515/hsz-2019-0156] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/12/2019] [Accepted: 08/09/2019] [Indexed: 12/26/2022]
Abstract
The optimization of enzyme properties for specific reaction conditions enables their tailored use in biotechnology. Predictions using established computer-based methods, however, remain challenging, especially regarding physical parameters such as thermostability without concurrent loss of activity. Employing established computational methods such as energy calculations using FoldX can lead to the identification of beneficial single amino acid substitutions for the thermostabilization of enzymes. However, these methods require a three-dimensional (3D)-structure of the enzyme. In contrast, coevolutionary analysis is a computational method, which is solely based on sequence data. To enable a comparison, we employed coevolutionary analysis together with structure-based approaches to identify mutations, which stabilize an enzyme while retaining its activity. As an example, we used the delicate dimeric, thiamine pyrophosphate dependent enzyme ketoisovalerate decarboxylase (Kivd) and experimentally determined its stability represented by a T50 value indicating the temperature where 50% of enzymatic activity remained after incubation for 10 min. Coevolutionary analysis suggested 12 beneficial mutations, which were not identified by previously established methods, out of which four mutations led to a functional Kivd with an increased T50 value of up to 3.9°C.
Collapse
Affiliation(s)
- Martin Peng
- Institute for Biological Interfaces I, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, D-76344 Eggenstein-Leopoldshafen, Germany
| | - Manfred Maier
- Institute for Biological Interfaces I, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, D-76344 Eggenstein-Leopoldshafen, Germany
| | - Jan Esch
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, D-76344 Eggenstein-Leopoldshafen, Germany
| | - Alexander Schug
- Institute for Advanced Simulation, Jülich Supercomputing Center, D-52428 Jülich, Germany
| | - Kersten S Rabe
- Institute for Biological Interfaces I, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, D-76344 Eggenstein-Leopoldshafen, Germany
| |
Collapse
|
34
|
Croce G, Gueudré T, Ruiz Cuevas MV, Keidel V, Figliuzzi M, Szurmant H, Weigt M. A multi-scale coevolutionary approach to predict interactions between protein domains. PLoS Comput Biol 2019; 15:e1006891. [PMID: 31634362 PMCID: PMC6822775 DOI: 10.1371/journal.pcbi.1006891] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/19/2019] [Revised: 10/31/2019] [Accepted: 09/27/2019] [Indexed: 11/18/2022] Open
Abstract
Interacting proteins and protein domains coevolve on multiple scales, from their correlated presence across species, to correlations in amino-acid usage. Genomic databases provide rapidly growing data for variability in genomic protein content and in protein sequences, calling for computational predictions of unknown interactions. We first introduce the concept of direct phyletic couplings, based on global statistical models of phylogenetic profiles. They strongly increase the accuracy of predicting pairs of related protein domains beyond simpler correlation-based approaches like phylogenetic profiling (80% vs. 30-50% positives out of the 1000 highest-scoring pairs). Combined with the direct coupling analysis of inter-protein residue-residue coevolution, we provide multi-scale evidence for direct but unknown interaction between protein families. An in-depth discussion shows these to be biologically sensible and directly experimentally testable. Negative phyletic couplings highlight alternative solutions for the same functionality, including documented cases of convergent evolution. Thereby our work proves the strong potential of global statistical modeling approaches to genome-wide coevolutionary analysis, far beyond the established use for individual protein complexes and domain-domain interactions.
Collapse
Affiliation(s)
- Giancarlo Croce
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
| | | | - Maria Virginia Ruiz Cuevas
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
| | - Victoria Keidel
- Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona CA, United States of America
| | - Matteo Figliuzzi
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
| | - Hendrik Szurmant
- Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona CA, United States of America
| | - Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie computationnelle et quantitative–LCQB, Paris, France
| |
Collapse
|
35
|
Marchi J, Galpern EA, Espada R, Ferreiro DU, Walczak AM, Mora T. Size and structure of the sequence space of repeat proteins. PLoS Comput Biol 2019; 15:e1007282. [PMID: 31415557 PMCID: PMC6733475 DOI: 10.1371/journal.pcbi.1007282] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2019] [Revised: 09/09/2019] [Accepted: 07/24/2019] [Indexed: 11/18/2022] Open
Abstract
The coding space of protein sequences is shaped by evolutionary constraints set by requirements of function and stability. We show that the coding space of a given protein family—the total number of sequences in that family—can be estimated using models of maximum entropy trained on multiple sequence alignments of naturally occuring amino acid sequences. We analyzed and calculated the size of three abundant repeat proteins families, whose members are large proteins made of many repetitions of conserved portions of ∼30 amino acids. While amino acid conservation at each position of the alignment explains most of the reduction of diversity relative to completely random sequences, we found that correlations between amino acid usage at different positions significantly impact that diversity. We quantified the impact of different types of correlations, functional and evolutionary, on sequence diversity. Analysis of the detailed structure of the coding space of the families revealed a rugged landscape, with many local energy minima of varying sizes with a hierarchical structure, reminiscent of fustrated energy landscapes of spin glass in physics. This clustered structure indicates a multiplicity of subtypes within each family, and suggests new strategies for protein design. Natural protein molecules are only a small subset of the possible strings of amino acids. This naturally calls the question of how many protein sequences theoretically exist that are functional, and how many have already been explored by nature. To help answer this question, we developed a statistical method to calculate the total potential number of protein sequences of a given family, focusing on three families of repeat proteins, which play important roles in a variety of cellular processes. The number of sequences that we compute is limited by functional interactions between the residues of the protein, as well as its evolutionary history. Applying techniques from the physics of disordered systems, we show that the space of sequences has a rugged structure, which could hinder their evolution. Individual proteins can be organised into distinct clusters corresponding to basins of attraction of the landscape, suggesting the existence of subfamilies within each family.
Collapse
Affiliation(s)
- Jacopo Marchi
- Laboratoire de physique de l’École normale supérieure (PSL University), CNRS, Sorbonne Université, and Université de Paris, 75005 Paris, France
| | - Ezequiel A. Galpern
- Protein Physiology Lab, Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Departamento de Química Biológica, Buenos Aires, Argentina
- CONICET - Universidad de Buenos Aires, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Buenos Aires, Argentina
| | - Rocio Espada
- Laboratoire Gulliver, Ecole supérieure de physique et chimie industrielles (PSL University) and CNRS, 75005, Paris, France
| | - Diego U. Ferreiro
- Protein Physiology Lab, Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Departamento de Química Biológica, Buenos Aires, Argentina
- CONICET - Universidad de Buenos Aires, Instituto de Química Biológica de la Facultad de Ciencias Exactas y Naturales (IQUIBICEN), Buenos Aires, Argentina
| | - Aleksandra M. Walczak
- Laboratoire de physique de l’École normale supérieure (PSL University), CNRS, Sorbonne Université, and Université de Paris, 75005 Paris, France
- * E-mail: (AMW); (TM)
| | - Thierry Mora
- Laboratoire de physique de l’École normale supérieure (PSL University), CNRS, Sorbonne Université, and Université de Paris, 75005 Paris, France
- * E-mail: (AMW); (TM)
| |
Collapse
|
36
|
Szurmant H. Evolutionary couplings of amino acid residues reveal structure and function of bacterial signaling proteins. Mol Microbiol 2019; 112:432-437. [PMID: 31102561 DOI: 10.1111/mmi.14282] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 05/15/2019] [Indexed: 12/12/2022]
Abstract
The genomic era along with major advances in high-throughput sequencing technology has led to a rapid expansion of the genomic and consequently the protein sequence space. Bacterial extracytoplasmic function sigma factors have emerged as an important group of signaling proteins in bacteria involved in many regulatory decisions, most notably the adaptation to cell envelope stress. Their wide prevalence and amplification among bacterial genomes has led to sub-group classification and the realization of diverse signaling mechanisms. Mathematical frameworks have been developed to utilize extensive protein sequence alignments to extract co-evolutionary signals of interaction. This has proven useful in a number of different biological fields, including de novo structure prediction, protein-protein partner identification and the elucidation of alternative protein conformations for signal proteins, to name a few. The mathematical tools, commonly referred to under the name 'Direct Coupling Analysis' have now been applied to deduce molecular mechanisms of activation for sub-groups of extracytoplasmic sigma factors adding to previous successes on bacterial two-component signaling proteins. The amplification of signal transduction protein genes in bacterial genomes made them the first to be amenable to this approach but the sequences are available now to aid the molecular microbiologist, no matter their protein pathway of interest.
Collapse
Affiliation(s)
- Hendrik Szurmant
- Basic Medical Science, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona, CA, USA
| |
Collapse
|
37
|
Luo Y, Ahmad M, Schug A, Tsotsalas M. Rising Up: Hierarchical Metal-Organic Frameworks in Experiments and Simulations. ADVANCED MATERIALS (DEERFIELD BEACH, FLA.) 2019; 31:e1901744. [PMID: 31106914 DOI: 10.1002/adma.201901744] [Citation(s) in RCA: 67] [Impact Index Per Article: 11.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 03/19/2019] [Revised: 04/09/2019] [Indexed: 06/09/2023]
Abstract
Controlled synthesis across several length scales, ranging from discrete molecular building blocks to size- and morphology-controlled nanoparticles to 2D sheets and thin films and finally to 3D architectures, is an advanced and highly active research field within both the metal-organic framework (MOF) domain and the overall material science community. Along with synthetic progress, theoretical simulations of MOF structures and properties have shown tremendous progress in both accuracy and system size. Further advancements in the field of hierarchically structured MOF materials will allow the optimization of their performance; however, this optimization requires a deep understanding of the different synthesis and processing techniques and an enhanced implementation of material modeling. Such modeling approaches will allow us to select and synthesize the highest-performing structures in a targeted rational manner. Here, recent progress in the synthesis of hierarchically structured MOFs and multiscale modeling and associated simulation techniques is presented, along with a brief overview of the challenges and future perspectives associated with a simulation-based approach toward the development of advanced hierarchically structured MOF materials.
Collapse
Affiliation(s)
- Yi Luo
- Institute of Functional Interfaces (IFG), Karlsruhe Institute of Technology (KIT), Hermann-von-Helmholtz-Platz 1, D-76344, Eggenstein-Leopoldshafen, Germany
| | - Momin Ahmad
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology (KIT), Hermann-von-Helmholtz-Platz 1, D-76344, Eggenstein-Leopoldshafen, Germany
- Institute for Theoretical Solid State Theory, Karlsruhe Institute of Technology (KIT), Wolfgang-Gaede-Str. 1, D-76131, Karlsruhe, Germany
| | - Alexander Schug
- John von Neumann Institute for Computing, Jülich Supercomputer Centre, Forschungszentrum Jülich, Wilhelm-Johnen-Straße, 52428, Jülich, Germany
| | - Manuel Tsotsalas
- Institute of Functional Interfaces (IFG), Karlsruhe Institute of Technology (KIT), Hermann-von-Helmholtz-Platz 1, D-76344, Eggenstein-Leopoldshafen, Germany
- Institute of Organic Chemistry (IOC), Karlsruhe Institute of Technology (KIT), Fritz-Haber-Weg 6, D-76131, Karlsruhe, Germany
| |
Collapse
|
38
|
Pucci F, Schug A. Shedding light on the dark matter of the biomolecular structural universe: Progress in RNA 3D structure prediction. Methods 2019; 162-163:68-73. [DOI: 10.1016/j.ymeth.2019.04.012] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/29/2018] [Revised: 04/12/2019] [Accepted: 04/22/2019] [Indexed: 11/25/2022] Open
|
39
|
The role of coevolutionary signatures in protein interaction dynamics, complex inference, molecular recognition, and mutational landscapes. Curr Opin Struct Biol 2019; 56:179-186. [PMID: 31029927 DOI: 10.1016/j.sbi.2019.03.024] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2019] [Revised: 03/18/2019] [Accepted: 03/19/2019] [Indexed: 11/22/2022]
Abstract
Evolution imposes constraints at the interface of interacting biomolecules in order to preserve function or maintain fitness. This pressure may have a direct effect on the sequence composition of interacting biomolecules. As a result, statistical patterns of amino acid or nucleotide covariance that encode for physical and functional interactions are observed in sequences of extant organisms. In recent years, global pairwise models of amino acid and nucleotide coevolution from multiple sequence alignments have been developed and utilized to study molecular interactions in structural biology. In proteins, for which the energy landscape is funneled and minimally frustrated, a direct connection between the physical and sequence space landscapes can be established. Estimating coevolutionary information from sequences of interacting molecules has a broad impact in molecular biology. Applications include the accurate determination of 3D structures of molecular complexes, inference of protein interaction partners, models of protein-protein interaction specificity, the elucidation, and design of protein-nucleic acid recognition as well as the discovery of genome-wide epistatic effects. The current state of the art of coevolutionary analysis includes biomedical applications ranging from mutational landscapes and drug-design to vaccine development.
Collapse
|
40
|
Thirumalai D, Hyeon C, Zhuravlev PI, Lorimer GH. Symmetry, Rigidity, and Allosteric Signaling: From Monomeric Proteins to Molecular Machines. Chem Rev 2019; 119:6788-6821. [DOI: 10.1021/acs.chemrev.8b00760] [Citation(s) in RCA: 51] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Affiliation(s)
- D. Thirumalai
- Department of Chemistry, The University of Texas, Austin, Texas 78712, United States
| | - Changbong Hyeon
- Korea Institute for Advanced Study, Seoul 02455, Republic of Korea
| | - Pavel I. Zhuravlev
- Biophysics Program, Institute for Physical Science and Technology and Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland 20742, United States
| | - George H. Lorimer
- Biophysics Program, Institute for Physical Science and Technology and Department of Chemistry and Biochemistry, University of Maryland, College Park, Maryland 20742, United States
| |
Collapse
|
41
|
Weiel M, Reinartz I, Schug A. Rapid interpretation of small-angle X-ray scattering data. PLoS Comput Biol 2019; 15:e1006900. [PMID: 30901335 PMCID: PMC6447237 DOI: 10.1371/journal.pcbi.1006900] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2018] [Revised: 04/03/2019] [Accepted: 02/24/2019] [Indexed: 12/20/2022] Open
Abstract
The fundamental aim of structural analyses in biophysics is to reveal a mutual relation between a molecule’s dynamic structure and its physiological function. Small-angle X-ray scattering (SAXS) is an experimental technique for structural characterization of macromolecules in solution and enables time-resolved analysis of conformational changes under physiological conditions. As such experiments measure spatially averaged low-resolution scattering intensities only, the sparse information obtained is not sufficient to uniquely reconstruct a three-dimensional atomistic model. Here, we integrate the information from SAXS into molecular dynamics simulations using computationally efficient native structure-based models. Dynamically fitting an initial structure towards a scattering intensity, such simulations produce atomistic models in agreement with the target data. In this way, SAXS data can be rapidly interpreted while retaining physico-chemical knowledge and sampling power of the underlying force field. We demonstrate our method’s performance using the example of three protein systems. Simulations are faster than full molecular dynamics approaches by more than two orders of magnitude and consistently achieve comparable accuracy. Computational demands are reduced sufficiently to run the simulations on commodity desktop computers instead of high-performance computing systems. These results underline that scattering-guided structure-based simulations provide a suitable framework for rapid early-stage refinement of structures towards SAXS data with particular focus on minimal computational resources and time. Proteins are the molecular nanomachines in biological cells and thus vital to any known form of life. From the evolutionary perspective, viable protein structure emerges on the basis of a ‘form-follows-function’ principle. A protein’s designated function is inextricably linked to dynamic conformational changes, which can be observed by small-angle X-ray scattering. Intensities from SAXS contain low-resolution information on the protein’s shape at different steps of its functional cycle. We are interested in directly getting an atomistic model of this encoded structure. One powerful approach is to include the experimental data into computational simulations of the protein’s function-related physical motions. We combine scattering intensities with coarse-grained native structure-based models. These models are computationally highly efficient yet describe the system’s dynamics realistically. Here, we present our method for rapid interpretation of scattering intensities from SAXS to derive structural models, using minimal computational resources and time.
Collapse
Affiliation(s)
- Marie Weiel
- Department of Physics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
| | - Ines Reinartz
- Department of Physics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
| | - Alexander Schug
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
- Institute for Advanced Simulation, Jülich Supercomputing Center, Jülich, Germany
- * E-mail:
| |
Collapse
|
42
|
Figliuzzi M, Barrat-Charlaix P, Weigt M. How Pairwise Coevolutionary Models Capture the Collective Residue Variability in Proteins? Mol Biol Evol 2019; 35:1018-1027. [PMID: 29351669 DOI: 10.1093/molbev/msy007] [Citation(s) in RCA: 68] [Impact Index Per Article: 11.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/15/2022] Open
Abstract
Global coevolutionary models of homologous protein families, as constructed by direct coupling analysis (DCA), have recently gained popularity in particular due to their capacity to accurately predict residue-residue contacts from sequence information alone, and thereby to facilitate tertiary and quaternary protein structure prediction. More recently, they have also been used to predict fitness effects of amino-acid substitutions in proteins, and to predict evolutionary conserved protein-protein interactions. These models are based on two currently unjustified hypotheses: 1) correlations in the amino-acid usage of different positions are resulting collectively from networks of direct couplings; and 2) pairwise couplings are sufficient to capture the amino-acid variability. Here, we propose a highly precise inference scheme based on Boltzmann-machine learning, which allows us to systematically address these hypotheses. We show how correlations are built up in a highly collective way by a large number of coupling paths, which are based on the proteins three-dimensional structure. We further find that pairwise coevolutionary models capture the collective residue variability across homologous proteins even for quantities which are not imposed by the inference procedure, like three-residue correlations, the clustered structure of protein families in sequence space or the sequence distances between homologs. These findings strongly suggest that pairwise coevolutionary models are actually sufficient to accurately capture the residue variability in homologous protein families.
Collapse
Affiliation(s)
- Matteo Figliuzzi
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Computational and Quantitative Biology - UMR7238, 75005 Paris, France
| | - Pierre Barrat-Charlaix
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Computational and Quantitative Biology - UMR7238, 75005 Paris, France
| | - Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Computational and Quantitative Biology - UMR7238, 75005 Paris, France
| |
Collapse
|
43
|
Jarmolinska AI, Zhou Q, Sulkowska JI, Morcos F. DCA-MOL: A PyMOL Plugin To Analyze Direct Evolutionary Couplings. J Chem Inf Model 2019; 59:625-629. [DOI: 10.1021/acs.jcim.8b00690] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023]
Affiliation(s)
- Aleksandra I. Jarmolinska
- Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097, Warsaw, Poland
- College of Inter-Faculty Individual Studies in Mathematics and Natural Sciences, Banacha 2c, 02-097 Warsaw, Poland
| | - Qin Zhou
- Department of Biological Sciences, University of Texas at Dallas, Richardson, Texas 75080, United States
| | - Joanna I. Sulkowska
- Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097, Warsaw, Poland
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
| | - Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, Texas 75080, United States
- Center for Systems Biology, University of Texas at Dallas, Richardson, Texas 75080, United States
| |
Collapse
|
44
|
Abstract
Thanks to the explosion of genomic sequencing, coevolutionary analysis of protein sequences has gained great and ever-increasing popularity in the last decade, and it is currently an important and well-established tool in structural bioinformatics and computational biology. This chapter concisely introduces the theoretical foundation and the practical aspects of coevolutionary analysis, as well as discusses the molecular modeling strategies to exploit its results in the study of protein structure, dynamics, and interactions. We present here a complete pipeline from sequence extraction to contact prediction through two examples, focusing on the predictions of inter-residue contacts in a single protein domain and on the analysis of a multi-domain protein that undergoes functional, large-scale conformational transitions.
Collapse
Affiliation(s)
- Duccio Malinverni
- Laboratory of Statistical Biophysics, Institute of Physics, École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.
| | - Alessandro Barducci
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, Montpellier, France.
| |
Collapse
|
45
|
Coevolutionary Signals and Structure-Based Models for the Prediction of Protein Native Conformations. Methods Mol Biol 2019; 1851:83-103. [PMID: 30298393 DOI: 10.1007/978-1-4939-8736-8_5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
The analysis of coevolutionary signals from families of evolutionarily related sequences is a recent conceptual framework that provides valuable information about unique intramolecular interactions and, therefore, can assist in the elucidation of biomolecular conformations. It is based on the idea that compensatory mutations at specific residue positions in a sequence help preserve stability of protein architecture and function and leave a statistical signature related to residue-residue interactions in the 3D structure of the protein. Consequently, statistical analysis of these correlated mutations in subsets of protein sequence alignments can be used to predict which residue pairs should be in spatial proximity in the native functional protein fold. These predicted signals can be then used to guide molecular dynamics (MD) simulations to predict the three-dimensional coordinates of a functional amino acid chain. In this chapter, we introduce a general and efficient methodology to perform coevolutionary analysis on protein sequences and to use this information in combination with computational physical models to predict the native 3D conformation of functional polypeptides. We present a step-by-step methodology that includes the description and application of software tools and databases required to infer tertiary structures of a protein fold. The general pipeline includes instructions on (1) how to obtain direct amino acid couplings from protein sequences using direct coupling analysis (DCA), (2) how to incorporate such signals as interaction potentials in Cα structure-based models (SBMs) to drive protein-folding MD simulations, (3) a procedure to estimate secondary structure and how to include such estimates in the topology files required in the MD simulations, and (4) how to build full atomic models based on the top Cα candidates selected in the pipeline. The information presented in this chapter is self-contained and sufficient to allow a computational scientist to predict structures of proteins using publicly available algorithms and databases.
Collapse
|
46
|
Duclert-Savatier N, Bouvier G, Nilges M, Malliavin TE. Conformational sampling of CpxA: Connecting HAMP motions to the histidine kinase function. PLoS One 2018; 13:e0207899. [PMID: 30496238 PMCID: PMC6264157 DOI: 10.1371/journal.pone.0207899] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/06/2017] [Accepted: 11/06/2018] [Indexed: 11/29/2022] Open
Abstract
In the histidine kinase family, the HAMP and DHp domains are considered to play an important role into the transmission of signal arising from environmental conditions to the auto-phosphorylation site and to the binding site of response regulator. Several conformational motions inside HAMP have been proposed to transmit this signal: (i) the gearbox model, (ii) α helices rotations, pistons and scissoring, (iii) transition between ordered and disordered states. In the present work, we explore by temperature-accelerated molecular dynamics (TAMD), an enhanced sampling technique, the conformational space of the cytoplasmic region of histidine kinase CpxA. Several HAMP motions, corresponding to α helices rotations, pistoning and scissoring have been detected and correlated to the segmental motions of HAMP and DHp domains of CpxA.
Collapse
Affiliation(s)
- Nathalie Duclert-Savatier
- Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR3528, Paris, France
- Centre de Bioinformatique, Biostatistique et Biologie Intégrative, Institut Pasteur and CNRS USR3756, Paris, France
| | - Guillaume Bouvier
- Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR3528, Paris, France
- Centre de Bioinformatique, Biostatistique et Biologie Intégrative, Institut Pasteur and CNRS USR3756, Paris, France
| | - Michael Nilges
- Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR3528, Paris, France
- Centre de Bioinformatique, Biostatistique et Biologie Intégrative, Institut Pasteur and CNRS USR3756, Paris, France
| | - Thérèse E. Malliavin
- Unité de Bioinformatique Structurale, Institut Pasteur and CNRS UMR3528, Paris, France
- Centre de Bioinformatique, Biostatistique et Biologie Intégrative, Institut Pasteur and CNRS USR3756, Paris, France
- * E-mail:
| |
Collapse
|
47
|
Butler BM, Kazan IC, Kumar A, Ozkan SB. Coevolving residues inform protein dynamics profiles and disease susceptibility of nSNVs. PLoS Comput Biol 2018; 14:e1006626. [PMID: 30496278 PMCID: PMC6289467 DOI: 10.1371/journal.pcbi.1006626] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2018] [Revised: 12/11/2018] [Accepted: 11/09/2018] [Indexed: 11/18/2022] Open
Abstract
The conformational dynamics of proteins is rarely used in methodologies used to predict the impact of genetic mutations due to the paucity of three-dimensional protein structures as compared to the vast number of available sequences. Until now a three-dimensional (3D) structure has been required to predict the conformational dynamics of a protein. We introduce an approach that estimates the conformational dynamics of a protein, without relying on structural information. This de novo approach utilizes coevolving residues identified from a multiple sequence alignment (MSA) using Potts models. These coevolving residues are used as contacts in a Gaussian network model (GNM) to obtain protein dynamics. B-factors calculated using sequence-based GNM (Seq-GNM) are in agreement with crystallographic B-factors as well as theoretical B-factors from the original GNM that utilizes the 3D structure. Moreover, we demonstrate the ability of the calculated B-factors from the Seq-GNM approach to discriminate genomic variants according to their phenotypes for a wide range of proteins. These results suggest that protein dynamics can be approximated based on sequence information alone, making it possible to assess the phenotypes of nSNVs in cases where a 3D structure is unknown. We hope this work will promote the use of dynamics information in genetic disease prediction at scale by circumventing the need for 3D structures. Proteins are dynamic machines that undergo atomic fluctuations, side chain rotations, and collective domain movements that are required for biological function. There is, therefore, a need for quantitative metrics that capture the dynamic fluctuations per position to understand the critical role of protein dynamics in shaping biological functions. A limiting factor in incorporating structural dynamics information in the classification of non-synonymous single nucleotide variants (nSNVs) is the limited number of known 3D structures compared to the vast number of available sequences. We have developed a new sequence-based GNM method, termed Seq-GNM, which uses co-evolving amino acid positions based on the multiple sequence alignment of a given query sequence to estimate the thermal motions of C-alpha atoms. In this paper, we have demonstrated that the predicted thermal motions using Seq-GNM are in reasonable agreement with experimental B-factors as well as B-factors computed using 3D crystal structures. We also provide evidence that B-factors predicted by Seq-GNM are capable of distinguishing between disease-associated and neutral nSNVs.
Collapse
Affiliation(s)
- Brandon M. Butler
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
| | - I. Can Kazan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
| | - Avishek Kumar
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
- Harris School of Public Policy and Center for Data Science and Public Policy, University of Chicago, Chicago, IL, United States of America
| | - S. Banu Ozkan
- Department of Physics and Center for Biological Physics, Arizona State University, Tempe, AZ, United States of America
- * E-mail:
| |
Collapse
|
48
|
Bretl DJ, Ladd KM, Atkinson SN, Müller S, Kirby JR. Suppressor mutations reveal an NtrC-like response regulator, NmpR, for modulation of Type-IV Pili-dependent motility in Myxococcus xanthus. PLoS Genet 2018; 14:e1007714. [PMID: 30346960 PMCID: PMC6211767 DOI: 10.1371/journal.pgen.1007714] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2018] [Revised: 11/01/2018] [Accepted: 09/26/2018] [Indexed: 12/03/2022] Open
Abstract
Two-component signaling systems (TCS) regulate bacterial responses to environmental signals through the process of protein phosphorylation. Specifically, sensor histidine kinases (SK) recognize signals and propagate the response via phosphorylation of a cognate response regulator (RR) that functions to initiate transcription of specific genes. Signaling within a single TCS is remarkably specific and cross-talk between TCS is limited. However, regulation of the flow of information through complex signaling networks that include closely related TCS remains largely unknown. Additionally, many bacteria utilize multi-component signaling networks which provide additional genetic and biochemical interactions that must be regulated for signaling fidelity, input and output specificity, and phosphorylation kinetics. Here we describe the characterization of an NtrC-like RR that participates in regulation of Type-IV pilus-dependent motility of Myxococcus xanthus and is thus named NmpR, NtrC Modulator of Pili Regulator. A complex multi-component signaling system including NmpR was revealed by suppressor mutations that restored motility to cells lacking PilR, an evolutionarily conserved RR required for expression of pilA encoding the major Type-IV pilus monomer found in many bacterial species. The system contains at least four signaling proteins: a SK with a protoglobin sensor domain (NmpU), a hybrid SK (NmpS), a phospho-sink protein (NmpT), and an NtrC-like RR (NmpR). We demonstrate that ΔpilR bypass suppressor mutations affect regulation of the NmpRSTU multi-component system, such that NmpR activation is capable of restoring expression of pilA in the absence of PilR. Our findings indicate that pilus gene expression in M. xanthus is regulated by an extended network of TCS which interact to refine control of pilus function.
Collapse
Affiliation(s)
- Daniel J. Bretl
- Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States of America
| | - Kayla M. Ladd
- Department of Biochemistry, University of Iowa, Iowa City, Iowa, United States of America
| | - Samantha N. Atkinson
- Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States of America
- Department of Bioinformatics, University of Iowa, Iowa City, Iowa, United States of America
| | - Susanne Müller
- Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States of America
| | - John R. Kirby
- Department of Microbiology and Immunology, Medical College of Wisconsin, Milwaukee, WI, United States of America
| |
Collapse
|
49
|
Formation of a Secretion-Competent Protein Complex by a Dynamic Wrap-around Binding Mechanism. J Mol Biol 2018; 430:3157-3169. [DOI: 10.1016/j.jmb.2018.07.014] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/17/2018] [Revised: 06/20/2018] [Accepted: 07/10/2018] [Indexed: 11/18/2022]
|
50
|
Cheng RR, Haglund E, Tiee NS, Morcos F, Levine H, Adams JA, Jennings PA, Onuchic JN. Designing bacterial signaling interactions with coevolutionary landscapes. PLoS One 2018; 13:e0201734. [PMID: 30125296 PMCID: PMC6101370 DOI: 10.1371/journal.pone.0201734] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/16/2018] [Accepted: 07/21/2018] [Indexed: 11/19/2022] Open
Abstract
Selecting amino acids to design novel protein-protein interactions that facilitate catalysis is a daunting challenge. We propose that a computational coevolutionary landscape based on sequence analysis alone offers a major advantage over expensive, time-consuming brute-force approaches currently employed. Our coevolutionary landscape allows prediction of single amino acid substitutions that produce functional interactions between non-cognate, interspecies signaling partners. In addition, it can also predict mutations that maintain segregation of signaling pathways across species. Specifically, predictions of phosphotransfer activity between the Escherichia coli histidine kinase EnvZ to the non-cognate receiver Spo0F from Bacillus subtilis were compiled. Twelve mutations designed to enhance, suppress, or have a neutral effect on kinase phosphotransfer activity to a non-cognate partner were selected. We experimentally tested the ability of the kinase to relay phosphate to the respective designed Spo0F receiver proteins against the theoretical predictions. Our key finding is that the coevolutionary landscape theory, with limited structural data, can significantly reduce the search-space for successful prediction of single amino acid substitutions that modulate phosphotransfer between the two-component His-Asp relay partners in a predicted fashion. This combined approach offers significant improvements over large-scale mutations studies currently used for protein engineering and design.
Collapse
Affiliation(s)
- Ryan R. Cheng
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, United States of America
- * E-mail: (RRC); (JNO)
| | - Ellinor Haglund
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, United States of America
| | - Nicholas S. Tiee
- Department of Chemistry & Biochemistry, The University of California, San Diego, California, United States of America
| | - Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Dallas, Texas, United States of America
- Department of Bioengineering, University of Texas at Dallas, Dallas, Texas, United States of America
| | - Herbert Levine
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, United States of America
- Department of Bioengineering, Rice University, Houston, Texas, United States of America
- Department of Biosciences, Rice University, Houston, Texas, United States of America
- Department of Physics & Astronomy, Rice University, Houston, Texas, United States of America
| | - Joseph A. Adams
- Department of Pharmacology, The University of California, San Diego, California, United States of America
| | - Patricia A. Jennings
- Department of Chemistry & Biochemistry, The University of California, San Diego, California, United States of America
| | - José N. Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, United States of America
- Department of Biosciences, Rice University, Houston, Texas, United States of America
- Department of Physics & Astronomy, Rice University, Houston, Texas, United States of America
- Department of Chemistry, Rice University, Houston, Texas, United States of America
- * E-mail: (RRC); (JNO)
| |
Collapse
|