51
Global pairwise RNA interaction landscapes reveal core features of protein recognition. Nat Commun 2018; 9:2511. [PMID: 29955037 PMCID: PMC6023938 DOI: 10.1038/s41467-018-04729-0] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 12/04/2017] [Accepted: 05/16/2018] [Indexed: 01/14/2023] [Open Access]
Abstract
RNA–protein interactions permeate biology. Transcription, translation, and splicing all hinge on the recognition of structured RNA elements by RNA-binding proteins. Models of RNA–protein interactions are generally limited to short linear motifs and structures because of the vast sequence sampling required to access longer elements. Here, we develop an integrated approach that calculates global pairwise interaction scores from in vitro selection and high-throughput sequencing. We examine four RNA-binding proteins of phage, viral, and human origin. Our approach reveals regulatory motifs, discriminates between regulated and non-regulated RNAs within their native genomic context, and correctly predicts the consequence of mutational events on binding activity. We design binding elements that improve binding activity in cells and infer mutational pathways that reveal permissive versus disruptive evolutionary trajectories between regulated motifs. These coupling landscapes are broadly applicable for the discovery and characterization of protein–RNA recognition at single-nucleotide resolution. RNA–protein interactions often depend on the recognition of extended RNA elements, but the identification of these motifs is challenging. Here, the authors present a global integrated approach to analyze RNA–protein binding landscapes, mapping extended RNA interaction motifs for four RNA-binding proteins.
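The abstract does not spell out how the pairwise scores are computed. As a rough illustration only, the sketch below scores a pair of positions by the log2 enrichment of each nucleotide pair in a selected (bound) library relative to the input library. The function names, the pseudocount of 1, and the toy reads are hypothetical and are not taken from the cited paper.

```python
import math
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGU"


def pair_frequencies(reads, i, j, pseudocount=1.0):
    """Frequency of each nucleotide pair (a, b) at positions i and j, smoothed by a pseudocount."""
    counts = Counter((read[i], read[j]) for read in reads)
    total = len(reads) + pseudocount * len(NUCLEOTIDES) ** 2
    return {
        (a, b): (counts[(a, b)] + pseudocount) / total
        for a, b in product(NUCLEOTIDES, repeat=2)
    }


def pairwise_enrichment(selected, input_reads, i, j, pseudocount=1.0):
    """Hypothetical score: log2 of pair frequency after selection over pair frequency before."""
    f_sel = pair_frequencies(selected, i, j, pseudocount)
    f_in = pair_frequencies(input_reads, i, j, pseudocount)
    return {pair: math.log2(f_sel[pair] / f_in[pair]) for pair in f_sel}


# Toy reads: the selected pool is dominated by a G...C pair at positions 0 and 3.
input_reads = ["ACGU", "GCAU", "UCGA", "CAGG", "GAUC", "AUCG", "CGUA", "UAGC"]
selected = ["GCAC", "GAUC", "GCGC", "GUAC"]
scores = pairwise_enrichment(selected, input_reads, 0, 3)
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))  # ('G', 'C') 1.585
```

A real analysis would also need to separate direct pairwise effects from the single-position enrichments that this naive ratio mixes in.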
52
Szurmant H, Weigt M. Inter-residue, inter-protein and inter-family coevolution: bridging the scales. Curr Opin Struct Biol 2018; 50:26-32. [PMID: 29101847 PMCID: PMC5940578 DOI: 10.1016/j.sbi.2017.10.014] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Received: 09/05/2017] [Revised: 10/12/2017] [Accepted: 10/13/2017] [Indexed: 10/18/2022]
Abstract
Interacting proteins coevolve at multiple but interconnected scales, from the residue-residue level through the protein-protein level up to the family-family level. The recent accumulation of enormous amounts of sequence data allows for the development of novel, data-driven computational approaches. Notably, these approaches can bridge scales within a single statistical framework. Although they are currently applied mostly to isolated problems at single scales, their immense potential for an evolutionarily informed, structural systems biology is steadily emerging.
Affiliation(s)
- Hendrik Szurmant
- Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona, CA 91766, USA.
- Martin Weigt
- Sorbonne Universités, UPMC Université Paris 06, CNRS, Biologie Computationnelle et Quantitative - Institut de Biologie Paris Seine, 75005 Paris, France.
53
Tian P, Louis JM, Baber JL, Aniana A, Best RB. Co-Evolutionary Fitness Landscapes for Sequence Design. Angew Chem Int Ed Engl 2018; 57:5674-5678. [PMID: 29512300 PMCID: PMC6147258 DOI: 10.1002/anie.201713220] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 12/22/2017] [Indexed: 11/10/2022]
Abstract
Efficient and accurate models to predict the fitness of a sequence would be extremely valuable in protein design. We have explored the use of statistical potentials for the coevolutionary fitness landscape, extracted from known protein sequences, in conjunction with Monte Carlo simulations, as a tool for design. As proof of principle, we created a series of predicted high-fitness sequences for three different protein folds, representative of different structural classes: the GA (all-α) and GB (α/β) binding domains of streptococcal protein G, and an SH3 (all-β) domain. We found that most of the designed proteins can fold stably to the target structure, and a structure was determined for a representative design of each of GA, GB and SH3. Several of our designed proteins were also able to bind to native ligands, in some cases with higher affinity than wild-type. Thus, a search using a statistical fitness landscape is a remarkably effective tool for finding novel stable protein sequences.
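A minimal sketch of Monte Carlo search on a coevolutionary landscape, assuming a Potts-type statistical energy E(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j) in which lower energy stands for higher predicted fitness. The fields h and couplings J below are random toy numbers rather than parameters inferred from a protein family, and the Metropolis acceptance rule and function names are illustrative choices, not the authors' protocol.

```python
import numpy as np


def potts_energy(seq, h, J):
    """E(s) = -sum_i h[i, s_i] - sum_{i<j} J[i, j, s_i, s_j]; lower means fitter (assumed convention)."""
    L = len(seq)
    energy = -sum(h[i, seq[i]] for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            energy -= J[i, j, seq[i], seq[j]]
    return energy


def metropolis_design(h, J, n_steps=2000, beta=1.0, seed=0):
    """Metropolis search over sequences; returns the lowest-energy sequence visited."""
    rng = np.random.default_rng(seed)
    L, q = h.shape
    seq = rng.integers(0, q, size=L)
    energy = potts_energy(seq, h, J)
    best_seq, best_energy = seq.copy(), energy
    for _ in range(n_steps):
        trial = seq.copy()
        trial[rng.integers(L)] = rng.integers(q)  # single-site mutation
        trial_energy = potts_energy(trial, h, J)
        delta = trial_energy - energy
        # Accept downhill moves always, uphill moves with probability exp(-beta * delta).
        if delta <= 0 or rng.random() < np.exp(-beta * delta):
            seq, energy = trial, trial_energy
            if energy < best_energy:
                best_seq, best_energy = seq.copy(), energy
    return best_seq, best_energy


# Toy landscape: L = 10 sites, q = 4 states (protein models typically use q = 21: 20 amino acids plus gap).
rng = np.random.default_rng(1)
L, q = 10, 4
h = rng.normal(0.0, 1.0, size=(L, q))
J = rng.normal(0.0, 0.3, size=(L, L, q, q))
best_seq, best_energy = metropolis_design(h, J, n_steps=2000, beta=2.0)
print(best_seq, round(float(best_energy), 3))
```

Recomputing the full energy at every step keeps the sketch short; a practical sampler would update only the terms involving the mutated site.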
Affiliation(s)
- Pengfei Tian
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892-0520 (USA)
- John M. Louis
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892-0520 (USA)
- James L. Baber
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892-0520 (USA)
- Annie Aniana
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892-0520 (USA)
- Robert B. Best
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, MD 20892-0520 (USA)
54
Tian P, Louis JM, Baber JL, Aniana A, Best RB. Co-Evolutionary Fitness Landscapes for Sequence Design. Angew Chem Int Ed Engl 2018. [DOI: 10.1002/ange.201713220] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Affiliation(s)
- Pengfei Tian
- Laboratory of Chemical Physics; National Institute of Diabetes and Digestive and Kidney Diseases; National Institutes of Health; Bethesda MD 20892-0520 USA
- John M. Louis
- Laboratory of Chemical Physics; National Institute of Diabetes and Digestive and Kidney Diseases; National Institutes of Health; Bethesda MD 20892-0520 USA
- James L. Baber
- Laboratory of Chemical Physics; National Institute of Diabetes and Digestive and Kidney Diseases; National Institutes of Health; Bethesda MD 20892-0520 USA
- Annie Aniana
- Laboratory of Chemical Physics; National Institute of Diabetes and Digestive and Kidney Diseases; National Institutes of Health; Bethesda MD 20892-0520 USA
- Robert B. Best
- Laboratory of Chemical Physics; National Institute of Diabetes and Digestive and Kidney Diseases; National Institutes of Health; Bethesda MD 20892-0520 USA
55
Abstract
This is a tale of how technology drove the discovery of the molecular basis for signal transduction in the initiation of sporulation in Bacillus subtilis and in bacterial two-component systems. It progresses from genetics to cloning and sequencing to biochemistry to structural biology to an understanding of how proteins evolve interaction specificity and to identification of interaction surfaces by statistical physics. This is about how the people in my laboratory accomplished this feat; without them little would have been done.
Affiliation(s)
- James A Hoch
- Department of Molecular Medicine, The Scripps Research Institute, La Jolla, California 92037;
56
Reinartz I, Sinner C, Nettels D, Stucki-Buchli B, Stockmar F, Panek PT, Jacob CR, Nienhaus GU, Schuler B, Schug A. Simulation of FRET dyes allows quantitative comparison against experimental data. J Chem Phys 2018; 148:123321. [DOI: 10.1063/1.5010434] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.9] [Indexed: 12/16/2022] [Open Access]
Affiliation(s)
- Ines Reinartz
- Department of Physics, Karlsruhe Institute of Technology, Wolfgang-Gaede-Str. 1, 76131 Karlsruhe, Germany
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Claude Sinner
- Department of Physics, Karlsruhe Institute of Technology, Wolfgang-Gaede-Str. 1, 76131 Karlsruhe, Germany
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Daniel Nettels
- Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
- Brigitte Stucki-Buchli
- Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
- Florian Stockmar
- Institute of Applied Physics, Karlsruhe Institute of Technology, Wolfgang-Gaede-Str. 1, 76131 Karlsruhe, Germany
- Pawel T. Panek
- Institute of Physical and Theoretical Chemistry, TU Braunschweig, Gaußstraße 17, 38106 Braunschweig, Germany
- Christoph R. Jacob
- Institute of Physical and Theoretical Chemistry, TU Braunschweig, Gaußstraße 17, 38106 Braunschweig, Germany
- Gerd Ulrich Nienhaus
- Institute of Applied Physics, Karlsruhe Institute of Technology, Wolfgang-Gaede-Str. 1, 76131 Karlsruhe, Germany
- HEiKA–Heidelberg Karlsruhe Research Partnership, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Institute of Nanotechnology and Institute of Toxicology and Genetics, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Department of Physics, University of Illinois at Urbana-Champaign, Urbana, Illinois 61801, USA
- Benjamin Schuler
- Department of Biochemistry, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
- Department of Physics, University of Zurich, Winterthurerstrasse 190, 8057 Zurich, Switzerland
- Alexander Schug
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- John von Neumann Institute for Computing, Jülich Supercomputing Centre, Forschungszentrum Jülich, 52425 Jülich, Germany
57
Cocco S, Feinauer C, Figliuzzi M, Monasson R, Weigt M. Inverse statistical physics of protein sequences: a key issues review. Rep Prog Phys 2018; 81:032601. [PMID: 29120346 DOI: 10.1088/1361-6633/aa9965] [Citation(s) in RCA: 126] [Impact Index Per Article: 18.0] [Indexed: 06/07/2023]
Abstract
In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at an unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related, protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview of some biologically important questions and of how statistical-mechanics-inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed in the coming years.
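For concreteness, the Boltzmann distribution most often inferred from a multiple sequence alignment in this line of work is the pairwise Potts model below, with fields h_i and couplings J_ij over aligned sequences (a_1, ..., a_L) and partition function Z. The notation follows common usage in the direct-coupling-analysis literature and is not copied from the review itself. Maximum-likelihood inference fits the parameters so that the model reproduces the alignment's empirical single-site frequencies f_i(a) and pair frequencies f_ij(a, b):

```latex
P(a_1,\dots,a_L) = \frac{1}{Z}\exp\Big(\sum_{i<j} J_{ij}(a_i,a_j) + \sum_{i} h_i(a_i)\Big)
                 = \frac{e^{-E(a_1,\dots,a_L)}}{Z},
\qquad
E(a_1,\dots,a_L) = -\sum_{i<j} J_{ij}(a_i,a_j) - \sum_{i} h_i(a_i)

\sum_{a_1,\dots,a_L} P(a_1,\dots,a_L)\,\delta_{a_i,a} = f_i(a),
\qquad
\sum_{a_1,\dots,a_L} P(a_1,\dots,a_L)\,\delta_{a_i,a}\,\delta_{a_j,b} = f_{ij}(a,b)
```

Because Z sums over q^L sequences, these matching conditions cannot be solved exactly for realistic L, which is why approximate inference schemes are needed in practice.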
Affiliation(s)
- Simona Cocco
- Laboratoire de Physique Statistique de l'Ecole Normale Supérieure-UMR 8549, CNRS and PSL Research, Sorbonne Universités UPMC, Paris, France
58
dos Santos RN, Ferrari AJR, de Jesus HCR, Gozzo FC, Morcos F, Martínez L. Enhancing protein fold determination by exploring the complementary information of chemical cross-linking and coevolutionary signals. Bioinformatics 2018; 34:2201-2208. [DOI: 10.1093/bioinformatics/bty074] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 09/19/2017] [Accepted: 02/10/2018] [Indexed: 11/13/2022] [Open Access]
Affiliation(s)
- Ricardo N dos Santos
- Institute of Chemistry, University of Campinas, Campinas, Brazil
- Center for Computational Engineering and Sciences, University of Campinas, Campinas, Brazil
- Fábio C Gozzo
- Institute of Chemistry, University of Campinas, Campinas, Brazil
- Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, USA
- Leandro Martínez
- Institute of Chemistry, University of Campinas, Campinas, Brazil
- Center for Computational Engineering and Sciences, University of Campinas, Campinas, Brazil
59
Barrat-Charlaix P, Weigt M. [From sequence variability to structural and functional prediction: modeling of homologous protein families]. Biol Aujourdhui 2018; 211:239-244. [PMID: 29412135 DOI: 10.1051/jbio/2017030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2017] [Indexed: 06/08/2023]
Abstract
Thanks to next-generation sequencing, the number of sequenced genomes grows rapidly, providing in particular ample examples for the sequence variability between homologous proteins. This article discusses data-driven probabilistic sequence models, which are able to extract a multitude of information from sequence data alone, including (i) structural features like residue-residue contacts, which are formed in the folded protein, (ii) protein-protein interaction interfaces and (iii) phenotypic effects of amino-acid substitutions in proteins.
Affiliation(s)
- Pierre Barrat-Charlaix
- Sorbonne Universités, UPMC Université Paris 06, CNRS, Biologie Computationnelle et Quantitative, Institut de Biologie Paris Seine, 75005 Paris, France
- Martin Weigt
- Sorbonne Universités, UPMC Université Paris 06, CNRS, Biologie Computationnelle et Quantitative, Institut de Biologie Paris Seine, 75005 Paris, France
60
Mandalaparthy V, Sanaboyana VR, Rafalia H, Gosavi S. Exploring the effects of sparse restraints on protein structure prediction. Proteins 2017; 86:248-262. [DOI: 10.1002/prot.25438] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/24/2017] [Revised: 11/20/2017] [Accepted: 11/29/2017] [Indexed: 01/06/2023]
Affiliation(s)
- Varun Mandalaparthy
- Simons Centre for the Study of Living Machines, National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road; Bangalore 560065 India
- Venkata Ramana Sanaboyana
- Simons Centre for the Study of Living Machines, National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road; Bangalore 560065 India
- Hitesh Rafalia
- Simons Centre for the Study of Living Machines, National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road; Bangalore 560065 India
- Manipal University, Madhav Nagar; Manipal 576104 India
- Shachi Gosavi
- Simons Centre for the Study of Living Machines, National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bellary Road; Bangalore 560065 India
61
Carapia-Minero N, Castelán-Vega JA, Pérez NO, Rodríguez-Tovar AV. The phosphorelay signal transduction system in Candida glabrata: an in silico analysis. J Mol Model 2017; 24:13. [PMID: 29248994 DOI: 10.1007/s00894-017-3545-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 08/02/2017] [Accepted: 11/24/2017] [Indexed: 01/18/2023]
Abstract
Signaling systems allow microorganisms to sense and respond to different stimuli through the modification of gene expression. The phosphorelay signal transduction system in eukaryotes involves three proteins (a sensor protein, an intermediate protein and a response regulator) and requires the transfer of a phosphate group between histidine and aspartate residues. The SLN1-YPD1-SSK1 system enables yeast to adapt to hyperosmotic stress through the activation of the HOG1-MAPK pathway. The genetic sequences available from Saccharomyces cerevisiae were used to identify orthologous sequences in Candida glabrata, and putative genes were identified and characterized by in silico assays. An interactome analysis was carried out with the complete genome of C. glabrata and the putative proteins of the phosphorelay signal transduction system. Next, we modeled the complex formed between the sensor protein CgSln1p and the intermediate CgYpd1p. Finally, phosphate transfer was examined by molecular dynamics simulation. Our in silico analysis showed that the putative proteins of the C. glabrata phosphorelay signal transduction system present the functional domains of a histidine kinase, a downstream response regulator protein, and an intermediate histidine phosphotransfer protein. All the sequences are phylogenetically more closely related to S. cerevisiae than to C. albicans. The interactome suggests that the C. glabrata phosphorelay signal transduction system interacts with different proteins that regulate cell wall biosynthesis and responds to oxidative and osmotic stress in the same way as similar systems in S. cerevisiae and C. albicans. Molecular dynamics simulations showed complex formation between the response regulator domain of histidine kinase CgSln1 and the intermediate protein CgYpd1 in the presence of a phosphate group, as well as interactions between the aspartate residue and the histidine residue. Overall, our research showed that C. glabrata harbors a functional SLN1-YPD1-SSK1 phosphorelay system.
Affiliation(s)
- Natalee Carapia-Minero
- Laboratorio de Micología Médica, Depto. de Microbiología, Escuela Nacional de Ciencias Biológicas (ENCB), Instituto Politécnico Nacional, Prolongación de Carpio y Plan de Ayala s/n, Col. Casco de Santo Tomás, Del. Miguel Hidalgo, CP 11340, Ciudad de México, Mexico
- Juan Arturo Castelán-Vega
- Laboratorio de Producción y Control de Biológicos ENCB, Instituto Politécnico Nacional, Carpio y Plan de Ayala s/n, Col. Casco de Santo Tomás, Del. Miguel Hidalgo, CP 11340, Ciudad de México, Mexico
- Néstor Octavio Pérez
- Unidad de Investigación y Desarrollo, Probiomed, SA de CV, Cruce de Carreteras Acatzingo-Zumpahuacan S/N, CP 52400, Tenancingo, Edo de México, Mexico.
- Aída Verónica Rodríguez-Tovar
- Laboratorio de Micología Médica, Depto. de Microbiología, Escuela Nacional de Ciencias Biológicas (ENCB), Instituto Politécnico Nacional, Prolongación de Carpio y Plan de Ayala s/n, Col. Casco de Santo Tomás, Del. Miguel Hidalgo, CP 11340, Ciudad de México, Mexico.
62
Tsirigos KD, Govindarajan S, Bassot C, Västermark Å, Lamb J, Shu N, Elofsson A. Topology of membrane proteins-predictions, limitations and variations. Curr Opin Struct Biol 2017; 50:9-17. [PMID: 29100082 DOI: 10.1016/j.sbi.2017.10.003] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 08/30/2017] [Revised: 09/29/2017] [Accepted: 10/03/2017] [Indexed: 10/18/2022]
Abstract
Transmembrane proteins perform a variety of important biological functions necessary for the survival and growth of cells. Membrane proteins are built up from transmembrane segments that span the lipid bilayer. The segments can be either hydrophobic alpha-helices or beta-strands that together form a barrel. A fundamental aspect of the structure of transmembrane proteins is the membrane topology, that is, the number of transmembrane segments, their positions in the protein sequence and their orientation in the membrane. Accordingly, many algorithms exist for predicting the topology of alpha-helical and beta-barrel transmembrane proteins. The newest algorithms reach an accuracy close to 80% for both alpha-helical and beta-barrel transmembrane proteins. However, it has recently been shown that describing a protein family by its topology alone gives an overly simplified picture. To demonstrate this, we highlight examples where the topology is either not conserved within a protein superfamily or where the structure cannot be described solely by the topology of a protein. Predicting these non-standard features from sequence alone did not succeed until the recent revolutionary progress in 3D-structure prediction of proteins.
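Modern topology predictors are far more elaborate, but the classic starting point for locating alpha-helical transmembrane segments is a sliding-window hydropathy scan. The sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a 1.6 threshold, the settings commonly associated with Kyte and Doolittle's original analysis. It only flags candidate segments and says nothing about their orientation in the membrane, so it is not one of the algorithms discussed in this review; the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; list index = window start (0-based)."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[k:k + window]) / window for k in range(len(values) - window + 1)]


def candidate_tm_segments(seq, window=19, threshold=1.6):
    """Merge overlapping above-threshold windows into (start, end) ranges, 0-based, end exclusive."""
    segments = []
    for start, score in enumerate(hydropathy_profile(seq, window)):
        if score > threshold:
            end = start + window
            if segments and start <= segments[-1][1]:
                segments[-1] = (segments[-1][0], end)  # extend the current segment
            else:
                segments.append((start, end))
    return segments


toy = "DEKRSDEKRS" + "L" * 21 + "DEKRSDEKRS"  # hydrophobic core at residues 10 to 30
print(candidate_tm_segments(toy))  # [(4, 36)]
```

The reported range is wider than the 21-residue core because every above-threshold window is merged in full; practical tools refine segment boundaries and then predict which loops face the cytoplasm.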
Affiliation(s)
- Sudha Govindarajan
- Science for Life Laboratory, Stockholm University, SE-171 21 Solna, Sweden; Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden
- Claudio Bassot
- Science for Life Laboratory, Stockholm University, SE-171 21 Solna, Sweden; Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden
- Åke Västermark
- Science for Life Laboratory, Stockholm University, SE-171 21 Solna, Sweden; Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden; NITECH, Showa-Ku, Nagoya 466-8555 Japan
- John Lamb
- Science for Life Laboratory, Stockholm University, SE-171 21 Solna, Sweden; Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden
- Nanjiang Shu
- Science for Life Laboratory, Stockholm University, SE-171 21 Solna, Sweden; Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden; National Bioinformatics Infrastructure, Sweden; Nordic e-Infrastructure Collaboration, Sweden
- Arne Elofsson
- Science for Life Laboratory, Stockholm University, SE-171 21 Solna, Sweden; Department of Biochemistry and Biophysics, Stockholm University, SE-106 91 Stockholm, Sweden; Swedish e-Science Research Center (SeRC), Sweden.
63
Biomolecular coevolution and its applications: Going from structure prediction toward signaling, epistasis, and function. Biochem Soc Trans 2017; 45:1253-1261. [DOI: 10.1042/bst20170063] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/23/2017] [Revised: 08/30/2017] [Accepted: 09/04/2017] [Indexed: 01/01/2023]
Abstract
Evolution leads to considerable changes in the sequence of biomolecules, while their overall structure and function remain quite conserved. The wealth of genomic sequences (the ‘Biological Big Data’) provided by modern sequencing techniques allows us to investigate biomolecular evolution in unprecedented detail. Sophisticated statistical models can infer which residue pairs mutate in a correlated manner because they are in spatial proximity. Introducing these predicted spatial adjacencies as constraints in biomolecular structure prediction workflows has transformed protein and RNA structure prediction, bringing accuracies close to the experimental resolution limit. Going beyond structure prediction, the same mathematical framework can mimic evolutionary fitness landscapes to infer signaling interactions, epistasis, or mutational landscapes.
64
Tian P, Best RB. How Many Protein Sequences Fold to a Given Structure? A Coevolutionary Analysis. Biophys J 2017; 113:1719-1730. [PMID: 29045866 PMCID: PMC5647607 DOI: 10.1016/j.bpj.2017.08.039] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 04/24/2017] [Revised: 08/03/2017] [Accepted: 08/08/2017] [Indexed: 12/23/2022] [Open Access]
Abstract
Quantifying the relationship between protein sequence and structure is key to understanding the protein universe. A fundamental measure of this relationship is the total number of amino acid sequences that can fold to a target protein structure, known as the "sequence capacity," which has been suggested as a proxy for how designable a given protein fold is. Although sequence capacity has been extensively studied using lattice models and theory, numerical estimates for real protein structures are currently lacking. In this work, we have quantitatively estimated the sequence capacity of 10 proteins with a variety of different structures using a statistical model based on residue-residue co-evolution to capture the variation of sequences from the same protein family. Remarkably, we find that even for the smallest protein folds, such as the WW domain, the number of foldable sequences is extremely large, exceeding Avogadro's number. In agreement with earlier theoretical work, the calculated sequence capacity is positively correlated with the size of the protein or, more precisely, with the density of contacts. This allows the absolute sequence capacity of a given protein to be approximately predicted from its structure. On the other hand, the relative sequence capacity, i.e., normalized by the total number of possible sequences, is an extremely tiny number and is strongly anti-correlated with the protein length. Thus, although there may be more foldable sequences for larger proteins, it will be much harder to find them. Lastly, we have correlated the evolutionary age of proteins in the CATH database with their sequence capacity as predicted by our model. The results suggest a trade-off between the opposing requirements of high designability and the likelihood of a novel fold emerging by chance.
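One way to make sequence capacity concrete: if a model assigns probability P(s) to each sequence, its entropy S = -Σ_s P(s) ln P(s) defines an effective number of sequences e^S. The sketch below computes this in the simplest setting, an independent-site (profile) model, where S reduces to the sum of per-column entropies of an alignment, and compares it with the q^L possible sequences (q = 20 amino acids, gaps ignored). This is a simplifying assumption: it omits the residue-residue couplings of the coevolutionary model used in the study, and the toy alignment and function names are invented for illustration.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log) of the residue frequencies in one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in Counter(column).values())


def log_capacity(alignment):
    """ln(effective number of sequences), independent-site approximation: sum of column entropies."""
    length = len(alignment[0])
    return sum(column_entropy([seq[i] for seq in alignment]) for i in range(length))


def log_relative_capacity(alignment, q=20):
    """ln(capacity / q**L), i.e. the log of the fraction of all q**L sequences."""
    return log_capacity(alignment) - len(alignment[0]) * math.log(q)


# Toy alignment: two variable columns with two residues each, one fully conserved column.
aln = ["ACD", "AED", "GCD", "GED"]
print(round(math.exp(log_capacity(aln)), 6))           # 4.0
print(round(math.exp(log_relative_capacity(aln)), 6))  # 0.0005
```

A coupled model fitted to the same single-site frequencies can only have lower entropy than this independent-site model, so the number above is an upper bound for such a model rather than a reproduction of the published estimates. The log scale matters in practice: for real domains the relative capacity underflows ordinary floating point.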
Affiliation(s)
- Pengfei Tian
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland
- Robert B Best
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland.
65
Origins of coevolution between residues distant in protein 3D structures. Proc Natl Acad Sci U S A 2017; 114:9122-9127. [PMID: 28784799 DOI: 10.1073/pnas.1702664114] [Citation(s) in RCA: 127] [Impact Index Per Article: 15.9] [Indexed: 11/18/2022] [Open Access]
Abstract
Residue pairs that directly coevolve in protein families are generally close in protein 3D structures. Here we study the exceptions to this general trend, namely directly coevolving residue pairs that are distant in protein structures, to determine the origins of evolutionary pressure on spatially distant residues and to understand the sources of error in contact-based structure prediction. Over a set of 4,000 protein families, we find that 25% of directly coevolving residue pairs are separated by more than 5 Å in protein structures and 3% by more than 15 Å. The majority (91%) of directly coevolving residue pairs in the 5-15 Å range are found to be in contact in at least one homologous structure; these exceptions arise from structural variation in the family in the region containing the residues. Thirty-five percent of the exceptions greater than 15 Å are at homo-oligomeric interfaces, 19% arise from family structural variation, and 27% are in repeat proteins, likely reflecting alignment errors. Of the remaining long-range exceptions (<1% of the total number of coupled pairs), many can be attributed to close interactions in an oligomeric state. Overall, the results suggest that directly coevolving residue pairs not in repeat proteins are spatially proximal in at least one biologically relevant protein conformation within the family; we find little evidence for direct coupling between residues at spatially separated allosteric and functional sites or for increased direct coupling between residue pairs on putative allosteric pathways connecting them.
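The bookkeeping behind statements such as "25% of directly coevolving residue pairs are separated by more than 5 Å" can be sketched as follows: take the top-ranked coupled pairs, look up their minimal inter-residue distances, and count the fraction above each cutoff. Taking the closest distance over several homologous structures reclassifies pairs that are in contact in some family member, which mirrors the abstract's point about the 5-15 Å exceptions. The data structures, function names, and toy distances are hypothetical, and the study's further checks (oligomeric interfaces, repeat proteins) are not modeled.

```python
def closest_distance(pair, structures):
    """Smallest distance (in Å) for a residue pair over all structures that contain it.

    Raises ValueError if no structure contains the pair.
    """
    i, j = sorted(pair)
    return min(s[(i, j)] for s in structures if (i, j) in s)


def fraction_beyond(pairs, structures, cutoffs=(5.0, 15.0)):
    """For each cutoff, the fraction of coupled pairs whose closest distance exceeds it."""
    closest = [closest_distance(p, structures) for p in pairs]
    return {c: sum(d > c for d in closest) / len(closest) for c in cutoffs}


# Hypothetical top-ranked coupled pairs and per-structure minimal distances (Å), keyed by (i, j) with i < j.
top_pairs = [(3, 10), (5, 40), (12, 7), (20, 55)]
structure_a = {(3, 10): 4.1, (5, 40): 8.7, (7, 12): 3.9, (20, 55): 17.2}
homolog_b = {(5, 40): 4.6, (20, 55): 16.0}  # a homolog in a different conformation
print(fraction_beyond(top_pairs, [structure_a]))             # {5.0: 0.5, 15.0: 0.25}
print(fraction_beyond(top_pairs, [structure_a, homolog_b]))  # {5.0: 0.25, 15.0: 0.25}
```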
66
Uguzzoni G, John Lovis S, Oteri F, Schug A, Szurmant H, Weigt M. Large-scale identification of coevolution signals across homo-oligomeric protein interfaces by direct coupling analysis. Proc Natl Acad Sci U S A 2017; 114:E2662-E2671. [PMID: 28289198 PMCID: PMC5380090 DOI: 10.1073/pnas.1615068114] [Citation(s) in RCA: 64] [Impact Index Per Article: 8.0] [Indexed: 12/21/2022] [Open Access]
Abstract
Proteins have evolved to perform diverse cellular functions, from serving as reaction catalysts to coordinating cellular propagation and development. Frequently, proteins do not exert their full potential as monomers but rather undergo concerted interactions as either homo-oligomers or with other proteins as hetero-oligomers. The experimental study of such protein complexes and interactions has been arduous. Theoretical structure prediction methods are an attractive alternative. Here, we investigate homo-oligomeric interfaces by tracing residue coevolution via the global statistical direct coupling analysis (DCA). DCA can accurately infer spatial adjacencies between residues. These adjacencies can be included as constraints in structure prediction techniques to predict high-resolution models. By taking advantage of the ongoing exponential growth of sequence databases, we go significantly beyond anecdotal cases of a few protein families and apply DCA to a systematic large-scale study of nearly 2,000 Pfam protein families with sufficient sequence information and structurally resolved homo-oligomeric interfaces. We find that large interfaces are commonly identified by DCA. We further demonstrate that DCA can differentiate between subfamilies with different binding modes within one large Pfam family. Sequence-derived contact information for the subfamilies proves sufficient to assemble accurate structural models of the diverse protein oligomers. Thus, we provide an approach to investigate oligomerization for arbitrary protein families, leading to structural models complementary to often-difficult experimental methods. We anticipate that, combined with ever more abundant sequence data, this study will be instrumental in enabling the structural description of many heteroprotein complexes in the future.
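DCA coupling strengths are commonly summarized per residue pair by the Frobenius norm of the coupling matrix (in the zero-sum gauge) followed by the average product correction (APC). For a homo-oligomer, a strongly coupled pair that is far apart within a single monomer structure is a natural candidate for a contact across the interface. The sketch below assumes the couplings J, with shape (L, L, q, q), have already been inferred (here they are random numbers with one planted strong pair) and that intra-monomer distances are available. The 8 Å cutoff, the top-10 list, the minimum sequence separation of 4 and the function names are assumptions; the study's actual selection criteria may differ.

```python
import numpy as np


def frobenius_apc(J):
    """Contact scores from couplings J of shape (L, L, q, q): Frobenius norm in the
    zero-sum gauge, symmetrized, then corrected with the average product correction (APC)."""
    L = J.shape[0]
    # Zero-sum gauge: subtract row and column means of every q x q block.
    Jz = (J - J.mean(axis=2, keepdims=True) - J.mean(axis=3, keepdims=True)
          + J.mean(axis=(2, 3), keepdims=True))
    F = np.sqrt((Jz ** 2).sum(axis=(2, 3)))
    F = 0.5 * (F + F.T)
    np.fill_diagonal(F, 0.0)
    row_mean = F.sum(axis=1) / (L - 1)      # mean over j != i
    total_mean = F.sum() / (L * (L - 1))    # mean over all i != j
    return F - np.outer(row_mean, row_mean) / total_mean


def candidate_interface_pairs(scores, intra_dist, n_top=10, min_sep=4, cutoff=8.0):
    """Top-scoring pairs (j - i >= min_sep) that are farther than `cutoff` Å apart
    within one monomer: candidates for contacts across a homo-oligomeric interface."""
    L = scores.shape[0]
    ranked = sorted(
        ((scores[i, j], i, j) for i in range(L) for j in range(i + min_sep, L)),
        reverse=True,
    )
    return [(i, j) for _, i, j in ranked[:n_top] if intra_dist[i, j] > cutoff]


# Toy data: weak random couplings plus one strong planted coupling between sites 2 and 30.
rng = np.random.default_rng(0)
L, q = 40, 5
J = rng.normal(0.0, 0.05, size=(L, L, q, q))
J[2, 30] += 2.0 * np.eye(q)
J[30, 2] += 2.0 * np.eye(q)
intra_dist = rng.uniform(4.0, 30.0, size=(L, L))
intra_dist = 0.5 * (intra_dist + intra_dist.T)
intra_dist[2, 30] = intra_dist[30, 2] = 25.0  # far apart inside the monomer
pairs = candidate_interface_pairs(frobenius_apc(J), intra_dist)
print(pairs[0])  # (2, 30)
```

Gap states are not treated separately here; many DCA pipelines drop the gap row and column before taking the norm.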
Affiliation(s)
- Guido Uguzzoni
- Sorbonne Universités, Université Pierre-et-Marie-Curie Université Paris 06, CNRS, Biologie Computationnelle et Quantitative-Institut de Biologie Paris Seine, 75005 Paris, France
- Shalini John Lovis
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, 76344 Eggenstein-Leopoldshafen, Germany
- Francesco Oteri
- Sorbonne Universités, Université Pierre-et-Marie-Curie Université Paris 06, CNRS, Biologie Computationnelle et Quantitative-Institut de Biologie Paris Seine, 75005 Paris, France
- Alexander Schug
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, 76344 Eggenstein-Leopoldshafen, Germany
- Hendrik Szurmant
- Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona, CA 91766
- Martin Weigt
- Sorbonne Universités, Université Pierre-et-Marie-Curie Université Paris 06, CNRS, Biologie Computationnelle et Quantitative-Institut de Biologie Paris Seine, 75005 Paris, France
67
Martinez M, Duclert-Savatier N, Betton JM, Alzari PM, Nilges M, Malliavin TE. Modification in hydrophobic packing of HAMP domain induces a destabilization of the auto-phosphorylation site in the histidine kinase CpxA. Biopolymers 2017; 105:670-82. [PMID: 27124288 DOI: 10.1002/bip.22864] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 12/14/2015] [Revised: 04/22/2016] [Accepted: 04/25/2016] [Indexed: 12/13/2022]
Abstract
The histidine kinases belong to the family of two-component systems, which serve in bacteria to couple environmental stimuli to adaptive responses. Most histidine kinases are homodimers, in which the HAMP and DHp domains assemble into an elongated helical region flanked by two CA domains. Recently, X-ray crystallographic structures of the cytoplasmic region of the Escherichia coli histidine kinase CpxA were determined, and a phosphotransferase-defective mutant, M228V, located in HAMP, was identified. In the present study, we recorded 1 μs molecular dynamics trajectories to compare the behavior of the WT and M228V protein dimers. The M228V modification locally induces the appearance of larger voids within HAMP as well as a perturbation of the number of voids within DHp, thus destabilizing the HAMP and DHp hydrophobic packing. In addition, a disruption of the stacking interaction between F403, located in the lid of the CA domain involved in auto-phosphorylation, and R296, located in the interacting DHp region, is observed more often in the presence of the M228V modification. The experimental modifications R296A and R296D of CpxA have also been observed to reduce CpxA activity. These observations agree with the destabilization of the R296/F403 stacking and could indicate that a conformational event taking place in HAMP is transmitted to the auto-phosphorylation site of the histidine kinase. © 2016 Wiley Periodicals, Inc. Biopolymers 105: 670-682, 2016.
Affiliation(s)
- Marlet Martinez
- Institut Pasteur and CNRS UMR 3528, Rue Du Dr Roux, Unité De Bioinformatique Structurale, Paris, 75015, France
- Nathalie Duclert-Savatier
- Institut Pasteur and CNRS UMR 3528, Rue Du Dr Roux, Unité De Bioinformatique Structurale, Paris, 75015, France
- Jean-Michel Betton
- Institut Pasteur and CNRS UMR 3528, Rue Du Dr Roux, Unité De Microbiologie Structurale, Paris, 75015, France
- Pedro M Alzari
- Institut Pasteur and CNRS UMR 3528, Rue Du Dr Roux, Unité De Microbiologie Structurale, Paris, 75015, France
- Michael Nilges
- Institut Pasteur and CNRS UMR 3528, Rue Du Dr Roux, Unité De Bioinformatique Structurale, Paris, 75015, France
- Thérèse E Malliavin
- Institut Pasteur and CNRS UMR 3528, Rue Du Dr Roux, Unité De Bioinformatique Structurale, Paris, 75015, France
68
Conservation of coevolving protein interfaces bridges prokaryote-eukaryote homologies in the twilight zone. Proc Natl Acad Sci U S A 2016; 113:15018-15023. [PMID: 27965389 DOI: 10.1073/pnas.1611861114] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 11/18/2022]
Abstract
Protein-protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far, these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein-protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein-protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein-protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high sequence divergence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides accurate, sequence-based information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach.
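One ingredient of such a transfer can be sketched in a few lines: once coupled alignment-column pairs have been identified from the prokaryotic alignment, they are mapped onto residue numbers of distantly related sequences aligned to the same columns. This minimal sketch assumes gapped, column-matched alignments, `-` and `.` as gap characters, and 1-based residue numbering; the function names are hypothetical and this is not the authors' domain-centered protocol.

```python
from typing import Dict, List, Sequence, Tuple


def column_to_residue(aligned_seq: str, first_resnum: int = 1) -> Dict[int, int]:
    """Map alignment column index -> residue number, skipping gap columns."""
    mapping: Dict[int, int] = {}
    resnum = first_resnum
    for col, ch in enumerate(aligned_seq):
        if ch not in "-.":  # '-' and '.' are treated as gaps (assumption)
            mapping[col] = resnum
            resnum += 1
    return mapping


def project_contacts(
    column_pairs: Sequence[Tuple[int, int]],  # coupled columns (protein A column, protein B column)
    aligned_a: str,  # target (e.g. eukaryotic) protein A, aligned to the family columns
    aligned_b: str,  # target protein B, aligned to the family columns
) -> List[Tuple[int, int]]:
    """Translate coupled column pairs into residue-number pairs of the target complex."""
    map_a = column_to_residue(aligned_a)
    map_b = column_to_residue(aligned_b)
    # Keep only pairs where both columns are occupied by a residue in the target.
    return [(map_a[i], map_b[j]) for i, j in column_pairs if i in map_a and j in map_b]


print(project_contacts([(0, 1), (2, 3), (4, 2), (3, 0)], "AC-DE", "-GHI-"))
# [(1, 1), (4, 2)]
```

Pairs that fall on a gap in either target sequence are dropped rather than guessed, which keeps the projection conservative.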
69
Bai F, Morcos F, Cheng RR, Jiang H, Onuchic JN. Elucidating the druggable interface of protein-protein interactions using fragment docking and coevolutionary analysis. Proc Natl Acad Sci U S A 2016; 113:E8051-E8058. [PMID: 27911825 PMCID: PMC5167203 DOI: 10.1073/pnas.1615932113] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.7] [Indexed: 12/28/2022]
Abstract
Protein-protein interactions play a central role in cellular function. Improving the understanding of complex formation has many practical applications, including the rational design of new therapeutic agents and the elucidation of the mechanisms governing signal transduction networks. The generally large, flat, and relatively featureless binding sites of protein complexes pose many challenges for drug design. Fragment docking and direct coupling analysis are combined in an integrated computational method to estimate druggable protein-protein interfaces. (i) This method explores the binding of fragment-sized molecular probes on the protein surface using a molecular docking-based screen. (ii) The energetically favorable binding sites of the probes, called hot spots, are spatially clustered to map out candidate binding sites on the protein surface. (iii) A coevolution-based interface interaction score is used to discriminate between different candidate binding sites, yielding potential interfacial targets for therapeutic drug design. This approach is validated for important, well-studied disease-related proteins with known pharmaceutical targets, and it also identifies targets that have yet to be studied. Moreover, therapeutic agents are proposed by chemically connecting the fragments that bind strongly to the hot spots.
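Steps (ii) and (iii) can be sketched as follows, assuming hot-spot probe coordinates are already available from the docking screen and that each probe carries a hypothetical coevolution-derived score. Single-linkage clustering with a distance cutoff is one simple choice for the spatial clustering and may differ from the clustering used in the paper; likewise, summing per-probe scores is a stand-in for the authors' interface interaction score.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]


def cluster_hot_spots(points: Sequence[Point], cutoff: float) -> List[List[int]]:
    """Single-linkage clustering: probes joined by any chain of pairwise
    distances <= cutoff end up in the same candidate binding site."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                parent[find(i)] = find(j)  # merge the two clusters

    groups: Dict[int, List[int]] = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=lambda g: g[0])


def rank_sites(clusters: List[List[int]], probe_score: Sequence[float]) -> List[List[int]]:
    """Order candidate sites by the summed (hypothetical) coevolution score of their probes."""
    return sorted(clusters, key=lambda c: sum(probe_score[k] for k in c), reverse=True)


probes = [(0, 0, 0), (1, 0, 0), (10, 0, 0), (10.5, 0, 0), (30, 0, 0)]
sites = cluster_hot_spots(probes, cutoff=2.0)
print(sites)                                         # [[0, 1], [2, 3], [4]]
print(rank_sites(sites, [0.1, 0.2, 0.9, 0.8, 0.05]))  # [[2, 3], [0, 1], [4]]
```

Union-find keeps the clustering step simple; for thousands of probes a spatial index would avoid the quadratic pair loop.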
Affiliation(s)
- Fang Bai
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
- Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Dallas, TX 75080
- Department of Bioengineering, University of Texas at Dallas, Dallas, TX 75080
- Center for Systems Biology, University of Texas at Dallas, Dallas, TX 75080
- Ryan R Cheng
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
- Hualiang Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- José N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
- Department of Physics and Astronomy, Rice University, Houston, TX 77005
- Department of Chemistry, Rice University, Houston, TX 77005
- Department of Biosciences, Rice University, Houston, TX 77005
70
Simultaneous identification of specifically interacting paralogs and interprotein contacts by direct coupling analysis. Proc Natl Acad Sci U S A 2016; 113:12186-12191. [PMID: 27729520 DOI: 10.1073/pnas.1607570113] [Citation(s) in RCA: 71] [Impact Index Per Article: 7.9] [Indexed: 11/18/2022]
Abstract
Knowledge of protein-protein interactions is central to understanding almost all complex biological processes. Computational tools that exploit rapidly growing genomic databases to characterize protein-protein interactions are urgently needed. Such methods should connect multiple scales: from evolutionarily conserved interactions between families of homologous proteins, through the identification of specifically interacting proteins when a species contains multiple paralogs, down to the prediction of residues in physical contact across interaction interfaces. Statistical inference methods detecting residue-residue coevolution have recently triggered considerable progress in using sequence data for quaternary protein structure prediction; however, they require large joint alignments of homologous protein pairs known to interact. Generating such alignments is a complex computational task in its own right; consequently, coevolutionary modeling has been restricted to proteins without paralogs, or to bacterial systems in which the corresponding coding genes are colocalized in operons. Here we show that the direct coupling analysis of residue coevolution can be extended to connect the different scales: to simultaneously match interacting paralogs, identify interprotein residue-residue contacts, and discriminate interacting from noninteracting families in a multiprotein system. Our results extend the potential applications of coevolutionary analysis far beyond the cases treatable so far.
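Matching paralogs within one genome can be posed as an assignment problem. The sketch below assumes an inter-protein score has already been computed for every candidate paralog pair (hypothetical input, e.g. from a coevolution model) and picks the one-to-one pairing with the largest total score by exhaustive search. The abstract does not detail the authors' matching procedure, so this illustrates the problem, not their algorithm.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def best_pairing(scores: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """scores[a][b]: hypothetical interaction score between paralog a of
    protein A and paralog b of protein B within one genome.

    Returns (pairing, total) with pairing[a] = b maximizing the summed score.
    Exhaustive search over permutations: only practical for a handful of
    paralogs per species.
    """
    n = len(scores)
    best: List[int] = []
    best_total = float("-inf")
    for perm in permutations(range(n)):
        total = sum(scores[a][perm[a]] for a in range(n))
        if total > best_total:
            best, best_total = list(perm), total
    return best, best_total


scores = [
    [1.0, 5.0, 0.0],
    [4.0, 0.5, 0.2],
    [0.0, 0.1, 3.0],
]
print(best_pairing(scores))  # ([1, 0, 2], 12.0)
```

For larger paralog families, the Hungarian algorithm solves the same assignment problem in polynomial time.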
71
Cheng RR, Nordesjö O, Hayes RL, Levine H, Flores SC, Onuchic JN, Morcos F. Connecting the Sequence-Space of Bacterial Signaling Proteins to Phenotypes Using Coevolutionary Landscapes. Mol Biol Evol 2016; 33:3054-3064. [PMID: 27604223 PMCID: PMC5100047 DOI: 10.1093/molbev/msw188] [Citation(s) in RCA: 48] [Impact Index Per Article: 5.3] [Indexed: 01/08/2023]
Abstract
Two-component signaling (TCS) is the primary means by which bacteria sense and respond to the environment. TCS involves two partner proteins working in tandem, which interact to perform cellular functions while limiting interactions with non-partners (i.e., cross-talk). We construct a Potts model for TCS that can quantitatively predict how mutating amino acid identities affects the interaction between TCS partners and non-partners. The parameters of this model are inferred directly from protein sequence data. This approach drastically reduces the computational complexity of exploring the sequence-space of TCS proteins. As a stringent test, we compare its predictions to a recent comprehensive mutational study, which characterized the functionality of 20^4 mutational variants of the PhoQ kinase in Escherichia coli. We find that our best predictions accurately reproduce the amino acid combinations found in experiment, which enable functional signaling with its partner PhoP. These predictions demonstrate the evolutionary pressure to preserve the interaction between TCS partners as well as to prevent unwanted cross-talk. Further, we calculate the mutational change in the binding affinity between PhoQ and PhoP, providing an estimate of the amount of destabilization needed to disrupt TCS.
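How a Potts model scores a mutation can be sketched with the standard Hamiltonian H(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j): a point mutation is evaluated by the energy difference ΔH between mutant and wild type. The fields and couplings below are toy values, not parameters inferred from TCS sequences, and the dictionary-based parameter layout is an assumption made for readability. In a TCS setting the sequence would be the concatenated kinase and response-regulator segments.

```python
from typing import Dict, Tuple

Fields = Dict[Tuple[int, str], float]               # h_i(a), keyed by (position, residue)
Couplings = Dict[Tuple[int, int, str, str], float]  # J_ij(a, b), keyed with i < j


def potts_energy(seq: str, h: Fields, J: Couplings) -> float:
    """H(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j); absent parameters count as 0."""
    L = len(seq)
    energy = -sum(h.get((i, seq[i]), 0.0) for i in range(L))
    for i in range(L):
        for j in range(i + 1, L):
            energy -= J.get((i, j, seq[i], seq[j]), 0.0)
    return energy


def mutation_delta(seq: str, pos: int, aa: str, h: Fields, J: Couplings) -> float:
    """Energy change of a point mutation; > 0 means the mutant is less favorable."""
    mutant = seq[:pos] + aa + seq[pos + 1:]
    return potts_energy(mutant, h, J) - potts_energy(seq, h, J)


# Toy parameters: position 0 prefers A, position 1 prefers B, and A-B co-occur.
h = {(0, "A"): 1.0, (1, "B"): 0.5}
J = {(0, 1, "A", "B"): 2.0}
print(potts_energy("AB", h, J))             # -3.5
print(mutation_delta("AB", 1, "C", h, J))   # 2.5
```

The mutation B→C at position 1 loses both its field and its coupling to A, so ΔH = 0.5 + 2.0; context dependence enters only through the coupling term.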
Affiliation(s)
- R R Cheng
- Center for Theoretical Biological Physics, Rice University, Houston, TX
- O Nordesjö
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- R L Hayes
- Department of Biophysics, University of Michigan, Ann Arbor, MI
- H Levine
- Center for Theoretical Biological Physics, Rice University, Houston, TX
- Department of Bioengineering, Rice University, Houston, TX
- S C Flores
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- J N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX
- Department of Physics and Astronomy, Rice University, Houston, TX
- Departments of Chemistry and Biosciences, Rice University, Houston, TX
- F Morcos
- Department of Biological Sciences and Center for Systems Biology, University of Texas at Dallas, Dallas, TX
72
Zschiedrich CP, Keidel V, Szurmant H. Molecular Mechanisms of Two-Component Signal Transduction. J Mol Biol 2016; 428:3752-75. [PMID: 27519796 DOI: 10.1016/j.jmb.2016.08.003] [Citation(s) in RCA: 376] [Impact Index Per Article: 41.8] [Received: 05/14/2016] [Revised: 07/30/2016] [Accepted: 08/01/2016] [Indexed: 02/03/2023]
Abstract
Two-component systems (TCS), comprising sensor histidine kinases and response regulator proteins, are among the most important players in bacterial and archaeal signal transduction and also occur in reduced numbers in some eukaryotic organisms. Given their importance to cellular survival, virulence, and cellular development, these systems are among the most scrutinized bacterial proteins. In recent years, a flurry of bioinformatics, genetic, biochemical, and structural studies have provided detailed insights into many molecular mechanisms that underlie the detection of signals and the generation of the appropriate response by TCS. Importantly, it has become clear that there is significant diversity in the mechanisms employed by individual systems. This review discusses the current knowledge on common themes and divergences from the paradigm of TCS signaling, with an emphasis on the information gained from recent structural and bioinformatics studies.
Affiliation(s)
- Christopher P Zschiedrich
- Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, 309 E Second Street, Pomona, CA 91766, USA
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, 10550 N Torrey Pines Road, La Jolla, CA 92037, USA
- Victoria Keidel
- Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, 309 E Second Street, Pomona, CA 91766, USA
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, 10550 N Torrey Pines Road, La Jolla, CA 92037, USA
- Hendrik Szurmant
- Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, 309 E Second Street, Pomona, CA 91766, USA
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, 10550 N Torrey Pines Road, La Jolla, CA 92037, USA
73
Wagner JR, Lee CT, Durrant JD, Malmstrom RD, Feher VA, Amaro RE. Emerging Computational Methods for the Rational Discovery of Allosteric Drugs. Chem Rev 2016; 116:6370-90. [PMID: 27074285 PMCID: PMC4901368 DOI: 10.1021/acs.chemrev.5b00631] [Citation(s) in RCA: 179] [Impact Index Per Article: 19.9] [Indexed: 12/17/2022]
Abstract
Allosteric drug development holds promise for delivering medicines that are more selective and less toxic than those that target orthosteric sites. To date, the discovery of allosteric binding sites and lead compounds has been mostly serendipitous, achieved through high-throughput screening. Over the past decade, structural data has become more readily available for larger protein systems and more membrane protein classes (e.g., GPCRs and ion channels), which are common allosteric drug targets. In parallel, improved simulation methods now provide better atomistic understanding of the protein dynamics and cooperative motions that are critical to allosteric mechanisms. As a result of these advances, the field of predictive allosteric drug development is now on the cusp of a new era of rational structure-based computational methods. Here, we review algorithms that predict allosteric sites based on sequence data and molecular dynamics simulations, describe tools that assess the druggability of these pockets, and discuss how Markov state models and topology analyses provide insight into the relationship between protein dynamics and allosteric drug binding. In each section, we first provide an overview of the various method classes before describing relevant algorithms and software packages.
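The core of a Markov state model can be illustrated with a minimal sketch, assuming a molecular dynamics trajectory that has already been discretized into integer state labels (hypothetical input). Transitions are counted at a chosen lag time, the count matrix is row-normalized into a transition matrix, and the stationary (equilibrium) distribution is obtained by power iteration. Production MSM estimators additionally enforce detailed balance and pool many trajectories; this sketch does neither.

```python
from typing import List, Sequence


def transition_matrix(dtraj: Sequence[int], n_states: int, lag: int = 1) -> List[List[float]]:
    """Row-stochastic transition matrix estimated from transition counts at `lag`."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for t in range(len(dtraj) - lag):
        counts[dtraj[t]][dtraj[t + lag]] += 1.0
    T = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            raise ValueError(f"state {i} has no outgoing transitions at lag {lag}")
        T.append([c / total for c in row])
    return T


def stationary_distribution(T: List[List[float]], tol: float = 1e-12,
                            max_iter: int = 100_000) -> List[float]:
    """Left eigenvector of T with eigenvalue 1 via power iteration (pi <- pi T).
    Assumes an ergodic, aperiodic chain."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


dtraj = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0]
T = transition_matrix(dtraj, n_states=2)
print(T)                           # [[0.5, 0.5], [0.4, 0.6]]
print(stationary_distribution(T))  # approximately [0.444, 0.556]
```

The lag time is the key modeling choice: it must be long enough for transitions between the chosen states to look memoryless.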
Affiliation(s)
- Jeffrey R Wagner
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Christopher T Lee
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Jacob D Durrant
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Robert D Malmstrom
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Victoria A Feher
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Rommie E Amaro
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
74
Feinauer C, Szurmant H, Weigt M, Pagnani A. Inter-Protein Sequence Co-Evolution Predicts Known Physical Interactions in Bacterial Ribosomes and the Trp Operon. PLoS One 2016; 11:e0149166. [PMID: 26882169 PMCID: PMC4755613 DOI: 10.1371/journal.pone.0149166] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 03/11/2015] [Accepted: 01/28/2016] [Indexed: 11/29/2022]
Abstract
Interaction between proteins is a fundamental mechanism that underlies virtually all biological processes. Many important interactions are conserved across a large variety of species. The need to maintain interaction leads to a high degree of co-evolution between residues in the interface between partner proteins. The inference of protein-protein interaction networks from the rapidly growing sequence databases is one of the most formidable tasks in systems biology today. We propose here a novel approach based on the Direct-Coupling Analysis of the co-evolution between inter-protein residue pairs. We use ribosomal and trp operon proteins as test cases. For the small and large ribosomal subunits, our approach predicts protein-interaction partners at true-positive rates of 70% and 90%, respectively, within the first 10 predictions, with areas under the ROC curve of 0.69 and 0.81, respectively, for all predictions. In the trp operon, it assigns the two largest interaction scores to the only two interactions experimentally known. At the level of residue interactions, we show that our approach predicts interacting residues with true-positive rates of 60% and 85% in the first 20 predictions for the small and the large ribosomal subunit, respectively. We use artificial data to show that the performance of our approach depends crucially on the size of the joint multiple sequence alignments, and we analyze how many sequences would be necessary for a perfect prediction if the sequences were sampled from the same model that we use for prediction. Given the performance of our approach on the test data, we speculate that it can be used to detect new interactions, especially in light of the rapid growth of available sequence data.
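The two figures of merit quoted above can be made concrete with a minimal sketch, assuming a list of scored predictions with binary labels (1 = experimentally known interaction; the data below are hypothetical). The true-positive rate within the first k predictions is read here as the fraction of the k highest-scoring predictions that are correct, and the ROC area is computed via the equivalent Mann-Whitney statistic.

```python
from typing import Sequence


def top_k_tp_rate(scores: Sequence[float], labels: Sequence[int], k: int) -> float:
    """Fraction of the k highest-scoring predictions that are true interactions."""
    ranked = sorted(zip(scores, labels), key=lambda x: x[0], reverse=True)[:k]
    return sum(label for _, label in ranked) / len(ranked)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic: the probability
    that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 0, 1, 0, 0]
print(top_k_tp_rate(scores, labels, k=2))  # 0.5
print(roc_auc(scores, labels))             # 0.8333333333333334
```

The two metrics answer different questions: the top-k rate describes the most confident predictions, while the ROC area summarizes the ranking of all predictions.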
Affiliation(s)
- Christoph Feinauer
- Department of Applied Science and Technology, and Center for Computational Sciences, Politecnico di Torino, Torino, Italy
- Hendrik Szurmant
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, United States of America
- Martin Weigt
- Sorbonne Universités, UPMC, UMR 7238, Computational and Quantitative Biology, Paris, France
- CNRS, UMR 7238, Computational and Quantitative Biology, Paris, France
- Andrea Pagnani
- Department of Applied Science and Technology, and Center for Computational Sciences, Politecnico di Torino, Torino, Italy
- Human Genetics Foundation, Molecular Biotechnology Center (MBC), Torino, Italy
75
Noel JK, Morcos F, Onuchic JN. Sequence co-evolutionary information is a natural partner to minimally-frustrated models of biomolecular dynamics. F1000Res 2016; 5. [PMID: 26918164 PMCID: PMC4755392 DOI: 10.12688/f1000research.7186.1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Accepted: 01/21/2016] [Indexed: 11/25/2022]
Abstract
Experimentally derived structural constraints have been crucial to the implementation of computational models of biomolecular dynamics. For example, not only does crystallography provide essential starting points for molecular simulations, but high-resolution structures also permit the parameterization of simplified models. Since the energy landscapes for proteins and other biomolecules have been shown to be minimally frustrated and therefore funneled, these structure-based models have played a major role in understanding the mechanisms governing folding and many functions of these systems. Structural information, however, may be limited in many interesting cases. Recently, the statistical analysis of residue co-evolution in families of protein sequences has provided a complementary method of discovering residue-residue contact interactions involved in functional configurations. These functional configurations are often transient and difficult to capture experimentally. Thus, co-evolutionary information can be merged with that available for experimentally characterized low free-energy structures to capture the true underlying biomolecular energy landscape more fully.
Affiliation(s)
- Jeffrey K Noel
- Center for Theoretical Biological Physics, Rice University, Houston, TX, USA
- Kristallographie, Max-Delbrück-Centrum für Molekulare Medizin, Berlin, Germany
- Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, USA
- Jose N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX, USA
76
Cheng RR, Raghunathan M, Noel JK, Onuchic JN. Constructing sequence-dependent protein models using coevolutionary information. Protein Sci 2016; 25:111-22. [PMID: 26223372 PMCID: PMC4815312 DOI: 10.1002/pro.2758] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 05/20/2015] [Accepted: 07/27/2015] [Indexed: 11/08/2022]
Abstract
Recent developments in global statistical methodologies have advanced the analysis of large collections of protein sequences for coevolutionary information. Coevolution between amino acids in a protein arises from compensatory mutations that are needed to maintain the stability or function of a protein over the course of evolution. This gives rise to quantifiable correlations between amino acid sites within the multiple sequence alignment of a protein family. Here, we use the maximum entropy-based approach called mean field Direct Coupling Analysis (mfDCA) to infer a Potts model Hamiltonian governing the correlated mutations in a protein family. We use the inferred pairwise statistical couplings to generate the sequence-dependent heterogeneous interaction energies of a structure-based model (SBM) where only native contacts are considered. Considering the ribosomal S6 protein and its circular permutants as well as the SH3 protein, we demonstrate that these models quantitatively agree with experimental data on folding mechanisms. This work serves as a new framework for generating coevolutionary data-enriched models that can potentially be used to engineer key functional motions and novel interactions in protein systems.
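The mean-field inference step can be sketched in NumPy, following the usual mfDCA recipe: one-hot encode the alignment while dropping one reference state per column (gauge fixing), regularize single-site and pair frequencies with a pseudocount, form the connected correlation matrix C, and take the couplings as J = -C^{-1}. This sketch assumes an integer-encoded MSA with states 0..q-1, omits the sequence reweighting normally applied to redundant sequences, and uses a pseudocount weight of 0.5 as an assumed default; it is not the authors' code.

```python
import numpy as np


def mfdca_couplings(msa: np.ndarray, q: int, pseudocount: float = 0.5) -> np.ndarray:
    """Mean-field Potts couplings from an (N, L) integer MSA with states 0..q-1.

    Returns J with shape (L, L, q, q). State q-1 is the gauge reference, so its
    couplings are zero; diagonal blocks J[i, i] are left at zero.
    No sequence reweighting is applied (simplifying assumption).
    """
    n_seq, L = msa.shape
    d = q - 1
    # One-hot encoding without the reference state q-1.
    X = np.zeros((n_seq, L * d))
    for i in range(L):
        for a in range(d):
            X[:, i * d + a] = msa[:, i] == a
    fi = X.mean(axis=0)
    fij = X.T @ X / n_seq
    # Pseudocount: mix empirical frequencies with a uniform distribution.
    fi = (1 - pseudocount) * fi + pseudocount / q
    fij = (1 - pseudocount) * fij + pseudocount / q**2
    # Same-site blocks must satisfy f_ii(a, b) = f_i(a) * delta_ab.
    for i in range(L):
        s = slice(i * d, (i + 1) * d)
        fij[s, s] = np.diag(fi[s])
    C = fij - np.outer(fi, fi)  # connected correlations
    inv_c = np.linalg.inv(C)
    J = np.zeros((L, L, q, q))
    for i in range(L):
        for j in range(L):
            if i != j:
                J[i, j, :d, :d] = -inv_c[i * d:(i + 1) * d, j * d:(j + 1) * d]
    return J


rng = np.random.default_rng(0)
msa = rng.integers(0, 3, size=(50, 4))  # toy alignment: 50 sequences, 4 columns, q = 3
J = mfdca_couplings(msa, q=3)
print(J.shape)  # (4, 4, 3, 3)
```

A simple residue-pair score is then, for example, `np.linalg.norm(J[i, j])` in this gauge; the original mfDCA work ranked pairs by direct information instead.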
Affiliation(s)
- Ryan R Cheng
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, 77005
- Mohit Raghunathan
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, 77005
- Department of Physics & Astronomy, Rice University, Houston, Texas, 77005
- Jeffrey K Noel
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, 77005
- Department of Physics & Astronomy, Rice University, Houston, Texas, 77005
- José N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, 77005
- Department of Physics & Astronomy, Rice University, Houston, Texas, 77005
77
Braun T, Koehler Leman J, Lange OF. Combining Evolutionary Information and an Iterative Sampling Strategy for Accurate Protein Structure Prediction. PLoS Comput Biol 2015; 11:e1004661. [PMID: 26713437 PMCID: PMC4694711 DOI: 10.1371/journal.pcbi.1004661] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 03/15/2015] [Accepted: 11/17/2015] [Indexed: 12/18/2022]
Abstract
Recent work has shown that the accuracy of ab initio structure prediction can be significantly improved by integrating evolutionary information in the form of intra-protein residue-residue contacts. Following this seminal result, much effort has been put into the improvement of contact predictions. However, there is also a substantial need to develop structure prediction protocols tailored to the type of restraints gained by contact predictions. Here, we present a structure prediction protocol that combines evolutionary information with the resolution-adapted structural recombination approach of Rosetta, called RASREC. Compared to the classic Rosetta ab initio protocol, RASREC achieves improved sampling, better convergence, and higher robustness against incorrect distance restraints, making it the ideal sampling strategy for the stated problem. To demonstrate the accuracy of our protocol, we tested the approach on a diverse set of 28 globular proteins. Our method is able to converge for 26 of the 28 targets and improves the average TM-score of the entire benchmark set from 0.55 to 0.72 when compared to the top-ranked models obtained by the EVFold web server using identical contact predictions. Using a smaller benchmark, we furthermore show that the prediction accuracy of our method is only slightly reduced when the contact prediction accuracy is comparatively low. This observation is of special interest for protein sequences that have only a limited number of homologs.
Recently, a breakthrough has been achieved in modeling the atomic 3D structures of proteins from their sequence alone, without requiring any experimental work on the protein itself. To achieve this goal, a database of evolutionarily related sequences is analyzed to find co-evolving residues, giving insight into which residues are in close proximity to each other. These residue-residue contacts can help drive a computer simulation with an atomic-scale physical model of the protein structure from a random starting conformation to a native-like 3D conformation. Although much effort is being put into the improvement of residue-residue contact predictions, their accuracy will always be limited. Therefore, structure prediction protocols with a high tolerance against incorrect distance restraints are needed. Here, we present a structure prediction protocol that combines evolutionary information with the iterative sampling approach of the molecular modeling suite Rosetta, called RASREC. RASREC has been shown to converge faster to near-native models and to be more robust against incorrect distance restraints than standard prediction protocols. It is therefore well suited for restraints obtained from predicted residue-residue contacts with limited accuracy. We show that our protocol outperforms other currently published structure prediction methods and is able to achieve accurate structures even when the accuracy of predicted contacts is low.
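The TM-scores quoted above (0.55 versus 0.72) follow the standard definition TM = (1/L_target) Σ_i 1 / (1 + (d_i/d0)^2), with d0 = 1.24 (L_target - 15)^(1/3) - 1.8 Å. The sketch below assumes per-residue distances between aligned model and native residues for one given superposition; the real TM-score additionally maximizes this sum over superpositions, which is not reproduced here.

```python
from typing import Sequence


def tm_d0(l_target: int) -> float:
    """Distance scale d0 (Å) of the TM-score; intended for l_target > 15."""
    return 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8


def tm_score_fixed(distances: Sequence[float], l_target: int) -> float:
    """TM-score for one fixed superposition.

    distances: model-native distances (Å) of the aligned residue pairs.
    Unaligned residues contribute nothing but still count in l_target.
    """
    d0 = tm_d0(l_target)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_target


print(tm_score_fixed([0.0] * 30, 30))  # 1.0 (perfect model)
print(tm_score_fixed([0.0] * 15, 30))  # 0.5 (only half the residues aligned)
```

Because each residue contributes at most 1 and deviations are damped by d0, the score is less sensitive to a few badly placed residues than an RMSD would be.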
Affiliation(s)
- Tatjana Braun
- Biomolecular NMR and Munich Center for Integrated Protein Science, Department Chemie, Technische Universität München, Garching, Germany
- Julia Koehler Leman
- Department of Chemical and Biomolecular Engineering, Johns Hopkins University, Baltimore, Maryland, United States of America
- Oliver F. Lange
- Biomolecular NMR and Munich Center for Integrated Protein Science, Department Chemie, Technische Universität München, Garching, Germany
78
Sinner C, Lutz B, Verma A, Schug A. Revealing the global map of protein folding space by large-scale simulations. J Chem Phys 2015; 143:243154. [DOI: 10.1063/1.4938172] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 01/10/2023]
Affiliation(s)
- Claude Sinner
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Department of Physics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Benjamin Lutz
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Department of Physics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Abhinav Verma
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Alexander Schug
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Karlsruhe, Germany
79
Emamjomeh A, Goliaei B, Torkamani A, Ebrahimpour R, Mohammadi N, Parsian A. Protein-protein interaction prediction by combined analysis of genomic and conservation information. Genes Genet Syst 2015; 89:259-72. [PMID: 25948120 DOI: 10.1266/ggs.89.259] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 11/23/2022]
Abstract
Protein-protein interactions (PPIs) are highly important because of their central role in cellular processes and biochemical pathways; therefore, PPIs can be very useful for predicting protein functions. Experimental techniques of PPI detection have certain drawbacks; hence, computational methods can be used to complement wet-lab techniques. Such methods can be applied to PPI prediction as well as to the validation of experimental results. However, computational algorithms can produce many false PPI predictions, which in turn result in inadequate performance. We have developed a novel method based on combined analysis, entitled PPIccc. PPIccc combines three descriptors to predict PPIs: gene co-expression values, codon usage similarity, and conservation of surface residues between the protein products of a gene pair. Validation of results against the Human Protein Reference Database (HPRD) indicated improved performance of our proposed method. The results also revealed that conservation of surface residues between proteins, in combination with codon usage similarity of their related genes, increases the performance of PPI prediction. This means that codon usage similarity and surface residues between proteins (only sequence-based features) can predict PPIs as well as PPIccc.
80
From residue coevolution to protein conformational ensembles and functional dynamics. Proc Natl Acad Sci U S A 2015; 112:13567-72. [PMID: 26487681 DOI: 10.1073/pnas.1508584112] [Citation(s) in RCA: 92] [Impact Index Per Article: 9.2] [Indexed: 12/21/2022]
Abstract
The analysis of evolutionary amino acid correlations has recently attracted a surge of renewed interest, also due to their successful use in de novo protein native structure prediction. However, many aspects of protein function, such as substrate binding and product release in enzymatic activity, can be fully understood only in terms of an equilibrium ensemble of alternative structures, rather than a single static structure. In this paper we combine coevolutionary data and molecular dynamics simulations to study protein conformational heterogeneity. To that end, we adapt the Boltzmann-learning algorithm to the analysis of homologous protein sequences and develop a coarse-grained protein model specifically tailored to convert the resulting contact predictions to a protein structural ensemble. By means of exhaustive sampling simulations, we analyze the set of conformations that are consistent with the observed residue correlations for a set of representative protein domains, showing that (i) the most representative structure is consistent with the experimental fold and (ii) the various regions of the sequence display different stability, related to multiple biologically relevant conformations and to the cooperativity of the coevolving pairs. Moreover, we show that the proposed protocol is able to reproduce the essential features of a protein folding mechanism as well as to account for regions involved in conformational transitions through the correct sampling of the involved conformers.
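Boltzmann learning fits a maximum-entropy model by nudging its parameters until the model's averages match those measured in the data. The following is a toy sketch only: it assumes binary ±1 spins instead of the 21-letter alphabet of protein alignments, and it computes model averages by exact enumeration, whereas real applications must estimate them by Monte Carlo sampling. The names are hypothetical and this is not the authors' code.

```python
import itertools
import math


def model_moments(h, J, n):
    """Exact <s_i> and <s_i s_j> of P(s) ∝ exp(sum_i h_i s_i + sum_{i<j} J_ij s_i s_j), s_i in {-1, +1}."""
    configs = list(itertools.product((-1, 1), repeat=n))
    weights = []
    for s in configs:
        log_w = sum(h[i] * s[i] for i in range(n))
        log_w += sum(J[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
        weights.append(math.exp(log_w))
    z = sum(weights)  # partition function
    m = [sum(w * s[i] for w, s in zip(weights, configs)) / z for i in range(n)]
    c = [[sum(w * s[i] * s[j] for w, s in zip(weights, configs)) / z for j in range(n)]
         for i in range(n)]
    return m, c


def boltzmann_learning(target_m, target_c, n, lr=0.1, steps=3000):
    """Gradient ascent on the log-likelihood: each update is lr * (data average - model average)."""
    h = [0.0] * n
    J = [[0.0] * n for _ in range(n)]  # only the upper triangle (i < j) is used
    for _ in range(steps):
        m, c = model_moments(h, J, n)
        for i in range(n):
            h[i] += lr * (target_m[i] - m[i])
            for j in range(i + 1, n):
                J[i][j] += lr * (target_c[i][j] - c[i][j])
    return h, J
```

For a sequence alignment, the target averages would be the single-site and pairwise statistics measured in the alignment, and the learned couplings are the quantities typically ranked to predict residue contacts.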
81
Figliuzzi M, Jacquier H, Schug A, Tenaillon O, Weigt M. Coevolutionary Landscape Inference and the Context-Dependence of Mutations in Beta-Lactamase TEM-1. Mol Biol Evol 2015; 33:268-80. [PMID: 26446903 PMCID: PMC4693977 DOI: 10.1093/molbev/msv211] [Citation(s) in RCA: 176] [Impact Index Per Article: 17.6] [Indexed: 11/21/2022]
Abstract
The quantitative characterization of mutational landscapes is a task of outstanding importance in evolutionary and medical biology: it is, for example, central to our understanding of the phenotypic effect of mutations related to disease and antibiotic drug resistance. Here we develop a novel inference scheme for mutational landscapes, which is based on the statistical analysis of large alignments of homologs of the protein of interest. Our method is able to capture epistatic couplings between residues, and therefore to assess the dependence of mutational effects on the sequence context where they appear. When compared with recent large-scale mutagenesis data of the beta-lactamase TEM-1, a protein providing resistance against beta-lactam antibiotics, our method yields an increase of about 40% in explanatory power relative to approaches neglecting epistasis. We find that the informative sequence context extends to residues at native distances of about 20 Å from the mutated site, thus reaching far beyond residues in direct physical contact.
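The context dependence of a mutation can be illustrated with a Potts-type statistical energy, the model class used in DCA-style inference. In the sketch below, the sign convention E(s) = -Σ h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j), the data layout and the function names are assumptions for illustration. Comparing the full energy difference with the field-only term shows how the same substitution can be favourable or unfavourable depending on the background sequence.

```python
def potts_energy(seq, h, J):
    """Statistical energy E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j); lower means more favoured.

    h: list with one dict per position, mapping residue -> field value.
    J: dict mapping a position pair (i, j), i < j, to {(a, b): coupling}; missing entries count as 0.
    """
    energy = -sum(h[i][a] for i, a in enumerate(seq))
    for (i, j), table in J.items():
        energy -= table.get((seq[i], seq[j]), 0.0)
    return energy


def mutation_effect(seq, pos, new_res, h, J, epistatic=True):
    """Energy change when seq[pos] is replaced by new_res (positive = less favoured by the model).

    With epistatic=False only the field term is kept, so the estimate ignores the sequence context.
    """
    if not epistatic:
        return -(h[pos][new_res] - h[pos][seq[pos]])
    mutant = seq[:pos] + new_res + seq[pos + 1:]
    return potts_energy(mutant, h, J) - potts_energy(seq, h, J)
```

With a toy two-site model in which only matching residues are coupled, the change A→B at position 0 raises the energy in background "AA" but lowers it in background "AB", while the field-only estimate is zero in both.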
Affiliation(s)
- Matteo Figliuzzi
- UPMC, Institut de Calcul et de la Simulation, Sorbonne Universités, Paris, France; Computational and Quantitative Biology, UPMC, UMR 7238, Sorbonne Universités, Paris, France; Computational and Quantitative Biology, CNRS, UMR 7238, Paris, France
- Hervé Jacquier
- Infection, Antimicrobials, Modelling, Evolution, INSERM, Université Denis Diderot Paris 7, UMR 1137, Sorbonne Paris Cité, Paris, France; Service de Bactériologie-Virologie, Groupe Hospitalier Lariboisière-Fernand Widal, Assistance Publique-Hôpitaux de Paris (AP-HP), Paris, France
- Alexander Schug
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
- Olivier Tenaillon
- Infection, Antimicrobials, Modelling, Evolution, INSERM, Université Denis Diderot Paris 7, UMR 1137, Sorbonne Paris Cité, Paris, France
- Martin Weigt
- Computational and Quantitative Biology, UPMC, UMR 7238, Sorbonne Universités, Paris, France; Computational and Quantitative Biology, CNRS, UMR 7238, Paris, France
82
De Leonardis E, Lutz B, Ratz S, Cocco S, Monasson R, Schug A, Weigt M. Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction. Nucleic Acids Res 2015; 43:10444-55. [PMID: 26420827 PMCID: PMC4666395 DOI: 10.1093/nar/gkv932] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Received: 06/02/2015] [Accepted: 09/07/2015] [Indexed: 12/16/2022]
Abstract
Despite the biological importance of non-coding RNAs, their structural characterization remains challenging. Making use of the rapidly growing sequence databases, we analyze nucleotide coevolution across homologous sequences via Direct-Coupling Analysis to detect nucleotide-nucleotide contacts. For a representative set of riboswitches, we show that the results of Direct-Coupling Analysis in combination with a generalized Nussinov algorithm systematically improve the results of RNA secondary structure prediction beyond traditional covariance approaches based on mutual information. Even more importantly, we show that the results of Direct-Coupling Analysis are enriched in tertiary structure contacts. By integrating these predictions into molecular modeling tools, systematically improved tertiary structure predictions can be obtained, compared with using secondary structure information alone.
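The classic Nussinov algorithm finds the non-crossing set of base pairs that maximises the number of pairs by dynamic programming. Below is a minimal sketch of a weighted variant, assuming that "generalized" means each candidate pair (k, j) contributes a score, such as a DCA coupling, instead of a fixed reward of 1. The exact scoring and pairing rules used by the authors are not given here, and the function name and `min_loop` default are assumptions.

```python
def max_weight_pairing(scores, min_loop=3):
    """Nussinov-style DP: choose non-crossing pairs (k, j) maximising the sum of scores[k][j].

    scores: n x n matrix read only for k < j; entries > 0 mark allowed pairs.
    min_loop: minimum number of unpaired bases enclosed by a hairpin.
    Returns (best total score, sorted list of chosen pairs).
    """
    n = len(scores)
    dp = [[0.0] * n for _ in range(n)]  # dp[i][j] = best score on segment i..j

    def pair_value(i, j, k):
        # Score if j pairs with k: best left of k + best strictly inside (k, j) + the pair's own score.
        left = dp[i][k - 1] if k > i else 0.0
        return left + dp[k + 1][j - 1] + scores[k][j]

    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if scores[k][j] > 0:
                    best = max(best, pair_value(i, j, k))
            dp[i][j] = best

    # Trace back one optimal structure; recomputing with identical arithmetic reproduces dp exactly.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if scores[k][j] > 0 and dp[i][j] == pair_value(i, j, k):
                pairs.append((k, j))
                stack.extend([(i, k - 1), (k + 1, j - 1)])
                break
    return (dp[0][n - 1] if n else 0.0), sorted(pairs)
```

Because crossing pairs are excluded by construction, this recursion yields secondary structure only; the tertiary contacts highlighted in the abstract would have to be handled separately.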
Affiliation(s)
- Eleonora De Leonardis
- Computational and Quantitative Biology, Sorbonne Universités, Université Pierre et Marie Curie, UMR 7238, 75006 Paris, France; Computational and Quantitative Biology, CNRS, UMR 7238, 75006 Paris, France; Laboratoire de Physique Statistique de l'Ecole Normale Supérieure, associé au CNRS et à l'Université Pierre et Marie Curie, 75005 Paris, France
- Benjamin Lutz
- Steinbuch Centre for Computing, Karlsruher Institut für Technologie, 76133 Karlsruhe, Germany; Fakultät für Physik, Karlsruher Institut für Technologie, 76133 Karlsruhe, Germany
- Sebastian Ratz
- Steinbuch Centre for Computing, Karlsruher Institut für Technologie, 76133 Karlsruhe, Germany; Fakultät für Physik, Karlsruher Institut für Technologie, 76133 Karlsruhe, Germany
- Simona Cocco
- Laboratoire de Physique Statistique de l'Ecole Normale Supérieure, associé au CNRS et à l'Université Pierre et Marie Curie, 75005 Paris, France
- Rémi Monasson
- Laboratoire de Physique Théorique de l'Ecole Normale Supérieure, associé au CNRS et à l'Université Pierre et Marie Curie, 75005 Paris, France
- Alexander Schug
- Steinbuch Centre for Computing, Karlsruher Institut für Technologie, 76133 Karlsruhe, Germany
- Martin Weigt
- Computational and Quantitative Biology, Sorbonne Universités, Université Pierre et Marie Curie, UMR 7238, 75006 Paris, France; Computational and Quantitative Biology, CNRS, UMR 7238, 75006 Paris, France
83
dos Santos RN, Morcos F, Jana B, Andricopulo AD, Onuchic JN. Dimeric interactions and complex formation using direct coevolutionary couplings. Sci Rep 2015; 5:13652. [PMID: 26338201 PMCID: PMC4559900 DOI: 10.1038/srep13652] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Received: 01/05/2015] [Accepted: 07/13/2015] [Indexed: 11/09/2022]
Abstract
We develop a procedure to characterize the association of protein structures into homodimers using coevolutionary couplings extracted from Direct Coupling Analysis (DCA) in combination with Structure Based Models (SBM). Identifying dimerization contacts with DCA is more challenging than identifying intradomain contacts, since the direct couplings are mixed with monomeric contacts. Therefore, a systematic way to extract dimerization signals has been elusive. We provide evidence that homodimeric complexes can be predicted with high accuracy for all the studied cases that have rich sequence information. For the most accurate conformations of the structurally diverse dimeric complexes studied, the mean and interfacial RMSDs are 1.95 Å and 1.44 Å, respectively. This methodology is also able to identify distinct dimerization conformations, as in the case of the family of response regulators, which dimerize upon activation. The identification of dimeric complexes can provide molecular insights into the construction of large oligomeric complexes and may be useful in the study of aggregation-related diseases such as Alzheimer's or Parkinson's.
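The RMSD values quoted above follow the standard root-mean-square deviation over matched atoms. A minimal sketch, assuming the two coordinate sets are already optimally superimposed (the rotation and translation step, e.g. a Kabsch alignment, is omitted) and that the caller supplies which positions count as interface; the function names are hypothetical:

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between matched (x, y, z) points; assumes the two structures are already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def interface_rmsd(coords_a, coords_b, interface_idx):
    """Same measure restricted to caller-chosen positions (e.g. residues at the dimer interface)."""
    return rmsd([coords_a[k] for k in interface_idx], [coords_b[k] for k in interface_idx])
```

In practice the interfacial value is often computed after superimposing on the interface residues themselves, which is one reason it can be lower than the global value.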
Affiliation(s)
- Ricardo N. dos Santos
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005-1827
- Laboratório de Química Medicinal e Computacional, Instituto de Física de São Carlos, Universidade de São Paulo, São Paulo, São Carlos, 13563-120, Brazil
- Faruck Morcos
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005-1827
- Biman Jana
- Department of Physical Chemistry, Indian Association for the Cultivation of Science, Jadavpur, Kolkata-700032, India
- Adriano D. Andricopulo
- Laboratório de Química Medicinal e Computacional, Instituto de Física de São Carlos, Universidade de São Paulo, São Paulo, São Carlos, 13563-120, Brazil
- José N. Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005-1827
84
Identification of Protein–Protein Interactions by Detecting Correlated Mutation at the Interface. J Chem Inf Model 2015; 55:2042-9. [DOI: 10.1021/acs.jcim.5b00320] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 02/01/2023]
85
Avila-Herrera A, Pollard KS. Coevolutionary analyses require phylogenetically deep alignments and better null models to accurately detect inter-protein contacts within and between species. BMC Bioinformatics 2015; 16:268. [PMID: 26303588 PMCID: PMC4549020 DOI: 10.1186/s12859-015-0677-y] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 02/20/2015] [Accepted: 07/17/2015] [Indexed: 01/09/2023]
Abstract
BACKGROUND When biomolecules physically interact, natural selection operates on them jointly. Contacting positions in protein and RNA structures exhibit correlated patterns of sequence evolution due to constraints imposed by the interaction, and molecular arms races can develop between interacting proteins in pathogens and their hosts. To evaluate how well methods developed to detect coevolving residues within proteins can be adapted for cross-species, inter-protein analysis, we used statistical criteria to quantify the performance of these methods in detecting inter-protein residues within 8 angstroms of each other in the co-crystal structures of 33 bacterial protein interactions. We also evaluated their performance for detecting known residues at the interface of a host-virus protein complex with a partially solved structure. RESULTS Our quantitative benchmarking showed that all coevolutionary methods clearly benefit from alignments with many sequences. Methods that aim to detect direct correlations generally outperform other approaches. However, faster mutual information based methods are occasionally competitive in small alignments and with relaxed false positive rates. Two commonly used null distributions are anti-conservative and have high false positive rates in some scenarios, although the empirical distribution of scores performs reasonably well with deep alignments. CONCLUSIONS We conclude that coevolutionary analysis of cross-species protein interactions holds great promise but requires sequencing many more species pairs.
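The mutual-information baseline mentioned above scores a pair of alignment columns i and j by MI_ij = Σ_{a,b} f_ij(a,b) ln[f_ij(a,b) / (f_i(a) f_j(b))], where f are observed frequencies. A minimal sketch using raw frequencies follows; it assumes equal-length aligned sequences and omits pseudocounts, sequence reweighting and any null-model correction, so its scores are not directly comparable with the benchmarked methods. The function name is hypothetical.

```python
import math
from collections import Counter


def column_mi(msa, i, j):
    """Mutual information (in nats) between columns i and j of an alignment (list of equal-length strings)."""
    n = len(msa)
    f_i = Counter(seq[i] for seq in msa)
    f_j = Counter(seq[j] for seq in msa)
    f_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    # Only observed residue pairs contribute, so log(0) never occurs.
    for (a, b), count in f_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((f_i[a] / n) * (f_j[b] / n)))
    return mi
```

Two columns that covary perfectly between two equally frequent states give ln 2, while independent columns give 0. Because MI also picks up indirect (transitive) correlations, methods that aim at direct correlations tend to outperform it, as the abstract reports.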
Affiliation(s)
- Aram Avila-Herrera
- Bioinformatics Graduate Program, University of California, San Francisco, USA; Gladstone Institute of Cardiovascular Disease, University of California, San Francisco, USA
- Katherine S Pollard
- Bioinformatics Graduate Program, University of California, San Francisco, USA; Gladstone Institute of Cardiovascular Disease, University of California, San Francisco, USA; Department of Epidemiology and Biostatistics, University of California, San Francisco, USA; Institute for Human Genetics, University of California, San Francisco, CA 94158, USA
86
Sikosek T, Chan HS. Biophysics of protein evolution and evolutionary protein biophysics. J R Soc Interface 2015; 11:20140419. [PMID: 25165599 DOI: 10.1098/rsif.2014.0419] [Citation(s) in RCA: 163] [Impact Index Per Article: 16.3] [Indexed: 01/05/2023]
Abstract
The study of molecular evolution at the level of protein-coding genes often entails comparing large datasets of sequences to infer their evolutionary relationships. Despite the importance of a protein's structure and conformational dynamics to its function and thus its fitness, common phylogenetic methods embody minimal biophysical knowledge of proteins. To underscore the biophysical constraints on natural selection, we survey effects of protein mutations, highlighting the physical basis for the marginal stability of natural globular proteins and how the requirements for kinetic stability and for the avoidance of misfolding and misinteractions might have affected protein evolution. The biophysical underpinnings of these effects have been addressed by models with an explicit coarse-grained spatial representation of the polypeptide chain. Sequence-structure mappings based on such models are powerful conceptual tools that rationalize mutational robustness, evolvability, epistasis, promiscuous function performed by 'hidden' conformational states, resolution of adaptive conflicts and conformational switches in the evolution from one protein fold to another. Recently, protein biophysics has been applied to derive more accurate evolutionary accounts of sequence data. Methods have also been developed to exploit sequence-based evolutionary information to predict biophysical behaviours of proteins. The success of these approaches demonstrates a deep synergy between the fields of protein biophysics and protein evolution.
Affiliation(s)
- Tobias Sikosek
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
- Hue Sun Chan
- Department of Biochemistry, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada M5S 1A8; Department of Physics, University of Toronto, Toronto, Ontario, Canada M5S 1A8
87
Raimondi D, Orlando G, Vranken WF. Clustering-based model of cysteine co-evolution improves disulfide bond connectivity prediction and reduces homologous sequence requirements. Bioinformatics 2014; 31:1219-25. [DOI: 10.1093/bioinformatics/btu794] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 06/18/2014] [Accepted: 11/18/2014] [Indexed: 12/23/2022]
88
Tamir S, Paddock ML, Darash-Yahana-Baram M, Holt SH, Sohn YS, Agranat L, Michaeli D, Stofleth JT, Lipper CH, Morcos F, Cabantchik IZ, Onuchic JN, Jennings PA, Mittler R, Nechushtai R. Structure-function analysis of NEET proteins uncovers their role as key regulators of iron and ROS homeostasis in health and disease. Biochim Biophys Acta 2014; 1853:1294-315. [PMID: 25448035 DOI: 10.1016/j.bbamcr.2014.10.014] [Citation(s) in RCA: 120] [Impact Index Per Article: 10.9] [Received: 07/04/2014] [Revised: 10/01/2014] [Accepted: 10/16/2014] [Indexed: 12/31/2022]
Abstract
A novel family of 2Fe-2S proteins, the NEET family, was discovered during the last decade in numerous organisms, including archaea, bacteria, algae, plants and humans, suggesting an evolutionarily conserved function, potentially mediated by their CDGSH Iron-Sulfur Domain. In humans, three NEET members, encoded by the CISD1-3 genes, were identified. The structures of CISD1 (mitoNEET, mNT), CISD2 (NAF-1), and the plant At-NEET uncovered a homodimer with a unique "NEET fold", as well as two distinct domains: a beta-cap and a 2Fe-2S cluster-binding domain. The 2Fe-2S clusters of NEET proteins were found to be coordinated by a novel 3Cys:1His structure that is relatively labile compared with that of other 2Fe-2S proteins; this lability is what allows NEET clusters to be transferred to apo-acceptor protein(s) or mitochondria. Because it is positioned at the protein surface, the His that coordinates the NEET 2Fe-2S cluster is exposed to protonation upon changes in its environment, potentially suggesting a sensing function for this residue. Studies in different model systems demonstrated a role for NAF-1 and mNT in the regulation of cellular iron, calcium and ROS homeostasis, and uncovered a key role for NEET proteins in critical processes, such as cancer cell proliferation and tumor growth, lipid and glucose homeostasis in obesity and diabetes, control of autophagy, longevity in mice, and senescence in plants. Abnormal regulation of NEET proteins was consequently found to result in multiple health conditions, and aberrant splicing of NAF-1 was found to be a cause of the neurological genetic disorder Wolfram Syndrome 2. Here we review the discovery of NEET proteins, their structural, biochemical and biophysical characterization, and their most recent structure-function analyses. We additionally highlight future avenues of research focused on NEET proteins and propose an essential role for NEETs in health and disease.
This article is part of a Special Issue entitled: Fe/S proteins: Analysis, structure, function, biogenesis and diseases.
Affiliation(s)
- Sagi Tamir
- The Alexander Silberman Life Science Institute and the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Edmond J. Safra Campus at Givat Ram, Jerusalem 91904, Israel
- Mark L Paddock
- Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, CA 92093, USA
- Merav Darash-Yahana-Baram
- The Alexander Silberman Life Science Institute and the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Edmond J. Safra Campus at Givat Ram, Jerusalem 91904, Israel
- Sarah H Holt
- Department of Biology, University of North Texas, Denton, TX 76203, USA
- Yang Sung Sohn
- The Alexander Silberman Life Science Institute and the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Edmond J. Safra Campus at Givat Ram, Jerusalem 91904, Israel
- Lily Agranat
- The Alexander Silberman Life Science Institute and the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Edmond J. Safra Campus at Givat Ram, Jerusalem 91904, Israel
- Dorit Michaeli
- The Alexander Silberman Life Science Institute and the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Edmond J. Safra Campus at Givat Ram, Jerusalem 91904, Israel
- Jason T Stofleth
- Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, CA 92093, USA
- Colin H Lipper
- Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, CA 92093, USA
- Faruck Morcos
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77050, USA; Department of Physics and Astronomy, Rice University, Houston, TX 77050, USA; Department of Chemistry, Rice University, Houston, TX 77050, USA; Department of Biochemistry and Cell Biology, Rice University, Houston, TX 77050, USA
- Ioav Z Cabantchik
- The Alexander Silberman Life Science Institute and the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Edmond J. Safra Campus at Givat Ram, Jerusalem 91904, Israel
- José N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77050, USA; Department of Physics and Astronomy, Rice University, Houston, TX 77050, USA; Department of Chemistry, Rice University, Houston, TX 77050, USA; Department of Biochemistry and Cell Biology, Rice University, Houston, TX 77050, USA
- Patricia A Jennings
- Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, CA 92093, USA
- Ron Mittler
- Department of Biology, University of North Texas, Denton, TX 76203, USA
- Rachel Nechushtai
- The Alexander Silberman Life Science Institute and the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Edmond J. Safra Campus at Givat Ram, Jerusalem 91904, Israel
89
Michel M, Hayat S, Skwark MJ, Sander C, Marks DS, Elofsson A. PconsFold: improved contact predictions improve protein models. Bioinformatics 2014; 30:i482-8. [PMID: 25161237 PMCID: PMC4147911 DOI: 10.1093/bioinformatics/btu458] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.7] [Indexed: 01/25/2023]
Abstract
MOTIVATION Recently it has been shown that the quality of protein contact prediction from evolutionary information can be improved significantly if direct and indirect information is separated. Given sufficiently large protein families, the contact predictions contain sufficient information to predict the structure of many protein families. However, contact prediction methods have improved since the first studies. Here, we ask how much the final models are improved if improved contact predictions are used. RESULTS In a small benchmark of 15 proteins, we show that the TM-scores of top-ranked models are improved by 33% on average using PconsFold compared with the original version of EVfold. In a larger benchmark, we find that the quality is improved by 15-30% when using PconsC in comparison with earlier contact prediction methods. Further, using Rosetta instead of CNS does not significantly improve global model accuracy, but the chemistry of models generated with Rosetta is improved. AVAILABILITY PconsFold is a fully automated pipeline for ab initio protein structure prediction based on evolutionary information. PconsFold is based on PconsC contact prediction and uses the Rosetta folding protocol. Due to its modularity, the contact prediction tool can be easily exchanged. The source code of PconsFold is available on GitHub at https://www.github.com/ElofssonLab/pcons-fold under the MIT license. PconsC is available from http://c.pcons.net/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mirco Michel
- Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden; Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden; Department of Systems Biology, Harvard Medical School, Boston, MA, USA; Department of Information and Computer Science, Aalto University, PO Box 15400, FI-00076 Aalto, Finland; Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
- Sikander Hayat
- Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden; Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden; Department of Systems Biology, Harvard Medical School, Boston, MA, USA; Department of Information and Computer Science, Aalto University, PO Box 15400, FI-00076 Aalto, Finland; Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
- Marcin J Skwark
- Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden; Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden; Department of Systems Biology, Harvard Medical School, Boston, MA, USA; Department of Information and Computer Science, Aalto University, PO Box 15400, FI-00076 Aalto, Finland; Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
- Chris Sander
- Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden; Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden; Department of Systems Biology, Harvard Medical School, Boston, MA, USA; Department of Information and Computer Science, Aalto University, PO Box 15400, FI-00076 Aalto, Finland; Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
- Debora S Marks
- Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden; Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden; Department of Systems Biology, Harvard Medical School, Boston, MA, USA; Department of Information and Computer Science, Aalto University, PO Box 15400, FI-00076 Aalto, Finland; Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
- Arne Elofsson
- Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden; Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden; Department of Systems Biology, Harvard Medical School, Boston, MA, USA; Department of Information and Computer Science, Aalto University, PO Box 15400, FI-00076 Aalto, Finland; Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
90
Native structure-based modeling and simulation of biomolecular systems per mouse click. BMC Bioinformatics 2014; 15:292. [PMID: 25176255 PMCID: PMC4162935 DOI: 10.1186/1471-2105-15-292] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 06/02/2014] [Accepted: 08/22/2014] [Indexed: 01/25/2023]
Abstract
BACKGROUND Molecular dynamics (MD) simulations provide valuable insight into biomolecular systems at the atomic level. Notwithstanding the ever-increasing power of high-performance computers, current MD simulations face several challenges: the fastest atomic movements require time steps of a few femtoseconds, which are small compared with biologically relevant timescales of milliseconds or even seconds for large conformational motions. At the same time, scalability to a large number of cores is limited, mostly due to long-range interactions. An appealing alternative to atomic-level simulations is to coarse-grain the resolution of the system or to reduce the complexity of the Hamiltonian, improving sampling while decreasing computational costs. Native structure-based models, also called Gō-type models, are based on energy landscape theory and the principle of minimal frustration. They have been tremendously successful in addressing fundamental questions of, e.g., protein folding, RNA folding or protein function. At the same time, they are computationally inexpensive enough to run complex simulations on smaller computing systems or even commodity hardware. Still, their setup and evaluation are quite complex, even though sophisticated software packages support their realization. RESULTS Here, we establish an efficient infrastructure for native structure-based models to support the community and enable high-throughput simulations on remote computing resources via GridBeans and UNICORE middleware. This infrastructure organizes the setup of such simulations, resulting in increased comparability of simulation results. At the same time, complete workflows for advanced simulation protocols can be established and managed on remote resources by a graphical interface, which increases the reusability of protocols and additionally lowers the entry barrier into such simulations for, e.g., experimental scientists who want to compare their results against simulations.
We demonstrate the power of this approach by illustrating it for protein folding simulations for a range of proteins. CONCLUSIONS We present software enhancing the entire workflow for native structure-based simulations, including exception handling and evaluations. By extending the capability and improving the accessibility of existing simulation packages, the software goes beyond the state of the art in the domain of biomolecular simulations. Thus we expect that it will encourage more members of the community to employ modeling in their research with confidence.
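In a Gō-type model, only contacts present in the native structure are attractive. A minimal sketch of a native-contact energy, assuming the 10-12 form V(r) = ε[5(σ/r)^12 − 6(σ/r)^10] widely used in Cα structure-based models, where σ is the pair's distance in the native structure; the function names and the non-native parameters are placeholders, not values taken from the software described here:

```python
def native_contact_energy(r, sigma, eps=1.0):
    """10-12 native-contact potential: equals -eps at the native distance r == sigma and tends to 0 far away."""
    x = sigma / r
    return eps * (5.0 * x ** 12 - 6.0 * x ** 10)


def non_native_repulsion(r, sigma_nn=4.0, eps_nn=1.0):
    """Purely repulsive term for pairs that are not native contacts (placeholder parameters)."""
    return eps_nn * (sigma_nn / r) ** 12


def native_energy(distances, native_distances, eps=1.0):
    """Sum over the native contact list; equals -eps * len(list) when every contact sits at its native distance."""
    return sum(native_contact_energy(r, s, eps) for r, s in zip(distances, native_distances))
```

Because each native contact has its minimum exactly at its native distance, the native structure is the global energy minimum by construction, which is the minimal-frustration idea the abstract refers to.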
91
Thomas N, Best K, Cinelli M, Reich-Zeliger S, Gal H, Shifrut E, Madi A, Friedman N, Shawe-Taylor J, Chain B. Tracking global changes induced in the CD4 T-cell receptor repertoire by immunization with a complex antigen using short stretches of CDR3 protein sequence. Bioinformatics 2014; 30:3181-8. [PMID: 25095879 PMCID: PMC4221123 DOI: 10.1093/bioinformatics/btu523] [Citation(s) in RCA: 68] [Impact Index Per Article: 6.2] [Indexed: 12/16/2022]
Abstract
Motivation: The clonal theory of adaptive immunity proposes that immunological responses are encoded by increases in the frequency of lymphocytes carrying antigen-specific receptors. In this study, we measure the frequency of different T-cell receptors (TcR) in CD4 + T cell populations of mice immunized with a complex antigen, killed Mycobacterium tuberculosis, using high throughput parallel sequencing of the TcRβ chain. Our initial hypothesis that immunization would induce repertoire convergence proved to be incorrect, and therefore an alternative approach was developed that allows accurate stratification of TcR repertoires and provides novel insights into the nature of CD4 + T-cell receptor recognition. Results: To track the changes induced by immunization within this heterogeneous repertoire, the sequence data were classified by counting the frequency of different clusters of short (3 or 4) continuous stretches of amino acids within the antigen binding complementarity determining region 3 (CDR3) repertoire of different mice. Both unsupervised (hierarchical clustering) and supervised (support vector machine) analyses of these different distributions of sequence clusters differentiated between immunized and unimmunized mice with 100% efficiency. The CD4 + TcR repertoires of mice 5 and 14 days postimmunization were clearly different from that of unimmunized mice but were not distinguishable from each other. However, the repertoires of mice 60 days postimmunization were distinct both from naive mice and the day 5/14 animals. Our results reinforce the remarkable diversity of the TcR repertoire, resulting in many diverse private TcRs contributing to the T-cell response even in genetically identical mice responding to the same antigen. However, specific motifs defined by short stretches of amino acids within the CDR3 region may determine TcR specificity and define a new approach to TcR sequence classification. 
Availability and implementation: The analysis was implemented in R and Python, and source code can be found in Supplementary Data.
Contact: b.chain@ucl.ac.uk
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Niclas Thomas
- UCL CoMPLEX, UCL Division of Infection and Immunity, London WC1 6BT, UK, Weizmann Institute of Science, Rehovot 76000, Israel and UCL Computer Science, London WC1E 6BT, UK
- Katharine Best
- UCL CoMPLEX, UCL Division of Infection and Immunity, London WC1 6BT, UK, Weizmann Institute of Science, Rehovot 76000, Israel and UCL Computer Science, London WC1E 6BT, UK
- Mattia Cinelli
- UCL CoMPLEX, UCL Division of Infection and Immunity, London WC1 6BT, UK, Weizmann Institute of Science, Rehovot 76000, Israel and UCL Computer Science, London WC1E 6BT, UK
- Shlomit Reich-Zeliger
- UCL CoMPLEX, UCL Division of Infection and Immunity, London WC1 6BT, UK, Weizmann Institute of Science, Rehovot 76000, Israel and UCL Computer Science, London WC1E 6BT, UK
- Hilah Gal
- UCL CoMPLEX, UCL Division of Infection and Immunity, London WC1 6BT, UK, Weizmann Institute of Science, Rehovot 76000, Israel and UCL Computer Science, London WC1E 6BT, UK
- Eric Shifrut
- UCL CoMPLEX, UCL Division of Infection and Immunity, London WC1 6BT, UK, Weizmann Institute of Science, Rehovot 76000, Israel and UCL Computer Science, London WC1E 6BT, UK
- Asaf Madi
- UCL CoMPLEX, UCL Division of Infection and Immunity, London WC1 6BT, UK, Weizmann Institute of Science, Rehovot 76000, Israel and UCL Computer Science, London WC1E 6BT, UK
- Nir Friedman
- UCL CoMPLEX, UCL Division of Infection and Immunity, London WC1 6BT, UK, Weizmann Institute of Science, Rehovot 76000, Israel and UCL Computer Science, London WC1E 6BT, UK
- John Shawe-Taylor
- UCL CoMPLEX, UCL Division of Infection and Immunity, London WC1 6BT, UK, Weizmann Institute of Science, Rehovot 76000, Israel and UCL Computer Science, London WC1E 6BT, UK
- Benny Chain
- UCL CoMPLEX, UCL Division of Infection and Immunity, London WC1 6BT, UK, Weizmann Institute of Science, Rehovot 76000, Israel and UCL Computer Science, London WC1E 6BT, UK
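The Results paragraph of this abstract describes classifying repertoires by counting short (3- or 4-residue) contiguous stretches of CDR3 amino-acid sequence. The following is a minimal sketch of that counting step. It assumes that each CDR3 is a plain amino-acid string and that overlapping windows are counted. The function name `kmer_profile` and the toy sequences are hypothetical. The sketch does not reproduce the authors' R/Python code or their clustering of related stretches.

```python
from collections import Counter
from typing import Dict, Iterable


def kmer_profile(cdr3_sequences: Iterable[str], k: int = 3) -> Dict[str, float]:
    """Relative frequency of every contiguous length-k amino-acid stretch
    (overlapping windows) pooled over a repertoire of CDR3 sequences."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    counts = Counter()
    for seq in cdr3_sequences:
        # Slide a width-k window along the CDR3; sequences shorter than k add nothing.
        for start in range(len(seq) - k + 1):
            counts[seq[start:start + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy repertoire, not sequences from the study.
    toy_repertoire = ["CASSL", "CASSQ"]
    print(kmer_profile(toy_repertoire, k=3))
```

In the toy input, CAS and ASS each account for 2 of the 6 windows, and SSL and SSQ account for 1 each. One such profile per mouse could then be compared by hierarchical clustering or a support vector machine, as the abstract describes. That comparison step is not shown here.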
92
Liu Z, Zheng G, Dong X, Wang Z, Ying B, Zhong Y, Li Y. Investigating co-evolution of functionally associated phosphosites in human. Mol Genet Genomics 2014; 289:1217-23. [PMID: 25005854 DOI: 10.1007/s00438-014-0881-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2013] [Accepted: 06/19/2014] [Indexed: 11/30/2022]
Abstract
Phosphorylation is essential for protein function and signal transduction in eukaryotic cells. With the rapid development of mass spectrometry technology, a large number of phosphosites have been identified. However, high-throughput methods for the functional characterization of phosphosites are still scarce. In this study, we examined whether co-evolution can be used as an indicator of phosphosite function by investigating the co-evolutionary relationships between functionally associated phosphosites in human. In practice, the evolutionary attributes of phosphosites were represented as phylogenetic profiles, and co-evolutionary correlations of functionally associated phosphosites were then detected at three levels: (1) phosphosites within one protein; (2) phosphosites in different proteins participating in the same signal transduction pathways; and (3) general phosphosites. The results show that co-evolution is a general property of functionally associated phosphosites. This finding suggests that co-evolution can, to some degree, be used to explore the function of phosphosites and to investigate the functional associations between them.
Affiliation(s)
- Zhi Liu
- Key Laboratory of Systems Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yueyang Rd., Shanghai, 200031, People's Republic of China
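This abstract represents each phosphosite by a phylogenetic profile and tests pairs of phosphosites for co-evolutionary correlation. It does not specify the profile encoding or the statistic used. The sketch below makes two assumptions: profiles are binary presence/absence vectors over a fixed list of species, and Pearson correlation is the statistic. The site names and profiles are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two phylogenetic profiles of equal length."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # A constant profile (e.g. present in every species) carries no co-variation signal.
        return 0.0
    return cov / math.sqrt(vx * vy)


# Hypothetical profiles over seven species: 1 = phosphosite present, 0 = absent.
profiles = {
    "site_A": [1, 1, 0, 1, 0, 0, 1],
    "site_B": [1, 1, 0, 1, 0, 0, 1],
    "site_C": [0, 1, 1, 0, 1, 0, 0],
}

if __name__ == "__main__":
    print(pearson(profiles["site_A"], profiles["site_B"]))
    print(pearson(profiles["site_A"], profiles["site_C"]))
```

The identical profiles of site_A and site_B correlate perfectly. The profiles of site_A and site_C give -5/12 (about -0.42). Testing a correlation for significance, for example against shuffled profiles, would be a further step and is not shown.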
93
Andreani J, Guerois R. Evolution of protein interactions: From interactomes to interfaces. Arch Biochem Biophys 2014; 554:65-75. [DOI: 10.1016/j.abb.2014.05.010] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 02/10/2014] [Revised: 04/28/2014] [Accepted: 05/12/2014] [Indexed: 12/16/2022]
94
Sinner C, Lutz B, John S, Reinartz I, Verma A, Schug A. Simulating Biomolecular Folding and Function by Native-Structure-Based/Go-Type Models. Isr J Chem 2014. [DOI: 10.1002/ijch.201400012] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 12/11/2022]
95
Ovchinnikov S, Kamisetty H, Baker D. Robust and accurate prediction of residue-residue interactions across protein interfaces using evolutionary information. eLife 2014; 3:e02030. [PMID: 24842992 PMCID: PMC4034769 DOI: 10.7554/elife.02030] [Citation(s) in RCA: 461] [Impact Index Per Article: 41.9] [Indexed: 12/24/2022]
Abstract
Do the amino acid sequence identities of residues that make contact across protein interfaces covary during evolution? If so, such covariance could be used to predict contacts across interfaces and assemble models of biological complexes. We find that residue pairs identified using a pseudo-likelihood-based method to covary across protein–protein interfaces in the 50S ribosomal subunit and 28 additional bacterial protein complexes with known structure are almost always in contact in the complex, provided that the number of aligned sequences is greater than the average length of the two proteins. We use this method to make subunit contact predictions for an additional 36 protein complexes with unknown structures, and present models based on these predictions for the tripartite ATP-independent periplasmic (TRAP) transporter, the tripartite efflux system, the pyruvate formate lyase-activating enzyme complex, and the methionine ABC transporter. DOI: http://dx.doi.org/10.7554/eLife.02030.001
Proteins are considered the ‘workhorse molecules’ of life and they are involved in virtually everything that cells do. Proteins are strings of amino acids that have folded into a specific three-dimensional shape. Proteins must have the correct shape to function properly, as they often work by binding to other proteins or molecules, much like a key fitting into a lock. Working out the structure of a protein can, therefore, provide major insights into how the protein does its job. Two or more proteins can bind together and form a complex to perform various tasks, and solving the structures of these complexes can be challenging, even if the structures of the protein subunits are known. Now, Ovchinnikov, Kamisetty, and Baker have developed a method for predicting which parts of the proteins make contact with each other in a two-protein complex.
Different species can have copies of the same proteins, but a copy from one species might have different amino acids at certain positions when compared to a related copy from another species. As such, when pairs of interacting proteins from different species are compared, there will be many positions in the two proteins that vary. However, if the amino acid at a position in one protein (let's call it ‘X’) varies, and the amino acid at, say, position ‘Y’ in the other protein also varies such that for any given amino acid at position Y there is often a specific amino acid at position X, then positions X and Y are said to ‘co-vary’. Ovchinnikov et al. noticed that when a pair of amino acids (one from each protein in a two-protein complex) co-varied, these two amino acids tended to make contact with each other at the protein–protein interface. Ovchinnikov et al. used the new method to make predictions about the protein–protein interfaces in 28 protein complexes found in bacteria, and also to make a prediction about the interface between protein subunits in the bacterial ribosome. When these predictions were checked against the actual structures, which were all known beforehand, they were found to be accurate if the number of copies of each protein being compared was greater than the average length of the two proteins. Ovchinnikov et al. went on to predict the amino acids on the protein–protein interfaces for another 36 bacterial protein complexes with unknown structures, and to present models for four larger complexes. The next challenge is to extend the method to protein complexes that are found only in eukaryotes (i.e., not in bacteria). Since the number of related copies for eukaryotic proteins tends to be smaller, there are fewer proteins to compare and it is therefore harder to detect ‘covariation’ when it occurs. DOI: http://dx.doi.org/10.7554/eLife.02030.002
Affiliation(s)
- Sergey Ovchinnikov
- Department of Biochemistry, Howard Hughes Medical Institute, University of Washington, Seattle, United States
- Molecular and Cellular Biology Program, University of Washington, Seattle, United States
- Hetunandan Kamisetty
- Department of Biochemistry, Howard Hughes Medical Institute, University of Washington, Seattle, United States
- Facebook Inc., Seattle, United States
- David Baker
- Department of Biochemistry, Howard Hughes Medical Institute, University of Washington, Seattle, United States
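The abstract and digest give a practical threshold: covarying interface pairs were almost always true contacts when the number of aligned sequences exceeded the average length of the two proteins. The helper below expresses that rule of thumb as a check. The function names are hypothetical. `n_sequences` is assumed to count rows of the alignment pairing both partners. The threshold is a heuristic taken from the abstract, not a guarantee.

```python
def alignment_depth_ratio(n_sequences: int, length_a: int, length_b: int) -> float:
    """Number of aligned sequences divided by the mean length of the two proteins."""
    if n_sequences < 0 or length_a <= 0 or length_b <= 0:
        raise ValueError("counts must be non-negative and lengths positive")
    mean_length = (length_a + length_b) / 2
    return n_sequences / mean_length


def covariation_likely_reliable(n_sequences: int, length_a: int, length_b: int) -> bool:
    """Rule of thumb from the abstract: strictly more aligned sequences than the
    average length of the two proteins."""
    return alignment_depth_ratio(n_sequences, length_a, length_b) > 1.0


if __name__ == "__main__":
    # Hypothetical complex: partners of 250 and 400 residues (mean length 325).
    for depth in (300, 325, 500):
        print(depth, covariation_likely_reliable(depth, 250, 400))
```

For this hypothetical pair, 300 or 325 sequences fall at or below the threshold, while 500 sequences pass it. The digest notes that eukaryotic complexes often have fewer related copies, so they are more likely to fail this check.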
96
Integrated strategy reveals the protein interface between cancer targets Bcl-2 and NAF-1. Proc Natl Acad Sci U S A 2014; 111:5177-82. [PMID: 24706857 DOI: 10.1073/pnas.1403770111] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Indexed: 12/21/2022]
Abstract
Life requires orchestrated control of cell proliferation, cell maintenance, and cell death. Involved in these decisions are protein complexes that assimilate a variety of inputs that report on the status of the cell and lead to an output response. Among the proteins involved in this response are nutrient-deprivation autophagy factor-1 (NAF-1) and Bcl-2. NAF-1 is a homodimeric member of the novel Fe-S protein NEET family, which binds two 2Fe-2S clusters. NAF-1 is an important partner for Bcl-2 at the endoplasmic reticulum to functionally antagonize Beclin 1-dependent autophagy [Chang NC, Nguyen M, Germain M, Shore GC (2010) EMBO J 29(3):606-618]. We used an integrated approach involving peptide array, deuterium exchange mass spectrometry (DXMS), and functional studies aided by the power of sufficient constraints from direct coupling analysis (DCA) to determine the dominant docked conformation of the NAF-1-Bcl-2 complex. NAF-1 binds to both the pro- and antiapoptotic regions (BH3 and BH4) of Bcl-2, as demonstrated by a nested protein fragment analysis in a peptide array and DXMS analysis. A combination of the solution studies together with a new application of DCA to the eukaryotic proteins NAF-1 and Bcl-2 provided sufficient constraints at amino acid resolution to predict the interaction surfaces and orientation of the protein-protein interactions involved in the docked structure. The specific integrated approach described in this paper provides the first structural information, to our knowledge, for future targeting of the NAF-1-Bcl-2 complex in the regulation of apoptosis/autophagy in cancer biology.
97
Baldassi C, Zamparo M, Feinauer C, Procaccini A, Zecchina R, Weigt M, Pagnani A. Fast and accurate multivariate Gaussian modeling of protein families: predicting residue contacts and protein-interaction partners. PLoS One 2014; 9:e92721. [PMID: 24663061 PMCID: PMC3963956 DOI: 10.1371/journal.pone.0092721] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.1] [Received: 12/05/2013] [Accepted: 02/24/2014] [Indexed: 11/18/2022]
Abstract
In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully implemented into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variable (amino-acids), exact inference requires exponential time in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to that achieved by mean-field approximations to inference with discrete variables, as done by direct-coupling analysis. This is true for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partners in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website http://areeweb.polito.it/ricerca/cmp/code.
Affiliation(s)
- Carlo Baldassi
- Department of Applied Science and Technology and Center for Computational Sciences, Politecnico di Torino, Torino, Italy
- Human Genetics Foundation-Torino, Torino, Italy
- Marco Zamparo
- Department of Applied Science and Technology and Center for Computational Sciences, Politecnico di Torino, Torino, Italy
- Human Genetics Foundation-Torino, Torino, Italy
- Christoph Feinauer
- Department of Applied Science and Technology and Center for Computational Sciences, Politecnico di Torino, Torino, Italy
- Riccardo Zecchina
- Department of Applied Science and Technology and Center for Computational Sciences, Politecnico di Torino, Torino, Italy
- Human Genetics Foundation-Torino, Torino, Italy
- Martin Weigt
- Sorbonne Universités, Université Pierre et Marie Curie Paris 06, UMR 7238, Computational and Quantitative Biology, Paris, France
- Centre National de la Recherche Scientifique, UMR 7238, Computational and Quantitative Biology, Paris, France
- Andrea Pagnani
- Department of Applied Science and Technology and Center for Computational Sciences, Politecnico di Torino, Torino, Italy
- Human Genetics Foundation-Torino, Torino, Italy
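The Baldassi et al. abstract describes replacing discrete amino-acid variables with continuous Gaussian variables, which makes the inference problem exactly solvable. The following is a rough sketch of that idea, not the authors' implementation (available at the website given in the abstract). It rests on several assumptions:

- A 21-letter alphabet (20 amino acids plus gap) is one-hot encoded, with the gap state dropped in each column.
- Off-diagonal shrinkage plus a small ridge stands in for the paper's own pseudocount or prior.
- Column pairs are scored by the Frobenius norm of their coupling block, with no further correction.

All function names and parameters are hypothetical.

```python
from typing import Sequence

import numpy as np

# Assumed alphabet: 20 amino acids plus gap, q = 21 states per column.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def one_hot(msa: Sequence[str], alphabet: str = ALPHABET) -> np.ndarray:
    """Encode N aligned sequences of L columns as an N x L*(q-1) 0/1 matrix.
    The last state of each column (here the gap) is dropped, removing the
    trivial linear dependence among the q indicators of one column."""
    q1 = len(alphabet) - 1
    index = {c: i for i, c in enumerate(alphabet)}
    length = len(msa[0])
    if any(len(s) != length for s in msa):
        raise ValueError("all aligned sequences must have the same length")
    x = np.zeros((len(msa), length * q1))
    for row, seq in enumerate(msa):
        for col, residue in enumerate(seq):
            state = index[residue]
            if state < q1:
                x[row, col * q1 + state] = 1.0
    return x


def gaussian_coupling_scores(msa: Sequence[str], shrinkage: float = 0.5,
                             ridge: float = 1e-3, alphabet: str = ALPHABET) -> np.ndarray:
    """L x L matrix of pairwise scores from a Gaussian model of the one-hot data:
    couplings are the negated inverse of a regularised covariance matrix, and each
    column pair is scored by the Frobenius norm of its (q-1) x (q-1) block."""
    q1 = len(alphabet) - 1
    x = one_hot(msa, alphabet)
    length = x.shape[1] // q1
    cov = np.cov(x, rowvar=False, bias=True)
    # Scale off-diagonal terms by (1 - shrinkage) and add a ridge so the matrix is
    # invertible (an assumption of this sketch, not the published regulariser).
    cov = (1 - shrinkage) * cov + shrinkage * np.diag(np.diag(cov)) + ridge * np.eye(cov.shape[0])
    couplings = -np.linalg.inv(cov)
    scores = np.zeros((length, length))
    for i in range(length):
        for j in range(i + 1, length):
            block = couplings[i * q1:(i + 1) * q1, j * q1:(j + 1) * q1]
            scores[i, j] = scores[j, i] = np.linalg.norm(block)  # Frobenius norm
    return scores


if __name__ == "__main__":
    # Hypothetical toy alignment: columns 0 and 1 always change together, while
    # columns 2 and 3 vary independently of each other and of columns 0/1.
    toy_msa = ["AAKL", "AAKV", "AAEL", "AAEV", "CCKL", "CCKV", "CCEL", "CCEV"]
    print(np.round(gaussian_coupling_scores(toy_msa), 3))
```

In the toy alignment, the only nonzero off-diagonal score is between columns 0 and 1. The other columns are uncorrelated with everything, so the regularised covariance matrix is block diagonal and so is its inverse. Real alignments would also need sequence reweighting and a correction for phylogenetic and sampling noise before contacts are ranked.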
98
Toward rationally redesigning bacterial two-component signaling systems using coevolutionary information. Proc Natl Acad Sci U S A 2014; 111:E563-71. [PMID: 24449878 DOI: 10.1073/pnas.1323734111] [Citation(s) in RCA: 94] [Impact Index Per Article: 8.5] [Indexed: 01/09/2023]
Abstract
A challenge in molecular biology is to distinguish the key subset of residues that allow two-component signaling (TCS) proteins to recognize their correct signaling partner such that they can transiently bind and transfer signal, i.e., a phosphoryl group. Detailed knowledge of this information would allow one to search sequence space for mutations that can be used to systematically tune the signal transmission between TCS partners as well as potentially encode a TCS protein to preferentially transfer signals to a nonpartner. Motivated by the notion that this detailed information is found in sequence data, we explore the sequence coevolution between signaling partners to better understand how mutations can positively or negatively alter their ability to transfer signal. Using direct coupling analysis for determining evolutionarily conserved protein-protein interactions, we apply a metric called the direct information score to quantify mutational changes in the interaction between TCS proteins and demonstrate that it accurately correlates with experimental mutagenesis studies probing the mutational change in measured in vitro phosphotransfer. Furthermore, by subtracting from our metric an appropriate null model corresponding to generic, conserved features in TCS signaling pairs, we can isolate the determinants that give rise to interaction specificity and recognition, which are variable among different TCS partners. Our methodology forms a potential framework for the rational design of TCS systems by allowing one to quickly search sequence space for mutations or even entirely new sequences that can increase or decrease our metric, as a proxy for increasing or decreasing phosphotransfer ability between TCS proteins.
99
Morcos F, Hwa T, Onuchic JN, Weigt M. Direct coupling analysis for protein contact prediction. Methods Mol Biol 2014; 1137:55-70. [PMID: 24573474 DOI: 10.1007/978-1-4939-0366-5_5] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Indexed: 12/12/2022]
Abstract
During evolution, the structure and function of proteins are remarkably conserved, whereas amino-acid sequences vary strongly between homologous proteins. Structural conservation constrains sequence variability and forces different residues to coevolve, i.e., to show correlated patterns of amino-acid occurrences. However, residue correlation may result from direct coupling, e.g., by a contact in the folded protein, or be induced indirectly via intermediate residues. To use empirically observed correlations for predicting residue-residue contacts, direct and indirect effects have to be disentangled. Here we present mechanistic details on how to achieve this using a methodology called Direct Coupling Analysis (DCA). DCA has been shown to produce highly accurate estimates of amino-acid pairs that have direct reciprocal constraints in evolution. Specifically, we provide instructions and protocols on how to use the algorithmic implementations of DCA starting from data extraction to predicted-contact visualization in contact maps or representative protein structures.
Affiliation(s)
- Faruck Morcos
- Center for Theoretical Biological Physics, Rice University, Houston, TX, USA
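The Morcos et al. abstract turns on distinguishing direct correlations from those induced through intermediate residues. The example below is a minimal continuous analogue of that distinction, not the DCA protocol itself. It uses a hypothetical three-variable Gaussian chain A-B-C in which A and C interact only through B. The marginal correlation between A and C is nonzero, but the partial correlation read off the inverse covariance matrix vanishes. Mean-field DCA applies the same inverse-based principle to one-hot encoded amino-acid states, estimating couplings from the negated inverse of the correlation matrix. All names and numbers here are illustrative.

```python
import numpy as np

# Precision (inverse covariance) matrix of a hypothetical chain A - B - C:
# A and C are coupled only through B, so the A-C entry is exactly zero.
precision = np.array([
    [1.0, -0.4, 0.0],
    [-0.4, 1.0, -0.4],
    [0.0, -0.4, 1.0],
])
covariance = np.linalg.inv(precision)


def correlation(cov: np.ndarray) -> np.ndarray:
    """Marginal correlation matrix computed from a covariance matrix."""
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


def partial_correlation(cov: np.ndarray) -> np.ndarray:
    """Partial correlations (each pair conditioned on all other variables),
    read off the inverse covariance P as -P_ij / sqrt(P_ii * P_jj)."""
    p = np.linalg.inv(cov)
    d = np.sqrt(np.diag(p))
    pc = -p / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc


if __name__ == "__main__":
    print("marginal A-C:", correlation(covariance)[0, 2])
    print("partial  A-C:", partial_correlation(covariance)[0, 2])
```

Here the marginal A-C correlation is 0.16/0.84 (about 0.19), an indirect effect transmitted through B. The partial A-C correlation is zero up to rounding, while the direct A-B partial correlation stays at 0.4.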
100
Josenhans C, Jung K, Rao CV, Wolfe AJ. A tale of two machines: a review of the BLAST meeting, Tucson, AZ, 20-24 January 2013. Mol Microbiol 2013; 91:6-25. [PMID: 24125587 DOI: 10.1111/mmi.12427] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Accepted: 10/08/2013] [Indexed: 01/06/2023]
Abstract
Since their inception, the Bacterial Locomotion and Signal Transduction (BLAST) meetings have been the place to exchange and share the latest developments in the field of bacterial signal transduction and motility. At the 12th BLAST meeting, held in January 2013 in Tucson, AZ, researchers from all over the world met to report and discuss progress in diverse aspects of the field. The majority of these advances, however, came at the level of atomic structures and their associated mechanisms. This was especially true of the biological machines that sense and respond to environmental changes.
Affiliation(s)
- Christine Josenhans
- Institute for Medical Microbiology and Hospital Epidemiology, Hannover Medical School, Carl-Neuberg Strasse 1, 30625, Hannover, Germany