1
Paloncýová M, Valério M, Dos Santos RN, Kührová P, Šrejber M, Čechová P, Dobchev DA, Balsubramani A, Banáš P, Agarwal V, Souza PCT, Otyepka M. Computational Methods for Modeling Lipid-Mediated Active Pharmaceutical Ingredient Delivery. Mol Pharm 2025; 22:1110-1141. [PMID: 39879096 PMCID: PMC11881150 DOI: 10.1021/acs.molpharmaceut.4c00744] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2024] [Revised: 01/06/2025] [Accepted: 01/06/2025] [Indexed: 01/31/2025]
Abstract
Lipid-mediated delivery of active pharmaceutical ingredients (API) opened new possibilities in advanced therapies. By encapsulating an API into a lipid nanocarrier (LNC), one can safely deliver APIs not soluble in water, those with otherwise strong adverse effects, or very fragile ones such as nucleic acids. However, for the rational design of LNCs, a detailed understanding of the composition-structure-function relationships is missing. This review presents currently available computational methods for LNC investigation, screening, and design. The state-of-the-art physics-based approaches are described, with the focus on molecular dynamics simulations in all-atom and coarse-grained resolution. Their strengths and weaknesses are discussed, highlighting the aspects necessary for obtaining reliable results in the simulations. Furthermore, a machine learning, i.e., data-based learning, approach to the design of lipid-mediated API delivery is introduced. The data produced by the experimental and theoretical approaches provide valuable insights. Processing these data can help optimize the design of LNCs for better performance. In the final section of this Review, state-of-the-art of computer simulations of LNCs are reviewed, specifically addressing the compatibility of experimental and computational insights.
Affiliation(s)
- Markéta Paloncýová
  - Regional Center of Advanced Technologies and Materials, Czech Advanced Technology and Research Institute (CATRIN), Palacký University Olomouc, Šlechtitelů 27, 779 00 Olomouc, Czech Republic
- Mariana Valério
  - Laboratoire de Biologie et Modélisation de la Cellule, CNRS, UMR 5239, Inserm, U1293, Université Claude Bernard Lyon 1, Ecole Normale Supérieure de Lyon, 46 Allée d’Italie, 69364 Lyon, France
  - Centre Blaise Pascal de Simulation et de Modélisation Numérique, Ecole Normale Supérieure de Lyon, 46 Allée d’Italie, 69364 Lyon, France
- Petra Kührová
  - Regional Center of Advanced Technologies and Materials, Czech Advanced Technology and Research Institute (CATRIN), Palacký University Olomouc, Šlechtitelů 27, 779 00 Olomouc, Czech Republic
- Martin Šrejber
  - Regional Center of Advanced Technologies and Materials, Czech Advanced Technology and Research Institute (CATRIN), Palacký University Olomouc, Šlechtitelů 27, 779 00 Olomouc, Czech Republic
- Petra Čechová
  - Regional Center of Advanced Technologies and Materials, Czech Advanced Technology and Research Institute (CATRIN), Palacký University Olomouc, Šlechtitelů 27, 779 00 Olomouc, Czech Republic
- Akshay Balsubramani
  - mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, United States
- Pavel Banáš
  - Regional Center of Advanced Technologies and Materials, Czech Advanced Technology and Research Institute (CATRIN), Palacký University Olomouc, Šlechtitelů 27, 779 00 Olomouc, Czech Republic
- Vikram Agarwal
  - mRNA Center of Excellence, Sanofi, Waltham, Massachusetts 02451, United States
- Paulo C. T. Souza
  - Laboratoire de Biologie et Modélisation de la Cellule, CNRS, UMR 5239, Inserm, U1293, Université Claude Bernard Lyon 1, Ecole Normale Supérieure de Lyon, 46 Allée d’Italie, 69364 Lyon, France
  - Centre Blaise Pascal de Simulation et de Modélisation Numérique, Ecole Normale Supérieure de Lyon, 46 Allée d’Italie, 69364 Lyon, France
- Michal Otyepka
  - Regional Center of Advanced Technologies and Materials, Czech Advanced Technology and Research Institute (CATRIN), Palacký University Olomouc, Šlechtitelů 27, 779 00 Olomouc, Czech Republic
  - IT4Innovations, VŠB – Technical University of Ostrava, 17. listopadu 2172/15, 708 00 Ostrava-Poruba, Czech Republic
2
Akbarzadeh S, Coşkun Ö, Günçer B. Studying protein-protein interactions: Latest and most popular approaches. J Struct Biol 2024; 216:108118. [PMID: 39214321 DOI: 10.1016/j.jsb.2024.108118] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 05/29/2024] [Revised: 08/20/2024] [Accepted: 08/23/2024] [Indexed: 09/04/2024]
Abstract
PPIs, or protein-protein interactions, are essential for many biological processes. According to the findings, abnormal PPIs have been linked to several diseases, such as cancer and infectious and neurological disorders. Consequently, focusing on PPIs is a path toward disease treatment and a crucial tool for producing novel medications. Many methods exist to investigate PPIs, including low- and high-throughput studies. Since many PPIs have been discovered using in vitro and in vivo experimental approaches, the use of computational methods to predict PPIs has grown due to the expanding scale of PPI data and the intrinsic complexity of interacting mechanisms. Recognizing PPI networks offers a systematic means of predicting protein functions, and pathways that are included. These investigations can help uncover the underlying molecular mechanisms of complex phenotypes and clarify the biological processes related to health and diseases. Therefore, our goal in this study is to provide an overview of the latest and most popular approaches for investigating PPIs. We also overview some important clinical approaches based on the PPIs and how these interactions can be targeted.
Affiliation(s)
- Sama Akbarzadeh
  - Department of Biophysics, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye; Institute of Graduate Studies in Health Sciences, Istanbul University, Istanbul, Türkiye
- Özlem Coşkun
  - Department of Biophysics, Faculty of Medicine, Çanakkale Onsekiz Mart University, Çanakkale, Türkiye
- Başak Günçer
  - Department of Biophysics, Istanbul Faculty of Medicine, Istanbul University, Istanbul, Türkiye
3
Pawnikar S, Magenheimer BS, Joshi K, Munoz EN, Haldane A, Maser RL, Miao Y. Activation of Polycystin-1 Signaling by Binding of Stalk-derived Peptide Agonists. bioRxiv 2024:2024.01.06.574465. [PMID: 38260358 PMCID: PMC10802338 DOI: 10.1101/2024.01.06.574465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Polycystin-1 (PC1) is the membrane protein product of the PKD1 gene whose mutation is responsible for 85% of the cases of autosomal dominant polycystic kidney disease (ADPKD). ADPKD is primarily characterized by the formation of renal cysts and potential kidney failure. PC1 is an atypical G protein-coupled receptor (GPCR) consisting of 11 transmembrane helices and an autocatalytic GAIN domain that cleaves PC1 into extracellular N-terminal (NTF) and membrane-embedded C-terminal (CTF) fragments. Recently, signaling activation of the PC1 CTF was shown to be regulated by a stalk tethered agonist (TA), a distinct mechanism observed in the adhesion GPCR family. A novel allosteric activation pathway was elucidated for the PC1 CTF through a combination of Gaussian accelerated molecular dynamics (GaMD), mutagenesis and cellular signaling experiments. Here, we show that synthetic, soluble peptides with 7 to 21 residues derived from the stalk TA, in particular, peptides including the first 9 residues (p9), 17 residues (p17) and 21 residues (p21) exhibited the ability to re-activate signaling by a stalkless PC1 CTF mutant in cellular assays. To reveal molecular mechanisms of stalk peptide-mediated signaling activation, we have applied a novel Peptide GaMD (Pep-GaMD) algorithm to elucidate binding conformations of selected stalk peptide agonists p9, p17 and p21 to the stalkless PC1 CTF. The simulations revealed multiple specific binding regions of the stalk peptide agonists to the PC1 protein including an "intermediate" bound yet inactive state. Our Pep-GaMD simulation findings were consistent with the cellular assay experimental data. Binding of peptide agonists to the TOP domain of PC1 induced close TOP-putative pore loop interactions, a characteristic feature of the PC1 CTF signaling activation mechanism. 
Using sequence covariation analysis of PC1 homologs, we further showed that the peptide binding regions were consistent with covarying residue pairs identified between the TOP domain and the stalk TA. Therefore, structural dynamic insights into the mechanisms of PC1 activation by stalk-derived peptide agonists have enabled an in-depth understanding of PC1 signaling. They will form a foundation for development of PC1 as a therapeutic target for the treatment of ADPKD.
Affiliation(s)
- Shristi Pawnikar
  - Center for Computational Biology and Department of Molecular Biosciences, University of Kansas, Lawrence, KS 66047
- Brenda S. Magenheimer
  - Clinical Laboratory Sciences, University of Kansas Medical Center, Kansas City, KS 66160
  - The Jared Grantham Kidney Institute, University of Kansas Medical Center, Kansas City, KS 66160
- Keya Joshi
  - Department of Pharmacology and Computational Medicine Program, University of North Carolina – Chapel Hill, Chapel Hill, NC 27599
- Ericka Nevarez Munoz
  - Clinical Laboratory Sciences, University of Kansas Medical Center, Kansas City, KS 66160
- Allan Haldane
  - Dept of Physics, and Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA 19122
- Robin L. Maser
  - Departments of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, KS 66160
  - Clinical Laboratory Sciences, University of Kansas Medical Center, Kansas City, KS 66160
  - The Jared Grantham Kidney Institute, University of Kansas Medical Center, Kansas City, KS 66160
- Yinglong Miao
  - Department of Pharmacology and Computational Medicine Program, University of North Carolina – Chapel Hill, Chapel Hill, NC 27599
4
Cocco S, Posani L, Monasson R. Functional effects of mutations in proteins can be predicted and interpreted by guided selection of sequence covariation information. Proc Natl Acad Sci U S A 2024; 121:e2312335121. [PMID: 38889151 PMCID: PMC11214004 DOI: 10.1073/pnas.2312335121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Accepted: 04/21/2024] [Indexed: 06/20/2024] Open
Abstract
Predicting the effects of one or more mutations to the in vivo or in vitro properties of a wild-type protein is a major computational challenge, due to the presence of epistasis, that is, of interactions between amino acids in the sequence. We introduce a computationally efficient procedure to build minimal epistatic models to predict mutational effects by combining evolutionary (homologous sequence) and few mutational-scan data. Mutagenesis measurements guide the selection of links in a sparse graphical model, while the parameters on the nodes and the edges are inferred from sequence data. We show, on 10 mutational scans, that our pipeline exhibits performances comparable to state-of-the-art deep networks trained on many more data, while requiring much less parameters and being hence more interpretable. In particular, the identified interactions adapt to the wild-type protein and to the fitness or biochemical property experimentally measured, mostly focus on key functional sites, and are not necessarily related to structural contacts. Therefore, our method is able to extract information relevant for one mutational experiment from homologous sequence data reflecting the multitude of structural and functional constraints acting on proteins throughout evolution.
Affiliation(s)
- Simona Cocco
  - Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR8023 and Paris Sciences & Lettres (PSL) Research, Sorbonne Université, 75005 Paris, France
- Lorenzo Posani
  - Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR8023 and Paris Sciences & Lettres (PSL) Research, Sorbonne Université, 75005 Paris, France
- Rémi Monasson
  - Laboratory of Physics of the Ecole Normale Supérieure, CNRS UMR8023 and Paris Sciences & Lettres (PSL) Research, Sorbonne Université, 75005 Paris, France
5
Nechushtai R, Rowland L, Karmi O, Marjault HB, Nguyen TT, Mittal S, Ahmed RS, Grant D, Manrique-Acevedo C, Morcos F, Onuchic JN, Mittler R. CISD3/MiNT is required for complex I function, mitochondrial integrity, and skeletal muscle maintenance. Proc Natl Acad Sci U S A 2024; 121:e2405123121. [PMID: 38781208 PMCID: PMC11145280 DOI: 10.1073/pnas.2405123121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2024] [Accepted: 04/23/2024] [Indexed: 05/25/2024] Open
Abstract
Mitochondria play a central role in muscle metabolism and function. A unique family of iron-sulfur proteins, termed CDGSH Iron Sulfur Domain-containing (CISD/NEET) proteins, support mitochondrial function in skeletal muscles. The abundance of these proteins declines during aging leading to muscle degeneration. Although the function of the outer mitochondrial CISD/NEET proteins, CISD1/mitoNEET and CISD2/NAF-1, has been defined in skeletal muscle cells, the role of the inner mitochondrial CISD protein, CISD3/MiNT, is currently unknown. Here, we show that CISD3 deficiency in mice results in muscle atrophy that shares proteomic features with Duchenne muscular dystrophy. We further reveal that CISD3 deficiency impairs the function and structure of skeletal muscles, as well as their mitochondria, and that CISD3 interacts with, and donates its [2Fe-2S] clusters to, complex I respiratory chain subunit NADH Ubiquinone Oxidoreductase Core Subunit V2 (NDUFV2). Using coevolutionary and structural computational tools, we model a CISD3-NDUFV2 complex with proximal coevolving residue interactions conducive of [2Fe-2S] cluster transfer reactions, placing the clusters of the two proteins 10 to 16 Å apart. Taken together, our findings reveal that CISD3/MiNT is important for supporting the biogenesis and function of complex I, essential for muscle maintenance and function. Interventions that target CISD3 could therefore impact different muscle degeneration syndromes, aging, and related conditions.
Affiliation(s)
- Rachel Nechushtai
  - Plant & Environmental Sciences, The Alexander Silberman Institute of Life Science and The Wolfson Centre for Applied Structural Biology, Faculty of Science and Mathematics, The Edmond J. Safra Campus at Givat Ram, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
- Linda Rowland
  - Department of Surgery, University of Missouri School of Medicine, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65201
- Ola Karmi
  - Plant & Environmental Sciences, The Alexander Silberman Institute of Life Science and The Wolfson Centre for Applied Structural Biology, Faculty of Science and Mathematics, The Edmond J. Safra Campus at Givat Ram, The Hebrew University of Jerusalem, Jerusalem 91904, Israel
- Henri-Baptiste Marjault
  - Department of Surgery, University of Missouri School of Medicine, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65201
- Thi Thao Nguyen
  - Gehrke Proteomics Center, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211
- Shubham Mittal
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- Raheel S. Ahmed
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
- DeAna Grant
  - Electron Microscopy Core Facility, University of Missouri, NextGen Precision Health Institute, Columbia, MO 65211
- Camila Manrique-Acevedo
  - Division of Endocrinology and Metabolism, Department of Medicine, University of Missouri, Columbia, MO 65201
  - NextGen Precision Health, University of Missouri, Columbia, MO 65201
  - Harry S. Truman Memorial Veterans’ Hospital, Columbia, MO 65201
- Faruck Morcos
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080
  - Department of Bioengineering, University of Texas at Dallas, Richardson, TX 75080
  - Department of Physics, University of Texas at Dallas, Richardson, TX 75080
  - Center for Systems Biology, University of Texas at Dallas, Richardson, TX 75080
- José N. Onuchic
  - Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
  - Department of Physics and Astronomy, Rice University, Houston, TX 77005
  - Department of Chemistry, Rice University, Houston, TX 77005
  - Department of Biosciences, Rice University, Houston, TX 77005
- Ron Mittler
  - Department of Surgery, University of Missouri School of Medicine, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65201
6
Shibata M, Lin X, Onuchic JN, Yura K, Cheng RR. Residue coevolution and mutational landscape for OmpR and NarL response regulator subfamilies. Biophys J 2024; 123:681-692. [PMID: 38291753 PMCID: PMC10995415 DOI: 10.1016/j.bpj.2024.01.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 12/31/2023] [Accepted: 01/24/2024] [Indexed: 02/01/2024] Open
Abstract
DNA-binding response regulators (DBRRs) are a broad class of proteins that operate in tandem with their partner kinase proteins to form two-component signal transduction systems in bacteria. Typical DBRRs are composed of two domains where the conserved N-terminal domain accepts transduced signals and the evolutionarily diverse C-terminal domain binds to DNA. These domains are assumed to be functionally independent, and hence recombination of the two domains should yield novel DBRRs of arbitrary input/output response, which can be used as biosensors. This idea has been proved to be successful in some cases; yet, the error rate is not trivial. Improvement of the success rate of this technique requires a deeper understanding of the linker-domain and inter-domain residue interactions, which have not yet been thoroughly examined. Here, we studied residue coevolution of DBRRs of the two main subfamilies (OmpR and NarL) using large collections of bacterial amino acid sequences to extensively investigate the evolutionary signatures of linker-domain and inter-domain residue interactions. Coevolutionary analysis uncovered evolutionarily selected linker-domain and inter-domain residue interactions of known experimental structures, as well as previously unknown inter-domain residue interactions. We examined the possibility of these inter-domain residue interactions as contacts that stabilize an inactive conformation of the DBRR where DNA binding is inhibited for both subfamilies. The newly gained insights on linker-domain/inter-domain residue interactions and shared inactivation mechanisms improve the understanding of the functional mechanism of DBRRs, providing clues to efficiently create functional DBRR-based biosensors. Additionally, we show the feasibility of applying coevolutionary landscape models to predict the functionality of domain-swapped DBRR proteins. 
The presented result demonstrates that sequence information can be used to filter out bioengineered DBRR proteins that are predicted to be nonfunctional due to a high negative predictive value.
Affiliation(s)
- Mayu Shibata
  - Graduate School of Humanities and Sciences, Ochanomizu University, Bunkyo, Tokyo, Japan; Center for Theoretical Biological Physics, Rice University, Houston, Texas
- Xingcheng Lin
  - Department of Physics, North Carolina State University, Raleigh, North Carolina; Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina
- José N Onuchic
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas; Department of Physics and Astronomy, Chemistry, and Biosciences, Rice University, Houston, Texas
- Kei Yura
  - Graduate School of Humanities and Sciences, Ochanomizu University, Bunkyo, Tokyo, Japan; Center for Interdisciplinary AI and Data Science, Ochanomizu University, Bunkyo, Tokyo, Japan; Graduate School of Advanced Science and Engineering, Waseda University, Shinjuku, Tokyo, Japan
- Ryan R Cheng
  - Department of Chemistry, University of Kentucky, Lexington, Kentucky
7
Fongang B, Wadop YN, Zhu Y, Wagner EJ, Kudlicki A, Rowicka M. Coevolution combined with molecular dynamics simulations provides structural and mechanistic insights into the interactions between the integrator complex subunits. Comput Struct Biotechnol J 2023; 21:5686-5697. [PMID: 38074468 PMCID: PMC10700540 DOI: 10.1016/j.csbj.2023.11.022] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Revised: 11/10/2023] [Accepted: 11/10/2023] [Indexed: 01/18/2024] Open
Abstract
Finding the 3D structure of large, multi-subunit complexes is difficult, despite recent advances in cryo-EM technology, due to remaining challenges to expressing and purifying subunits. Computational approaches that predict protein-protein interactions, including Direct Coupling Analysis (DCA), represent an attractive alternative for dissecting interactions within protein complexes. However, they are readily applicable only to small proteins due to high computational complexity and a high number of false positives. To solve this problem, we proposed a modified DCA approach, a powerful tool to predict the most likely interfaces of protein complexes. Since our modified approach cannot provide structural and mechanistic details of interacting peptides, we combine it with Molecular Dynamics (MD) simulations. To illustrate this novel approach, we predict interacting domains and structural details of interactions of two Integrator complex subunits, INTS9 and INTS11. Our predictions of interacting residues of INTS9/INTS11 are highly consistent with crystallographic structure. We then expand our procedure to two complexes whose structures are not well-studied: 1) The heterodimer formed by the Cleavage and Polyadenylation Specificity Factor 100-kD (CPSF100) and 73-kD (CPSF73); 2) The heterotrimer formed by INTS4/INTS9/INTS11. Experimental data supports our predictions of interactions within these two complexes, demonstrating that combining DCA and MD simulations is a powerful approach to revealing structural insights of large protein complexes.
Affiliation(s)
- Bernard Fongang
  - Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, The University of Texas Health Science Center at San Antonio, San Antonio, TX, United States
  - Department of Biochemistry and Structural Biology, The University of Texas Health Science Center at San Antonio, San Antonio, TX, United States
  - Department of Population Health Sciences, The University of Texas Health Science Center at San Antonio, San Antonio, TX, United States
  - Institute for Translational Sciences, The University of Texas Medical Branch, Galveston, TX, United States
- Yannick N. Wadop
  - Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, The University of Texas Health Science Center at San Antonio, San Antonio, TX, United States
  - Institute for Translational Sciences, The University of Texas Medical Branch, Galveston, TX, United States
- Yingjie Zhu
  - Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX, United States
  - Institute for Translational Sciences, The University of Texas Medical Branch, Galveston, TX, United States
- Eric J. Wagner
  - Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX, United States
  - Department of Biochemistry and Biophysics, The University of Rochester Medical Center, Rochester, NY, United States
  - Institute for Translational Sciences, The University of Texas Medical Branch, Galveston, TX, United States
- Andrzej Kudlicki
  - Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX, United States
  - Institute for Translational Sciences, The University of Texas Medical Branch, Galveston, TX, United States
  - Informatics Service Center, The University of Texas Medical Branch, Galveston, TX, United States
- Maga Rowicka
  - Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX, United States
  - Institute for Translational Sciences, The University of Texas Medical Branch, Galveston, TX, United States
8
Ziegler C, Martin J, Sinner C, Morcos F. Latent generative landscapes as maps of functional diversity in protein sequence space. Nat Commun 2023; 14:2222. [PMID: 37076519 PMCID: PMC10113739 DOI: 10.1038/s41467-023-37958-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/19/2022] [Accepted: 04/05/2023] [Indexed: 04/21/2023] Open
Abstract
Variational autoencoders are unsupervised learning models with generative capabilities. When applied to protein data, they classify sequences by phylogeny and generate de novo sequences that preserve statistical properties of protein composition. While previous studies focus on clustering and generative features, here, we evaluate the underlying latent manifold in which sequence information is embedded. To investigate properties of the latent manifold, we utilize direct coupling analysis and a Potts Hamiltonian model to construct a latent generative landscape. We showcase how this landscape captures phylogenetic groupings, functional and fitness properties of several systems including Globins, β-lactamases, ion channels, and transcription factors. We provide support on how the landscape helps us understand the effects of sequence variability observed in experimental data and provides insights on directed and natural protein evolution. We propose that combining generative properties and functional predictive power of variational autoencoders and coevolutionary analysis could be beneficial in applications for protein engineering and design.
Affiliation(s)
- Cheyenne Ziegler
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA
- Jonathan Martin
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA
- Claude Sinner
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA
- Faruck Morcos
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, 75080, USA
  - Department of Bioengineering, University of Texas at Dallas, Richardson, TX, 75080, USA
  - Center for Systems Biology, University of Texas at Dallas, Richardson, TX, 75080, USA
9
Artsimovitch I, Ramírez-Sarmiento CA. Metamorphic proteins under a computational microscope: Lessons from a fold-switching RfaH protein. Comput Struct Biotechnol J 2022; 20:5824-5837. [PMID: 36382197 PMCID: PMC9630627 DOI: 10.1016/j.csbj.2022.10.024] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 08/22/2022] [Revised: 10/18/2022] [Accepted: 10/18/2022] [Indexed: 11/28/2022] Open
Abstract
Metamorphic proteins constitute unexpected paradigms of the protein folding problem, as their sequences encode two alternative folds, which reversibly interconvert within biologically relevant timescales to trigger different cellular responses. Once considered a rare aberration, metamorphism may be common among proteins that must respond to rapidly changing environments, exemplified by NusG-like proteins, the only transcription factors present in every domain of life. RfaH, a specialized paralog of bacterial NusG, undergoes an all-α to all-β domain switch to activate expression of virulence and conjugation genes in many animal and plant pathogens and is the quintessential example of a metamorphic protein. The dramatic nature of RfaH structural transformation and the richness of its evolutionary history makes for an excellent model for studying how metamorphic proteins switch folds. Here, we summarize the structural and functional evidence that sparked the discovery of RfaH as a metamorphic protein, the experimental and computational approaches that enabled the description of the molecular mechanism and refolding pathways of its structural interconversion, and the ongoing efforts to find signatures and general properties to ultimately describe the protein metamorphome.
Affiliation(s)
- Irina Artsimovitch
  - Department of Microbiology and The Center for RNA Biology, The Ohio State University, Columbus, OH, USA
- César A. Ramírez-Sarmiento
  - Institute for Biological and Medical Engineering, Schools of Engineering, Medicine and Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, Chile
  - ANID, Millennium Science Initiative Program, Millennium Institute for Integrative Biology (iBio), Santiago, Chile
10
Ravishankar K, Jiang X, Leddin EM, Morcos F, Cisneros GA. Computational compensatory mutation discovery approach: Predicting a PARP1 variant rescue mutation. Biophys J 2022; 121:3663-3673. [PMID: 35642254 PMCID: PMC9617126 DOI: 10.1016/j.bpj.2022.05.036] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/20/2021] [Revised: 05/20/2022] [Accepted: 05/23/2022] [Indexed: 11/02/2022] Open
Abstract
The prediction of protein mutations that affect function may be exploited for multiple uses. In the context of disease variants, the prediction of compensatory mutations that reestablish functional phenotypes could aid in the development of genetic therapies. In this work, we present an integrated approach that combines coevolutionary analysis and molecular dynamics (MD) simulations to discover functional compensatory mutations. This approach is employed to investigate possible rescue mutations of a poly(ADP-ribose) polymerase 1 (PARP1) variant, PARP1 V762A, associated with lung cancer and follicular lymphoma. MD simulations show PARP1 V762A exhibits noticeable changes in structural and dynamical behavior compared with wild-type (WT) PARP1. Our integrated approach predicts A755E as a possible compensatory mutation based on coevolutionary information, and molecular simulations indicate that the PARP1 A755E/V762A double mutant exhibits similar structural and dynamical behavior to WT PARP1. Our methodology can be broadly applied to a large number of systems where single-nucleotide polymorphisms have been identified as connected to disease and can shed light on the biophysical effects of such changes as well as provide a way to discover potential mutants that could restore WT-like functionality. This can, in turn, be further utilized in the design of molecular therapeutics that aim to mimic such compensatory effect.
Affiliation(s)
- Xianli Jiang
  - Department of Biological Sciences, The University of Texas at Dallas, Richardson, Texas; Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Emmett M Leddin
  - Department of Chemistry, University of North Texas, Denton, Texas
- Faruck Morcos
  - Department of Biological Sciences, The University of Texas at Dallas, Richardson, Texas; Department of Bioengineering, The University of Texas at Dallas, Richardson, Texas; Center for Systems Biology, The University of Texas at Dallas, Richardson, Texas.
- G Andrés Cisneros
  - Department of Chemistry, University of North Texas, Denton, Texas; Department of Physics, The University of Texas at Dallas, Richardson, Texas; Department of Chemistry, The University of Texas at Dallas, Richardson, Texas.
|
11
|
Gerardos A, Dietler N, Bitbol AF. Correlations from structure and phylogeny combine constructively in the inference of protein partners from sequences. PLoS Comput Biol 2022; 18:e1010147. [PMID: 35576238 PMCID: PMC9135348 DOI: 10.1371/journal.pcbi.1010147] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/22/2021] [Revised: 05/26/2022] [Accepted: 04/27/2022] [Indexed: 11/19/2022] Open
Abstract
Inferring protein-protein interactions from sequences is an important task in computational biology. Recent methods based on Direct Coupling Analysis (DCA) or Mutual Information (MI) make it possible to find interaction partners among paralogs of two protein families. Does successful inference mainly rely on correlations from structural contacts or from phylogeny, or both? Do these two types of signal combine constructively or hinder each other? To address these questions, we generate and analyze synthetic data produced using a minimal model that allows us to control the amounts of structural constraints and phylogeny. We show that correlations from these two sources combine constructively to increase the performance of partner inference by DCA or MI. Furthermore, signal from phylogeny can rescue partner inference when signal from contacts becomes less informative, including in the realistic case where inter-protein contacts are restricted to a small subset of sites. We also demonstrate that DCA-inferred couplings between non-contact pairs of sites improve partner inference in the presence of strong phylogeny, while deteriorating it otherwise. Moreover, restricting to non-contact pairs of sites preserves inference performance in the presence of strong phylogeny. In a natural data set, as well as in realistic synthetic data based on it, we find that non-contact pairs of sites contribute positively to partner inference performance, and that restricting to them preserves performance, evidencing an important role of phylogeny.
Affiliation(s)
- Andonis Gerardos
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Nicola Dietler
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Anne-Florence Bitbol
  - Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
  - SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
|
12
|
Chi H, Zhou Q, Tutol JN, Phelps SM, Lee J, Kapadia P, Morcos F, Dodani SC. Coupling a Live Cell Directed Evolution Assay with Coevolutionary Landscapes to Engineer an Improved Fluorescent Rhodopsin Chloride Sensor. ACS Synth Biol 2022; 11:1627-1638. [PMID: 35389621 PMCID: PMC9184236 DOI: 10.1021/acssynbio.2c00033] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Abstract
Our understanding of chloride in biology has been accelerated through the application of fluorescent protein-based sensors in living cells. These sensors can be generated and diversified to have a range of properties using laboratory-guided evolution. Recently, we established that the fluorescent proton-pumping rhodopsin wtGR from Gloeobacter violaceus can be converted into a fluorescent sensor for chloride. To unlock this non-natural function, a single point mutation at the Schiff counterion position (D121V) was introduced into wtGR fused to cyan fluorescent protein (CFP) resulting in GR1-CFP. Here, we have integrated coevolutionary analysis with directed evolution to understand how the rhodopsin sequence space can be explored and engineered to improve this starting point. We first show how evolutionary couplings are predictive of functional sites in the rhodopsin family and how a fitness metric based on a sequence can be used to quantify the known proton-pumping activities of GR-CFP variants. Then, we couple this ability to predict potential functional outcomes with a screening and selection assay in live Escherichia coli to reduce the mutational search space of five residues along the proton-pumping pathway in GR1-CFP. This iterative selection process results in GR2-CFP with four additional mutations: E132K, A84K, T125C, and V245I. Finally, bulk and single fluorescence measurements in live E. coli reveal that GR2-CFP is a reversible, ratiometric fluorescent sensor for extracellular chloride with an improved dynamic range. We anticipate that our framework will be applicable to other systems, providing a more efficient methodology to engineer fluorescent protein-based sensors with desired properties.
|
13
|
Chu WT, Yan Z, Chu X, Zheng X, Liu Z, Xu L, Zhang K, Wang J. Physics of biomolecular recognition and conformational dynamics. Rep Prog Phys 2021; 84:126601. [PMID: 34753115 DOI: 10.1088/1361-6633/ac3800] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/26/2021] [Accepted: 11/09/2021] [Indexed: 06/13/2023]
Abstract
Biomolecular recognition usually leads to the formation of binding complexes, often accompanied by large-scale conformational changes. This process is fundamental to biological functions at the molecular and cellular levels. Uncovering the physical mechanisms of biomolecular recognition and quantifying the key biomolecular interactions are vital to understand these functions. The recently developed energy landscape theory has been successful in quantifying recognition processes and revealing the underlying mechanisms. Recent studies have shown that in addition to affinity, specificity is also crucial for biomolecular recognition. The proposed physical concept of intrinsic specificity based on the underlying energy landscape theory provides a practical way to quantify the specificity. Optimization of affinity and specificity can be adopted as a principle to guide the evolution and design of molecular recognition. This approach can also be used in practice for drug discovery using multidimensional screening to identify lead compounds. The energy landscape topography of molecular recognition is important for revealing the underlying flexible binding or binding-folding mechanisms. In this review, we first introduce the energy landscape theory for molecular recognition and then address four critical issues related to biomolecular recognition and conformational dynamics: (1) specificity quantification of molecular recognition; (2) evolution and design in molecular recognition; (3) flexible molecular recognition; (4) chromosome structural dynamics. The results described here and the discussions of the insights gained from the energy landscape topography can provide valuable guidance for further computational and experimental investigations of biomolecular recognition and conformational dynamics.
Affiliation(s)
- Wen-Ting Chu
  - State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Zhiqiang Yan
  - State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Xiakun Chu
  - Department of Chemistry & Physics, State University of New York at Stony Brook, Stony Brook, NY 11794, United States of America
- Xiliang Zheng
  - State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Zuojia Liu
  - State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Li Xu
  - State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Kun Zhang
  - State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Jin Wang
  - Department of Chemistry & Physics, State University of New York at Stony Brook, Stony Brook, NY 11794, United States of America
|
14
|
Gaba A, Hix MA, Suhail S, Flath B, Boysan B, Williams DR, Pelletier T, Emerman M, Morcos F, Cisneros GA, Chelico L. Divergence in Dimerization and Activity of Primate APOBEC3C. J Mol Biol 2021; 433:167306. [PMID: 34666043 PMCID: PMC9202443 DOI: 10.1016/j.jmb.2021.167306] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/22/2021] [Revised: 10/08/2021] [Accepted: 10/08/2021] [Indexed: 11/21/2022]
Abstract
The APOBEC3 (A3) family of single-stranded DNA cytidine deaminases are host restriction factors that inhibit lentiviruses, such as HIV-1, in the absence of the Vif protein that causes their degradation. Deamination of cytidine in HIV-1 (−)DNA forms uracil that causes inactivating mutations when uracil is used as a template for (+)DNA synthesis. For APOBEC3C (A3C), the chimpanzee and gorilla orthologues are more active than human A3C, and we determined that Old World Monkey A3C from rhesus macaque (rh) is not active against HIV-1. Biochemical, virological, and coevolutionary analyses combined with molecular dynamics simulations showed that the key amino acids needed to promote rhA3C antiviral activity, 44, 45, and 144, also promoted dimerization and changes to the dynamics of loop 1, near the enzyme active site. Although forced evolution of rhA3C resulted in a similar dimer interface with hominid A3C, the key amino acid contacts were different. Overall, our results determine the basis for why rhA3C is less active than human A3C and establish the amino acid network for dimerization and increased activity. Based on identification of the key amino acids determining Old World Monkey antiviral activity we predict that other Old World Monkey A3Cs did not impart anti-lentiviral activity, despite fixation of a key residue needed for hominid A3C activity. Overall, the coevolutionary analysis of the A3C dimerization interface presented also provides a basis from which to analyze dimerization interfaces of other A3 family members.
Affiliation(s)
- Amit Gaba
  - Department of Biochemistry, Microbiology, and Immunology, College of Medicine, University of Saskatchewan, Saskatoon, Canada. https://twitter.com/optimist1023
- Mark A Hix
  - Department of Chemistry, University of North Texas, Denton, TX, USA. https://twitter.com/markahix
- Sana Suhail
  - Department of Biological Sciences, Center for Systems Biology, University of Texas at Dallas, Richardson, TX, USA. https://twitter.com/sakuraa_329
- Ben Flath
  - Department of Biochemistry, Microbiology, and Immunology, College of Medicine, University of Saskatchewan, Saskatoon, Canada
- Brock Boysan
  - Department of Chemistry, University of North Texas, Denton, TX, USA
- Danielle R Williams
  - Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA, USA. https://twitter.com/dani_renee_
- Tomas Pelletier
  - Department of Biochemistry, Microbiology, and Immunology, College of Medicine, University of Saskatchewan, Saskatoon, Canada
- Michael Emerman
  - Division of Human Biology, Fred Hutchinson Cancer Research Center, Seattle, WA, USA; Division of Basic Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA. https://twitter.com/memerman
- Faruck Morcos
  - Department of Biological Sciences, Center for Systems Biology, University of Texas at Dallas, Richardson, TX, USA; Department of Bioengineering, University of Texas at Dallas, Dallas, TX, USA. https://twitter.com/MorcosLab
- G Andrés Cisneros
  - Department of Chemistry, University of North Texas, Denton, TX, USA. https://twitter.com/CisnerosRes
- Linda Chelico
  - Department of Biochemistry, Microbiology, and Immunology, College of Medicine, University of Saskatchewan, Saskatoon, Canada.
|
15
|
Mehrabiani KM, Cheng RR, Onuchic JN. Expanding Direct Coupling Analysis to Identify Heterodimeric Interfaces from Limited Protein Sequence Data. J Phys Chem B 2021; 125:11408-11417. [PMID: 34618469 DOI: 10.1021/acs.jpcb.1c07145] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
Direct coupling analysis (DCA) is a global statistical approach that uses information encoded in protein sequence data to predict spatial contacts in a three-dimensional structure of a folded protein. DCA has been widely used to predict the monomeric fold at amino acid resolution and to identify biologically relevant interaction sites within a folded protein. Going beyond single proteins, DCA has also been used to identify spatial contacts that stabilize the interaction in protein complex formation. However, extracting this higher order information necessary to predict dimer contacts presents a significant challenge. A DCA evolutionary signal is much stronger at the single protein level (intraprotein contacts) than at the protein-protein interface (interprotein contacts). Therefore, if DCA-derived information is to be used to predict the structure of these complexes, there is a need to identify statistically significant DCA predictions. We propose a simple Z-score measure that can filter good predictions despite noisy, limited data. This new methodology not only improves our prediction ability but also provides a quantitative measure for the validity of the prediction.
Affiliation(s)
- Kareem M Mehrabiani
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States; Systems, Synthetic, and Physical Biology, Rice University, Houston, Texas 77005, United States
- Ryan R Cheng
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
- José N Onuchic
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States; Systems, Synthetic, and Physical Biology, Rice University, Houston, Texas 77005, United States; Department of Physics & Astronomy, Rice University, Houston, Texas 77005, United States; Department of Chemistry, Rice University, Houston, Texas 77005, United States; Department of Biosciences, Rice University, Houston, Texas 77005, United States
|
16
|
Suh D, Lee JW, Choi S, Lee Y. Recent Applications of Deep Learning Methods on Evolution- and Contact-Based Protein Structure Prediction. Int J Mol Sci 2021; 22:6032. [PMID: 34199677 PMCID: PMC8199773 DOI: 10.3390/ijms22116032] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/16/2021] [Revised: 05/29/2021] [Accepted: 05/29/2021] [Indexed: 01/23/2023] Open
Abstract
The new advances in deep learning methods have influenced many aspects of scientific research, including the study of the protein system. The prediction of proteins' 3D structural components is now heavily dependent on machine learning techniques that interpret how protein sequences and their homology govern the inter-residue contacts and structural organization. In particular, methods employing deep neural networks have had a significant impact on the recent CASP13 and CASP14 competitions. Here, we explore the recent applications of deep learning methods in the protein structure prediction area. We also look at the potential opportunities for deep learning methods to identify unknown protein structures and functions to be discovered and help guide drug-target interactions. Although significant problems still need to be addressed, we expect these techniques in the near future to play crucial roles in protein structural bioinformatics as well as in drug discovery.
Affiliation(s)
- Donghyuk Suh
  - Global AI Drug Discovery Center, School of Pharmaceutical Sciences, College of Pharmacy and Graduate, Ewha Womans University, Seoul 03760, Korea
- Jai Woo Lee
  - Global AI Drug Discovery Center, School of Pharmaceutical Sciences, College of Pharmacy and Graduate, Ewha Womans University, Seoul 03760, Korea
- Sun Choi
  - Global AI Drug Discovery Center, School of Pharmaceutical Sciences, College of Pharmacy and Graduate, Ewha Womans University, Seoul 03760, Korea
- Yoonji Lee
  - College of Pharmacy, Chung-Ang University, Seoul 06974, Korea
|
17
|
Anton B, Besalú M, Fornes O, Bonet J, Molina A, Molina-Fernandez R, De Las Cuevas G, Fernandez-Fuentes N, Oliva B. On the use of direct-coupling analysis with a reduced alphabet of amino acids combined with super-secondary structure motifs for protein fold prediction. NAR Genom Bioinform 2021; 3:lqab027. [PMID: 33937764 PMCID: PMC8061457 DOI: 10.1093/nargab/lqab027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2020] [Revised: 02/27/2021] [Accepted: 03/26/2021] [Indexed: 11/12/2022] Open
Abstract
Direct-coupling analysis (DCA) for studying the coevolution of residues in proteins has been widely used to predict the three-dimensional structure of a protein from its sequence. We present RADI/raDIMod, a variation of the original DCA algorithm that groups chemically equivalent residues combined with super-secondary structure motifs to model protein structures. Interestingly, the simplification produced by grouping amino acids into only two groups (polar and non-polar) is still representative of the physicochemical nature that characterizes the protein structure and it is in line with the role of hydrophobic forces in protein-folding funneling. As a result of a compressed alphabet, the number of sequences required for the multiple sequence alignment is reduced. The number of long-range contacts predicted is limited; therefore, our approach requires the use of neighboring sequence-positions. We use the prediction of secondary structure and motifs of super-secondary structures to predict local contacts. We use RADI and raDIMod, a fragment-based protein structure modelling approach, achieving near-native conformations when the number of super-secondary motifs covers >30-50% of the sequence. Interestingly, although different contacts are predicted with different alphabets, they produce similar structures.
Affiliation(s)
- Bernat Anton
  - Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona 08005, Catalonia, Spain
- Mireia Besalú
  - Departament de Genètica, Microbiologia i Estadística, Universitat de Barcelona, Barcelona 08028, Catalonia, Spain
- Oriol Fornes
  - Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona 08005, Catalonia, Spain
- Jaume Bonet
  - Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona 08005, Catalonia, Spain
- Alexis Molina
  - Electronic and Atomic Protein Modeling, Life Sciences, Barcelona Supercomputing Center, Barcelona 08034, Catalonia, Spain
- Ruben Molina-Fernandez
  - Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona 08005, Catalonia, Spain
- Gemma De Las Cuevas
  - Institut für Theoretische Physik, School of Mathematics, Computer Science and Physics, Universität Innsbruck, A-6020 Innsbruck, Austria
- Narcis Fernandez-Fuentes
  - Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, SY233EB Aberystwyth, United Kingdom
- Baldo Oliva
  - Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, University Pompeu Fabra, Barcelona 08005, Catalonia, Spain
|
18
|
Zou T, Woodrum BW, Halloran N, Campitelli P, Bobkov AA, Ghirlanda G, Ozkan SB. Local Interactions That Contribute Minimal Frustration Determine Foldability. J Phys Chem B 2021; 125:2617-2626. [PMID: 33687216 DOI: 10.1021/acs.jpcb.1c00364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Earlier experiments suggest that the evolutionary information (conservation and coevolution) encoded in protein sequences is necessary and sufficient to specify the fold of a protein family. However, there is no computational work to quantify the effect of such evolutionary information on the folding process. Here we explore the role of early folding steps for sequences designed using coevolution and conservation through a combination of computational and experimental methods. We simulated a repertoire of native and designed WW domain sequences to analyze early local contact formation and found that the N-terminal β-hairpin turn would not form correctly due to strong non-native local contacts in unfoldable sequences. Through a maximum likelihood approach, we identified five local contacts that play a critical role in folding, suggesting that a small subset of amino acid pairs can be used to solve the "needle in the haystack" problem to design foldable sequences. Thus, using the contact probability of those five local contacts that form during the early stage of folding, we built a classification model that predicts the foldability of a WW sequence with 81% accuracy. This classification model was used to redesign WW domain sequences that could not fold due to frustration and make them foldable by introducing a few mutations that led to the stabilization of these critical local contacts. The experimental analysis shows that a redesigned sequence folds and binds to polyproline peptides with a similar affinity as those observed for native WW domains. Overall, our analysis shows that evolutionary-designed sequences should not only satisfy the folding stability but also ensure a minimally frustrated folding landscape.
Affiliation(s)
- Taisong Zou
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona 85287, United States
- Brian W Woodrum
  - School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287, United States
- Nicholas Halloran
  - School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287, United States
- Paul Campitelli
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona 85287, United States
- Andrey A Bobkov
  - Conrad Prebys Center for Chemical Genomics, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, California 92037, United States
- Giovanna Ghirlanda
  - School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287, United States
- Sefika Banu Ozkan
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona 85287, United States
|
19
|
Crippa M, Andreghetti D, Capelli R, Tiana G. Evolution of frustrated and stabilising contacts in reconstructed ancient proteins. Eur Biophys J 2021; 50:699-712. [PMID: 33569610 PMCID: PMC8260555 DOI: 10.1007/s00249-021-01500-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2020] [Revised: 12/14/2020] [Accepted: 01/13/2021] [Indexed: 11/30/2022]
Abstract
Energetic properties of a protein are a major determinant of its evolutionary fitness. Using a reconstruction algorithm, dating the reconstructed proteins and calculating the interaction network between their amino acids through a coevolutionary approach, we studied how the interactions that stabilise 890 proteins, belonging to five families, evolved for billions of years. In particular, we focused our attention on the network of most strongly attractive contacts and on that of poorly optimised, frustrated contacts. Our results support the idea that the cluster of most attractive interactions extends its size along evolutionary time, but from the data, we cannot conclude that protein stability or that the degree of frustration tends always to decrease.
Affiliation(s)
- Martina Crippa
  - Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, 20133, Milan, Italy
  - Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Damiano Andreghetti
  - Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, 20133, Milan, Italy
- Riccardo Capelli
  - Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Guido Tiana
  - Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, 20133, Milan, Italy.
|
20
|
Slater O, Miller B, Kontoyianni M. Decoding Protein-protein Interactions: An Overview. Curr Top Med Chem 2021; 20:855-882. [PMID: 32101126 DOI: 10.2174/1568026620666200226105312] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 04/13/2019] [Revised: 11/27/2019] [Accepted: 11/27/2019] [Indexed: 12/24/2022]
Abstract
Drug discovery has focused on the paradigm "one drug, one target" for a long time. However, small molecules can act at multiple macromolecular targets, which serves as the basis for drug repurposing. In an effort to expand the target space, and given advances in X-ray crystallography, protein-protein interactions have become an emerging focus area of drug discovery enterprises. Proteins interact with other biomolecules and it is this intricate network of interactions that determines the behavior of the system and its biological processes. In this review, we briefly discuss networks in disease, followed by computational methods for protein-protein complex prediction. Computational methodologies and techniques employed towards objectives such as protein-protein docking, protein-protein interactions, and interface predictions are described extensively. Docking aims at producing a complex between proteins, while interface predictions identify a subset of residues on one protein that could interact with a partner, and protein-protein interaction sites address whether two proteins interact. In addition, approaches to predict hot spots and binding sites are presented along with a representative example of our internal project on the chemokine CXC receptor 3 B-isoform and predictive modeling with IP10 and PF4.
Affiliation(s)
- Olivia Slater
  - Department of Pharmaceutical Sciences, Southern Illinois University, Edwardsville, IL 62026, United States
- Bethany Miller
  - Department of Pharmaceutical Sciences, Southern Illinois University, Edwardsville, IL 62026, United States
- Maria Kontoyianni
  - Department of Pharmaceutical Sciences, Southern Illinois University, Edwardsville, IL 62026, United States
|
21
|
Thadani NN, Zhou Q, Reyes Gamas K, Butler S, Bueno C, Schafer NP, Morcos F, Wolynes PG, Suh J. Frustration and Direct-Coupling Analyses to Predict Formation and Function of Adeno-Associated Virus. Biophys J 2020; 120:489-503. [PMID: 33359833 DOI: 10.1016/j.bpj.2020.12.018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/06/2020] [Revised: 11/08/2020] [Accepted: 12/08/2020] [Indexed: 01/03/2023] Open
Abstract
Adeno-associated virus (AAV) is a promising gene therapy vector because of its efficient gene delivery and relatively mild immunogenicity. To improve delivery target specificity, researchers use combinatorial and rational library design strategies to generate novel AAV capsid variants. These approaches frequently propose high proportions of nonforming or noninfective capsid protein sequences that reduce the effective depth of synthesized vector DNA libraries, thereby raising the discovery cost of novel vectors. We evaluated two computational techniques for their ability to estimate the impact of residue mutations on AAV capsid protein-protein interactions and thus predict changes in vector fitness, reasoning that these approaches might inform the design of functionally enriched AAV libraries and accelerate therapeutic candidate identification. The Frustratometer computes an energy function derived from the energy landscape theory of protein folding. Direct-coupling analysis (DCA) is a statistical framework that captures residue coevolution within proteins. We applied the Frustratometer to select candidate protein residues predicted to favor assembled or disassembled capsid states, then predicted mutation effects at these sites using the Frustratometer and DCA. Capsid mutants were experimentally assessed for changes in virus formation, stability, and transduction ability. The Frustratometer-based metric showed a counterintuitive correlation with viral stability, whereas a DCA-derived metric was highly correlated with virus transduction ability in the small population of residues studied. Our results suggest that coevolutionary models may be able to elucidate complex capsid residue-residue interaction networks essential for viral function, but further study is needed to understand the relationship between protein energy simulations and viral capsid metastability.
Affiliation(s)
- Qin Zhou
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, Texas
- Susan Butler
  - Department of Bioengineering, Rice University, Houston, Texas
- Carlos Bueno
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas; Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas
- Nicholas P Schafer
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas; Department of Chemistry, Rice University, Houston, Texas
- Faruck Morcos
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, Texas; Center for Systems Biology, University of Texas at Dallas, Richardson, Texas; Department of Bioengineering, University of Texas at Dallas, Richardson, Texas
- Peter G Wolynes
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas; Department of Chemistry, Rice University, Houston, Texas; Department of Biosciences, Rice University, Houston, Texas; Department of Physics, Rice University, Houston, Texas
- Junghae Suh
  - Department of Bioengineering, Rice University, Houston, Texas; Department of Biosciences, Rice University, Houston, Texas; Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas; Systems, Synthetic, and Physical Biology Program, Rice University, Houston, Texas.
|
22
|
Terzoli S, Tiana G. Molecular Recognition between Cadherins Studied by a Coarse-Grained Model Interacting with a Coevolutionary Potential. J Phys Chem B 2020; 124:4079-4088. [PMID: 32336092 PMCID: PMC8007105 DOI: 10.1021/acs.jpcb.0c01671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Studying the conformations involved in the dimerization of cadherins is highly relevant to understanding the development of tissues and its failure, which is associated with tumors and metastases. Experimental techniques, like X-ray crystallography, can usually report only the most stable conformations, missing minority states that could nonetheless be important for the recognition mechanism. Computer simulations could be a valid complement to the experimental approach. However, standard all-atom protein models in explicit solvent are computationally too demanding to search thoroughly the conformational space of multiple chains composed of several hundreds of amino acids. To reach this goal, we resorted to a coarse-grained model in implicit solvent. The standard problem with this kind of model is to find a realistic potential to describe its interactions. We used coevolutionary information from cadherin alignments, corrected by a statistical potential, to build an interaction potential, which is agnostic about the experimental conformations of the protein. Using this model, we explored the conformational space of multichain systems and validated the results by comparison with experimental data. We identified dimeric conformations that are sequence specific and that can be useful to rationalize the mechanism of recognition between cadherins.
Affiliation(s)
- Sara Terzoli
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, Milano 20133, Italy
- Guido Tiana
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, Milano 20133, Italy
|
23
|
Andreani J, Quignot C, Guerois R. Structural prediction of protein interactions and docking using conservation and coevolution. Wiley Interdisciplinary Reviews: Computational Molecular Science 2020. [DOI: 10.1002/wcms.1470] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 12/12/2022]
Affiliation(s)
- Jessica Andreani
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), Gif-sur-Yvette, France
- Chloé Quignot
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), Gif-sur-Yvette, France
- Raphael Guerois
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), Gif-sur-Yvette, France
|
24
|
Gandarilla-Pérez CA, Mergny P, Weigt M, Bitbol AF. Statistical physics of interacting proteins: Impact of dataset size and quality assessed in synthetic sequences. Phys Rev E 2020; 101:032413. [PMID: 32290011 DOI: 10.1103/physreve.101.032413] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/23/2019] [Accepted: 03/04/2020] [Indexed: 11/07/2022]
Abstract
Identifying protein-protein interactions is crucial for a systems-level understanding of the cell. Recently, algorithms based on inverse statistical physics, e.g., direct coupling analysis (DCA), have made it possible to use evolutionarily related sequences to address two conceptually related inference tasks: finding pairs of interacting proteins and identifying pairs of residues which form contacts between interacting proteins. Here we address two underlying questions: How are the performances of both inference tasks related? How does performance depend on dataset size and quality? To this end, we formalize both tasks using Ising models defined over stochastic block models, with individual blocks representing single proteins and interblock couplings representing protein-protein interactions; controlled synthetic sequence data are generated by Monte Carlo simulations. We show that DCA is able to address both inference tasks accurately when sufficiently large training sets of known interaction partners are available and that an iterative pairing algorithm allows predictions to be made even without a training set. Noise in the training data degrades performance. In both tasks we find a quadratic scaling relating dataset quality and size that is consistent with noise adding in square-root fashion and signal adding linearly when increasing the dataset. This implies that it is generally good to incorporate more data even if their quality is imperfect, thereby shedding light on the empirically observed performance of DCA applied to natural protein sequences.
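The synthetic-data setup described in this abstract can be illustrated with a small sketch: two blocks of spins stand in for two proteins, couplings are drawn more densely within a block than between blocks, and sequences are sampled by Metropolis Monte Carlo. The function names, block size, edge probabilities, and coupling strength below are illustrative assumptions, not the parameters used by the authors.

```python
import numpy as np


def two_block_couplings(block_size=4, p_intra=0.6, p_inter=0.2,
                        strength=0.5, seed=0):
    """Random symmetric couplings on a two-block graph (hypothetical parameters)."""
    rng = np.random.default_rng(seed)
    L = 2 * block_size
    block = np.repeat([0, 1], block_size)
    same_block = block[:, None] == block[None, :]
    edge_prob = np.where(same_block, p_intra, p_inter)
    # Draw each pair once (upper triangle), then mirror it; the diagonal stays zero.
    upper = np.triu(rng.random((L, L)) < edge_prob, k=1)
    return strength * (upper | upper.T).astype(float)


def sample_ising_sequences(J, n_sequences, n_sweeps=50, seed=0):
    """Sample spins from P(s) proportional to exp(sum_{i<j} J_ij s_i s_j)."""
    rng = np.random.default_rng(seed)
    L = J.shape[0]
    samples = np.empty((n_sequences, L), dtype=int)
    for n in range(n_sequences):
        s = rng.choice([-1, 1], size=L)
        for _ in range(n_sweeps * L):
            i = rng.integers(L)
            # Energy change for flipping spin i, with E(s) = -sum_{i<j} J_ij s_i s_j.
            delta_e = 2.0 * s[i] * (J[i] @ s)
            if delta_e <= 0 or rng.random() < np.exp(-delta_e):
                s[i] = -s[i]
        samples[n] = s
    return samples


J = two_block_couplings()
sequences = sample_ising_sequences(J, n_sequences=5, n_sweeps=5)
```

Each sequence here is sampled independently from a random start, so the sketch contains no phylogeny; it only shows how couplings on a block graph translate into correlated synthetic sequences.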
Affiliation(s)
- Carlos A Gandarilla-Pérez
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), F-75005 Paris, France; Facultad de Física, Universidad de la Habana, San Lázaro y L, Vedado, Habana 4, CP-10400, Cuba
- Pierre Mergny
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), F-75005 Paris, France; Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire Jean Perrin (LJP, UMR 8237), F-75005 Paris, France
- Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), F-75005 Paris, France
- Anne-Florence Bitbol
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire Jean Perrin (LJP, UMR 8237), F-75005 Paris, France; Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
|
25
|
Epistatic contributions promote the unification of incompatible models of neutral molecular evolution. Proc Natl Acad Sci U S A 2020; 117:5873-5882. [PMID: 32123092 PMCID: PMC7084075 DOI: 10.1073/pnas.1913071117] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Indexed: 12/27/2022]
Abstract
Mathematical models of evolution help us understand mechanisms driving protein-sequence change. Previous models recapitulate a disjoint subset of statistical features of natural sequences. We present a neutral evolution model that unifies features including extreme variance of the molecular clock’s tick rate and the observation of an evolutionary Stokes shift, an irreversible effect of mutations in the fitness landscape during sequence evolution. We show that interactions between amino acid sites, which inform our fitness metric, are required to observe these features. These interactions are inferred by using direct coupling analysis, which has been successfully utilized to predict protein structures, dynamics, and complexes from coevolutionary information. We anticipate our model will have applications in phylogenetics, ancestral reconstruction of sequences, and protein design. We introduce a model of amino acid sequence evolution that accounts for the statistical behavior of real sequences induced by epistatic interactions. We base the model dynamics on parameters derived from multiple sequence alignments analyzed by using direct coupling analysis methodology. Known statistical properties such as overdispersion, heterotachy, and gamma-distributed rate-across-sites are shown to be emergent properties of this model while being consistent with neutral evolution theory, thereby unifying observations from previously disjointed evolutionary models of sequences. The relationship between site restriction and heterotachy is characterized by tracking the effective alphabet dynamics of sites. We also observe an evolutionary Stokes shift in the fitness of sequences that have undergone evolution under our simulation. By analyzing the structural information of some proteins, we corroborate that the strongest Stokes shifts derive from sites that physically interact in networks near biochemically important regions. 
Perspectives on the implementation of our model in the context of the molecular clock are discussed.
|
26
|
Sala D, Cerofolini L, Fragai M, Giachetti A, Luchinat C, Rosato A. A protocol to automatically calculate homo-oligomeric protein structures through the integration of evolutionary constraints and NMR ambiguous contacts. Comput Struct Biotechnol J 2019; 18:114-124. [PMID: 31969972 PMCID: PMC6961069 DOI: 10.1016/j.csbj.2019.12.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 08/22/2019] [Revised: 11/20/2019] [Accepted: 12/06/2019] [Indexed: 12/15/2022]
Abstract
Protein assemblies are involved in many important biological processes. Solid-state NMR (SSNMR) spectroscopy is a technique suitable for the structural characterization of samples with high molecular weight and thus can be applied to such assemblies. A significant bottleneck in terms of both effort and time required is the manual identification of unambiguous intermolecular contacts. This is particularly challenging for homo-oligomeric complexes, where simple uniform labeling may not be effective. We tackled this challenge by exploiting coevolution analysis to extract information on homo-oligomeric interfaces from NMR-derived ambiguous contacts. After removing the evolutionary couplings (ECs) that are already satisfied by the 3D structure of the monomer, the predicted ECs are matched with the automatically generated list of experimental contacts. This approach provides a selection of potential interface residues that is used directly in monomer-monomer docking calculations. We validated the protocol on tetrameric L-asparaginase II and dimeric Sod1.
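The filtering step described in this abstract (discard evolutionary couplings already satisfied within the monomer, then keep those that match an experimental ambiguous contact) can be sketched as follows. The data layout, the function name `interface_candidates`, and the 8 Å distance cutoff are assumptions made for illustration; the published protocol may define contacts and matching differently.

```python
import numpy as np


def interface_candidates(ecs, monomer_coords, ambiguous_contacts, cutoff=8.0):
    """Return ECs not explained by the monomer that match an experimental contact.

    ecs: iterable of (i, j, score) with 0-based residue indices.
    monomer_coords: (L, 3) array, one representative atom per residue.
    ambiguous_contacts: set of frozenset({i, j}) residue pairs from experiment.
    cutoff: distance in angstroms below which a pair counts as an
            intramolecular contact (assumed value).
    """
    kept = []
    for i, j, score in ecs:
        distance = np.linalg.norm(monomer_coords[i] - monomer_coords[j])
        if distance <= cutoff:
            continue  # already satisfied by the monomer fold
        if frozenset((i, j)) in ambiguous_contacts:
            kept.append((i, j, score))
    # Strongest couplings first.
    return sorted(kept, key=lambda pair: -pair[2])


# Toy monomer: ten residues on a straight line, 3.8 angstroms apart.
coords = np.array([[3.8 * k, 0.0, 0.0] for k in range(10)])
ecs = [(0, 1, 0.9), (0, 9, 0.8), (2, 8, 0.7)]
contacts = {frozenset((0, 9)), frozenset((1, 5))}
candidates = interface_candidates(ecs, coords, contacts)
```

In this toy input, the pair (0, 1) is dropped because it is close in the monomer, and (2, 8) is dropped because no experimental contact supports it, leaving (0, 9) as the only candidate interface pair.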
Affiliation(s)
- Davide Sala
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Linda Cerofolini
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Marco Fragai
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
- Andrea Giachetti
- Consorzio Interuniversitario di Risonanze Magnetiche di Metallo Proteine, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Claudio Luchinat
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
- Antonio Rosato
- Magnetic Resonance Center (CERM), University of Florence, Via Luigi Sacconi 6, 50019 Sesto Fiorentino, Italy
- Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Italy
|
27
|
Zerihun MB, Pucci F, Peter EK, Schug A. pydca v1.0: a comprehensive software for direct coupling analysis of RNA and protein sequences. Bioinformatics 2019; 36:2264-2265. [DOI: 10.1093/bioinformatics/btz892] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 05/20/2019] [Revised: 10/14/2019] [Accepted: 11/26/2019] [Indexed: 12/20/2022]
Abstract
Motivation
The ongoing advances in sequencing technologies have provided a massive increase in the availability of sequence data. This made it possible to study the patterns of correlated substitution between residues in families of homologous proteins or RNAs and to retrieve structural and stability information. Direct coupling analysis (DCA) infers coevolutionary couplings between pairs of residues indicating their spatial proximity, making such information a valuable input for subsequent structure prediction.
Results
Here, we present pydca, a standalone Python-based software package for the DCA of protein- and RNA-homologous families. It is based on two popular inverse statistical approaches, namely, mean-field and pseudo-likelihood maximization, and is equipped with a series of functionalities that range from multiple sequence alignment trimming to contact map visualization. Thanks to its efficient implementation, features and user-friendly command line interface, pydca is a modular and easy-to-use tool that can be used by researchers with a wide range of backgrounds.
Availability and implementation
pydca can be obtained from https://github.com/KIT-MBS/pydca or from the Python Package Index under the MIT License.
Supplementary information
Supplementary data are available at Bioinformatics online.
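To make the mean-field approach mentioned in this abstract concrete, here is a minimal NumPy sketch of mean-field DCA. It does not use the pydca API; the function name `mean_field_dca`, the pseudocount of 0.5, the omission of sequence reweighting, and the choice of an APC-corrected Frobenius norm as the pair score are all simplifying assumptions.

```python
import numpy as np


def mean_field_dca(msa, q, pseudocount=0.5):
    """Score residue pairs of an integer-encoded MSA with mean-field DCA.

    msa: (N, L) integer array with entries in 0..q-1.
    Returns an (L, L) symmetric matrix of APC-corrected Frobenius scores.
    """
    msa = np.asarray(msa)
    n_seq, L = msa.shape
    onehot = np.eye(q)[msa]  # shape (N, L, q)

    # Empirical single-site and pairwise frequencies (no sequence reweighting).
    fi = onehot.mean(axis=0)
    fij = np.einsum("nia,njb->iajb", onehot, onehot) / n_seq

    # Mix with a uniform distribution so the covariance matrix is invertible.
    fi = (1 - pseudocount) * fi + pseudocount / q
    fij = (1 - pseudocount) * fij + pseudocount / q**2
    for i in range(L):
        fij[i, :, i, :] = np.diag(fi[i])

    # Connected correlations; drop the last state of each site to fix the gauge.
    C = fij - fi[:, :, None, None] * fi[None, None, :, :]
    C = C[:, : q - 1, :, : q - 1].reshape(L * (q - 1), L * (q - 1))

    # Mean-field approximation: couplings are minus the inverse covariance.
    J_reduced = -np.linalg.inv(C).reshape(L, q - 1, L, q - 1)
    J = np.zeros((L, q, L, q))
    J[:, : q - 1, :, : q - 1] = J_reduced

    # Shift each q x q coupling block to the zero-sum gauge.
    J = (J - J.mean(axis=1, keepdims=True) - J.mean(axis=3, keepdims=True)
         + J.mean(axis=(1, 3), keepdims=True))

    # Frobenius norm per pair, followed by the average product correction.
    F = np.sqrt((J**2).sum(axis=(1, 3)))
    np.fill_diagonal(F, 0.0)
    row_mean = F.sum(axis=1) / (L - 1)
    apc = np.outer(row_mean, row_mean) / (F.sum() / (L * (L - 1)))
    scores = F - apc
    np.fill_diagonal(scores, 0.0)
    return scores


# Toy alignment: columns 0 and 1 are identical, the rest are random.
rng = np.random.default_rng(1)
toy_msa = rng.integers(0, 3, size=(500, 5))
toy_msa[:, 1] = toy_msa[:, 0]
toy_scores = mean_field_dca(toy_msa, q=3)
```

For real protein families one would use q = 21 (20 amino acids plus gap) and down-weight similar sequences; the toy alphabet of three states only keeps the example fast.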
Affiliation(s)
- Mehari B Zerihun
- Steinbuch Centre for Computing, Eggenstein-Leopoldshafen 76344
- Department of Physics, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen 76344
- Fabrizio Pucci
- John von Neumann Institute for Computing, Jülich Supercomputer Centre, Forschungszentrum Jülich, Jülich 52428, Germany
- Emanuel K Peter
- John von Neumann Institute for Computing, Jülich Supercomputer Centre, Forschungszentrum Jülich, Jülich 52428, Germany
- Alexander Schug
- John von Neumann Institute for Computing, Jülich Supercomputer Centre, Forschungszentrum Jülich, Jülich 52428, Germany
|
28
|
Li Y, De la Paz JA, Jiang X, Liu R, Pokkulandra AP, Bleris L, Morcos F. Coevolutionary Couplings Unravel PAM-Proximal Constraints of CRISPR-SpCas9. Biophys J 2019; 117:1684-1691. [PMID: 31648792 DOI: 10.1016/j.bpj.2019.09.040] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/05/2019] [Revised: 09/25/2019] [Accepted: 09/30/2019] [Indexed: 01/07/2023]
Abstract
The clustered regularly interspaced short palindromic repeats (CRISPR) system, an immune system analog found in prokaryotes, allows a single-guide RNA to direct a CRISPR-associated protein (Cas) with combined helicase and nuclease activity to DNA. The presence of a specific protospacer adjacent motif (PAM) next to the DNA target site plays a crucial role in determining both efficacy and specificity of gene editing. Herein, we introduce a coevolutionary framework to computationally unveil nonobvious molecular interactions in CRISPR systems and experimentally probe their functional role. Specifically, we use direct coupling analysis, a statistical inference framework used to infer direct coevolutionary couplings, in the context of protein/nucleic acid interactions. Applied to Streptococcus pyogenes Cas9, a Hamiltonian metric obtained from coevolutionary relationships reveals, to our knowledge, novel PAM-proximal nucleotide preferences at the seventh position of S. pyogenes Cas9 PAM (5'-NGRNNNT-3'), which were experimentally confirmed by in vitro and functional assays in human cells. We show that coevolved and conserved interactions point to specific clues toward rationally engineering new generations of Cas9 systems and may eventually help decipher the diversity of this family of proteins.
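A Hamiltonian metric of the kind mentioned in this abstract is typically a Potts-model energy evaluated on a sequence. The sketch below assumes that the fields `h` and couplings `J` have already been inferred by DCA, that sequences are integer-encoded, and that lower values are more favorable; the array layout and the function name `potts_hamiltonian` are illustrative, not taken from the paper.

```python
import numpy as np


def potts_hamiltonian(seq, h, J):
    """Potts energy H(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    seq: length-L sequence of integer states in 0..q-1.
    h:   (L, q) array of local fields.
    J:   (L, L, q, q) array of couplings, indexed as J[i, j, a, b].
    """
    seq = np.asarray(seq)
    positions = np.arange(len(seq))
    field_term = h[positions, seq].sum()
    # pair[i, j] holds J[i, j, seq[i], seq[j]] for every pair of positions.
    pair = J[positions[:, None], positions[None, :], seq[:, None], seq[None, :]]
    coupling_term = np.triu(pair, k=1).sum()  # count each pair i < j once
    return -(field_term + coupling_term)


# Toy model with three positions and two states.
h = np.zeros((3, 2))
J = np.zeros((3, 3, 2, 2))
h[0, 1] = 1.0
J[0, 2, 1, 0] = 2.0
reference = [0, 0, 0]
variant = [1, 0, 0]
delta_h = potts_hamiltonian(variant, h, J) - potts_hamiltonian(reference, h, J)
```

Comparing the energy of a variant against a reference, as in `delta_h`, is one common way to rank candidate substitutions under these assumptions.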
Affiliation(s)
- Yi Li
- Department of Bioengineering, The University of Texas at Dallas, Richardson, Texas; Center for Systems Biology, The University of Texas at Dallas, Richardson, Texas
- José A De la Paz
- Department of Biological Sciences, The University of Texas at Dallas, Richardson, Texas
- Xianli Jiang
- Department of Biological Sciences, The University of Texas at Dallas, Richardson, Texas
- Richard Liu
- Department of Bioengineering, The University of Texas at Dallas, Richardson, Texas
- Adarsha P Pokkulandra
- School of Behavioral and Brain Sciences, The University of Texas at Dallas, Richardson, Texas
- Leonidas Bleris
- Department of Bioengineering, The University of Texas at Dallas, Richardson, Texas; Center for Systems Biology, The University of Texas at Dallas, Richardson, Texas; Department of Biological Sciences, The University of Texas at Dallas, Richardson, Texas.
- Faruck Morcos
- Department of Bioengineering, The University of Texas at Dallas, Richardson, Texas; Center for Systems Biology, The University of Texas at Dallas, Richardson, Texas; Department of Biological Sciences, The University of Texas at Dallas, Richardson, Texas.
|
29
|
Phylogenetic correlations can suffice to infer protein partners from sequences. PLoS Comput Biol 2019; 15:e1007179. [PMID: 31609984 PMCID: PMC6812855 DOI: 10.1371/journal.pcbi.1007179] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 06/10/2019] [Revised: 10/24/2019] [Accepted: 09/25/2019] [Indexed: 12/30/2022]
Abstract
Determining which proteins interact together is crucial to a systems-level understanding of the cell. Recently, algorithms based on Direct Coupling Analysis (DCA) pairwise maximum-entropy models have made it possible to identify interaction partners among paralogous proteins from sequence data. This success of DCA at predicting protein-protein interactions could be mainly based on its known ability to identify pairs of residues that are in contact in the three-dimensional structure of protein complexes and that coevolve to remain physicochemically complementary. However, interacting proteins possess similar evolutionary histories. What is the role of purely phylogenetic correlations in the performance of DCA-based methods to infer interaction partners? To address this question, we employ controlled synthetic data that only involve phylogeny and no interactions or contacts. We find that DCA accurately identifies the pairs of synthetic sequences that share evolutionary history. While phylogenetic correlations confound the identification of contacting residues by DCA, they are thus useful to predict interacting partners among paralogs. We find that DCA performs as well as phylogenetic methods to this end, and slightly better than them with large and accurate training sets. Employing DCA or phylogenetic methods within an Iterative Pairing Algorithm (IPA) makes it possible to predict pairs of evolutionary partners without a training set. We further demonstrate the ability of these various methods to correctly predict pairings among real paralogous proteins with genome proximity but no known direct physical interaction, illustrating the importance of phylogenetic correlations in natural data. However, for physically interacting and strongly coevolving proteins, DCA and mutual information outperform phylogenetic methods. We finally discuss how to distinguish physically interacting proteins from proteins that only share a common evolutionary history.
Many biologically important protein-protein interactions are conserved over evolutionary time scales. This leads to two different signals that can be used to computationally predict interactions between protein families and to identify specific interaction partners. First, the shared evolutionary history leads to highly similar phylogenetic relationships between interacting proteins of the two families. Second, the need to keep the interaction surfaces of partner proteins biophysically compatible causes a correlated amino-acid usage of interface residues. Employing simulated data, we show that the shared history alone can be used to detect partner proteins. Similar accuracies are achieved by algorithms comparing phylogenetic relationships and by methods based on Direct Coupling Analysis (DCA), which are primarily known for their ability to detect the second type of signal. Using natural sequence data, we show that in cases with shared evolutionary history but without known physical interactions, both methods work with similar accuracy, while for some physically interacting systems, DCA and mutual information outperform phylogenetic methods. We propose methods allowing both to predict interactions between protein families and to find interacting partners among paralogs.
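Mutual information, one of the covariation measures compared in this abstract, can be computed for two alignment columns as in the sketch below. It uses raw counts with no pseudocounts, sequence weighting, or finite-sample correction, which are simplifying assumptions; the function name is illustrative.

```python
import numpy as np
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two equally long alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        mi += p_ab * np.log(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


# Two columns that always change together, and two that vary independently.
coupled = mutual_information("AABB", "CCDD")
independent = mutual_information("AABB", "CDCD")
```

The coupled columns give the maximum value for two equiprobable states, while the independent pair gives zero, which is the contrast a covariation-based partner predictor exploits.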
|
30
|
Accurate Classification of Biological and non-Biological Interfaces in Protein Crystal Structures using Subtle Covariation Signals. Sci Rep 2019; 9:12603. [PMID: 31471543 PMCID: PMC6717244 DOI: 10.1038/s41598-019-48913-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 01/10/2017] [Accepted: 08/14/2019] [Indexed: 11/08/2022]
Abstract
Proteins often work as oligomers or multimers in vivo. Therefore, elucidating their oligomeric or multimeric form (quaternary structure) is crucially important to ascertain their function. X-ray crystal structures of numerous proteins have been accumulated, providing information related to their biological units. Extracting information of biological units from protein crystal structures represents a meaningful task for modern biology. Nevertheless, although many methods have been proposed for identifying biological units appearing in protein crystal structures, it is difficult to distinguish biological protein-protein interfaces from crystallographic ones. Therefore, our simple but highly accurate classifier was developed to infer biological units in protein crystal structures using large amounts of protein sequence information and a modern contact prediction method to exploit covariation signals (CSs) in proteins. We demonstrate that our proposed method is promising even for weak signals of biological interfaces. We also discuss the relation between classification accuracy and conservation of biological units, and illustrate how the selection of sequences included in multiple sequence alignments as sources for obtaining CSs affects the results. With increased amounts of sequence data, the proposed method is expected to become increasingly useful.
|
31
|
Fongang B, Cunningham KA, Rowicka M, Kudlicki A. Coevolution of Residues Provides Evidence of a Functional Heterodimer of 5-HT2AR and 5-HT2CR Involving Both Intracellular and Extracellular Domains. Neuroscience 2019; 412:48-59. [PMID: 31158438 PMCID: PMC7299066 DOI: 10.1016/j.neuroscience.2019.05.013] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/28/2019] [Revised: 05/02/2019] [Accepted: 05/07/2019] [Indexed: 10/26/2022]
Abstract
Serotonin is a neurotransmitter that plays a role in regulating activities such as sleep, appetite, mood and substance abuse disorders; serotonin receptors 5-HT2AR and 5-HT2CR are active within pathways associated with substance abuse. It has been suggested that 5-HT2AR and 5-HT2CR may form a dimer that affects behavioral processes. Here we study the coevolution of residues in 5-HT2AR and 5-HT2CR to identify potential interactions between residues in both proteins. Coevolution studies can detect protein interactions, and because the interactions uncovered in this way are subject to evolutionary pressure, they are likely functional. We assessed the significance of the 5-HT2AR/5-HT2CR interactions using randomized phylogenetic trees and found the coevolution significant (p-value = 0.01). We also discuss how co-expression of the receptors suggests the predicted interaction is functional. Finally, we analyze how several single nucleotide polymorphisms for the 5-HT2AR and 5-HT2CR genes affect their interaction. Our findings are the first to characterize the binding interface of 5-HT2AR/5-HT2CR and indicate a correlation between this interface and location of SNPs in both proteins.
MESH Headings
- Animals
- Databases, Genetic
- Evolution, Molecular
- Papio anubis
- Phosphorylation
- Receptor, Serotonin, 5-HT2A/genetics
- Receptor, Serotonin, 5-HT2A/metabolism
- Receptor, Serotonin, 5-HT2C/genetics
- Receptor, Serotonin, 5-HT2C/metabolism
- Transcriptome
Affiliation(s)
- Bernard Fongang
- Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, TX 77555, USA; Glenn Biggs Institute for Alzheimer's & Neurodegenerative Diseases, UTHSCSA, San Antonio, TX 78229, USA; Department of Biochemistry and Structural Biology, UTHSCSA, San Antonio, TX 78229, USA; Department of Epidemiology and Biostatistics, UTHSCSA, San Antonio, TX 78229, USA.
- Kathryn A Cunningham
- Center for Addiction Research and Department of Pharmacology and Toxicology, University of Texas Medical Branch, Galveston, TX 77555, USA
- Maga Rowicka
- Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, TX 77555, USA; Institute for Translational Sciences, University of Texas Medical Branch, Galveston, TX 77555, USA; Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, TX 77555, USA
- Andrzej Kudlicki
- Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, TX 77555, USA; Institute for Translational Sciences, University of Texas Medical Branch, Galveston, TX 77555, USA; Sealy Center for Molecular Medicine, University of Texas Medical Branch, Galveston, TX 77555, USA.
|
32
|
The role of coevolutionary signatures in protein interaction dynamics, complex inference, molecular recognition, and mutational landscapes. Curr Opin Struct Biol 2019; 56:179-186. [PMID: 31029927 DOI: 10.1016/j.sbi.2019.03.024] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 03/10/2019] [Revised: 03/18/2019] [Accepted: 03/19/2019] [Indexed: 11/22/2022]
Abstract
Evolution imposes constraints at the interface of interacting biomolecules in order to preserve function or maintain fitness. This pressure may have a direct effect on the sequence composition of interacting biomolecules. As a result, statistical patterns of amino acid or nucleotide covariance that encode for physical and functional interactions are observed in sequences of extant organisms. In recent years, global pairwise models of amino acid and nucleotide coevolution from multiple sequence alignments have been developed and utilized to study molecular interactions in structural biology. In proteins, for which the energy landscape is funneled and minimally frustrated, a direct connection between the physical and sequence space landscapes can be established. Estimating coevolutionary information from sequences of interacting molecules has a broad impact in molecular biology. Applications include the accurate determination of 3D structures of molecular complexes, inference of protein interaction partners, models of protein-protein interaction specificity, the elucidation and design of protein-nucleic acid recognition, and the discovery of genome-wide epistatic effects. The current state of the art of coevolutionary analysis includes biomedical applications ranging from mutational landscapes and drug design to vaccine development.
|
33
|
Jarmolinska AI, Zhou Q, Sulkowska JI, Morcos F. DCA-MOL: A PyMOL Plugin To Analyze Direct Evolutionary Couplings. J Chem Inf Model 2019; 59:625-629. [DOI: 10.1021/acs.jcim.8b00690] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 01/06/2023]
Affiliation(s)
- Aleksandra I. Jarmolinska
- Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097, Warsaw, Poland
- College of Inter-Faculty Individual Studies in Mathematics and Natural Sciences, Banacha 2c, 02-097 Warsaw, Poland
- Qin Zhou
- Department of Biological Sciences, University of Texas at Dallas, Richardson, Texas 75080, United States
- Joanna I. Sulkowska
- Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097, Warsaw, Poland
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
- Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, Texas 75080, United States
- Center for Systems Biology, University of Texas at Dallas, Richardson, Texas 75080, United States
|
34
|
Coevolutionary Signals and Structure-Based Models for the Prediction of Protein Native Conformations. Methods Mol Biol 2019; 1851:83-103. [PMID: 30298393 DOI: 10.1007/978-1-4939-8736-8_5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 12/12/2022]
Abstract
The analysis of coevolutionary signals from families of evolutionarily related sequences is a recent conceptual framework that provides valuable information about unique intramolecular interactions and, therefore, can assist in the elucidation of biomolecular conformations. It is based on the idea that compensatory mutations at specific residue positions in a sequence help preserve stability of protein architecture and function and leave a statistical signature related to residue-residue interactions in the 3D structure of the protein. Consequently, statistical analysis of these correlated mutations in subsets of protein sequence alignments can be used to predict which residue pairs should be in spatial proximity in the native functional protein fold. These predicted signals can be then used to guide molecular dynamics (MD) simulations to predict the three-dimensional coordinates of a functional amino acid chain. In this chapter, we introduce a general and efficient methodology to perform coevolutionary analysis on protein sequences and to use this information in combination with computational physical models to predict the native 3D conformation of functional polypeptides. We present a step-by-step methodology that includes the description and application of software tools and databases required to infer tertiary structures of a protein fold. The general pipeline includes instructions on (1) how to obtain direct amino acid couplings from protein sequences using direct coupling analysis (DCA), (2) how to incorporate such signals as interaction potentials in Cα structure-based models (SBMs) to drive protein-folding MD simulations, (3) a procedure to estimate secondary structure and how to include such estimates in the topology files required in the MD simulations, and (4) how to build full atomic models based on the top Cα candidates selected in the pipeline. 
The information presented in this chapter is self-contained and sufficient to allow a computational scientist to predict structures of proteins using publicly available algorithms and databases.
|
35
|
Huang YJ, Brock KP, Ishida Y, Swapna GVT, Inouye M, Marks DS, Sander C, Montelione GT. Combining Evolutionary Covariance and NMR Data for Protein Structure Determination. Methods Enzymol 2018; 614:363-392. [PMID: 30611430 PMCID: PMC6640129 DOI: 10.1016/bs.mie.2018.11.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/20/2022]
Abstract
Accurate protein structure determination by solution-state NMR is challenging for proteins greater than about 20 kDa, for which extensive perdeuteration is generally required, providing experimental data that are incomplete (sparse) and ambiguous. However, the massive increase in evolutionary sequence information coupled with advances in methods for sequence covariance analysis can provide reliable residue-residue contact information for a protein from sequence data alone. These "evolutionary couplings (ECs)" can be combined with sparse NMR data to determine accurate 3D protein structures. This hybrid "EC-NMR" method has been developed using NMR data for several soluble proteins and validated by comparison with corresponding reference structures determined by X-ray crystallography and/or conventional NMR methods. For small proteins, only backbone resonance assignments are utilized, while for larger proteins both backbone and some sidechain methyl resonance assignments are generally required. ECs can be combined with sparse NMR data obtained on deuterated, selectively protonated protein samples to provide structures that are more accurate and complete than those obtained using such sparse NMR data alone. EC-NMR also has significant potential for analysis of protein structures from solid-state NMR data and for studies of integral membrane proteins. The requirement that ECs are consistent with NMR data recorded on a specific member of a protein family, under specific conditions, also allows identification of ECs that reflect alternative allosteric or excited states of the protein structure.
Affiliation(s)
- Yuanpeng Janet Huang
  - Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ, United States; Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ, United States
- Kelly P Brock
  - Department of Systems Biology, Harvard Medical School, Boston, MA, United States
- Yojiro Ishida
  - Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ, United States; Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ, United States
- Gurla V T Swapna
  - Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ, United States; Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ, United States
- Masayori Inouye
  - Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ, United States; Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ, United States
- Debora S Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, United States
- Chris Sander
  - Department of Cell Biology, Harvard Medical School and cBio Center, Dana-Farber Cancer Institute, Boston, MA, United States
- Gaetano T Montelione
  - Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, NJ, United States; Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ, United States; Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, NJ, United States
|
36
|
Abstract
Protein assemblies consisting of structural maintenance of chromosomes (SMC) and kleisin subunits are essential for the process of chromosome segregation across all domains of life. Prokaryotic condensin belonging to this class of protein complexes is composed of a homodimer of SMC that associates with a kleisin protein subunit called ScpA. While limited structural data exist for the proteins that comprise the (SMC)-kleisin complex, the complete structure of the entire complex remains unknown. Using an integrative approach combining both crystallographic data and coevolutionary information, we predict an atomic-scale structure of the whole condensin complex, which our results indicate is composed of a single ring. Coupling coevolutionary information with molecular-dynamics simulations, we study the interaction surfaces between the subunits and examine the plausibility of alternative stoichiometries of the complex. Our analysis also reveals several additional configurational states of the condensin hinge domain and the SMC-kleisin interaction domains, which are likely involved in the functional opening and closing of the condensin ring. This study provides the foundation for future investigations of the structure-function relationship of the various SMC-kleisin protein complexes at atomic resolution.
|
37
|
Bitbol AF. Inferring interaction partners from protein sequences using mutual information. PLoS Comput Biol 2018; 14:e1006401. [PMID: 30422978 PMCID: PMC6258550 DOI: 10.1371/journal.pcbi.1006401] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 07/23/2018] [Revised: 11/27/2018] [Accepted: 10/27/2018] [Indexed: 11/30/2022] Open
Abstract
Functional protein-protein interactions are crucial in most cellular processes. They enable multi-protein complexes to assemble and to remain stable, and they allow signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interacting partners, and thus in correlations between their sequences. Pairwise maximum-entropy based models have enabled successful inference of pairs of amino-acid residues that are in contact in the three-dimensional structure of multi-protein complexes, starting from the correlations in the sequence data of known interaction partners. Recently, algorithms inspired by these methods have been developed to identify which proteins are functional interaction partners among the paralogous proteins of two families, starting from sequence data alone. Here, we demonstrate that a slightly higher performance for partner identification can be reached by an approximate maximization of the mutual information between the sequence alignments of the two protein families. Our mutual information-based method also provides signatures of the existence of interactions between protein families. These results stand in contrast with structure prediction of proteins and of multi-protein complexes from sequence data, where pairwise maximum-entropy based global statistical models substantially improve performance compared to mutual information. Our findings entail that the statistical dependences allowing interaction partner prediction from sequence data are not restricted to the residue pairs that are in direct contact at the interface between the partner proteins.
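As a rough illustration of the mutual-information idea in this abstract, the sketch below scores one candidate pairing of two toy alignments by summing the mutual information over all inter-family column pairs. It is not the authors' implementation: the function names `mutual_information` and `pairing_score`, the toy sequences, and the plain sum over column pairs (with no pseudocounts or sequence reweighting) are assumptions made for the example.

```python
from collections import Counter
from math import log


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two equally long alignment columns."""
    if len(col_a) != len(col_b):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        # p_ab * log(p_ab / (p_a * p_b)), written with raw counts
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


def pairing_score(family_a, family_b):
    """Sum of MI over all inter-family column pairs for one candidate pairing.

    Row i of family_a is taken to be paired with row i of family_b.
    """
    cols_a = list(zip(*family_a))
    cols_b = list(zip(*family_b))
    return sum(mutual_information(ca, cb) for ca in cols_a for cb in cols_b)


# Hypothetical toy alignments (two columns each, four sequences each).
family_a = ["AK", "AK", "DE", "DE"]
family_b = ["RL", "RL", "TV", "TV"]
shuffled_b = ["RL", "TV", "RL", "TV"]  # same sequences, different pairing

matched = pairing_score(family_a, family_b)
mismatched = pairing_score(family_a, shuffled_b)
print(f"matched pairing:    {matched:.3f} nats")
print(f"mismatched pairing: {mismatched:.3f} nats")
```

In this toy case the matched pairing scores higher than the shuffled one, which is the signal a partner-matching procedure would try to maximize over candidate pairings.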
Affiliation(s)
- Anne-Florence Bitbol
  - Sorbonne Université, CNRS, Laboratoire Jean Perrin (UMR 8237), F-75005 Paris, France
|
38
|
Cheng RR, Haglund E, Tiee NS, Morcos F, Levine H, Adams JA, Jennings PA, Onuchic JN. Designing bacterial signaling interactions with coevolutionary landscapes. PLoS One 2018; 13:e0201734. [PMID: 30125296 PMCID: PMC6101370 DOI: 10.1371/journal.pone.0201734] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 05/16/2018] [Accepted: 07/21/2018] [Indexed: 11/19/2022] Open
Abstract
Selecting amino acids to design novel protein-protein interactions that facilitate catalysis is a daunting challenge. We propose that a computational coevolutionary landscape based on sequence analysis alone offers a major advantage over expensive, time-consuming brute-force approaches currently employed. Our coevolutionary landscape allows prediction of single amino acid substitutions that produce functional interactions between non-cognate, interspecies signaling partners. In addition, it can also predict mutations that maintain segregation of signaling pathways across species. Specifically, predictions of phosphotransfer activity between the Escherichia coli histidine kinase EnvZ to the non-cognate receiver Spo0F from Bacillus subtilis were compiled. Twelve mutations designed to enhance, suppress, or have a neutral effect on kinase phosphotransfer activity to a non-cognate partner were selected. We experimentally tested the ability of the kinase to relay phosphate to the respective designed Spo0F receiver proteins against the theoretical predictions. Our key finding is that the coevolutionary landscape theory, with limited structural data, can significantly reduce the search-space for successful prediction of single amino acid substitutions that modulate phosphotransfer between the two-component His-Asp relay partners in a predicted fashion. This combined approach offers significant improvements over large-scale mutations studies currently used for protein engineering and design.
Affiliation(s)
- Ryan R. Cheng
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas, United States of America
  - * E-mail: (RRC); (JNO)
- Ellinor Haglund
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas, United States of America
- Nicholas S. Tiee
  - Department of Chemistry & Biochemistry, The University of California, San Diego, California, United States of America
- Faruck Morcos
  - Department of Biological Sciences, University of Texas at Dallas, Dallas, Texas, United States of America
  - Department of Bioengineering, University of Texas at Dallas, Dallas, Texas, United States of America
- Herbert Levine
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas, United States of America
  - Department of Bioengineering, Rice University, Houston, Texas, United States of America
  - Department of Biosciences, Rice University, Houston, Texas, United States of America
  - Department of Physics & Astronomy, Rice University, Houston, Texas, United States of America
- Joseph A. Adams
  - Department of Pharmacology, The University of California, San Diego, California, United States of America
- Patricia A. Jennings
  - Department of Chemistry & Biochemistry, The University of California, San Diego, California, United States of America
- José N. Onuchic
  - Center for Theoretical Biological Physics, Rice University, Houston, Texas, United States of America
  - Department of Biosciences, Rice University, Houston, Texas, United States of America
  - Department of Physics & Astronomy, Rice University, Houston, Texas, United States of America
  - Department of Chemistry, Rice University, Houston, Texas, United States of America
  - * E-mail: (RRC); (JNO)
|
39
|
Hu J, Liu HF, Sun J, Wang J, Liu R. Integrating co-evolutionary signals and other properties of residue pairs to distinguish biological interfaces from crystal contacts. Protein Sci 2018; 27:1723-1735. [PMID: 29931702 DOI: 10.1002/pro.3448] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 01/22/2018] [Revised: 04/21/2018] [Accepted: 05/16/2018] [Indexed: 12/25/2022]
Abstract
It remains challenging to accurately discriminate between biological and crystal interfaces. Most existing analyses and algorithms have focused on features derived from a single side of the interface, and less attention has been paid to the properties of residue pairs across protein interfaces. To address this problem, we defined a novel co-evolutionary feature for homodimers by integrating direct coupling analysis and image processing techniques. The residue pairs across biological homodimeric interfaces were significantly enriched in co-evolving residues compared to those across crystal contacts, resulting in a promising classification accuracy with areas under the curve (AUCs) of >0.85. Considering the availability of the co-evolutionary feature, we also designed other residue-pair-based features that were useful for both homodimers and heterodimers. The most informative residue pairs were identified to reflect the interaction preferences across protein interfaces. Regarding the other extant properties, we designed new descriptors at the interface residue level as well as at the pairwise contact level. Extensive validation showed that these single properties can be used to identify biological interfaces with AUCs ranging from 0.60 to 0.88. By integrating the co-evolutionary feature with other residue-pair-based properties, our final prediction model achieved excellent performance, with AUCs of >0.91 on different datasets. Compared to existing methods, our algorithm not only yielded better or comparable results but also provided complementary information. An easy-to-use web server is freely accessible at http://liulab.hzau.edu.cn/RPAIAnalyst.
Affiliation(s)
- Jian Hu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China; College of Biomedical Engineering, South-Central University for Nationalities, Wuhan, 430074, P. R. China
- Hui-Fang Liu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- Jun Sun
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- Jia Wang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
- Rong Liu
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, 430070, P. R. China
|
40
|
Holland J, Pan Q, Grigoryan G. Contact prediction is hardest for the most informative contacts, but improves with the incorporation of contact potentials. PLoS One 2018; 13:e0199585. [PMID: 29953468 PMCID: PMC6023208 DOI: 10.1371/journal.pone.0199585] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 11/21/2017] [Accepted: 06/11/2018] [Indexed: 11/18/2022] Open
Abstract
Co-evolution between pairs of residues in a multiple sequence alignment (MSA) of homologous proteins has long been proposed as an indicator of structural contacts. Recently, several methods, such as direct-coupling analysis (DCA) and MetaPSICOV, have been shown to achieve impressive rates of contact prediction by taking advantage of considerable sequence data. In this paper, we show that prediction success rates are highly sensitive to the structural definition of a contact, with more permissive definitions (i.e., those classifying more pairs as true contacts) naturally leading to higher positive predictive rates, but at the expense of the amount of structural information contributed by each contact. Thus, the remaining limitations of contact prediction algorithms are most noticeable in conjunction with geometrically restrictive contacts—precisely those that contribute more information in structure prediction. We suggest that to improve prediction rates for such “informative” contacts one could combine co-evolution scores with additional indicators of contact likelihood. Specifically, we find that when a pair of co-varying positions in an MSA is occupied by residue pairs with favorable statistical contact energies, that pair is more likely to represent a true contact. We show that combining a contact potential metric with DCA or MetaPSICOV performs considerably better than DCA or MetaPSICOV alone, respectively. This is true regardless of contact definition, but especially true for stricter and more informative contact definitions. In summary, this work outlines some remaining challenges to be addressed in contact prediction and proposes and validates a promising direction towards improvement.
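The dependence of prediction success on the contact definition can be illustrated with a small sketch. It computes the positive predictive value of a fixed list of predicted pairs under a strict and a permissive distance cutoff. The coordinates, the predicted pairs, the two cutoffs, and the function name `ppv` are all hypothetical and chosen only for this toy example; the paper's own contact definitions are not reproduced here.

```python
from math import dist


def ppv(predicted_pairs, coords, cutoff):
    """Fraction of predicted pairs whose coordinates lie within `cutoff` angstroms."""
    if not predicted_pairs:
        raise ValueError("need at least one predicted pair")
    hits = sum(1 for i, j in predicted_pairs if dist(coords[i], coords[j]) <= cutoff)
    return hits / len(predicted_pairs)


# Hypothetical C-alpha coordinates (angstroms) for a five-residue toy chain.
coords = {
    0: (0.0, 0.0, 0.0),
    1: (3.8, 0.0, 0.0),
    2: (7.6, 0.0, 0.0),
    3: (7.6, 6.0, 0.0),
    4: (0.0, 9.0, 0.0),
}

# Hypothetical predicted contacts, e.g. the top-ranked co-evolving pairs.
predicted = [(0, 1), (2, 3), (0, 4), (1, 4)]

strict = ppv(predicted, coords, cutoff=5.0)       # only the closest pair counts
permissive = ppv(predicted, coords, cutoff=10.0)  # every predicted pair counts
print(f"PPV at  5 A cutoff: {strict:.2f}")
print(f"PPV at 10 A cutoff: {permissive:.2f}")
```

The same predictions look far better under the permissive cutoff, even though each "contact" then constrains the structure less, which is the trade-off the abstract describes.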
Affiliation(s)
- Jack Holland
  - Department of Computer Science, Dartmouth College, Hanover, NH 03755, United States of America
- Qinxin Pan
  - Department of Computer Science, Dartmouth College, Hanover, NH 03755, United States of America
- Gevorg Grigoryan
  - Department of Computer Science, Dartmouth College, Hanover, NH 03755, United States of America
  - Department of Biological Sciences, Dartmouth College, Hanover, NH 03755, United States of America
|
41
|
Szurmant H, Weigt M. Inter-residue, inter-protein and inter-family coevolution: bridging the scales. Curr Opin Struct Biol 2018; 50:26-32. [PMID: 29101847 PMCID: PMC5940578 DOI: 10.1016/j.sbi.2017.10.014] [Citation(s) in RCA: 55] [Impact Index Per Article: 7.9] [Received: 09/05/2017] [Revised: 10/12/2017] [Accepted: 10/13/2017] [Indexed: 10/18/2022]
Abstract
Interacting proteins coevolve at multiple but interconnected scales, from the residue-residue over the protein-protein up to the family-family level. The recent accumulation of enormous amounts of sequence data allows for the development of novel, data-driven computational approaches. Notably, these approaches can bridge scales within a single statistical framework. Although they are currently applied mostly to isolated problems on single scales, their immense potential for an evolutionarily informed structural systems biology is steadily emerging.
Affiliation(s)
- Hendrik Szurmant
  - Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona, CA 91766, USA
- Martin Weigt
  - Sorbonne Universités, UPMC Université Paris 06, CNRS, Biologie Computationnelle et Quantitative - Institut de Biologie Paris Seine, 75005 Paris, France
|
42
|
dos Santos RN, Khan S, Morcos F. Characterization of C-ring component assembly in flagellar motors from amino acid coevolution. R Soc Open Sci 2018; 5:171854. [PMID: 29892378 PMCID: PMC5990795 DOI: 10.1098/rsos.171854] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 11/11/2017] [Accepted: 04/05/2018] [Indexed: 06/08/2023]
Abstract
Bacterial flagellar motility, an important virulence factor, is energized by a rotary motor localized within the flagellar basal body. The rotor module consists of a large framework (the C-ring), composed of the FliG, FliM and FliN proteins. FliN and FliM contact the FliG torque ring to control the direction of flagellar rotation. We report that structure-based models constrained only by residue coevolution can recover the binding interface of atomic X-ray dimer complexes with remarkable accuracy (approx. 1 Å RMSD). We propose a model for FliM-FliN heterodimerization, which agrees accurately with homologous interfaces as well as in situ cross-linking experiments, and hence supports a proposed architecture for the lower portion of the C-ring. Furthermore, this approach allowed the identification of two discrete and interchangeable homodimerization interfaces between FliM middle domains that agree with experimental measurements and might be associated with C-ring directional switching dynamics triggered upon binding of the CheY signal protein. Our findings provide structural details of complex formation at the C-ring that have been difficult to obtain with previous methodologies and clarify the architectural principle that underpins the ultra-sensitive allostery exhibited by this ring assembly that controls the clockwise or counterclockwise rotation of flagella.
Affiliation(s)
- Ricardo Nascimento dos Santos
  - Institute of Chemistry and Center for Computational Engineering and Science, University of Campinas, Campinas, SP, Brazil
- Shahid Khan
  - Molecular Biology Consortium, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Faruck Morcos
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, USA
  - Department of Bioengineering, University of Texas at Dallas, Richardson, TX, USA
  - Center for Systems Biology, University of Texas at Dallas, Richardson, TX, USA
|
43
|
Sanchez-Ibarra HE, Reyes-Cortes LM, Jiang XL, Luna-Aguirre CM, Aguirre-Trevino D, Morales-Alvarado IA, Leon-Cachon RB, Lavalle-Gonzalez F, Morcos F, Barrera-Saldaña HA. Genotypic and Phenotypic Factors Influencing Drug Response in Mexican Patients With Type 2 Diabetes Mellitus. Front Pharmacol 2018; 9:320. [PMID: 29681852 PMCID: PMC5898372 DOI: 10.3389/fphar.2018.00320] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 01/19/2018] [Accepted: 03/20/2018] [Indexed: 12/17/2022] Open
Abstract
The treatment of Type 2 Diabetes Mellitus (T2DM) consists primarily of oral antidiabetic drugs (OADs) that stimulate insulin secretion (e.g., sulfonylureas, SUs) or reduce hepatic glucose production (e.g., biguanides), among others. The marked inter-individual differences in T2DM patients’ response to these drugs have become an issue for efficient prescribing and dosing. In this study, fourteen polymorphisms selected from genome-wide association studies (GWAS) were screened in 495 T2DM Mexican patients previously treated with OADs to find the relationship between the presence of these polymorphisms and response to the OADs. Then, a novel association screening method, based on global probabilities, was used to globally characterize important relationships between the drug response to OADs and genetic and clinical parameters, including polymorphisms, patient information, and type of treatment. Two polymorphisms, ABCC8-Ala1369Ser and KCNJ11-Glu23Lys, showed a significant impact on response to SUs. Heterozygous ABCC8-Ala1369Ser variant (A/C) carriers exhibited a higher response to SUs compared to homozygous ABCC8-Ala1369Ser variant (A/A) carriers (p-value = 0.029) and to homozygous wild-type genotypes (C/C) (p-value = 0.012). The homozygous KCNJ11-Glu23Lys variant (C/C) and wild-type (T/T) genotypes had a lower response to SUs compared to heterozygous (C/T) carriers (p-value = 0.039). The screening of genetic and clinical factors related to OAD response could help improve the prescribing and dosing of OADs for T2DM patients and thus contribute to the design of personalized treatments.
Affiliation(s)
- Xian-Li Jiang
  - Evolutionary Information Laboratory, Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, United States
- Rafael B Leon-Cachon
  - Departamento de Ciencias Básicas, Centro de Diagnóstico Molecular y Medicina Personalizada, Vicerrectoría de Ciencias de la Salud, Universidad de Monterrey, Monterrey, Mexico
- Fernando Lavalle-Gonzalez
  - Servicio de Endocrinología, Hospital Universitario Dr. José E. González, Universidad Autónoma de Nuevo León, Monterrey, Mexico
- Faruck Morcos
  - Evolutionary Information Laboratory, Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, United States; Center for Systems Biology, University of Texas at Dallas, Richardson, TX, United States
- Hugo A Barrera-Saldaña
  - Molecular Genetics Laboratory, Vitagénesis, S.A. de C.V., Monterrey, Mexico; Tecnológico de Monterrey, Monterrey, Mexico
|
44
|
Nicoludis JM, Gaudet R. Applications of sequence coevolution in membrane protein biochemistry. Biochim Biophys Acta Biomembr 2018; 1860:895-908. [PMID: 28993150 PMCID: PMC5807202 DOI: 10.1016/j.bbamem.2017.10.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 07/19/2017] [Revised: 09/28/2017] [Accepted: 10/02/2017] [Indexed: 12/22/2022]
Abstract
Recently, protein sequence coevolution analysis has matured into a predictive powerhouse for protein structure and function. Direct methods, which use global statistical models of sequence coevolution, have enabled the prediction of membrane and disordered protein structures, protein complex architectures, and the functional effects of mutations in proteins. The field of membrane protein biochemistry and structural biology has embraced these computational techniques, which provide functional and structural information in an otherwise experimentally-challenging field. Here we review recent applications of protein sequence coevolution analysis to membrane protein structure and function and highlight the promising directions and future obstacles in these fields. We provide insights and guidelines for membrane protein biochemists who wish to apply sequence coevolution analysis to a given experimental system.
Affiliation(s)
- John M Nicoludis
  - Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, United States
- Rachelle Gaudet
  - Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, 02138, United States
|
45
|
dos Santos RN, Ferrari AJR, de Jesus HCR, Gozzo FC, Morcos F, Martínez L. Enhancing protein fold determination by exploring the complementary information of chemical cross-linking and coevolutionary signals. Bioinformatics 2018; 34:2201-2208. [DOI: 10.1093/bioinformatics/bty074] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 09/19/2017] [Accepted: 02/10/2018] [Indexed: 11/13/2022] Open
Affiliation(s)
- Ricardo N dos Santos
  - Institute of Chemistry, University of Campinas, Campinas, Brazil
  - Center for Computational Engineering and Sciences, University of Campinas, Campinas, Brazil
- Fábio C Gozzo
  - Institute of Chemistry, University of Campinas, Campinas, Brazil
- Faruck Morcos
  - Department of Biological Sciences, University of Texas at Dallas, Richardson, USA
- Leandro Martínez
  - Institute of Chemistry, University of Campinas, Campinas, Brazil
  - Center for Computational Engineering and Sciences, University of Campinas, Campinas, Brazil
|
46
|
Huang YJ, Brock KP, Sander C, Marks DS, Montelione GT. A Hybrid Approach for Protein Structure Determination Combining Sparse NMR with Evolutionary Coupling Sequence Data. Adv Exp Med Biol 2018; 1105:153-169. [PMID: 30617828 DOI: 10.1007/978-981-13-2200-6_10] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Indexed: 12/04/2022]
Abstract
While 3D structure determination of small (<15 kDa) proteins by solution NMR is largely automated and routine, structural analysis of larger proteins is more challenging. An emerging hybrid strategy for modeling protein structures combines sparse NMR data that can be obtained for larger proteins with sequence co-variation data, called evolutionary couplings (ECs), obtained from multiple sequence alignments of protein families. This hybrid "EC-NMR" method can be used to accurately model larger (15-60 kDa) proteins, and more rapidly determine structures of smaller (5-15 kDa) proteins using only backbone NMR data. The resulting structures have accuracies relative to reference structures comparable to those obtained with full backbone and sidechain NMR resonance assignments. The requirement that evolutionary couplings (ECs) are consistent with NMR data recorded on a specific member of a protein family, under specific conditions, potentially also allows identification of ECs that reflect alternative allosteric or excited states of the protein structure.
Affiliation(s)
- Yuanpeng Janet Huang
  - Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
- Kelly P Brock
  - cBio Center, Dana-Farber Cancer Institute, Boston, MA, USA
- Chris Sander
  - Department of Cell Biology, Harvard Medical School, Boston, MA, USA
  - cBio Center, Dana-Farber Cancer Institute, Boston, MA, USA
- Debora S Marks
  - Department of Systems Biology, Harvard Medical School, Boston, MA, USA
- Gaetano T Montelione
  - Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
|
47
|
Suplatov D, Sharapova Y, Timonina D, Kopylov K, Švedas V. The visualCMAT: A web-server to select and interpret correlated mutations/co-evolving residues in protein families. J Bioinform Comput Biol 2017; 16:1840005. [PMID: 29361894 DOI: 10.1142/s021972001840005x] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022]
Abstract
The visualCMAT web-server was designed to assist experimental research in the fields of protein/enzyme biochemistry, protein engineering, and drug discovery by providing an intuitive and easy-to-use interface to the analysis of correlated mutations/co-evolving residues. Sequence and structural information describing homologous proteins is used to predict correlated substitutions by the mutual information-based CMAT approach, to classify them into spatially close co-evolving pairs (which either form a direct physical contact or interact with the same ligand, e.g. a substrate or a crystallographic water molecule) and long-range correlations, and to annotate and rank binding sites on the protein surface by the presence of statistically significant co-evolving positions. The results of the visualCMAT are organized for a convenient visual analysis and can be downloaded to a local computer as a content-rich all-in-one PyMol session file with multiple layers of annotation corresponding to bioinformatic, statistical and structural analyses of the predicted co-evolution, or further studied online using the built-in interactive analysis tools. The online interactivity is implemented in HTML5 and therefore neither plugins nor Java are required. The visualCMAT web-server is integrated with the Mustguseal web-server capable of constructing large structure-guided sequence alignments of protein families and superfamilies using all available information about their structures and sequences in public databases. The visualCMAT web-server can be used to understand the relationship between structure and function in proteins, to select hotspots and compensatory mutations for rational design and directed evolution experiments that produce novel enzymes with improved properties, and to study the mechanisms of selective ligand binding and allosteric communication between topologically independent sites in protein structures.
The web-server is freely available at https://biokinet.belozersky.msu.ru/visualcmat and there are no login requirements.
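The abstract above names mutual information as the basis of the CMAT approach. As a rough illustration only, the following sketch computes plain mutual information between two columns of a toy alignment. It is not the CMAT algorithm itself: the function name, the toy sequences, and the lack of any gap handling or statistical significance testing are all assumptions made for this example.

```python
from collections import Counter
from math import log2


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length sequences (hypothetical toy input).
    """
    n = len(alignment)
    col_i = [seq[i] for seq in alignment]
    col_j = [seq[j] for seq in alignment]
    freq_i = Counter(col_i)          # single-column residue counts
    freq_j = Counter(col_j)
    freq_ij = Counter(zip(col_i, col_j))  # joint residue-pair counts
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


# Toy alignment: columns 0 and 1 always change together, column 2 is
# fully conserved, and column 3 varies independently of column 0.
toy_alignment = ["AKLE", "AKLD", "GRLE", "GRLD"]
mi_coupled = column_mutual_information(toy_alignment, 0, 1)
mi_conserved = column_mutual_information(toy_alignment, 0, 2)
mi_independent = column_mutual_information(toy_alignment, 0, 3)
```

In this toy case the co-varying pair scores highest, while the conserved and the independent columns carry no mutual information with column 0. A real analysis would also need corrections for small sample size and phylogenetic bias, which this sketch omits.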
Affiliation(s)
- Dmitry Suplatov
- 1 Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
- Yana Sharapova
- 1 Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
- Daria Timonina
- 1 Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
- Kirill Kopylov
- 1 Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
- Vytas Švedas
- 1 Belozersky Institute of Physicochemical Biology, Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Leninskiye Gory 1-73, Moscow 119991, Russia
48
Biomolecular coevolution and its applications: Going from structure prediction toward signaling, epistasis, and function. Biochem Soc Trans 2017; 45:1253-1261. [DOI: 10.1042/bst20170063] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 06/23/2017] [Revised: 08/30/2017] [Accepted: 09/04/2017] [Indexed: 01/01/2023]
Abstract
Evolution leads to considerable changes in the sequence of biomolecules, while their overall structure and function remain quite conserved. The wealth of genomic sequences that modern sequencing techniques provide, the ‘Biological Big Data’, allows us to investigate biomolecular evolution in unprecedented detail. Sophisticated statistical models can infer residue pair mutations resulting from spatial proximity. The introduction of predicted spatial adjacencies as constraints in biomolecular structure prediction workflows has transformed the field of protein and RNA structure prediction toward accuracies approaching the experimental resolution limit. Going beyond structure prediction, the same mathematical framework allows mimicking evolutionary fitness landscapes to infer signaling interactions, epistasis, or mutational landscapes.
49
Shamsi Z, Moffett AS, Shukla D. Enhanced unbiased sampling of protein dynamics using evolutionary coupling information. Sci Rep 2017; 7:12700. [PMID: 28983093 PMCID: PMC5629199 DOI: 10.1038/s41598-017-12874-7] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 10/06/2016] [Accepted: 09/14/2017] [Indexed: 12/25/2022]
Abstract
One of the major challenges in atomistic simulations of proteins is efficient sampling of pathways associated with rare conformational transitions. Recent developments in statistical methods for computation of direct evolutionary couplings between amino acids within and across polypeptide chains have allowed for inference of native residue contacts, informing accurate prediction of protein folds and multimeric structures. In this study, we assess the use of distances between evolutionarily coupled residues as natural choices for reaction coordinates which can be incorporated into Markov state model-based adaptive sampling schemes and potentially used to predict not only functional conformations but also pathways of conformational change, protein folding, and protein-protein association. We demonstrate the utility of evolutionary couplings in sampling and predicting activation pathways of the β2-adrenergic receptor (β2-AR), folding of the FiP35 WW domain, and dimerization of the E. coli molybdopterin synthase subunits. We find that the times required for β2-AR activation and folding of the WW domain are greatly diminished using evolutionary couplings-guided adaptive sampling. Additionally, we were able to identify putative molybdopterin synthase association pathways and near-crystal structure complexes from protein-protein association simulations.
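The idea in the abstract above, using distances between evolutionarily coupled residues as reaction coordinates for adaptive sampling, can be sketched roughly as follows. This is a minimal illustration and not the authors' protocol: simple distance binning stands in for Markov state model construction, the "least visited state" restart rule is one common adaptive sampling heuristic assumed here, and all function names, the bin width, and the toy coordinates are hypothetical.

```python
from collections import Counter
from math import dist


def coupling_distances(frame, coupled_pairs):
    """Distances between coupled residue pairs in one frame.

    `frame` is a list of (x, y, z) positions, one per residue.
    """
    return tuple(dist(frame[i], frame[j]) for i, j in coupled_pairs)


def discretize(features, bin_width):
    """Map each distance to an integer bin; the tuple of bins labels a state."""
    return tuple(int(d // bin_width) for d in features)


def pick_restart_frames(frames, coupled_pairs, bin_width=2.0, n_restarts=2):
    """Return indices of frames from the least-visited states."""
    states = [
        discretize(coupling_distances(f, coupled_pairs), bin_width)
        for f in frames
    ]
    counts = Counter(states)
    # Rank frames by how rarely their state was visited (stable sort).
    ranked = sorted(range(len(frames)), key=lambda k: counts[states[k]])
    chosen, seen = [], set()
    for k in ranked:
        if states[k] not in seen:  # at most one restart per state
            chosen.append(k)
            seen.add(states[k])
        if len(chosen) == n_restarts:
            break
    return chosen


# Toy trajectory: three residues on a line; only residue 2 moves.
toy_frames = [
    [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.2, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (5.0, 0.0, 0.0)],
]
restarts = pick_restart_frames(toy_frames, coupled_pairs=[(0, 2)])
```

Here the single frame in which the coupled pair has moved apart is selected first, so new simulations would be seeded from the rarely visited region of the coupled-distance coordinate.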
Affiliation(s)
- Zahra Shamsi
- Department of Chemical and Biomolecular Engineering, University of Illinois, Urbana, IL, 61801, USA
- Alexander S Moffett
- Center for Biophysics and Quantitative Biology, University of Illinois, Urbana, IL, 61801, USA
- Diwakar Shukla
- Department of Chemical and Biomolecular Engineering, University of Illinois, Urbana, IL, 61801, USA
- Center for Biophysics and Quantitative Biology, University of Illinois, Urbana, IL, 61801, USA
- Department of Plant Biology, University of Illinois, Urbana, IL, 61801, USA
- National Center for Supercomputing Applications, University of Illinois, Urbana, IL, 61801, USA
50
Sirovetz BJ, Schafer NP, Wolynes PG. Protein structure prediction: making AWSEM AWSEM-ER by adding evolutionary restraints. Proteins 2017; 85:2127-2142. [PMID: 28799172 DOI: 10.1002/prot.25367] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 05/29/2017] [Revised: 07/29/2017] [Accepted: 08/08/2017] [Indexed: 11/07/2022]
Abstract
Protein sequences have evolved to fold into functional structures, resulting in families of diverse protein sequences that all share the same overall fold. One can harness protein family sequence data to infer likely contacts between pairs of residues. In the current study, we combine this kind of inference from coevolutionary information with a coarse-grained protein force field ordinarily used with single sequence input, the Associative memory, Water mediated, Structure and Energy Model (AWSEM), to achieve improved structure prediction. The resulting Associative memory, Water mediated, Structure and Energy Model with Evolutionary Restraints (AWSEM-ER) yields a significant improvement in the quality of protein structure prediction over the single sequence prediction from AWSEM when a sufficiently large number of homologous sequences are available. Free energy landscape analysis shows that the addition of the evolutionary term shifts the free energy minimum to more native-like structures, which explains the improvement in the quality of structures when performing predictions using simulated annealing. Simulations using AWSEM without coevolutionary information have proved useful in elucidating not only protein folding behavior, but also mechanisms of protein function. The success of AWSEM-ER in de novo structure prediction suggests that the enhanced model opens the door to functional studies of proteins even when no experimentally solved structures are available.
Affiliation(s)
- Brian J Sirovetz
- Center for Theoretical Biological Physics, Rice University, Houston, Texas
- Department of Chemistry, Rice University, Houston, Texas
- Nicholas P Schafer
- Center for Theoretical Biological Physics, Rice University, Houston, Texas
- Peter G Wolynes
- Center for Theoretical Biological Physics, Rice University, Houston, Texas
- Department of Chemistry, Rice University, Houston, Texas
- Department of Physics, Rice University, Houston, Texas
- Department of Biosciences, Rice University, Houston, Texas