1
Guo L, Yu Q, Wang D, Wu X, Wolynes PG, Chen M. Generating the polymorph landscapes of amyloid fibrils using AI: RibbonFold. Proc Natl Acad Sci U S A 2025; 122:e2501321122. [PMID: 40232799 PMCID: PMC12037047 DOI: 10.1073/pnas.2501321122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2025] [Accepted: 03/07/2025] [Indexed: 04/16/2025]
Abstract
The concept that proteins are selected to fold into a well-defined native state has been effectively addressed within the framework of energy landscapes, underpinning the recent successes of structure prediction tools like AlphaFold. The amyloid fold, however, does not represent a unique minimum for a given single sequence. While the cross-β hydrogen-bonding pattern is common to all amyloids, other aspects of amyloid fiber structures are sensitive not only to the sequence of the aggregating peptides but also to the experimental conditions. This polymorphic nature of amyloid structures challenges structure predictions. In this paper, we use AI to explore the landscape of possible amyloid protofilament structures composed of a single stack of peptides aligned in a parallel, in-register manner. This perspective enables a practical method for predicting protofilament structures of arbitrary sequences: RibbonFold. RibbonFold is adapted from AlphaFold2, incorporating parallel in-register constraints within AlphaFold2's template module, along with an appropriate polymorphism loss function to address the structural diversity of folds. RibbonFold outperforms AlphaFold2/3 on independent test sets, achieving a mean TM-score of 0.5. RibbonFold proves well-suited to study the polymorphic landscapes of widely studied sequences with documented polymorphisms. The resulting landscapes capture these observed polymorphisms effectively. We show that while well-known amyloid-forming sequences exhibit a limited number of plausible polymorphs on their "solubility" landscape, randomly shuffled sequences with the same composition appear to be negatively selected in terms of their relative solubility. RibbonFold is a valuable framework for structurally characterizing amyloid polymorphism landscapes.
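The abstract reports accuracy as a mean TM-score of 0.5. TM-score values above about 0.5 are commonly read as indicating the same overall fold. The sketch below shows what the number measures. It implements the standard TM-score formula under two simplifying assumptions: model and reference residues are already paired one-to-one, and the structures are already superposed (the real TM-score program also searches for the best superposition). It is not RibbonFold's evaluation code, and the function name is illustrative.

```python
import math


def tm_score(model, reference):
    """TM-score of `model` against `reference` (lists of (x, y, z) in angstroms).

    Assumes residue i of the model corresponds to residue i of the reference
    and that the two structures are already superposed; the real TM-score
    program additionally optimizes the superposition.
    """
    if len(model) != len(reference) or not reference:
        raise ValueError("need two non-empty, equal-length coordinate lists")
    n = len(reference)
    # Length-dependent distance scale d0, clamped at 0.5 A for short chains.
    d0 = 0.5
    if n > 15:
        d0 = max(1.24 * (n - 15) ** (1.0 / 3.0) - 1.8, 0.5)
    total = 0.0
    for p, q in zip(model, reference):
        d = math.dist(p, q)
        total += 1.0 / (1.0 + (d / d0) ** 2)
    return total / n


# Usage: a straight 40-residue toy chain compared with a copy shifted by 1 A.
ref = [(3.8 * i, 0.0, 0.0) for i in range(40)]
shifted = [(x, y + 1.0, z) for x, y, z in ref]
print(round(tm_score(shifted, ref), 3))
```

Identical structures score exactly 1.0. Every residue's contribution decays smoothly with its deviation, so a few badly placed residues cannot dominate the score the way they dominate RMSD.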
Affiliation(s)
- Qilin Yu
- Changping Laboratory, Beijing 102206, China
- Di Wang
- Changping Laboratory, Beijing 102206, China
- Xiaoyu Wu
- Changping Laboratory, Beijing 102206, China
- Peter G. Wolynes
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
- Department of Chemistry, Rice University, Houston, TX 77005
- Department of Physics and Astronomy, Rice University, Houston, TX 77005
- Department of Biosciences, Rice University, Houston, TX 77005
2
Kulkarni P, Porter L, Chou TF, Chong S, Chiti F, Schafer JW, Mohanty A, Ramisetty S, Onuchic JN, Tuite M, Uversky VN, Weninger KR, Koonin EV, Orban J, Salgia R. Evolving concepts of the protein universe. iScience 2025; 28:112012. [PMID: 40124498 PMCID: PMC11926713 DOI: 10.1016/j.isci.2025.112012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2025]
Abstract
The protein universe is the collection of all proteins on Earth from all organisms, both extant and extinct. Classical studies on protein folding suggested that proteins exist as a unique three-dimensional conformation that is dictated by the genetic code and is critical for function. In this perspective, we discuss ideas and developments that emerged over the past three decades regarding the protein structure-function paradigm. It is now clear that ordered (active/functional) and disordered/denatured (and hence inactive/non-functional) states form a continuum rather than a binary choice. Some proteins can switch folds without sequence change. Others exist as conformational ensembles lacking defined structure yet play critical roles in many biological processes, including forming membrane-less organelles driven by liquid-liquid phase separation. Numerous diverse proteins harbor segments with the potential to form amyloid fibrils, many of which are functional, and some possess prion-like properties enabling conformation-based transfer of heritable information. Taken together, these developments reveal the remarkable complexity of the protein universe.
Affiliation(s)
- Prakash Kulkarni
- Department of Medical Oncology, City of Hope Medical Center, Duarte, CA, USA
- Department of Systems Biology, City of Hope Medical Center, Duarte, CA, USA
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Lauren Porter
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Tsui-Fen Chou
- Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, CA, USA
- Proteome Exploration Laboratory, Beckman Institute, California Institute of Technology, Pasadena, CA, USA
- Shasha Chong
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA, USA
- Fabrizio Chiti
- Department of Experimental and Clinical Biomedical Sciences “Mario Serio”, University of Florence, Florence, Italy
- Joseph W. Schafer
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- Atish Mohanty
- Department of Medical Oncology, City of Hope Medical Center, Duarte, CA, USA
- Sravani Ramisetty
- Department of Medical Oncology, City of Hope Medical Center, Duarte, CA, USA
- Jose N. Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX, USA
- Department of Physics and Astronomy, Rice University, Houston, TX, USA
- Mick Tuite
- Kent Fungal Group, School of Biosciences, Division of Natural Sciences, University of Kent, CT2 7NJ Canterbury, UK
- Vladimir N. Uversky
- Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL, USA
- Keith R. Weninger
- Department of Physics, North Carolina State University, Raleigh, NC, USA
- Eugene V. Koonin
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
- John Orban
- W. M. Keck Laboratory for Structural Biology, University of Maryland Institute for Bioscience and Biotechnology Research, Rockville, MD, USA
- Department of Chemistry and Biochemistry, University of Maryland, College Park, MD, USA
- Ravi Salgia
- Department of Medical Oncology, City of Hope Medical Center, Duarte, CA, USA
3
Caredda F, Pagnani A. Direct coupling analysis and the attention mechanism. BMC Bioinformatics 2025; 26:41. [PMID: 39915710 PMCID: PMC11804077 DOI: 10.1186/s12859-025-06062-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2024] [Accepted: 01/22/2025] [Indexed: 02/09/2025]
Abstract
Proteins are involved in nearly all cellular functions, encompassing roles in transport, signaling, enzymatic activity, and more. Their functionalities crucially depend on their complex three-dimensional arrangement. For this reason, being able to predict their structure from the amino acid sequence has been and still is a phenomenal computational challenge that the introduction of AlphaFold solved with unprecedented accuracy. However, the inherent complexity of AlphaFold's architectures makes it challenging to understand the rules that ultimately shape the protein's predicted structure. This study investigates a single-layer unsupervised model based on the attention mechanism. More precisely, we explore a Direct Coupling Analysis (DCA) method that mimics the attention mechanism of several popular Transformer architectures, such as AlphaFold itself. The model's parameters, notably fewer than those in standard DCA-based algorithms, can be directly used for extracting structural determinants such as the contact map of the protein family under study. Additionally, the functional form of the energy function of the model enables us to deploy a multi-family learning strategy, allowing us to effectively integrate information across multiple protein families, whereas standard DCA algorithms are typically limited to single protein families. Finally, we implemented a generative version of the model using an autoregressive architecture, capable of efficiently generating new proteins in silico.
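The abstract describes a DCA model whose couplings mimic a single Transformer attention layer. The sketch below shows one such factored parameterization, in the spirit of factored-attention Potts models. It makes two points concrete: why the parameter count falls relative to standard DCA, and how a contact map can be read off the couplings. The exact functional form, masking and normalization used in the paper may differ, and all function names here are illustrative rather than the authors' code.

```python
import math
import random


def softmax_rows(scores):
    """Row-wise softmax of a square matrix given as nested lists."""
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        z = sum(exps)
        out.append([e / z for e in exps])
    return out


def attention_couplings(Q, K, V):
    """Couplings J[i][j][a][b] = sum_h A_h[i][j] * V_h[a][b] (assumed form).

    Q[h] and K[h] are L x d query/key matrices for head h, V[h] is a q x q
    matrix over amino-acid states, and A_h = softmax(Q_h K_h^T) with the
    diagonal masked so that no site couples to itself.
    """
    n_heads, L, q = len(Q), len(Q[0]), len(V[0])
    J = [[[[0.0] * q for _ in range(q)] for _ in range(L)] for _ in range(L)]
    for h in range(n_heads):
        scores = [[sum(x * y for x, y in zip(Q[h][i], K[h][j])) for j in range(L)]
                  for i in range(L)]
        for i in range(L):
            scores[i][i] = float("-inf")  # mask self-attention
        A = softmax_rows(scores)
        for i in range(L):
            for j in range(L):
                if i == j:
                    continue
                for a in range(q):
                    for b in range(q):
                        J[i][j][a][b] += A[i][j] * V[h][a][b]
    return J


def contact_scores(J):
    """Symmetrized Frobenius norm of each q x q coupling block."""
    L = len(J)
    norm = [[math.sqrt(sum(v * v for row in J[i][j] for v in row)) for j in range(L)]
            for i in range(L)]
    return [[(norm[i][j] + norm[j][i]) / 2 for j in range(L)] for i in range(L)]


def parameter_counts(L, q, n_heads, d):
    """Factored-model parameters vs. a full set of pairwise Potts couplings."""
    factored = n_heads * (2 * L * d + q * q)
    full = (L * (L - 1) // 2) * q * q
    return factored, full


random.seed(0)
L, q, n_heads, d = 6, 3, 2, 4
Q = [[[random.gauss(0, 1) for _ in range(d)] for _ in range(L)] for _ in range(n_heads)]
K = [[[random.gauss(0, 1) for _ in range(d)] for _ in range(L)] for _ in range(n_heads)]
V = [[[random.gauss(0, 1) for _ in range(q)] for _ in range(q)] for _ in range(n_heads)]
F = contact_scores(attention_couplings(Q, K, V))
print(parameter_counts(L=200, q=21, n_heads=32, d=32))  # (423712, 8775900)
```

The saving comes from sharing one q x q matrix per head across all site pairs. The L x L attention maps then say *which* sites interact.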
Affiliation(s)
- Francesco Caredda
- DISAT, Politecnico di Torino, Corso Duca degli Abruzzi, I-10129, Torino, Italy
- Andrea Pagnani
- DISAT, Politecnico di Torino, Corso Duca degli Abruzzi, I-10129, Torino, Italy
- Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060, Candiolo, Italy
- INFN, Sezione di Torino, Via Pietro Giuria, I-10125, Torino, Italy
4
Swapna GVT, Dube N, Roth MJ, Montelione GT. Modeling Alternative Conformational States of Pseudo-Symmetric Solute Carrier Transporters using Methods from Deep Learning. bioRxiv: The Preprint Server for Biology 2024:2024.07.15.603529. [PMID: 39071413 PMCID: PMC11275918 DOI: 10.1101/2024.07.15.603529] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/30/2024]
Abstract
The Solute Carrier (SLC) superfamily of integral membrane proteins functions to transport a wide array of small molecules across plasma and organelle membranes. SLC proteins also function as important drug transporters and as viral receptors. Despite being classified as a single superfamily, SLC proteins do not share a single common fold classification; however, most belong to multi-pass transmembrane helical protein fold families. SLC proteins populate different conformational states during the solute transport process, including outward-open, intermediate (occluded), and inward-open conformational states. For some SLC fold families this structural "flipping" corresponds to swapping between conformations of their N-terminal and C-terminal symmetry-related sub-structures. Conventional AlphaFold2, AlphaFold3, or Evolutionary Scale Modeling methods typically generate models for only one of these multiple conformational states of SLC proteins. Several modifications of these AI-based protocols for modeling multiple conformational states of proteins have been described recently. These methods are often impacted by "memorization" of one of the alternative conformational states, and do not always provide both the inward and outward facing conformations of SLC proteins. Here we describe a combined ESM and template-based modeling process, based on a previously described template-based modeling method that relies on the internal pseudo-symmetry of many SLC proteins, to consistently model alternate conformational states of SLC proteins. We further demonstrate how the resulting multi-state models can be validated experimentally by comparison with sequence-based evolutionary co-variance data (ECs) that encode information about contacts present in the various conformational states adopted by the protein.
This simple, rapid, and robust approach for modeling conformational landscapes of pseudo-symmetric SLC proteins is demonstrated for several integral membrane protein transporters, including SLC35F2, the receptor of a feline leukemia virus envelope protein required for viral entry into eukaryotic cells.
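The validation step compares multi-state models with sequence-derived ECs. One simple way to do that is to count how many top-ranked ECs each state model satisfies; ECs satisfied by only one state then point to state-specific contacts. The sketch below makes several assumptions that do not come from the abstract:

- one representative point per residue (for example C-beta);
- an 8 Å contact cutoff;
- a minimum sequence separation of 5;
- a top-N cutoff on the EC ranking.

The function names are hypothetical.

```python
import math


def contact_set(coords, cutoff=8.0, min_sep=5):
    """Residue pairs (i, j), i < j, closer than `cutoff` angstroms and at least
    `min_sep` positions apart in sequence. `coords` holds one point per residue
    (assumed, e.g., C-beta atoms)."""
    contacts = set()
    for i in range(len(coords)):
        for j in range(i + min_sep, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                contacts.add((i, j))
    return contacts


def ec_agreement(ecs, coords, top_n, cutoff=8.0, min_sep=5):
    """Fraction of the top_n evolutionary couplings, given as (i, j, score)
    tuples, that are contacts in the model with coordinates `coords`."""
    contacts = contact_set(coords, cutoff, min_sep)
    ranked = sorted(ecs, key=lambda ec: ec[2], reverse=True)
    pairs = [(min(i, j), max(i, j)) for i, j, _ in ranked if abs(i - j) >= min_sep]
    pairs = pairs[:top_n]
    if not pairs:
        return 0.0
    return sum(pair in contacts for pair in pairs) / len(pairs)


# Usage: the same ECs scored against two toy "state" models.
line = [(3.8 * i, 0.0, 0.0) for i in range(10)]   # no long-range contacts
folded = line[:9] + [(0.0, 5.0, 0.0)]              # residue 9 packs onto 0 and 1
ecs = [(0, 9, 0.9), (9, 1, 0.8), (2, 6, 0.1)]
print(ec_agreement(ecs, folded, top_n=2), ec_agreement(ecs, line, top_n=2))  # 1.0 0.0
```

Running the check separately on the inward-facing and outward-facing models, and then looking at ECs satisfied by one but not the other, is what makes such a comparison informative about alternate states.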
Affiliation(s)
- G V T Swapna
- Dept. of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, New York, 12180 USA
- Department of Pharmacology, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway NJ 08854 USA
- Namita Dube
- Dept. of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, New York, 12180 USA
- Monica J Roth
- Department of Pharmacology, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway NJ 08854 USA
- Gaetano T Montelione
- Dept. of Chemistry and Chemical Biology, Center for Biotechnology and Interdisciplinary Sciences, Rensselaer Polytechnic Institute, Troy, New York, 12180 USA
5
Dietler N, Abbara A, Choudhury S, Bitbol AF. Impact of phylogeny on the inference of functional sectors from protein sequence data. PLoS Comput Biol 2024; 20:e1012091. [PMID: 39312591 PMCID: PMC11449291 DOI: 10.1371/journal.pcbi.1012091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2024] [Revised: 10/03/2024] [Accepted: 09/10/2024] [Indexed: 09/25/2024]
Abstract
Statistical analysis of multiple sequence alignments of homologous proteins has revealed groups of coevolving amino acids called sectors. These groups of amino-acid sites feature collective correlations in their amino-acid usage, and they are associated with functional properties. Modeling showed that nonlinear selection on an additive functional trait of a protein is generically expected to give rise to a functional sector. These modeling results motivated a principled method, called ICOD, which is designed to identify functional sectors, as well as mutational effects, from sequence data. However, a challenge for all methods aiming to identify sectors from multiple sequence alignments is that correlations in amino-acid usage can also arise from the mere fact that homologous sequences share common ancestry, i.e., from phylogeny. Here, we generate controlled synthetic data from a minimal model comprising both phylogeny and functional sectors. We use this data to dissect the impact of phylogeny on sector identification and on mutational effect inference by different methods. We find that ICOD is most robust to phylogeny, but that conservation is also quite robust. Next, we consider natural multiple sequence alignments of protein families for which deep mutational scan experimental data is available. We show that in this natural data, conservation and ICOD best identify sites with strong functional roles, in agreement with our results on synthetic data. Importantly, these two methods have different premises, since they respectively focus on conservation and on correlations. Thus, their joint use can reveal complementary information.
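The abstract contrasts a conservation-based view of the alignment with a correlation-based one (ICOD). As a reference point, the sketch below computes a conservation score with a common reweighting that damps clusters of close relatives, one standard response to phylogenetic bias. The 80% identity threshold and the uniform background are assumptions. This is neither ICOD nor necessarily the conservation measure used in the paper, and the function names are illustrative.

```python
import math


def sequence_weights(msa, threshold=0.8):
    """Weight each sequence by 1 / (number of sequences, itself included, with
    fractional identity >= threshold), damping clusters of close relatives."""
    weights = []
    for s in msa:
        n_similar = sum(
            1 for t in msa
            if sum(x == y for x, y in zip(s, t)) / len(s) >= threshold
        )
        weights.append(1.0 / n_similar)
    return weights


def conservation_scores(msa, alphabet="ACDEFGHIKLMNPQRSTVWY-", threshold=0.8):
    """Per-column conservation: KL divergence (nats) of the weighted column
    frequencies from a uniform background, i.e. log(q) - entropy."""
    q = len(alphabet)
    weights = sequence_weights(msa, threshold)
    total = sum(weights)
    scores = []
    for col in range(len(msa[0])):
        freq = {}
        for seq, w in zip(msa, weights):
            freq[seq[col]] = freq.get(seq[col], 0.0) + w / total
        entropy = -sum(f * math.log(f) for f in freq.values() if f > 0)
        scores.append(math.log(q) - entropy)
    return scores


# Usage on a toy alignment (hypothetical sequences).
msa = ["MKVLA", "MKVIA", "MRVLG", "MKTLA"]
print([round(s, 2) for s in conservation_scores(msa)])
```

A fully conserved column reaches the maximum, log(q). Because the score uses only single-site frequencies, it cannot pick up the collective correlations that define a sector. That is one reason conservation and ICOD can carry complementary information.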
Affiliation(s)
- Nicola Dietler
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Alia Abbara
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Subham Choudhury
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Anne-Florence Bitbol
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
6
Guan X, Tang QY, Ren W, Chen M, Wang W, Wolynes PG, Li W. Predicting protein conformational motions using energetic frustration analysis and AlphaFold2. Proc Natl Acad Sci U S A 2024; 121:e2410662121. [PMID: 39163334 PMCID: PMC11363347 DOI: 10.1073/pnas.2410662121] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 05/30/2024] [Accepted: 07/16/2024] [Indexed: 08/22/2024]
Abstract
Proteins perform their biological functions through motion. Although high-throughput prediction of the three-dimensional static structures of proteins has proved feasible using deep-learning-based methods, predicting the conformational motions remains a challenge. Purely data-driven machine learning methods have difficulty addressing such motions because available laboratory data on conformational motions are still limited. In this work, we develop a method for generating protein allosteric motions by integrating physical energy landscape information into deep-learning-based methods. We show that local energetic frustration, which quantifies the local features of the energy landscape governing protein allosteric dynamics, can be utilized to empower AlphaFold2 (AF2) to predict protein conformational motions. Starting from ground-state static structures, this integrative method generates alternative structures as well as pathways of protein conformational motions, using a progressive enhancement of the energetic frustration features in the sequences of the input multiple sequence alignment. For the model protein adenylate kinase, we show that the generated conformational motions are consistent with available experimental and molecular dynamics simulation data. Applying the method to two other proteins, KaiB and ribose-binding protein, which undergo large-amplitude conformational changes, also successfully generates their alternative conformations. We also show how to extract overall features of the AF2 energy landscape topography, which many have considered a black box. Incorporating physical knowledge into deep-learning-based structure prediction algorithms provides a useful strategy to address the challenges of dynamic structure prediction of allosteric proteins.
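Local energetic frustration is usually quantified by comparing the energy of a native interaction with the energies of decoys in which the interacting residues are changed, and expressing the difference as a z-score. The sketch below computes that z-score and applies a three-way label. Several parts are assumptions:

- the sign convention (positive means the native is more favorable);
- the cutoffs 0.78 and -1, which are frustratometer-style defaults;
- the energy values, which are made up.

How the paper turns frustration into modifications of the input MSA is not shown, and the function names are illustrative.

```python
import statistics


def frustration_index(native_energy, decoy_energies):
    """Z-score of a native (contact or residue) energy against decoy energies.

    Sign convention chosen here (assumption): positive values mean the native
    energy is lower, i.e. more favorable, than the typical decoy.
    """
    mean = statistics.fmean(decoy_energies)
    sd = statistics.pstdev(decoy_energies)
    if sd == 0:
        raise ValueError("decoy energies have zero spread")
    return (mean - native_energy) / sd


def classify(index, minimal=0.78, high=-1.0):
    """Three-way label; the default cutoffs are assumed frustratometer-style values."""
    if index > minimal:
        return "minimally frustrated"
    if index < high:
        return "highly frustrated"
    return "neutral"


decoys = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical decoy energies
for native in (-1.0, 2.0, 4.0):
    print(native, classify(frustration_index(native, decoys)))
```

Highly frustrated sites are those where the native interaction is worse than most alternatives. Such sites are natural candidates for regions that must rearrange during a conformational change.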
Affiliation(s)
- Xingyue Guan
- Department of Physics, National Laboratory of Solid State Microstructure, Nanjing University, Nanjing 210093, China
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang 325000, China
- Qian-Yuan Tang
- Department of Physics, Hong Kong Baptist University, Kowloon Tong, Hong Kong Special Administrative Region 999077, China
- Weitong Ren
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang 325000, China
- Wei Wang
- Department of Physics, National Laboratory of Solid State Microstructure, Nanjing University, Nanjing 210093, China
- Peter G. Wolynes
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
- Wenfei Li
- Department of Physics, National Laboratory of Solid State Microstructure, Nanjing University, Nanjing 210093, China
- Wenzhou Key Laboratory of Biophysics, Wenzhou Institute, University of Chinese Academy of Sciences, Wenzhou, Zhejiang 325000, China
7
Pawnikar S, Magenheimer BS, Joshi K, Munoz EN, Haldane A, Maser RL, Miao Y. Activation of Polycystin-1 Signaling by Binding of Stalk-derived Peptide Agonists. bioRxiv: The Preprint Server for Biology 2024:2024.01.06.574465. [PMID: 38260358 PMCID: PMC10802338 DOI: 10.1101/2024.01.06.574465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/24/2024]
Abstract
Polycystin-1 (PC1) is the membrane protein product of the PKD1 gene, whose mutation is responsible for 85% of the cases of autosomal dominant polycystic kidney disease (ADPKD). ADPKD is primarily characterized by the formation of renal cysts and potential kidney failure. PC1 is an atypical G protein-coupled receptor (GPCR) consisting of 11 transmembrane helices and an autocatalytic GAIN domain that cleaves PC1 into extracellular N-terminal (NTF) and membrane-embedded C-terminal (CTF) fragments. Recently, signaling activation of the PC1 CTF was shown to be regulated by a stalk tethered agonist (TA), a distinct mechanism observed in the adhesion GPCR family. A novel allosteric activation pathway was elucidated for the PC1 CTF through a combination of Gaussian accelerated molecular dynamics (GaMD), mutagenesis and cellular signaling experiments. Here, we show that synthetic, soluble peptides of 7 to 21 residues derived from the stalk TA, in particular peptides comprising the first 9 (p9), 17 (p17), and 21 (p21) residues, were able to re-activate signaling by a stalkless PC1 CTF mutant in cellular assays. To reveal molecular mechanisms of stalk peptide-mediated signaling activation, we have applied a novel Peptide GaMD (Pep-GaMD) algorithm to elucidate binding conformations of selected stalk peptide agonists p9, p17 and p21 to the stalkless PC1 CTF. The simulations revealed multiple specific binding regions of the stalk peptide agonists to the PC1 protein, including an "intermediate" bound yet inactive state. Our Pep-GaMD simulation findings were consistent with the cellular assay experimental data. Binding of peptide agonists to the TOP domain of PC1 induced close TOP-putative pore loop interactions, a characteristic feature of the PC1 CTF signaling activation mechanism.
Using sequence covariation analysis of PC1 homologs, we further showed that the peptide binding regions were consistent with covarying residue pairs identified between the TOP domain and the stalk TA. Therefore, structural dynamic insights into the mechanisms of PC1 activation by stalk-derived peptide agonists have enabled an in-depth understanding of PC1 signaling. These insights will form a foundation for the development of PC1 as a therapeutic target for the treatment of ADPKD.
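Pep-GaMD builds on Gaussian accelerated MD (GaMD). GaMD adds a harmonic boost potential whenever the system potential falls below a threshold E. This smooths the landscape so that binding and unbinding events are sampled faster. The sketch below shows that core boost. The force-constant recipe for the threshold choice E = V_max is stated as an assumption to be checked against the GaMD literature. Pep-GaMD's selective boosting of the peptide potential and the reweighting used to recover unbiased free energies are not shown. All numbers are hypothetical.

```python
def gamd_boost(V, E, k):
    """Harmonic GaMD boost: 0.5 * k * (E - V)**2 when V < E, otherwise 0."""
    return 0.5 * k * (E - V) ** 2 if V < E else 0.0


def boosted_potential(V, E, k):
    """Modified potential V* = V + dV seen by the simulated system."""
    return V + gamd_boost(V, E, k)


def force_constant_lower_bound(v_max, v_min, v_avg, sigma_v, sigma0):
    """Force constant for the threshold E = v_max (assumed recipe):
    k0 = min(1, (sigma0 / sigma_v) * (v_max - v_min) / (v_max - v_avg)),
    k = k0 / (v_max - v_min). sigma0 caps the spread of the boost."""
    k0 = min(1.0, (sigma0 / sigma_v) * (v_max - v_min) / (v_max - v_avg))
    return k0 / (v_max - v_min)


# Usage with hypothetical potential statistics (kcal/mol) from a short cMD run.
k = force_constant_lower_bound(v_max=10.0, v_min=0.0, v_avg=5.0, sigma_v=2.0, sigma0=6.0)
for V in (2.0, 3.0, 12.0):
    print(V, boosted_potential(V, E=10.0, k=k))
```

With k no larger than 1 / (E - V_min), the boost raises low-energy states more than high-energy ones but preserves their ordering. The basins are flattened rather than reshuffled.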
Affiliation(s)
- Shristi Pawnikar
- Center for Computational Biology and Department of Molecular Biosciences, University of Kansas, Lawrence, KS 66047
- Brenda S. Magenheimer
- Clinical Laboratory Sciences, University of Kansas Medical Center, Kansas City, KS 66160
- The Jared Grantham Kidney Institute, University of Kansas Medical Center, Kansas City, KS 66160
- Keya Joshi
- Department of Pharmacology and Computational Medicine Program, University of North Carolina – Chapel Hill, Chapel Hill, NC 27599
- Ericka Nevarez Munoz
- Clinical Laboratory Sciences, University of Kansas Medical Center, Kansas City, KS 66160
- Allan Haldane
- Dept of Physics, and Center for Biophysics and Computational Biology, Temple University, Philadelphia, PA 19122
- Robin L. Maser
- Departments of Biochemistry and Molecular Biology, University of Kansas Medical Center, Kansas City, KS 66160
- Clinical Laboratory Sciences, University of Kansas Medical Center, Kansas City, KS 66160
- The Jared Grantham Kidney Institute, University of Kansas Medical Center, Kansas City, KS 66160
- Yinglong Miao
- Department of Pharmacology and Computational Medicine Program, University of North Carolina – Chapel Hill, Chapel Hill, NC 27599
8
Xie T, Huang J. Can Protein Structure Prediction Methods Capture Alternative Conformations of Membrane Transporters? J Chem Inf Model 2024; 64:3524-3536. [PMID: 38564295 DOI: 10.1021/acs.jcim.3c01936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Understanding the conformational dynamics of proteins, such as the inward-facing (IF) and outward-facing (OF) transition observed in transporters, is vital for elucidating their functional mechanisms. Despite significant advances in protein structure prediction (PSP) over the past three decades, most efforts have been focused on single-state prediction, leaving multistate or alternative conformation prediction (ACP) relatively unexplored. This discrepancy has led to the development of highly accurate PSP methods such as AlphaFold, yet their capabilities for ACP remain limited. To investigate the performance of current PSP methods in ACP, we curated a data set, named IOMemP, consisting of 32 experimentally determined high-resolution IF and OF structures of 16 membrane proteins with substantial conformational changes. We benchmarked 12 representative PSP methods, along with two recent multistate methods based on AlphaFold, against this data set. Our findings reveal a remarkably consistent preference for specific states across various PSP methods. We elucidated how coevolution information in MSAs influences state preference. Moreover, we showed that AlphaFold, when excluding coevolution information, estimated similar energies between the experimental IF and OF conformations, indicating that the energy model learned by AlphaFold is not biased toward any particular state. Our IOMemP data set and benchmark results are anticipated to advance the development of robust ACP methods.
Affiliation(s)
- Tengyu Xie
- College of Life Science, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310024, China
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang 310024, China
- Jing Huang
- College of Life Science, Zhejiang University, Hangzhou, Zhejiang 310058, China
- Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, Hangzhou, Zhejiang 310024, China
- Westlake AI Therapeutics Lab, Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou, Zhejiang 310024, China
9
Shibata M, Lin X, Onuchic JN, Yura K, Cheng RR. Residue coevolution and mutational landscape for OmpR and NarL response regulator subfamilies. Biophys J 2024; 123:681-692. [PMID: 38291753 PMCID: PMC10995415 DOI: 10.1016/j.bpj.2024.01.028] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/10/2023] [Revised: 12/31/2023] [Accepted: 01/24/2024] [Indexed: 02/01/2024]
Abstract
DNA-binding response regulators (DBRRs) are a broad class of proteins that operate in tandem with their partner kinase proteins to form two-component signal transduction systems in bacteria. Typical DBRRs are composed of two domains where the conserved N-terminal domain accepts transduced signals and the evolutionarily diverse C-terminal domain binds to DNA. These domains are assumed to be functionally independent, and hence recombination of the two domains should yield novel DBRRs of arbitrary input/output response, which can be used as biosensors. This idea has been proved to be successful in some cases; yet, the error rate is not trivial. Improvement of the success rate of this technique requires a deeper understanding of the linker-domain and inter-domain residue interactions, which have not yet been thoroughly examined. Here, we studied residue coevolution of DBRRs of the two main subfamilies (OmpR and NarL) using large collections of bacterial amino acid sequences to extensively investigate the evolutionary signatures of linker-domain and inter-domain residue interactions. Coevolutionary analysis uncovered evolutionarily selected linker-domain and inter-domain residue interactions of known experimental structures, as well as previously unknown inter-domain residue interactions. We examined the possibility of these inter-domain residue interactions as contacts that stabilize an inactive conformation of the DBRR where DNA binding is inhibited for both subfamilies. The newly gained insights on linker-domain/inter-domain residue interactions and shared inactivation mechanisms improve the understanding of the functional mechanism of DBRRs, providing clues to efficiently create functional DBRR-based biosensors. Additionally, we show the feasibility of applying coevolutionary landscape models to predict the functionality of domain-swapped DBRR proteins. 
The presented result demonstrates that sequence information can be used to filter out bioengineered DBRR proteins that are predicted to be nonfunctional due to a high negative predictive value.
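The final claim rests on the negative predictive value (NPV), defined as TN / (TN + FN). When the NPV is high, a design that the coevolutionary model flags as nonfunctional is very likely to be nonfunctional, so discarding those designs loses few working ones. The sketch below computes the metric on made-up labels; the function name is illustrative.

```python
def negative_predictive_value(predicted_functional, actually_functional):
    """NPV = TN / (TN + FN): of the designs predicted nonfunctional, the
    fraction that really are nonfunctional."""
    tn = fn = 0
    for predicted, actual in zip(predicted_functional, actually_functional):
        if not predicted:
            if actual:
                fn += 1   # a working design we would have wrongly discarded
            else:
                tn += 1   # a correctly discarded nonfunctional design
    if tn + fn == 0:
        raise ValueError("no designs were predicted nonfunctional")
    return tn / (tn + fn)


# Hypothetical screen of five domain-swapped designs.
predicted = [False, False, False, True, True]
measured = [False, False, True, True, False]
print(negative_predictive_value(predicted, measured))
```

NPV ignores designs predicted to be functional. A filter can therefore have a high NPV while still letting many nonfunctional designs through, which matches its stated use as a pre-screen rather than a positive predictor.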
Affiliation(s)
- Mayu Shibata
- Graduate School of Humanities and Sciences, Ochanomizu University, Bunkyo, Tokyo, Japan; Center for Theoretical Biological Physics, Rice University, Houston, Texas
- Xingcheng Lin
- Department of Physics, North Carolina State University, Raleigh, North Carolina; Bioinformatics Research Center, North Carolina State University, Raleigh, North Carolina
- José N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, Texas; Department of Physics and Astronomy, Chemistry, and Biosciences, Rice University, Houston, Texas
- Kei Yura
- Graduate School of Humanities and Sciences, Ochanomizu University, Bunkyo, Tokyo, Japan; Center for Interdisciplinary AI and Data Science, Ochanomizu University, Bunkyo, Tokyo, Japan; Graduate School of Advanced Science and Engineering, Waseda University, Shinjuku, Tokyo, Japan
- Ryan R Cheng
- Department of Chemistry, University of Kentucky, Lexington, Kentucky
10
Pucci F, Zerihun MB, Rooman M, Schug A. pycofitness-Evaluating the fitness landscape of RNA and protein sequences. Bioinformatics 2024; 40:btae074. [PMID: 38335928 PMCID: PMC10881095 DOI: 10.1093/bioinformatics/btae074] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/29/2023] [Revised: 01/25/2024] [Accepted: 02/06/2024] [Indexed: 02/12/2024]
Abstract
MOTIVATION: The accurate prediction of how mutations change biophysical properties of proteins or RNA is a major goal in computational biology with tremendous impacts on protein design and genetic variant interpretation. Evolutionary approaches such as coevolution can help solve this problem. RESULTS: We present pycofitness, a standalone Python-based software package for the in silico mutagenesis of protein and RNA sequences. It is based on coevolution and, more specifically, on a popular inverse statistical approach, namely direct coupling analysis by pseudo-likelihood maximization. Its efficient implementation and user-friendly command line interface make it an easy-to-use tool even for researchers with no bioinformatics background. To illustrate its strengths, we present three applications in which pycofitness efficiently predicts the deleteriousness of genetic variants and the effect of mutations on protein fitness and thermodynamic stability. AVAILABILITY AND IMPLEMENTATION: https://github.com/KIT-MBS/pycofitness.
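pycofitness scores mutations with a Potts model fitted by pseudo-likelihood DCA. The core arithmetic, independent of how the parameters are fitted, is the energy change caused by a substitution. The sketch below computes it under an assumed sign convention (lower energy means fitter) with toy, hypothetical parameters. It does not reproduce pycofitness's interface, output format or exact scoring, and the function names are illustrative.

```python
def coupling(J, i, j, a, b):
    """J_ij(a, b) from a dict keyed by (i, j) with i < j; missing entries are 0."""
    if i > j:
        i, j, a, b = j, i, b, a
    return J.get((i, j), {}).get((a, b), 0.0)


def potts_energy(seq, h, J):
    """E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j) (assumed sign convention)."""
    energy = -sum(h[i].get(a, 0.0) for i, a in enumerate(seq))
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            energy -= coupling(J, i, j, seq[i], seq[j])
    return energy


def mutation_delta_energy(seq, pos, new_aa, h, J):
    """E(mutant) - E(wild type) for one substitution; only the field at `pos`
    and the couplings between `pos` and every other site change."""
    old_aa = seq[pos]
    delta = -(h[pos].get(new_aa, 0.0) - h[pos].get(old_aa, 0.0))
    for j, b in enumerate(seq):
        if j != pos:
            delta -= coupling(J, pos, j, new_aa, b) - coupling(J, pos, j, old_aa, b)
    return delta


# Toy parameters (hypothetical), standing in for a pseudo-likelihood fit.
h = [{"A": 1.0, "G": 0.2}, {"K": 0.5}, {"E": 0.3, "D": 0.1}]
J = {(0, 1): {("A", "K"): 0.4}, (1, 2): {("K", "E"): 0.8, ("K", "D"): 0.6}}
print(mutation_delta_energy("AKE", 2, "D", h, J))
```

Computing the difference directly touches O(L) terms instead of the O(L^2) needed to re-evaluate the whole energy. That is what makes scanning every single mutant of a sequence cheap.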
Affiliation(s)
- Fabrizio Pucci
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, 1050 Brussels, Belgium
- Mehari B Zerihun
- John von Neumann Institute for Computing, Jülich Supercomputer Centre, 52428 Jülich, Germany
- Marianne Rooman
- Computational Biology and Bioinformatics, Université Libre de Bruxelles, 1050 Brussels, Belgium
- Interuniversity Institute of Bioinformatics in Brussels, 1050 Brussels, Belgium
- Alexander Schug
- John von Neumann Institute for Computing, Jülich Supercomputer Centre, 52428 Jülich, Germany
- Department of Biology, University of Duisburg-Essen, D-45141 Essen, Germany
11
Li J, Wang L, Zhu Z, Song C. Exploring the Alternative Conformation of a Known Protein Structure Based on Contact Map Prediction. J Chem Inf Model 2024; 64:301-315. [PMID: 38117138 PMCID: PMC10777399 DOI: 10.1021/acs.jcim.3c01381] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 08/29/2023] [Revised: 12/03/2023] [Accepted: 12/05/2023] [Indexed: 12/21/2023]
Abstract
The rapid development of deep learning-based methods has considerably advanced the field of protein structure prediction. The accuracy of predicting the 3D structures of simple proteins is comparable to that of experimentally determined structures, providing broad possibilities for structure-based biological studies. Another critical question is whether and how multistate structures can be predicted from a given protein sequence. In this study, analysis of tens of two-state proteins demonstrated that deep learning-based contact map predictions contain structural information on both states, which suggests that it is probably appropriate to change the target of deep learning-based protein structure prediction from one specific structure to multiple likely structures. Furthermore, by combining deep learning- and physics-based computational methods, we developed a protocol for exploring alternative conformations from a known structure of a given protein, by which we successfully approached the holo-state conformations of multiple representative proteins from their apo-state structures.
Affiliation(s)
- Jiaxuan Li
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Lei Wang
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Zefeng Zhu
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Chen Song
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
- Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
12
Wayment-Steele HK, Ojoawo A, Otten R, Apitz JM, Pitsawong W, Hömberger M, Ovchinnikov S, Colwell L, Kern D. Predicting multiple conformations via sequence clustering and AlphaFold2. Nature 2024; 625:832-839. [PMID: 37956700 PMCID: PMC10808063 DOI: 10.1038/s41586-023-06832-9] [Citation(s) in RCA: 163] [Impact Index Per Article: 163.0] [Received: 07/07/2023] [Accepted: 11/03/2023] [Indexed: 11/15/2023]
Abstract
AlphaFold2 (ref. 1) has revolutionized structural biology by accurately predicting single structures of proteins. However, a protein's biological function often depends on multiple conformational substates (ref. 2), and disease-causing point mutations often cause population changes within these substates (refs. 3,4). We demonstrate that clustering a multiple-sequence alignment by sequence similarity enables AlphaFold2 to sample alternative states of known metamorphic proteins with high confidence. Using this method, named AF-Cluster, we investigated the evolutionary distribution of predicted structures for the metamorphic protein KaiB (ref. 5) and found that predictions of both conformations were distributed in clusters across the KaiB family. We used nuclear magnetic resonance spectroscopy to confirm an AF-Cluster prediction: a cyanobacterial KaiB variant is stabilized in the opposite state compared with the more widely studied variant. To test AF-Cluster's sensitivity to point mutations, we designed and experimentally verified a set of three mutations predicted to flip KaiB from Rhodobacter sphaeroides from the ground to the fold-switched state. Finally, screening for alternative states in protein families without known fold switching identified a putative alternative state for the oxidoreductase Mpt53 in Mycobacterium tuberculosis. Further development of such bioinformatic methods in tandem with experiments will probably have a considerable impact on predicting protein energy landscapes, essential for illuminating biological function.
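The central step described above is splitting one multiple-sequence alignment into clusters of similar sequences, each of which is then given to AlphaFold2 separately. The abstract does not name the clustering algorithm, so the sketch below swaps in a simple greedy identity-threshold clustering as a stand-in; it is not necessarily what AF-Cluster uses, and the 0.5 threshold, helper names and toy alignment are hypothetical. Running AlphaFold2 on each cluster is outside its scope.

```python
def identity(a, b):
    """Fraction of aligned positions with identical characters (equal-length rows)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(msa, threshold=0.5):
    """Put each sequence into the first cluster whose representative it matches
    at >= threshold identity; otherwise it founds a new cluster."""
    clusters = []  # list of (representative, members)
    for seq in msa:
        for rep, members in clusters:
            if identity(seq, rep) >= threshold:
                members.append(seq)
                break
        else:
            clusters.append((seq, [seq]))
    return [members for _, members in clusters]


msa = ["MKVLA", "MKVLG", "AEDTW", "AEDTF", "MKILA"]
for i, cluster in enumerate(greedy_cluster(msa)):
    print(i, cluster)
# 0 ['MKVLA', 'MKVLG', 'MKILA']
# 1 ['AEDTW', 'AEDTF']
```

Raising the threshold yields more, smaller and more homogeneous sub-alignments, which is the knob that lets different sequence subsets express different coevolutionary signals.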
Affiliation(s)
- Hannah K Wayment-Steele
- Department of Biochemistry, Brandeis University and Howard Hughes Medical Institute, Waltham, MA, USA
- Adedolapo Ojoawo
- Department of Biochemistry, Brandeis University and Howard Hughes Medical Institute, Waltham, MA, USA
- Renee Otten
- Department of Biochemistry, Brandeis University and Howard Hughes Medical Institute, Waltham, MA, USA
- Treeline Biosciences, Watertown, MA, USA
- Julia M Apitz
- Department of Biochemistry, Brandeis University and Howard Hughes Medical Institute, Waltham, MA, USA
- Warintra Pitsawong
- Department of Biochemistry, Brandeis University and Howard Hughes Medical Institute, Waltham, MA, USA
- Biomolecular Discovery, Relay Therapeutics, Cambridge, MA, USA
- Marc Hömberger
- Department of Biochemistry, Brandeis University and Howard Hughes Medical Institute, Waltham, MA, USA
- Treeline Biosciences, Watertown, MA, USA
- Lucy Colwell
- Google Research, Cambridge, MA, USA
- Cambridge University, Cambridge, UK
- Dorothee Kern
- Department of Biochemistry, Brandeis University and Howard Hughes Medical Institute, Waltham, MA, USA.
13
Zhang H, Quadeer AA, McKay MR. Direct-acting antiviral resistance of Hepatitis C virus is promoted by epistasis. Nat Commun 2023; 14:7457. [PMID: 37978179 PMCID: PMC10656532 DOI: 10.1038/s41467-023-42550-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/17/2023] [Accepted: 10/13/2023] [Indexed: 11/19/2023]
Abstract
Direct-acting antiviral agents (DAAs) provide efficacious therapeutic treatments for chronic Hepatitis C virus (HCV) infection. However, emergence of drug resistance mutations (DRMs) can greatly affect treatment outcomes and impede virological cure. While multiple DRMs have been observed for all currently used DAAs, the evolutionary determinants of such mutations are not currently well understood. Here, by considering DAAs targeting the nonstructural 3 (NS3) protein of HCV, we present results suggesting that epistasis plays an important role in the evolution of DRMs. Employing a sequence-based fitness landscape model whose predictions correlate highly with experimental data, we identify specific DRMs that are associated with strong epistatic interactions, and these are found to be enriched in multiple NS3-specific DAAs. Evolutionary modelling further supports that the identified DRMs involve compensatory mutational interactions that facilitate relatively easy escape from drug-induced selection pressures. Our results indicate that accounting for epistasis is important for designing future HCV NS3-targeting DAAs.
Affiliation(s)
- Hang Zhang
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, China
- Ahmed Abdul Quadeer
- Department of Electronic and Computer Engineering, The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong SAR, China.
- Matthew R McKay
- Department of Electrical and Electronic Engineering, University of Melbourne, Melbourne, VIC, Australia.
- Department of Microbiology and Immunology, University of Melbourne, at The Peter Doherty Institute for Infection and Immunity, Melbourne, VIC, Australia.
14
Kilian M, Bischofs IB. Co-evolution at protein-protein interfaces guides inference of stoichiometry of oligomeric protein complexes by de novo structure prediction. Mol Microbiol 2023; 120:763-782. [PMID: 37777474 DOI: 10.1111/mmi.15169] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/31/2023] [Revised: 09/10/2023] [Accepted: 09/11/2023] [Indexed: 10/02/2023]
Abstract
The quaternary structure with specific stoichiometry is pivotal to the specific function of protein complexes. However, determining the structure of many protein complexes experimentally remains a major bottleneck. Structural bioinformatics approaches, such as the deep learning algorithm AlphaFold2-multimer (AF2-multimer), leverage the co-evolution of amino acids and sequence-structure relationships for accurate de novo structure and contact prediction. Pseudo-likelihood maximization direct coupling analysis (plmDCA) has been used to detect co-evolving residue pairs by statistical modeling. Here, we provide evidence that combining both methods can be used for de novo prediction of the quaternary structure and stoichiometry of a protein complex. We achieve this by augmenting the existing AF2-multimer confidence metrics with an interpretable score to identify the complex with an optimal fraction of native contacts of co-evolving residue pairs at intermolecular interfaces. We use this strategy to predict the quaternary structure and non-trivial stoichiometries of Bacillus subtilis spore germination protein complexes with unknown structures. Co-evolution at intermolecular interfaces may therefore synergize with AI-based de novo quaternary structure prediction of structurally uncharacterized bacterial protein complexes.
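The interpretable score described above rewards predicted complexes in which co-evolving residue pairs are actually in contact across the interface. A minimal sketch of that idea, assuming one representative coordinate per residue and a list of top-ranked plmDCA pairs are already available, is the fraction of those pairs within a distance cutoff. The 8 Å cutoff, the function name and the toy coordinates are assumptions for illustration, not the score defined in the paper.

```python
import math


def contact_fraction(pairs, coords_a, coords_b, cutoff=8.0):
    """Fraction of coevolving inter-chain pairs (i in chain A, j in chain B)
    whose residues lie within `cutoff` angstroms of each other in the model.

    coords_a / coords_b map residue numbers to (x, y, z) tuples (hypothetical
    input format, e.g. one C-beta atom per residue).
    """
    if not pairs:
        return 0.0
    in_contact = sum(
        1 for i, j in pairs if math.dist(coords_a[i], coords_b[j]) <= cutoff
    )
    return in_contact / len(pairs)


coords_a = {1: (0.0, 0.0, 0.0), 2: (10.0, 0.0, 0.0)}
coords_b = {5: (0.0, 5.0, 0.0), 9: (30.0, 0.0, 0.0)}
top_pairs = [(1, 5), (2, 9)]
print(contact_fraction(top_pairs, coords_a, coords_b))  # 0.5
```

Comparing this fraction across candidate models with different chain counts is one way such a score could help discriminate between stoichiometries.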
Affiliation(s)
- Max Kilian
- Max-Planck-Institute for Terrestrial Microbiology, Marburg, Germany
- BioQuant Center for Quantitative Analysis of Molecular and Cellular Biosystems, Heidelberg University, Heidelberg, Germany
- Center for Molecular Biology of Heidelberg University (ZMBH), Heidelberg, Germany
- Ilka B Bischofs
- Max-Planck-Institute for Terrestrial Microbiology, Marburg, Germany
- BioQuant Center for Quantitative Analysis of Molecular and Cellular Biosystems, Heidelberg University, Heidelberg, Germany
- Center for Molecular Biology of Heidelberg University (ZMBH), Heidelberg, Germany
15
Schafer JW, Porter LL. Evolutionary selection of proteins with two folds. Nat Commun 2023; 14:5478. [PMID: 37673981 PMCID: PMC10482954 DOI: 10.1038/s41467-023-41237-2] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 02/14/2023] [Accepted: 08/24/2023] [Indexed: 09/08/2023]
Abstract
Although most globular proteins fold into a single stable structure, an increasing number have been shown to remodel their secondary and tertiary structures in response to cellular stimuli. State-of-the-art algorithms predict that these fold-switching proteins adopt only one stable structure, missing their functionally critical alternative folds. Why these algorithms predict a single fold is unclear, but all of them infer protein structure from coevolved amino acid pairs. Here, we hypothesize that coevolutionary signatures are being missed. Suspecting that single-fold variants could be masking these signatures, we developed an approach, called Alternative Contact Enhancement (ACE), to search both highly diverse protein superfamilies (composed of single-fold and fold-switching variants) and protein subfamilies with more fold-switching variants. ACE successfully revealed coevolution of amino acid pairs uniquely corresponding to both conformations of 56/56 fold-switching proteins from distinct families. Then, we used ACE-derived contacts to (1) predict two experimentally consistent conformations of a candidate protein with unsolved structure and (2) develop a blind prediction pipeline for fold-switching proteins. The discovery of widespread dual-fold coevolution indicates that fold-switching sequences have been preserved by natural selection, implying that their functionalities provide evolutionary advantage and paving the way for predictions of diverse protein structures from single sequences.
Affiliation(s)
- Joseph W Schafer
- National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA
- Lauren L Porter
- National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD, 20894, USA.
- National Heart, Lung, and Blood Institute, Biochemistry and Biophysics Center, National Institutes of Health, Bethesda, MD, 20892, USA.
16
Xie J, Zhang W, Zhu X, Deng M, Lai L. Coevolution-based prediction of key allosteric residues for protein function regulation. eLife 2023; 12:81850. [PMID: 36799896 PMCID: PMC9981151 DOI: 10.7554/elife.81850] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 07/13/2022] [Accepted: 02/16/2023] [Indexed: 02/18/2023]
Abstract
Allostery is fundamental to many biological processes. Because allosteric regulation acts at a distance, the impact of allosteric mutations, modifications, and effector binding on protein function is difficult to predict. In protein engineering, remote mutations cannot be rationally designed without large-scale experimental screening. Allosteric drugs have attracted much attention owing to their high specificity and their potential to overcome existing drug-resistance mutations. However, optimization of allosteric compounds remains challenging. Here, we developed a novel computational method, KeyAlloSite, to predict allosteric sites and to identify key allosteric residues (allo-residues) based on the evolutionary coupling model. We found that protein allosteric sites are more strongly coupled to the orthosteric site than non-functional sites are. We further inferred key allo-residues by pairwise comparison of the differences in evolutionary coupling scores between each residue in the allosteric pocket and the functional site. Our predicted key allo-residues are consistent with previous experimental studies of typical allosteric proteins such as BCR-ABL1, Tar, and PDZ3, as well as with key cancer mutations. We also showed that KeyAlloSite can be used to predict key allosteric residues distant from the catalytic site that are important for enzyme catalysis. Our study demonstrates that weak coevolutionary couplings contain important information about protein allosteric regulation. KeyAlloSite can be applied to studying the evolution of protein allosteric regulation, designing and optimizing allosteric drugs, and performing functional protein design and enzyme engineering.
Affiliation(s)
- Juan Xie
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- Weilin Zhang
- BNLMS, Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Xiaolei Zhu
- School of Sciences, Anhui Agricultural University, Hefei, China
- Minghua Deng
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- School of Mathematical Sciences, Peking University, Beijing, China
- Center for Statistical Science, Peking University, Beijing, China
- Luhua Lai
- Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing, China
- BNLMS, Peking-Tsinghua Center for Life Sciences at the College of Chemistry and Molecular Engineering, Peking University, Beijing, China
- Research Unit of Drug Design Method, Chinese Academy of Medical Sciences (2021RU014), Beijing, China
17
Sgarbossa D, Lupo U, Bitbol AF. Generative power of a protein language model trained on multiple sequence alignments. eLife 2023; 12:e79854. [PMID: 36734516 PMCID: PMC10038667 DOI: 10.7554/elife.79854] [Citation(s) in RCA: 16] [Impact Index Per Article: 8.0] [Received: 04/28/2022] [Accepted: 02/02/2023] [Indexed: 02/04/2023]
Abstract
Computational models starting from large ensembles of evolutionarily related protein sequences capture a representation of protein families and learn constraints associated with protein structure and function. They thus open the possibility of generating novel sequences belonging to protein families. Protein language models trained on multiple sequence alignments, such as MSA Transformer, are highly attractive candidates to this end. We propose and test an iterative method that directly employs the masked language modeling objective to generate sequences using MSA Transformer. We demonstrate that the resulting sequences score as well as natural sequences, for homology, coevolution, and structure-based measures. For large protein families, our synthetic sequences have similar or better properties compared to sequences generated by Potts models, including experimentally validated ones. Moreover, for small protein families, our generation method based on MSA Transformer outperforms Potts models. Our method also more accurately reproduces the higher-order statistics and the distribution of sequences in sequence space of natural data than Potts models. MSA Transformer is thus a strong candidate for protein sequence generation and protein design.
Affiliation(s)
- Damiano Sgarbossa
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Umberto Lupo
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Anne-Florence Bitbol
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
18
Krishnamohan A, Hamilton GL, Goutam R, Sanabria H, Morcos F. Coevolution and smFRET Enhances Conformation Sampling and FRET Experimental Design in Tandem PDZ1-2 Proteins. J Phys Chem B 2023; 127:884-898. [PMID: 36693159 PMCID: PMC9900596 DOI: 10.1021/acs.jpcb.2c06720] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 01/25/2023]
Abstract
The structural flexibility of proteins is crucial for their functions. Many experimental and computational approaches can probe protein dynamics across a range of time and length-scales. Integrative approaches synthesize the complementary outputs of these techniques and provide a comprehensive view of the dynamic conformational space of proteins, including the functionally relevant limiting conformational states and transition pathways between them. Here, we introduce an integrative paradigm to model the conformational states of multidomain proteins. As a model system, we use the first two tandem PDZ domains of postsynaptic density protein 95. First, we utilize available sequence information collected from genomic databases to identify potential amino acid interactions in the PDZ1-2 tandem that underlie modeling of the functionally relevant conformations maintained through evolution. This was accomplished through combination of coarse-grained structural modeling with outputs from direct coupling analysis measuring amino acid coevolution, a hybrid approach called SBM+DCA. We recapitulated five distinct, experimentally derived PDZ1-2 tandem conformations. In addition, SBM+DCA unveiled an unidentified, twisted conformation of the PDZ1-2 tandem. Finally, we implemented an integrative framework for the design of single-molecule Förster resonance energy transfer (smFRET) experiments incorporating the outputs of SBM+DCA with simulated FRET observables. This resulting FRET network is designed to mutually resolve the predicted limiting state conformations through global analysis. Using simulated FRET observables, we demonstrate that structural modeling with the newly designed FRET network is expected to outperform a previously used empirical FRET network at resolving all states simultaneously. 
Integrative approaches to experimental design have the potential to provide a new level of detail in characterizing the evolutionarily conserved conformational landscapes of proteins, and thus new insights into functional relevance of protein dynamics in biological function.
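Simulated FRET observables like those mentioned above are commonly derived from donor-acceptor distances through the Förster relation E = 1/(1 + (r/R0)^6), where R0 is the Förster radius of the dye pair. The sketch below evaluates E for single distances and averages it over a set of sampled distances; the R0 of 50 Å and the distances are illustrative assumptions, and a real smFRET design workflow (dye-linker modeling, averaging regimes, network selection) is considerably more involved.

```python
def fret_efficiency(r, r0):
    """Förster efficiency E = 1 / (1 + (r / r0)**6) for donor-acceptor distance r."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def mean_efficiency(distances, r0):
    """Average E over sampled distances (one simple averaging choice; it ignores
    dye-linker flexibility and conformational exchange time scales)."""
    if not distances:
        raise ValueError("need at least one distance")
    return sum(fret_efficiency(r, r0) for r in distances) / len(distances)


R0 = 50.0  # hypothetical Förster radius in angstroms
print(fret_efficiency(50.0, R0))  # 0.5, since r equals R0
print(mean_efficiency([40.0, 60.0], R0))
```

The steep sixth-power dependence is why dye positions are chosen so that the distances distinguishing the predicted states fall near R0, where E changes fastest.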
Affiliation(s)
- Aishwarya Krishnamohan
- Departments of Biological Sciences and Bioengineering, University of Texas at Dallas, Richardson, Texas 75080, United States
- George L Hamilton
- Department of Physics and Astronomy, Clemson University, Clemson, South Carolina 29634, United States
- Rajen Goutam
- Department of Physics and Astronomy, Clemson University, Clemson, South Carolina 29634, United States
- Hugo Sanabria
- Department of Physics and Astronomy, Clemson University, Clemson, South Carolina 29634, United States
- Faruck Morcos
- Departments of Biological Sciences and Bioengineering, University of Texas at Dallas, Richardson, Texas 75080, United States; Center for Systems Biology, University of Texas at Dallas, Richardson, Texas 75080, United States
19
Dietler N, Lupo U, Bitbol AF. Impact of phylogeny on structural contact inference from protein sequence data. J R Soc Interface 2023; 20:20220707. [PMID: 36751926 PMCID: PMC9905998 DOI: 10.1098/rsif.2022.0707] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/26/2022] [Accepted: 01/09/2023] [Indexed: 02/09/2023]
Abstract
Local and global inference methods have been developed to infer structural contacts from multiple sequence alignments of homologous proteins. They rely on correlations in amino acid usage at contacting sites. Because homologous proteins share a common ancestry, their sequences also feature phylogenetic correlations, which can impair contact inference. We investigate this effect by generating controlled synthetic data from a minimal model where the importance of contacts and of phylogeny can be tuned. We demonstrate that global inference methods, specifically Potts models, are more resilient to phylogenetic correlations than local methods, based on covariance or mutual information. This holds whether or not phylogenetic corrections are used, and may explain the success of global methods. We analyse the roles of selection strength and of phylogenetic relatedness. We show that sites that mutate early in the phylogeny yield false positive contacts. We consider natural data and realistic synthetic data, and our findings generalize to these cases. Our results highlight the impact of phylogeny on contact prediction from protein sequences and illustrate the interplay between the rich structure of biological data and inference.
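The local methods contrasted above score each pair of alignment columns on its own, for example by mutual information, MI_ij = Σ_{a,b} f_ij(a,b) log[f_ij(a,b) / (f_i(a) f_j(b))]. The minimal sketch below computes it from raw frequencies; it deliberately omits pseudocounts, sequence reweighting and any phylogenetic correction (all of which practical tools usually add), so it is exactly the kind of local score the abstract reports as more sensitive to phylogenetic correlations. The toy alignments are hypothetical.

```python
import math
from collections import Counter


def column_mi(msa, i, j):
    """Mutual information (in nats) between columns i and j of an alignment,
    estimated from raw frequencies without pseudocounts or sequence weights."""
    n = len(msa)
    fi = Counter(seq[i] for seq in msa)
    fj = Counter(seq[j] for seq in msa)
    fij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi


covarying = ["AK", "AK", "DE", "DE"]    # column 1 is fully determined by column 0
independent = ["AK", "AE", "DK", "DE"]  # all four combinations equally frequent
print(column_mi(covarying, 0, 1))    # log(2), about 0.693
print(column_mi(independent, 0, 1))  # 0.0
```

Note that two sequences copied from a common ancestor would raise MI in exactly the same way as a structural contact, which is the phylogenetic confound the study examines.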
Affiliation(s)
- Nicola Dietler
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Umberto Lupo
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
- Anne-Florence Bitbol
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
20
Schafer JW, Porter LL. Evolutionary selection of proteins with two folds. bioRxiv: The Preprint Server for Biology 2023:2023.01.18.524637. [PMID: 36789442 PMCID: PMC9928049 DOI: 10.1101/2023.01.18.524637] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/22/2023]
Abstract
Although most globular proteins fold into a single stable structure (ref. 1), an increasing number have been shown to remodel their secondary and tertiary structures in response to cellular stimuli (ref. 2). State-of-the-art algorithms (refs. 3-5) predict that these fold-switching proteins assume only one stable structure (refs. 6,7), missing their functionally critical alternative folds. Why these algorithms predict a single fold is unclear, but all of them infer protein structure from coevolved amino acid pairs. Here, we hypothesize that coevolutionary signatures are being missed. Suspecting that over-represented single-fold sequences may be masking these signatures, we developed an approach to search both highly diverse protein superfamilies (composed of single-fold and fold-switching variants) and protein subfamilies with more fold-switching variants. This approach successfully revealed coevolution of amino acid pairs uniquely corresponding to both conformations of 56/58 fold-switching proteins from distinct families. Then, using a set of coevolved amino acid pairs predicted by our approach, we successfully biased AlphaFold2 (ref. 5) to predict two experimentally consistent conformations of a candidate protein with unsolved structure. The discovery of widespread dual-fold coevolution indicates that fold-switching sequences have been preserved by natural selection, implying that their functionalities provide evolutionary advantage and paving the way for predictions of diverse protein structures from single sequences.
Affiliation(s)
- Joseph W. Schafer
- National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- Lauren L. Porter
- National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- National Heart, Lung, and Blood Institute, Biochemistry and Biophysics Center, National Institutes of Health, Bethesda, MD 20892, USA
21
Rosignoli S, Paiardini A. Boosting the Full Potential of PyMOL with Structural Biology Plugins. Biomolecules 2022; 12:biom12121764. [PMID: 36551192 PMCID: PMC9775141 DOI: 10.3390/biom12121764] [Citation(s) in RCA: 95] [Impact Index Per Article: 31.7] [Received: 11/08/2022] [Revised: 11/23/2022] [Accepted: 11/24/2022] [Indexed: 11/29/2022]
Abstract
Over the past few decades, the number of available structural bioinformatics pipelines, libraries, plugins, web resources and software has increased exponentially and become accessible to the broad realm of life scientists. This expansion has shaped the field as a tangled network of methods, algorithms and user interfaces. In recent years PyMOL, widely used software for biomolecule visualization and analysis, has started to play a key role in providing an open platform for the successful implementation of expert knowledge into an easy-to-use molecular graphics tool. This review outlines the plugins and features that make PyMOL a suitable environment for supporting structural bioinformatics analyses.
22
Ravishankar K, Jiang X, Leddin EM, Morcos F, Cisneros GA. Computational compensatory mutation discovery approach: Predicting a PARP1 variant rescue mutation. Biophys J 2022; 121:3663-3673. [PMID: 35642254 PMCID: PMC9617126 DOI: 10.1016/j.bpj.2022.05.036] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 11/20/2021] [Revised: 05/20/2022] [Accepted: 05/23/2022] [Indexed: 11/02/2022]
Abstract
The prediction of protein mutations that affect function may be exploited for multiple uses. In the context of disease variants, the prediction of compensatory mutations that reestablish functional phenotypes could aid in the development of genetic therapies. In this work, we present an integrated approach that combines coevolutionary analysis and molecular dynamics (MD) simulations to discover functional compensatory mutations. This approach is employed to investigate possible rescue mutations of a poly(ADP-ribose) polymerase 1 (PARP1) variant, PARP1 V762A, associated with lung cancer and follicular lymphoma. MD simulations show PARP1 V762A exhibits noticeable changes in structural and dynamical behavior compared with wild-type (WT) PARP1. Our integrated approach predicts A755E as a possible compensatory mutation based on coevolutionary information, and molecular simulations indicate that the PARP1 A755E/V762A double mutant exhibits similar structural and dynamical behavior to WT PARP1. Our methodology can be broadly applied to a large number of systems where single-nucleotide polymorphisms have been identified as connected to disease and can shed light on the biophysical effects of such changes as well as provide a way to discover potential mutants that could restore WT-like functionality. This can, in turn, be further utilized in the design of molecular therapeutics that aim to mimic such compensatory effect.
Affiliation(s)
- Xianli Jiang
- Department of Biological Sciences, The University of Texas at Dallas, Richardson, Texas; Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas
- Emmett M Leddin
- Department of Chemistry, University of North Texas, Denton, Texas
- Faruck Morcos
- Department of Biological Sciences, The University of Texas at Dallas, Richardson, Texas; Department of Bioengineering, The University of Texas at Dallas, Richardson, Texas; Center for Systems Biology, The University of Texas at Dallas, Richardson, Texas.
- G Andrés Cisneros
- Department of Chemistry, University of North Texas, Denton, Texas; Department of Physics, The University of Texas at Dallas, Richardson, Texas; Department of Chemistry, The University of Texas at Dallas, Richardson, Texas.
23
Magi Meconi G, Sasselli IR, Bianco V, Onuchic JN, Coluzza I. Key aspects of the past 30 years of protein design. Reports on Progress in Physics. Physical Society (Great Britain) 2022; 85:086601. [PMID: 35704983 DOI: 10.1088/1361-6633/ac78ef] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/03/2021] [Accepted: 06/15/2022] [Indexed: 06/15/2023]
Abstract
Proteins are the workhorse of life. They are the building infrastructure of living systems; they are the most efficient molecular machines known, and their enzymatic activity is still unmatched in versatility by any artificial system. Perhaps proteins' most remarkable feature is their modularity. The large amount of information required to specify each protein's function is analogically encoded with an alphabet of just ∼20 letters. The protein folding problem is how to encode all such information in a sequence of 20 letters. In this review, we go through the last 30 years of research to summarize the state of the art and highlight some applications related to fundamental problems of protein evolution.
Affiliation(s)
- Giulia Magi Meconi
- Computational Biophysics Lab, Center for Cooperative Research in Biomaterials (CIC biomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014, Donostia-San Sebastián, Spain
| | - Ivan R Sasselli
- Computational Biophysics Lab, Center for Cooperative Research in Biomaterials (CIC biomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014, Donostia-San Sebastián, Spain
| | | | - Jose N Onuchic
- Center for Theoretical Biological Physics, Department of Physics & Astronomy, Department of Chemistry, Department of Biosciences, Rice University, Houston, TX 77251, United States of America
| | - Ivan Coluzza
- BCMaterials, Basque Center for Materials, Applications and Nanostructures, Bld. Martina Casiano, UPV/EHU Science Park, Barrio Sarriena s/n, 48940 Leioa, Spain
- Basque Foundation for Science, Ikerbasque, 48009, Bilbao, Spain
| |
Collapse
|
24
|
van Keulen SC, Martin J, Colizzi F, Frezza E, Trpevski D, Diaz NC, Vidossich P, Rothlisberger U, Hellgren Kotaleski J, Wade RC, Carloni P. Multiscale molecular simulations to investigate adenylyl cyclase‐based signaling in the brain. WIREs Computational Molecular Science 2022. [DOI: 10.1002/wcms.1623] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/15/2022]
Affiliation(s)
- Siri C. van Keulen
- Computational Structural Biology Group, Bijvoet Center for Biomolecular Research, Science for Life, Faculty of Science – Chemistry, Utrecht University, Utrecht, The Netherlands
- Juliette Martin
- CNRS, UMR 5086 Molecular Microbiology and Structural Biochemistry, University of Lyon, Lyon, France
- Francesco Colizzi
- Molecular Ocean Laboratory, Department of Marine Biology and Oceanography, Institute of Marine Sciences, ICM‐CSIC, Barcelona, Spain
- Elisa Frezza
- Université Paris Cité, CiTCoM, CNRS, Paris, France
- Daniel Trpevski
- Science for Life Laboratory, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm
- Nuria Cirauqui Diaz
- CNRS, UMR 5086 Molecular Microbiology and Structural Biochemistry, University of Lyon, Lyon, France
- Pietro Vidossich
- Molecular Modeling and Drug Discovery Lab, Istituto Italiano di Tecnologia, Genoa, Italy
- Ursula Rothlisberger
- Laboratory of Computational Chemistry and Biochemistry, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne
- Jeanette Hellgren Kotaleski
- Science for Life Laboratory, School of Electrical Engineering and Computer Science, KTH Royal Institute of Technology, Stockholm
- Department of Neuroscience, Karolinska Institute, Stockholm
- Rebecca C. Wade
- Molecular and Cellular Modeling Group, Heidelberg Institute for Theoretical Studies (HITS), Heidelberg, Germany
- Center for Molecular Biology (ZMBH), DKFZ‐ZMBH Alliance, and Interdisciplinary Center for Scientific Computing (IWR), Heidelberg University, Heidelberg, Germany
- Paolo Carloni
- Institute for Neuroscience and Medicine (INM‐9) and Institute for Advanced Simulations (IAS‐5) “Computational biomedicine”, Forschungszentrum Jülich, Jülich, Germany
- INM‐11 JARA‐Institute: Molecular Neuroscience and Neuroimaging, Forschungszentrum Jülich, Jülich, Germany

25
Oteri F, Sarti E, Nadalin F, Carbone A. iBIS2Analyzer: a web server for a phylogeny-driven coevolution analysis of protein families. Nucleic Acids Res 2022; 50:W412-W419. [PMID: 35670671 PMCID: PMC9252744 DOI: 10.1093/nar/gkac481] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 03/23/2022] [Revised: 05/20/2022] [Accepted: 05/25/2022] [Indexed: 12/27/2022]
Abstract
Residue coevolution within and between proteins is used as a marker of physical interaction and/or residue functional cooperation. Pairs or groups of coevolving residues are extracted from multiple sequence alignments based on a variety of computational approaches. However, coevolution signals emerging in subsets of sequences might be lost if the full alignment is considered. iBIS2Analyzer is a web server dedicated to a phylogeny-driven coevolution analysis of protein families with different evolutionary pressure. It is based on the iterative version, iBIS2, of the coevolution analysis method BIS, Blocks in Sequences. iBIS2 is designed to iteratively select and analyse subtrees in phylogenetic trees, possibly large and comprising thousands of sequences. With iBIS2Analyzer, openly accessible at http://ibis2analyzer.lcqb.upmc.fr/, the user visualizes, compares and inspects clusters of coevolving residues by mapping them onto sequences, alignments or structures of choice, greatly simplifying downstream analysis steps. A rich and interactive graphic interface facilitates the biological interpretation of the results.
Affiliation(s)
- Francesco Oteri
- Sorbonne Université, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Edoardo Sarti
- Sorbonne Université, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Francesca Nadalin
- Sorbonne Université, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France
- Alessandra Carbone
- Sorbonne Université, CNRS, IBPS, UMR 7238, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), 75005 Paris, France

26
Galaz‐Davison P, Ferreiro DU, Ramírez‐Sarmiento CA. Coevolution-derived native and non-native contacts determine the emergence of a novel fold in a universally conserved family of transcription factors. Protein Sci 2022; 31:e4337. [PMID: 35634768 PMCID: PMC9123645 DOI: 10.1002/pro.4337] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/25/2022] [Revised: 04/18/2022] [Accepted: 05/03/2022] [Indexed: 11/07/2022]
Abstract
The NusG protein family is structurally and functionally conserved in all domains of life. Its members directly bind RNA polymerases and regulate transcription processivity and termination. RfaH, a divergent sub-family in its evolutionary history, is known for displaying distinct features than those in NusG proteins, which allows them to regulate the expression of virulence factors in enterobacteria in a DNA sequence-dependent manner. A striking feature is its structural interconversion between an active fold, which is the canonical NusG three-dimensional structure, and an autoinhibited fold, which is distinctively novel. How this novel fold is encoded within RfaH sequence to encode a metamorphic protein remains elusive. In this work, we used publicly available genomic RfaH protein sequences to construct a complete multiple sequence alignment, which was further augmented with metagenomic sequences and curated by predicting their secondary structure propensities using JPred. Coevolving pairs of residues were calculated from these sequences using plmDCA and GREMLIN, which allowed us to detect the enrichment of key metamorphic contacts after sequence filtering. Finally, we combined our coevolutionary predictions with molecular dynamics to demonstrate that these interactions are sufficient to predict the structures of both native folds, where coevolutionary-derived non-native contacts may play a key role in achieving the compact RfaH novel fold. All in all, emergent coevolutionary signals found within RfaH sequences encode the autoinhibited and active folds of this protein, shedding light on the key interactions responsible for the action of this metamorphic protein.
Affiliation(s)
- Pablo Galaz‐Davison
- Institute for Biological and Medical Engineering, Schools of Engineering, Medicine and Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, Chile
- ANID – Millennium Science Initiative Program – Millennium Institute for Integrative Biology (iBio), Santiago, Chile
- Diego U. Ferreiro
- Protein Physiology Lab, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales (IQUIBICEN‐CONICET), Universidad de Buenos Aires, Buenos Aires, Argentina
- César A. Ramírez‐Sarmiento
- Institute for Biological and Medical Engineering, Schools of Engineering, Medicine and Biological Sciences, Pontificia Universidad Católica de Chile, Santiago, Chile
- ANID – Millennium Science Initiative Program – Millennium Institute for Integrative Biology (iBio), Santiago, Chile

27
Saldaño T, Escobedo N, Marchetti J, Zea DJ, Mac Donagh J, Velez Rueda AJ, Gonik E, García Melani A, Novomisky Nechcoff J, Salas MN, Peters T, Demitroff N, Fernandez Alberti S, Palopoli N, Fornasari MS, Parisi G. Impact of protein conformational diversity on AlphaFold predictions. Bioinformatics 2022; 38:2742-2748. [PMID: 35561203 DOI: 10.1093/bioinformatics/btac202] [Citation(s) in RCA: 87] [Impact Index Per Article: 29.0] [Received: 11/01/2021] [Revised: 02/10/2022] [Accepted: 03/31/2022] [Indexed: 11/12/2022]
Abstract
MOTIVATION After the outstanding breakthrough of AlphaFold in predicting protein 3D models, new questions appeared and remain unanswered. The ensemble nature of proteins, for example, challenges the structural prediction methods because the models should represent a set of conformers instead of single structures. The evolutionary and structural features captured by effective deep learning techniques may unveil the information to generate several diverse conformations from a single sequence. Here, we address the performance of AlphaFold2 predictions obtained through ColabFold under this ensemble paradigm. RESULTS Using a curated collection of apo-holo pairs of conformers, we found that AlphaFold2 predicts the holo form of a protein in ∼70% of the cases, being unable to reproduce the observed conformational diversity with the same error for both conformers. More importantly, we found that AlphaFold2's performance worsens with the increasing conformational diversity of the studied protein. This impairment is related to the heterogeneity in the degree of conformational diversity found between different members of the homologous family of the protein under study. Finally, we found that main-chain flexibility associated with apo-holo pairs of conformers negatively correlates with the predicted local model quality score plDDT, indicating that plDDT values in a single 3D model could be used to infer local conformational changes linked to ligand binding transitions. AVAILABILITY AND IMPLEMENTATION Data and code used in this manuscript are publicly available at https://gitlab.com/sbgunq/publications/af2confdiv-oct2021. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Tadeo Saldaño
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Nahuel Escobedo
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Julia Marchetti
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Juan Mac Donagh
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Ana Julia Velez Rueda
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Eduardo Gonik
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- INIFTA (CONICET-UNLP) - Fotoquímica y Nanomateriales para el Ambiente y la Biología (nanoFOT), La Plata, Argentina
- Martín N Salas
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Tomás Peters
- Fundación Instituto Leloir-Instituto de Investigaciones Bioquímicas de Buenos Aires, Buenos Aires, Argentina
- Nicolás Demitroff
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Fundación Instituto Leloir-Instituto de Investigaciones Bioquímicas de Buenos Aires, Buenos Aires, Argentina
- Sebastian Fernandez Alberti
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Nicolas Palopoli
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Maria Silvina Fornasari
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina
- Gustavo Parisi
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Bernal, Argentina
- Consejo Nacional de Investigaciones Científicas y Técnicas, Buenos Aires, Argentina

28
Gerardos A, Dietler N, Bitbol AF. Correlations from structure and phylogeny combine constructively in the inference of protein partners from sequences. PLoS Comput Biol 2022; 18:e1010147. [PMID: 35576238 PMCID: PMC9135348 DOI: 10.1371/journal.pcbi.1010147] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/22/2021] [Revised: 05/26/2022] [Accepted: 04/27/2022] [Indexed: 11/19/2022]
Abstract
Inferring protein-protein interactions from sequences is an important task in computational biology. Recent methods based on Direct Coupling Analysis (DCA) or Mutual Information (MI) allow to find interaction partners among paralogs of two protein families. Does successful inference mainly rely on correlations from structural contacts or from phylogeny, or both? Do these two types of signal combine constructively or hinder each other? To address these questions, we generate and analyze synthetic data produced using a minimal model that allows us to control the amounts of structural constraints and phylogeny. We show that correlations from these two sources combine constructively to increase the performance of partner inference by DCA or MI. Furthermore, signal from phylogeny can rescue partner inference when signal from contacts becomes less informative, including in the realistic case where inter-protein contacts are restricted to a small subset of sites. We also demonstrate that DCA-inferred couplings between non-contact pairs of sites improve partner inference in the presence of strong phylogeny, while deteriorating it otherwise. Moreover, restricting to non-contact pairs of sites preserves inference performance in the presence of strong phylogeny. In a natural data set, as well as in realistic synthetic data based on it, we find that non-contact pairs of sites contribute positively to partner inference performance, and that restricting to them preserves performance, evidencing an important role of phylogeny.
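The abstract above names mutual information (MI) between alignment columns as one of the statistics used for partner inference. As a minimal sketch, assuming raw frequency counts with no pseudocounts or sequence reweighting (corrections that practical MI and DCA pipelines usually add), the MI between two columns of a paired alignment can be computed as follows. The function name is hypothetical and this is not the authors' implementation.

```python
import math
from collections import Counter


def column_mutual_information(col_a, col_b):
    """Return the mutual information (in nats) between two alignment columns.

    col_a and col_b hold one residue symbol per aligned sequence; for partner
    inference, row k of col_a and row k of col_b come from one putative
    interacting pair. Frequencies are raw counts: no pseudocounts and no
    sequence reweighting (a simplifying assumption).
    """
    if len(col_a) != len(col_b) or len(col_a) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_a)
    counts_a = Counter(col_a)
    counts_b = Counter(col_b)
    counts_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), c_xy in counts_ab.items():
        # p(x, y) * log[p(x, y) / (p(x) p(y))], written in terms of counts
        mi += (c_xy / n) * math.log(c_xy * n / (counts_a[x] * counts_b[y]))
    return mi


# Perfectly covarying columns carry log(2) nats; independent ones carry none.
print(column_mutual_information("AAKK", "DDEE"))  # ~0.693
print(column_mutual_information("AAKK", "DEDE"))  # 0.0
```

Real analyses compute this for every column pair across the two families and then rank candidate partner pairings; the two toy columns here only illustrate the extremes.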
Affiliation(s)
- Andonis Gerardos
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Nicola Dietler
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Anne-Florence Bitbol
- Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland
- SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland

29
Chi H, Zhou Q, Tutol JN, Phelps SM, Lee J, Kapadia P, Morcos F, Dodani SC. Coupling a Live Cell Directed Evolution Assay with Coevolutionary Landscapes to Engineer an Improved Fluorescent Rhodopsin Chloride Sensor. ACS Synth Biol 2022; 11:1627-1638. [PMID: 35389621 PMCID: PMC9184236 DOI: 10.1021/acssynbio.2c00033] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Abstract
Our understanding of chloride in biology has been accelerated through the application of fluorescent protein-based sensors in living cells. These sensors can be generated and diversified to have a range of properties using laboratory-guided evolution. Recently, we established that the fluorescent proton-pumping rhodopsin wtGR from Gloeobacter violaceus can be converted into a fluorescent sensor for chloride. To unlock this non-natural function, a single point mutation at the Schiff counterion position (D121V) was introduced into wtGR fused to cyan fluorescent protein (CFP) resulting in GR1-CFP. Here, we have integrated coevolutionary analysis with directed evolution to understand how the rhodopsin sequence space can be explored and engineered to improve this starting point. We first show how evolutionary couplings are predictive of functional sites in the rhodopsin family and how a fitness metric based on a sequence can be used to quantify the known proton-pumping activities of GR-CFP variants. Then, we couple this ability to predict potential functional outcomes with a screening and selection assay in live Escherichia coli to reduce the mutational search space of five residues along the proton-pumping pathway in GR1-CFP. This iterative selection process results in GR2-CFP with four additional mutations: E132K, A84K, T125C, and V245I. Finally, bulk and single fluorescence measurements in live E. coli reveal that GR2-CFP is a reversible, ratiometric fluorescent sensor for extracellular chloride with an improved dynamic range. We anticipate that our framework will be applicable to other systems, providing a more efficient methodology to engineer fluorescent protein-based sensors with desired properties.

30
Extracting phylogenetic dimensions of coevolution reveals hidden functional signals. Sci Rep 2022; 12:820. [PMID: 35039514 PMCID: PMC8764114 DOI: 10.1038/s41598-021-04260-1] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/02/2021] [Accepted: 12/17/2021] [Indexed: 11/08/2022]
Abstract
Despite the structural and functional information contained in the statistical coupling between pairs of residues in a protein, coevolution associated with function is often obscured by artifactual signals such as genetic drift, which shapes a protein's phylogenetic history and gives rise to concurrent variation between protein sequences that is not driven by selection for function. Here, we introduce a background model for phylogenetic contributions of statistical coupling that separates the coevolution signal due to inter-clade and intra-clade sequence comparisons and demonstrate that coevolution can be measured on multiple phylogenetic timescales within a single protein. Our method, nested coevolution (NC), can be applied as an extension to any coevolution metric. We use NC to demonstrate that poorly conserved residues can nonetheless have important roles in protein function. Moreover, NC improved the structural-contact predictions of several coevolution-based methods, particularly in subsampled alignments with fewer sequences. NC also lowered the noise in detecting functional sectors of collectively coevolving residues. Sectors of coevolving residues identified after application of NC were more spatially compact and phylogenetically distinct from the rest of the protein, and strongly enriched for mutations that disrupt protein activity. Thus, our conceptualization of the phylogenetic separation of coevolution provides the potential to further elucidate relationships among protein evolution, function, and genetic diseases.

31
Kazan IC, Sharma P, Rahman MI, Bobkov A, Fromme R, Ghirlanda G, Ozkan SB. Design of novel cyanovirin-N variants by modulation of binding dynamics through distal mutations. eLife 2022; 11:67474. [PMID: 36472898 PMCID: PMC9725752 DOI: 10.7554/elife.67474] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/12/2021] [Accepted: 11/28/2022] [Indexed: 12/07/2022]
Abstract
We develop integrated co-evolution and dynamic coupling (ICDC) approach to identify, mutate, and assess distal sites to modulate function. We validate the approach first by analyzing the existing mutational fitness data of TEM-1 β-lactamase and show that allosteric positions co-evolved and dynamically coupled with the active site significantly modulate function. We further apply ICDC approach to identify positions and their mutations that can modulate binding affinity in a lectin, cyanovirin-N (CV-N), that selectively binds to dimannose, and predict binding energies of its variants through Adaptive BP-Dock. Computational and experimental analyses reveal that binding enhancing mutants identified by ICDC impact the dynamics of the binding pocket, and show that rigidification of the binding residues compensates for the entropic cost of binding. This work suggests a mechanism by which distal mutations modulate function through dynamic allostery and provides a blueprint to identify candidates for mutagenesis in order to optimize protein function.
Affiliation(s)
- I Can Kazan
- Center for Biological Physics and Department of Physics, Arizona State University, Tempe, United States; School of Molecular Sciences, Arizona State University, Tempe, United States
- Prerna Sharma
- School of Molecular Sciences, Arizona State University, Tempe, United States
- Andrey Bobkov
- Sanford Burnham Prebys Medical Discovery Institute, La Jolla, United States
- Raimund Fromme
- School of Molecular Sciences, Arizona State University, Tempe, United States
- Giovanna Ghirlanda
- School of Molecular Sciences, Arizona State University, Tempe, United States
- S Banu Ozkan
- Center for Biological Physics and Department of Physics, Arizona State University, Tempe, United States

32
Chu WT, Yan Z, Chu X, Zheng X, Liu Z, Xu L, Zhang K, Wang J. Physics of biomolecular recognition and conformational dynamics. Rep Prog Phys 2021; 84:126601. [PMID: 34753115 DOI: 10.1088/1361-6633/ac3800] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/26/2021] [Accepted: 11/09/2021] [Indexed: 06/13/2023]
Abstract
Biomolecular recognition usually leads to the formation of binding complexes, often accompanied by large-scale conformational changes. This process is fundamental to biological functions at the molecular and cellular levels. Uncovering the physical mechanisms of biomolecular recognition and quantifying the key biomolecular interactions are vital to understand these functions. The recently developed energy landscape theory has been successful in quantifying recognition processes and revealing the underlying mechanisms. Recent studies have shown that in addition to affinity, specificity is also crucial for biomolecular recognition. The proposed physical concept of intrinsic specificity based on the underlying energy landscape theory provides a practical way to quantify the specificity. Optimization of affinity and specificity can be adopted as a principle to guide the evolution and design of molecular recognition. This approach can also be used in practice for drug discovery using multidimensional screening to identify lead compounds. The energy landscape topography of molecular recognition is important for revealing the underlying flexible binding or binding-folding mechanisms. In this review, we first introduce the energy landscape theory for molecular recognition and then address four critical issues related to biomolecular recognition and conformational dynamics: (1) specificity quantification of molecular recognition; (2) evolution and design in molecular recognition; (3) flexible molecular recognition; (4) chromosome structural dynamics. The results described here and the discussions of the insights gained from the energy landscape topography can provide valuable guidance for further computational and experimental investigations of biomolecular recognition and conformational dynamics.
Affiliation(s)
- Wen-Ting Chu
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Zhiqiang Yan
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Xiakun Chu
- Department of Chemistry & Physics, State University of New York at Stony Brook, Stony Brook, NY 11794, United States of America
- Xiliang Zheng
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Zuojia Liu
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Li Xu
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Kun Zhang
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun 130022, People's Republic of China
- Jin Wang
- Department of Chemistry & Physics, State University of New York at Stony Brook, Stony Brook, NY 11794, United States of America

33
McGee F, Hauri S, Novinger Q, Vucetic S, Levy RM, Carnevale V, Haldane A. The generative capacity of probabilistic protein sequence models. Nat Commun 2021; 12:6302. [PMID: 34728624 PMCID: PMC8563988 DOI: 10.1038/s41467-021-26529-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 01/11/2021] [Accepted: 09/23/2021] [Indexed: 01/10/2023]
Abstract
Potts models and variational autoencoders (VAEs) have recently gained popularity as generative protein sequence models (GPSMs) to explore fitness landscapes and predict mutation effects. Despite encouraging results, current model evaluation metrics leave unclear whether GPSMs faithfully reproduce the complex multi-residue mutational patterns observed in natural sequences due to epistasis. Here, we develop a set of sequence statistics to assess the "generative capacity" of three current GPSMs: the pairwise Potts Hamiltonian, the VAE, and the site-independent model. We show that the Potts model's generative capacity is largest, as the higher-order mutational statistics generated by the model agree with those observed for natural sequences, while the VAE's lies between the Potts and site-independent models. Importantly, our work provides a new framework for evaluating and interpreting GPSM accuracy which emphasizes the role of higher-order covariation and epistasis, with broader implications for probabilistic sequence models in general.
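As background for the Potts and site-independent models compared above, the sketch below evaluates the energy of a single sequence under a pairwise Potts Hamiltonian, assuming the common convention P(s) proportional to exp(-E(s)) (sign conventions differ between papers). The site-independent model is the special case in which every coupling is zero. The parameter layout (one dictionary of fields per position, a pair-keyed dictionary of couplings) and the function names are illustrative assumptions, not the models or code used in the study.

```python
from itertools import combinations


def potts_energy(seq, fields, couplings):
    """Energy of `seq` under a pairwise Potts model (assumed convention).

    E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j), with P(s) ∝ exp(-E(s)).
    fields[i] maps residue -> h_i; couplings[(i, j)] (with i < j) maps a
    residue pair (a, b) -> J_ij(a, b). Missing entries count as zero.
    """
    energy = -sum(fields[i].get(a, 0.0) for i, a in enumerate(seq))
    for i, j in combinations(range(len(seq)), 2):
        energy -= couplings.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return energy


def site_independent_energy(seq, fields):
    """Same convention with every coupling set to zero."""
    return potts_energy(seq, fields, {})


# Toy two-site model: the coupling rewards the A-D pair at positions 0 and 1.
fields = [{"A": 1.0}, {"D": 0.5}]
couplings = {(0, 1): {("A", "D"): 2.0}}
print(potts_energy("AD", fields, couplings))  # -3.5
print(site_independent_energy("AD", fields))  # -1.5
print(potts_energy("AE", fields, couplings))  # -1.0
```

The gap between the two energies for "AD" is exactly the pairwise term, which is the kind of epistatic contribution a site-independent model cannot represent.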
Affiliation(s)
- Francisco McGee
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, 19122, USA
- Institute for Computational Molecular Science, Temple University, Philadelphia, 19122, USA
- Department of Biology, Temple University, Philadelphia, 19122, USA
- Sandro Hauri
- Center for Hybrid Intelligence, Temple University, Philadelphia, 19122, USA
- Department of Computer & Information Sciences, Temple University, Philadelphia, 19122, USA
- Quentin Novinger
- Institute for Computational Molecular Science, Temple University, Philadelphia, 19122, USA
- Department of Computer & Information Sciences, Temple University, Philadelphia, 19122, USA
- Slobodan Vucetic
- Center for Hybrid Intelligence, Temple University, Philadelphia, 19122, USA
- Department of Computer & Information Sciences, Temple University, Philadelphia, 19122, USA
- Ronald M Levy
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, 19122, USA
- Department of Biology, Temple University, Philadelphia, 19122, USA
- Department of Physics, Temple University, Philadelphia, 19122, USA
- Department of Chemistry, Temple University, Philadelphia, 19122, USA
- Vincenzo Carnevale
- Institute for Computational Molecular Science, Temple University, Philadelphia, 19122, USA.
- Department of Biology, Temple University, Philadelphia, 19122, USA.
- Allan Haldane
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, 19122, USA.
- Department of Chemistry, Temple University, Philadelphia, 19122, USA.

34
Mehrabiani KM, Cheng RR, Onuchic JN. Expanding Direct Coupling Analysis to Identify Heterodimeric Interfaces from Limited Protein Sequence Data. J Phys Chem B 2021; 125:11408-11417. [PMID: 34618469 DOI: 10.1021/acs.jpcb.1c07145] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 11/28/2022]
Abstract
Direct coupling analysis (DCA) is a global statistical approach that uses information encoded in protein sequence data to predict spatial contacts in a three-dimensional structure of a folded protein. DCA has been widely used to predict the monomeric fold at amino acid resolution and to identify biologically relevant interaction sites within a folded protein. Going beyond single proteins, DCA has also been used to identify spatial contacts that stabilize the interaction in protein complex formation. However, extracting this higher order information necessary to predict dimer contacts presents a significant challenge. A DCA evolutionary signal is much stronger at the single protein level (intraprotein contacts) than at the protein-protein interface (interprotein contacts). Therefore, if DCA-derived information is to be used to predict the structure of these complexes, there is a need to identify statistically significant DCA predictions. We propose a simple Z-score measure that can filter good predictions despite noisy, limited data. This new methodology not only improves our prediction ability but also provides a quantitative measure for the validity of the prediction.
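The abstract proposes a Z-score measure for filtering noisy DCA predictions but does not define it here. The sketch below shows only the generic idea, under the assumption that each candidate inter-protein pair's coupling score is standardized against the mean and population standard deviation of all candidate scores; the function name and the default cutoff of 3.0 are hypothetical choices, not values from the paper.

```python
import statistics


def zscore_filter(scores, threshold=3.0):
    """Return residue pairs whose coupling score stands out from the rest.

    scores maps a residue pair (i, j) to a DCA-style coupling score. Each
    score is converted to Z = (score - mean) / std over all pairs, and pairs
    with Z >= threshold are returned as (pair, Z), highest Z first.
    The default threshold is an illustrative assumption.
    """
    values = list(scores.values())
    if len(values) < 2:
        return []
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []  # all scores identical: nothing stands out
    ranked = ((pair, (s - mean) / std) for pair, s in scores.items())
    kept = [(pair, z) for pair, z in ranked if z >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Nine background pairs score 0.0 and one pair scores 10.0:
# mean = 1.0 and std = 3.0, so the outlier has Z = 3.0.
scores = {(k, k + 50): 0.0 for k in range(9)}
scores[(3, 77)] = 10.0
print(zscore_filter(scores, threshold=2.5))  # [((3, 77), 3.0)]
```

Standardizing against the full score distribution makes the cutoff comparable across protein pairs with different alignment depths, which is the practical motivation for a Z-score over a raw score threshold.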
Collapse
Affiliation(s)
- Kareem M Mehrabiani
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States.,Systems, Synthetic, and Physical Biology, Rice University, Houston, Texas 77005, United States
| | - Ryan R Cheng
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States
| | - José N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, United States.,Systems, Synthetic, and Physical Biology, Rice University, Houston, Texas 77005, United States.,Department of Physics & Astronomy, Rice University, Houston, Texas 77005, United States.,Department of Chemistry, Rice University, Houston, Texas 77005, United States.,Department of Biosciences, Rice University, Houston, Texas 77005, United States
35
Laine E, Eismann S, Elofsson A, Grudinin S. Protein sequence-to-structure learning: Is this the end(-to-end revolution)? Proteins 2021; 89:1770-1786. [PMID: 34519095 DOI: 10.1002/prot.26235] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 05/05/2021] [Revised: 08/16/2021] [Accepted: 09/03/2021] [Indexed: 01/08/2023]
Abstract
The potential of deep learning has been recognized in the protein structure prediction community for some time, and became indisputable after CASP13. In CASP14, deep learning has boosted the field to unanticipated levels reaching near-experimental accuracy. This success comes from advances transferred from other machine learning areas, as well as methods specifically designed to deal with protein sequences and structures, and their abstractions. Novel emerging approaches include (i) geometric learning, that is, learning on representations such as graphs, three-dimensional (3D) Voronoi tessellations, and point clouds; (ii) pretrained protein language models leveraging attention; (iii) equivariant architectures preserving the symmetry of 3D space; (iv) use of large meta-genome databases; (v) combinations of protein representations; and (vi) finally truly end-to-end architectures, that is, differentiable models starting from a sequence and returning a 3D structure. Here, we provide an overview and our opinion of the novel deep learning approaches developed in the last 2 years and widely used in CASP14.
Affiliation(s)
- Elodie Laine
- Sorbonne Université, CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Paris, France
- Stephan Eismann
- Department of Computer Science and Applied Physics, Stanford University, Stanford, California, USA
- Arne Elofsson
- Department of Biochemistry and Biophysics and Science for Life Laboratory, Stockholm University, Solna, Sweden
- Sergei Grudinin
- Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK, Grenoble, France
36
Colizzi F, Orozco M. Probing allosteric regulations with coevolution-driven molecular simulations. Sci Adv 2021; 7:eabj0786. [PMID: 34516882 PMCID: PMC8442858 DOI: 10.1126/sciadv.abj0786] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 04/19/2021] [Accepted: 07/19/2021] [Indexed: 06/13/2023]
Abstract
Protein-mediated allosteric regulations are essential in biology, but their quantitative characterization continues to pose formidable challenges for both experiments and computations. Here, we combine coevolutionary information, multiscale molecular simulations, and free-energy methods to interrogate and quantify the allosteric regulation of functional changes in protein complexes. We apply this approach to investigate the regulation of adenylyl cyclase (AC) by stimulatory and inhibitory G proteins, a prototypical allosteric system that has long escaped in-depth molecular characterization. We reveal a surprisingly simple ON/OFF regulation of AC functional dynamics through multiple pathways of information transfer. The binding of G proteins reshapes the free-energy landscape of AC following the classical population-shift paradigm. The model agrees with structural and biochemical data and reveals previously unknown experimentally consistent intermediates. Our approach showcases a general strategy to explore uncharted functional space in complex biomolecular regulations.
Affiliation(s)
- Francesco Colizzi
- Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology (BIST), Carrer de Baldiri Reixac 10, Barcelona 08028, Spain
- Modesto Orozco
- Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute of Science and Technology (BIST), Carrer de Baldiri Reixac 10, Barcelona 08028, Spain
- Departament de Bioquímica i Biomedicina, Facultat de Biologia, Universitat de Barcelona, Avinguda Diagonal 647, Barcelona 08028, Spain
37
Fleishman SJ, Horovitz A. Extending the New Generation of Structure Predictors to Account for Dynamics and Allostery. J Mol Biol 2021; 433:167007. [PMID: 33901536 DOI: 10.1016/j.jmb.2021.167007] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 03/14/2021] [Revised: 04/18/2021] [Accepted: 04/19/2021] [Indexed: 10/21/2022]
Abstract
Recent progress in structure-prediction methods that rely on deep learning suggests that the atomic structure of almost any protein may soon be predictable directly from its amino acid sequence. This much-awaited revolution was driven by substantial improvements in the reliability of methods for inferring the spatial distances between amino acid pairs from an analysis of homologous sequences. Improved reliability has been accompanied, however, by a reduced ability to detect amino acid relationships that are not due to direct spatial contacts, such as those that arise from protein dynamics or allostery. Given the central importance of dynamics and allostery to protein activity, we argue that an important future advance would extend modeling beyond predicting a single static structure. Here, we briefly review some of the developments that have led to the remarkable recent achievement in structure prediction and speculate what methods and sources of information may be leveraged in the future to develop a modeling framework that addresses protein dynamics and allostery.
Affiliation(s)
- Sarel J Fleishman
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7600001, Israel.
- Amnon Horovitz
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot 7600001, Israel.
38
Laine E, Grudinin S. HOPMA: Boosting Protein Functional Dynamics with Colored Contact Maps. J Phys Chem B 2021; 125:2577-2588. [PMID: 33687221 DOI: 10.1021/acs.jpcb.0c11633] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 01/20/2023]
Abstract
In light of the recent very rapid progress in protein structure prediction, accessing the multitude of functional protein states is becoming more central than ever before. Indeed, proteins are flexible macromolecules, and they often perform their function by switching between different conformations. However, high-resolution experimental techniques such as X-ray crystallography and cryogenic electron microscopy can catch relatively few protein functional states. Many others are only accessible under physiological conditions in solution. Therefore, there is a pressing need to fill this gap with computational approaches. We present HOPMA, a novel method to predict protein functional states and transitions by using a modified elastic network model. The method exploits patterns in a protein contact map, taking its 3D structure as input, and excludes some disconnected patches from the elastic network. Combined with nonlinear normal mode analysis, this strategy boosts the protein conformational space exploration, especially when the input structure is highly constrained, as we demonstrate on a set of more than 400 transitions. Our results let us envision the discovery of new functional conformations, which were unreachable previously, starting from the experimentally known protein structures. The method is computationally efficient and available at https://github.com/elolaine/HOPMA and https://team.inria.fr/nano-d/software/nolb-normal-modes.
Affiliation(s)
- Elodie Laine
- CNRS, IBPS, Laboratoire de Biologie Computationnelle et Quantitative (LCQB), Sorbonne Université, 75005 Paris, France
- Sergei Grudinin
- CNRS, Inria, Grenoble INP, LJK, Univ. Grenoble Alpes, 38000 Grenoble, France
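HOPMA builds on an elastic network model. For orientation, here is a minimal, generic anisotropic network model (ANM) sketch in Python with NumPy, assuming Cα coordinates as input, a single spring constant `gamma` and a distance `cutoff`. The optional `excluded` set only mimics, in spirit, the idea of removing some springs from the network; the contact-map rule HOPMA actually uses to choose them, and its nonlinear normal-mode step, are not reproduced. `anm_modes` is a hypothetical helper, not part of the HOPMA code.

```python
import numpy as np


def anm_modes(coords, cutoff=10.0, gamma=1.0, excluded=frozenset()):
    """Build an ANM Hessian and return its eigenvalues and eigenvectors (ascending).

    coords: (N, 3) array-like of C-alpha positions.
    excluded: set of (i, j) pairs with i < j whose springs are left out.
    """
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    hess = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            if (i, j) in excluded:
                continue  # spring removed from the network
            d = coords[j] - coords[i]
            r2 = float(d @ d)
            if r2 > cutoff ** 2:
                continue  # beyond the interaction cutoff
            # Off-diagonal 3x3 super-element: -gamma * d d^T / |d|^2
            block = -gamma * np.outer(d, d) / r2
            hess[3 * i:3 * i + 3, 3 * j:3 * j + 3] = block
            hess[3 * j:3 * j + 3, 3 * i:3 * i + 3] = block
            # Diagonal super-elements balance each row of off-diagonal blocks
            hess[3 * i:3 * i + 3, 3 * i:3 * i + 3] -= block
            hess[3 * j:3 * j + 3, 3 * j:3 * j + 3] -= block
    eigenvalues, eigenvectors = np.linalg.eigh(hess)
    return eigenvalues, eigenvectors
```

For a rigid network the six lowest eigenvalues are zero (three translations, three rotations). Removing springs can only lower the eigenvalues, adding soft or zero modes, which is one way to picture why loosening an over-constrained network widens the conformational space that normal modes can reach.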
39
Ghosh C, Jana B. Role of Calcium in Modulating the Conformational Landscape and Peptide Binding Induced Closing of Calmodulin. J Phys Chem B 2021; 125:2317-2327. [DOI: 10.1021/acs.jpcb.1c00783] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 02/03/2023]
Affiliation(s)
- Catherine Ghosh
- School of Chemical Sciences, Indian Association for the Cultivation of Science, Jadavpur, Kolkata 700032, India
- Biman Jana
- School of Chemical Sciences, Indian Association for the Cultivation of Science, Jadavpur, Kolkata 700032, India
40
Crippa M, Andreghetti D, Capelli R, Tiana G. Evolution of frustrated and stabilising contacts in reconstructed ancient proteins. Eur Biophys J 2021; 50:699-712. [PMID: 33569610 PMCID: PMC8260555 DOI: 10.1007/s00249-021-01500-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/17/2020] [Revised: 12/14/2020] [Accepted: 01/13/2021] [Indexed: 11/30/2022]
Abstract
Energetic properties of a protein are a major determinant of its evolutionary fitness. Using a reconstruction algorithm, dating the reconstructed proteins and calculating the interaction network between their amino acids through a coevolutionary approach, we studied how the interactions that stabilise 890 proteins, belonging to five families, evolved for billions of years. In particular, we focused our attention on the network of most strongly attractive contacts and on that of poorly optimised, frustrated contacts. Our results support the idea that the cluster of most attractive interactions extends its size along evolutionary time, but from the data, we cannot conclude that protein stability or that the degree of frustration tends always to decrease.
Affiliation(s)
- Martina Crippa
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, 20133, Milan, Italy
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Damiano Andreghetti
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, 20133, Milan, Italy
- Riccardo Capelli
- Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi 24, 10129, Turin, Italy
- Guido Tiana
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, 20133, Milan, Italy.
41
Thadani NN, Zhou Q, Reyes Gamas K, Butler S, Bueno C, Schafer NP, Morcos F, Wolynes PG, Suh J. Frustration and Direct-Coupling Analyses to Predict Formation and Function of Adeno-Associated Virus. Biophys J 2020; 120:489-503. [PMID: 33359833 DOI: 10.1016/j.bpj.2020.12.018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 07/06/2020] [Revised: 11/08/2020] [Accepted: 12/08/2020] [Indexed: 01/03/2023]
Abstract
Adeno-associated virus (AAV) is a promising gene therapy vector because of its efficient gene delivery and relatively mild immunogenicity. To improve delivery target specificity, researchers use combinatorial and rational library design strategies to generate novel AAV capsid variants. These approaches frequently propose high proportions of nonforming or noninfective capsid protein sequences that reduce the effective depth of synthesized vector DNA libraries, thereby raising the discovery cost of novel vectors. We evaluated two computational techniques for their ability to estimate the impact of residue mutations on AAV capsid protein-protein interactions and thus predict changes in vector fitness, reasoning that these approaches might inform the design of functionally enriched AAV libraries and accelerate therapeutic candidate identification. The Frustratometer computes an energy function derived from the energy landscape theory of protein folding. Direct-coupling analysis (DCA) is a statistical framework that captures residue coevolution within proteins. We applied the Frustratometer to select candidate protein residues predicted to favor assembled or disassembled capsid states, then predicted mutation effects at these sites using the Frustratometer and DCA. Capsid mutants were experimentally assessed for changes in virus formation, stability, and transduction ability. The Frustratometer-based metric showed a counterintuitive correlation with viral stability, whereas a DCA-derived metric was highly correlated with virus transduction ability in the small population of residues studied. Our results suggest that coevolutionary models may be able to elucidate complex capsid residue-residue interaction networks essential for viral function, but further study is needed to understand the relationship between protein energy simulations and viral capsid metastability.
Affiliation(s)
- Qin Zhou
- Department of Biological Sciences, University of Texas at Dallas, Richardson, Texas
- Susan Butler
- Department of Bioengineering, Rice University, Houston, Texas
- Carlos Bueno
- Center for Theoretical Biological Physics, Rice University, Houston, Texas; Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas
- Nicholas P Schafer
- Center for Theoretical Biological Physics, Rice University, Houston, Texas; Department of Chemistry, Rice University, Houston, Texas
- Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, Texas; Center for Systems Biology, University of Texas at Dallas, Richardson, Texas; Department of Bioengineering, University of Texas at Dallas, Richardson, Texas
- Peter G Wolynes
- Center for Theoretical Biological Physics, Rice University, Houston, Texas; Department of Chemistry, Rice University, Houston, Texas; Department of Biosciences, Rice University, Houston, Texas; Department of Physics, Rice University, Houston, Texas
- Junghae Suh
- Department of Bioengineering, Rice University, Houston, Texas; Department of Biosciences, Rice University, Houston, Texas; Department of Chemical and Biomolecular Engineering, Rice University, Houston, Texas; Systems, Synthetic, and Physical Biology Program, Rice University, Houston, Texas.
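The abstract does not define its DCA-derived metric. A common way to score a point mutation with a DCA model is the change in the Potts Hamiltonian, and the sketch below assumes that form, with fields `h` and couplings `J` already inferred from an alignment (the inference step is omitted). `potts_energy` and `mutation_delta` are hypothetical names, and the sign convention (lower energy means more favorable) is an assumption.

```python
def potts_energy(seq, h, J):
    """Potts Hamiltonian H(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    h: list of dicts; h[i][a] is the field for residue a at site i.
    J: dict keyed by (i, j) with i < j; J[(i, j)][(a, b)] is the pair coupling.
    Lower H means the sequence is more consistent with the model.
    """
    energy = 0.0
    length = len(seq)
    for i in range(length):
        energy -= h[i][seq[i]]
        for j in range(i + 1, length):
            energy -= J[(i, j)][(seq[i], seq[j])]
    return energy


def mutation_delta(seq, pos, new_aa, h, J):
    """Energy change for the single substitution seq[pos] -> new_aa.

    Positive values mean the model disfavors the mutant relative to seq.
    """
    mutant = seq[:pos] + new_aa + seq[pos + 1:]
    return potts_energy(mutant, h, J) - potts_energy(seq, h, J)
```

Recomputing the full energy keeps the sketch short; for long capsid sequences, only the terms involving `pos` change, so a production version would sum just those fields and couplings.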
42
Voronin A, Weiel M, Schug A. Including residual contact information into replica-exchange MD simulations significantly enriches native-like conformations. PLoS One 2020; 15:e0242072. [PMID: 33196676 PMCID: PMC7668583 DOI: 10.1371/journal.pone.0242072] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/08/2020] [Accepted: 10/27/2020] [Indexed: 11/19/2022]
Abstract
Proteins are complex biomolecules which perform critical tasks in living organisms. Knowledge of a protein's structure is essential for understanding its physiological function in detail. Despite the incredible progress in experimental techniques, protein structure determination is still expensive, time-consuming, and arduous. That is why computer simulations are often used to complement or interpret experimental data. Here, we explore how in silico protein structure determination based on replica-exchange molecular dynamics (REMD) can benefit from including contact information derived from theoretical and experimental sources, such as direct coupling analysis or NMR spectroscopy. To reflect the influence of erroneous and noisy data, we probe how false-positive contacts influence the simulated ensemble. Specifically, we integrate varying numbers of randomly selected native and non-native contacts and explore how such a bias can guide simulations towards the native state. We investigate the number of contacts needed for a significant enrichment of native-like conformations and show the capabilities and limitations of this method. Adhering to a threshold of approximately 75% true-positive contacts within a simulation, we obtain an ensemble with native-like conformations of high quality. We find that contact-guided REMD is capable of delivering physically reasonable models of a protein's structure.
Affiliation(s)
- Arthur Voronin
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
- Department of Physics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Marie Weiel
- Steinbuch Centre for Computing, Karlsruhe Institute of Technology, Eggenstein-Leopoldshafen, Germany
- Department of Physics, Karlsruhe Institute of Technology, Karlsruhe, Germany
- Alexander Schug
- Institute for Advanced Simulation, Jülich Supercomputing Center, Jülich, Germany
- Faculty of Biology, University of Duisburg-Essen, Duisburg, Germany
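The study mixes randomly selected native and non-native contacts at controlled ratios. A minimal sketch of assembling such a biased contact set and checking its true-positive fraction, assuming contacts are given as residue-index pairs; `mixed_contact_set` and `true_positive_rate` are hypothetical helpers, and the restraint potential that the REMD simulations apply to these contacts is not shown.

```python
import random


def mixed_contact_set(native, non_native, n_contacts, tp_fraction, seed=0):
    """Draw n_contacts pairs, of which round(n_contacts * tp_fraction) are native.

    native, non_native: collections of (i, j) residue-index pairs.
    Sampling is without replacement, so every returned pair is distinct
    as long as the two input collections do not overlap.
    """
    rng = random.Random(seed)
    n_true = round(n_contacts * tp_fraction)
    n_false = n_contacts - n_true
    native, non_native = list(native), list(non_native)
    if n_true > len(native) or n_false > len(non_native):
        raise ValueError("not enough contacts to sample from")
    chosen = rng.sample(native, n_true) + rng.sample(non_native, n_false)
    rng.shuffle(chosen)  # hide the native/non-native ordering from downstream code
    return chosen


def true_positive_rate(contacts, native):
    """Fraction of the given contacts that belong to the native contact set."""
    native_set = set(native)
    return sum(pair in native_set for pair in contacts) / len(contacts)
```

Drawing 40 contacts with `tp_fraction=0.75` yields 30 native and 10 non-native pairs, which corresponds to the roughly 75% true-positive level the abstract reports as sufficient for high-quality native-like ensembles.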
43
Cooper CJ, Zheng K, Rush KW, Johs A, Sanders BC, Pavlopoulos GA, Kyrpides NC, Podar M, Ovchinnikov S, Ragsdale SW, Parks JM. Structure determination of the HgcAB complex using metagenome sequence data: insights into microbial mercury methylation. Commun Biol 2020; 3:320. [PMID: 32561885 PMCID: PMC7305189 DOI: 10.1038/s42003-020-1047-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 01/27/2020] [Accepted: 05/27/2020] [Indexed: 11/09/2022]
Abstract
Bacteria and archaea possessing the hgcAB gene pair methylate inorganic mercury (Hg) to form highly toxic methylmercury. HgcA consists of a corrinoid binding domain and a transmembrane domain, and HgcB is a dicluster ferredoxin. However, their detailed structure and function have not been thoroughly characterized. We modeled the HgcAB complex by combining metagenome sequence data mining, coevolution analysis, and Rosetta structure calculations. In addition, we overexpressed HgcA and HgcB in Escherichia coli, confirmed spectroscopically that they bind cobalamin and [4Fe-4S] clusters, respectively, and incorporated these cofactors into the structural model. Surprisingly, the two domains of HgcA do not interact with each other, but HgcB forms extensive contacts with both domains. The model suggests that conserved cysteines in HgcB are involved in shuttling Hg(II), methylmercury, or both. These findings refine our understanding of the mechanism of Hg methylation and expand the known repertoire of corrinoid methyltransferases in nature. Connor J. Cooper et al. expressed HgcA and HgcB in Escherichia coli and modeled the structure of the HgcAB complex by combining metagenome sequence data, coevolution analysis, and ab initio structure calculations. This study provides insights into the biochemical mechanism of mercury (Hg) methylation.
Affiliation(s)
- Connor J Cooper
- Graduate School of Genome Science and Technology, University of Tennessee, F225 Walters Life Science, Knoxville, TN, 37996, USA; Biosciences Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37831-6038, USA
- Kaiyuan Zheng
- Department of Biological Chemistry, University of Michigan Medical School, 1150 West Medical Center Drive, Ann Arbor, MI, 48109-0606, USA
- Katherine W Rush
- Department of Biological Chemistry, University of Michigan Medical School, 1150 West Medical Center Drive, Ann Arbor, MI, 48109-0606, USA
- Alexander Johs
- Environmental Sciences Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37831-6038, USA
- Brian C Sanders
- Biosciences Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37831-6038, USA
- Georgios A Pavlopoulos
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA; Institute for Fundamental Biomedical Research, Biomedical Science Research Center "Alexander Fleming", 34 Fleming Street, 16672, Vari, Greece
- Nikos C Kyrpides
- DOE Joint Genome Institute, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA, 94720, USA; Environmental Genomics and Systems Biology Division, Lawrence Berkeley National Laboratory, Berkeley, California, USA
- Mircea Podar
- Graduate School of Genome Science and Technology, University of Tennessee, F225 Walters Life Science, Knoxville, TN, 37996, USA; Biosciences Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37831-6038, USA
- Sergey Ovchinnikov
- John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, MA, 02138, USA
- Stephen W Ragsdale
- Department of Biological Chemistry, University of Michigan Medical School, 1150 West Medical Center Drive, Ann Arbor, MI, 48109-0606, USA
- Jerry M Parks
- Graduate School of Genome Science and Technology, University of Tennessee, F225 Walters Life Science, Knoxville, TN, 37996, USA; Biosciences Division, Oak Ridge National Laboratory, 1 Bethel Valley Road, Oak Ridge, TN, 37831-6038, USA
44
Jin S, Chen M, Chen X, Bueno C, Lu W, Schafer NP, Lin X, Onuchic JN, Wolynes PG. Protein Structure Prediction in CASP13 Using AWSEM-Suite. J Chem Theory Comput 2020; 16:3977-3988. [PMID: 32396727 DOI: 10.1021/acs.jctc.0c00188] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/21/2022]
Abstract
Recently several techniques have emerged that significantly enhance the quality of predictions of protein tertiary structures. In this study, we describe the performance of AWSEM-Suite, an algorithm that incorporates template-based modeling and coevolutionary restraints with a realistic coarse-grained force field, AWSEM. With its roots in neural networks, AWSEM contains both physical and bioinformatical energies that have been optimized using energy landscape theory. AWSEM-Suite participated in CASP13 as a server predictor and generated reliable predictions for most targets. AWSEM-Suite ranked eighth in both the free-modeling category and the hard-to-model category and in one case provided the best submitted prediction. Here we critically discuss the prediction performance of AWSEM-Suite using several examples from different categories in CASP13. Structure prediction tests on these selected targets, two of them being hard-to-model targets, show that AWSEM-Suite can achieve high-resolution structure prediction after incorporating both template guidances and coevolutionary restraints even when homology is weak. For targets with reliable templates (template-easy category), introducing coevolutionary restraints sometimes damages the overall quality of the predictions. Free energy profile analyses demonstrate, however, that incorporating both of these evolutionarily informed terms effectively increases the funneling of the landscape toward native-like structures while still allowing sufficient flexibility to correct for discrepancies between the correct target structure and the provided guidance. In contrast to other predictors that are exclusively oriented toward structure prediction, the connection of AWSEM-Suite to a statistical mechanical basis and affiliated molecular dynamics and importance sampling simulations makes it suitable for functional explorations.
Affiliation(s)
- Xun Chen
- Department of Chemistry, Rice University, Houston, Texas 77005, United States
- Wei Lu
- Department of Physics, Rice University, Houston, Texas 77005, United States
- Xingcheng Lin
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, United States
- José N Onuchic
- Department of Chemistry, Rice University, Houston, Texas 77005, United States; Department of Physics, Rice University, Houston, Texas 77005, United States
- Peter G Wolynes
- Department of Chemistry, Rice University, Houston, Texas 77005, United States; Department of Physics, Rice University, Houston, Texas 77005, United States
45
Terzoli S, Tiana G. Molecular Recognition between Cadherins Studied by a Coarse-Grained Model Interacting with a Coevolutionary Potential. J Phys Chem B 2020; 124:4079-4088. [PMID: 32336092 PMCID: PMC8007105 DOI: 10.1021/acs.jpcb.0c01671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Studying the conformations involved in the dimerization of cadherins is highly relevant to understanding the development of tissues and its failure, which is associated with tumors and metastases. Experimental techniques, like X-ray crystallography, can usually report only the most stable conformations, missing minority states that could nonetheless be important for the recognition mechanism. Computer simulations could be a valid complement to the experimental approach. However, standard all-atom protein models in explicit solvent are computationally too demanding to search thoroughly the conformational space of multiple chains composed of several hundred amino acids. To reach this goal, we resorted to a coarse-grained model in implicit solvent. The standard problem with this kind of model is to find a realistic potential to describe its interactions. We used coevolutionary information from cadherin alignments, corrected by a statistical potential, to build an interaction potential, which is agnostic about the experimental conformations of the protein. Using this model, we explored the conformational space of multichain systems and validated the results by comparison with experimental data. We identified dimeric conformations that are sequence specific and that can be useful to rationalize the mechanism of recognition between cadherins.
Affiliation(s)
- Sara Terzoli
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, Milano 20133, Italy
- Guido Tiana
- Department of Physics and Center for Complexity and Biosystems, Università degli Studi di Milano and INFN, via Celoria 16, Milano 20133, Italy
46
Feng J, Shukla D. FingerprintContacts: Predicting Alternative Conformations of Proteins from Coevolution. J Phys Chem B 2020; 124:3605-3615. [PMID: 32283936 DOI: 10.1021/acs.jpcb.9b11869] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 11/30/2022]
Abstract
Proteins are dynamic molecules which perform diverse molecular functions by adopting different three-dimensional structures. Recent progress in residue-residue contacts prediction opens up new avenues for the de novo protein structure prediction from sequence information. However, it is still difficult to predict more than one conformation from residue-residue contacts alone. This is due to the inability to deconvolve the complex signals of residue-residue contacts, i.e., spatial contacts relevant for protein folding, conformational diversity, and ligand binding. Here, we introduce a machine learning based method, called FingerprintContacts, for extending the capabilities of residue-residue contacts. This algorithm leverages the features of residue-residue contacts, that is, (1) a single conformation outperforms the others in the structural prediction using all the top ranking residue-residue contacts as structural constraints and (2) conformation specific contacts rank lower and constitute a small fraction of residue-residue contacts. We demonstrate the capabilities of FingerprintContacts on eight ligand binding proteins with varying conformational motions. Furthermore, FingerprintContacts identifies small clusters of residue-residue contacts which are preferentially located in the dynamically fluctuating regions. With the rapid growth in protein sequence information, we expect FingerprintContacts to be a powerful first step in structural understanding of protein functional mechanisms.
47
Gandarilla-Pérez CA, Mergny P, Weigt M, Bitbol AF. Statistical physics of interacting proteins: Impact of dataset size and quality assessed in synthetic sequences. Phys Rev E 2020; 101:032413. [PMID: 32290011 DOI: 10.1103/physreve.101.032413] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 12/23/2019] [Accepted: 03/04/2020] [Indexed: 11/07/2022]
Abstract
Identifying protein-protein interactions is crucial for a systems-level understanding of the cell. Recently, algorithms based on inverse statistical physics, e.g., direct coupling analysis (DCA), have made it possible to use evolutionarily related sequences to address two conceptually related inference tasks: finding pairs of interacting proteins and identifying pairs of residues that form contacts between interacting proteins. Here we address two underlying questions: How are the performances of both inference tasks related? How does performance depend on dataset size and quality? To this end, we formalize both tasks using Ising models defined over stochastic block models, with individual blocks representing single proteins and interblock couplings representing protein-protein interactions; controlled synthetic sequence data are generated by Monte Carlo simulations. We show that DCA is able to address both inference tasks accurately when sufficiently large training sets of known interaction partners are available and that an iterative pairing algorithm allows predictions to be made even without a training set. Noise in the training data deteriorates performance. In both tasks we find a quadratic scaling relating dataset quality and size that is consistent with noise adding in square-root fashion and signal adding linearly when increasing the dataset. This implies that it is generally good to incorporate more data even if their quality is imperfect, thereby shedding light on the empirically observed performance of DCA applied to natural protein sequences.
Affiliation(s)
- Carlos A Gandarilla-Pérez
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), F-75005 Paris, France; Facultad de Física, Universidad de la Habana, San Lázaro y L, Vedado, Habana 4, CP-10400, Cuba
- Pierre Mergny
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), F-75005 Paris, France; Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire Jean Perrin (LJP, UMR 8237), F-75005 Paris, France
- Martin Weigt
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire de Biologie Computationnelle et Quantitative (LCQB, UMR 7238), F-75005 Paris, France
- Anne-Florence Bitbol
- Sorbonne Université, CNRS, Institut de Biologie Paris-Seine, Laboratoire Jean Perrin (LJP, UMR 8237), F-75005 Paris, France; Institute of Bioengineering, School of Life Sciences, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland
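A back-of-the-envelope reading of the scaling reported in the abstract, under assumptions and notation that are ours rather than the authors': let M be the number of training sequences and q the fraction of them that are correctly paired, suppose the coherent signal grows with the number of correct pairs while noise grows as the square root of the dataset size.

```latex
% Notation and assumptions are illustrative, not taken from the paper:
%   M = number of sequences in the training set
%   q = fraction of those sequences that are correctly paired (data quality)
S \propto q\,M, \qquad \mathcal{N} \propto \sqrt{M}
\quad\Longrightarrow\quad
\frac{S}{\mathcal{N}} \propto q\,\sqrt{M}

% Holding the signal-to-noise ratio (i.e. inference quality) fixed gives
q^{2}\,M = \text{const.}
\quad\Longleftrightarrow\quad
M \propto q^{-2}
```

Under these assumptions, halving q requires roughly four times as many sequences to keep the same signal-to-noise ratio, which is one way to read the "quadratic scaling relating dataset quality and size".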
48
Nerattini F, Figliuzzi M, Cardelli C, Tubiana L, Bianco V, Dellago C, Coluzza I. Identification of Protein Functional Regions. Chemphyschem 2020; 21:335-347. [PMID: 31944517 DOI: 10.1002/cphc.201900898] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 09/15/2019] [Revised: 11/01/2019] [Indexed: 11/12/2022]
Abstract
Protein sequence stores information related to both function and stability, which makes it difficult to disentangle the two contributions. However, identifying the residues critical for function and stability has important implications for mapping proteome interactions, as well as for many pharmaceutical applications, e.g. the identification of ligand-binding regions for targeted pharmaceutical protein design. In this work, we propose a computational method to identify residues critical for protein function and stability and to further categorise them as strictly functional, structural, or intermediate. We evaluate single-site conservation and use Direct Coupling Analysis (DCA) to identify co-evolved residues in both natural and artificial evolution processes. We reproduce artificial evolution using protein design, and base our approach on the hypothesis that artificial evolution in the absence of any functional constraint would lead exclusively to site conservation and co-evolution events of the structural type. Conversely, natural evolution intrinsically embeds both functional and structural information. By comparing the lists of conserved and co-evolved residues obtained from the analyses of natural and artificial evolution, we identify the functional residues without needing any a priori knowledge of the biological role of the analysed protein.
Affiliation(s)
- Francesca Nerattini
- Faculty of Physics, University of Vienna, Boltzmanngasse 5, 1090, Vienna, Austria
- Matteo Figliuzzi
- Sorbonne Universités, UPMC, Institut de Biologie Paris-Seine, CNRS, Laboratoire de Biologie Computationnelle et Quantitative UMR 7238, Paris, France
- Chiara Cardelli
- Faculty of Physics, University of Vienna, Boltzmanngasse 5, 1090, Vienna, Austria
- Luca Tubiana
- Physics Department, Università degli Studi di Trento, via Sommarive 14, 38123, Trento, Italy
- Valentino Bianco
- Faculty of Physics, University of Vienna, Boltzmanngasse 5, 1090, Vienna, Austria; Faculty of Chemistry, Chemical Physics Department, Universidad Complutense de Madrid, Plaza de las Ciencias, Ciudad Universitaria, Madrid, 28040, Spain
- Christoph Dellago
- Faculty of Physics, University of Vienna, Boltzmanngasse 5, 1090, Vienna, Austria
- Ivan Coluzza
- CIC biomaGUNE, Paseo Miramon 182, 20014 San Sebastian, Spain, and IKERBASQUE, Basque Foundation for Science, 48013, Bilbao, Spain
49
Malinverni D, Barducci A. Coevolutionary Analysis of Protein Subfamilies by Sequence Reweighting. Entropy (Basel) 2019; 21:1127. [PMID: 32002010 PMCID: PMC6992422 DOI: 10.3390/e21111127] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 10/17/2019] [Accepted: 11/14/2019] [Indexed: 01/07/2023]
Abstract
Extracting structural information from sequence co-variation has become common practice in computational biology in recent years, mainly due to the availability of large sequence alignments of protein families. However, using sequence-based approaches to identify features that are specific to subclasses, rather than shared by all members of the family, has remained an elusive problem. Here we present a coevolution-based method to differentially analyze subfamily-specific structural features through a continuous sequence reweighting (SR) approach. We introduce the underlying principles and test its predictive capabilities on the Response Regulator family, whose subfamilies have previously been shown to display distinct, specific homo-dimerization patterns. Our results show that this reweighting scheme is effective at assigning structural features known a priori to subfamilies, even when sequence data are relatively scarce. Furthermore, sequence reweighting makes it possible to assess whether individual structural contacts pertain to specific subfamilies, and it thus paves the way for the identification of specificity-determining contacts from sequence variation data.
Affiliation(s)
- Duccio Malinverni
- Medical Research Council (MRC) Laboratory of Molecular Biology, Cambridge CB2 0QH, UK
- Alessandro Barducci
- Centre de Biochimie Structurale (CBS), INSERM, CNRS, Université de Montpellier, 34090 Montpellier, France
50
Gershenson A, Gosavi S, Faccioli P, Wintrode PL. Successes and challenges in simulating the folding of large proteins. J Biol Chem 2020; 295:15-33. [PMID: 31712314 PMCID: PMC6952611 DOI: 10.1074/jbc.rev119.006794] [Citation(s) in RCA: 41] [Impact Index Per Article: 8.2] [Indexed: 12/11/2022]
Abstract
Computational simulations of protein folding can be used to interpret experimental folding results, to design new folding experiments, and to test the effects of mutations and small molecules on folding. However, whereas major experimental and computational progress has been made in understanding how small proteins fold, research on larger, multidomain proteins, which make up the majority of proteins, is less advanced. Specifically, large proteins often fold via long-lived, partially folded intermediates whose structures, potentially toxic oligomerization, and interactions with cellular chaperones remain poorly understood. Molecular dynamics-based folding simulations that rely on knowledge of the native structure can provide critical, detailed information on folding free energy landscapes, intermediates, and pathways. Further, increases in computational power and methodological advances have made folding simulations of large proteins practical and valuable. Here, using protease-inhibiting serpins as an example, we review native-centric methods for simulating the folding of large proteins. These synergistic approaches range from Gō and related structure-based models, which can predict the effects of the native structure on folding, to all-atom methods that include side-chain chemistry and can predict how disease-associated mutations may affect folding. The application of these computational approaches to serpins and other large proteins highlights the successes and limitations of current computational methods and underscores how computational results can be used to inform experiments. These powerful simulation approaches, in combination with experiments, can provide unique insights into how large proteins fold and misfold, expanding our ability to predict and manipulate protein folding.
Affiliation(s)
- Anne Gershenson
- Department of Biochemistry and Molecular Biology, University of Massachusetts, Amherst, Massachusetts 01003; Molecular and Cellular Biology Graduate Program, University of Massachusetts, Amherst, Massachusetts 01003.
- Shachi Gosavi
- Simons Centre for the Study of Living Machines, National Centre for Biological Sciences, Tata Institute of Fundamental Research, Bangalore-560065, India.
- Pietro Faccioli
- Dipartimento di Fisica, Università degli Studi di Trento, 38122 Povo (Trento), Italy; Trento Institute for Fundamental Physics and Applications, 38123 Povo (Trento), Italy.
- Patrick L Wintrode
- Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, Maryland 21201.