1
|
Rocha REO, Mariano DCB, Almeida TS, CorrêaCosta LS, Fischer PHC, Santos LH, Caffarena ER, da Silveira CH, Lamp LM, Fernandez-Quintero ML, Liedl KR, de Melo-Minardi RC, de Lima LHF. Thermostabilizing mechanisms of canonical single amino acid substitutions at a GH1 β-glucosidase probed by multiple MD and computational approaches. Proteins 2023; 91:218-236. [PMID: 36114781 DOI: 10.1002/prot.26424] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/26/2022] [Revised: 09/01/2022] [Accepted: 09/06/2022] [Indexed: 01/07/2023]
Abstract
β-Glucosidases play a pivotal role in second-generation biofuel (2G-biofuel) production. For this application, thermostable enzymes are essential because of the denaturing conditions in the bioreactors. Random amino acid substitutions have yielded new thermostable β-glucosidases, but without a clear understanding of their molecular mechanisms. Here, using different molecular dynamics simulation approaches with distinct force fields, and submitting the results to various computational analyses, we probe the molecular bases of the thermostabilization of the Paenibacillus polymyxa GH1 β-glucosidase by two point mutations, E96K (TR1) and M416I (TR2). Equilibrium molecular dynamics simulations (eMD) at different temperatures, principal component analysis (PCA), virtual docking, metadynamics (MetaDy), accelerated molecular dynamics (aMD), Poisson-Boltzmann surface analysis, grid inhomogeneous solvation theory, and colony method estimation of conformational entropy converge on the idea that the stabilization conferred by both substitutions depends on different contributions of three classic mechanisms: (i) electrostatic surface stabilization; (ii) efficient isolation of the hydrophobic core from the solvent, with energetic advantages at the solvation cap; (iii) higher distribution of the protein dynamics at the mobile active site loops than at the protein core, with functional and entropic advantages. Mechanisms i and ii predominate for TR1, while in TR2, mechanism iii is dominant. Loop A integrity and the dynamics of loops A, C, D, and E play critical roles in such mechanisms. Comparison of the dynamic and topological changes observed between the thermostable mutants and the wild-type protein with amino acid co-evolutive networks and thermostabilizing hotspots from the literature suggests that the mechanisms recovered here can be related to the thermostability obtained by different substitutions along the whole GH1 family.
We hope the results and insights discussed here can be helpful for future rational approaches to the engineering of optimized β-glucosidases for 2G-biofuel production for industry, biotechnology, and science.
Affiliation(s)
- Rafael Eduardo Oliveira Rocha
- Laboratory of Molecular Modelling and Bioinformatics (LAMMB), Department of Physical and Biological Sciences, Campus Sete Lagoas, Universidade Federal de São João Del Rei, Sete Lagoas, Brazil; Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil; Laboratory of Molecular Modeling and Drug Design, Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Diego César Batista Mariano
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Tiago Silva Almeida
- Laboratory of Molecular Modelling and Bioinformatics (LAMMB), Department of Physical and Biological Sciences, Campus Sete Lagoas, Universidade Federal de São João Del Rei, Sete Lagoas, Brazil
- Leon Sulfierry CorrêaCosta
- Laboratory of Molecular Modelling and Bioinformatics (LAMMB), Department of Physical and Biological Sciences, Campus Sete Lagoas, Universidade Federal de São João Del Rei, Sete Lagoas, Brazil; Computational Modeling Coordination (COMOD), Laboratório Nacional de Computação Científica (LNCC), Petrópolis, Brazil
- Pedro Henrique Camargo Fischer
- Laboratory of Molecular Modelling and Bioinformatics (LAMMB), Department of Physical and Biological Sciences, Campus Sete Lagoas, Universidade Federal de São João Del Rei, Sete Lagoas, Brazil
- Lucianna Helene Santos
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil; Laboratory of Molecular Modeling and Drug Design, Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
- Leonida M Lamp
- Institute of General, Inorganic and Theoretical Chemistry, and Center for Chemistry and Biomedicine Innsbruck (CCB), University of Innsbruck, Innsbruck, Austria
- Monica Lisa Fernandez-Quintero
- Institute of General, Inorganic and Theoretical Chemistry, and Center for Chemistry and Biomedicine Innsbruck (CCB), University of Innsbruck, Innsbruck, Austria
- Klaus Roman Liedl
- Institute of General, Inorganic and Theoretical Chemistry, and Center for Chemistry and Biomedicine Innsbruck (CCB), University of Innsbruck, Innsbruck, Austria
- Raquel Cardoso de Melo-Minardi
- Laboratory of Bioinformatics and Systems (LBS), Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
- Leonardo Henrique França de Lima
- Laboratory of Molecular Modelling and Bioinformatics (LAMMB), Department of Physical and Biological Sciences, Campus Sete Lagoas, Universidade Federal de São João Del Rei, Sete Lagoas, Brazil; Institute of General, Inorganic and Theoretical Chemistry, and Center for Chemistry and Biomedicine Innsbruck (CCB), University of Innsbruck, Innsbruck, Austria
|
2
|
Sun J, Kulandaisamy A, Liu J, Hu K, Gromiha MM, Zhang Y. Machine learning in computational modelling of membrane protein sequences and structures: From methodologies to applications. Comput Struct Biotechnol J 2023; 21:1205-1226. [PMID: 36817959 PMCID: PMC9932300 DOI: 10.1016/j.csbj.2023.01.036] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 11/16/2022] [Revised: 01/16/2023] [Accepted: 01/25/2023] [Indexed: 01/29/2023]
Abstract
Membrane proteins mediate a wide spectrum of biological processes, such as signal transduction and cell communication. Due to the arduous and costly nature inherent to the experimental process, membrane proteins have long been devoid of well-resolved atomic-level tertiary structures and, consequently, the understanding of their functional roles underlying a multitude of life activities has been hampered. Currently, computational tools dedicated to furthering the structure-function understanding are primarily focused on utilizing intelligent algorithms to address a variety of site-wise prediction problems (e.g., topology and interaction sites), but are scattered across different computing sources. Moreover, the recent advent of deep learning techniques has immensely expedited the development of computational tools for membrane protein-related prediction problems. Given the growing number of applications optimized particularly by manifold deep neural networks, we herein provide a review on the current status of computational strategies mainly in membrane protein type classification, topology identification, interaction site detection, and pathogenic effect prediction. Meanwhile, we provide an overview of how the entire prediction process proceeds, including database collection, data pre-processing, feature extraction, and method selection. This review is expected to be useful for developing more extendable computational tools specific to membrane proteins.
Affiliation(s)
- Jianfeng Sun
- Botnar Research Centre, Nuffield Department of Orthopedics, Rheumatology, and Musculoskeletal Sciences, University of Oxford, Headington, Oxford OX3 7LD, UK
- Arulsamy Kulandaisamy
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India
- Jacklyn Liu
- UCL Cancer Institute, University College London, 72 Huntley Street, London WC1E 6BT, UK
- Kai Hu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan 411105, China
- M. Michael Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of BioSciences, Indian Institute of Technology Madras, Chennai 600 036, Tamilnadu, India. Corresponding author.
- Yuan Zhang
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan 411105, China. Corresponding author.
|
3
|
Tresnak DT, Hackel BJ. Deep Antimicrobial Activity and Stability Analysis Inform Lysin Sequence-Function Mapping. ACS Synth Biol 2023; 12:249-264. [PMID: 36599162 PMCID: PMC10822705 DOI: 10.1021/acssynbio.2c00509] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Indexed: 01/05/2023]
Abstract
Antibiotic-resistant infectious disease is a critical challenge to human health. Antimicrobial proteins offer a compelling solution if engineered for potency, selectivity, and physiological stability. Lysins, which lyse cells via degradation of cell wall peptidoglycans, have significant potential to fill this role. Yet, the functional complexity of antimicrobial activity has hindered high-throughput characterization for discovery and design. To dramatically expand knowledge of the sequence-function landscape of lysins, we developed a depletion-based assay for library-scale measurement of lysin inhibitory activity. We coupled this platform with a high-throughput proteolytic stability assay to assess the activity and stability of ∼5 × 10^4 lysin catalytic domain variants, resulting in the discovery of a variant with increased activity (70 ± 20%) and stability (7.2 ± 0.4 °C increased midpoint of thermal denaturation). Ridge regression of the resulting data set demonstrated that libraries with a higher average Hamming distance better informed pairwise models and that coupling activity and stability assays enabled better prediction of catalytically active lysins. The best models achieved Pearson's correlation coefficients of 0.87 ± 0.01 and 0.61 ± 0.04 for predicting catalytic domain stability and activity, respectively. Our work provides an efficient strategy for constructing protein sequence-function landscapes, drastically increases screening throughput for engineering lysins, and yields promising lysins for further development.
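The abstract's notion of a library's "average Hamming distance" can be illustrated with a minimal sketch. It assumes the quantity is the mean Hamming distance over all pairs of equal-length variants, which the abstract does not specify; the function names and the toy sequences are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_hamming(library: list[str]) -> float:
    """Average Hamming distance over all unordered pairs in the library.

    Assumption: "average Hamming distance" means the all-pairs mean;
    a distance to a parent sequence would be another reasonable reading.
    """
    pairs = list(combinations(library, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


# Toy library of made-up four-residue variants.
toy_library = ["ACDE", "ACDF", "GCDF"]
print(mean_pairwise_hamming(toy_library))
```

The all-pairs loop is quadratic in library size, so for a library of the scale described in the abstract one would likely subsample pairs or compute distances to a reference sequence instead.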
Affiliation(s)
- Daniel T Tresnak
- Department of Chemical Engineering and Materials Science, University of Minnesota Twin Cities, 421 Washington Avenue SE, Minneapolis, Minnesota 55455, United States
- Benjamin J Hackel
- Department of Chemical Engineering and Materials Science, University of Minnesota Twin Cities, 421 Washington Avenue SE, Minneapolis, Minnesota 55455, United States
|
4
|
Torielli L, Serapian SA, Mussolin L, Moroni E, Colombo G. Integrating Protein Interaction Surface Prediction with a Fragment-Based Drug Design: Automatic Design of New Leads with Fragments on Energy Surfaces. J Chem Inf Model 2023; 63:343-353. [PMID: 36574607 PMCID: PMC9832486 DOI: 10.1021/acs.jcim.2c01408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/28/2022]
Abstract
Protein-protein interactions (PPIs) have emerged in the past years as significant pharmacological targets in the development of new therapeutics due to their key roles in determining pathological pathways. Herein, we present fragments on energy surfaces, a simple and general design strategy that integrates the analysis of the dynamic and energetic signatures of proteins to unveil the substructures involved in PPIs, with docking, selection, and combination of drug-like fragments to generate new PPI inhibitor candidates. Specifically, structural representatives of the target protein are used as inputs for the blind physics-based prediction of potential protein interaction surfaces using the matrix of low coupling energy decomposition method. The predicted interaction surfaces are subdivided into overlapping windows that are used as templates to direct the docking and combination of fragments representative of moieties typically found in active drugs. This protocol is then applied and validated using structurally diverse, important PPI targets as test systems. We demonstrate that our approach facilitates the exploration of the molecular diversity space of potential ligands, with no requirement of prior information on the location and properties of interaction surfaces or on the structures of potential lead compounds. Importantly, the hit molecules that emerge from our ab initio design share high chemical similarity with experimentally tested active PPI inhibitors. We propose that the protocol we describe here represents a valuable means of generating initial leads against difficult targets for further development and refinement.
Affiliation(s)
- Luca Torielli
- Department of Chemistry, University of Pavia, Via Taramelli 12, Pavia 27100, Italy
- Stefano A. Serapian
- Department of Chemistry, University of Pavia, Via Taramelli 12, Pavia 27100, Italy
- Lara Mussolin
- Department of Woman’s and Child’s Health, Pediatric Hematology, Oncology and Stem Cell Transplant Center, University of Padua, Via Giustiniani, 3, Padua 35128, Italy; Istituto di Ricerca Pediatrica Città della Speranza, Corso Stati Uniti, 4 F, Padova 35127, Italy
- Giorgio Colombo
- Department of Chemistry, University of Pavia, Via Taramelli 12, Pavia 27100, Italy
|
5
|
Do HN, Haldane A, Levy RM, Miao Y. Unique features of different classes of G-protein-coupled receptors revealed from sequence coevolutionary and structural analysis. Proteins 2022; 90:601-614. [PMID: 34599827 PMCID: PMC8738117 DOI: 10.1002/prot.26256] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/28/2021] [Revised: 09/21/2021] [Accepted: 09/27/2021] [Indexed: 02/03/2023]
Abstract
G-protein-coupled receptors (GPCRs) are the largest family of human membrane proteins and represent the primary targets of about one third of currently marketed drugs. Despite the critical importance, experimental structures have been determined for only a limited portion of GPCRs and functional mechanisms of GPCRs remain poorly understood. Here, we have constructed novel sequence coevolutionary models of the A and B classes of GPCRs and compared them with residue contact frequency maps generated with available experimental structures. Significant portions of structural residue contacts were successfully detected in the sequence-based covariational models. "Exception" residue contacts predicted from sequence coevolutionary models but not available structures added missing links that were important for GPCR activation and allosteric modulation. Moreover, we identified distinct residue contacts involving different sets of functional motifs for GPCR activation, such as the Na+ pocket, CWxP, DRY, PIF, and NPxxY motifs in the class A and the HETx and PxxG motifs in the class B. Finally, we systematically uncovered critical residue contacts tuned by allosteric modulation in the two classes of GPCRs, including those from the activation motifs and particularly the extracellular and intracellular loops in class A GPCRs. These findings provide a promising framework for rational design of ligands to regulate GPCR activation and allosteric modulation.
Affiliation(s)
- Hung N Do
- The Center for Computational Biology and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas 66047
- Allan Haldane
- Department of Chemistry, Center for Biophysics and Computational Biology, Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania 19122. Corresponding author.
- Ronald M Levy
- Department of Chemistry, Center for Biophysics and Computational Biology, Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania 19122
- Yinglong Miao
- The Center for Computational Biology and Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas 66047. Corresponding author.
|
6
|
Schmidt M, Hamacher K. Identification of biophysical interaction patterns in direct coupling analysis. Phys Rev E 2021; 103:042418. [PMID: 34005861 DOI: 10.1103/physreve.103.042418] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2020] [Accepted: 03/27/2021] [Indexed: 11/07/2022]
Abstract
Direct-coupling analysis is a statistical learning method for protein contact prediction based on sequence information alone. The maximum entropy principle leads to an effective inverse Potts model. Predictions on contacts are based on fitted local fields and couplings from an empirical multiple sequence alignment. Typically, the ℓ2 norm of the resulting two-body couplings is used for contact prediction. However, this procedure discards important information. In this paper we show that the usage of the full fields and coupling information improves prediction accuracy.
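The conventional scoring step the abstract refers to, reducing each two-body coupling block to a single norm, can be sketched as follows. This is an illustrative sketch only: it assumes couplings are already fitted and stored as q × q blocks keyed by residue pair, it omits the gauge fixing and average-product correction that practical pipelines often apply, and all names and numbers are hypothetical rather than taken from the paper.

```python
import math


def frobenius_score(coupling_block):
    """l2 (Frobenius) norm of one q x q coupling block J_ij(a, b)."""
    return math.sqrt(sum(v * v for row in coupling_block for v in row))


def rank_pairs(couplings):
    """Rank residue pairs by the norm of their coupling block.

    couplings: dict mapping (i, j) -> q x q nested list of floats.
    Returns a list of ((i, j), score) sorted by descending score.
    """
    scores = {pair: frobenius_score(block) for pair, block in couplings.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy two-letter alphabet (q = 2) with made-up coupling values.
toy_couplings = {
    (0, 1): [[0.5, -0.5], [-0.5, 0.5]],
    (0, 2): [[0.1, 0.0], [0.0, -0.1]],
}
for pair, score in rank_pairs(toy_couplings):
    print(pair, round(score, 3))
```

Collapsing each block to one number drops the signs of the entries and the identity of the residue types involved, which is the kind of information the abstract says the norm-based procedure discards.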
Affiliation(s)
- Michael Schmidt
- Department of Physics, TU Darmstadt, Karolinenpl. 5, 64289 Darmstadt, Germany
- Kay Hamacher
- Department of Physics, TU Darmstadt, Karolinenpl. 5, 64289 Darmstadt, Germany; Department of Biology, TU Darmstadt, Schnittspahnstr. 10, 64287 Darmstadt, Germany; Department of Computer Science, TU Darmstadt, Karolinenpl. 5, 64289 Darmstadt, Germany
|
7
|
Liu CY, Cecylia Severin L, Lyu CJ, Zhu WL, Wang HP, Jiang CJ, Mei LH, Liu HG, Huang J. Improving thermostability of (R)-selective amine transaminase from Aspergillus terreus by evolutionary coupling saturation mutagenesis. Biochem Eng J 2021. [DOI: 10.1016/j.bej.2021.107926] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 12/31/2022]
|
8
|
Serapian SA, Triveri A, Marchetti F, Castelli M, Colombo G. Exploiting Folding and Degradation Machineries To Target Undruggable Proteins: What Can a Computational Approach Tell Us? ChemMedChem 2021; 16:1593-1599. [PMID: 33443306 DOI: 10.1002/cmdc.202000960] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/11/2020] [Indexed: 01/03/2023]
Abstract
Advances in genomics and proteomics have unveiled an ever-growing number of key proteins and provided mechanistic insights into the genesis of pathologies. This wealth of data showed that changes in expression levels of specific proteins, mutations, and post-translational modifications can result in (often subtle) perturbations of functional protein-protein interaction networks, which ultimately determine disease phenotypes. Although many such validated pathogenic proteins have emerged as ideal drug targets, there are also several that escape traditional pharmacological regulation; these proteins have thus been labeled "undruggable". The challenges posed by undruggable targets call for new sorts of molecular intervention. One fascinating solution is to perturb a pathogenic protein's expression levels, rather than blocking its activities. In this Concept paper, we shall discuss chemical interventions aimed at recruiting undruggable proteins to the ubiquitin proteasome system, or aimed at disrupting protein-protein interactions in the chaperone-mediated cellular folding machinery: both kinds of intervention lead to a decrease in the amount of active pathogenic protein expressed. Specifically, we shall discuss the role of computational strategies in understanding the molecular determinants characterizing the function of synthetic molecules typically designed for either type of intervention. Finally, we shall provide our perspectives and views on the current limitations and possibilities to expand the scope of rational approaches to the design of chemical regulators of protein levels.
Affiliation(s)
- Stefano A Serapian
- Department of Chemistry, University of Pavia, Via Taramelli 12, 27100, Pavia, Italy
- Alice Triveri
- Department of Chemistry, University of Pavia, Via Taramelli 12, 27100, Pavia, Italy
- Filippo Marchetti
- Department of Chemistry, University of Pavia, Via Taramelli 12, 27100, Pavia, Italy
- Matteo Castelli
- Department of Chemistry, University of Pavia, Via Taramelli 12, 27100, Pavia, Italy
- Giorgio Colombo
- Department of Chemistry, University of Pavia, Via Taramelli 12, 27100, Pavia, Italy
|
9
|
Mariano D, Pantuza N, Santos LH, Rocha REO, de Lima LHF, Bleicher L, de Melo-Minardi RC. Glutantβase: a database for improving the rational design of glucose-tolerant β-glucosidases. BMC Mol Cell Biol 2020; 21:50. [PMID: 32611314 PMCID: PMC7329481 DOI: 10.1186/s12860-020-00293-y] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/16/2019] [Accepted: 06/22/2020] [Indexed: 11/22/2022]
Abstract
β-Glucosidases are key enzymes used in second-generation biofuel production. They act in the last step of lignocellulose saccharification, converting cellobiose into glucose. However, most β-glucosidases are inhibited by high glucose concentrations, which makes this a limiting step for industrial production. Thus, β-glucosidases have been targeted by several studies aiming to understand the mechanisms of glucose tolerance and of pH and thermal resistance in order to construct more efficient enzymes. In this paper, we present a database of β-glucosidase structures, called Glutantβase. Our database includes 3842 GH1 β-glucosidase sequences collected from UniProt. We modeled the sequences by comparison and predicted important features in the 3D structure of each enzyme. Glutantβase provides information about catalytic and conserved amino acids, residues of the coevolution network, protein secondary structure, and residues located in the channel that leads to the active site. We also analyzed the impact of beneficial mutations reported in the literature, predicted in analogous positions, for similar enzymes. We suggested these mutations based on six previously described mutants that showed high catalytic activity, glucose tolerance, or thermostability (A404V, E96K, H184F, H228T, L441F, and V174C). Then, we used molecular docking to verify the impact of the suggested mutations on the affinity between protein and ligands (substrate and product). Our results suggest that only mutations based on the H228T mutant can reduce the affinity for glucose (product) and increase the affinity for cellobiose (substrate), which indicates an increase in resistance to product inhibition and agrees with computational and experimental results previously reported in the literature. More resistant β-glucosidases are essential to saccharification in industrial applications.
However, thermostable and glucose-tolerant β-glucosidases are rare, and their glucose tolerance mechanisms appear to be related to multiple and complex factors. Here we gathered a set of information and made predictions, aiming to provide a tool for supporting the rational design of more efficient β-glucosidases. We hope that Glutantβase can help improve second-generation biofuel production. Glutantβase is available at http://bioinfo.dcc.ufmg.br/glutantbase.
Affiliation(s)
- Diego Mariano
- Laboratory of Bioinformatics and Systems, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Naiara Pantuza
- Laboratory of Bioinformatics and Systems, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Lucianna H Santos
- Laboratory of Bioinformatics and Systems, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Rafael E O Rocha
- Laboratory of Bioinformatics and Systems, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Leonardo H F de Lima
- Laboratory of Molecular Modelling and Bioinformatics (LAMMB), Department of Physical and Biological Sciences, Universidade Federal de São João Del-Rei, Campus Sete Lagoas, Sete Lagoas, 35701-970, Brazil
- Lucas Bleicher
- Protein Computational Biology Laboratory, Department of Biochemistry and Immunology, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
- Raquel Cardoso de Melo-Minardi
- Laboratory of Bioinformatics and Systems, Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, 31270-901, Brazil
|
10
|
Interaction specificity of clustered protocadherins inferred from sequence covariation and structural analysis. Proc Natl Acad Sci U S A 2019; 116:17825-17830. [PMID: 31431536 DOI: 10.1073/pnas.1821063116] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Indexed: 01/08/2023]
Abstract
Clustered protocadherins, a large family of paralogous proteins that play important roles in neuronal development, provide an important case study of interaction specificity in a large eukaryotic protein family. A mammalian genome has more than 50 clustered protocadherin isoforms, which have remarkable homophilic specificity for interactions between cellular surfaces. A large antiparallel dimer interface formed by the first 4 extracellular cadherin (EC) domains controls this interaction. To understand how specificity is achieved between the numerous paralogs, we used a combination of structural and computational approaches. Molecular dynamics simulations revealed that individual EC interactions are weak and undergo binding and unbinding events, but together they form a stable complex through polyvalency. Strongly evolutionarily coupled residue pairs interacted more frequently in our simulations, suggesting that sequence coevolution can inform the frequency of interaction and biochemical nature of a residue interaction. With these simulations and sequence coevolution, we generated a statistical model of interaction energy for the clustered protocadherin family that measures the contributions of all amino acid pairs at the interface. Our interaction energy model assesses specificity for all possible pairs of isoforms, recapitulating known pairings and predicting the effects of experimental changes in isoform specificity that are consistent with literature results. Our results show that sequence coevolution can be used to understand specificity determinants in a protein family and prioritize interface amino acid substitutions to reprogram specific protein-protein interactions.
|
11
|
Astl L, Verkhivker GM. Data-driven computational analysis of allosteric proteins by exploring protein dynamics, residue coevolution and residue interaction networks. Biochim Biophys Acta Gen Subj 2019:S0304-4165(19)30179-5. [PMID: 31330173 DOI: 10.1016/j.bbagen.2019.07.008] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 03/17/2019] [Revised: 07/15/2019] [Accepted: 07/17/2019] [Indexed: 02/07/2023]
Abstract
BACKGROUND Computational studies of allosteric interactions have witnessed a recent renaissance fueled by the growing interest in modeling of complex molecular assemblies and biological networks. Allosteric interactions in protein structures allow for molecular communication in signal transduction networks. METHODS In this work, we performed a large-scale, comprehensive, and multi-faceted analysis of >300 diverse allosteric proteins and complexes with allosteric modulators. By modeling and exploring coarse-grained dynamics, residue coevolution, and residue interaction networks for allosteric proteins, we have determined unifying molecular signatures shared by allosteric systems. RESULTS The results of this study have suggested that allosteric inhibitors and allosteric activators may differentially affect global dynamics and network organization of protein systems, leading to diverse allosteric mechanisms. By using structural and functional data on protein kinases, we present a detailed case study that included atomic-level analysis of coevolutionary networks in kinases bound with allosteric inhibitors and activators. CONCLUSIONS We have found that coevolutionary networks can form direct communication pathways connecting functional regions and can recapitulate key regulatory sites and interactions responsible for allosteric signaling in the studied protein systems. The results of this computational investigation are compared with the experimental studies and reveal molecular signatures of known regulatory hotspots in protein kinases. GENERAL SIGNIFICANCE This study has shown that allosteric inhibitors and allosteric activators can have a different effect on residue interaction networks and can exploit distinct regulatory mechanisms, which could open up opportunities for probing allostery and new drug combinations with a broad range of activities.
Affiliation(s)
- Lindy Astl
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA 92618, United States of America
- Gennady M Verkhivker
- Department of Biomedical and Pharmaceutical Sciences, Chapman University School of Pharmacy, Irvine, CA 92618, United States of America; Department of Pharmacology, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, 9500 Gilman Drive, La Jolla, CA 92093, United States of America
|
12
|
Haldane A, Flynn WF, He P, Levy RM. Coevolutionary Landscape of Kinase Family Proteins: Sequence Probabilities and Functional Motifs. Biophys J 2019; 114:21-31. [PMID: 29320688 DOI: 10.1016/j.bpj.2017.10.028] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 03/20/2017] [Revised: 09/11/2017] [Accepted: 10/17/2017] [Indexed: 01/25/2023]
Abstract
The protein kinase catalytic domain is one of the most abundant domains across all branches of life. Although kinases share a common core function of phosphoryl transfer, they also have wide functional diversity and play varied roles in cell signaling networks, and for this reason are implicated in a number of human diseases. This functional diversity is primarily achieved through sequence variation, and uncovering the sequence-function relationships for the kinase family is a major challenge. In this study we use a statistical inference technique inspired by statistical physics, which builds a coevolutionary "Potts" Hamiltonian model of sequence variation in a protein family. We show how this model has sufficient power to predict the probability of specific subsequences in the highly diverged kinase family, which we verify by comparing the model's predictions with experimental observations in the UniProt database. We show that the pairwise (residue-residue) interaction terms of the statistical model are necessary and sufficient to capture higher-than-pairwise mutation patterns of natural kinase sequences. We observe that previously identified functional sets of residues have much stronger correlated interaction scores than are typical.
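How a Potts Hamiltonian assigns a statistical energy to a sequence can be shown with a minimal sketch. It assumes the common convention E(S) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j), with sequence probability proportional to exp(-E(S)); sign and gauge conventions vary between implementations, and the parameter values and names below are invented for illustration rather than fitted to any kinase alignment.

```python
def potts_energy(seq, fields, couplings):
    """Statistical energy of a sequence under a pairwise Potts model.

    fields: list with one dict per position, mapping residue -> h_i(a).
    couplings: dict mapping (i, j) with i < j to a dict
               mapping (residue_i, residue_j) -> J_ij(a, b).
    Missing entries are treated as zero (an assumption of this sketch).
    """
    energy = 0.0
    # Single-site field terms.
    for i, residue in enumerate(seq):
        energy -= fields[i].get(residue, 0.0)
    # Pairwise coupling terms.
    for (i, j), block in couplings.items():
        energy -= block.get((seq[i], seq[j]), 0.0)
    return energy


# Toy two-position model with made-up parameters.
toy_fields = [{"A": 0.2, "G": 0.0}, {"K": 0.1, "E": 0.0}]
toy_couplings = {(0, 1): {("A", "K"): 0.5, ("G", "E"): 0.5}}

for s in ("AK", "AE", "GE"):
    print(s, potts_energy(s, toy_fields, toy_couplings))
```

Under this convention a lower energy means a higher model probability, so in the toy model "AK" is favored over "AE" because it benefits from both its field terms and a coupling term.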
Affiliation(s)
- Allan Haldane
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania
- William F Flynn
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania; Department of Physics and Astronomy, Rutgers, The State University of New Jersey, Piscataway, New Jersey
- Peng He
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania
- Ronald M Levy
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania.
|
13
|
Marchetti F, Capelli R, Rizzato F, Laio A, Colombo G. The Subtle Trade-Off between Evolutionary and Energetic Constraints in Protein-Protein Interactions. J Phys Chem Lett 2019; 10:1489-1497. [PMID: 30855965 DOI: 10.1021/acs.jpclett.9b00191] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Indexed: 05/12/2023]
Abstract
Life machinery, although overwhelmingly complex, is rooted in a rather limited number of molecular processes. One of the most important is protein-protein interaction. Metabolic regulation, protein folding control, and cellular motility are examples of processes based on the fine-tuned interaction of several protein partners. The region on the protein surface devoted to the recognition of a specific partner is essential for the function of the protein and is, therefore, likely to be conserved during evolution. On the other hand, the physical chemistry of amino acids underlies the mechanism of interactions. Both evolutionary and energetic constraints can then be used to build scoring functions capable of recognizing interaction sites. Our working hypothesis is that residues within the interaction interface tend at the same time to be evolutionarily conserved (to preserve their function) and to provide little contribution to the internal stabilization of the structure of their cognate protein, to facilitate conformational adaptation to the partner. Here, we show that for some classes of protein partners (for example, those involved in signal transduction and in enzymes) evolutionary constraints play the key role in defining the interaction surface. In contrast, energetic constraints emerge as more important in protein partners involved in immune response, in inhibitor proteins, and in structural proteins. Our results indicate that a general-purpose scoring function for protein-protein interaction should not be agnostic of the biological function of the partners.
Affiliation(s)
- Filippo Marchetti
- Istituto di Chimica del Riconoscimento Molecolare, CNR, Via Mario Bianco 9, 20131 Milano, Italy
- Dipartimento di Chimica, Università degli Studi di Milano, Via Venezian 21, I-20133 Milano, Italy
- Riccardo Capelli
- INM-9/IAS-5 Computational Biomedicine, Forschungszentrum Jülich, Wilhelm-Johnen-Straße, D-52425 Jülich, Germany
- Francesca Rizzato
- SISSA, Scuola Internazionale Superiore Studi Avanzati, Via Bonomea 265, I-34136 Trieste, Italy
- Alessandro Laio
- SISSA, Scuola Internazionale Superiore Studi Avanzati, Via Bonomea 265, I-34136 Trieste, Italy
- ICTP, International Centre for Theoretical Physics, Strada Costiera 11, I-34100 Trieste, Italy
- Giorgio Colombo
- Istituto di Chimica del Riconoscimento Molecolare, CNR, Via Mario Bianco 9, 20131 Milano, Italy
- Dipartimento di Chimica, Università di Pavia, V.le Taramelli 12, 27100 Pavia, Italy
|
14
|
Haldane A, Levy RM. Influence of multiple-sequence-alignment depth on Potts statistical models of protein covariation. Phys Rev E 2019; 99:032405. [PMID: 30999494 PMCID: PMC6508952 DOI: 10.1103/physreve.99.032405] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Received: 11/19/2018] [Indexed: 02/02/2023]
Abstract
Potts statistical models have become a popular and promising way to analyze mutational covariation in protein multiple sequence alignments (MSAs) in order to understand protein structure, function, and fitness. But the statistical limitations of these models, which can have millions of parameters and are fit to MSAs of only thousands or hundreds of effective sequences using a procedure known as inverse Ising inference, are incompletely understood. In this work we predict how model quality degrades as a function of the number of sequences N, sequence length L, amino-acid alphabet size q, and the degree of conservation of the MSA, in different applications of the Potts models: in "fitness" predictions of individual protein sequences, in predictions of the effects of single-point mutations, in "double mutant cycle" predictions of epistasis, and in 3D contact prediction in protein structure. We show how as MSA depth N decreases an "overfitting" effect occurs such that sequences in the training MSA have overestimated fitness, and we predict the magnitude of this effect and discuss how regularization can help correct for it, using a regularization procedure motivated by statistical analysis of the effects of finite sampling. We find that as N decreases the quality of point-mutation effect predictions degrades least, fitness and epistasis predictions degrade more rapidly, and contact predictions are most affected. However, overfitting becomes negligible for MSA depths of more than a few thousand effective sequences, as often used in practice, and regularization becomes less necessary. We discuss the implications of these results for users of Potts covariation analysis.
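To make the "millions of parameters" remark concrete, the short sketch below counts the fields and pairwise couplings of a Potts model for a given sequence length L and alphabet size q. The example values (L = 200, q = 21) and the function name are illustrative assumptions, not figures from the paper, and the count ignores gauge redundancy, so the number of independent parameters is somewhat smaller.

```python
def potts_parameter_count(L, q):
    """Count Potts parameters before any gauge fixing.

    Fields: one value per position and letter (L * q).
    Couplings: one q-by-q matrix per unordered position pair (L*(L-1)/2 * q**2).
    """
    fields = L * q
    couplings = (L * (L - 1) // 2) * q * q
    return fields + couplings


if __name__ == "__main__":
    # Hypothetical alignment: 200 columns, 20 amino acids plus a gap symbol.
    n_params = potts_parameter_count(L=200, q=21)
    print(n_params)  # 8780100
```

With only hundreds or thousands of effective sequences constraining this many parameters, the overfitting effect described in the abstract is unsurprising.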
Affiliation(s)
- Allan Haldane
- Center for Biophysics and Computational Biology, Department of Physics, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania 19122
- Ronald M. Levy
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania 19122
|
15
|
dos Santos RN, Khan S, Morcos F. Characterization of C-ring component assembly in flagellar motors from amino acid coevolution. R Soc Open Sci 2018; 5:171854. [PMID: 29892378 PMCID: PMC5990795 DOI: 10.1098/rsos.171854] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 11/11/2017] [Accepted: 04/05/2018] [Indexed: 06/08/2023]
Abstract
Bacterial flagellar motility, an important virulence factor, is energized by a rotary motor localized within the flagellar basal body. The rotor module consists of a large framework (the C-ring), composed of the FliG, FliM and FliN proteins. FliN and FliM contact the FliG torque ring to control the direction of flagellar rotation. We report that structure-based models constrained only by residue coevolution can recover the binding interface of atomic X-ray dimer complexes with remarkable accuracy (approx. 1 Å RMSD). We propose a model for FliM-FliN heterodimerization, which agrees accurately with homologous interfaces as well as in situ cross-linking experiments, and hence supports a proposed architecture for the lower portion of the C-ring. Furthermore, this approach allowed the identification of two discrete and interchangeable homodimerization interfaces between FliM middle domains that agree with experimental measurements and might be associated with C-ring directional switching dynamics triggered upon binding of CheY signal protein. Our findings provide structural details of complex formation at the C-ring that have been difficult to obtain with previous methodologies and clarify the architectural principle that underpins the ultra-sensitive allostery exhibited by this ring assembly that controls the clockwise or counterclockwise rotation of flagella.
Affiliation(s)
- Ricardo Nascimento dos Santos
- Institute of Chemistry and Center for Computational Engineering and Science, University of Campinas, Campinas, SP, Brazil
- Shahid Khan
- Molecular Biology Consortium, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
- Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, USA
- Department of Bioengineering, University of Texas at Dallas, Richardson, TX, USA
- Center for Systems Biology, University of Texas at Dallas, Richardson, TX, USA
|
16
|
Nicoludis JM, Gaudet R. Applications of sequence coevolution in membrane protein biochemistry. Biochim Biophys Acta Biomembr 2018; 1860:895-908. [PMID: 28993150 PMCID: PMC5807202 DOI: 10.1016/j.bbamem.2017.10.004] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Received: 07/19/2017] [Revised: 09/28/2017] [Accepted: 10/02/2017] [Indexed: 12/22/2022]
Abstract
Recently, protein sequence coevolution analysis has matured into a predictive powerhouse for protein structure and function. Direct methods, which use global statistical models of sequence coevolution, have enabled the prediction of membrane and disordered protein structures, protein complex architectures, and the functional effects of mutations in proteins. The field of membrane protein biochemistry and structural biology has embraced these computational techniques, which provide functional and structural information in an otherwise experimentally-challenging field. Here we review recent applications of protein sequence coevolution analysis to membrane protein structure and function and highlight the promising directions and future obstacles in these fields. We provide insights and guidelines for membrane protein biochemists who wish to apply sequence coevolution analysis to a given experimental system.
Affiliation(s)
- John M Nicoludis
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, United States
- Rachelle Gaudet
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA 02138, United States.
|
17
|
Cocco S, Feinauer C, Figliuzzi M, Monasson R, Weigt M. Inverse statistical physics of protein sequences: a key issues review. Rep Prog Phys 2018; 81:032601. [PMID: 29120346 DOI: 10.1088/1361-6633/aa9965] [Citation(s) in RCA: 126] [Impact Index Per Article: 18.0] [Indexed: 06/07/2023]
Abstract
In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at an unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related, protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview of some biologically important questions, and of how statistical-mechanics-inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.
Affiliation(s)
- Simona Cocco
- Laboratoire de Physique Statistique de l'Ecole Normale Supérieure-UMR 8549, CNRS and PSL Research, Sorbonne Universités UPMC, Paris, France
|
18
|
Schmidt M, Hamacher K. Three-body interactions improve contact prediction within direct-coupling analysis. Phys Rev E 2017; 96:052405. [PMID: 29347718 DOI: 10.1103/physreve.96.052405] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 08/09/2017] [Indexed: 11/07/2022]
Abstract
The prediction of residue contacts in a protein solely from sequence information is a promising approach to computational structure prediction. Recent developments use statistical or information theoretic methods to extract contact information from a multiple sequence alignment. Despite good results, accuracy is limited due to usage of two-body interactions within a Potts model. In this paper we generalize this approach and propose a Hamiltonian with an additional three-body interaction term. We derive a mean-field approximation for inference of three-body couplings within a Potts model which is fast enough on modern computers. Finally, we show that our model has a higher accuracy in predicting residue contacts in comparison with the plain two-body-interaction model.
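For orientation, the generalization described in this abstract can be written schematically as a Potts Hamiltonian with an extra three-body term. The symbols below (fields h, pairwise couplings J, three-body couplings K, sequence σ of length L) and the overall sign convention are assumed notation for illustration and may differ from the paper's own.

```latex
H(\sigma) = -\sum_{i=1}^{L} h_i(\sigma_i)
            - \sum_{1 \le i < j \le L} J_{ij}(\sigma_i, \sigma_j)
            - \sum_{1 \le i < j < k \le L} K_{ijk}(\sigma_i, \sigma_j, \sigma_k)
```

Setting every K_{ijk} to zero recovers the plain two-body model that the abstract uses as its baseline.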
Affiliation(s)
- Michael Schmidt
- Department of Physics, TU Darmstadt, Karolinenpl. 5, 64289 Darmstadt, Germany
- Kay Hamacher
- Department of Biology and Department of Computer Science and Department of Physics, TU Darmstadt, Karolinenpl. 5, 64289 Darmstadt, Germany
|