1
Kumar A, Singh NK, Ghosh D, Radhakrishna M. Understanding the role of hydrophobic patches in protein disaggregation. Phys Chem Chem Phys 2021; 23:12620-12629. [PMID: 34075973 DOI: 10.1039/d1cp00954k] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 12/11/2022]
Abstract
Protein folding is a very complex process whose mechanism still intrigues the research community. Despite the large conformational space available (O(10^47) for a protein of 100 amino acid residues), most proteins fold into their native state within a very short time. While small proteins fold relatively fast (a few microseconds), large globular proteins may take as long as several milliseconds to fold. During the folding process, the protein synthesized in the ribosome is exposed to the crowded environment of the cell and is prone to misfolding and aggregation due to interactions with other proteins or biomacromolecules present within the cell. These large proteins, therefore, rely on chaperones for their folding and repair. Chaperones are known to have hydrophobic patchy domains that play a crucial role in shielding the protein against misfolding and in disaggregating aggregated proteins. In the current article, Monte Carlo simulations carried out in the framework of the hydrophobic-polar (H-P) lattice model indicate that hydrophobic patchy domains drastically reduce the inter-protein interactions and are efficient in disaggregating proteins. The effectiveness of the disaggregation depends on the size and distribution of these patches on the surface and also on the strength of the interaction between the protein and the surface. Further, our results indicate that when the patch is complementary to the exposed hydrophobic patch of the protein, protein disaggregation is accompanied by stabilization of the protein even relative to its bulk behavior due to favorable protein-surface interactions. We believe that these findings shed light on the role of the class of chaperones known as heat shock proteins (Hsps) in protein disaggregation and refolding.
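As a rough illustration of the hydrophobic-polar (H-P) lattice model named in this abstract, the sketch below scores one chain conformation by counting non-bonded H-H nearest-neighbor contacts. The 2D square lattice, the contact energy of -1, and the name `hp_energy` are assumptions chosen for illustration; the abstract does not state the lattice type or the energy parameters used by the authors.

```python
def hp_energy(sequence, coords, contact_energy=-1.0):
    """Energy of a self-avoiding 2D lattice conformation in a simple H-P model.

    sequence: string of 'H' (hydrophobic) and 'P' (polar) residues.
    coords:   list of (x, y) integer lattice sites, one per residue, in chain order.
    contact_energy: assumed energy per non-bonded H-H contact (illustrative value).
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")

    site_to_index = {site: i for i, site in enumerate(coords)}
    energy = 0.0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that each lattice contact is counted once.
        for dx, dy in ((1, 0), (0, 1)):
            j = site_to_index.get((x + dx, y + dy))
            # Skip empty sites, polar neighbors, and chain-bonded neighbors.
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                energy += contact_energy
    return energy


# A four-residue chain folded into a square brings its two H ends into contact.
folded = [(0, 0), (1, 0), (1, 1), (0, 1)]
extended = [(0, 0), (1, 0), (2, 0), (3, 0)]
assert hp_energy("HPPH", folded) == -1.0
assert hp_energy("HPPH", extended) == 0.0
```

In a Monte Carlo simulation of this kind, an energy function like this one would typically be evaluated before and after each trial move to decide whether the move is accepted.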
Affiliation(s)
- Avishek Kumar
- Discipline of Chemical Engineering, Indian Institute of Technology (IIT) Gandhinagar, Palaj, Gandhinagar, Gujarat-382355, India.
2
Funneled energy landscape unifies principles of protein binding and evolution. Proc Natl Acad Sci U S A 2020; 117:27218-27223. [PMID: 33067388 DOI: 10.1073/pnas.2013822117] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022] Open
Abstract
Most proteins have evolved to spontaneously fold into native structure and specifically bind with their partners for the purpose of fulfilling biological functions. According to Darwin, protein sequences evolve through random mutations, and only the fittest survive. The understanding of how the evolutionary selection sculpts the interaction patterns for both biomolecular folding and binding is still challenging. In this study, we incorporated the constraint of functional binding into the selection fitness based on the principle of minimal frustration for the underlying biomolecular interactions. Thermodynamic stability and kinetic accessibility were derived and quantified from a global funneled energy landscape that satisfies the requirements of both the folding into the stable structure and binding with the specific partner. The evolution proceeds via a bowl-like evolution energy landscape in the sequence space with a closed-ring attractor at the bottom. The sequence space is increasingly reduced until this ring attractor is reached. The molecular-interaction patterns responsible for folding and binding are identified from the evolved sequences, respectively. The residue positions participating in the interactions responsible for folding are highly conserved and maintain the hydrophobic core under additional evolutionary constraints of functional binding. The positions responsible for binding constitute a distributed network via coupling conservations that determine the specificity of binding with the partner. This work unifies the principles of protein binding and evolution under minimal frustration and sheds light on the evolutionary design of proteins for functions.
3
Perry SL, Sing CE. 100th Anniversary of Macromolecular Science Viewpoint: Opportunities in the Physics of Sequence-Defined Polymers. ACS Macro Lett 2020; 9:216-225. [PMID: 35638672 DOI: 10.1021/acsmacrolett.0c00002] [Citation(s) in RCA: 76] [Impact Index Per Article: 19.0] [Indexed: 12/16/2022]
Abstract
Polymer science has been driven by ever-increasing molecular complexity, as polymer synthesis expands an already-vast palette of chemical and architectural parameter space. Copolymers represent a key example, where simple homopolymers have given rise to random, alternating, gradient, and block copolymers. Polymer physics has provided the insight needed to explore this monomer sequence parameter space. The future of polymer science, however, must contend with further increases in monomer precision, as this class of macromolecules moves ever closer to the sequence-monodisperse polymers that are the workhorses of biology. The advent of sequence-defined polymers gives rise to opportunities for material design, with increasing levels of chemical information being incorporated into long-chain molecules; however, this also raises questions that polymer physics must address. What properties uniquely emerge from sequence-definition? Is this circumstance-dependent? How do we define and think about sequence dispersity? How do we think about a hierarchy of sequence effects? Are more sophisticated characterization methods, as well as theoretical and computational tools, needed to understand this class of macromolecules? The answers to these questions touch on many difficult scientific challenges, setting the stage for a rich future for sequence-defined polymers in polymer physics.
Affiliation(s)
- Sarah L. Perry
- Department of Chemical Engineering, University of Massachusetts−Amherst, 686 North Pleasant Street, Amherst, Massachusetts 01003, United States
- Charles E. Sing
- Department of Chemical and Biomolecular Engineering, University of Illinois at Urbana-Champaign, 600 South Mathews Avenue, Urbana, Illinois 61801, United States
4
Kumar A, Ghosh D, Radhakrishna M. Surface Patterning for Enhanced Protein Stability: Insights from Molecular Simulations. J Phys Chem B 2019; 123:8363-8369. [DOI: 10.1021/acs.jpcb.9b05663] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
Affiliation(s)
- Avishek Kumar
- Department of Chemical Engineering, Indian Institute of Technology (IIT) Gandhinagar, Palaj, Gandhinagar, Gujarat 382355, India
- Deepshikha Ghosh
- Department of Chemical Engineering, Indian Institute of Technology (IIT) Gandhinagar, Palaj, Gandhinagar, Gujarat 382355, India
- Mithun Radhakrishna
- Department of Chemical Engineering, Indian Institute of Technology (IIT) Gandhinagar, Palaj, Gandhinagar, Gujarat 382355, India
5
Hu J, Lei W, Wang J, Chen HY, Xu JJ. Preservation of Protein Zwitterionic States in the Transition from Solution to Gas Phase Revealed by Sodium Adduction Mass Spectrometry. Anal Chem 2019; 91:7858-7863. [PMID: 31134800 DOI: 10.1021/acs.analchem.9b01602] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/30/2022]
Abstract
The structural characterization of proteins and their interaction network mapping in the gas phase highlights the need to preserve their most nativelike conformers in the transition from the solution to gas phase. Zwitterionic interactions in a protein are weak bonds between oppositely charged residues, which make an important contribution to protein stability. However, it is still not clear whether the native zwitterionic states of proteins can be retained when they are transferred from the solution to gas phase. Using the nonspecific Na+ adduction as a novel signature, here we show that the zwitterionic states of proteins can be preserved when a moderated droplet desolvation condition (temperature <30 °C) is used in native electrospray ionization mass spectrometry. The very low-level nonspecific metal adduction to proteins under such conditions also enables rapid and direct determination of the binding states of metal-binding proteins and sensitive detection of proteins from solutions containing highly concentrated involatile salts (e.g., 50 mM NaCl). We believe that our findings can be instructive for performing mass spectrometric analysis of proteins and useful for desalting protein ions, which simply involves altering the temperature and flow rate of drying gas in the desolvation region.
Affiliation(s)
- Jun Hu
- State Key Laboratory of Analytical Chemistry for Life Science, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, P. R. China
- Wen Lei
- State Key Laboratory of Analytical Chemistry for Life Science, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, P. R. China
- Jiang Wang
- State Key Laboratory of Analytical Chemistry for Life Science, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, P. R. China
- Hong-Yuan Chen
- State Key Laboratory of Analytical Chemistry for Life Science, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, P. R. China
- Jing-Juan Xu
- State Key Laboratory of Analytical Chemistry for Life Science, School of Chemistry and Chemical Engineering, Nanjing University, Nanjing 210023, P. R. China
6
Yan Z, Wang J. Superfunneled Energy Landscape of Protein Evolution Unifies the Principles of Protein Evolution, Folding, and Design. Phys Rev Lett 2019; 122:018103. [PMID: 31012725 DOI: 10.1103/physrevlett.122.018103] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 12/06/2017] [Revised: 11/08/2018] [Indexed: 06/09/2023]
Abstract
Evolution is essential for shaping the biological functions. Darwin proposed the selection as the driving force for evolution upon mutations. While mutations are clear, the quantification of the selection force is still challenging. In this study, we identified and quantified both thermodynamic stability and kinetic accessibility as the selection forces for protein evolution. The protein evolution can be viewed and quantified as a trajectory moving along a superfunneled energy landscape with a line attractor at the bottom. The resulting evolved sequences and structures show strong protein characteristics including the hydrophobic core, high designability, and fast folding. The evolution principle uncovered here is validated on real proteins and sheds light on the protein design.
Affiliation(s)
- Zhiqiang Yan
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, Jilin 130022, China
- Jin Wang
- State Key Laboratory of Electroanalytical Chemistry, Changchun Institute of Applied Chemistry, Chinese Academy of Sciences, Changchun, Jilin 130022, China
- Department of Chemistry & Physics, State University of New York at Stony Brook, Stony Brook, New York 11790, USA
7
Tripathi S, Waxham MN, Cheung MS, Liu Y. Lessons in Protein Design from Combined Evolution and Conformational Dynamics. Sci Rep 2015; 5:14259. [PMID: 26388515 PMCID: PMC4585694 DOI: 10.1038/srep14259] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 03/31/2015] [Accepted: 08/21/2015] [Indexed: 11/09/2022] Open
Abstract
Protein-protein interactions play important roles in the control of every cellular process. How natural selection has optimized protein design to produce molecules capable of binding to many partner proteins is a fascinating problem but not well understood. Here, we performed a combinatorial analysis of protein sequence evolution and conformational dynamics to study how calmodulin (CaM), which plays essential roles in calcium signaling pathways, has adapted to bind to a large number of partner proteins. We discovered that amino acid residues in CaM can be partitioned into unique classes according to their degree of evolutionary conservation and local stability. Holistically, categorization of CaM residues into these classes reveals enriched physico-chemical interactions required for binding to diverse targets, balanced against the need to maintain the folding and structural modularity of CaM to achieve its overall function. The sequence-structure-function relationship of CaM provides a concrete example of the general principle of protein design. We have demonstrated the synergy between the fields of molecular evolution and protein biophysics and created a generalizable framework broadly applicable to the study of protein-protein interactions.
Affiliation(s)
- Swarnendu Tripathi
- Department of Physics, University of Houston, Houston, TX; Center for Theoretical Biological Physics, Rice University, Houston, TX
- M Neal Waxham
- Department of Neurobiology and Anatomy, University of Texas, Health Science Center, Houston, TX
- Margaret S Cheung
- Department of Physics, University of Houston, Houston, TX; Center for Theoretical Biological Physics, Rice University, Houston, TX
- Yin Liu
- Department of Neurobiology and Anatomy, University of Texas, Health Science Center, Houston, TX
8
Importance of asparagine on the conformational stability and chemical reactivity of selected anti-inflammatory peptides. Chem Phys 2015. [DOI: 10.1016/j.chemphys.2015.06.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/14/2023]
9
Schennach M, Breuker K. Proteins with Highly Similar Native Folds Can Show Vastly Dissimilar Folding Behavior When Desolvated. Angew Chem Int Ed Engl 2013. [DOI: 10.1002/ange.201306838] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/28/2023]
10
Schennach M, Breuker K. Proteins with highly similar native folds can show vastly dissimilar folding behavior when desolvated. Angew Chem Int Ed Engl 2013; 53:164-8. [PMID: 24259450 PMCID: PMC4065370 DOI: 10.1002/anie.201306838] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 08/05/2013] [Revised: 10/07/2013] [Indexed: 01/08/2023]
Abstract
Proteins can be exposed to vastly different environments such as the cytosol or membranes, but the delicate balance between external factors and intrinsic determinants of protein structure, stability, and folding is only poorly understood. Here we used electron capture dissociation to study horse and tuna heart Cytochromes c in the complete absence of solvent. The significantly different stability of their highly similar native folds after transfer into the gas phase, and their strikingly different folding behavior in the gas phase, can be rationalized on the basis of electrostatic interactions such as salt bridges. In the absence of hydrophobic bonding, protein folding is far slower and more complex than in solution.
Affiliation(s)
- Moritz Schennach
- Institut für Organische Chemie and Center for Molecular Biosciences Innsbruck (CMBI), Universität Innsbruck, Innrain 80-82, 6020 Innsbruck (Austria) http://www.bioms-breuker.at
11
Li Z, Yang Y, Zhan J, Dai L, Zhou Y. Energy functions in de novo protein design: current challenges and future prospects. Annu Rev Biophys 2013; 42:315-35. [PMID: 23451890 DOI: 10.1146/annurev-biophys-083012-130315] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Indexed: 12/26/2022]
Abstract
In the past decade, a concerted effort to successfully capture specific tertiary packing interactions produced specific three-dimensional structures for many de novo designed proteins that are validated by nuclear magnetic resonance and/or X-ray crystallographic techniques. However, the success rate of computational design remains low. In this review, we provide an overview of experimentally validated, de novo designed proteins and compare four available programs, RosettaDesign, EGAD, Liang-Grishin, and RosettaDesign-SR, by assessing designed sequences computationally. Computational assessment includes the recovery of native sequences, the calculation of sizes of hydrophobic patches and total solvent-accessible surface area, and the prediction of structural properties such as intrinsic disorder, secondary structures, and three-dimensional structures. This computational assessment, together with a recent community-wide experiment in assessing scoring functions for interface design, suggests that the next-generation protein-design scoring function will come from the right balance of complementary interaction terms. Such balance may be found when more negative experimental data become available as part of a training set.
Affiliation(s)
- Zhixiu Li
- School of Informatics, Indiana University-Purdue University, Indianapolis, Indiana 46202, USA
12
Murnen HK, Khokhlov AR, Khalatur PG, Segalman RA, Zuckermann RN. Impact of Hydrophobic Sequence Patterning on the Coil-to-Globule Transition of Protein-like Polymers. Macromolecules 2012. [DOI: 10.1021/ma300707t] [Citation(s) in RCA: 67] [Impact Index Per Article: 5.6] [Indexed: 01/02/2023]
Affiliation(s)
- Hannah K. Murnen
- Department of Chemical and Biomolecular Engineering, University of California, Berkeley, Berkeley, California 94720, United States
- Alexei R. Khokhlov
- Department of Physics, Moscow State University, Moscow, Russia
- Department of Advanced Energy Related Nanomaterials, Ulm University, Ulm D-89069, Germany
- Pavel G. Khalatur
- Department of Advanced Energy Related Nanomaterials, Ulm University, Ulm D-89069, Germany
- Rachel A. Segalman
- Department of Chemical and Biomolecular Engineering, University of California, Berkeley, Berkeley, California 94720, United States
- Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Ronald N. Zuckermann
- Materials Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- The Molecular Foundry, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
13
Jacak R, Leaver-Fay A, Kuhlman B. Computational protein design with explicit consideration of surface hydrophobic patches. Proteins 2011; 80:825-38. [PMID: 22223219 DOI: 10.1002/prot.23241] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Received: 02/26/2011] [Revised: 10/16/2011] [Accepted: 10/29/2011] [Indexed: 11/09/2022]
Abstract
De novo protein design requires the identification of amino-acid sequences that favor the target-folded conformation and are soluble in water. One strategy for promoting solubility is to disallow hydrophobic residues on the protein surface during design. However, naturally occurring proteins often have hydrophobic amino acids on their surface that contribute to protein stability via the partial burial of hydrophobic surface area or play a key role in the formation of protein-protein interactions. A less restrictive approach for surface design that is used by the modeling program Rosetta is to parameterize the energy function so that the number of hydrophobic amino acids designed on the protein surface is similar to what is observed in naturally occurring monomeric proteins. Previous studies with Rosetta have shown that this limits surface hydrophobics to the naturally occurring frequency (∼28%), but that it does not prevent the formation of hydrophobic patches that are considerably larger than those observed in naturally occurring proteins. Here, we describe a new score term that explicitly detects and penalizes the formation of hydrophobic patches during computational protein design. With the new term, we are able to design protein surfaces that include hydrophobic amino acids at naturally occurring frequencies, but do not have large hydrophobic patches. By adjusting the strength of the new score term, the emphasis of surface redesigns can be switched between maintaining solubility and maximizing folding free energy.
Affiliation(s)
- Ron Jacak
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, North Carolina 27599, USA
14
Shukla P. Thermodynamics of protein folding: a random matrix formulation. J Phys Condens Matter 2010; 22:415106. [PMID: 21386596 DOI: 10.1088/0953-8984/22/41/415106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
The process of protein folding from an unfolded state to a biologically active, folded conformation is governed by many parameters, e.g. the sequence of amino acids, intermolecular interactions, the solvent, temperature and chaperone molecules. Our study, based on random matrix modeling of the interactions, shows, however, that the evolution of the statistical measures, e.g. Gibbs free energy, heat capacity, and entropy, is single parametric. The information can explain the selection of specific folding pathways from an infinite number of possible ways as well as other folding characteristics observed in computer simulation studies.
Affiliation(s)
- Pragya Shukla
- Department of Physics, Indian Institute of Technology, Kharagpur, India
15
Chang S, Gong X, Jiao X, Li C, Chen W, Wang C. Network analysis of protein-protein interaction. Chin Sci Bull 2010. [DOI: 10.1007/s11434-009-0742-x] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 10/19/2022]
16
Kleinman CL, Rodrigue N, Lartillot N, Philippe H. Statistical potentials for improved structurally constrained evolutionary models. Mol Biol Evol 2010; 27:1546-60. [PMID: 20159780 DOI: 10.1093/molbev/msq047] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Indexed: 12/27/2022] Open
Abstract
Assessing the influence of three-dimensional protein structure on sequence evolution is a difficult task, mainly because of the assumption of independence between sites required by probabilistic phylogenetic methods. Recently, models that include an explicit treatment of protein structure and site interdependencies have been developed: a statistical potential (an energy-like scoring system for sequence-structure compatibility) is used to evaluate the probability of fixation of a given mutation, assuming a coarse-grained protein structure that is constant through evolution. Yet, due to the novelty of these models and the small degree of overlap between the fields of structural and evolutionary biology, only simple representations of protein structure have been used so far. In this work, we present new forms of statistical potentials using a probabilistic framework recently developed for evolutionary studies. Terms related to pairwise distance interactions, torsion angles, solvent accessibility, and flexibility of the residues are included in the potentials, so as to study the effects of the main factors known to influence protein structure. The new potentials, with a more detailed representation of the protein structure, yield a better fit than the previously used scoring functions, with pairwise interactions contributing to more than half of this improvement. In a phylogenetic context, however, the structurally constrained models are still outperformed by some of the available site-independent models in terms of fit, possibly indicating that alternatives to coarse-grained statistical potentials should be explored in order to better model structural constraints.
Affiliation(s)
- Claudia L Kleinman
- Département de Biochimie, Centre Robert Cedergren, Université de Montréal, Montreal, Quebec, Canada.
17
Bonnard C, Kleinman CL, Rodrigue N, Lartillot N. Fast optimization of statistical potentials for structurally constrained phylogenetic models. BMC Evol Biol 2009; 9:227. [PMID: 19740424 PMCID: PMC2754480 DOI: 10.1186/1471-2148-9-227] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 04/09/2009] [Accepted: 09/09/2009] [Indexed: 11/16/2022] Open
Abstract
Background: Statistical approaches for protein design are relevant in the field of molecular evolutionary studies. In recent years, new, so-called structurally constrained (SC) models of protein-coding sequence evolution have been proposed, which use statistical potentials to assess sequence-structure compatibility. In a previous work, we defined a statistical framework for optimizing knowledge-based potentials especially suited to SC models. Our method used the maximum likelihood principle and provided what we call the joint potentials. However, the method required numerical estimations by the use of computationally heavy Markov Chain Monte Carlo sampling algorithms. Results: Here, we develop an alternative optimization procedure, based on a leave-one-out argument coupled to fast gradient descent algorithms. We assess that the leave-one-out potential yields very similar results to the joint approach developed previously, both in terms of the resulting potential parameters, and by Bayes factor evaluation in a phylogenetic context. On the other hand, the leave-one-out approach results in a considerable computational benefit (up to a 1,000-fold decrease in computational time for the optimization procedure). Conclusion: Due to its computational speed, the optimization method we propose offers an attractive alternative for the design and empirical evaluation of alternative forms of potentials, using large data sets and high-dimensional parameterizations.
Affiliation(s)
- Cécile Bonnard
- Département d'Informatique, LIRMM, 161 rue Ada, 34392 Montpellier Cedex 5, France.
18
Rodrigue N, Kleinman CL, Philippe H, Lartillot N. Computational Methods for Evaluating Phylogenetic Models of Coding Sequence Evolution with Dependence between Codons. Mol Biol Evol 2009; 26:1663-76. [DOI: 10.1093/molbev/msp078] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.6] [Indexed: 11/13/2022] Open
19
Khodabakhshi AH, Maňuch J, Rafiey A, Gupta A. Stable Structure-Approximating Inverse Protein Folding in 2D Hydrophobic-Polar-Cysteine (HPC) Model. J Comput Biol 2009; 16:19-30. [DOI: 10.1089/cmb.2008.0096] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/13/2022] Open
Affiliation(s)
- Ján Maňuch
- School of Computing Science, Simon Fraser University, Burnaby, Canada
- Arash Rafiey
- School of Computing Science, Simon Fraser University, Burnaby, Canada
- Arvind Gupta
- School of Computing Science, Simon Fraser University, Burnaby, Canada
20
Tsong TY, Hu CK, Wu MC. Hydrophobic condensation and modular assembly model of protein folding. Biosystems 2008; 93:78-89. [DOI: 10.1016/j.biosystems.2008.04.007] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 01/28/2008] [Revised: 04/01/2008] [Accepted: 04/07/2008] [Indexed: 11/26/2022]
21
Chang S, Jiao X, Li CH, Gong XQ, Chen WZ, Wang CX. Amino acid network and its scoring application in protein–protein docking. Biophys Chem 2008; 134:111-8. [DOI: 10.1016/j.bpc.2007.12.005] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.2] [Received: 10/17/2007] [Revised: 12/04/2007] [Accepted: 12/11/2007] [Indexed: 11/30/2022]
22
Rodrigue N, Philippe H, Lartillot N. Exploring Fast Computational Strategies for Probabilistic Phylogenetic Analysis. Syst Biol 2007; 56:711-26. [PMID: 17849326 DOI: 10.1080/10635150701611258] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 12/30/2022] Open
Abstract
In recent years, the advent of Markov chain Monte Carlo (MCMC) techniques, coupled with modern computational capabilities, has enabled the study of evolutionary models without a closed form solution of the likelihood function. However, current Bayesian MCMC applications can incur significant computational costs, as they are based on a full sampling from the posterior probability distribution of the parameters of interest. Here, we draw attention to how MCMC techniques can be embedded within normal approximation strategies for more economical statistical computation. The overall procedure is based on an estimate of the first and second moments of the likelihood function, as well as a maximum likelihood estimate. Through examples, we review several MCMC-based methods used in the statistical literature for such estimation, applying the approaches to constructing posterior distributions under non-analytical evolutionary models relaxing the assumptions of rate homogeneity, and of independence between sites. Finally, we use the procedures for conducting Bayesian model selection, based on Laplace approximations of Bayes factors, which we find to be accurate and computationally advantageous. Altogether, the methods we expound here, as well as other related approaches from the statistical literature, should prove useful when investigating increasingly complex descriptions of molecular evolution, alleviating some of the difficulties associated with nonanalytical models.
Affiliation(s)
- Nicolas Rodrigue
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Québec, Canada.
23
Meyerguz L, Kleinberg J, Elber R. The network of sequence flow between protein structures. Proc Natl Acad Sci U S A 2007; 104:11627-32. [PMID: 17596339 PMCID: PMC1913895 DOI: 10.1073/pnas.0701393104] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.4] [Received: 02/14/2007] [Indexed: 12/24/2022] Open
Abstract
Sequence-structure relationships in proteins are highly asymmetric because many sequences fold into relatively few structures. What is the number of sequences that fold into a particular protein structure? Is it possible to switch between stable protein folds by point mutations? To address these questions, we compute a directed graph of sequences and structures of proteins, which is based on 2,060 experimentally determined protein shapes from the Protein Data Bank. The directed graph is highly connected at native energies with "sinks" that attract many sequences from other folds. The sinks are rich in beta-sheets. The number of sequences that transition between folds is significantly smaller than the number of sequences retained by their fold. The sequence flow into a particular protein shape from other proteins correlates with the number of sequences that matches this shape in empirically determined genomes. Properties of strongly connected components of the graph are correlated with protein length and secondary structure.
Affiliation(s)
- Leonid Meyerguz
- Department of Computer Science, Cornell University, Ithaca, NY 14853
- Jon Kleinberg
- Department of Computer Science, Cornell University, Ithaca, NY 14853
- Ron Elber
- Department of Computer Science, Cornell University, Ithaca, NY 14853
|
24
|
Jiao X, Chang S, Li CH, Chen WZ, Wang CX. Construction and application of the weighted amino acid network based on energy. Phys Rev E Stat Nonlin Soft Matter Phys 2007; 75:051903. [PMID: 17677094 DOI: 10.1103/physreve.75.051903] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Received: 01/26/2007] [Indexed: 05/16/2023]
Abstract
A method is proposed to construct the weighted amino acid network. The weight of the link is based on the contact energy between residues. For the 197 proteins with low homology, the "small-world" property was studied based on this method. Additionally, analyses were carried out for the statistical characteristics of the network parameters, the influence of the weight on the network parameters, the network parameter difference of amino acids, and the links between the hydrophobic and hydrophilic residues. Using this method, we studied the network parameter change for the protein chymotrypsin inhibitor 2 (CI2) on its high-temperature unfolding pathway. It is found that the unfolding of the protein is mainly exhibited as the derogation of the hydrophobic core and the shortest path length rise in the unfolding process. This work is helpful for studies of protein folding and the relationship between structure and function using complex network theory.
Affiliation(s)
- Xiong Jiao
- College of Life Science and Bioengineering, Beijing University of Technology, Beijing, China
|
25
|
Lednev IK, Ermolenkov VV, Higashiya S, Popova LA, Topilina NI, Welch JT. Reversible thermal denaturation of a 60-kDa genetically engineered beta-sheet polypeptide. Biophys J 2006; 91:3805-18. [PMID: 16891363 PMCID: PMC1630459 DOI: 10.1529/biophysj.106.082792] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.9] [Indexed: 11/18/2022] Open
Abstract
A de novo 687-amino-acid residue polypeptide with a regular 32-amino-acid repeat sequence, (GA)₃GY(GA)₃GE(GA)₃GH(GA)₃GK, forms large beta-sheet assemblages that exhibit remarkable folding properties and, as well, form fibrillar structures. This construct is an excellent tool to explore the details of beta-sheet formation yielding intimate folding information that is otherwise difficult to obtain and may inform folding studies of naturally occurring materials. The polypeptide assumes a fully folded antiparallel beta-sheet/turn structure at room temperature, and yet is completely and reversibly denatured at 125 °C, adopting a predominant polyproline II conformation. Deep ultraviolet Raman spectroscopy indicated that melting/refolding occurred without any spectroscopically distinct intermediates, yet the relaxation kinetics depend on the initial polypeptide state, as would be indicative of a non-two-state process. Thermal denaturation and refolding on cooling appeared to be monoexponential with characteristic times of approximately 1 and approximately 60 min, respectively, indicating no detectable formation of hairpin-type nuclei in the millisecond timescale that could be attributed to nonlocal "nonnative" interactions. The polypeptide folding dynamics agree with a general property of beta-sheet proteins, i.e., initial collapse precedes secondary structure formation. The observed folding is much faster than expected for a protein of this size and could be attributed to a less frustrated free-energy landscape funnel for folding. The polypeptide sequence suggests an important balance between the absence of strong nonnative contacts (salt bridges or hydrophobic collapse) and limited repulsion of charged side chains.
Affiliation(s)
- Igor K Lednev
- Department of Chemistry, University at Albany, State University of New York, Albany, New York, USA.
|
26
|
Kleinman CL, Rodrigue N, Bonnard C, Philippe H, Lartillot N. A maximum likelihood framework for protein design. BMC Bioinformatics 2006; 7:326. [PMID: 16808841 PMCID: PMC1570151 DOI: 10.1186/1471-2105-7-326] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Received: 02/01/2006] [Accepted: 06/29/2006] [Indexed: 11/21/2022] Open
Abstract
Background: The aim of protein design is to predict amino-acid sequences compatible with a given target structure. Traditionally envisioned as a purely thermodynamic question, this problem can also be understood in a wider context, where additional constraints are captured by learning the sequence patterns displayed by natural proteins of known conformation. In this latter perspective, however, we still need a theoretical formalization of the question, leading to general and efficient learning methods, and allowing for the selection of fast and accurate objective functions quantifying sequence/structure compatibility. Results: We propose a formulation of the protein design problem in terms of model-based statistical inference. Our framework uses the maximum likelihood principle to optimize the unknown parameters of a statistical potential, which we call an inverse potential to contrast with classical potentials used for structure prediction. We propose an implementation based on Markov chain Monte Carlo, in which the likelihood is maximized by gradient descent and is numerically estimated by thermodynamic integration. The fit of the models is evaluated by cross-validation. We apply this to a simple pairwise contact potential, supplemented with a solvent-accessibility term, and show that the resulting models have a better predictive power than currently available pairwise potentials. Furthermore, the model comparison method presented here allows one to measure the relative contribution of each component of the potential, and to choose the optimal number of accessibility classes, which turns out to be much higher than classically considered. Conclusion: Altogether, this reformulation makes it possible to test a wide diversity of models, using different forms of potentials, or accounting for other factors than just the constraint of thermodynamic stability. Ultimately, such model-based statistical analyses may help to understand the forces shaping protein sequences, and driving their evolution.
Affiliation(s)
- Claudia L Kleinman
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Montréal, Québec, Canada
- Nicolas Rodrigue
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Montréal, Québec, Canada
- Cécile Bonnard
- Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, UMR 5506, CNRS-Université de Montpellier 2, 161, rue Ada, 34392 Montpellier Cedex 5, France
- Hervé Philippe
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Montréal, Québec, Canada
- Nicolas Lartillot
- Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, UMR 5506, CNRS-Université de Montpellier 2, 161, rue Ada, 34392 Montpellier Cedex 5, France
|
27
|
Rodrigue N, Philippe H, Lartillot N. Assessing site-interdependent phylogenetic models of sequence evolution. Mol Biol Evol 2006; 23:1762-75. [PMID: 16787998 DOI: 10.1093/molbev/msl041] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.1] [Indexed: 11/13/2022] Open
Abstract
In recent works, methods have been proposed for applying phylogenetic models that allow for a general interdependence between the amino acid positions of a protein. As of yet, such models have focused on site interdependencies resulting from sequence-structure compatibility constraints, using simplified structural representations in combination with a set of statistical potentials. This structural compatibility criterion is meant as a proxy for sequence fitness, and the methods developed thus far can incorporate different site-interdependent fitness proxies based on other measurements. However, no methods have been proposed for comparing and evaluating the adequacy of alternative fitness proxies in this context, or for more general comparisons with canonical models of protein evolution. In the present work, we apply Bayesian methods of model selection (based on numerical calculations of marginal likelihoods and posterior predictive checks) to evaluate models encompassing the site-interdependent framework. Our application of these methods indicates that considering site-interdependencies, as done here, leads to an improved model fit for all data sets studied. Yet, we find that the use of pairwise contact potentials alone does not suitably account for across-site rate heterogeneity or amino acid exchange propensities; for such complexities, site-independent treatments are still called for. The most favored models combine the use of statistical potentials with a suitably rich site-independent model. Altogether, the methodology employed here should allow for a more rigorous and systematic exploration of different ways of modeling explicit structural constraints, or any other site-interdependent criterion, while best exploiting the richness of previously proposed models.
Affiliation(s)
- Nicolas Rodrigue
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Montréal, Québec, Canada.
|
28
|
Protein Folding Simulations: Combining Coarse-grained Models and All-atom Molecular Dynamics. Theor Chem Acc 2005. [DOI: 10.1007/s00214-005-0026-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 10/25/2022]
|
29
|
Gupta A, Manuch J, Stacho L. Structure-Approximating Inverse Protein Folding Problem in the 2D HP Model. J Comput Biol 2005; 12:1328-45. [PMID: 16379538 DOI: 10.1089/cmb.2005.12.1328] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Indexed: 11/12/2022] Open
Abstract
The inverse protein folding problem is that of designing an amino acid sequence which has a particular native protein fold. This problem arises in drug design where a particular structure is necessary to ensure proper protein-protein interactions. In this paper, we show that in the 2D HP model of Dill it is possible to solve this problem for a broad class of structures. These structures can be used to closely approximate any given structure. One of the most important properties of a good protein (in drug design) is its stability: the aptitude not to fold simultaneously into other structures. We show that for a number of basic structures, our sequences have a unique fold.
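As an illustration of the 2D HP model named in this abstract, the following minimal sketch scores a conformation on the square lattice under the commonly used convention that every pair of H residues that are lattice neighbours but not chain neighbours contributes -1. The function name `hp_energy`, the coordinate representation, and the example fold are assumptions made here for illustration; they are not taken from the cited paper.

```python
from typing import List, Tuple


def hp_energy(sequence: str, coords: List[Tuple[int, int]]) -> int:
    """Energy of a 2D HP lattice conformation (hypothetical helper).

    Assumes the common convention: -1 for each H-H pair that is adjacent
    on the lattice but not consecutive along the chain.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    if len(set(coords)) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    # Consecutive residues must occupy neighbouring lattice sites.
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            raise ValueError("consecutive residues are not lattice neighbours")

    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # skip chain neighbours
            if sequence[i] == "H" and sequence[j] == "H":
                (x1, y1), (x2, y2) = coords[i], coords[j]
                if abs(x1 - x2) + abs(y1 - y2) == 1:
                    energy -= 1
    return energy


# A four-residue chain folded around a unit square: the two end residues touch.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(hp_energy("HPPH", square))
```

In the inverse folding setting, a function like this would be evaluated for a candidate sequence on the target fold and on competing folds; a sequence is considered stable when the target is its unique lowest-energy conformation.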
Affiliation(s)
- Arvind Gupta
- School of Computing Science, Simon Fraser University, Burnaby BC, Canada
|
30
|
Shell MS, Debenedetti PG, Panagiotopoulos AZ. Computational characterization of the sequence landscape in simple protein alphabets. Proteins 2005; 62:232-43. [PMID: 16284961 DOI: 10.1002/prot.20714] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
Abstract
We characterize the "sequence landscapes" in several simple, heteropolymer models of proteins by examining their mutation properties. Using an efficient flat-histogram Monte Carlo search method, our approach involves determining the distribution in energy of all sequences of a given length when threaded through a common backbone. These calculations are performed for a number of Protein Data Bank structures using two variants of the 20-letter contact potential developed by Miyazawa and Jernigan [Miyazawa S, Jernigan RL. Macromolecules 1985;18:534], and the 2-monomer HP model of Lau and Dill [Lau KF, Dill KA. Macromolecules 1989;22:3986]. Our results indicate significant differences among the energy functions in terms of the "smoothness" of their landscapes. In particular, one of the Miyazawa-Jernigan contact potentials reveals unusual cooperative behavior among its species' interactions, resulting in what is essentially a set of phase transitions in sequence space. Our calculations suggest that model-specific features can have a profound effect on protein design algorithms, and our methods offer a number of ways by which sequence landscapes can be quantified.
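The quantity described here, the distribution in energy of all sequences threaded through one fixed backbone, can be computed by brute force for very short chains. The sketch below does this for the two-letter HP alphabet on a toy lattice backbone. It uses exhaustive enumeration, not the flat-histogram Monte Carlo method of the paper, and the function names and the six-site backbone are illustrative assumptions.

```python
from collections import Counter
from itertools import product
from typing import List, Tuple


def contact_pairs(coords: List[Tuple[int, ...]]) -> List[Tuple[int, int]]:
    """Pairs (i, j) that are lattice neighbours but not chain neighbours."""
    pairs = []
    for i in range(len(coords)):
        for j in range(i + 2, len(coords)):
            if sum(abs(a - b) for a, b in zip(coords[i], coords[j])) == 1:
                pairs.append((i, j))
    return pairs


def sequence_energy_histogram(coords: List[Tuple[int, ...]]) -> Counter:
    """Count HP sequences at each energy for one fixed backbone.

    Assumes the HP convention of -1 per H-H contact. Exhaustive over all
    2**N sequences, so only practical for small N.
    """
    pairs = contact_pairs(coords)
    histogram = Counter()
    for seq in product("HP", repeat=len(coords)):
        energy = -sum(1 for i, j in pairs if seq[i] == "H" and seq[j] == "H")
        histogram[energy] += 1
    return histogram


# Toy backbone: a six-residue hairpin on the square lattice.
hairpin = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]
for energy, count in sorted(sequence_energy_histogram(hairpin).items()):
    print(energy, count)
```

For realistic chain lengths and a 20-letter alphabet the sequence space is far too large to enumerate, which is why a sampling method that estimates the same histogram is needed.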
Affiliation(s)
- M Scott Shell
- Department of Chemical Engineering, Princeton University, Princeton, NJ 08544, USA.
|
31
|
Locker CR, Hernandez R. Folding behavior of model proteins with weak energetic frustration. J Chem Phys 2004; 120:11292-303. [PMID: 15268157 DOI: 10.1063/1.1751394] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022] Open
Abstract
The native structure of fast-folding proteins, albeit a deep local free-energy minimum, may involve a relatively small energetic penalty due to nonoptimal, though favorable, contacts between amino acid residues. The weak energetic frustration that such contacts represent varies among different proteins and may account for folding behavior not seen in unfrustrated models. Minimalist model proteins with heterogeneous contacts (as represented by lattice heteropolymers consisting of three types of monomers) also give rise to weak energetic frustration in their corresponding native structures, and the present study of their equilibrium and nonequilibrium properties reveals some of the breadth in their behavior. In order to capture this range within a detailed study of only a few proteins, four candidate protein structures (with their cognate sequences) have been selected according to a figure of merit called the winding index, a characteristic of the number of turns the protein winds about an axis. The temperature-dependent heat capacities reveal a high-temperature collapse transition, and an infrequently observed low-temperature rearrangement transition that arises because of the presence of weak energetic frustration. Simulation results motivate the definition of a new measure of folding affinity as a sequence-dependent free energy (a function of both a reduced stability gap and high accessibility to non-native structures) that correlates strongly with folding rates.
Affiliation(s)
- C Rebecca Locker
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia 30332-0400, USA
|
32
|
Wang Y, Wang B, Liu Y, Chen W, Wang C. A generalized approach for protein design based on the relative entropy. Chin Sci Bull 2004. [DOI: 10.1007/bf02900958] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
|
33
|
|
34
|
|
35
|
Aspnes J, Hartling J, Kao MY, Kim J, Shah G. A combinatorial toolbox for protein sequence design and landscape analysis in the grand canonical model. J Comput Biol 2003; 9:721-41. [PMID: 12487760 DOI: 10.1089/106652702761034154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022] Open
Abstract
In modern biology, one of the most important research problems is to understand how protein sequences fold into their native 3D structures. To investigate this problem at a high level, one wishes to analyze the protein landscapes, i.e., the structures of the space of all protein sequences and their native 3D structures. Perhaps the most basic computational problem at this level is to take a target 3D structure as input and design a fittest protein sequence with respect to one or more fitness functions of the target 3D structure. We develop a toolbox of combinatorial techniques for protein landscape analysis in the Grand Canonical model of Sun, Brem, Chan, and Dill. The toolbox is based on linear programming, network flow, and a linear-size representation of all minimum cuts of a network. It not only substantially expands the network flow technique for protein sequence design in Kleinberg's seminal work but also is applicable to a considerably broader collection of computational problems than those considered by Kleinberg. We have used this toolbox to obtain a number of efficient algorithms and hardness results. We have further used the algorithms to analyze 3D structures drawn from the Protein Data Bank and have discovered some novel relationships between such native 3D structures and the Grand Canonical model.
Affiliation(s)
- James Aspnes
- Department of Computer Science, Yale University, New Haven, CT 06520-8285, USA
|
36
|
Hennetin J, Le TK, Canard L, Colloc'h N, Mornon JP, Callebaut I. Non-intertwined binary patterns of hydrophobic/nonhydrophobic amino acids are considerably better markers of regular secondary structures than nonconstrained patterns. Proteins 2003; 51:236-44. [PMID: 12660992 DOI: 10.1002/prot.10355] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
Patterns of hydrophobic and hydrophilic residues (binary patterns) play an important role in protein architecture and can be roughly categorized into two classes regarding their preferential participation in alpha-helices or beta-strands. However, a single binary pattern can be embedded into different longer patterns carrying opposite structural information and thus cannot be as informative as expected. Here, we consider conditional binary patterns, or hydrophobic clusters, whose existence is conditioned by the presence of a minimum number of nonhydrophobic residues, called the connectivity distance, that separate two hydrophobic amino acids assumed to belong to two distinct patterns. Conditional binary patterns are distinct from simple ones in that they are not intertwined, i.e., they cannot include or be included in other conditional patterns and therefore carry much more differentiated information, in particular being dramatically better correlated with regular secondary structures (especially beta ones). The distribution of these nonintertwined binary patterns in natural proteins was assessed relative to randomness, evidencing the structural bricks that are favored and disfavored by evolutionary selection. Several connectivity distances as well as several hydrophobic alphabets were tested, evidencing the clear superiority of a connectivity distance of 4, which mimics the minimum current length of loops in globular domains, and of the VILFMYW alphabet, selected from structural data (secondary structure propensity and Voronoi tessellation), in highlighting fundamental properties of protein folds.
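The cluster definition in this abstract can be read as a simple segmentation rule: two hydrophobic residues belong to different clusters when at least "connectivity distance" nonhydrophobic residues separate them. The sketch below implements that reading with the VILFMYW alphabet and a distance of 4 as defaults. The function name, the return format, and the example sequence are assumptions for illustration, and any further rules used by the authors are not reproduced.

```python
from typing import List, Tuple


def hydrophobic_clusters(
    sequence: str,
    alphabet: str = "VILFMYW",
    connectivity_distance: int = 4,
) -> List[Tuple[int, str]]:
    """Split a sequence into non-intertwined binary patterns (a sketch).

    Returns (start_index, pattern) tuples, where pattern is a string of
    1 (hydrophobic) and 0 (nonhydrophobic) that starts and ends with 1.
    """
    seq = sequence.upper()
    hydrophobic = [i for i, aa in enumerate(seq) if aa in alphabet]
    if not hydrophobic:
        return []

    spans = []
    start = prev = hydrophobic[0]
    for pos in hydrophobic[1:]:
        gap = pos - prev - 1  # nonhydrophobic residues between the two
        if gap >= connectivity_distance:
            spans.append((start, prev))
            start = pos
        prev = pos
    spans.append((start, prev))

    return [
        (s, "".join("1" if aa in alphabet else "0" for aa in seq[s : e + 1]))
        for s, e in spans
    ]


# Illustrative sequence: a run of four glycines splits it into two clusters.
print(hydrophobic_clusters("VAAILGGGGFW"))
```

Raising `connectivity_distance` merges neighbouring clusters and lowering it splits them, which is the parameter the abstract reports scanning before settling on 4.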
Affiliation(s)
- Jérôme Hennetin
- Systèmes moléculaires and Biologie structurale, LMCP, CNRS UMR 7590, Universités Paris 6 & Paris 7, case 115, 4 place Jussieu, 75252 Paris Cedex 05, France
|
37
|
Zou J, Saven JG. Using self-consistent fields to bias Monte Carlo methods with applications to designing and sampling protein sequences. J Chem Phys 2003. [DOI: 10.1063/1.1539845] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022] Open
|
38
|
|
39
|
Affiliation(s)
- J G Saven
- Department of Chemistry, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA
|
40
|
Locker CR, Hernandez R. A minimalist model protein with multiple folding funnels. Proc Natl Acad Sci U S A 2001; 98:9074-9. [PMID: 11470921 PMCID: PMC55375 DOI: 10.1073/pnas.161438898] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022] Open
Abstract
Kinetic and structural studies of wild-type proteins such as prions and amyloidogenic proteins provide suggestive evidence that proteins may adopt multiple long-lived states in addition to the native state. All of these states differ structurally because they lie far apart in configuration space, but their stability is not necessarily caused by cooperative (nucleation) effects. In this study, a minimalist model protein is designed to exhibit multiple long-lived states to explore the dynamics of the corresponding wild-type proteins. The minimalist protein is modeled as a 27-monomer sequence confined to a cubic lattice with three different monomer types. An order parameter, the winding index, is introduced to characterize the extent of folding. The winding index has several advantages over other commonly used order parameters like the number of native contacts. It can distinguish between enantiomers, its calculation requires less computational time than the number of native contacts, and reduced-dimensional landscapes can be developed when the native state structure is not known a priori. The results for the designed model protein prove by existence that the rugged energy landscape picture of protein folding can be generalized to include protein "misfolding" into long-lived states.
Affiliation(s)
- C R Locker
- Center for Computational Molecular Science and Technology, School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, GA 30332-0400, USA
|
41
|
Kono H, Saven JG. Statistical theory for protein combinatorial libraries. Packing interactions, backbone flexibility, and the sequence variability of a main-chain structure. J Mol Biol 2001; 306:607-28. [PMID: 11178917 DOI: 10.1006/jmbi.2000.4422] [Citation(s) in RCA: 107] [Impact Index Per Article: 4.7] [Indexed: 11/22/2022]
Abstract
Combinatorial experiments provide new ways to probe the determinants of protein folding and to identify novel folding amino acid sequences. These types of experiments, however, are complicated both by enormous conformational complexity and by large numbers of possible sequences. Therefore, a quantitative computational theory would be helpful in designing and interpreting these types of experiments. Here, we present and apply a statistically based, computational approach for identifying the properties of sequences compatible with a given main-chain structure. Protein side-chain conformations are included in an atom-based fashion. Calculations are performed for a variety of similar backbone structures to identify sequence properties that are robust with respect to minor changes in main-chain structure. Rather than specific sequences, the method yields the likelihood of each of the amino acids at preselected positions in a given protein structure. The theory may be used to quantify the characteristics of sequence space for a chosen structure without explicitly tabulating sequences. To account for hydrophobic effects, we introduce an environmental energy that is consistent with other simple hydrophobicity scales and show that it is effective for side-chain modeling. We apply the method to calculate the identity probabilities of selected positions of the immunoglobulin light chain-binding domain of protein L, for which many variant folding sequences are available. The calculations compare favorably with the experimentally observed identity probabilities.
Affiliation(s)
- H Kono
- Department of Chemistry, University of Pennsylvania, Philadelphia, PA 19104, USA
|
42
|
Marshall SA, Mayo SL. Achieving stability and conformational specificity in designed proteins via binary patterning. J Mol Biol 2001; 305:619-31. [PMID: 11152617 DOI: 10.1006/jmbi.2000.4319] [Citation(s) in RCA: 57] [Impact Index Per Article: 2.5] [Indexed: 11/22/2022]
Abstract
We have developed a method to determine the optimal binary pattern (arrangement of hydrophobic and polar amino acids) of a target protein fold prior to amino acid sequence selection in protein design studies. A solvent accessible surface is generated for a target fold using its backbone coordinates and "generic" side-chains, which are constructs whose size and shape are similar to an average amino acid. Each position is classified as hydrophobic or polar according to the solvent exposure of its generic side-chain. The method was tested by analyzing a set of proteins in the Protein Data Bank and by experimentally constructing and analyzing a set of engrailed homeodomain variants whose binary patterns were systematically varied. Selection of the optimal binary pattern results in a designed protein that is monomeric, well-folded, and hyperthermophilic. Homeodomain variants with fewer hydrophobic residues are destabilized, while additional hydrophobic residues induce aggregation. Binary patterning, in conjunction with a force field that models folded state energies, appears sufficient to satisfy two basic goals of protein design: stability and conformational specificity.
Affiliation(s)
- S A Marshall
- Division of Chemistry and Chemical Engineering, California Institute of Technology, 1200 East California Blvd., Pasadena, CA 91125, USA
|
43
|
Rossi A, Micheletti C, Seno F, Maritan A. A self-consistent knowledge-based approach to protein design. Biophys J 2001; 80:480-90. [PMID: 11159418 PMCID: PMC1301249 DOI: 10.1016/s0006-3495(01)76030-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.7] [Indexed: 11/21/2022] Open
Abstract
A simple and very efficient protein design strategy is proposed by developing some recently introduced theoretical tools which have been successfully applied to exactly solvable protein models. The design approach is implemented by using three amino acid classes and it is based on the minimization of an appropriate energy function. For a given native state the results of the design procedure are compared, through a statistical analysis, with the properties of an ensemble of sequences folding in the same conformation. If the success rate is computed on those sites designed with high confidence, it can be as high as 80%. The method is also able to identify key sites for the folding process: results for 2ci2 and barnase are in very good agreement with experimental results.
Affiliation(s)
- A Rossi
- International School for Advanced Studies and INFM, I-34014 Trieste, Italy.
|
44
|
Abstract
Numerous approaches have been described for creating relatively small folded biomolecular structures. "Peptide-amphiphiles," whereby monoalkyl or dialkyl hydrocarbon chains are covalently linked to peptide sequences, have been shown previously to form specific molecular architecture of enhanced stability. The present study has examined the use of monoalkyl hydrocarbon chains as a more general method for inducing protein-like structures. Peptide and peptide-amphiphiles have been characterized by CD and one- and two-dimensional NMR spectroscopic techniques. We have examined two structural elements: alpha-helices and collagen-like triple helices. The alpha-helical propensity of a 16-residue peptide either unmodified or acylated with a C₆ or C₁₆ monoalkyl hydrocarbon chain has been examined initially. The 16-residue peptide alone does not form a distinct structure in solution, whereas the 16-residue peptide adopts predominantly an alpha-helical structure in solution when a C₆ or C₁₆ monoalkyl hydrocarbon chain is N-terminally acylated. The thermal stability of the alpha-helix is greater upon addition of the C₁₆ compared with the C₆ chain, which correlates to the extent of aggregation induced by the respective hydrocarbon chains. Very similar results are seen using a 39-residue triple-helical model peptide, in that structural thermal stability (a) is increasingly enhanced as alkyl chain length is increased and (b) correlates to the extent of peptide-amphiphile aggregation. Overall, structures as diverse as alpha-helices, triple helices, and turns/loops have been shown to be induced and/or stabilized by alkyl chains. Increasing alkyl chain length enhances stability of the structural element and induces aggregates of defined sizes. Hydrocarbon chains may be useful as general tools for protein-like structure initiation and stabilization as well as biomaterial modification.
Affiliation(s)
- P Forns
- Department of Chemistry and Biochemistry, Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431-0991, USA
|
45
|
Street AG, Datta D, Gordon DB, Mayo SL. Designing protein beta-sheet surfaces by Z-score optimization. Phys Rev Lett 2000; 84:5010-5013. [PMID: 10990854 DOI: 10.1103/physrevlett.84.5010] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Received: 09/27/1999] [Indexed: 05/23/2023]
Abstract
Studies of lattice models of proteins have suggested that the appropriate energy expression for protein design may include nonthermodynamic terms to accommodate negative design concerns. One method, developed in lattice model studies, maximizes a quantity known as the "Z-score," which compares the lowest energy sequence whose ground state structure is the target structure to an ensemble of random sequences. Here we show that, in certain circumstances, the technique can be applied to real proteins. The resulting energy expression is used to design the beta-sheet surfaces of two real proteins. We find experimentally that the designed proteins are stable and well folded, and in one case even more thermostable than the wild type.
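A Z-score of the kind described here is usually written as the gap between the designed sequence's energy and the mean energy of a reference ensemble, in units of that ensemble's standard deviation. The sketch below assumes that standard form; the exact definition, sign convention, and reference ensemble used in the paper are not given in the abstract, and the function name and the numbers in the example are illustrative only.

```python
from statistics import mean, pstdev
from typing import Sequence


def z_score(target_energy: float, random_energies: Sequence[float]) -> float:
    """Z-score of a designed sequence against random sequences (a sketch).

    Assumes Z = (E_target - mean(E_random)) / std(E_random), with all
    energies evaluated on the same target structure. Under this sign
    convention a more negative Z means a better-separated design.
    """
    sigma = pstdev(random_energies)  # population standard deviation
    if sigma == 0:
        raise ValueError("random ensemble has zero energy spread")
    return (target_energy - mean(random_energies)) / sigma


# Made-up energies, in arbitrary units, for a designed sequence and
# three random sequences threaded onto the same structure.
print(z_score(-10.0, [-2.0, 0.0, 2.0]))
```

Under this convention, "maximizing the Z-score" as the abstract phrases it corresponds to making the value as large in magnitude (as negative) as possible.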
Affiliation(s)
- A G Street
- Division of Physics, Mathematics and Astronomy, California Institute of Technology, MC 147-75, Pasadena, California 91125, USA
|
46
|
Dima RI, Settanni G, Micheletti C, Banavar JR, Maritan A. Extraction of interaction potentials between amino acids from native protein structures. J Chem Phys 2000. [DOI: 10.1063/1.481525] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.0] [Indexed: 11/14/2022]
|
47
|
Zou J, Saven JG. Statistical theory of combinatorial libraries of folding proteins: energetic discrimination of a target structure. J Mol Biol 2000; 296:281-94. [PMID: 10656832 DOI: 10.1006/jmbi.1999.3426] [Citation(s) in RCA: 67] [Impact Index Per Article: 2.8] [Indexed: 11/22/2022]
Abstract
A self-consistent theory is presented that can be used to estimate the number and composition of sequences satisfying a predetermined set of constraints. The theory is formulated so as to examine the features of sequences having a particular value of $\Delta = E_f - \langle E \rangle_u$, where $E_f$ is the energy of sequences when in a target structure and $\langle E \rangle_u$ is an average energy of non-target structures. The theory yields the probabilities $w_i(\alpha)$ that each position $i$ in the sequence is occupied by a particular monomer type $\alpha$. The theory is applied to a simple lattice model of proteins. Excellent agreement is observed between the theory and the results of exact enumerations. The theory provides a quantitative framework for the design and interpretation of combinatorial experiments involving proteins, where a library of amino acid sequences is searched for sequences that fold to a desired structure.
Affiliation(s)
- J Zou
- Department of Chemistry, University of Pennsylvania, Philadelphia, PA 19104-6323, USA
|
48
|
Kleinberg JM. Efficient algorithms for protein sequence design and the analysis of certain evolutionary fitness landscapes. J Comput Biol 1999; 6:387-404. [PMID: 10582574 DOI: 10.1089/106652799318346] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.2] [Indexed: 11/12/2022]
Abstract
Protein sequence design is a natural inverse problem to protein structure prediction: given a target structure in three dimensions, we wish to design an amino acid sequence that is likely to fold to it. A model of Sun, Brem, Chan, and Dill casts this problem as an optimization on a space of sequences of hydrophobic (H) and polar (P) monomers; the goal is to find a sequence that achieves a dense hydrophobic core with few solvent-exposed hydrophobic residues. Sun et al. developed a heuristic method to search the space of sequences, without a guarantee of optimality or near-optimality; Hart subsequently raised the computational tractability of constructing an optimal sequence in this model as an open question. Here we resolve this question by providing an efficient algorithm to construct optimal sequences; our algorithm has a polynomial running time, and performs very efficiently in practice. We illustrate the implementation of our method on structures drawn from the Protein Data Bank. We also consider extensions of the model to larger amino acid alphabets, as a way to overcome the limitations of the binary H/P alphabet. We show that for a natural class of arbitrarily large alphabets, it remains possible to design optimal sequences efficiently. Finally, we analyze some of the consequences of this sequence design model for the study of evolutionary fitness landscapes. A given target structure may have many sequences that are optimal in the model of Sun et al.; following a notion raised by the work of J. Maynard Smith, we can ask whether these optimal sequences are "connected" by successive point mutations. We provide a polynomial-time algorithm to decide this connectedness property, relative to a given target structure. We develop the algorithm by first solving an analogous problem expressed in terms of submodular functions, a fundamental object of study in combinatorial optimization.
Affiliation(s)
- J M Kleinberg
- Department of Computer Science, Cornell University, Ithaca, New York 14853, USA.
|
49
|
Lazar GA, Johnson EC, Desjarlais JR, Handel TM. Rotamer strain as a determinant of protein structural specificity. Protein Sci 1999; 8:2598-610. [PMID: 10631975 PMCID: PMC2144231 DOI: 10.1110/ps.8.12.2598] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.6] [Indexed: 10/21/2022]
Abstract
We present direct evidence for a change in protein structural specificity due to hydrophobic core packing. High resolution structural analysis of a designed core variant of ubiquitin reveals that the protein is in slow exchange between two conformations. Examination of side-chain rotamers indicates that this dynamic response and the lower stability of the protein are coupled to greater strain and mobility in the core. The results suggest that manipulating the level of side-chain strain may be one way of fine tuning the stability and specificity of proteins.
Affiliation(s)
- G A Lazar
- Department of Molecular and Cell Biology, University of California, Berkeley 94720, USA
|
50
|
Abstract
We have developed a fully automated protein design strategy that works on the entire sequence of the protein and uses a full atom representation. At each step of the procedure, an all-atom model of the protein is built using the template protein structure and the current designed sequence. The energy of the model is used to drive a Monte Carlo optimization in sequence space: random moves are either accepted or rejected based on the Metropolis criterion. We rely on the physical forces that stabilize native protein structures to choose the optimum sequence. Our energy function includes van der Waals interactions, electrostatics and an environment free energy. Successful protein design should be specific and generate a sequence compatible with the template fold and incompatible with competing folds. We impose specificity by maintaining the amino acid composition constant, based on the random energy model. The specificity of the optimized sequence is tested by fold recognition techniques. Successful sequence designs for the B1 domain of protein G, for the lambda repressor and for sperm whale myoglobin are presented. We show that each additional term of the energy function improves the performance of our design procedure: the van der Waals term ensures correct packing, the electrostatics term increases the specificity for the correct native fold, and the environment solvation term ensures a correct pattern of buried hydrophobic and exposed hydrophilic residues. For the globin family, we show that we can design a protein sequence that is stable in the myoglobin fold, yet incompatible with the very similar hemoglobin fold.
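The sequence-space Monte Carlo loop described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the all-atom energy function is replaced by a hypothetical toy H/P contact energy (`toy_energy`), and constant composition is enforced by proposing only position swaps, which is one simple way to keep composition fixed and is an assumption of this example.

```python
import math
import random

# Hypothetical toy template: pairs of positions in contact in the fold.
CONTACTS = [(0, 3), (1, 4), (2, 5), (0, 5)]


def toy_energy(seq):
    """Toy stand-in for the model energy: -1 per H-H contact (an assumption)."""
    return -sum(1 for i, j in CONTACTS if seq[i] == "H" and seq[j] == "H")


def metropolis_accept(delta_e, temperature, rng):
    """Metropolis criterion: accept downhill moves always, uphill moves
    with probability exp(-delta_e / temperature)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def design_sequence(start, energy_fn, n_steps, temperature, rng):
    """Monte Carlo search in sequence space at fixed composition."""
    seq = list(start)
    energy = energy_fn(seq)
    best_seq, best_energy = seq[:], energy
    for _ in range(n_steps):
        # Swapping two positions changes the sequence but not its composition.
        i, j = rng.sample(range(len(seq)), 2)
        seq[i], seq[j] = seq[j], seq[i]
        new_energy = energy_fn(seq)
        if metropolis_accept(new_energy - energy, temperature, rng):
            energy = new_energy
            if energy < best_energy:
                best_seq, best_energy = seq[:], energy
        else:
            seq[i], seq[j] = seq[j], seq[i]  # rejected: undo the swap
    return "".join(best_seq), best_energy


rng = random.Random(1)
start = "HPPPHPHP"
best, best_e = design_sequence(start, toy_energy, n_steps=500, temperature=0.5, rng=rng)
print(best, best_e)
```

Because only swaps are proposed, every visited sequence has the same number of H and P residues as `start`; a real design run would substitute a physical energy model for `toy_energy`.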
Affiliation(s)
- P Koehl
- Department of Structural Biology, Fairchild Building, Stanford University, Stanford, CA 94305, USA.
|