1
Iovino BG, Tang H, Ye Y. Protein domain embeddings for fast and accurate similarity search. Genome Res 2024; 34:1434-1444. [PMID: 39237301 PMCID: PMC11529836 DOI: 10.1101/gr.279127.124] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/15/2024] [Accepted: 09/03/2024] [Indexed: 09/07/2024]
Abstract
Recently developed protein language models have enabled a variety of applications with the protein contextual embeddings they produce. Per-protein representations (each protein is represented as a vector of fixed dimension) can be derived by averaging the embeddings of individual residues, or by applying matrix transformation techniques such as the discrete cosine transformation (DCT) to matrices of residue embeddings. Such protein-level embeddings have been applied to enable fast searches of similar proteins; however, limitations have been found: for example, PROST is good at detecting global homologs but not local homologs, and knnProtT5 excels for proteins with single domains but not multidomain proteins. Here, we propose a novel approach that first segments proteins into domains (or subdomains) and then applies the DCT to the vectorized embeddings of residues in each domain to infer domain-level contextual vectors. Our approach, called DCTdomain, uses predicted contact maps from ESM-2 for domain segmentation, which is formulated as a domain segmentation problem and can be solved using a recursive cut algorithm (RecCut for short) in time quadratic in the protein length; for comparison, an existing approach for domain segmentation uses a cubic-time algorithm. We show that such domain-level contextual vectors (termed DCT fingerprints) enable fast and accurate detection of similarity between proteins that share global similarities but with undefined extended regions between shared domains, and those that only share local similarities. In addition, tests on a database search benchmark show that DCTdomain is able to detect distant homologs by leveraging the structural information in the contextual embeddings.
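The idea of turning a variable-length matrix of residue embeddings into a fixed-length vector with the DCT can be sketched as follows. This is a minimal illustration, not the DCTdomain implementation: the function names, the choice of keeping a 3 x 8 block of low-frequency coefficients, the half-open domain boundaries, and the random stand-in for ESM-2 embeddings are all assumptions made for the example.

```python
import numpy as np

def dct2_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II basis as an (n, n) matrix."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    basis = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    basis[0] *= 1.0 / np.sqrt(2.0)  # scale the constant row for orthonormality
    return basis * np.sqrt(2.0 / n)

def dct_fingerprint(embeddings, rows: int = 3, cols: int = 8) -> np.ndarray:
    """Map an (L, D) embedding matrix to a vector of rows * cols coefficients.

    A 2D DCT is applied and only the low-frequency block is kept, so
    segments of any length L yield vectors of the same size.
    The block size (rows, cols) is a hypothetical choice.
    """
    emb = np.asarray(embeddings, dtype=float)
    n_res, n_dim = emb.shape
    coeffs = dct2_matrix(n_res) @ emb @ dct2_matrix(n_dim).T
    out = np.zeros((rows, cols))  # zero-pad if the segment is very short
    r, c = min(rows, n_res), min(cols, n_dim)
    out[:r, :c] = coeffs[:r, :c]
    return out.ravel()

def domain_fingerprints(embeddings, domains, rows: int = 3, cols: int = 8):
    """Compute one fingerprint per domain, given half-open (start, end) spans."""
    emb = np.asarray(embeddings, dtype=float)
    return [dct_fingerprint(emb[start:end], rows, cols) for start, end in domains]

# Stand-in for per-residue embeddings of a 170-residue protein (assumed D = 16).
rng = np.random.default_rng(0)
protein = rng.normal(size=(170, 16))
# Hypothetical domain boundaries, e.g. from a contact-map segmentation step.
fingerprints = domain_fingerprints(protein, [(0, 50), (50, 170)])
```

Because every domain maps to a vector of the same length regardless of how many residues it spans, fingerprints from different proteins can be compared directly with a simple distance measure.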
Affiliation(s)
- Benjamin Giovanni Iovino
  - Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana 47408, USA
- Haixu Tang
  - Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana 47408, USA
- Yuzhen Ye
  - Luddy School of Informatics, Computing and Engineering, Indiana University, Bloomington, Indiana 47408, USA
2
Kaur U, Kihn KC, Ke H, Kuo W, Gierasch LM, Hebert DN, Wintrode PL, Deredge D, Gershenson A. The conformational landscape of a serpin N-terminal subdomain facilitates folding and in-cell quality control. bioRxiv 2023:2023.04.24.537978. [PMID: 37163105 PMCID: PMC10168285 DOI: 10.1101/2023.04.24.537978] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Many multi-domain proteins including the serpin family of serine protease inhibitors contain non-sequential domains composed of regions that are far apart in sequence. Because proteins are translated vectorially from N- to C-terminus, such domains pose a particular challenge: how to balance the conformational lability necessary to form productive interactions between early and late translated regions while avoiding aggregation. This balance is mediated by the protein sequence properties and the interactions of the folding protein with the cellular quality control machinery. For serpins, particularly α1-antitrypsin (AAT), mutations often lead to polymer accumulation in cells and consequent disease, suggesting that the lability/aggregation balance is especially precarious. Therefore, we investigated the properties of progressively longer AAT N-terminal fragments in solution and in cells. The N-terminal subdomain, residues 1-190 (AAT190), is monomeric in solution and efficiently degraded in cells. More β-rich fragments, 1-290 and 1-323, form small oligomers in solution, but are still efficiently degraded, and even the polymerization-promoting Siiyama (S53F) mutation did not significantly affect fragment degradation. In vitro, the AAT190 region is among the last regions incorporated into the final structure. Hydrogen-deuterium exchange mass spectrometry and enhanced sampling molecular dynamics simulations show that AAT190 has a broad, dynamic conformational ensemble that helps protect one particularly aggregation-prone β-strand from solvent. These AAT190 dynamics result in transient exposure of sequences that are buried in folded, full-length AAT, which may provide important recognition sites for the cellular quality control machinery and facilitate degradation and, under favorable conditions, reduce the likelihood of polymerization.
Affiliation(s)
- Upneet Kaur
  - Department of Biochemistry & Molecular Biology, University of Massachusetts, Amherst, MA 01003
- Kyle C. Kihn
  - Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, MD 21201
- Haiping Ke
  - Department of Biochemistry & Molecular Biology, University of Massachusetts, Amherst, MA 01003
- Weiwei Kuo
  - Department of Biochemistry & Molecular Biology, University of Massachusetts, Amherst, MA 01003
- Lila M. Gierasch
  - Department of Biochemistry & Molecular Biology, University of Massachusetts, Amherst, MA 01003
  - Program in Molecular and Cellular Biology, University of Massachusetts, Amherst, MA 01003
  - Department of Chemistry, University of Massachusetts, Amherst, MA 01003
- Daniel N. Hebert
  - Department of Biochemistry & Molecular Biology, University of Massachusetts, Amherst, MA 01003
  - Program in Molecular and Cellular Biology, University of Massachusetts, Amherst, MA 01003
- Patrick L. Wintrode
  - Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, MD 21201
- Daniel Deredge
  - Department of Pharmaceutical Sciences, University of Maryland School of Pharmacy, Baltimore, MD 21201
- Anne Gershenson
  - Department of Biochemistry & Molecular Biology, University of Massachusetts, Amherst, MA 01003
  - Program in Molecular and Cellular Biology, University of Massachusetts, Amherst, MA 01003
3
The Last Secret of Protein Folding: The Real Relationship Between Long-Range Interactions and Local Structures. Protein J 2020; 39:422-433. [PMID: 33040262 DOI: 10.1007/s10930-020-09925-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Accepted: 10/03/2020] [Indexed: 01/20/2023]
Abstract
The protein folding problem has been extensively studied for decades, and hundreds of thousands of protein structures have been solved. Yet, how proteins fold from a linear peptide chain to their unique 3D structures is not fully understood. With key clues having emerged unexpectedly from the field of nanoscience, a "Confined Lowest Energy Fragment" (CLEF) hypothesis was proposed. The CLEF hypothesis states that a protein chain can be divided into CLEFs, the semi-independent folding units, by a small number of key residues that form key long-range interactions. The native structure of a CLEF is the lowest energy state under the constraints of the key long-range interactions, but the native structure of the whole protein is not necessarily the lowest energy state, as Anfinsen's thermodynamic hypothesis suggested. The CLEF hypothesis proposes a unified CLEF mechanism for protein folding, basically a two-step process. In the first step, the favorable enthalpy of CLEFs for native structures quickly brings those residues for the key long-range interactions together, forming intermediates corresponding to the so-called hydrophobic collapse. In the second step, those collapsed key residues shuffle for the right combination to form the native key long-range interactions. The CLEF hypothesis provides a simple solution to all protein folding paradoxes, and proposes a "CLEF Age" or "Stone Age" for the prebiotic evolution of proteins.
4
Abstract
How do proteins fold, and why do they fold in that way? This Perspective integrates earlier and more recent advances over the 50-y history of the protein folding problem, emphasizing unambiguously clear structural information. Experimental results show that, contrary to prior belief, proteins are multistate rather than two-state objects. They are composed of separately cooperative foldon building blocks that can be seen to repeatedly unfold and refold as units even under native conditions. Similarly, foldons are lost as units when proteins are destabilized to produce partially unfolded equilibrium molten globules. In kinetic folding, the inherently cooperative nature of foldons predisposes the thermally driven amino acid-level search to form an initial foldon and subsequent foldons in later assisted searches. The small size of foldon units, ∼20 residues, resolves the Levinthal time-scale search problem. These microscopic-level search processes can be identified with the disordered multitrack search envisioned in the "new view" model for protein folding. Emergent macroscopic foldon-foldon interactions then collectively provide the structural guidance and free energy bias for the ordered addition of foldons in a stepwise pathway that sequentially builds the native protein. These conclusions reconcile the seemingly opposed new view and defined pathway models; the two models account for different stages of the protein folding process. Additionally, these observations answer the "how" and the "why" questions. The protein folding pathway depends on the same foldon units and foldon-foldon interactions that construct the native structure.
5
An alternative approach to protein folding. Biomed Res Int 2013; 2013:583045. [PMID: 24078920 PMCID: PMC3775432 DOI: 10.1155/2013/583045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2013] [Revised: 06/20/2013] [Accepted: 07/31/2013] [Indexed: 11/26/2022]
Abstract
A diffusion theory-based, all-physical ab initio protein folding simulation is described and applied. The model is based upon the drift and diffusion of protein substructures relative to one another in the multiple energy fields present. Without templates or statistical inputs, the simulations were run at physiologic and ambient temperatures (including pH). Around 100 protein secondary structures were surveyed, and twenty tertiary structures were determined. Greater than 70% of the secondary core structures with over 80% alpha helices were correctly identified in proteins ranging from 30 to 200 amino acids in length. The drift-diffusion model predicted tertiary structures with RMSD values in the 3–5 Angstrom range for proteins ranging from 30 to 150 amino acids. These predictions are among the best for an all ab initio protein simulation. Simulations could be run entirely on a desktop computer in minutes; however, more accurate tertiary structures were obtained using molecular dynamics energy relaxation. The drift-diffusion model generated realistic energy versus time traces. Rapid secondary-structure formation followed by slow compaction toward lower-energy tertiary structures occurred after an initial incubation period, in agreement with observations.
6
Rorick M. Quantifying protein modularity and evolvability: a comparison of different techniques. Biosystems 2012; 110:22-33. [PMID: 22796584 DOI: 10.1016/j.biosystems.2012.06.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 12/15/2011] [Revised: 06/20/2012] [Accepted: 06/27/2012] [Indexed: 10/28/2022]
Abstract
Modularity increases evolvability by reducing constraints on adaptation and by allowing preexisting parts to function in new contexts for novel uses. Protein evolution provides an excellent context to study the causes and consequences of biological modularity. In order to address such questions, however, an index for protein modularity is necessary. This paper proposes a simple index for protein modularity, "module density," which is the number of evolutionarily independent modules that compose a protein divided by the number of amino acids in the protein. The decomposition of proteins into constituent modules can be accomplished by either of two classes of methods. The first class of methods relies on "suppositional" criteria to assign amino acids to modules, whereas the second class of methods relies on "coevolutionary" criteria for this task. One simple and practical method from the first class consists of approximating the number of modules in a protein as the number of regular secondary structure elements (i.e., helices and sheets). Methods based on coevolutionary criteria require more elaborate data, but they have the advantage of being able to specify modules without prior assumptions about why they exist. Given the increasing availability of datasets sampling protein mutational spectra (e.g., from comparative genomics, experimental evolution, and computational prediction), methods based on coevolutionary criteria will likely become more promising in the near future. The ability to meaningfully quantify protein modularity via simple indices has the potential to aid future efforts to understand protein evolutionary rate determinants, improve molecular evolution models and engineer novel proteins.
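The module density index described above (modules per amino acid) can be illustrated with the simple "suppositional" approximation the abstract mentions, where modules are counted as regular secondary structure elements. The sketch below is only one possible reading: the function names, the DSSP-style one-letter state string (H for helix, E for strand, C for coil), and the decision to count each contiguous helix or strand run as one element are assumptions, not the paper's procedure.

```python
from itertools import groupby

def count_secondary_structure_elements(ss: str, regular_states: str = "HE") -> int:
    """Count contiguous runs of regular secondary structure in a state string.

    Each uninterrupted run of helix (H) or strand (E) counts as one element;
    coil and any other symbols are ignored.
    """
    return sum(1 for state, _ in groupby(ss) if state in regular_states)

def module_density(n_modules: int, n_residues: int) -> float:
    """Module density: number of modules divided by protein length."""
    if n_residues <= 0:
        raise ValueError("protein length must be positive")
    return n_modules / n_residues

# Hypothetical 26-residue secondary structure assignment.
ss_string = "CCHHHHHCCEEEECCCEEEECHHHHC"
n_elements = count_secondary_structure_elements(ss_string)
density = module_density(n_elements, len(ss_string))
```

Here the string contains two helices and two strands, so the approximate module count is 4 and the density is 4 divided by 26.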
Affiliation(s)
- Mary Rorick
  - University of Michigan, Department of Ecology and Evolutionary Biology, Ann Arbor, MI 48109-1048, United States.
7
Kathuria SV, Day IJ, Wallace LA, Matthews CR. Kinetic traps in the folding of beta alpha-repeat proteins: CheY initially misfolds before accessing the native conformation. J Mol Biol 2008; 382:467-84. [PMID: 18619461 DOI: 10.1016/j.jmb.2008.06.054] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.0] [Received: 02/18/2008] [Revised: 05/21/2008] [Accepted: 06/19/2008] [Indexed: 11/15/2022]
Abstract
The beta alpha-repeat class of proteins, represented by the (beta alpha)(8) barrel and the alpha/beta/alpha sandwich, are among the most common structural platforms in biology. Previous studies on the folding mechanisms of these motifs have revealed or suggested that the initial event involves the submillisecond formation of a kinetically trapped species that must at least partially unfold before productive folding to the respective native conformation can occur. To test the generality of these observations, CheY, a bacterial response regulator, was subjected to an extensive analysis of its folding reactions. Although earlier studies had proposed the formation of an off-pathway intermediate, the data available were not sufficient to rule out an alternative on-pathway mechanism. A global analysis of single- and double-jump kinetic data, combined with equilibrium unfolding data, was used to show that CheY folds and unfolds through two parallel channels defined by the state of isomerization of a prolyl peptide bond in the active site. Each channel involves a stable, highly structured folding intermediate whose kinetic properties are better described as the properties of an off-pathway species. Both intermediates subsequently flow through the unfolded state ensemble and adopt the native cis-prolyl isomer prior to forming the native state. Initial collapse to off-pathway folding intermediates is a common feature of the folding mechanisms of beta alpha-repeat proteins, perhaps reflecting the favored partitioning to locally determined substructures that cannot directly access the native conformation. Productive folding requires the dissipation of these prematurely folded substructures as a prelude to forming the larger-scale transition state that leads to the native conformation. Results from Gō-modeling studies in the accompanying paper elaborate on the topological frustration in the folding free-energy landscape of CheY.
Affiliation(s)
- Sagar V Kathuria
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA
8
Li M, Song J. The N- and C-termini of the human Nogo molecules are intrinsically unstructured: bioinformatics, CD, NMR characterization, and functional implications. Proteins 2007; 68:100-8. [PMID: 17397058 DOI: 10.1002/prot.21385] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Indexed: 11/09/2022]
Abstract
RTN4 or Nogo proteins are composed of three alternative splice forms, namely 1192-residue Nogo-A, 373-residue Nogo-B, and 199-residue Nogo-C. Nogo proteins have received intense attention because they have been implicated in a variety of critical cellular processes including CNS neuronal regeneration, vascular remodeling, apoptosis, interaction with beta-amyloid protein converting enzyme, and generation/maintenance of the tubular network of the endoplasmic reticulum (ER). Despite their significantly different N-terminal lengths, they share a conserved C-terminal reticulon-homology domain consisting of two transmembrane fragments, a 66-residue extracellular loop Nogo-66 and a 38-residue C-tail carrying an ER retention motif. Nogo-A has the longest N-terminus, with 1016 residues, while Nogo-B has an N-terminus almost identical to the first 200 residues of Nogo-A. So far, except for our previous determination of the Nogo-66 solution structure, no structural characterization of the other Nogo regions has been reported. In the present study, we initiated a systematic investigation of the structural properties of Nogo molecules by a combined use of bioinformatics, CD, and NMR spectroscopy. The results led to two striking findings: (1) in agreement with bioinformatics prediction, the N- and C-termini of Nogo-B were experimentally demonstrated to be intrinsically unstructured by CD, two-dimensional 1H-15N NMR HSQC, hydrogen exchange, and 15N heteronuclear NOE characterization. (2) Further studies showed that the 1016-residue N-terminus of Nogo-A was again highly disordered. Therefore, it appears that being intrinsically unstructured allows Nogo molecules to serve as double-faceted functional players, with one set of functions involved in cellular signaling processes essential for CNS neuronal regeneration, vascular remodeling, apoptosis and so forth, and with another in generating/maintaining membrane-related structures. We propose that this mechanism may represent a general strategy to place the formation/maintenance of membrane-related structures under the direct regulation of cellular signaling.
Affiliation(s)
- Minfen Li
  - Department of Biological Sciences, Faculty of Science, National University of Singapore, 10 Kent Ridge Crescent, Singapore 119260
9
Carey J, Lindman S, Bauer M, Linse S. Protein reconstitution and three-dimensional domain swapping: benefits and constraints of covalency. Protein Sci 2007; 16:2317-33. [PMID: 17962398 PMCID: PMC2211703 DOI: 10.1110/ps.072985007] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.5] [Received: 05/04/2007] [Revised: 07/30/2007] [Accepted: 08/01/2007] [Indexed: 10/22/2022]
Abstract
The phenomena of protein reconstitution and three-dimensional domain swapping reveal that highly similar structures can be obtained whether a protein is comprised of one or more polypeptide chains. In this review, we use protein reconstitution as a lens through which to examine the range of protein tolerance to chain interruptions and the roles of the primary structure in related features of protein structure and folding, including circular permutation, natively unfolded proteins, allostery, and amyloid fibril formation. The results imply that noncovalent interactions in a protein are sufficient to specify its structure under the constraints imposed by the covalent backbone.
Affiliation(s)
- Jannette Carey
  - Chemistry Department, Princeton University, NJ 08544-1009, USA.
10
Gebhard LG, Risso VA, Santos J, Ferreyra RG, Noguera ME, Ermácora MR. Mapping the Distribution of Conformational Information Throughout a Protein Sequence. J Mol Biol 2006; 358:280-8. [PMID: 16510154 DOI: 10.1016/j.jmb.2006.01.095] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 12/12/2005] [Revised: 01/26/2006] [Accepted: 01/27/2006] [Indexed: 12/01/2022]
Abstract
The three-dimensional structure of a protein is encoded in its sequence, but many amino acid residues carry no essential conformational information, and the identity of those that are structure-determining is elusive. By circular permutation and terminal deletion, we produced and purified 25 Bacillus licheniformis beta-lactamase (ESBL) variants that lack 5-21 contiguous residues each, and collectively have 82% of the sequence and 92% of the non-local atom-atom contacts eliminated. Circular dichroism and size-exclusion chromatography showed that most of the variants form conformationally heterogeneous mixtures, but by measuring catalytic constants, we found that all populate, to a greater or lesser extent, conformations with the essential features of the native fold. This suggests that no segment of the ESBL sequence is essential to the structure as a whole, which is congruent with the notion that local information and modular organization can impart most of the tertiary fold specificity and cooperativity.
Affiliation(s)
- Leopoldo G Gebhard
  - Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, Roque Sáenz Peña 180, (1876) Bernal, Buenos Aires, Argentina
11
Arai M, Iwakura M. Peptide fragment studies on the folding elements of dihydrofolate reductase from Escherichia coli. Proteins 2005; 62:399-410. [PMID: 16302220 DOI: 10.1002/prot.20675] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/05/2022]
Abstract
One of the necessary conditions for a protein to be foldable is the presence of a complete set of "folding elements" (FEs) that are short, contiguous peptide segments distributed over an amino acid sequence. The FE-assembly model of protein folding has been proposed, in which the FEs play a role in guiding structure formation through FE-FE interactions early in folding. However, two major issues remain to be clarified regarding the roles of the FEs in determining protein foldability. Are the FEs autonomous folding units (AFUs) that can form nativelike structures in isolation? Is the presence of only the FEs without mutual connections a sufficient condition for a protein to be foldable? Here, we address these questions using peptide fragments corresponding to the FEs of dihydrofolate reductase (DHFR) from Escherichia coli. We show by CD measurement that the FE peptides are unfolded under native conditions, and some of them have propensities toward non-native helices. MD simulations also show the non-native helical propensities of the peptides, and the helix contents estimated from the simulations are well correlated with those estimated from the CD in TFE. Thus, the FEs of DHFR are not AFUs, suggesting the importance of the FEs in nonlocal interactions. We also show that equimolar mixtures of the FE peptides do not induce any structural formation. Therefore, mutual connections between the FEs, which should strengthen the nonlocal FE-FE interactions, are also one of the necessary conditions for a protein to be foldable.
Affiliation(s)
- Munehito Arai
  - Protein Design Research Group, Institute for Biological Resources and Functions, National Institute of Advanced Industrial Science and Technology (AIST), Ibaraki, Japan
12
Wu Y, Vadrevu R, Yang X, Matthews CR. Specific structure appears at the N terminus in the sub-millisecond folding intermediate of the alpha subunit of tryptophan synthase, a TIM barrel protein. J Mol Biol 2005; 351:445-52. [PMID: 16023136 DOI: 10.1016/j.jmb.2005.06.006] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Received: 03/10/2005] [Revised: 06/02/2005] [Accepted: 06/03/2005] [Indexed: 11/20/2022]
Abstract
Competing views of the products of sub-millisecond folding reactions observed in many globular proteins have been ascribed either to the formation of discrete, partially folded states or to the random collapse of the unfolded chain under native-favoring conditions. To test the validity of these alternative interpretations for the stopped-flow burst-phase reaction in the (betaalpha)8, TIM barrel motif, a series of alanine replacements were made at five different leucine or isoleucine residues in the alpha subunit of tryptophan synthase (alphaTS) from Escherichia coli. This protein has been proposed to fold, in the sub-millisecond time range, to an off-pathway intermediate with significant stability and approximately 50% of the far-UV circular dichroism (CD) signal of the native conformation. Individual alanine replacements at any of three isoleucine or leucine residues in either alpha1, beta2 or beta3 completely eliminate the off-pathway species. These variants, within 5 ms, access an intermediate whose properties closely resemble those of an on-pathway equilibrium intermediate that is highly populated at moderate urea concentrations in wild-type alphaTS. By contrast, alanine replacements for leucine residues in either beta4 or beta6 destabilize but preserve the off-pathway, burst-phase species. When considered with complementary thermodynamic and kinetic data, this mutational analysis demonstrates that the sub-millisecond appearance of CD signal for alphaTS reflects the acquisition of secondary structure in a distinct thermodynamic state, not the random collapse of an unfolded chain. The contrasting results for replacements in the contiguous alpha1/beta2/beta3 domain and the C-terminal beta4 and beta6 strands imply a heterogeneous structure for the burst-phase species. The alpha1/beta2/beta3 domain appears to be tightly packed, and the C terminus appears to behave as a molten-globule-like structure whose folding is tightly coupled to that of the alpha1/beta2/beta3 domain.
Affiliation(s)
- Ying Wu
  - Department of Biochemistry and Molecular Pharmacology, University of Massachusetts Medical School, Worcester, MA 01605, USA.
13
He HW, Zhang J, Zhou HM, Yan YB. Conformational change in the C-terminal domain is responsible for the initiation of creatine kinase thermal aggregation. Biophys J 2005; 89:2650-8. [PMID: 16006628 PMCID: PMC1366765 DOI: 10.1529/biophysj.105.066142] [Citation(s) in RCA: 70] [Impact Index Per Article: 3.5] [Received: 05/09/2005] [Accepted: 06/29/2005] [Indexed: 11/18/2022] Open
Abstract
Protein conformational changes may be associated with particular properties such as its function, transportation, assembly, tendency to aggregate, and potential cytotoxicity. In this research, the conformational change that is responsible for the fast destabilization and aggregation of rabbit muscle creatine kinase (EC 2.7.3.2) induced by heat was studied by intrinsic fluorescence and infrared spectroscopy. A pretransitional change of the tryptophan microenvironments was found from the intrinsic fluorescence spectra. A further analysis of the infrared spectra using quantitative second-derivative and two-dimensional correlation analysis indicated that the changes of the beta-sheet structures in the C-terminal domain and the loops occurred before the formation of intermolecular cross-beta-sheet structures and the unfolding of alpha-helices. These results suggested that the pretransitional conformational changes in the active site and the C-terminal domain might result in the modification of the domain-domain interactions and the formation of an inactive dimeric form that was prone to aggregate. Our results highlighted the fact that some minor conformational changes, which were usually negligible or undetectable by normal methods, might play a crucial role in protein stability and aggregation. Our results also suggested that the changes in domain-domain interactions, but not the dissociation of the dimer, might play a crucial role in the thermal denaturation and aggregation of this dimeric two-domain protein.
Affiliation(s)
- Hua-Wei He
  - Department of Biological Sciences and Biotechnology, and State Key Laboratory of Biomembrane and Membrane Biotechnology, Tsinghua University, Beijing, China
14
Liu HL, Hsu JP. Recent developments in structural proteomics for protein structure determination. Proteomics 2005; 5:2056-68. [PMID: 15846841 DOI: 10.1002/pmic.200401104] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.6] [Indexed: 11/10/2022]
Abstract
The major challenges in structural proteomics include identifying all the proteins on the genome-wide scale, determining their structure-function relationships, and outlining the precise three-dimensional structures of the proteins. Protein structures are typically determined by experimental approaches such as X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy. However, the coverage of three-dimensional structure space achieved by these techniques is still limited. Thus, computational methods such as comparative and de novo approaches and molecular dynamics simulations are intensively used as alternative tools to predict the three-dimensional structures and dynamic behavior of proteins. This review summarizes recent developments in structural proteomics for protein structure determination, including instrumental methods such as X-ray crystallography and NMR spectroscopy, and computational methods such as comparative and de novo structure prediction and molecular dynamics simulations.
Affiliation(s)
- Hsuan-Liang Liu
  - Department of Chemical Engineering, National Taipei University of Technology, Taiwan.
15
Feng H, Zhou Z, Bai Y. A protein folding pathway with multiple folding intermediates at atomic resolution. Proc Natl Acad Sci U S A 2005; 102:5026-31. [PMID: 15793003 PMCID: PMC555603 DOI: 10.1073/pnas.0501372102] [Citation(s) in RCA: 98] [Impact Index Per Article: 4.9] [Received: 12/31/2004] [Indexed: 11/18/2022] Open
Abstract
Using native-state hydrogen-exchange-directed protein engineering and multidimensional NMR, we determined the high-resolution structure (rms deviation, 1.1 angstroms) for an intermediate of the four-helix bundle protein: Rd-apocytochrome b562. The intermediate has the N-terminal helix and a part of the C-terminal helix unfolded. In earlier studies, we also solved the structures of two other folding intermediates for the same protein: one with the N-terminal helix alone unfolded and the other with a reorganized hydrophobic core. Together, these structures provide a description of a protein folding pathway with multiple intermediates at atomic resolution. The two general features for the intermediates are (i) native-like backbone topology and (ii) nonnative side-chain interactions. These results have implications for important issues in protein folding studies, including large-scale conformation search, -value analysis, and computer simulations.
Affiliation(s)
- Hanqiao Feng
- Laboratory of Biochemistry, National Cancer Institute, National Institutes of Health, Building 37, Room 6114E, Bethesda, MD 20892, USA
|
16
|
de Bono S, Riechmann L, Girard E, Williams RL, Winter G. A segment of cold shock protein directs the folding of a combinatorial protein. Proc Natl Acad Sci U S A 2005; 102:1396-401. [PMID: 15671167 PMCID: PMC547839 DOI: 10.1073/pnas.0407298102] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.3] [Indexed: 11/18/2022]
Abstract
It has been suggested that protein domains evolved by the non-homologous recombination of building blocks of subdomain size. In earlier work we attempted to recapitulate domain evolution in vitro. We took a polypeptide segment comprising three beta-strands in the monomeric, five-stranded beta-barrel cold shock protein (CspA) of Escherichia coli as a building block. This segment corresponds to a complete exon in homologous eukaryotic proteins and includes residues that nucleate folding in CspA. We recombined this segment at random with fragments of natural proteins and succeeded in generating a range of folded chimaeric proteins. We now present the crystal structure of one such combinatorial protein, 1b11, a 103-residue polypeptide that includes segments from CspA and the S1 domain of the 30S ribosomal subunit of E. coli. The structure reveals a segment-swapped, six-stranded beta-barrel of unique architecture that assembles to a tetramer. Surprisingly, the CspA segment retains its structural identity in 1b11, recapitulating its original fold and deforming the structure of the S1 segment as necessary to complete a barrel. Our work provides structural evidence that (i) random shuffling of nonhomologous polypeptide segments can lead to folded proteins and unique architectures, (ii) many structural features of the segments are retained, and (iii) some segments can act as templates around which the rest of the protein folds.
Affiliation(s)
- Stephanie de Bono
- Centre for Protein Engineering and Laboratory of Molecular Biology, Medical Research Council Centre, Hills Road, Cambridge CB2 2QH, United Kingdom
|
17
|
Barzilai A, Kumar S, Wolfson H, Nussinov R. Potential folding-function interrelationship in proteins. Proteins 2004; 56:635-49. [PMID: 15281117 DOI: 10.1002/prot.20132] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/07/2022]
Abstract
The possibility is addressed that protein folding and function may be related via regions that are critical for both folding and function. This approach is based on the building blocks folding model that describes protein folding as binding events of conformationally fluctuating building blocks. Within these, we identify building block fragments that are critical for achieving the native fold. A library of such critical building blocks (CBBs) is constructed. Then, it is asked whether the functionally important residues fall in these CBB fragments. We find that for over two-thirds of the proteins in our library with available functional information, the catalytic or binding site residues lie within the CBB regions. From the evolutionary standpoint, a folding-function relationship is advantageous, since the need to guard against mutations is limited to one region. Furthermore, conformationally similar CBBs are found in globally unrelated proteins with different functions. Hence, substituting CBBs may lead to designed proteins with altered functions. We further find that the CBBs in our library are conformationally unstable.
Affiliation(s)
- Adi Barzilai
- Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel
|
18
|
Haspel N, Tsai CJ, Wolfson H, Nussinov R. Reducing the computational complexity of protein folding via fragment folding and assembly. Protein Sci 2003; 12:1177-87. [PMID: 12761388 PMCID: PMC2323902 DOI: 10.1110/ps.0232903] [Citation(s) in RCA: 41] [Impact Index Per Article: 1.9] [Received: 09/20/2002] [Revised: 12/23/2002] [Accepted: 02/23/2003] [Indexed: 10/27/2022]
Abstract
Understanding, and ultimately predicting, how a 1-D protein chain reaches its native 3-D fold has been one of the most challenging problems during the last few decades. Data increasingly indicate that protein folding is a hierarchical process. Hence, the question arises as to whether we can use the hierarchical concept to reduce the practically intractable computational times. For such a scheme to work, the first step is to cut the protein sequence into fragments that form local minima on the polypeptide chain. The conformations of such fragments in solution are likely to be similar to those when the fragments are embedded in the native fold, although alternate conformations may be favored during the mutual stabilization in the combinatorial assembly process. Two elements are needed for such cutting: (1) a library of (clustered) fragments derived from known protein structures and (2) an assignment algorithm that selects optimal combinations to "cover" the protein sequence. The next two steps in hierarchical folding schemes, not addressed here, are the combinatorial assembly of the fragments and finally, optimization of the obtained conformations. Here, we address the first step in a hierarchical protein-folding scheme. The input is a target protein sequence and a library of fragments created by clustering building blocks that were generated by cutting all protein structures. The output is a set of cutout fragments. We briefly outline a graph theoretic algorithm that automatically assigns building blocks to the target sequence, and we describe a sample of the results we have obtained.
Affiliation(s)
- Nurit Haspel
- Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
|
19
|
Rabl CR, Martin SR, Neumann E, Bayley PM. Temperature jump kinetic study of the stability of apo-calmodulin. Biophys Chem 2002; 101-102:553-64. [PMID: 12488026 DOI: 10.1016/s0301-4622(02)00150-3] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.2] [Indexed: 10/27/2022]
Abstract
Temperature-jump relaxation spectrometry has been used to study the unfolding properties of Ca(2+)-free Drosophila calmodulin from 278 to 336 K, monitored by absorption of Tyr-138. The T-jump amplitude data are well fitted throughout with a melting temperature T(m) = 315.7 K, ΔH°(m) = 140.5 kJ mol(-1) and ΔC(p)° = 3.28 kJ K(-1) mol(-1), giving ΔG°(293) = 7.36 kJ mol(-1) for the C-domain, in good agreement with other data. The relaxation rate observed (time range 1 µs to 1 ms) obeys a simple two-state kinetic mechanism throughout. The activation energy for unfolding is nearly temperature-independent, in contrast to that for refolding, and hence the transition state is relatively compact, resembling the folded state, and the relaxation time, τ, shows complex temperature dependence. The domain unfolding is a two-state process occurring with τ of approximately 100 µs at the T(m). At 296 K, when the C-domain is approximately 6% unfolded, k(unfolding) is approximately 305 s(-1), k(refolding) approximately 4660 s(-1) and τ approximately 200 µs. This closely resembles the rate and extent of a reported C-domain exchange process, inferred from NMR line-broadening at 296 K. The inherent instability of the apo-C-domain of calmodulin indicates that the unfolded form significantly contributes to the physical properties of apo-calmodulin at normal temperatures, and this instability is enhanced by low ionic strength conditions.
Affiliation(s)
- Carl-Roland Rabl
- Faculty of Chemistry, University of Bielefeld, PO Box 100130, D-33501 Bielefeld, Germany
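The numbers quoted in the apo-calmodulin abstract above can be cross-checked with a short calculation. The sketch below assumes a simple two-state N <-> U exchange (relaxation time 1/(k_unfolding + k_refolding)) and the standard Gibbs-Helmholtz expression with a temperature-independent heat capacity change. The abstract does not state which equations the authors used, so the function names and the choice of formula are assumptions made for illustration only.

```python
import math


def two_state_kinetics(k_unfold, k_refold):
    """Relaxation time (s) and unfolded fraction for a two-state N <-> U exchange."""
    k_obs = k_unfold + k_refold          # observed relaxation rate, s^-1
    tau = 1.0 / k_obs                    # relaxation time, s
    frac_unfolded = k_unfold / k_obs     # equilibrium population of U
    return tau, frac_unfolded


def delta_g_unfolding(temp, t_m, dh_m, dcp):
    """Unfolding free energy (kJ/mol) from Gibbs-Helmholtz with constant dCp.

    temp, t_m in K; dh_m in kJ/mol; dcp in kJ/(K mol).
    """
    return dh_m * (1.0 - temp / t_m) + dcp * ((temp - t_m) - temp * math.log(temp / t_m))


# Values as quoted in the abstract (296 K rate constants, fitted thermodynamic parameters).
tau, frac_unfolded = two_state_kinetics(k_unfold=305.0, k_refold=4660.0)
dg_293 = delta_g_unfolding(temp=293.0, t_m=315.7, dh_m=140.5, dcp=3.28)

print(f"tau = {tau * 1e6:.0f} microseconds")
print(f"fraction unfolded = {frac_unfolded:.3f}")
print(f"dG(293 K) = {dg_293:.2f} kJ/mol")
```

Under these assumptions the rate constants give a relaxation time close to 200 µs and an unfolded population close to 6%, and the fitted parameters give a free energy close to 7.36 kJ mol(-1), consistent with the values reported in the abstract.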
|
20
|
Luo JK, Hornby JAT, Wallace LA, Chen J, Armstrong RN, Dirr HW. Impact of domain interchange on conformational stability and equilibrium folding of chimeric class mu glutathione transferases. Protein Sci 2002; 11:2208-17. [PMID: 12192076 PMCID: PMC2373595 DOI: 10.1110/ps.0208002] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.6] [Indexed: 10/27/2022]
Abstract
Rat mu class glutathione transferases M1-1 and M2-2 are homodimers that share a 78% sequence identity but display differences in stability. M1-1 is more stable at the secondary and tertiary structural levels, whereas its quaternary structure is less stable. Each subunit in these proteins consists of two structurally distinct domains with intersubunit contacts occurring between domain 1 of one subunit and domain 2 of the other subunit. The chimeric subunit variants M(12), which has domain 1 of M1 and domain 2 of M2, and its complement M(21), were used to investigate the conformational stability of the chimeric homodimers M(12)-(12) and M(21)-(21) to determine the contribution of each domain toward stability. Exchanging entire domains between class mu GSTs is accommodated by the GST fold. Urea-induced equilibrium unfolding data indicate that whereas the class mu equilibrium unfolding mechanism (i.e., N(2) <--> 2I <--> 2U) is not altered, domain exchanges impact significantly on the conformational stability of the native dimers and monomeric folding intermediates. Data for the wild-type and chimeric proteins indicate that the order of stability for the native dimer (N(2)) is M2-2 > M(12)-(12) M1-1 approximately M(21)-(21), and that the order of stability of the monomeric intermediate (I) is M1 > M2 approximately M(12) > M(21). Interactions involving Arg 77, which is topologically conserved in GSTs, appear to play an important role in the stability of both the native dimeric and folding monomeric structures.
Affiliation(s)
- Jiann-Kae Luo
- University Research Council Protein Structure-Function Research Programme, School of Molecular and Cell Biology, University of the Witwatersrand, Johannesburg 2050, South Africa
|
21
|
Ellgaard L, Bettendorff P, Braun D, Herrmann T, Fiorito F, Jelesarov I, Güntert P, Helenius A, Wüthrich K. NMR Structures of 36 and 73-residue Fragments of the Calreticulin P-domain. J Mol Biol 2002; 322:773-84. [PMID: 12270713 DOI: 10.1016/s0022-2836(02)00812-4] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.1] [Indexed: 11/25/2022]
Abstract
Calreticulin (CRT) is an abundant, soluble molecular chaperone of the endoplasmic reticulum. Similar to its membrane-bound homolog calnexin (CNX), it is a lectin that promotes the folding of proteins carrying N-linked glycans. Both proteins cooperate with an associated co-chaperone, the thiol-disulfide oxidoreductase ERp57. This enzyme catalyzes the formation of disulfide bonds in CNX and CRT-bound glycoprotein substrates. Previously, we solved the NMR structure of the central proline-rich P-domain of CRT comprising residues 189-288. This structure shows an extended hairpin topology, with three short anti-parallel beta-sheets, three small hydrophobic clusters, and one helical turn at the tip of the hairpin. We further demonstrated that the residues 225-251 at the tip of the CRT P-domain are involved in direct contacts with ERp57. Here, we show that the CRT P-domain fragment CRT(221-256) constitutes an autonomous folding unit, and has a structure highly similar to that of the corresponding region in CRT(189-288). Of the 36 residues present in CRT(221-256), 32 form a well-structured core, making this fragment one of the smallest known natural sequences to form a stable non-helical fold in the absence of disulfide bonds or tightly bound metal ions. CRT(221-256) comprises all the residues of the intact P-domain that were shown to interact with ERp57. Isothermal titration microcalorimetry (ITC) now showed affinity of this fragment for ERp57 similar to that of the intact P-domain, demonstrating that CRT(221-256) may be used as a low molecular mass mimic of CRT for further investigations of the interaction with ERp57. We also solved the NMR structure of the 73-residue fragment CRT(189-261), in which the tip of the hairpin and the first beta-sheet are well structured, but the residues 189-213 are disordered, presumably due to lack of stabilizing interactions across the hairpin.
Affiliation(s)
- Lars Ellgaard
- Institut für Biochemie, Eidgenössische Technische Hochschule Zürich, Switzerland
|
22
|
Abstract
Protein folding is a topic of fundamental interest since it concerns the mechanisms by which the genetic message is translated into the three-dimensional and functional structure of proteins. In these post-genomic times, knowledge of the fundamental principles is required to exploit the information contained in the increasing number of sequenced genomes. Protein folding also has practical applications in the understanding of different pathologies and the development of novel therapeutics to prevent diseases associated with protein misfolding and aggregation. Significant advances have been made, ranging from the Anfinsen postulate to the "new view", which describes the folding process in terms of an energy landscape. These new insights arise from both theoretical and experimental studies. The problem of folding in the cellular environment is briefly discussed. The modern view of the misfolding and aggregation processes that are involved in several pathologies, such as prion and Alzheimer diseases, is presented. Several approaches to structure prediction, which is a very active field of research, are described.
Affiliation(s)
- Jeannine M Yon
- Institut de Biochimie Biophysique Moléculaire et Cellulaire, UMR Centre National de la Recherche Scientifique, Université de Paris-Sud, Orsay, France.
|
23
|
Tsai CJ, Polverino de Laureto P, Fontana A, Nussinov R. Comparison of protein fragments identified by limited proteolysis and by computational cutting of proteins. Protein Sci 2002; 11:1753-70. [PMID: 12070328 PMCID: PMC2373665 DOI: 10.1110/ps.4100102] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.1] [Received: 10/03/2001] [Revised: 04/17/2002] [Accepted: 04/17/2002] [Indexed: 10/14/2022]
Abstract
Here we present a comparison between protein fragments produced by limited proteolysis and those identified by computational cutting based on the building block folding model. The principles upon which the two methods are based are different. Limited proteolysis of natively folded proteins occurs at flexible sites and never at the level of chain segments of regular secondary structure such as alpha-helices. Therefore, the targets for limited proteolysis are locally unfolded regions. In contrast, the computational cutting algorithm considers the compactness of the fragments, their nonpolar buried surface area, and their isolatedness, that is, the surface area which was buried prior to the cutting and becomes exposed subsequently. Despite the different criteria, there is an overall correspondence between sites or regions of limited proteolysis with those identified by computational cutting. The computational cutting method has been applied to several model proteins for which detailed limited proteolysis data are available, namely apomyoglobin, cytochrome c, ribonuclease A, alpha-lactalbumin, and thermolysin. As expected, more cuts are obtained computationally than experimentally and the agreement is better when a number of proteolytic enzymes are used. For example, cytochrome c is cleaved by thermolysin at 56-57, 45-46, and at 80-81, and by proteinase K at 48-49 and 50-51. Incubation of the noncovalent and native-like complex of cytochrome c fragments 1-56 and 57-104 with proteinase K yielded the gapped protein species 1-48/57-104 and finally 1-40/57-104. Computational cutting of cytochrome c reproduced the major experimental observations, with cuts at 47, 64-65 or 65-66 and 80-81 and an unstable 32-47 region not assigned to any building block. The next step, not addressed in this work, is to probe the ability of the generated fragments to fold independently. 
Since both the computational algorithm and limited proteolysis attempt to dissect the protein folding problem, the general agreement between the two procedures is gratifying. This consistency allows us to propose the use of limited proteolysis to produce protein fragments that can adopt an independent folding and, therefore, to study folding intermediates. The results of the present study appear to validate the building block folding model and are in line with the proposal that protein folding is a hierarchical process, where parts constituting local minima of energy fold first, with their subsequent association and mutual stabilization to finally yield the global fold.
Affiliation(s)
- Chung-Jung Tsai
- Laboratory of Experimental and Computational Biology, National Cancer Institute, Frederick, MD 21702, USA
|
24
|
Maeda M, Hamada D, Hoshino M, Onda Y, Hase T, Goto Y. Partially folded structure of flavin adenine dinucleotide-depleted ferredoxin-NADP+ reductase with residual NADP+ binding domain. J Biol Chem 2002; 277:17101-7. [PMID: 11872744 DOI: 10.1074/jbc.m112002200] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.7] [Indexed: 11/06/2022]
Abstract
Maize ferredoxin-NADP(+) reductase (FNR) consists of flavin adenine dinucleotide (FAD) and NADP(+) binding domains with a FAD molecule bound noncovalently in the cleft between these domains. The structural changes of FNR induced by dissociation of FAD have been characterized by a combination of optical and biochemical methods. The CD spectrum of the FAD-depleted FNR (apo-FNR) suggested that removal of FAD from holo-FNR produced an intermediate conformational state with partially disrupted secondary and tertiary structures. Small angle x-ray scattering indicated that apo-FNR assumes a conformation that is less globular in comparison with holo-FNR but is not completely chain-like. Interestingly, the replacement of tyrosine 95 responsible for FAD binding with alanine resulted in a molecular form similar to apo-protein of the wild-type enzyme. Both apo- and Y95A-FNR species bound to Cibacron Blue affinity resin, indicating the presence of a native-like conformation for the NADP(+) binding domain. On the other hand, no evidence was found for the existence of folded conformations in the FAD binding domains of these proteins. These results suggested that FAD-depleted FNR assumes a partially folded structure with a residual NADP(+) binding domain but a disordered FAD binding domain.
Affiliation(s)
- Masahiro Maeda
- Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka 565-0871, Japan
|
25
|
Farrell HM, Qi PX, Brown EM, Cooke PH, Tunick MH, Wickham ED, Unruh JJ. Molten globule structures in milk proteins: implications for potential new structure-function relationships. J Dairy Sci 2002; 85:459-71. [PMID: 11949847 DOI: 10.3168/jds.s0022-0302(02)74096-4] [Citation(s) in RCA: 51] [Impact Index Per Article: 2.2] [Indexed: 11/19/2022]
Abstract
Recent advances in the field of protein chemistry have significantly enhanced our understanding of the possible intermediates that may occur during protein folding and unfolding. In particular, studies on alpha-lactalbumin have led to the theory that the molten globule state may be a possible intermediate in the folding of many proteins. The molten globule state is characterized by a somewhat compact structure, a higher degree of hydration and side chain flexibility, a significant amount of native secondary structure but little tertiary folds, and the ability to react with chaperones. Purified alpha(s1)- and kappa-caseins share many of these same properties; these caseins may thus occur naturally in a molten globule-like state with defined, persistent structures. The caseins appear to have defined secondary structures and to proceed to quaternary structures without tertiary folds. This process may be explained, in part, by comparison with the architectural concepts of tensegrity. By taking advantage of this "new view" of protein folding, and applying these concepts to dairy proteins, it may be possible to generate new and useful forms of proteins for the food ingredient market.
Affiliation(s)
- H M Farrell
- U.S. Department of Agriculture, Agricultural Research Service, Eastern Regional Research Center, Wyndmoor, PA 19038, USA.
|
26
|
Cui Y, Wong WH, Bornberg-Bauer E, Chan HS. Recombinatoric exploration of novel folded structures: a heteropolymer-based model of protein evolutionary landscapes. Proc Natl Acad Sci U S A 2002; 99:809-14. [PMID: 11805332 PMCID: PMC117387 DOI: 10.1073/pnas.022240299] [Citation(s) in RCA: 73] [Impact Index Per Article: 3.2] [Indexed: 11/18/2022]
Abstract
The role of recombination in evolution is compared with that of point mutations (substitutions) in the context of a simple, polymer physics-based model mapping between sequence (genotype) and conformational (phenotype) spaces. Crossovers and point mutations of lattice chains with a hydrophobic polar code are investigated. Sequences encoding for a single ground-state conformation are considered viable and used as model proteins. Point mutations lead to diffusive walks on the evolutionary landscape, whereas crossovers can "tunnel" through barriers of diminished fitness. The degree to which crossovers allow for more efficient sequence and structural exploration depends on the relative rates of point mutations versus that of crossovers and the dispersion in fitness that characterizes the ruggedness of the evolutionary landscape. The probability that a crossover between a pair of viable sequences results in viable sequences is an order of magnitude higher than random, implying that a sequence's overall propensity to encode uniquely is embodied partially in local signals. Consistent with this observation, certain hydrophobicity patterns are significantly more favored than others among fragments (i.e., subsequences) of sequences that encode uniquely, and examples reminiscent of autonomous folding units in real proteins are found. The number of structures explored by both crossovers and point mutations is always substantially larger than that via point mutations alone, but the corresponding numbers of sequences explored can be comparable when the evolutionary landscape is rugged. Efficient structural exploration requires intermediate nonextreme ratios between point-mutation and crossover rates.
Affiliation(s)
- Yan Cui
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA
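The two sequence moves compared in the abstract above, point mutation and crossover on a hydrophobic-polar (HP) code, can be illustrated with a minimal sketch. The function names, the example sequences, and the single-cut crossover are assumptions made for illustration. The sketch does not test viability (whether a sequence encodes a single ground-state lattice conformation), which in the model requires enumerating lattice conformations.

```python
import random


def point_mutation(seq, rng):
    """Flip one randomly chosen residue between H (hydrophobic) and P (polar)."""
    i = rng.randrange(len(seq))
    flipped = "P" if seq[i] == "H" else "H"
    return seq[:i] + flipped + seq[i + 1:]


def crossover(seq_a, seq_b, rng):
    """Single-point crossover between two equal-length HP sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    cut = rng.randrange(1, len(seq_a))   # cut strictly inside the chain
    return seq_a[:cut] + seq_b[cut:], seq_b[:cut] + seq_a[cut:]


def hamming(seq_a, seq_b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


rng = random.Random(0)                    # fixed seed, for reproducibility only
parent_a = "HPHPPHHPHPPHPHHPPH"           # hypothetical 18-mer HP sequences
parent_b = "PPHHPHPHHPPPHHPHPH"

mutant = point_mutation(parent_a, rng)
child_1, child_2 = crossover(parent_a, parent_b, rng)

print(hamming(parent_a, mutant))                               # always 1
print(hamming(parent_a, child_1), hamming(parent_b, child_1))  # can be much larger
```

A point mutation always lands one step from its parent in sequence space, whereas a crossover child can differ from each parent at many positions at once, which is the sense in which the abstract describes crossovers as able to "tunnel" across the landscape.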
|
27
|
Chang JY. The folding pathway of alpha-lactalbumin elucidated by the technique of disulfide scrambling. Isolation of on-pathway and off-pathway intermediates. J Biol Chem 2002; 277:120-6. [PMID: 11560938 DOI: 10.1074/jbc.m108057200] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022]
Abstract
The technique of disulfide scrambling permits reversible conversion of the native and denatured (scrambled) proteins via shuffling and reshuffling of disulfide bonds. Under strong denaturing conditions (e.g. 6 M guanidinium chloride) and in the presence of a thiol initiator, alpha-lactalbumin (alphaLA) denatures by shuffling its four native disulfide bonds and converts to an assembly of 45 species of scrambled isomers. Among them, two predominant isomers, designated as X-alphaLA-a and X-alphaLA-d, account for about 50% of the total denatured structure of alphaLA. X-alphaLA-a and X-alphaLA-d, which adopt the disulfide patterns of (1-2,3-4,5-6,7-8) and (1-2,3-6,4-5,7-8), respectively, represent the most unfolded structures among the 104 possible scrambled isomers (Chang, J.-Y., and Li, L. (2001) J. Biol. Chem. 276, 9705-9712). In this study, X-alphaLA-a and X-alphaLA-d were purified and allowed to refold through disulfide scrambling to form the native alphaLA. Folding intermediates were trapped kinetically by acid quenching and analyzed quantitatively by reversed phase high pressure liquid chromatography. The results revealed two major on-pathway productive intermediates, two major off-pathway kinetic traps, and at least 30 additional minor transient intermediates. Of the two major on-pathway intermediates, one takes on a native-like alpha-helical domain, and the other comprises a structured beta-sheet, calcium binding domain. The two major kinetic traps are apparently stabilized by locally formed non-native-like structures. Overall, the folding mechanism of alphaLA is essentially congruent with the model of "folding funnel" furnished with a rather intricate energy landscape.
Affiliation(s)
- Jui-Yoa Chang
- Research Center for Protein Chemistry, Institute of Molecular Medicine and the Department of Biochemistry and Molecular Biology, The University of Texas, Houston, Texas 77030, USA.
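The figure of 104 possible scrambled isomers in the abstract above follows from counting the ways to pair eight cysteines into four disulfide bonds. The sketch below enumerates those pairings, with cysteines numbered 1 to 8 in sequence order as in the abstract's notation. The function name is hypothetical, and the enumeration is purely combinatorial: it says nothing about which isomers are actually populated.

```python
def disulfide_pairings(cysteines):
    """Yield every way to pair all cysteines into disulfide bonds (perfect matchings)."""
    if not cysteines:
        yield ()
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in disulfide_pairings(remaining):
            yield ((first, partner),) + tail


cysteines = tuple(range(1, 9))            # eight cysteines, four disulfide bonds
all_patterns = list(disulfide_pairings(cysteines))

# One of the fully paired patterns is the native one, so the rest are scrambled.
n_scrambled = len(all_patterns) - 1

# The two predominant isomers named in the abstract, in the same 1-8 numbering.
x_a = ((1, 2), (3, 4), (5, 6), (7, 8))
x_d = ((1, 2), (3, 6), (4, 5), (7, 8))

print(len(all_patterns), n_scrambled)
print(x_a in all_patterns, x_d in all_patterns)
```

Eight cysteines give 7 x 5 x 3 x 1 = 105 fully paired patterns, and removing the native pattern leaves the 104 scrambled isomers quoted in the abstract.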
|
28
|
Abstract
Proteins in the alpha-lactalbumin and c-type lysozyme family have been studied extensively as model systems in protein folding. Early formation of the alpha-helical domain is observed in both alpha-lactalbumin and c-type lysozyme; however, the details of the kinetic folding pathways are significantly different. The major folding intermediate of hen egg-white lysozyme has a cooperatively formed tertiary structure, whereas the intermediate of alpha-lactalbumin exhibits the characteristics of a molten globule. In this study, we have designed and constructed an isolated alpha-helical domain of hen egg-white lysozyme, called Lyso-alpha, as a model of the lysozyme folding intermediate that is stable at equilibrium. Disulfide-exchange studies show that under native conditions, the cysteine residues in Lyso-alpha prefer to form the same set of disulfide bonds as in the alpha-helical domain of full-length lysozyme. Under denaturing conditions, formation of the nearest-neighbor disulfide bonds is strongly preferred. In contrast to the isolated alpha-helical domain of alpha-lactalbumin, Lyso-alpha with two native disulfide bonds exhibits a well-defined tertiary structure, as indicated by cooperative thermal unfolding and a well-dispersed NMR spectrum. Thus, the determinants for formation of the cooperative side-chain interactions are located mainly in the alpha-helical domain. Our studies suggest that the difference in kinetic folding pathways between alpha-lactalbumin and lysozyme can be explained by the difference in packing density between secondary structural elements and support the hypothesis that the structured regions in a protein folding intermediate may correspond to regions that can fold independently.
Affiliation(s)
- P Bai
- Department of Biochemistry, University of Connecticut Health Center, 263 Farmington Avenue, Farmington, CT 06032, USA
|
29
|
Polverino de Laureto P, Vinante D, Scaramella E, Frare E, Fontana A. Stepwise proteolytic removal of the beta subdomain in alpha-lactalbumin. The protein remains folded and can form the molten globule in acid solution. Eur J Biochem 2001; 268:4324-33. [PMID: 11488928 DOI: 10.1046/j.1432-1327.2001.02352.x] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.8] [Indexed: 11/20/2022]
Abstract
Bovine alpha-lactalbumin (alpha-LA) is an alpha/beta protein which adopts partly folded states when dissolved at low pH (A-state), by removal of the protein-bound calcium at neutral pH and low salt concentration (apo-state), as well as in aqueous trifluoroethanol. Previous spectroscopic studies have indicated that the A-state of alpha-LA at pH 2.0, considered a prototype molten globule, has a native-like fold in which the helical core is mostly retained, while the beta subdomain is less structured. Here, we investigate the conformational features of three derivatives of alpha-LA characterized by a single peptide bond fission or a deletion of 12 or 19/22 amino-acid residues of the beta subdomain of the native protein (approximately from residue 34 to 57). These alpha-LA derivatives were obtained by limited proteolysis of the protein in its partly folded state(s). A nicked alpha-LA species consisting of fragments 1-,3-40 and 41-123 (nicked-LA) was prepared by thermolytic digestion of the 123-residue chain of alpha-LA in 50% (v/v) aqueous trifluoroethanol. Two truncated or gapped protein species given by fragments 1-40 and 53-123 (desbeta1-LA) or fragments 1-34 and 54-,57-123 (desbeta2-LA) were obtained by digestion of alpha-LA with pepsin in acid or with proteinase K at neutral pH in its apo-state, respectively. The two protein fragments of nicked or gapped alpha-LA are covalently linked by the four disulfide bridges of the native protein. CD measurements revealed that, in aqueous solution at neutral pH and in the presence of calcium, the three protein species maintain the helical secondary structure of intact alpha-LA, while the tertiary structure is strongly affected by the proteolytic cleavages of the chain. Temperature effects of CD signals in the far- and near-UV region reveal a much more labile tertiary structure in the alpha-LA derivatives, while the secondary structure is mostly retained even upon heating. 
In acid solution at pH 2.0, the three alpha-LA variants adopt a conformational state essentially identical to the molten globule displayed by intact alpha-LA, as demonstrated by CD measurements. Moreover, they bind strongly the fluorescent dye 8-anilinonaphthalene-1-sulfonate, which is considered a diagnostic feature of the molten globule of proteins. Therefore, the beta subdomain can be removed from the alpha-LA molecule without impairing the capability of the rest of the chain to adopt a molten globule state. The results of this protein dissection study provide direct experimental evidence that in the alpha-LA molten globule only the alpha domain is structured.
|
30
|
Carey J. A systematic and general proteolytic method for defining structural and functional domains of proteins. Methods Enzymol 2001; 328:499-514. [PMID: 11075363 DOI: 10.1016/s0076-6879(00)28415-2] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.0] [Indexed: 12/14/2022]
Affiliation(s)
- J Carey
- Chemistry Department, Princeton University, New Jersey 08544-1009, USA
|
31
|
Fischer KF, Marqusee S. A rapid test for identification of autonomous folding units in proteins. J Mol Biol 2000; 302:701-12. [PMID: 10986128 DOI: 10.1006/jmbi.2000.4049] [Citation(s) in RCA: 39] [Impact Index Per Article: 1.6] [Indexed: 11/22/2022]
Abstract
The structure of a protein is dictated by a large number of weak interactions that cooperatively stabilize the native state. Usually, excised fragments smaller than a domain have little if any residual structure. When autonomous units of structure are found within domains, this challenges common assumptions about the cooperativity of protein structure. Such autonomous folding units (AFUs) are of wide interest and have applications in protein engineering and as simple model systems for studying the determinants of stability and specificity. A new method of identifying AFUs within proteins is presented here. The rapid autonomous fragment test (RAFT) identifies AFUs based on analysis of inter-residue contacts present in the three-dimensional structure of a protein. RAFT is fast enough to mine the entire PDB for AFUs and provide a library of potential small stable folds. We show that RAFT is able to predict whether a protein fragment will be structured if isolated from its parent domain.
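The abstract above says only that RAFT analyzes inter-residue contacts in the three-dimensional structure; it does not give the scoring rule. As a rough illustration of that general idea, and not RAFT's actual criterion, the sketch below builds a contact map from C-alpha coordinates and reports what fraction of a fragment's contacts stay inside the fragment. The function names, the 8 Å cutoff, and the exclusion of residues fewer than three positions apart in sequence are all assumptions made for this example.

```python
import numpy as np


def contact_map(coords, cutoff=8.0, min_separation=3):
    """Boolean contact map from an (N, 3) array of C-alpha coordinates.

    cutoff (in angstroms) and min_separation are illustrative assumptions.
    """
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between all residues.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    contacts = dist < cutoff
    # Ignore self-contacts and trivial contacts between sequence neighbours.
    idx = np.arange(len(coords))
    contacts[np.abs(idx[:, None] - idx[None, :]) < min_separation] = False
    return contacts


def fragment_autonomy(contacts, start, end):
    """Fraction of the contacts made by residues [start, end) that are internal.

    A hypothetical score: values near 1 mean the fragment mostly contacts
    itself; values near 0 mean it relies on the rest of the chain.
    """
    inside = np.zeros(contacts.shape[0], dtype=bool)
    inside[start:end] = True
    # Each internal pair appears twice in the symmetric submatrix.
    internal = contacts[np.ix_(inside, inside)].sum() / 2
    external = contacts[np.ix_(inside, ~inside)].sum()
    total = internal + external
    return float(internal / total) if total else 0.0
```

Scanning `fragment_autonomy` over all windows of a chain would give a crude ranking of candidate fragments; a real method would also need to account for fragment size and contact energetics.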
Affiliation(s)
- K F Fischer
- Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720-3206, USA
|
32
|
Abstract
The ankyrin repeat is an abundant, 33 residue sequence motif that forms a consecutive beta-hairpin-helix-loop-helix (beta(2)alpha(2)) fold. Most ankyrin repeat proteins consist of four or more complete repeats, which provide stabilizing interactions between adjacent modules. The cyclin-dependent kinase inhibitor and tumor suppressor p16(INK4) (p16) is one of the smallest ankyrin repeat proteins with a known structure. It consists of four complete repeats plus short N and C-terminal flanking regions that are unstructured in solution. On the basis of preliminary proteolysis studies and predictions using a computer algorithm for identifying autonomous folding units, we have identified a fragment consisting of the third and fourth ankyrin repeats of p16, called p16C, that can fold independently, without the rest of the protein. Far-UV circular dichroism studies showed that p16C has a significant level of alpha-helical secondary structure, and two proline substitutions that disrupt the alpha-helical secondary structure in wild-type p16 disrupt the secondary structure in p16C. The thermal denaturation of p16C is cooperative and reversible, with a midpoint of transition at 30.5(+/-1) degrees C. From urea-induced denaturation studies, the free energy of unfolding for p16C was estimated to be 1.7(+/-0.3) kcal/mol at 20 degrees C. (1)H-(15)N 2D NMR studies suggest that the ankyrin repeats in p16C are likely to fold into a structure similar to that of full-length p16. In order to define the minimum autonomous folding unit in p16, we have further dissected p16C into two complementary peptides, each containing a single ankyrin repeat. These peptides are unstructured in solution. Thus, p16C is the smallest ankyrin repeat module that is known to fold independently and, in general, we believe that the two-ankyrin repeat fold could be the minimum structural unit for all ankyrin repeat proteins. We further discuss the significance of p16C in protein folding and engineering.
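To put the reported free energy of unfolding in perspective, the short sketch below converts it into a folded fraction. It assumes a simple two-state (folded/unfolded) equilibrium, which the abstract does not state explicitly; the function name and the value used for the gas constant are choices made for this illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_folded(delta_g_unfold, temp_c):
    """Folded fraction for a two-state model (an assumption of this sketch).

    delta_g_unfold: free energy of unfolding in kcal/mol.
    temp_c: temperature in degrees Celsius.
    """
    rt = R_KCAL * (temp_c + 273.15)
    # Equilibrium constant K = [unfolded] / [folded] = exp(-dG_unfold / RT).
    k_unfold = math.exp(-delta_g_unfold / rt)
    return 1.0 / (1.0 + k_unfold)


if __name__ == "__main__":
    print(fraction_folded(1.7, 20.0))
```

Under these assumptions, 1.7 kcal/mol at 20 degrees C corresponds to a folded fraction of about 0.95, so a marginally stable fragment like p16C would still be predominantly folded at that temperature.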
Affiliation(s)
- B Zhang
- Department of Biochemistry, University of Connecticut Health Center, 263 Farmington Avenue, Farmington, CT, 06032, USA
|
33
|
Wallqvist A, Lavoie TA, Chanatry JA, Covell DG, Carey J. Cooperative folding units of Escherichia coli tryptophan repressor. Biophys J 1999; 77:1619-26. [PMID: 10465773 PMCID: PMC1300450 DOI: 10.1016/s0006-3495(99)77010-4] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/25/2022] Open
Abstract
A previously published computational procedure was used to identify cooperative folding units within tryptophan repressor. The theoretical results predict the existence of distinct stable substructures in the protein chain for the monomer and the dimer. The predictions were compared with experimental data on structure and folding of the repressor and its proteolytic fragments and show excellent agreement for the dimeric form of the protein. The results suggest that the monomer, the structure of which is currently unknown, is likely to have a structure different from the one it has within the context of the highly intertwined dimer. Application of this method to the repressor monomer represents an extension of the computations into the realm of evaluating hypothetical structures such as those produced by threading.
Affiliation(s)
- A Wallqvist
- Frederick Cancer Research and Development Center, National Cancer Institute, Science Applications International Corporation, Frederick, Maryland 21702 USA
|