1. Selection among site-dependent structurally constrained substitution models of protein evolution by approximate Bayesian computation. Bioinformatics 2024; 40:btae096. [PMID: 38374231 PMCID: PMC10914458 DOI: 10.1093/bioinformatics/btae096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/22/2023] [Revised: 01/15/2024] [Accepted: 02/16/2024] [Indexed: 02/21/2024] [Open Access]
Abstract
MOTIVATION: The selection among substitution models of molecular evolution is fundamental for obtaining accurate phylogenetic inferences. At the protein level, evolutionary analyses are traditionally based on empirical substitution models, but these models make unrealistic assumptions and are being surpassed by structurally constrained substitution (SCS) models. SCS models often consider site-dependent evolution, a process that adds realism but complicates their implementation in the likelihood functions commonly used for substitution model selection. RESULTS: We present a method to perform selection among site-dependent SCS models, and also between empirical and site-dependent SCS models, based on the approximate Bayesian computation (ABC) approach and its implementation in the computational framework ProteinModelerABC. The framework implements ABC with and without regression adjustments and includes diverse empirical and site-dependent SCS models of protein evolution. Using extensive simulated data, we found that it provides selection among SCS and empirical models with acceptable accuracy. As illustrative examples, we applied the framework to analyze a variety of protein families, observing that SCS models fit them better than the corresponding best-fitting empirical substitution models. AVAILABILITY AND IMPLEMENTATION: ProteinModelerABC is freely available from https://github.com/DavidFerreiro/ProteinModelerABC, can run in parallel and includes a graphical user interface. The framework is distributed with detailed documentation and ready-to-use examples.
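To illustrate the general idea of likelihood-free model selection mentioned in this abstract, the following is a minimal sketch of plain rejection ABC (without regression adjustment). It is not ProteinModelerABC and does not use its interface. The two simulators, the model names `site_independent` and `site_dependent`, the summary statistics, and all parameter values are hypothetical toy stand-ins chosen only to make the example runnable.

```python
import math
import random


def simulate(model, n_sites, rng):
    """Toy simulator (hypothetical, not a real substitution model).

    Returns one boolean per site: True if the site was substituted.
    """
    p = rng.uniform(0.0, 0.5)  # substitution probability drawn from a flat prior
    if model == "site_independent":
        probs = [p] * n_sites
    else:  # "site_dependent": odd-indexed sites are strongly constrained
        probs = [p if i % 2 == 0 else 0.1 * p for i in range(n_sites)]
    return [rng.random() < q for q in probs]


def summary(changed):
    """Summary statistics: substituted fraction at even and at odd sites."""
    even, odd = changed[0::2], changed[1::2]
    return (sum(even) / len(even), sum(odd) / len(odd))


def abc_model_choice(observed, models, n_sims=2000, accept_frac=0.05,
                     n_sites=200, seed=0):
    """Rejection ABC: keep the simulations closest to the observed summary
    and report the share of each model among the accepted simulations."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        model = rng.choice(models)  # uniform prior over candidate models
        stats = summary(simulate(model, n_sites, rng))
        draws.append((math.dist(stats, observed), model))
    draws.sort(key=lambda t: t[0])
    kept = draws[:max(1, int(accept_frac * n_sims))]
    return {m: sum(1 for _, k in kept if k == m) / len(kept) for m in models}


if __name__ == "__main__":
    models = ["site_independent", "site_dependent"]
    observed = (0.40, 0.04)  # hypothetical observed summary statistics
    print(abc_model_choice(observed, models))
```

The share of each model among the accepted simulations approximates its posterior probability under a uniform model prior. A regression adjustment would additionally correct the accepted simulations for their remaining distance to the observed summary statistics.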
2. Substitution Models of Protein Evolution with Selection on Enzymatic Activity. Mol Biol Evol 2024; 41:msae026. [PMID: 38314876 PMCID: PMC10873502 DOI: 10.1093/molbev/msae026] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2023] [Revised: 01/25/2024] [Accepted: 01/31/2024] [Indexed: 02/07/2024] [Open Access]
Abstract
Substitution models of evolution are necessary for diverse evolutionary analyses, including phylogenetic tree and ancestral sequence reconstructions. At the protein level, empirical substitution models are traditionally used due to their simplicity, but they ignore the variability of substitution patterns among protein sites. To improve the realism of models of protein evolution, a series of structurally constrained substitution models was later introduced, but these usually still ignore constraints on protein activity. Here, we present a substitution model of protein evolution with selection on both protein structure and enzymatic activity that can be applied to phylogenetics. In particular, the model considers the binding affinity of the enzyme-substrate complex as well as structural constraints, quantified through molecular dynamics simulations, that include the flexibility of structural flaps, hydrogen bonds, the radius of gyration of the amino acid backbone, and solvent-accessible surface area. We applied the model to the HIV-1 protease and evaluated it by phylogenetic likelihood in comparison with the best-fitting empirical substitution model and a structurally constrained substitution model that ignores the enzymatic activity. We found that accounting for selection on protein activity improves the fit of the modeled functional regions to the real observations, especially in data with high molecular identity, which recommends considering constraints on protein activity in the development of substitution models of evolution.
3. Continuous evolution of user-defined genes at 1-million-times the genomic mutation rate. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2023:2023.11.13.566922. [PMID: 38014077 PMCID: PMC10680746 DOI: 10.1101/2023.11.13.566922] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
When nature maintains or evolves a gene's function over millions of years at scale, it produces a diversity of homologous sequences whose patterns of conservation and change contain rich structural, functional, and historical information about the gene. However, natural gene diversity likely excludes vast regions of functional sequence space and includes phylogenetic and evolutionary eccentricities, limiting what information we can extract. We introduce an accessible experimental approach for compressing long-term gene evolution to laboratory timescales, allowing for the direct observation of extensive adaptation and divergence followed by inference of structural, functional, and environmental constraints for any selectable gene. To enable this approach, we developed a new orthogonal DNA replication (OrthoRep) system that durably hypermutates chosen genes at a rate of >10⁻⁴ substitutions per base in vivo. When OrthoRep was used to evolve a conditionally essential maladapted enzyme, we obtained thousands of unique multi-mutation sequences with many pairs >60 amino acids apart (>15% divergence), revealing known and new factors influencing enzyme adaptation. The fitness of evolved sequences was not predictable by advanced machine learning models trained on natural variation. We suggest that OrthoRep supports the prospective and systematic discovery of constraints shaping gene evolution, uncovering of new regions in fitness landscapes, and general applications in biomolecular engineering.
4. Function of the Conserved Non-Functional Residues in Apomyoglobin - to Determine and to Preserve Correct Topology of the Protein. BIOCHEMISTRY. BIOKHIMIIA 2023; 88:1905-1909. [PMID: 38105207 DOI: 10.1134/s0006297923110184] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 07/18/2023] [Accepted: 08/31/2023] [Indexed: 12/19/2023]
Abstract
This paper presents an answer to O. B. Ptitsyn's question, "What is the role of conserved non-functional residues in apomyoglobin?", based on the research results of three laboratories. Conserved non-functional residues of apomyoglobin are shown to play a role in forming the native topology in the molten globule state of this protein. This suggests that these residues are indispensable for fixing and maintaining the main elements of the correct topology of its secondary structure in the intermediate state. The correct topology is a native element in the intermediate state of the protein.
5. Exploring the aggregation of amyloid-β 42 through Monte Carlo simulations. Biophys Chem 2023; 297:107011. [PMID: 37037120 DOI: 10.1016/j.bpc.2023.107011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2023] [Revised: 03/25/2023] [Accepted: 03/26/2023] [Indexed: 04/09/2023]
Abstract
Coarse-grained Monte Carlo simulations are performed for a disordered protein, amyloid-β 42, to identify the interactions and understand the mechanism of its aggregation. A statistical potential is developed from a selected dataset of intrinsically disordered proteins, which accounts for the respective contributions of the bonded and non-bonded potentials. While the bonded potential comprises the bond, bend, and dihedral constraints, the non-bonded interactions include van der Waals interactions, hydrogen bonds, and the two-body potential. The two-body potential captures the features of both hydrophobic and electrostatic interactions that bring the chains to a contact distance, while the repulsive van der Waals interactions prevent them from collapsing. Increased two-body hydrophobic interactions facilitate the formation of amorphous aggregates rather than fibrillar ones. The formation of aggregates is validated from the interchain distances and the total energy of the system. The aggregate is structurally characterized by the root-mean-square deviation, the root-mean-square fluctuation and the radius of gyration. The aggregates are characterized by a decrease in solvent-accessible surface area (SASA), an increase in the non-local interactions and a distinct free energy minimum relative to that of the monomeric state of amyloid-β 42. The hydrophobic residues help in nucleation, while the charged residues help in oligomerization and aggregation.
6. Evolutionary conservation of amino acids contributing to the protein folding transition state. J Comput Chem 2023; 44:1002-1009. [PMID: 36571461 DOI: 10.1002/jcc.27060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 11/22/2022] [Accepted: 12/06/2022] [Indexed: 12/27/2022]
Abstract
The question of whether amino acids critical to protein folding kinetics are evolutionarily conserved has been investigated intensively in the past, but no consensus has yet been reached. Recently, we have demonstrated that the transition state, dictating folding kinetics, is characterized as the state of maximum dynamic cooperativity, i.e., the state of maximum correlations between amino acid contact formations. Here, we investigate the evolutionary conservation of those amino acids contributing significantly to the dynamic cooperativity. We find a strong indication of a new kind of relationship (necessary but not sufficient causality) between the evolutionary conservation and the dynamic cooperativity: larger contributions to the dynamic cooperativity arise from more conserved residues, but not vice versa. This holds for all the protein systems for which long folding simulation trajectories are available. To our knowledge, this is the first systematic demonstration of any kind of evolutionary conservation of amino acids relevant to folding kinetics.
7. Folding and Evolution of a Repeat Protein on the Ribosome. Front Mol Biosci 2022; 9:851038. [PMID: 35707224 PMCID: PMC9189291 DOI: 10.3389/fmolb.2022.851038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2022] [Accepted: 04/27/2022] [Indexed: 12/04/2022] [Open Access]
Abstract
Life on earth is the result of the work of proteins, the cellular nanomachines that fold into elaborate 3D structures to perform their functions. The ribosome synthesizes all the proteins of the biosphere, and many of them begin to fold during translation in a process known as cotranslational folding. In this work we discuss current advances in this field and provide computational and experimental data that highlight the role of the ribosome in the evolution of protein structures. First, we used the sequence of the Ankyrin domain from the Drosophila Notch receptor to launch a deep sequence-based search. With this strategy, we found a conserved 33-residue motif shared by different protein folds. Then, to see how the vectorial addition of the motif would generate a full structure, we measured the folding of the Ankyrin repeat protein on the ribosome. Not only are the on-ribosome folding data in full agreement with classical in vitro biophysical measurements, but they also provide experimental evidence on how folded proteins could have evolved by duplication and fusion of smaller fragments in the RNA world. Overall, we discuss how the ribosomal exit tunnel could be conceptualized as an active site that is under evolutionary pressure to influence protein folding.
8. Double Mutant of Chymotrypsin Inhibitor 2 Stabilized through Increased Conformational Entropy. Biochemistry 2022; 61:160-170. [PMID: 35019273 DOI: 10.1021/acs.biochem.1c00749] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 11/29/2022]
Abstract
The conformational heterogeneity of a folded protein can affect not only its function but also stability and folding. We recently discovered and characterized a stabilized double mutant (L49I/I57V) of the protein CI2 and showed that state-of-the-art prediction methods could not predict the increased stability relative to the wild-type protein. Here, we have examined whether changed native-state dynamics, and resulting entropy changes, can explain the stability changes in the double mutant protein, as well as the two single mutant forms. We have combined NMR relaxation measurements of the ps-ns dynamics of amide groups in the backbone and the methyl groups in the side chains with molecular dynamics simulations to quantify the native-state dynamics. The NMR experiments reveal that the mutations have different effects on the conformational flexibility of CI2: a reduction in conformational dynamics (and entropy estimated from this) of the native state of the L49I variant correlates with its decreased stability, while increased dynamics of the I57V and L49I/I57V variants correlates with their increased stability. These findings suggest that explicitly accounting for changes in native-state entropy might be needed to improve the predictions of the effect of mutations on protein stability.
9. Impact of Deleterious Mutations on Structure, Function and Stability of Serum/Glucocorticoid Regulated Kinase 1: A Gene to Diseases Correlation. Front Mol Biosci 2021; 8:780284. [PMID: 34805284 PMCID: PMC8597711 DOI: 10.3389/fmolb.2021.780284] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 09/24/2021] [Accepted: 10/19/2021] [Indexed: 11/22/2022] [Open Access]
Abstract
Serum and glucocorticoid-regulated kinase 1 (SGK1) is a Ser/Thr protein kinase involved in regulating cell survival, growth, proliferation, and migration. Its elevated expression and dysfunction are reported in breast, prostate, hepatocellular, lung adenoma, and renal carcinomas. We have analyzed the SGK1 mutations to explore their impact at the sequence and structure level by utilizing state-of-the-art computational approaches. Several pathogenic and destabilizing mutations were identified based on their impact on SGK1 and analyzed in detail. Three amino acid substitutions, K127M, T256A, and Y298A, in the kinase domain of SGK1 were identified and incorporated structurally into the original coordinates of SGK1 to explore their impact over time using all-atom molecular dynamics (MD) simulations for 200 ns. MD results indicate substantial conformational alterations in SGK1, and thus loss of function, particularly upon the T256A mutation. This study provides meaningful insights into SGK1 dysfunction upon mutation, leading to disease progression, including cancer and neurodegeneration.
10. The local topological free energy of proteins. J Theor Biol 2021; 529:110854. [PMID: 34358536 DOI: 10.1016/j.jtbi.2021.110854] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/06/2021] [Revised: 07/27/2021] [Accepted: 07/29/2021] [Indexed: 11/16/2022]
Abstract
Protein folding, the process by which proteins attain a 3-dimensional conformation necessary for their function, remains an important unsolved problem in biology. A major gap in our understanding is how local properties of proteins relate to their global properties. In this manuscript, we use the Writhe and Torsion to introduce a new local topological/geometrical free energy that can be associated with 4 consecutive amino acids along the protein backbone. By analyzing a culled protein dataset from the PDB, our results show that high local topological free energy conformations are independent of sequence and may be involved in the rate-limiting step in protein folding. By analyzing a set of 2-state single-domain proteins, we find that the total local topological free energy of these proteins correlates with the experimentally observed folding rates reported in Plaxco et al. (2000).
11. Helix-Coil Transition at a Glycine Following a Nascent α-Helix: A Synergetic Guidance Mechanism for Helix Growth. J Phys Chem A 2020; 124:7478-7490. [PMID: 32877193 DOI: 10.1021/acs.jpca.0c05489] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/18/2022]
Abstract
A detailed understanding of forces guiding the rapid folding of a polypeptide from an apparently random coil state to an ordered α-helical structure following the rate-limiting preorganization of the initial three residue backbones into helical conformation is imperative to comprehending and regulating protein folding and for the rational design of biological mimetics. However, several details of this process are still unknown. First, although the helix-coil transition was proposed to originate at the residue level (J. Chem. Phys. 1959, 31, 526-535; J. Chem. Phys. 1961, 34, 1963-1974), all helix-folding studies have only established it between time-averaged bulk states of a long-lived helix and several transiently populated random coils, along the whole helix model sequence. Second, the predominant thermodynamic forces driving either this two-state transition or the faster helix growth following helix nucleation are still unclear. Third, the conformational space of the random coil state is not well-defined unlike its corresponding α-helix. Here we investigate the restrictions placed on the conformational space of a Gly residue backbone, as a result of it immediately succeeding a nascent α-helical turn. Analyses of the temperature-dependent 1D-, 2D-NMR, FT-IR, and CD spectra and GROMACS MD simulation trajectory of a Gly residue backbone following a model α-helical turn, which is artificially rigidified by a covalent hydrogen bond surrogate, reveal that: (i) the α-helical turn guides the ϕ torsion of the Gly exclusively into either a predominantly populated entropically favored α-helical (α-ϕ) state or a scarcely populated random coil (RC-ϕ) state; (ii) the α-ϕ state of Gly in turn favors the stability of the preceding α-helical turn, while the RC-ϕ state disrupts it, revealing an entropy-driven synergetic guidance for helix growth in the residue following helix nucleation. 
The applicability of a current synergetic guidance mechanism to explain rapid helix growth in folded and unfolded states of proteins and helical peptides is discussed.
12. α-Lactalbumin, Amazing Calcium-Binding Protein. Biomolecules 2020; 10:biom10091210. [PMID: 32825311 PMCID: PMC7565966 DOI: 10.3390/biom10091210] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 07/16/2020] [Revised: 08/14/2020] [Accepted: 08/17/2020] [Indexed: 02/06/2023] [Open Access]
Abstract
α-Lactalbumin (α-LA) is a small (Mr 14,200), acidic (pI 4–5), Ca²⁺-binding protein. α-LA is a regulatory component of the lactose synthase enzyme system functioning in the lactating mammary gland. The protein possesses a single strong Ca²⁺-binding site, which can also bind Mg²⁺, Mn²⁺, Na⁺, K⁺, and some other metal cations. It contains several distinct Zn²⁺-binding sites. Physical properties of α-LA strongly depend on the occupation of its metal binding sites by metal ions. In the absence of bound metal ions, α-LA is in the molten globule-like state. The binding of metal ions, and especially of Ca²⁺, increases the stability of α-LA against the action of heat, various denaturing agents and proteases, while the binding of Zn²⁺ to the Ca²⁺-loaded protein decreases its stability and causes its aggregation. At pH 2, the protein is in the classical molten globule state. α-LA can associate with membranes at neutral or slightly acidic pH at physiological temperatures. Depending on external conditions, α-LA can form amyloid fibrils, amorphous aggregates, nanoparticles, and nanotubes. Some of these aggregated states of α-LA can be used in practical applications such as drug delivery to tissues and organs. α-LA and some of its fragments possess bactericidal and antiviral activities. Complexes of partially unfolded α-LA with oleic acid are cytotoxic to various tumor and bacterial cells. In these cytotoxic complexes, α-LA acts as a delivery carrier of cytotoxic fatty acid molecules into tumor and bacterial cells across the cell membrane. Perhaps in the future the complexes of α-LA with oleic acid will be used for the development of new anti-cancer drugs.
13. Experimentally-driven protein structure modeling. J Proteomics 2020; 220:103777. [PMID: 32268219 PMCID: PMC7214187 DOI: 10.1016/j.jprot.2020.103777] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Received: 11/18/2019] [Revised: 03/17/2020] [Accepted: 04/02/2020] [Indexed: 11/25/2022]
Abstract
Revolutions in the natural and exact sciences that started at the dawn of the last century have led to an explosion of theoretical, experimental, and computational approaches to determine the structures of molecules and complexes, as well as their rich conformational dynamics. Since different experimental methods produce information that is attributed to specific time and length scales, the corresponding computational methods have to be tailored to these scales and experiments. These methods can then be combined and integrated across scales, producing a fuller picture of molecular structure and motion from the "puzzle pieces" offered by various experiments. Here, we describe a number of computational approaches that use experimental data to gain insight into the structure of proteins and understand their dynamics. We also discuss the limitations and the resolution of constraint-based modeling approaches. SIGNIFICANCE: Experimentally-driven computational structure modeling and determination is a rapidly evolving alternative to traditional approaches for molecular structure determination. These new hybrid experimental-computational approaches are proving to be a powerful microscope for examining the structural features of intrinsically or partially disordered proteins and the dynamics of molecules and complexes. In this review, we describe various approaches in the field of experimentally-driven computational structure modeling.
14. funtrp: identifying protein positions for variation driven functional tuning. Nucleic Acids Res 2020; 47:e142. [PMID: 31584091 PMCID: PMC6868392 DOI: 10.1093/nar/gkz818] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 06/25/2019] [Revised: 09/05/2019] [Accepted: 09/12/2019] [Indexed: 12/12/2022] [Open Access]
Abstract
Evaluating the impact of non-synonymous genetic variants is essential for uncovering disease associations and mechanisms of evolution. An in-depth understanding of sequence changes is also fundamental for synthetic protein design and stability assessments. However, the variant effect predictor performance gain observed in recent years has not kept up with the increased complexity of new methods. One likely reason for this might be that most approaches use similar sets of gene and protein features for modeling variant effects, often emphasizing sequence conservation. While high levels of conservation highlight residues essential for protein activity, much of the variation observable in vivo is arguably weaker in its impact, thus requiring evaluation at a higher level of resolution. Here, we describe function Neutral/Toggle/Rheostat predictor (funtrp), a novel computational method that categorizes protein positions based on the position-specific expected range of mutational impacts: Neutral (weak/no effects), Rheostat (function-tuning positions), or Toggle (on/off switches). We show that position types do not correlate strongly with familiar protein features such as conservation or protein disorder. We also find that position type distribution varies across different protein functions. Finally, we demonstrate that position types can improve performance of existing variant effect predictors and suggest a way forward for the development of new ones.
15.
Abstract
Few models of sequence evolution incorporate parameters describing protein structure, despite its high conservation, essential functional role and increasing availability. We present a structurally aware empirical substitution model for amino acid sequence evolution in which proteins are expressed using an expanded alphabet that relays both amino acid identity and structural information. Each character specifies an amino acid as well as information about the rotamer configuration of its side-chain: the discrete geometric pattern of permitted side-chain atomic positions, as defined by the dihedral angles between covalently linked atoms. By assigning rotamer states in 251,194 protein structures and identifying 4,508,390 substitutions between closely related sequences, we generate a 55-state “Dayhoff-like” model that shows that the evolutionary properties of amino acids depend strongly upon side-chain geometry. The model performs as well as or better than traditional 20-state models for divergence time estimation, tree inference, and ancestral state reconstruction. We conclude that not only is rotamer configuration a valuable source of information for phylogenetic studies, but that modeling the concomitant evolution of sequence and structure may have important implications for understanding protein folding and function.
16. Folding of the Ig-Like Domain of the Dengue Virus Envelope Protein Analyzed by High-Hydrostatic-Pressure NMR at a Residue-Level Resolution. Biomolecules 2019; 9:biom9080309. [PMID: 31357538 PMCID: PMC6723665 DOI: 10.3390/biom9080309] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/06/2019] [Revised: 07/17/2019] [Accepted: 07/24/2019] [Indexed: 12/15/2022] [Open Access]
Abstract
Dengue fever is a mosquito-borne endemic disease in tropical and subtropical regions, causing a significant public health problem in Southeast Asia. Domain III (ED3) of the viral envelope protein contains the two dominant putative epitopes and part of the heparin sulfate receptor binding region that drives the dengue virus (DENV)’s fusion with the host cell. Here, we used high-hydrostatic-pressure nuclear magnetic resonance (HHP-NMR) to obtain residue-specific information on the folding process of domain III from serotype 4 dengue virus (DEN4-ED3), which adopts the classical three-dimensional (3D) β-sandwich structure known as the Ig-like fold. Interestingly, the folding pathway of DEN4-ED3 shares similarities with that of the Titin I27 module, which also adopts an Ig-like fold, but is functionally unrelated to ED3. For both proteins, the unfolding process starts by the disruption of the N- and C-terminal strands on one edge of the β-sandwich, yielding a folding intermediate stable over a substantial pressure range (from 600 to 1000 bar). In contrast to this similarity, pressure-jump kinetics indicated that the folding transition state is considerably more hydrated in DEN4-ED3 than in Titin I27.
17. Complex Folding Landscape of Apomyoglobin at Acidic pH Revealed by Ultrafast Kinetic Analysis of Core Mutants. J Phys Chem B 2018; 122:11228-11239. [PMID: 30133301 DOI: 10.1021/acs.jpcb.8b06895] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
Abstract
Under mildly acidic conditions (pH 4-4.5) apomyoglobin (apoMb) adopts a partially structured equilibrium state (M-state) that structurally resembles a kinetic intermediate encountered at a late stage of folding to the native structure at neutral pH. We have previously reported that the M-state is formed rapidly (<1 ms) via a multistate process and thus offers a unique opportunity for exploring early stages of folding by both experimental and computational techniques. In order to gain structural insight into intermediates and barriers at the residue level, we studied the folding/unfolding kinetics of 12 apoMb mutants at pH 4.2 using fluorescence-detected ultrafast mixing techniques. Global analysis of the submillisecond folding/unfolding kinetics vs urea concentration for each variant, based on a sequential four-state mechanism (U ⇔ I ⇔ L ⇔ M), allowed us to determine elementary rate constants and their dependence on urea concentration for most transitions. Comparison of the free energy diagrams constructed from the kinetic data of the mutants with that of wild-type apoMb yielded quantitative information on the effects of mutations on the free energy (ΔΔG) of both intermediates and the first two kinetic barriers encountered during folding. Truncation of conserved aliphatic side chains on helices A, G, and H gives rise to a stepwise increase in ΔΔG as the protein advances from U toward M, consistent with progressive stabilization of native-like contacts within the primary core of apoMb. Helix-helix contacts in the primary core contribute little to the first folding barrier (U ⇔ I) and thus are not required for folding initiation but are critical for the stability of the late intermediate, L, and the M-state.
Alanine substitution of hydrophobic residues at more peripheral helix-helix contact sites of the native structure, which are still absent or unstable in the M-state, shows both positive (destabilizing) and negative (stabilizing) ΔΔG, indicating that non-native contacts are formed initially and weakened or lost as a result of subsequent structural rearrangement steps.
18. A method for partitioning the information contained in a protein sequence between its structure and function. Proteins 2018; 86:956-964. [PMID: 29790601 DOI: 10.1002/prot.25527] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/16/2018] [Revised: 04/27/2018] [Accepted: 05/14/2018] [Indexed: 11/11/2022]
Abstract
Proteins employ the information stored in the genetic code and translated into their sequences to carry out well-defined functions in the cellular environment. The possibility of encoding such functions is controlled by the balance between the amount of information supplied by the sequence and the amount left after the protein has folded into its structure. We study the amount of information necessary to specify the protein structure, providing an estimate that takes into account the thermodynamic properties of protein folding. We thus show that the information remaining in the protein sequence after encoding its structure (the 'information gap') is very close to what is needed to encode its function and interactions. Then, by predicting the information gap directly from the protein sequence, we show that it may be possible to use these insights from information theory to discriminate between ordered and disordered proteins, to identify unknown functions, and to optimize artificially-designed protein sequences.
|
19
|
Kinetic and thermodynamic studies reveal chemokine homologues CC11 and CC24 with an almost identical tertiary structure have different folding pathways. BMC BIOPHYSICS 2017; 10:7. [PMID: 28919974 PMCID: PMC5596964 DOI: 10.1186/s13628-017-0039-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/31/2017] [Accepted: 09/06/2017] [Indexed: 11/10/2022]
Abstract
BACKGROUND Proteins with low sequence identity but almost identical tertiary structure and function have been valuable for uncovering the relationship between sequence, tertiary structure, folding mechanism and function. Two homologous chemokines, CCL11 and CCL24, with low sequence identity but similar tertiary structure and function, provide an excellent model system for such studies. RESULTS The kinetics and thermodynamics of the two homologous chemokines were systematically characterized. Despite their similar tertiary structures, CCL11 and CCL24 show different thermodynamic stability in guanidine hydrochloride titration, with D50% = 2.20 M and 4.96 M, respectively. The kinetic curves clearly show two phases in the folding/unfolding processes of both CCL11 and CCL24, which suggests the existence of an intermediate state in their folding/unfolding processes. The folding pathway of both CCL11 and CCL24 could be well described using a folding model with an on-pathway folding intermediate. However, the folding kinetics and stability of the intermediate state of CCL11 and CCL24 are clearly different. CONCLUSION Our results suggest that homologous proteins with low sequence identity can display almost identical tertiary structure but very different folding mechanisms. This applies to homologues in the chemokine protein family, extending the general applicability of the above observation.
|
20
|
Biophysical Models of Protein Evolution: Understanding the Patterns of Evolutionary Sequence Divergence. Annu Rev Biophys 2017; 46:85-103. [PMID: 28301766 DOI: 10.1146/annurev-biophys-070816-033819] [Citation(s) in RCA: 68] [Impact Index Per Article: 9.7] [Indexed: 12/30/2022]
Abstract
For decades, rates of protein evolution have been interpreted in terms of the vague concept of functional importance. Slowly evolving proteins or sites within proteins were assumed to be more functionally important and thus subject to stronger selection pressure. More recently, biophysical models of protein evolution, which combine evolutionary theory with protein biophysics, have completely revolutionized our view of the forces that shape sequence divergence. Slowly evolving proteins have been found to evolve slowly because of selection against toxic misfolding and misinteractions, linking their rate of evolution primarily to their abundance. Similarly, most slowly evolving sites in proteins are not directly involved in function, but mutating these sites has a large impact on protein structure and stability. In this article, we review the studies in the emerging field of biophysical protein evolution that have shaped our current understanding of sequence divergence patterns. We also propose future research directions to develop this nascent field.
|
21
|
Sequence-, structure-, and dynamics-based comparisons of structurally homologous CheY-like proteins. Proc Natl Acad Sci U S A 2017; 114:1578-1583. [PMID: 28143938 DOI: 10.1073/pnas.1621344114] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 11/18/2022] Open
Abstract
We recently introduced a physically based approach to sequence comparison, the property factor method (PFM). In the present work, we apply the PFM approach to the study of a challenging set of sequences: the bacterial chemotaxis protein CheY, the N-terminal receiver domain of the nitrogen regulation protein NT-NtrC, and the sporulation response regulator Spo0F. These are all response regulators involved in signal transduction. Despite functional similarity and structural homology, they exhibit low sequence identity. PFM sequence comparison demonstrates a statistically significant qualitative difference between the sequence of CheY and those of the other two proteins that is not found using conventional alignment methods. This difference is shown to be consonant with structural characteristics, using distance matrix comparisons. We also demonstrate that residues participating strongly in native contacts during unfolding are distributed differently in CheY than in the other two proteins. The PFM result is also in accord with dynamic simulation results of several types. Molecular dynamics simulations of all three proteins were carried out at several temperatures, and it is shown that the dynamics of CheY are predicted to differ from those of NT-NtrC and Spo0F. The predicted dynamic properties of the three proteins are in good agreement with experimentally determined B factors and with fluctuations predicted by the Gaussian network model. We pinpoint the differences between the PFM and traditional sequence comparisons and discuss the informatic basis for the ability of the PFM approach to detect physical differences between these sequences that are not apparent from traditional alignment-based comparison.
|
22
|
Early Folding Events, Local Interactions, and Conservation of Protein Backbone Rigidity. Biophys J 2017; 110:572-583. [PMID: 26840723 DOI: 10.1016/j.bpj.2015.12.028] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 08/27/2015] [Revised: 12/21/2015] [Accepted: 12/29/2015] [Indexed: 01/20/2023] Open
Abstract
Protein folding is in its early stages largely determined by the protein sequence and complex local interactions between amino acids, resulting in lower energy conformations that provide the context for further folding into the native state. We compiled a comprehensive data set of early folding residues based on pulsed labeling hydrogen deuterium exchange experiments. These early folding residues have corresponding higher backbone rigidity as predicted by DynaMine from sequence, an effect also present when accounting for the secondary structures in the folded protein. We then show that the amino acids involved in early folding events are not more conserved than others, but rather, early folding fragments and the secondary structure elements they are part of show a clear trend toward conserving a rigid backbone. We therefore propose that backbone rigidity is a fundamental physical feature conserved by proteins that can provide important insights into their folding mechanisms and stability.
|
23
|
Fold and flexibility: what can proteins' mechanical properties tell us about their folding nucleus? J R Soc Interface 2016; 12:rsif.2015.0876. [PMID: 26577596 DOI: 10.1098/rsif.2015.0876] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 01/31/2023] Open
Abstract
The determination of a protein's folding nucleus, i.e. a set of native contacts playing an important role during its folding process, remains an elusive yet essential problem in biochemistry. In this work, we investigate the mechanical properties of 70 protein structures belonging to 14 protein families presenting various folds using coarse-grain Brownian dynamics simulations. The resulting rigidity profiles combined with multiple sequence alignments show that a limited set of rigid residues, which we call the consensus nucleus, occupy conserved positions along the protein sequence. These residues' side chains form a tight interaction network within the protein's core, thus making our consensus nuclei potential folding nuclei. A review of experimental and theoretical literature shows that most (above 80%) of these residues were indeed identified as folding nucleus members in earlier studies.
|
24
|
Benchmarking Inverse Statistical Approaches for Protein Structure and Design with Exactly Solvable Models. PLoS Comput Biol 2016; 12:e1004889. [PMID: 27177270 PMCID: PMC4866778 DOI: 10.1371/journal.pcbi.1004889] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 08/22/2015] [Accepted: 03/30/2016] [Indexed: 12/05/2022] Open
Abstract
Inverse statistical approaches to determine protein structure and function from Multiple Sequence Alignments (MSA) are emerging as powerful tools in computational biology. However, the underlying assumptions about the relationship between the inferred effective Potts Hamiltonian and real protein structure and energetics remain untested so far. Here we use lattice protein (LP) models to benchmark those inverse statistical approaches. We build MSA of highly stable sequences in target LP structures, and infer the effective pairwise Potts Hamiltonians from those MSA. We find that inferred Potts Hamiltonians reproduce many important aspects of ‘true’ LP structures and energetics. Careful analysis reveals that effective pairwise couplings in inferred Potts Hamiltonians depend not only on the energetics of the native structure but also on competing folds; in particular, the coupling values reflect both positive design (stabilization of native conformation) and negative design (destabilization of competing folds). In addition to providing detailed structural information, the inferred Potts models used as protein Hamiltonian for design of new sequences are able to generate with high probability completely new sequences with the desired folds, which is not possible using independent-site models. Those are remarkable results as the effective LP Hamiltonians used to generate MSA are not simple pairwise models due to the competition between the folds. Our findings elucidate the reasons for the success of inverse approaches to the modelling of proteins from sequence data, and their limitations. Inverse statistical approaches, modeling pairwise correlations between amino acids in the sequences of homologous proteins across many different organisms, can successfully extract protein structure (contact) information. 
Here, we benchmark those statistical approaches on exactly solvable models of proteins, folding on a 3D lattice, to assess the reasons underlying their success and their limitations. We show that the inferred parameters (effective pairwise interactions) of the statistical models have clear and quantitative interpretations in terms of positive (favoring the native fold) and negative (disfavoring competing folds) protein sequence design. New sequences randomly drawn from the statistical models are likely to fold into the native structures when effective pairwise interactions are accurately inferred, a performance which cannot be achieved with independent-site models.
|
25
|
Functional Sites Induce Long-Range Evolutionary Constraints in Enzymes. PLoS Biol 2016; 14:e1002452. [PMID: 27138088 PMCID: PMC4854464 DOI: 10.1371/journal.pbio.1002452] [Citation(s) in RCA: 80] [Impact Index Per Article: 10.0] [Received: 11/23/2015] [Accepted: 04/04/2016] [Indexed: 12/26/2022] Open
Abstract
Functional residues in proteins tend to be highly conserved over evolutionary time. However, to what extent functional sites impose evolutionary constraints on nearby or even more distant residues is not known. Here, we report pervasive conservation gradients toward catalytic residues in a dataset of 524 distinct enzymes: evolutionary conservation decreases approximately linearly with increasing distance to the nearest catalytic residue in the protein structure. This trend encompasses, on average, 80% of the residues in any enzyme, and it is independent of known structural constraints on protein evolution such as residue packing or solvent accessibility. Further, the trend exists in both monomeric and multimeric enzymes and irrespective of enzyme size and/or location of the active site in the enzyme structure. By contrast, sites in protein–protein interfaces, unlike catalytic residues, are only weakly conserved and induce only minor rate gradients. In aggregate, these observations show that functional sites, and in particular catalytic residues, induce long-range evolutionary constraints in enzymes. Catalytic sites in enzymes are highly conserved, but do they affect the evolutionary conservation of neighboring sites? This study shows that not just nearby neighbors but also second, third, fourth, and even fifth neighbors of a catalytic residue experience evolutionary constraint compared to a random site. The basic biochemical functions of life are carried out by large molecules called enzymes. Enzymes consist of long chains of amino acids folded into a three-dimensional structure. Within that structure, a specific cluster of amino acids, known as the active site, performs the biochemical function. Substituting one amino acid for another in the active site typically results in a defective, non-functional enzyme, and therefore mutations at or near enzyme active sites are often lethal. 
Moreover, even mutations far from the active site have been found to disrupt function. Nonetheless, as organisms evolve, enzymes accumulate random mutations. Where in enzymes’ structures do these mutations accumulate without causing harm? Here, we observe evidence for extensive interactions between active sites and distant regions of the enzyme structure, in a comprehensive set of over 500 enzymes. We show that active sites tightly control the substitutions that an enzyme can tolerate. This control extends far beyond regions of the enzyme immediately adjacent to the active site, covering over 80% of a typical enzyme structure. Our findings have broad implications for molecular evolution, for enzyme engineering, and for the computational prediction of active-site locations in novel enzymes.
|
26
|
Abstract
Allosteric transition, defined as conformational changes induced by ligand binding, is one of the fundamental properties of proteins. Allostery has been observed and characterized in many proteins, and has been recently utilized to control protein function via regulation of protein activity. Here, we review the physical and evolutionary origin of protein allostery, as well as its importance to protein regulation, drug discovery, and biological processes in living systems. We describe recently developed approaches to identify allosteric pathways, connected sets of pairwise interactions that are responsible for propagation of conformational change from the ligand-binding site to a distal functional site. We then present experimental and computational protein engineering approaches for control of protein function by modulation of allosteric sites. As an example of application of these approaches, we describe a synergistic computational and experimental approach to rescue the cystic-fibrosis-associated protein cystic fibrosis transmembrane conductance regulator, which upon deletion of a single residue misfolds and causes disease. This example demonstrates the power of allosteric manipulation in proteins to both elucidate mechanisms of molecular function and to develop therapeutic strategies that rescue those functions. Allosteric control of proteins provides a tool to shine a light on the complex cascades of cellular processes and facilitate unprecedented interrogation of biological systems.
|
27
|
Understanding the Molecular Dynamics of Type-2 Diabetes Drug Target DPP-4 and its Interaction with Sitagliptin and Inhibitor Diprotin-A. Cell Biochem Biophys 2014; 70:907-22. [DOI: 10.1007/s12013-014-9998-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Indexed: 12/25/2022]
|
28
|
Allosteric regulation of the Hsp90 dynamics and stability by client recruiter cochaperones: protein structure network modeling. PLoS One 2014; 9:e86547. [PMID: 24466147 PMCID: PMC3896489 DOI: 10.1371/journal.pone.0086547] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 11/09/2013] [Accepted: 12/06/2013] [Indexed: 12/29/2022] Open
Abstract
The fundamental role of the Hsp90 chaperone in supporting functional activity of diverse protein clients is anchored by specific cochaperones. A family of immune sensing client proteins is delivered to the Hsp90 system with the aid of cochaperones Sgt1 and Rar1 that act cooperatively with Hsp90 to form allosterically regulated dynamic complexes. In this work, functional dynamics and protein structure network modeling are combined to dissect molecular mechanisms of Hsp90 regulation by the client recruiter cochaperones. Dynamic signatures of the Hsp90-cochaperone complexes are manifested in differential modulation of the conformational mobility in the Hsp90 lid motif. Consistent with the experiments, we have determined that targeted reorganization of the lid dynamics is a unifying characteristic of the client recruiter cochaperones. Protein network analysis of the essential conformational space of the Hsp90-cochaperone motions has identified structurally stable interaction communities, interfacial hubs and key mediating residues of allosteric communication pathways that act concertedly with the shifts in conformational equilibrium. The results have shown that client recruiter cochaperones can orchestrate global changes in the dynamics and stability of the interaction networks that could enhance the ATPase activity and assist in the client recruitment. The network analysis has recapitulated a broad range of structural and mutagenesis experiments, particularly clarifying the elusive role of Rar1 as a regulator of the Hsp90 interactions and a stability enhancer of the Hsp90-cochaperone complexes. Small-world organization of the interaction networks in the Hsp90 regulatory complexes gives rise to a strong correspondence between highly connected local interfacial hubs, global mediator residues of allosteric interactions and key functional hot spots of the Hsp90 activity. 
We have found that cochaperone-induced conformational changes in Hsp90 may be determined by specific interaction networks that can inhibit or promote progression of the ATPase cycle and thus control the recruitment of client proteins.
|
29
|
An effective Coarse-grained model for biological simulations: Recent refinements and validations. Proteins 2013; 82:1168-85. [DOI: 10.1002/prot.24482] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Indexed: 01/22/2023]
|
30
|
An evolution-based approach to De Novo protein design and case study on Mycobacterium tuberculosis. PLoS Comput Biol 2013; 9:e1003298. [PMID: 24204234 PMCID: PMC3812052 DOI: 10.1371/journal.pcbi.1003298] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Received: 12/21/2012] [Accepted: 09/09/2013] [Indexed: 01/31/2023] Open
Abstract
Computational protein design is a reverse procedure of protein folding and structure prediction, where constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. Following this spirit, we developed a novel method to design new protein sequences based on evolutionarily related protein families. For a given target structure, a set of proteins having a similar fold are identified from the PDB library by structural alignments. A structural profile is then constructed from the protein templates and used to guide the conformational search of amino acid sequence space, where physicochemical packing is accommodated by single-sequence based solvation, torsion angle, and secondary structure predictions. The method was tested on a computational folding experiment based on a large set of 87 protein structures covering different fold classes, which showed that the evolution-based design significantly enhances the foldability and biological functionality of the designed sequences compared to the traditional physics-based force field methods. Without using homologous proteins, the designed sequences can be folded with an average root-mean-square-deviation of 2.1 Å to the target. As a case study, the method is extended to redesign all 243 structurally resolved proteins in the pathogenic bacterium Mycobacterium tuberculosis, which is the second leading cause of death from infectious disease. On a smaller scale, five sequences were randomly selected from the design pool and subjected to experimental validation. The results showed that all the designed proteins are soluble with distinct secondary structure and three have well ordered tertiary structure, as demonstrated by circular dichroism and NMR spectroscopy. 
Together, these results demonstrate a new avenue in computational protein design that uses knowledge of evolutionary conservation from protein structural families to engineer new protein molecules of improved fold stability and biological functionality. The goal of computational protein design is to create new protein sequences of desirable structure and biological function. Most protein design methods are developed to search for sequences with the lowest free energy based on physics-based force fields following Anfinsen's thermodynamic hypothesis. A major obstacle of such approaches is the inaccuracy of the force-field design, which cannot accurately describe atomic interactions or correctly recognize protein folds. We propose a novel method which uses evolutionary information, in the form of sequence profiles from structure families, to guide the sequence design. Since sequence profiles are generally more accurate than physics-based potentials in protein fold recognition, a unique advantage of the method is that it targets the design procedure to a family of protein sequence profiles to enhance the robustness of designed sequences. The method was tested on 87 proteins and the designed sequences can be folded by I-TASSER to models with an average RMSD of 2.1 Å. As a case study of large-scale application, the method is extended to redesign all structurally resolved proteins in the human pathogenic bacterium Mycobacterium tuberculosis. Five sequences varying in fold and size were characterized by circular dichroism and NMR spectroscopy experiments and three were shown to have ordered tertiary structure.
|
31
|
Abstract
Myoglobins are ubiquitous proteins that play a seminal role in oxygen storage, transport, and NO metabolism. The folding mechanism of apomyoglobins from different species has been studied to a fair extent over the last two decades. However, integrated investigations of the entire process, including both the early (sub-ms) and late (ms-s) folding stages, have been missing. Here, we study the folding kinetics of the single-Trp Escherichia coli globin apoHmpH via a combination of continuous-flow microfluidic and stopped-flow approaches. A rich series of molecular events emerges, spanning a very wide temporal range covering more than 7 orders of magnitude, from sub-microseconds to tens of seconds. Variations in fluorescence intensity and spectral shifts reveal that the protein region around Trp120 undergoes a fast collapse within the 8 μs mixing time and gradually reaches a native-like conformation with a half-life of 144 μs from refolding initiation. There are no further fluorescence changes beyond ca. 800 μs, and folding proceeds much more slowly, up to 20 s, with acquisition of the missing helicity (ca. 30%), long after consolidation of core compaction. The picture that emerges is a gradual acquisition of native structure on a free-energy landscape with few large barriers. Interestingly, the single tryptophan, which lies within the main folding core of globins, senses some local structural consolidation events after establishment of native-like core polarity (i.e., likely after core dehydration). In all, this work highlights how the main core of the globin fold is capable of becoming fully native efficiently, on the sub-millisecond time scale.
|
32
|
The how’s and why’s of protein folding intermediates. Arch Biochem Biophys 2013; 531:14-23. [DOI: 10.1016/j.abb.2012.10.006] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 08/29/2012] [Revised: 10/05/2012] [Accepted: 10/11/2012] [Indexed: 12/13/2022]
|
33
|
The interface of protein structure, protein biophysics, and molecular evolution. Protein Sci 2012; 21:769-85. [PMID: 22528593 PMCID: PMC3403413 DOI: 10.1002/pro.2071] [Citation(s) in RCA: 140] [Impact Index Per Article: 11.7] [Received: 03/02/2012] [Revised: 03/22/2012] [Accepted: 03/23/2012] [Indexed: 12/20/2022]
Abstract
The interface of protein structural biology, protein biophysics, molecular evolution, and molecular population genetics forms the foundations for a mechanistic understanding of many aspects of protein biochemistry. Current efforts in interdisciplinary protein modeling are in their infancy and the state of the art of such models is described. Beyond the relationship between amino acid substitution and static protein structure, protein function, and corresponding organismal fitness, other considerations are also discussed. More complex mutational processes such as insertion and deletion and domain rearrangements and even circular permutations should be evaluated. The role of intrinsically disordered proteins is still controversial, but may be increasingly important to consider. Protein geometry and protein dynamics as a deviation from static considerations of protein structure are also important. Protein expression level is known to be a major determinant of evolutionary rate and several considerations including selection at the mRNA level and the role of interaction specificity are discussed. Lastly, the relationship between modeling and needed high-throughput experimental data as well as experimental examination of protein evolution using ancestral sequence resurrection and in vitro biochemistry are presented, towards an aim of ultimately generating better models for biological inference and prediction.
|
34
|
Abstract
The sequences and structures of a large body of proteins are becoming increasingly available. It is desirable to explore mathematical tools for efficient extraction of information from such sources. The principles of graph theory, which were earlier applied in fields such as electrical engineering and computer networks, are now being adopted to investigate protein structure, folding, stability, function and dynamics. This review gives a brief account of relevant graphs and graph theoretic concepts. The concepts of protein graph construction are discussed. The manner in which graphs are analyzed and parameters relevant to protein structure are extracted is explained. The structural and biological information derived from protein structures using these methods is presented.
|
35
|
SEARCH FOR FOLDING INITIATION SITES FROM AMINO ACID SEQUENCE. J Bioinform Comput Biol 2011; 6:681-91. [DOI: 10.1142/s021972000800362x] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 10/13/2007] [Revised: 01/02/2008] [Accepted: 01/04/2008] [Indexed: 11/18/2022]
Abstract
A crucial event in protein folding is the formation of a folding nucleus, which is a structured part of the protein chain in the transition state. We demonstrate a correlation between locations of residues involved in the folding nuclei and locations of predicted amyloidogenic regions. The average Φ-values are significantly greater inside amyloidogenic regions than outside them. We have found that fibril formation and normal folding involve many of the same key residues, giving an opportunity to outline the folding initiation site in protein chains. The search for folding initiation sites for apomyoglobin and ribonuclease A coincides with the predictions made by other approaches.
|
36
|
A QUANTITATIVE ANALYSIS OF INTERFACIAL AMINO ACID CONSERVATION IN PROTEIN-PROTEIN HETERO COMPLEXES. J Bioinform Comput Biol 2011; 3:1137-50. [PMID: 16278951 DOI: 10.1142/s0219720005001429] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 12/01/2004] [Revised: 03/18/2005] [Accepted: 03/28/2005] [Indexed: 11/18/2022]
Abstract
A long-standing question in molecular biology is whether interfaces of protein-protein complexes are more conserved than the rest of the protein surfaces. Although it has been reported that conservation can be used as an indicator for predicting interaction sites on proteins, there are recent reports stating that the interface regions are only slightly more conserved than the rest of the protein surfaces, with conservation signals not being statistically significant enough for predicting protein-protein binding sites. In order to properly address these controversial reports we have studied a set of 28 well resolved hetero complex structures of proteins that consists of transient and non-transient complexes. The surface positions were classified into four conservation classes and the conservation index of the surface positions was quantitatively analyzed. The results indicate that the surface density of highly conserved positions is significantly higher in the protein-protein interface regions compared with the other regions of the protein surface. However, the average conservation index of the patches in the interface region is not significantly higher compared with other surface regions of the protein structures. This finding demonstrates that the number of conserved residue positions is a more appropriate indicator for predicting protein-protein binding sites than the average conservation index in the interacting region. We have further validated our findings on a set of 59 benchmark complex structures. Furthermore, an analysis of 19 complexes of antigen-antibody interactions shows that there is no conservation of amino acid positions in the interacting regions of these complexes, as expected, with the variable region of the immunoglobulins interacting mostly with the antigens. Interestingly, antigen interacting regions also have a higher number of non-conserved residue positions in the interacting region than the rest of the protein surface.
|
37
|
Fast Side Chain Replacement in Proteins Using a Coarse-Grained Approach for Evaluating the Effects of Mutation During Evolution. J Mol Evol 2011; 73:23-33. [DOI: 10.1007/s00239-011-9454-3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 12/09/2010] [Accepted: 07/14/2011] [Indexed: 11/28/2022]
|
38
|
Abstract
Recent years have witnessed an explosion in computational power, leading to attempts to model ever more complex systems. Nevertheless, there remain cases for which the use of brute-force computer simulations is clearly not the solution. In such cases, great benefit can be obtained from the use of physically sound simplifications. The introduction of such coarse graining can be traced back to the early usage of a simplified model in studies of proteins. Since then, the field has progressed tremendously. In this review, we cover both key developments in the field and potential future directions. Additionally, particular emphasis is given to two general approaches, namely the renormalization and reference potential approaches, which allow one to move back and forth between the coarse-grained (CG) and full models, as these approaches provide the foundation for CG modeling of complex systems.
|
39
|
Abstract
Recent years have witnessed a tremendous explosion in computational power, which in turn has resulted in great progress in the complexity of the biological and chemical problems that can be addressed by means of all-atom simulations. Despite this, however, our computational time is not infinite, and in fact many of the key problems of the field were resolved long before the existence of the current levels of computational power. This review will start by presenting a brief historical overview of the use of multiscale simulations in biology, and then present some key developments in the field, highlighting several cases where the use of a physically sound simplification is clearly superior to a brute-force approach. Finally, some potential future directions will be discussed.
|
40
|
Exploring the role of structure and dynamics in the function of chymotrypsin inhibitor 2. Proteins 2010; 79:916-24. [PMID: 21287622 DOI: 10.1002/prot.22930] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 04/29/2010] [Revised: 10/16/2010] [Accepted: 10/25/2010] [Indexed: 11/11/2022]
Abstract
Increasing awareness of the possible role of internal dynamics in protein function has led to the development of new methods for experimentally characterizing protein dynamics across multiple time scales, especially using NMR spectroscopy. A few analyses of the conformational dynamics of proteins ranging from nonallosteric single domains to multidomain allosteric enzymes are now available; however, demonstrating a connection between dynamics and function remains difficult on account of the comparative lack of studies examining both changes in dynamics and changes in function in response to the same perturbations. In previous work, we characterized changes in structure and dynamics on the ps–ns time scale resulting from hydrophobic core mutations in chymotrypsin inhibitor 2 and found that there are moderate, persistent global changes in dynamics in the absence of gross structural changes (Whitley et al., Biochemistry 2008;47:8566–8576). Here, we assay those and additional mutants for inhibitory ability toward the serine proteases elastase and chymotrypsin to determine the effects of mutation on function. Results indicate that core mutation has only a subtle effect on CI2 function. Using chemical shifts, we also studied the effect of complex formation on CI2 structure and found that perturbations are greatest at the complex interface but also propagate toward CI2's hydrophobic core. The structure–dynamics–function data set completed here suggests that dynamics plays a limited role in the function of this small model system, although we do observe a correlation between nanosecond-scale reactive loop motions and inhibitory ability for mutations at one key position in the hydrophobic core.
|
41
|
Folding intermediate and folding nucleus for I-->N and U-->I-->N transitions in apomyoglobin: contributions by conserved and nonconserved residues. Biophys J 2010; 98:1694-702. [PMID: 20409491 DOI: 10.1016/j.bpj.2009.12.4326] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Received: 05/31/2009] [Revised: 12/22/2009] [Accepted: 12/30/2009] [Indexed: 11/19/2022] Open
Abstract
A kinetic investigation of wild-type apomyoglobin and its 12 mutants with substitutions of hydrophobic residues by Ala was performed using stopped-flow fluorescence. Characteristics of the kinetic intermediate I and the folding nucleus were derived solely from kinetic data, namely, the slow-phase folding rate constants and the burst-phase amplitudes of Trp fluorescence intensity. This allowed us to pioneer the phi-analysis for apomyoglobin. As shown, these mutations drastically destabilized the native state N and produced a minor (for conserved residues of G, H helices) or even negligible (for nonconserved residues of B, C, D, E helices) destabilizing effect on the state I. On the other hand, conserved residues of A, G, H helices made a smaller contribution to the stability of the folding nucleus at the rate-limiting I-->N transition than nonconserved residues of B, D, E helices. Thus, conserved side chains of the A-, G-, H-residues become involved in the folding nucleus before crossing the main barrier, whereas nonconserved side chains of the B-, D-, E-residues join the nucleus in the course of the I-->N transition.
|
42
|
Molecular simulations provide new insights into the role of the accessory immunoglobulin-like domain of Cel9A. FEBS Lett 2010; 584:3431-5. [DOI: 10.1016/j.febslet.2010.06.041] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Received: 04/30/2010] [Revised: 06/13/2010] [Accepted: 06/28/2010] [Indexed: 11/21/2022]
|
43
|
Multiscale simulations of protein landscapes: using coarse-grained models as reference potentials to full explicit models. Proteins 2010; 78:1212-27. [PMID: 20052756 DOI: 10.1002/prot.22640] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Indexed: 11/11/2022]
Abstract
Evaluating the free-energy landscape of proteins and the corresponding functional aspects presents a major challenge for computer simulation approaches. This challenge is due to the complexity of the landscape and the enormous computer time needed for converging simulations. The use of simplified coarse-grained (CG) folding models offers an effective way of sampling the landscape, but such a treatment may not give the correct description of the effect of the actual protein residues. A general way around this problem that has been put forward in our early work (Fan et al., Theor Chem Acc 1999;103:77-80) uses the CG model as a reference potential for free-energy calculations of different properties of the explicit model. This method is refined and extended here, focusing on improving the electrostatic treatment and on demonstrating key applications. These applications include: evaluation of changes of folding energy upon mutations, calculations of transition-state binding free energies (which are crucial for rational enzyme design), evaluations of the catalytic landscape, and evaluations of the time-dependent responses to pH changes. Furthermore, the general potential of our approach in overcoming major challenges in studies of structure-function correlation in proteins is discussed.
|
44
|
Structural and functional constraints in the evolution of protein families. Nat Rev Mol Cell Biol 2009; 10:709-20. [PMID: 19756040 DOI: 10.1038/nrm2762] [Citation(s) in RCA: 137] [Impact Index Per Article: 9.1] [Indexed: 01/19/2023]
|
45
|
Multi-constraint computational design suggests that native sequences of germline antibody H3 loops are nearly optimal for conformational flexibility. Proteins 2009; 75:846-58. [PMID: 19194863 DOI: 10.1002/prot.22293] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.3] [Indexed: 11/07/2022]
Abstract
The limited size of the germline antibody repertoire has to recognize a far larger number of potential antigens. The ability of a single antibody to bind multiple ligands due to conformational flexibility in the antigen-binding site can significantly enlarge the repertoire. Among the six complementarity determining regions (CDRs) that generally comprise the binding site, the CDR H3 loop is particularly variable. Computational protein design studies showed that predicted low energy sequences compatible with a given backbone structure often have considerable similarity to the corresponding native sequences of naturally occurring proteins, indicating that native protein sequences are close to optimal for their structures. Here, we take a step forward to determine whether conformational flexibility, believed to play a key functional role in germline antibodies, is also central in shaping their native sequence. In particular, we use a multi-constraint computational design strategy, along with the Rosetta scoring function, to propose that the native sequences of CDR H3 loops from germline antibodies are nearly optimal for conformational flexibility. Moreover, we find that antibody maturation may lead to sequences with a higher degree of optimization for a single conformation, while disfavoring sequences that are intrinsically flexible. In addition, this computational strategy allows us to predict mutations in the CDR H3 loop to stabilize the antigen-bound conformation, a computational mimic of affinity maturation, that may increase antigen binding affinity by preorganizing the antigen binding loop. In vivo affinity maturation data are consistent with our predictions. The method described here can be useful to design antibodies with higher selectivity and affinity by reducing conformational diversity.
|
46
|
Local fluctuations vs. global unfolding of proteins investigated by limited proteolysis. BIOCATAL BIOTRANSFOR 2009. [DOI: 10.1080/10242420500183287] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 10/25/2022]
|
47
|
Abstract
The phenomenon of intra-protein communication is fundamental to such processes as allostery and signaling, yet comparatively little is understood about its physical origins despite notable progress in recent years. This review introduces contemporary but distinct frameworks for understanding intra-protein communication by presenting both the ideas behind them and a discussion of their successes and shortcomings. The first framework holds that intra-protein communication is accomplished by the sequential mechanical linkage of residues spanning a gap between distal sites. According to the second framework, proteins are best viewed as ensembles of distinct structural microstates, the dynamical and thermodynamic properties of which contribute to the experimentally observable macroscale properties. Nuclear magnetic resonance (NMR) spectroscopy is a powerful method for studying intra-protein communication, and the insights into both frameworks it provides are presented through a discussion of numerous examples from the literature. Distinct from mechanical and thermodynamic considerations of intra-protein communication are recently applied graph and network theoretic analyses. These computational methods reduce complex three-dimensional protein architectures to simple maps composed of nodes (residues) connected by edges (inter-residue "interactions"). Analysis of these graphs yields a characterization of the protein's topology and network characteristics. These methods have shown proteins to be "small world" networks with moderately high local residue connectivities existing concurrently with a small but significant number of long-range connectivities. However, experimental studies of the tantalizing idea that these putative long-range interaction pathways facilitate one or several macroscopic protein characteristics are unfortunately lacking at present.
This review concludes by comparing and contrasting the presented frameworks and methodologies for studying intra-protein communication and suggests a manner in which they can be brought to bear simultaneously to further enhance our understanding of this important fundamental phenomenon.
|
48
|
On the role of some conserved and nonconserved amino acid residues in the transitional state and intermediate of apomyoglobin folding. Mol Biol 2009. [DOI: 10.1134/s0026893309010178] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/23/2022]
|
49
|
pH-induced equilibrium unfolding of apomyoglobin: substitutions at conserved Trp14 and Met131 and non-conserved Val17 positions. BIOCHEMISTRY (MOSCOW) 2008; 73:693-701. [PMID: 18620536 DOI: 10.1134/s0006297908060102] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/23/2022]
Abstract
A number of residues in the globin family are well conserved but are not directly involved in the primary oxygen-carrying function of these proteins. A possible role for these conserved, non-functional residues has been suggested in promoting a rapid and correct folding process to the native tertiary structure. To test this hypothesis, we have studied pH-induced equilibrium unfolding of mutant apomyoglobins with substitutions of the conserved residues Trp14 and Met131, which are not involved in the function of myoglobin, by various amino acids. This allowed us to estimate their impact on the stability of various conformational states of the proteins and to select conditions for a folding kinetics study. The results obtained from circular dichroism, tryptophan fluorescence, and differential scanning microcalorimetry for these mutant proteins were compared with those for the wild type protein and for a mutant with the non-conserved Val17 substituted by Ala. In the native folded state, all of the mutant apoproteins have a compact globular structure, but are destabilized in comparison to the wild type protein. The pH-induced denaturation of the mutant proteins occurs through the formation of a molten globule-like intermediate similar to that of the wild type protein. Thermodynamic parameters for all of the proteins were calculated using the three-state model. Stability of equilibrium intermediates at pH ~4.0 was shown to be slightly affected by the mutations. Thus, all of the above substitutions influence the stability of the native state of these proteins. The cooperativity of conformational transitions and the solvent-exposed protein surface were also changed, but not for the substitution at Val17.
|
50
|
On the relationship between folding and chemical landscapes in enzyme catalysis. Proc Natl Acad Sci U S A 2008; 105:13877-82. [PMID: 18779576 PMCID: PMC2544547 DOI: 10.1073/pnas.0803405105] [Citation(s) in RCA: 75] [Impact Index Per Article: 4.7] [Received: 04/07/2008] [Indexed: 11/18/2022] Open
Abstract
Elucidating the relationship between the folding landscape of enzymes and their catalytic power has been one of the challenges of modern enzymology. The present work explores this issue by using a simplified folding model to generate the free-energy landscape of an enzyme and then to evaluate the activation barriers for the chemical step in different regions of the landscape. This approach is used to investigate the recent finding that an engineered monomeric chorismate mutase exhibits catalytic efficiency similar to the naturally occurring dimer even though it exhibits the properties of an intrinsically disordered molten globule. It is found that the monomer becomes more confined than its native-like counterpart upon ligand binding but still retains a wider catalytic region. Although the overall rate acceleration is still determined by reduction of the reorganization energy, the detailed contribution of different barriers yields a more complex picture for the chemical process than that of a single path. This work provides insight into the relationship between folding landscapes and catalysis. The computational approach used here may also provide a powerful strategy for modeling single-molecule experiments and designing enzymes.
|