1
Xiong P, Hu X, Huang B, Zhang J, Chen Q, Liu H. Increasing the efficiency and accuracy of the ABACUS protein sequence design method. Bioinformatics 2019; 36:136-144. [DOI: 10.1093/bioinformatics/btz515] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 12/23/2018] [Revised: 05/29/2019] [Accepted: 06/21/2019] [Indexed: 11/13/2022] Open
Abstract
Motivation
The ABACUS (a backbone-based amino acid usage survey) method uses unique statistical energy functions to carry out protein sequence design. Although some of its results have been experimentally verified, its accuracy remains improvable because several important components of the method have not been specifically optimized for sequence design or in contexts of other parts of the method. The computational efficiency also needs to be improved to support interactive online applications or the consideration of a large number of alternative backbone structures.
Results
We derived a model to measure solvent accessibility with larger mutual information with residue types than previous models, optimized a set of rotamers which can approximate the sidechain atomic positions more accurately, and devised an empirical function to treat inter-atomic packing with parameters fitted to native structures and optimized in consistence with the rotamer set. Energy calculations have been accelerated by interpolation between pre-determined representative points in high-dimensional structural feature spaces. Sidechain repacking tests showed that ABACUS2 can accurately reproduce the conformation of native sidechains. In sequence design tests, the native residue type recovery rate reached 37.7%, exceeding the value of 32.7% for ABACUS1. Applying ABACUS2 to designed sequences on three native backbones produced proteins shown to be well-folded by experiments.
Availability and implementation
The ABACUS2 sequence design server can be visited at http://biocomp.ustc.edu.cn/servers/abacus-design.php.
Supplementary information
Supplementary data are available at Bioinformatics online.
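The native residue type recovery rate quoted in the Results can be illustrated with a minimal sketch. It assumes the rate is simply the fraction of positions at which the designed residue equals the native one; the function name and the two toy sequences are hypothetical and are not taken from the paper.

```python
def native_recovery_rate(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue matches the native one.

    Assumption: both sequences are threaded on the same backbone, so they
    have equal length and compare position by position.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for a, b in zip(native, designed) if a == b)
    return matches / len(native)


# Toy 10-residue sequences (hypothetical, for illustration only).
native = "MKTAYIAKQR"
designed = "MKSAYLAEQR"
rate = native_recovery_rate(native, designed)
print(f"{rate:.1%}")  # prints "70.0%" (7 of 10 positions match)
```

In a benchmark the rate would normally be pooled over many backbones rather than computed for a single short sequence.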
Affiliation(s)
- Peng Xiong
  - School of Life Sciences, Hefei, Anhui 230026, China
- Xiuhong Hu
  - School of Life Sciences, Hefei, Anhui 230026, China
- Bin Huang
  - School of Life Sciences, Hefei, Anhui 230026, China
- Jiahai Zhang
  - School of Life Sciences, Hefei, Anhui 230026, China
- Quan Chen
  - School of Life Sciences, Hefei, Anhui 230026, China
- Haiyan Liu
  - School of Life Sciences, Hefei, Anhui 230026, China
  - Hefei National Laboratory for Physical Sciences at the Microscale, Hefei, Anhui 230026, China
  - School of Data Science, University of Science and Technology of China, Hefei, Anhui 230026, China
2
Childers MC, Towse CL, Daggett V. Molecular dynamics-derived rotamer libraries for d-amino acids within homochiral and heterochiral polypeptides. Protein Eng Des Sel 2018; 31:191-204. [PMID: 29992252 PMCID: PMC6205366 DOI: 10.1093/protein/gzy016] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/05/2018] [Accepted: 06/15/2018] [Indexed: 01/06/2023] Open
Abstract
Computational resources have contributed to the design and engineering of novel proteins by integrating genomic, structural and dynamic aspects of proteins. Non-canonical amino acids, such as d-amino acids, expand the available sequence space for designing and engineering proteins; however, the rotamer libraries for d-amino acids are usually constructed as the mirror images of l-amino acid rotamer libraries, an assumption that has not been tested. To this end, we have performed molecular dynamics (MD) simulations of model host-guest peptide systems containing d-amino acids. Our simulations systematically address the applicability of the mirror image convention as well as the effects of neighboring residue chirality. Rotamer libraries derived from these systems provide realistic rotamer distributions suitable for use in both rational and computational design workflows. Our simulations also address the impact of chirality on the intrinsic conformational preferences of amino acids, providing fundamental insights into the relationship between chirality and biomolecular dynamics. While d-amino acids are rare in naturally occurring proteins, they are used in designed proteins to stabilize a desired conformation, increase bioavailability or confer favorable biochemical and physical attributes. Here, we present d-amino acid rotamer libraries derived from MD simulations of alanine-based host-guest pentapeptides and show how certain residues can deviate from mirror image symmetry. Our simulations directly model d-amino acids as guest residues within the chiral l-Ala and d-Ala pentapeptide series to explicitly incorporate any contributions resulting from the chiralities of neighboring residues.
Affiliation(s)
- Clare-Louise Towse
  - Department of Bioengineering, University of Washington, Seattle, WA, USA
- Valerie Daggett
  - Department of Bioengineering, University of Washington, Seattle, WA, USA
3
Matsuura Y, Takehira M, Makhatadze GI, Joti Y, Naitow H, Kunishima N, Yutani K. Strategy for Stabilization of CutA1 Proteins Due to Ion-Ion Interactions at Temperatures of over 100 °C. Biochemistry 2018; 57:2649-2656. [PMID: 29648806 DOI: 10.1021/acs.biochem.8b00103] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 01/14/2023]
Abstract
In order to elucidate the contribution of charged residues to protein stabilization at temperatures of over 100 °C, we constructed many mutants of the CutA1 protein (EcCutA1) from Escherichia coli. The goal was to see if one can achieve the same stability as for a CutA1 from hyperthermophile Pyrococcus horikoshii that has the denaturation temperature near 150 °C. The hydrophobic mutant of EcCutA1 (Ec0VV) with denaturation temperature (Td) of 113.2 °C was used as a template for mutations. The highest Td of Ec0VV mutants substituted by a single charged residue was 118.4 °C. Multiple ion mutants were also constructed by combination of single mutants and found to have an increased thermostability. The highest stability of multiple mutants was a mutant substituted by nine charged residues that had a Td of 142.2 °C. To evaluate the energy of ion-ion interactions of mutant proteins, we used the structural ensemble obtained by a molecular dynamics simulation at 300 K. The Td of ionic mutants linearly increases with the increments of the computed energy of ion-ion interactions for ionic mutant proteins even up to the temperatures near 140 °C, suggesting that ion-ion interactions cumulatively contribute to the stabilization of a protein at high temperatures.
Affiliation(s)
- Michiyo Takehira
  - RIKEN SPring-8 Center, 1-1-1 Kouto, Sayo, Hyogo 679-5148, Japan
- George I Makhatadze
  - Department of Biology, Rensselaer Polytechnic Institute, 110 Eighth Street, Troy, New York 12180-3590, United States
- Yasumasa Joti
  - Japan Synchrotron Radiation Research Institute, 1-1-1, Kouto, Sayo, Hyogo 679-5198 Japan
- Hisashi Naitow
  - RIKEN SPring-8 Center, 1-1-1 Kouto, Sayo, Hyogo 679-5148, Japan
- Naoki Kunishima
  - RIKEN SPring-8 Center, 1-1-1 Kouto, Sayo, Hyogo 679-5148, Japan
- Katsuhide Yutani
  - RIKEN SPring-8 Center, 1-1-1 Kouto, Sayo, Hyogo 679-5148, Japan
4
Oda H, Ota M, Toh H. Profile comparison revealed deviation from structural constraint at the positively selected sites. Biosystems 2016; 147:67-77. [PMID: 27443483 DOI: 10.1016/j.biosystems.2016.07.007] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2015] [Revised: 07/13/2016] [Accepted: 07/16/2016] [Indexed: 11/18/2022]
Abstract
The amino acid substitutions at a site are affected by a mixture of various constraints. It is also known that amino acid substitutions are accelerated at sites under positive selection. However, the relationship between the substitutions at positively selected sites and the constraints has not been thoroughly examined. Advances in computational biology have enabled us to divide the mixture of constraints into the structural constraint and the remainder by using the amino acid sequences and the tertiary structures; the remainder is expressed as the deviation of the mixture of constraints from the structural constraint. Here, two types of profiles, or matrices with the size of 20 x (site length), are compared. One of the profiles represents the mixture of constraints, and is generated from a multiple amino acid sequence alignment, whereas the other is designed to represent the structural constraints. We applied the profile comparison method to proteins under positive selection to examine the relationship between positive selection and constraints. The results suggested that the constraint at a site under positive selection tends to deviate from the structural constraint at the site.
Affiliation(s)
- Hiroyuki Oda
  - Graduate School of Systems Life Sciences, Kyushu University, 744 Motooka Nishi-ku, Fukuoka 819-0395, Japan.
- Motonori Ota
  - Graduate School of Information Science, Nagoya University, Furo-cho, Chikusa-ku, Nagoya City, Aichi 464-8601, Japan
- Hiroyuki Toh
  - Department of Biomedical Chemistry, School of Science and Technology, Kwansei Gakuin University, 2-1 Gakuen, Sanda, Hyogo 669-1337, Japan
5
Shirai T, Saito M, Kobayashi A, Asano M, Hizume M, Ikeda S, Teruya K, Morita M, Kitamoto T. Evaluating prion models based on comprehensive mutation data of mouse PrP. Structure 2014; 22:560-71. [PMID: 24560805 DOI: 10.1016/j.str.2013.12.019] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 08/21/2013] [Revised: 12/21/2013] [Accepted: 12/28/2013] [Indexed: 10/25/2022]
Abstract
The structural details of the essential entity of prion disease, fibril prion protein (PrP(Sc)), are still elusive despite the large body of evidence supporting the prion hypothesis. Five major working models of PrP(Sc) structure, which are not compatible with each other, have been proposed. However, no systematic evaluation has been performed on those models. We devised a method that combined systematic point mutation with threading on knowledge-based amino acid potentials. A comprehensive mutation experiment was performed on mouse prion protein, and the PrP(Sc) conversion efficiency of each mutant was examined. The models were evaluated based on the mutation data by using the threading method. Although the data turned out to be rather more consistent with the models that assumed a conversion of the N-terminal region of core PrP into a β helix than with others, substantial modifications were also required to further improve the current model based on recent experimental results.
Affiliation(s)
- Tsuyoshi Shirai
  - Department of Computer Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama, Shiga 526-0829, Japan; Bioinformatics Research Division, Japan Science and Technology Agency, 5-3, Yonbancho, Chiyoda-ku, Tokyo 102-8666 Japan.
- Mihoko Saito
  - Department of Computer Bioscience, Nagahama Institute of Bio-Science and Technology, Nagahama, Shiga 526-0829, Japan; Bioinformatics Research Division, Japan Science and Technology Agency, 5-3, Yonbancho, Chiyoda-ku, Tokyo 102-8666 Japan
- Atsushi Kobayashi
  - Department of Neurological Science, Tohoku University Graduate School of Medicine Research, 2-1 Seiryo-machi, Aoba-ku, Sendai 980-8575, Japan
- Masahiro Asano
  - Department of Neurological Science, Tohoku University Graduate School of Medicine Research, 2-1 Seiryo-machi, Aoba-ku, Sendai 980-8575, Japan
- Masaki Hizume
  - Department of Neurological Science, Tohoku University Graduate School of Medicine Research, 2-1 Seiryo-machi, Aoba-ku, Sendai 980-8575, Japan
- Shino Ikeda
  - Department of Neurological Science, Tohoku University Graduate School of Medicine Research, 2-1 Seiryo-machi, Aoba-ku, Sendai 980-8575, Japan
- Kenta Teruya
  - Department of Neurological Science, Tohoku University Graduate School of Medicine Research, 2-1 Seiryo-machi, Aoba-ku, Sendai 980-8575, Japan
- Masanori Morita
  - Department of Neurological Science, Tohoku University Graduate School of Medicine Research, 2-1 Seiryo-machi, Aoba-ku, Sendai 980-8575, Japan; Research and Development Division, Benesis Corporation, Kitahama, Chuo-Ku, Osaka 541-850, Japan
- Tetsuyuki Kitamoto
  - Department of Neurological Science, Tohoku University Graduate School of Medicine Research, 2-1 Seiryo-machi, Aoba-ku, Sendai 980-8575, Japan.
6
Moal IH, Fernandez-Recio J. Intermolecular Contact Potentials for Protein-Protein Interactions Extracted from Binding Free Energy Changes upon Mutation. J Chem Theory Comput 2013; 9:3715-27. [PMID: 26584123 DOI: 10.1021/ct400295z] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.1] [Indexed: 01/31/2023]
Abstract
Understanding and predicting the energetics of protein-protein interactions is fundamental to the structural modeling of protein complexes. Binding free energy can be approximated as a sum of pairwise atomic or residue contact energies, which are commonly inferred from contact frequencies observed in experimental protein structures. However, such statistically inferred potentials require certain assumptions and approximations. Here, we explore the possibility of deriving atomic and residue contact potentials directly from experimental binding free energy changes following mutation and present a number of such potentials. The first set of potentials is obtained by unweighted least-squares fitting and bootstrap aggregating. The second set is calculated using a weighting scheme optimized against absolute binding affinity data, so as to account for the over-representation of certain complexes, residues, and families of interactions. The congruence of the potentials with known physical chemistry is investigated. The potentials are further validated by ranking and clustering protein-protein docking poses.
Affiliation(s)
- Iain H Moal
  - Joint BSC-IRB Research Program in Computational Biology, Life Science Department, Barcelona Supercomputing Center, C/Jordi Girona 29, 08034 Barcelona, Spain
- Juan Fernandez-Recio
  - Joint BSC-IRB Research Program in Computational Biology, Life Science Department, Barcelona Supercomputing Center, C/Jordi Girona 29, 08034 Barcelona, Spain
7
Kuroda D, Shirai H, Jacobson MP, Nakamura H. Computer-aided antibody design. Protein Eng Des Sel 2012; 25:507-21. [PMID: 22661385 PMCID: PMC3449398 DOI: 10.1093/protein/gzs024] [Citation(s) in RCA: 169] [Impact Index Per Article: 14.1] [Received: 04/14/2012] [Revised: 04/14/2012] [Accepted: 04/19/2012] [Indexed: 11/12/2022] Open
Abstract
Recent clinical trials using antibodies with low toxicity and high efficiency have raised expectations for the development of next-generation protein therapeutics. However, the process of obtaining therapeutic antibodies remains time consuming and empirical. This review summarizes recent progress in the field of computer-aided antibody development, mainly focusing on antibody modeling, which is divided essentially into two parts: (i) modeling the antigen-binding site, also called the complementarity determining regions (CDRs), and (ii) predicting the relative orientations of the variable heavy (V(H)) and light (V(L)) chains. Among the six CDR loops, the greatest challenge is predicting the conformation of CDR-H3, which is the most important in antigen recognition. Further computational methods could be used in drug development based on crystal structures or homology models, including antibody-antigen docking and energy calculations with approximate potential functions. These methods should guide experimental studies to improve the affinities and physicochemical properties of antibodies. Finally, several successful examples of in silico structure-based antibody designs are reviewed. We also briefly review structure-based antigen or immunogen design, with application to rational vaccine development.
Affiliation(s)
- Daisuke Kuroda
  - Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka, Japan.
8
Analyzing effects of naturally occurring missense mutations. Computational and Mathematical Methods in Medicine 2012; 2012:805827. [PMID: 22577471 PMCID: PMC3346971 DOI: 10.1155/2012/805827] [Citation(s) in RCA: 82] [Impact Index Per Article: 6.8] [Received: 12/21/2011] [Revised: 02/01/2012] [Accepted: 02/01/2012] [Indexed: 11/17/2022]
Abstract
Single-point mutation in genome, for example, single-nucleotide polymorphism (SNP) or rare genetic mutation, is the change of a single nucleotide for another in the genome sequence. Some of them will produce an amino acid substitution in the corresponding protein sequence (missense mutations); others will not. This paper focuses on genetic mutations resulting in a change in the amino acid sequence of the corresponding protein and how to assess their effects on protein wild-type characteristics. The existing methods and approaches for predicting the effects of mutation on protein stability, structure, and dynamics are outlined and discussed with respect to their underlying principles. Available resources, either as stand-alone applications or webservers, are pointed out as well. It is emphasized that understanding the molecular mechanisms behind these effects due to these missense mutations is of critical importance for detecting disease-causing mutations. The paper provides several examples of the application of 3D structure-based methods to model the effects of protein stability and protein-protein interactions caused by missense mutations as well.
9
Zhang Z, Wang L, Gao Y, Zhang J, Zhenirovskyy M, Alexov E. Predicting folding free energy changes upon single point mutations. Bioinformatics 2012; 28:664-71. [PMID: 22238268 DOI: 10.1093/bioinformatics/bts005] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Indexed: 12/26/2022]
Abstract
Motivation
The folding free energy is an important characteristic of protein stability and is directly related to a protein's wild-type function. Changes in protein stability due to naturally occurring missense mutations typically cause disease. Single point mutations made in vitro are frequently used to assess the contribution of a given amino acid to the stability of the protein. In both cases, it is desirable to predict the change of the folding free energy upon single point mutation, either to provide insight into the molecular mechanism of the change or to design new experimental studies.
Results
We report an approach that predicts the free energy change upon single point mutation by utilizing the 3D structure of the wild-type protein. It is based on a variation of the molecular mechanics Generalized Born (MMGB) method, scaled with optimized parameters (sMMGB) and utilizing a specific model of the unfolded state. The corresponding mutations are built in silico and the predictions are tested against a large dataset of 1109 mutations with experimentally measured changes of the folding free energy. Benchmarking resulted in a root mean square deviation of 1.78 kcal/mol, and the slope of the linear regression fit between the experimental data and the calculations was 1.04. The sMMGB is compared with other leading methods of predicting folding free energy changes upon single mutations, and the results are discussed with respect to various parameters.
Availability
All the pdb files we used in this article can be downloaded from http://compbio.clemson.edu/downloadDir/mentaldisorders/sMMGB_pdb.rar.
Supplementary information
Supplementary data are available at Bioinformatics online.
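The two benchmark statistics reported above, the root mean square deviation and the slope of a linear regression between experimental and calculated values, can be computed as in the following sketch. The choice to regress the predictions on the experimental values is an assumption (the abstract does not say which variable is on which axis), and the four data points are invented for illustration.

```python
import math


def rmsd(experimental, predicted):
    """Root mean square deviation between paired values (same units as input)."""
    n = len(experimental)
    return math.sqrt(sum((p - e) ** 2 for e, p in zip(experimental, predicted)) / n)


def regression_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical folding free energy changes in kcal/mol (not data from the paper).
experimental = [0.0, 1.0, 2.0, 3.0]
predicted = [0.5, 1.5, 2.5, 3.5]

print(rmsd(experimental, predicted))              # 0.5
print(regression_slope(experimental, predicted))  # 1.0
```

A slope near 1 only says the predictions scale correctly with experiment; a constant offset, as in this toy data, still shows up in the RMSD.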
Affiliation(s)
- Zhe Zhang
  - Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, Clemson, SC 29634, USA
10
Nishi H, Koike R, Ota M. Cover and spacer insertions: Small nonhydrophobic accessories that assist protein oligomerization. Proteins 2011; 79:2372-9. [DOI: 10.1002/prot.23084] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 09/08/2010] [Revised: 04/22/2011] [Accepted: 05/09/2011] [Indexed: 01/15/2023]
11
Motono C, Nakata J, Koike R, Shimizu K, Shirota M, Amemiya T, Tomii K, Nagano N, Sakaya N, Misoo K, Sato M, Kidera A, Hiroaki H, Shirai T, Kinoshita K, Noguchi T, Ota M. SAHG, a comprehensive database of predicted structures of all human proteins. Nucleic Acids Res 2010; 39:D487-93. [PMID: 21051360 PMCID: PMC3013665 DOI: 10.1093/nar/gkq1057] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Indexed: 12/12/2022] Open
Abstract
Most proteins from higher organisms are known to be multi-domain proteins and contain substantial numbers of intrinsically disordered (ID) regions. To analyse such protein sequences, those from human for instance, we developed a special protein-structure-prediction pipeline and accumulated the products in the Structure Atlas of Human Genome (SAHG) database at http://bird.cbrc.jp/sahg. With the pipeline, human proteins were examined by local alignment methods (BLAST, PSI-BLAST and Smith–Waterman profile–profile alignment), global–local alignment methods (FORTE) and prediction tools for ID regions (POODLE-S) and homology modeling (MODELLER). Conformational changes of protein models upon ligand-binding were predicted by simultaneous modeling using templates of apo and holo forms. When there were no suitable templates for holo forms and the apo models were accurate, we prepared holo models using prediction methods for ligand-binding (eF-seek) and conformational change (the elastic network model and the linear response theory). Models are displayed as animated images. As of July 2010, SAHG contains 42 581 protein-domain models in approximately 24 900 unique human protein sequences from the RefSeq database. Annotation of models with functional information and links to other databases such as EzCatDB, InterPro or HPRD are also provided to facilitate understanding the protein structure-function relationships.
Affiliation(s)
- Chie Motono
  - Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology, Tokyo 135-0064, Japan.
12
Abstract
When designing a mutagenesis experiment, it is often crucial to estimate the stability change of proteins induced by mutations (ΔΔG). Despite the recent advances in computational methods, it is still challenging to estimate ΔΔG quickly and accurately. We recently developed the Eris protocols for in silico evaluation of the ΔΔG. Starting from the tertiary structure of the wild-type protein, the Eris protocols can model the structure of the mutant protein and estimate ΔΔG using the structure models. The Eris protocols not only efficiently optimize the side-chain conformations, taking advantage of a fast rotamer-based searching algorithm, but also allow protein backbone flexibility during the modeling. As a result, the Eris protocols effectively resolve steric clashes induced by certain mutations and give more accurate ΔΔG predictions than a fixed-backbone approach. We discuss the general aspects of computational ΔΔG estimation and describe in detail the principles and methodologies of the Eris protocols.
13
Betancourt MR. Another look at the conditions for the extraction of protein knowledge-based potentials. Proteins 2009; 76:72-85. [PMID: 19089977 DOI: 10.1002/prot.22320] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 01/28/2023]
Abstract
Protein knowledge-based potentials are effective free energies obtained from databases of known protein structures. They are used to parameterize coarse-grained protein models in many folding simulation and structure prediction methods. Two common approaches are used in the derivation of knowledge-based potentials. One assumes that the energy parameters optimize the native structure stability. The other assumes that interaction events are related to their energies according to the Boltzmann distribution, and that they are distributed independently of other events, that is, the quasi-chemical approximation. Here, these assumptions are systematically tested by extracting contact energies from artificial databases of lattice proteins with predefined pairwise contact energies. Databases of protein sequences are designed to either satisfy the Boltzmann distribution at high or low temperatures, or to simultaneously optimize the native stability and folding kinetics. It is found that the quasi-chemical approximation, with the ideal reference state, accurately reproduces the true energies for high temperature Boltzmann distributed sequences (weakly interacting residues), but less accurately at low temperatures, where the sequences correspond to energy minima and the residues are strongly interacting. To overcome this problem, an iterative procedure for Boltzmann distributed sequences is introduced, which accounts for interacting residue correlations and eliminates the need for the quasi-chemical approximation. In this case, the energies are accurately reproduced at any ensemble temperature. However, when the database of sequences designed for optimal stability and kinetics is used, the energy correlation is less than optimal using either method, exhibiting random and systematic deviations from linearity. Therefore, the assumption that native structures are maximally stable or that sequences are determined according to the Boltzmann distribution seems to be inadequate for obtaining accurate energies. The limited number of sequences in the database and the inhomogeneous concentration of amino acids from one structure to another do not seem to be major obstacles for improving the quality of the extracted pairwise energies, with the exception of repulsive interactions.
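The quasi-chemical approximation mentioned above can be sketched as follows. The sketch assumes one common textbook form, in which a contact energy (in units of kT) is the negative logarithm of the observed contact count divided by the count expected if residues paired at random according to their mole fractions. The paper's exact reference state may differ, and the two-letter alphabet, counts, and function name are hypothetical.

```python
import math


def quasi_chemical_energies(contact_counts, mole_fractions):
    """Contact energies in kT units: e_ij = -ln(N_ij_observed / N_ij_expected).

    Assumed reference state: contacts form independently, so the expected
    count is total * x_i * x_j, doubled for unlike pairs (i-j and j-i).
    Pairs with zero observed contacts are not handled in this sketch.
    """
    total = sum(contact_counts.values())
    energies = {}
    for (a, b), observed in contact_counts.items():
        multiplicity = 1 if a == b else 2
        expected = total * multiplicity * mole_fractions[a] * mole_fractions[b]
        energies[(a, b)] = -math.log(observed / expected)
    return energies


# Hypothetical two-letter (H = hydrophobic, P = polar) contact statistics.
counts = {("H", "H"): 50, ("H", "P"): 30, ("P", "P"): 20}
fractions = {"H": 0.5, "P": 0.5}

energies = quasi_chemical_energies(counts, fractions)
for pair, value in energies.items():
    print(pair, round(value, 3))
```

Here H-H contacts are observed twice as often as expected (50 versus 25), so their energy is -ln 2, while the under-represented H-P and P-P contacts come out positive.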
Affiliation(s)
- Marcos R Betancourt
  - Department of Physics, Indiana University Purdue University Indianapolis, Indianapolis, Indiana 46202, USA.
14
Chen Y, Ding F, Nie H, Serohijos AW, Sharma S, Wilcox KC, Yin S, Dokholyan NV. Protein folding: then and now. Arch Biochem Biophys 2008; 469:4-19. [PMID: 17585870 PMCID: PMC2173875 DOI: 10.1016/j.abb.2007.05.014] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.8] [Received: 03/20/2007] [Revised: 05/11/2007] [Accepted: 05/21/2007] [Indexed: 01/19/2023]
Abstract
Over the past three decades the protein folding field has undergone monumental changes. Originally a purely academic question, how a protein folds has now become vital in understanding diseases and our abilities to rationally manipulate cellular life by engineering protein folding pathways. We review and contrast past and recent developments in the protein folding field. Specifically, we discuss the progress in our understanding of protein folding thermodynamics and kinetics, the properties of evasive intermediates, and unfolded states. We also discuss how some abnormalities in protein folding lead to protein aggregation and human diseases.
Affiliation(s)
- Nikolay V. Dokholyan
  - † To whom correspondence should be addressed: Nikolay V. Dokholyan, Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, North Carolina 27599. Fax: 919-966-2852.
15
Yin S, Ding F, Dokholyan NV. Modeling Backbone Flexibility Improves Protein Stability Estimation. Structure 2007; 15:1567-76. [DOI: 10.1016/j.str.2007.09.024] [Citation(s) in RCA: 133] [Impact Index Per Article: 7.8] [Received: 07/13/2007] [Revised: 09/06/2007] [Accepted: 09/26/2007] [Indexed: 11/16/2022]
16
Isogai Y, Ito Y, Ikeya T, Shiro Y, Ota M. Design of λ Cro Fold: Solution Structure of a Monomeric Variant of the De Novo Protein. J Mol Biol 2005; 354:801-14. [PMID: 16289118 DOI: 10.1016/j.jmb.2005.10.005] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Received: 06/18/2005] [Revised: 10/02/2005] [Accepted: 10/04/2005] [Indexed: 10/25/2022]
Abstract
One of the classical DNA-binding proteins, bacteriophage lambda Cro, forms a homodimer with a unique fold of alpha-helices and beta-sheets. We have computationally designed an artificial sequence of 60 amino acid residues to stabilize the backbone tertiary structure of the lambda Cro dimer by simulated annealing using knowledge-based structure-sequence compatibility functions. The designed amino acid sequence has 25% identity with that of natural lambda Cro and preserves Phe58, which is important for formation of the stably folded structure of lambda Cro. The designed dimer protein and its monomeric variant, which was redesigned by the insertion of a beta-hairpin sequence at the C-terminal region to prevent dimerization, were synthesized and biochemically characterized to be well folded. The designed protein was monomeric under a wide range of protein concentrations and its solution structure was determined by NMR spectroscopy. The solved structure is similar to that of a monomeric variant of natural lambda Cro with a root-mean-square deviation of the polypeptide backbones of 2.1 Å and has a well-packed protein core. Thus, our knowledge-based functions provide approximate but essential relationships between amino acid sequences and protein structures, and are useful for finding novel sequences that are foldable into a given target structure.
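Sequence design by simulated annealing, as described above, can be sketched in a few lines. This is only a toy under stated assumptions: the energy function below (reward hydrophobic residues at buried positions and polar ones at exposed positions) is a hypothetical stand-in for the authors' knowledge-based compatibility functions, and the burial pattern, cooling schedule, and all names are invented for illustration.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFW")  # simplistic classification, for illustration


def toy_energy(sequence, burial):
    """Hypothetical compatibility score: -1 per residue that suits its site."""
    energy = 0.0
    for residue, buried in zip(sequence, burial):
        if (residue in HYDROPHOBIC) == buried:
            energy -= 1.0
    return energy


def anneal(burial, steps=5000, t_start=2.0, t_end=0.05, seed=0):
    """Metropolis simulated annealing over sequences on a fixed backbone."""
    rng = random.Random(seed)
    seq = [rng.choice(ALPHABET) for _ in burial]
    energy = toy_energy(seq, burial)
    best_seq, best_energy = list(seq), energy
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / (steps - 1))
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice(ALPHABET)  # propose a single-site mutation
        new_energy = toy_energy(seq, burial)
        delta = new_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy  # accept the move
            if energy < best_energy:
                best_seq, best_energy = list(seq), energy
        else:
            seq[pos] = old  # reject and restore the previous residue
    return "".join(best_seq), best_energy


# Hypothetical burial pattern for an 8-residue backbone (True = buried).
burial = [True, False, True, True, False, False, True, False]
designed, energy = anneal(burial)
print(designed, energy)
```

Accepting some uphill moves at high temperature is what lets the search escape poor local optima; with a realistic energy function that couples neighboring positions, this matters far more than in the independent-site toy used here.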
Affiliation(s)
- Yasuhiro Isogai
  - Bio-metal Science Laboratory, RIKEN Harima Institute/SPring8, Mikazuki, Sayo, Hyogo 679-5148, Japan.
17
Kinoshita K, Ota M. P-cats: prediction of catalytic residues in proteins from their tertiary structures. Bioinformatics 2005; 21:3570-1. [PMID: 15994193 DOI: 10.1093/bioinformatics/bti561] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/14/2022] Open
Abstract
P-cats is a web server that predicts the catalytic residues in proteins from the atomic coordinates. P-cats receives a coordinate file of the tertiary structure and sends out analytical results via e-mail. The reply contains a summary and two URLs to allow the user to examine the conserved residues: one for interactive images of the prediction results and the other for a graphical view of the multiple sequence alignment. Availability: P-cats is freely available at http://p-cats.hgc.jp/p-cats. Contact: kino@ims.u-tokyo.ac.jp
Affiliation(s)
- Kengo Kinoshita
- Institute of Medical Science, University of Tokyo, Shirokanedai, Minato-ku, Tokyo 108-8639, Japan.
|
18
|
Hioki Y, Ogasahara K, Lee SJ, Ma J, Ishida M, Yamagata Y, Matsuura Y, Ota M, Ikeguchi M, Kuramitsu S, Yutani K. The crystal structure of the tryptophan synthase beta subunit from the hyperthermophile Pyrococcus furiosus. Investigation of stabilization factors. Eur J Biochem 2004; 271:2624-35. [PMID: 15206928 DOI: 10.1111/j.1432-1033.2004.04191.x] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Indexed: 11/30/2022]
Abstract
The structure of the tryptophan synthase beta2 subunit (Pfbeta2) from the hyperthermophile, Pyrococcus furiosus, was determined by X-ray crystallographic analysis at 2.2 Å resolution, and its stability was examined by DSC. This is the first report of the X-ray structure of the tryptophan synthase beta2 subunit alone, although the structure of the tryptophan synthase alpha2beta2 complex from Salmonella typhimurium has already been reported. The structure of Pfbeta2 was essentially similar to that of the beta2 subunit (Stbeta2) in the alpha2beta2 complex from S. typhimurium. The sequence alignment with secondary structures of Pfbeta and Stbeta in monomeric form showed that six residues in the N-terminal region and three residues in the C-terminal region were deleted in Pfbeta, and one residue at Pro366 of Stbeta and at Ile63 of Pfbeta was inserted. The denaturation temperature of Pfbeta2 was higher by 35 °C than the reported values from mesophiles at approximately pH 8. On the basis of structural information on both proteins, the analyses of the contributions of each stabilization factor indicate that: (a) the higher stability of Pfbeta2 is not caused by either a hydrophobic interaction or an increase in ion pairs; (b) the number of hydrogen bonds involved in the main chains of Pfbeta is greater by about 10% than that of Stbeta, indicating that the secondary structures of Pfbeta are more stabilized than those of Stbeta; and (c) the sequence of Pfbeta seems to be better fitted to an ideally stable structure than that of Stbeta, as assessed from X-ray structure data.
Affiliation(s)
- Yusaku Hioki
- Institute for Protein Research, Osaka University, Japan
|
19
|
Abstract
We have developed an effective scoring function for protein design. The atomic solvation parameters, together with the weights of energy terms, were optimized so that residues corresponding to the native sequence were predicted with low energy in the training set of 28 protein structures. The solvation energy of non-hydrogen-bonded hydrophilic atoms was considered separately and expressed in a nonlinear way. As a result, our scoring function predicted native residues as the most favorable in 59% of the total positions in 28 proteins. We then tested the scoring function by comparing the predicted stability changes for 103 T4 lysozyme mutants with the experimental values. The correlation coefficients were 0.77 for surface mutations and 0.71 for all mutations. Finally, the scoring function combined with Monte Carlo simulation was used to predict favorable sequences on a fixed backbone. The designed sequences were similar to the natural sequences of the family to which the template structure belonged. The profile of the designed sequences was helpful for identification of remote homologues of the native sequence.
Affiliation(s)
- Shide Liang
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas 75390-9050, USA
|
20
|
Handa N, Terada T, Kamewari Y, Hamana H, Tame JRH, Park SY, Kinoshita K, Ota M, Nakamura H, Kuramitsu S, Shirouzu M, Yokoyama S. Crystal structure of the conserved protein TT1542 from Thermus thermophilus HB8. Protein Sci 2003; 12:1621-32. [PMID: 12876312 PMCID: PMC2323949 DOI: 10.1110/gad.03104003] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.3] [Indexed: 10/21/2022]
Abstract
The TT1542 protein from Thermus thermophilus HB8 is annotated as a conserved hypothetical protein, and belongs to the DUF158 family in the Pfam database. A BLAST search revealed that homologs of TT1542 are present in a wide range of organisms. The TT1542 homologs in eukaryotes, PIG-L in mammals, and GPI12 in yeast and protozoa, have N-acetylglucosaminylphosphatidylinositol (GlcNAc-PI) de-N-acetylase activity. Although most of the homologs in prokaryotes are hypothetical and have no known function, Rv1082 and Rv1170 from Mycobacterium tuberculosis are enzymes involved in the mycothiol detoxification pathway. Here we report the crystal structure of the TT1542 protein at 2.0 Å resolution, which represents the first structure for this superfamily of proteins. The structure of the TT1542 monomer consists of a twisted beta-sheet composed of six parallel beta-strands and one antiparallel beta-strand (with the strand order 3-2-1-4-5-7-6) sandwiched between six alpha-helices. The N-terminal five beta-strands and four alpha-helices form an incomplete Rossmann fold-like structure. The structure shares some similarity to the sugar-processing enzymes with Rossmann fold-like domains, especially those of the GPGTF (glycogen phosphorylase/glycosyl transferase) superfamily, and also to the NAD(P)-binding Rossmann fold domains. TT1542 is a homohexamer in the crystal and in solution, the six monomers forming a cylindrical structure. Putative active sites are suggested by the structure and conserved amino acid residues.
Affiliation(s)
- Noriko Handa
- RIKEN Genomic Sciences Center, 1-7-22 Suehiro-cho, Tsurumi, Yokohama 230-0045, Japan
|
21
|
Aita T, Ota M, Husimi Y. An in silico exploration of the neutral network in protein sequence space. J Theor Biol 2003; 221:599-613. [PMID: 12713943 DOI: 10.1006/jtbi.2003.3209] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
Abstract
Designating amino-acid sequences that fold into a common main-chain structure as "neutral sequences" for the structure, regardless of their function or stability, we investigated the distribution of neutral sequences in protein sequence space. For four distinct target structures (alpha, beta, alpha/beta and alpha+beta types) with the same chain length of 108, we generated the respective neutral sequences by using the inverse folding technique with a knowledge-based potential function. We assumed that neutral sequences for a protein structure have Z scores higher than or equal to fixed thresholds, where thresholds are defined as the Z score for the corresponding native sequence (case 1) or a much greater Z score (case 2). An exploring walk simulation suggested that the neutral sequences mapped into the sequence space were connected with each other through straight neutral paths and formed an inherent neutral network over the sequence space. Through another exploring walk simulation, we investigated contiguous regions between or among the neutral networks for the distinct protein structures and obtained the following results. The closest approach distance between the two neutral networks ranged from 5 to 29 on the Hamming distance scale, showing a linear increase against the threshold values. The sequences located at the "interchange" regions between the two neutral networks have intermediate sequence-profile-scores for both corresponding structures. Introducing a "ball" in the sequence space that contains at least one neutral sequence for each of the four structures, we found that the minimal radius of the ball that is centered at an arbitrary position ranged from 35 to 50, while the minimal radius of the ball that is centered at a certain special position ranged from 20 to 30, on the Hamming distance scale. The relatively small Hamming distances (5-30) may support an evolutionary mechanism of transfer from the network for one structure to the network for a more beneficial structure via the interchange regions.
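As an illustration of the distance measure used in this abstract, the following minimal Python sketch computes the Hamming distance between equal-length sequences and the closest approach between two sets of sequences. The function names and the toy five-residue sequences are assumptions made for illustration; they are not taken from the paper, which works with sequences of length 108 and networks generated by inverse folding.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def closest_approach(net_a, net_b) -> int:
    """Smallest Hamming distance between any sequence in net_a and any in net_b."""
    return min(hamming(a, b) for a, b in product(net_a, net_b))


# Hypothetical toy "networks"; real neutral networks are far larger.
net_a = ["ACDEF", "ACDEG"]
net_b = ["LCDKG", "LMNKG"]
print(closest_approach(net_a, net_b))
```

The brute-force pairwise minimum is only practical for small sets; it is shown here to make the definition concrete, not as the exploring-walk procedure described in the abstract.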
Affiliation(s)
- Takuyo Aita
- Tsukuba Research Institute, Novartis Pharma K. K. Ohkubo 8, Tsukuba 300-2611, Japan
|
22
|
Ota M, Kinoshita K, Nishikawa K. Prediction of catalytic residues in enzymes based on known tertiary structure, stability profile, and sequence conservation. J Mol Biol 2003; 327:1053-64. [PMID: 12662930 DOI: 10.1016/s0022-2836(03)00207-9] [Citation(s) in RCA: 70] [Impact Index Per Article: 3.3] [Indexed: 11/19/2022]
Abstract
The catalytic or functionally important residues of a protein are known to exist in evolutionarily constrained regions. However, the patterns of residue conservation alone are sometimes not very informative, depending on the homologous sequences available for a given query protein. Here, we present an integrated method to locate the catalytic residues in an enzyme from its sequence and structure. Mutations of functional residues usually decrease the activity, but concurrently often increase stability. Also, catalytic residues tend to occupy partially buried sites in holes or clefts on the molecular surface. After confirming these general tendencies by carrying out statistical analyses on 49 representative enzymes, these data together with amino acid conservation were evaluated. This novel method exhibited better sensitivity in the prediction accuracy than traditional methods that consider only the residue conservation. We applied it to some so-called "hypothetical" proteins, with known structures but undefined functions. The relationships among the catalytic, conserved, and destabilizing residues in enzymatic proteins are discussed.
Affiliation(s)
- Motonori Ota
- National Institute of Genetics, Yata, Mishima, 411-8540, Shizuoka, Japan.
|
23
|
Aita T, Husimi Y. Statistical formulae of the energy distribution among a globular protein structure ensemble. J Theor Biol 2003; 220:107-21. [PMID: 12453454 DOI: 10.1006/jtbi.2003.3158] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/22/2022]
Abstract
In predicting the protein main-chain structure into which a query amino acid sequence folds, one evaluates the relative stability of a candidate structure against reference structures. We developed a statistical theory for calculating the energy distribution over a main-chain structure ensemble, with only an amino acid composition given as a single argument. We then obtained statistical formulae for the ensemble mean <E> and ensemble variance V[E] of the reference structural energies, as explicit functions of the amino acid composition. The mean <E> and the variance V[E] calculated from the formulae were well or roughly consistent with those resulting from a gapless threading simulation. We can use the formulae not only to perform high-throughput screening of sequences in the inverse folding problem, but also to handle the problem analytically.
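To make the quantities <E> and V[E] concrete, here is a small Python sketch of a gapless threading calculation, the kind of simulation the abstract uses as a point of comparison. It is not the analytical formulae of the paper. The per-site energy table `profile`, the function name, and the two-letter alphabet are hypothetical, chosen only so the example is self-contained.

```python
from statistics import fmean, pvariance


def threading_energies(seq, profile):
    """Energies of seq placed at every gapless offset along a longer structure.

    profile is a list with one dict per structural site, mapping a residue
    type to its (hypothetical) single-site energy at that site.
    """
    n = len(seq)
    if n > len(profile):
        raise ValueError("sequence is longer than the structure")
    return [
        sum(profile[offset + i][res] for i, res in enumerate(seq))
        for offset in range(len(profile) - n + 1)
    ]


# Toy four-site structure with made-up energies for residues A and G.
profile = [
    {"A": -1.0, "G": 0.5},
    {"A": 0.0, "G": -0.5},
    {"A": 1.0, "G": 0.0},
    {"A": -2.0, "G": 1.0},
]
energies = threading_energies("AG", profile)
mean_e = fmean(energies)      # ensemble mean <E>
var_e = pvariance(energies)   # ensemble (population) variance V[E]
print(energies, mean_e, var_e)
```

A real energy function would also include pairwise terms; the single-site table here keeps the sketch short.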
Affiliation(s)
- Takuyo Aita
- Tsukuba Research Institute, Novartis Pharma K K Ohkubo 8, Tsukuba, 300-2611, Japan
|
24
|
Abstract
Rotamer libraries are widely used in protein structure prediction, protein design, and structure refinement. As the size of the structure database has increased rapidly in recent years, it has become possible to derive well-refined rotamer libraries using strict criteria for data inclusion and for studying the dependence of rotamer populations and dihedral angles on local structural features.
Affiliation(s)
- Roland L Dunbrack
- Institute for Cancer Research, Fox Chase Cancer Center, 7701 Burholme Avenue, Philadelphia PA 19111, USA.
|
25
|
Abstract
The progress achieved by several groups in the field of computational protein design shows that successful design methods include two major features: efficient algorithms to deal with the combinatorial exploration of sequence space and optimal energy functions to rank sequences according to their fitness for the given fold.
Affiliation(s)
- Joaquim Mendes
- European Molecular Biology Laboratory, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
|
26
|
Isogai Y, Ota M, Ishii A, Ishida M, Nishikawa K. Identification of amino acids involved in protein structural uniqueness: implication for de novo protein design. Protein Eng Des Sel 2002; 15:555-60. [PMID: 12200537 DOI: 10.1093/protein/15.7.555] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022] Open
Abstract
Structural uniqueness is characteristic of native proteins and is essential to express their biological functions. The major factors that bring about the uniqueness are specific interactions between hydrophobic residues and their unique packing in the protein core. To find the origin of the uniqueness in their amino acid sequences, we analyzed the distribution of the side chain rotational isomers (rotamers) of hydrophobic amino acids in protein tertiary structures and derived deltaS(contact), the conformational-entropy changes of side chains by residue-residue contacts in each secondary structure. The deltaS(contact) values indicate distinct tendencies of the residue pairs to restrict side chain conformation by inter-residue contacts. Of the hydrophobic residues in alpha-helices, aliphatic residues (Leu, Val, Ile) strongly restrict the side chain conformations of each other. In beta-sheets, Met is most strongly restricted by contact with Ile, whereas Leu, Val and Ile are less affected by other residues in contact than those in alpha-helices. In designed and native protein variants, deltaS(contact) was found to correlate with the folding-unfolding cooperativity. Thus, it can be used as a specificity parameter for designing artificial proteins with a unique structure.
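One plausible way to compute an entropy change of the kind called deltaS(contact) is sketched below in Python. It assumes, as an illustration only, that side-chain conformational entropy is estimated from rotamer counts with the Gibbs/Shannon form S = -R Σ p ln p, and that the change is the entropy in contact minus the entropy of a reference distribution. The abstract does not give the exact definition used by the authors, so the formula, the function names, and the counts are all assumptions.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)


def rotamer_entropy(counts):
    """Entropy -R * sum(p ln p) of a rotamer distribution given raw counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("at least one rotamer observation is required")
    probs = [c / total for c in counts if c > 0]
    return -R * sum(p * math.log(p) for p in probs)


def delta_s_contact(counts_in_contact, counts_reference):
    """Assumed definition: entropy in contact minus reference entropy."""
    return rotamer_entropy(counts_in_contact) - rotamer_entropy(counts_reference)


# Hypothetical counts over three rotamers: uniform in the reference set,
# fully restricted to one rotamer when the residue is in contact.
reference = [10, 10, 10]
in_contact = [30, 0, 0]
print(delta_s_contact(in_contact, reference))
```

Under this assumed definition, a more negative value means the contact restricts the side chain more strongly, which matches the qualitative reading in the abstract.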
Affiliation(s)
- Yasuhiro Isogai
- The Institute of Physical and Chemical Research (RIKEN), 2-1 Hirosawa, Wako, Saitama 351-0198, Japan.
|