1
|
Harihar B, Saravanan KM, Gromiha MM, Selvaraj S. Importance of Inter-residue Contacts for Understanding Protein Folding and Unfolding Rates, Remote Homology, and Drug Design. Mol Biotechnol 2025; 67:862-884. [PMID: 38498284 DOI: 10.1007/s12033-024-01119-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/16/2023] [Accepted: 02/10/2024] [Indexed: 03/20/2024]
Abstract
Inter-residue interactions in protein structures provide valuable insights into protein folding and stability. Understanding these interactions can be helpful in many crucial applications, including rational design of therapeutic small molecules and biologics, locating functional protein sites, and predicting protein-protein and protein-ligand interactions. The process of developing machine learning models incorporating inter-residue interactions has been improved recently. This review highlights the theoretical models incorporating inter-residue interactions in predicting folding and unfolding rates of proteins. Utilizing contact maps to depict inter-residue interactions aids researchers in developing computer models for detecting remote homologs and interface residues within protein-protein complexes which, in turn, enhances our knowledge of the relationship between sequence and structure of proteins. Further, the application of contact maps derived from inter-residue interactions is highlighted in the field of drug discovery. Overall, this review presents an extensive assessment of the significant models that use inter-residue interactions to investigate folding rates, unfolding rates, remote homology, and drug development, providing potential future advancements in constructing efficient computational models in structural biology.
Affiliation(s)
- Balasubramanian Harihar
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Konda Mani Saravanan
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India
- Department of Biotechnology, Bharath Institute of Higher Education and Research, Chennai, Tamil Nadu, 600073, India
- Michael M Gromiha
- Department of Biotechnology, Bhupat and Jyoti Mehta School of Biosciences, Indian Institute of Technology Madras, Chennai, Tamil Nadu, 600036, India
- Samuel Selvaraj
- Department of Bioinformatics, School of Life Sciences, Bharathidasan University, Tiruchirappalli, Tamil Nadu, 620024, India.
|
2
|
Kuwajima K. The Molten Globule, and Two-State vs. Non-Two-State Folding of Globular Proteins. Biomolecules 2020; 10:biom10030407. [PMID: 32155758 PMCID: PMC7175247 DOI: 10.3390/biom10030407] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 02/19/2020] [Revised: 03/03/2020] [Accepted: 03/06/2020] [Indexed: 11/16/2022] Open
Abstract
From experimental studies of protein folding, it is now clear that there are two types of folding behavior, i.e., two-state folding and non-two-state folding, and understanding the relationships between these apparently different folding behaviors is essential for fully elucidating the molecular mechanisms of protein folding. This article describes how the presence of the two types of folding behavior has been confirmed experimentally, and discusses the relationships between the two-state and the non-two-state folding reactions, on the basis of available data on the correlations of the folding rate constant with various structure-based properties, which are determined primarily by the backbone topology of proteins. Finally, a two-stage hierarchical model is proposed as a general mechanism of protein folding. In this model, protein folding occurs in a hierarchical manner, reflecting the hierarchy of the native three-dimensional structure, as embodied in the case of non-two-state folding with an accumulation of the molten globule state as a folding intermediate. The two-state folding is thus merely a simplified version of the hierarchical folding caused either by an alteration in the rate-limiting step of folding or by destabilization of the intermediate.
Affiliation(s)
- Kunihiro Kuwajima
- Department of Physics, School of Science, the University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan; Tel.: +81-90-5435-6540
- School of Computational Sciences, Korea Institute for Advanced Study (KIAS), Seoul 02455, Korea
|
3
|
Corrales M, Cuscó P, Usmanova DR, Chen HC, Bogatyreva NS, Filion GJ, Ivankov DN. Machine Learning: How Much Does It Tell about Protein Folding Rates? PLoS One 2015; 10:e0143166. [PMID: 26606303 PMCID: PMC4659572 DOI: 10.1371/journal.pone.0143166] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 07/19/2015] [Accepted: 11/02/2015] [Indexed: 11/18/2022] Open
Abstract
The prediction of protein folding rates is a necessary step towards understanding the principles of protein folding. Due to the increasing amount of experimental data, numerous protein folding models and predictors of protein folding rates have been developed in the last decade. The problem has also attracted the attention of scientists from computational fields, which led to the publication of several machine learning-based models to predict the rate of protein folding. Some of them claim to predict the logarithm of protein folding rate with an accuracy greater than 90%. However, there are reasons to believe that such claims are exaggerated due to large fluctuations and overfitting of the estimates. When we confronted three selected published models with new data, we found a much lower predictive power than reported in the original publications. Overly optimistic predictive powers appear from violations of the basic principles of machine-learning. We highlight common misconceptions in the studies claiming excessive predictive power and propose to use learning curves as a safeguard against those mistakes. As an example, we show that the current amount of experimental data is insufficient to build a linear predictor of logarithms of folding rates based on protein amino acid composition.
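The learning-curve safeguard proposed in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the one-feature least-squares model, the function names (`fit_line`, `mse`, `learning_curve`), and the synthetic data are all assumptions made for illustration. The idea is to train on progressively larger subsets and compare training error with error on held-out data; a test error that stays well above the training error as the subset grows suggests that the data set is too small for the model.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x

def mse(model, xs, ys):
    """Mean squared error of a (slope, intercept) model on the given points."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def learning_curve(xs, ys, sizes, n_test, seed=0):
    """Return (train_size, train_mse, test_mse) for each requested size.

    A fixed random test set of n_test points is held out once; each
    training subset is drawn from the remaining pool, so every size in
    `sizes` must be at least 2 and at most len(xs) - n_test.
    """
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    test_idx, pool = idx[:n_test], idx[n_test:]
    x_test = [xs[i] for i in test_idx]
    y_test = [ys[i] for i in test_idx]
    curve = []
    for m in sizes:
        x_train = [xs[i] for i in pool[:m]]
        y_train = [ys[i] for i in pool[:m]]
        model = fit_line(x_train, y_train)
        curve.append((m, mse(model, x_train, y_train), mse(model, x_test, y_test)))
    return curve
```

In practice the hypothetical `xs` would be a sequence-derived descriptor and `ys` the logarithm of the experimental folding rate; plotting the two error columns against training size gives the learning curve.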
Affiliation(s)
- Marc Corrales
- Genome Architecture, Gene Regulation, Stem Cells and Cancer Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Pol Cuscó
- Genome Architecture, Gene Regulation, Stem Cells and Cancer Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Dinara R. Usmanova
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia
- Heng-Chang Chen
- Genome Architecture, Gene Regulation, Stem Cells and Cancer Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Natalya S. Bogatyreva
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain
- Laboratory of Protein Physics, Institute of Protein Research of the Russian Academy of Sciences, Pushchino, Moscow Region, Russia
- Guillaume J. Filion
- Genome Architecture, Gene Regulation, Stem Cells and Cancer Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Dmitry N. Ivankov
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain
- Laboratory of Protein Physics, Institute of Protein Research of the Russian Academy of Sciences, Pushchino, Moscow Region, Russia
|
4
|
Chang CCH, Tey BT, Song J, Ramanan RN. Towards more accurate prediction of protein folding rates: a review of the existing web-based bioinformatics approaches. Brief Bioinform 2014; 16:314-24. [DOI: 10.1093/bib/bbu007] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 01/13/2023] Open
|
5
|
Wagaman AS, Jaswal SS. Capturing protein folding-relevant topology via absolute contact order variants. Journal of Theoretical & Computational Chemistry 2014. [DOI: 10.1142/s0219633614500059] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
Absolute contact order is one of the simplest parameters used to predict protein folding rates. Many variants of contact order (CO) have been applied to highlight different aspects of contact neighborhoods and their relationship to folding. However, a systematic study of the influence of CO variants on correlation with folding rate has not been performed for a large combined set of multi- and two-state proteins. We explore different contact neighborhoods and resulting CO by varying the distance thresholds and weighting of sequence separation for heavy atom and residue-based counting methods for a set of 136 proteins diverse across folding and structural classes. We examine the changes in contact neighborhoods and compare correlations with our CO variants and the protein folding rates across our data set as well as by folding type and structural class. Different CO variants lead to the strongest correlations within each protein structural class. Our results demonstrate that backbone topology at a distance beyond where energetic interactions dominate is able to capture folding determinants, and suggest that more sensitive methods of characterizing contact relationships may improve ln kf prediction for diverse protein sets.
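As a rough illustration of the quantities discussed in this abstract, the sketch below computes absolute and relative contact order with residue-based counting. It follows the commonly used definitions (absolute CO is the mean sequence separation over all contacts; relative CO divides that by chain length) and is not the authors' implementation. The function name `contact_order`, the one-point-per-residue input, the 8.0 Å cutoff, and the `min_sep` default are assumptions chosen for illustration; the paper itself varies the distance threshold and the weighting of sequence separation.

```python
import math

def contact_order(coords, cutoff=8.0, min_sep=1):
    """Return (absolute_CO, relative_CO) for one chain.

    coords  : list of (x, y, z) tuples, one representative point per
              residue (for example the C-alpha atom), in sequence order.
    cutoff  : distance threshold in angstroms that defines a contact
              (illustrative default, an assumption).
    min_sep : smallest sequence separation |i - j| counted as a contact.
    """
    n_res = len(coords)
    total_sep = 0
    n_contacts = 0
    for i in range(n_res):
        for j in range(i + min_sep, n_res):
            if math.dist(coords[i], coords[j]) <= cutoff:
                total_sep += j - i
                n_contacts += 1
    if n_contacts == 0:
        return 0.0, 0.0
    absolute_co = total_sep / n_contacts   # mean sequence separation
    relative_co = absolute_co / n_res      # normalized by chain length
    return absolute_co, relative_co
```

Changing `cutoff` or `min_sep` alters which residue pairs form the contact neighborhood, which is the kind of variation the study explores across its 136 proteins.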
Affiliation(s)
- Amy S. Wagaman
- Mathematics Department, Amherst College, P. O. Box 5000, Amherst, MA 01002, USA
- Sheila S. Jaswal
- Chemistry Department and Program in Biochemistry and Biophysics, Amherst College, P. O. Box 5000, Amherst, MA 01002, USA
|
6
|
Cheng X, Xiao X, Wu ZC, Wang P, Lin WZ. Swfoldrate: predicting protein folding rates from amino acid sequence with sliding window method. Proteins 2012; 81:140-8. [PMID: 22933332 DOI: 10.1002/prot.24171] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.2] [Received: 05/02/2012] [Revised: 07/20/2012] [Accepted: 08/25/2012] [Indexed: 01/18/2023]
Abstract
Protein folding is the process by which a protein proceeds from its denatured state to its specific biologically active conformation. Understanding the relationship between sequences and the folding rates of proteins remains an important challenge. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. In this study, long-range and short-range contacts in proteins were used to derive an extended version of the pseudo amino acid composition based on a sliding window method. This method is capable of predicting protein folding rates from the amino acid sequence alone, without the aid of any structural class information. We systematically studied the contributions of individual features to folding rate prediction. The optimal feature selection procedure combines forward feature selection with sequential backward selection. The method was evaluated on a large dataset using the jackknife cross-validation test. The predictor was built from a large set of physicochemical and statistical protein features using a nonlinear support vector machine (SVM) regression model, and it obtained excellent agreement between predicted and experimentally observed folding rates. The correlation coefficient is 0.9313 and the standard error is 2.2692. The prediction server is freely available at http://www.jci-bioinfo.cn/swfrate/input.jsp.
Affiliation(s)
- Xiang Cheng
- Computer Department, Jing-De-Zhen Ceramic Institute, Jing-De-Zhen 333403, China
|
7
|
Guo J, Rao N. Predicting protein folding rate from amino acid sequence. J Bioinform Comput Biol 2011; 9:1-13. [PMID: 21328704 DOI: 10.1142/s0219720011005306] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 03/27/2010] [Revised: 10/19/2010] [Accepted: 10/19/2010] [Indexed: 11/18/2022]
Abstract
Predicting protein folding rate from amino acid sequence is an important challenge in computational and molecular biology. Over the past few years, many methods have been developed to reflect the correlation between the folding rates and protein structures and sequences. In this paper, we present an effective method, a combined neural network and genetic algorithm approach, to predict protein folding rates from amino acid sequences alone, without any explicit structural information. The originality of this paper is that, for the first time, it tackles the effect of sequence order. The proposed method provides a good correlation between the predicted and experimental folding rates. The correlation coefficient is 0.80 and the standard error is 2.65 for 93 proteins, the largest such dataset yet studied, when evaluated with the leave-one-out jackknife test. The comparative results demonstrate that this correlation is better than that of most other methods, and suggest the important contribution of sequence order information to the determination of protein folding rates.
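The leave-one-out jackknife evaluation reported here can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: a one-feature least-squares line stands in for the neural network and genetic algorithm model, and the names `fit_line`, `leave_one_out_predictions`, and `pearson_r` are hypothetical. Each protein is predicted by a model fitted to all the other proteins, and the correlation coefficient is then computed between those held-out predictions and the observed values.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x

def leave_one_out_predictions(xs, ys):
    """Predict each point from a model trained on all the other points."""
    preds = []
    for k in range(len(xs)):
        train_x = xs[:k] + xs[k + 1:]
        train_y = ys[:k] + ys[k + 1:]
        slope, intercept = fit_line(train_x, train_y)
        preds.append(slope * xs[k] + intercept)
    return preds

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)
```

Because every prediction comes from a model that never saw the protein being predicted, the resulting correlation is a less optimistic estimate than one computed on the training data itself.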
Affiliation(s)
- Jianxiu Guo
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, P. R. China.
|
8
|
Zhang Y, Luo L. The dynamical contact order: protein folding rate parameters based on quantum conformational transitions. Science China Life Sciences 2011; 54:386-92. [PMID: 21509661 DOI: 10.1007/s11427-011-4158-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 05/17/2010] [Accepted: 08/09/2010] [Indexed: 11/25/2022]
Abstract
Protein folding is regarded as a quantum transition between the torsion states of a polypeptide chain. According to the quantum theory of conformational dynamics, we propose the dynamical contact order (DCO) defined as a characteristic of the contact described by the moment of inertia and the torsion potential energy of the polypeptide chain between contact residues. Consequently, the protein folding rate can be quantitatively studied from the point of view of dynamics. By comparing theoretical calculations and experimental data on the folding rate of 80 proteins, we successfully validate the view that protein folding is a quantum conformational transition. We conclude that (i) a correlation between the protein folding rate and the contact inertial moment exists; (ii) multi-state protein folding can be regarded as a quantum conformational transition similar to that of two-state proteins but with an intermediate delay. We have estimated the order of magnitude of the time delay; (iii) folding can be classified into two types, exergonic and endergonic. Most of the two-state proteins with higher folding rate are exergonic and most of the multi-state proteins with low folding rate are endergonic. The folding speed limit is determined by exergonic folding.
Affiliation(s)
- Ying Zhang
- Laboratory of Theoretical Biophysics, Faculty of Physical Science and Technology, Inner Mongolia University, Hohhot 010021, China
|
9
|
Guo J, Rao N, Liu G, Yang Y, Wang G. Predicting protein folding rates using the concept of Chou's pseudo amino acid composition. J Comput Chem 2011; 32:1612-7. [PMID: 21328402 DOI: 10.1002/jcc.21740] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Received: 04/22/2010] [Revised: 11/04/2010] [Accepted: 12/02/2010] [Indexed: 12/12/2022]
Abstract
One of the most important challenges in computational and molecular biology is to understand the relationship between amino acid sequences and the folding rates of proteins. Recent works suggest that topological parameters, amino acid properties, chain length and the composition index relate well with protein folding rates; however, sequence order information has seldom been considered as a property for predicting protein folding rates. In this study, amino acid sequence order was used to derive an effective method, based on an extended version of the pseudo-amino acid composition, for predicting protein folding rates without any explicit structural information. Using the jackknife cross-validation test, the method was demonstrated on the largest dataset (99 proteins) reported. The method was found to provide a good correlation between the predicted and experimental folding rates. The correlation coefficient is 0.81 (highly significant) and the standard error is 2.46. The reported algorithm was found to perform better than several representative sequence-based approaches using the same dataset. The results indicate that sequence order information is an important determinant of protein folding rates.
Affiliation(s)
- Jianxiu Guo
- School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu 610054, People's Republic of China
|
10
|
Harihar B, Selvaraj S. Refinement of the long-range order parameter in predicting folding rates of two-state proteins. Biopolymers 2009; 91:928-35. [DOI: 10.1002/bip.21281] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/07/2022]
|
11
|
Barrick D. What have we learned from the studies of two-state folders, and what are the unanswered questions about two-state protein folding? Phys Biol 2009; 6:015001. [PMID: 19208936 DOI: 10.1088/1478-3975/6/1/015001] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Indexed: 11/11/2022]
Abstract
Small proteins with globular structures often fold by simple all-or-none mechanisms, both in an equilibrium and a kinetic sense, despite the very large number of partly folded conformations available. This type of 'two-state' folding will be discussed in terms of experimental tests, underlying molecular mechanisms, and limits to two-state behavior. Factors that appear to be important for two-state folding include topology (sequence distance of contacts in the native structure), molecular cooperativity and local energy distribution. Because their local stability distributions and cooperativities can be dissected and analyzed separately from topological features, recent studies of the folding of symmetric proteins will be discussed as a means to better understand the origins of two-state folding.
Affiliation(s)
- Doug Barrick
- T. C. Jenkins Department of Biophysics, The Johns Hopkins University, 3400 N Charles St, Baltimore, MD 21218, USA.
|
12
|
Huang LT, Gromiha MM. Analysis and prediction of protein folding rates using quadratic response surface models. J Comput Chem 2008; 29:1675-83. [PMID: 18351617 DOI: 10.1002/jcc.20925] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Indexed: 11/09/2022]
Abstract
Understanding the relationship between amino acid sequences and folding rates of proteins is an important task in computational and molecular biology. In this work, we have systematically analyzed the composition of amino acid residues for proteins with different ranges of folding rates. We observed that the polar residues, Asn, Gln, Ser, and Lys, are dominant in fast folding proteins whereas the hydrophobic residues, Ala, Cys, Gly, and Leu, prefer to be in slow folding proteins. Further, we have developed a method based on quadratic response surface models for predicting the folding rates of 77 two- and three-state proteins. Our method showed a correlation of 0.90 between experimental and predicted protein folding rates using the leave-one-out cross-validation method. The classification of proteins based on structural class improved the correlation to 0.98, and it is 0.99, 0.98, and 0.96, respectively, for all-alpha, all-beta, and mixed class proteins. In addition, we have utilized Bayesian classification theory for discriminating two- and three-state proteins, which showed an accuracy of 90%. We have developed a web server for predicting protein folding rates and it is available at http://bioinformatics.myweb.hinet.net/foldrate.htm.
Affiliation(s)
- Liang-Tsung Huang
- Department of Computer Science and Information Engineering, Ming-Dao University, Changhua 523, Taiwan
|
13
|
Istomin AY, Jacobs DJ, Livesay DR. On the role of structural class of a protein with two-state folding kinetics in determining correlations between its size, topology, and folding rate. Protein Sci 2008; 16:2564-9. [PMID: 17962408 DOI: 10.1110/ps.073124507] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.6] [Indexed: 10/22/2022]
Abstract
The time it takes for proteins to fold into their native states varies over several orders of magnitude depending on their native-state topology, size, and amino acid composition. In a number of previous studies, it was found that there is strong correlation between logarithmic folding rates and contact order for proteins that fold with two-state kinetics, while such correlation is absent for three-state proteins. Conversely, strong correlations between folding rates and chain length occur within three-state proteins, but not in two-state proteins. Here, we demonstrate that chain lengths and folding rates of two-state proteins are not correlated with each other only when all-alpha, all-beta, and mixed-class proteins are considered together, which is typically the case. However, when considering all-alpha and all-beta two-state proteins separately, there is significant linear correlation between folding rate and size. Moreover, the sets of data points for the all-alpha and all-beta classes define asymptotes of lower and upper limits on folding rates of mixed-class proteins. By analyzing correlation of other topological parameters with folding rates of two-state proteins, we find that only the long-range order exhibits correlation with folding rates that is uniform over all three classes. It is also the only descriptor to provide statistically significant correlations for each of the three structural classes. We give an interpretation of this observation in terms of Makarov and Plaxco's diffusion-based topomer-search model.
Affiliation(s)
- Andrei Y Istomin
- Department of Physics and Optical Science, University of North Carolina at Charlotte 28223, USA.
|
14
|
Fulton KF, Bate MA, Faux NG, Mahmood K, Betts C, Buckle AM. Protein Folding Database (PFD 2.0): an online environment for the International Foldeomics Consortium. Nucleic Acids Res 2006; 35:D304-7. [PMID: 17170010 PMCID: PMC1781104 DOI: 10.1093/nar/gkl1007] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022] Open
Abstract
The Protein Folding Database (PFD) is a publicly accessible repository of thermodynamic and kinetic protein folding data. Here we describe the first major revision of this work, featuring extensive restructuring that conforms to standards set out by the recently formed International Foldeomics Consortium. The database now adopts standards for data acquisition, analysis and reporting proposed by the consortium, which will facilitate the comparison of folding rates, energies and structure across diverse sets of proteins. Data can now be easily deposited using a rich set of deposition tools. Enhanced search tools allow sophisticated searching and graphical data analysis affords simple data analysis online. PFD can be accessed freely at .
Affiliation(s)
- Kate F. Fulton
- The Department of Biochemistry and Molecular Biology, School of Biomedical Sciences, Faculty of Medicine, Monash University, Clayton, Victoria 3800, Australia
- Mark A. Bate
- The Department of Biochemistry and Molecular Biology, School of Biomedical Sciences, Faculty of Medicine, Monash University, Clayton, Victoria 3800, Australia
- Noel G. Faux
- The Department of Biochemistry and Molecular Biology, School of Biomedical Sciences, Faculty of Medicine, Monash University, Clayton, Victoria 3800, Australia
- Khalid Mahmood
- The Department of Biochemistry and Molecular Biology, School of Biomedical Sciences, Faculty of Medicine, Monash University, Clayton, Victoria 3800, Australia
- Chris Betts
- The Department of Biochemistry and Molecular Biology, School of Biomedical Sciences, Faculty of Medicine, Monash University, Clayton, Victoria 3800, Australia
- Ashley M. Buckle
- The Department of Biochemistry and Molecular Biology, School of Biomedical Sciences, Faculty of Medicine, Monash University, Clayton, Victoria 3800, Australia
- To whom correspondence should be addressed. Tel: +61 9905 3781; Fax: +61 9905 9773
|
15
|
Ma BG, Guo JX, Zhang HY. Direct correlation between proteins' folding rates and their amino acid compositions: An ab initio folding rate prediction. Proteins 2006; 65:362-72. [PMID: 16937389 DOI: 10.1002/prot.21140] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Indexed: 01/02/2023]
Abstract
Discovering the mechanism of protein folding, in molecular biology, is a great challenge. A key step to this end is to find factors that correlate with protein folding rates. Over the past few years, many empirical parameters, such as contact order, long-range order, total contact distance, secondary structure contents, have been developed to reflect the correlation between folding rates and protein tertiary or secondary structures. However, the correlation between proteins' folding rates and their amino acid compositions has not been explored. In the present work, we examined systematically the correlation between proteins' folding rates and their amino acid compositions for two-state and multistate folders and found that different amino acids contributed differently to the folding progress. The relation between the amino acids' molecular weight and degeneracy and the folding rates was examined, and the role of hydrophobicity in the protein folding process was also inspected. As a consequence, a new indicator called composition index was derived, which takes no structure factors into account and is merely determined by the amino acid composition of a protein. Such an indicator is found to be highly correlated with the protein's folding rate (r > 0.7). From the results of this work, three points of concluding remarks are evident. (1) Two-state folders and multistate folders have different rate-determining amino acids. (2) The main determining information of a protein's folding rate is largely reflected in its amino acid composition. (3) Composition index may be the best predictor for an ab initio protein folding rate prediction directly from protein sequence from the standpoint of practical application.
Affiliation(s)
- Bin-Guang Ma
- Shandong Provincial Research Center for Bioinformatic Engineering and Technique, Center for Advanced Study, Shandong University of Technology, Zibo 255049, People's Republic of China.
|