1. Yang J, Cheng WX, Zhang P, Wu G, Sheng ST, Yang J, Zhao S, Hu Q, Ji W, Shi Q. Conformational ensembles for protein structure prediction. Sci Rep 2025; 15:8513. [PMID: 40074747 PMCID: PMC11904239 DOI: 10.1038/s41598-024-84066-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2024] [Accepted: 12/19/2024] [Indexed: 03/14/2025] Open
Abstract
Acquiring conformational ensembles for a protein is a challenging task that bears on both the protein folding problem and the study of intrinsically disordered proteins. Although AlphaFold, using artificial intelligence, achieved unprecedented accuracy in structure prediction, its result is limited to a single conformational state, and it cannot provide the multiple conformations needed to display intrinsic protein disorder. To overcome this barrier, the FiveFold approach was developed as a single-sequence method. It applies the protein folding shape code (PFSC) uniformly to expose the local folds of five amino acid residues, forms the protein folding variation matrix (PFVM) to reveal local folding variations along the sequence, obtains a massive number of folding conformations as PFSC strings, and then constructs an ensemble of multiple conformational protein structures. P53_HUMAN, a well-known protein, and LEF1_HUMAN and Q8GT36_SPIOL, typical disordered proteins, are taken as benchmarks to evaluate the predicted outcomes. The results demonstrate an effective algorithm and a biologically meaningful process for predicting multiple conformational structures of proteins.
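The abstract describes exposing local folds over stretches of five amino acid residues along the sequence. The sketch below only illustrates how such five-residue windows could be enumerated; it assumes the windows overlap and advance one residue at a time, which the abstract does not state, and it does not assign any PFSC letters. The function name `five_residue_windows` and the ten-residue input string are hypothetical.

```python
from typing import List


def five_residue_windows(sequence: str, width: int = 5) -> List[str]:
    """Return every contiguous window of `width` residues.

    Assumption: windows overlap and step one residue at a time.
    A sequence shorter than `width` yields an empty list.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    return [sequence[i:i + width] for i in range(len(sequence) - width + 1)]


# Arbitrary ten-residue string used purely for illustration.
windows = five_residue_windows("MEEPQSDPSV")
print(len(windows), windows[0], windows[-1])
```

A sequence of n residues gives n - 4 windows under this assumption, so each interior residue is covered by five different local windows.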
Affiliation(s)
- Jiaan Yang
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
  - Micro Biotech, Ltd., Shanghai, 200123, China
- Wen Xiang Cheng
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
- Peng Zhang
  - Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, Guangdong, China
  - Biomedical Engineering, Shenzhen University of Advanced Technology, Shenzhen, 518060, China
- Gang Wu
  - School of Basic Medicine, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei, China
- Si Tong Sheng
  - HYK High-Throughput Biotechnology Institute, Shenzhen, 518057, Guangdong, China
- Junjie Yang
  - Wuhan International Biohub Cooperation, Wuhan, 430075, Hubei, China
- Suwen Zhao
  - iHuman Institute, ShanghaiTech University, Shanghai, 201210, China
- Qiyue Hu
  - Beyang Therapeutics Co. Ltd, Shanghai, 201210, China
- Wenxin Ji
  - National Facility for Protein Science in Shanghai, Shanghai Advanced Research Institute, Chinese Academy of Sciences, Shanghai, 201210, China
- Qiong Shi
  - Laboratory of Aquatic Genomics, College of Life Sciences and Oceanography, Shenzhen University, Shenzhen, 518057, China
2. Kazmirchuk TDD, Bradbury-Jost C, Withey TA, Gessese T, Azad T, Samanfar B, Dehne F, Golshani A. Peptides of a Feather: How Computation Is Taking Peptide Therapeutics under Its Wing. Genes (Basel) 2023; 14:1194. [PMID: 37372372 PMCID: PMC10298604 DOI: 10.3390/genes14061194] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Revised: 05/24/2023] [Accepted: 05/26/2023] [Indexed: 06/29/2023] Open
Abstract
Leveraging computation in the development of peptide therapeutics has garnered increasing recognition as a valuable tool to generate novel therapeutics for disease-related targets. To this end, computation has transformed the field of peptide design through identifying novel therapeutics that exhibit enhanced pharmacokinetic properties and reduced toxicity. The process of in-silico peptide design involves the application of molecular docking, molecular dynamics simulations, and machine learning algorithms. Three primary approaches for peptide therapeutic design including structural-based, protein mimicry, and short motif design have been predominantly adopted. Despite the ongoing progress made in this field, there are still significant challenges pertaining to peptide design including: enhancing the accuracy of computational methods; improving the success rate of preclinical and clinical trials; and developing better strategies to predict pharmacokinetics and toxicity. In this review, we discuss past and present research pertaining to the design and development of in-silico peptide therapeutics in addition to highlighting the potential of computation and artificial intelligence in the future of disease therapeutics.
Affiliation(s)
- Thomas David Daniel Kazmirchuk
  - Department of Biology, and the Ottawa Institute of Systems Biology (OISB), Carleton University, Ottawa, ON K1S 5B6, Canada
- Calvin Bradbury-Jost
  - Department of Biology, and the Ottawa Institute of Systems Biology (OISB), Carleton University, Ottawa, ON K1S 5B6, Canada
- Taylor Ann Withey
  - Department of Biology, and the Ottawa Institute of Systems Biology (OISB), Carleton University, Ottawa, ON K1S 5B6, Canada
- Tadesse Gessese
  - Department of Biology, and the Ottawa Institute of Systems Biology (OISB), Carleton University, Ottawa, ON K1S 5B6, Canada
- Taha Azad
  - Department of Microbiology and Infectious Diseases, Université de Sherbrooke, Sherbrooke, QC J1E 4K8, Canada
  - Centre de Recherche du Centre Hospitalier Universitaire de Sherbrooke (CHUS), Sherbrooke, QC J1H 5N4, Canada
- Bahram Samanfar
  - Department of Biology, and the Ottawa Institute of Systems Biology (OISB), Carleton University, Ottawa, ON K1S 5B6, Canada
  - Agriculture and Agri-Food Canada, Ottawa Research and Development Centre (ORDC), Ottawa, ON K1A 0C6, Canada
- Frank Dehne
  - School of Computer Science, Carleton University, Ottawa, ON K1S 5B6, Canada
- Ashkan Golshani
  - Department of Biology, and the Ottawa Institute of Systems Biology (OISB), Carleton University, Ottawa, ON K1S 5B6, Canada
3. Machine learning/molecular dynamic protein structure prediction approach to investigate the protein conformational ensemble. Sci Rep 2022; 12:10018. [PMID: 35705565 PMCID: PMC9200820 DOI: 10.1038/s41598-022-13714-z] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 11/12/2021] [Accepted: 05/11/2022] [Indexed: 11/25/2022] Open
Abstract
Proteins exist in several different conformations. These structural changes are often associated with fluctuations at the residue level. Recent findings show that co-evolutionary analysis coupled with machine-learning techniques improves precision by providing quantitative distance predictions between pairs of residues. The predicted statistical distance distribution from Multi Sequence Analysis reveals the presence of different local maxima, suggesting the flexibility of key residue pairs. Here we investigate the ability of residue-residue distance prediction to provide insights into the protein conformational ensemble. We apply a combination of deep learning approaches and mechanistic modeling to a set of proteins that experimentally showed conformational changes. The predicted protein models were filtered based on energy scores and RMSD clustering, with the lowest-energy structure in each cluster selected as its centroid. These models were compared to the experimental-Molecular Dynamics (MD) relaxed structure by analyzing the backbone residue torsional distribution and the sidechain orientations. Our pipeline retrieves the structural dynamics represented experimentally by different X-ray conformations of the same sequence, as well as the conformational space observed in the MD simulations. We show the potential correlation between the experimental structure dynamics and the predicted model ensemble, demonstrating the susceptibility of the current state-of-the-art methods in protein folding and dynamics prediction and pointing out the areas of improvement.
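The filtering step in this abstract (RMSD clustering, then the lowest-energy structure per cluster as the representative) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the abstract does not name a clustering algorithm, so a simple greedy leader clustering is swapped in, the pairwise RMSD matrix and energy scores are assumed to be precomputed, and the function names, cutoff, and toy numbers are all hypothetical.

```python
from typing import List, Sequence


def cluster_by_rmsd(rmsd: Sequence[Sequence[float]], cutoff: float) -> List[List[int]]:
    """Greedy leader clustering (an assumed stand-in for the unspecified method).

    Each model joins the first cluster whose leader (first member) lies
    within `cutoff` RMSD; otherwise it starts a new cluster.
    """
    clusters: List[List[int]] = []
    for i in range(len(rmsd)):
        for members in clusters:
            if rmsd[i][members[0]] <= cutoff:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def lowest_energy_representatives(
    clusters: Sequence[Sequence[int]], energies: Sequence[float]
) -> List[int]:
    """Pick the index of the lowest-energy model in each cluster."""
    return [min(members, key=lambda m: energies[m]) for members in clusters]


# Toy data: models 0 and 1 are similar, models 2 and 3 are similar.
rmsd_matrix = [
    [0.0, 0.5, 5.0, 5.0],
    [0.5, 0.0, 5.0, 5.0],
    [5.0, 5.0, 0.0, 0.8],
    [5.0, 5.0, 0.8, 0.0],
]
energy_scores = [-10.0, -12.0, -8.0, -7.5]

clusters = cluster_by_rmsd(rmsd_matrix, cutoff=2.0)
representatives = lowest_energy_representatives(clusters, energy_scores)
print(clusters, representatives)
```

Choosing the representative by energy rather than by geometric centrality mirrors the wording of the abstract; a production workflow would also need a structural superposition step to compute the RMSD values, which is omitted here.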
4. Zou T, Woodrum BW, Halloran N, Campitelli P, Bobkov AA, Ghirlanda G, Ozkan SB. Local Interactions That Contribute Minimal Frustration Determine Foldability. J Phys Chem B 2021; 125:2617-2626. [PMID: 33687216 DOI: 10.1021/acs.jpcb.1c00364] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Earlier experiments suggest that the evolutionary information (conservation and coevolution) encoded in protein sequences is necessary and sufficient to specify the fold of a protein family. However, there is no computational work to quantify the effect of such evolutionary information on the folding process. Here we explore the role of early folding steps for sequences designed using coevolution and conservation through a combination of computational and experimental methods. We simulated a repertoire of native and designed WW domain sequences to analyze early local contact formation and found that the N-terminal β-hairpin turn would not form correctly due to strong non-native local contacts in unfoldable sequences. Through a maximum likelihood approach, we identified five local contacts that play a critical role in folding, suggesting that a small subset of amino acid pairs can be used to solve the "needle in the haystack" problem to design foldable sequences. Thus, using the contact probability of those five local contacts that form during the early stage of folding, we built a classification model that predicts the foldability of a WW sequence with 81% accuracy. This classification model was used to redesign WW domain sequences that could not fold due to frustration and make them foldable by introducing a few mutations that led to the stabilization of these critical local contacts. The experimental analysis shows that a redesigned sequence folds and binds to polyproline peptides with a similar affinity as those observed for native WW domains. Overall, our analysis shows that evolutionary-designed sequences should not only satisfy the folding stability but also ensure a minimally frustrated folding landscape.
Affiliation(s)
- Taisong Zou
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona 85287, United States
- Brian W Woodrum
  - School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287, United States
- Nicholas Halloran
  - School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287, United States
- Paul Campitelli
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona 85287, United States
- Andrey A Bobkov
  - Conrad Prebys Center for Chemical Genomics, Sanford Burnham Prebys Medical Discovery Institute, La Jolla, California 92037, United States
- Giovanna Ghirlanda
  - School of Molecular Sciences, Arizona State University, Tempe, Arizona 85287, United States
- Sefika Banu Ozkan
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona 85287, United States
5. rawMSA: End-to-end Deep Learning using raw Multiple Sequence Alignments. PLoS One 2019; 14:e0220182. [PMID: 31415569 PMCID: PMC6695225 DOI: 10.1371/journal.pone.0220182] [Citation(s) in RCA: 46] [Impact Index Per Article: 7.7] [Received: 01/17/2019] [Accepted: 07/10/2019] [Indexed: 12/01/2022] Open
Abstract
In the last decades, huge efforts have been made in the bioinformatics community to develop machine learning-based methods for the prediction of structural features of proteins in the hope of answering fundamental questions about the way proteins function and their involvement in several illnesses. The recent advent of Deep Learning has renewed the interest in neural networks, with dozens of methods being developed taking advantage of these new architectures. However, most methods are still heavily based on pre-processing of the input data, as well as extraction and integration of multiple hand-picked and manually designed features. Multiple Sequence Alignments (MSA) are the most common source of information in de novo prediction methods. Deep Networks that automatically refine the MSA and extract useful features from it would be immensely powerful. In this work, we propose a new paradigm for the prediction of protein structural features called rawMSA. The core idea behind rawMSA is borrowed from the field of natural language processing to map amino acid sequences into an adaptively learned continuous space. This allows the whole MSA to be input into a Deep Network, thus rendering pre-calculated features such as sequence profiles and other features calculated from MSA obsolete. We showcased the rawMSA methodology on three different prediction problems: secondary structure, relative solvent accessibility and inter-residue contact maps. We have rigorously trained and benchmarked rawMSA on a large set of proteins and have determined that it outperforms classical methods based on position-specific scoring matrices (PSSM) when predicting secondary structure and solvent accessibility, while performing on par with methods using more pre-calculated features in the inter-residue contact map prediction category in CASP12 and CASP13. This clearly demonstrates that rawMSA represents a promising development that can pave the way for improved methods using rawMSA instead of sequence profiles to represent evolutionary information in the coming years. Availability: datasets, dataset generation code, evaluation code and models are available at: https://bitbucket.org/clami66/rawmsa.
6. Molecular simulation of peptides coming of age: Accurate prediction of folding, dynamics and structures. Arch Biochem Biophys 2019; 664:76-88. [DOI: 10.1016/j.abb.2019.01.033] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 09/14/2018] [Revised: 01/23/2019] [Accepted: 01/28/2019] [Indexed: 12/24/2022]
7. Delarue M, Koehl P. Combined approaches from physics, statistics, and computer science for ab initio protein structure prediction: ex unitate vires (unity is strength)? F1000Res 2018; 7. [PMID: 30079234 PMCID: PMC6058471 DOI: 10.12688/f1000research.14870.1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Accepted: 07/19/2018] [Indexed: 11/20/2022] Open
Abstract
Connecting the dots among the amino acid sequence of a protein, its structure, and its function remains a central theme in molecular biology, as it would have many applications in the treatment of illnesses related to misfolding or protein instability. As a result of high-throughput sequencing methods, biologists currently live in a protein sequence-rich world. However, our knowledge of protein structure based on experimental data remains comparatively limited. As a consequence, protein structure prediction has established itself as a very active field of research to fill in this gap. This field, once thought to be reserved for theoretical biophysicists, is constantly reinventing itself, borrowing ideas informed by an ever-increasing assembly of scientific domains, from biology, chemistry, (statistical) physics, mathematics, computer science, statistics, bioinformatics, and more recently data sciences. We review the recent progress arising from this integration of knowledge, from the development of specific computer architecture to allow for longer timescales in physics-based simulations of protein folding to the recent advances in predicting contacts in proteins based on detection of coevolution using very large data sets of aligned protein sequences.
Affiliation(s)
- Marc Delarue
  - Unité Dynamique Structurale des Macromolécules, Institut Pasteur, and UMR 3528 du CNRS, Paris, France
- Patrice Koehl
  - Department of Computer Science, Genome Center, University of California, Davis, Davis, California, USA
8. Smith DJ, Shell MS. Can Simple Interaction Models Explain Sequence-Dependent Effects in Peptide Homodimerization? J Phys Chem B 2017; 121:5928-5943. [PMID: 28537734 DOI: 10.1021/acs.jpcb.7b03186] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/12/2023]
Abstract
The development of rapid methods to explain and predict peptide interactions, aggregation, and self-assembly has become important to understanding amyloid disease pathology, the shelf stability of peptide therapeutics, and the design of novel peptide materials. Although experimental aggregation databases have been used to develop correlative and statistical models, molecular simulations offer atomic-level details that potentially provide greater physical insight and allow one to single out the most explanatory simple models. Here, we outline one such approach using a case study that develops homodimerization models for serine-glycine peptides with various hydrophobic leucine mutations. Using detailed all-atom simulations, we calculate reference dimerization free energy profiles and binding constants for a small peptide library. We then use statistical methods to systematically assess whether simple interaction models, which do not require expensive simulations and free energy calculation, can capture them. Surprisingly, some combinations of a few simple scaling laws well recapitulate the detailed, all-atom results with high accuracy. Specifically, we find that a recently proposed phenomenological hydrophobic force law and coarse measures of entropic effects in binding offer particularly high explanatory power, underscoring the physical relevance to association that these driving forces can play.
Affiliation(s)
- David J Smith
  - Department of Chemical Engineering, University of California, Santa Barbara, Santa Barbara, California 93106, United States
- M Scott Shell
  - Department of Chemical Engineering, University of California, Santa Barbara, Santa Barbara, California 93106, United States
9. Li DW, Brüschweiler R. Protocol To Make Protein NMR Structures Amenable to Stable Long Time Scale Molecular Dynamics Simulations. J Chem Theory Comput 2015; 10:1781-7. [PMID: 26580385 DOI: 10.1021/ct4010646] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 01/26/2023]
Abstract
A robust protocol for the treatment of NMR protein structures is presented that makes them amenable to long time scale molecular dynamics (MD) simulations that are stable. The protocol embeds an NMR structure in a native low energy region of the recently developed ff99SB_φψ(g24;CS) molecular mechanics force field. Extended MD trajectories that start from these structures show good consistency with proton-proton nuclear Overhauser effect data, and they reproduce NMR chemical shift data better than the original NMR structures as is demonstrated for four protein systems. Moreover, for all proteins studied here the simulations spontaneously approach the X-ray crystal structures, thereby improving the effective resolution of the initial structural models.
Affiliation(s)
- Da-Wei Li
  - Campus Chemical Instrument Center and Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
  - Department of Chemistry and Biochemistry and National High Magnetic Field Laboratory, Florida State University, Tallahassee, Florida 32306, United States
- Rafael Brüschweiler
  - Campus Chemical Instrument Center and Department of Chemistry and Biochemistry, The Ohio State University, Columbus, Ohio 43210, United States
  - Department of Chemistry and Biochemistry and National High Magnetic Field Laboratory, Florida State University, Tallahassee, Florida 32306, United States
10. Kumar A, Campitelli P, Thorpe MF, Ozkan SB. Partial unfolding and refolding for structure refinement: A unified approach of geometric simulations and molecular dynamics. Proteins 2015; 83:2279-92. [PMID: 26476100 DOI: 10.1002/prot.24947] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 06/26/2015] [Revised: 09/11/2015] [Accepted: 09/29/2015] [Indexed: 12/26/2022]
Abstract
The most successful protein structure prediction methods to date have been template-based modeling (TBM) or homology modeling, which predicts protein structure based on experimental structures. These high-accuracy predictions sometimes retain structural errors due to incorrect templates or a lack of accurate templates in the case of low sequence similarity, making these structures inadequate in drug-design studies or molecular dynamics simulations. We have developed a new physics-based approach to the protein refinement problem by mimicking the mechanism of chaperones that rehabilitate misfolded proteins. The template structure is unfolded by selectively (targeted) pulling on different portions of the protein using the geometry-based technique FRODA, and then refolded using hierarchically restrained replica exchange molecular dynamics simulations (hr-REMD). FRODA unfolding is used to create a diverse set of topologies for surveying near native-like structures from a template and to provide a set of persistent contacts to be employed during refolding. We have tested our approach on 13 previous CASP targets and observed that this method of folding an ensemble of partially unfolded structures, through the hierarchical addition of contact restraints (that is, first local and then nonlocal interactions), leads to a refolding of the structure along with refinement in most cases (12/13). Although this approach yields refined models through advancement in sampling, the task of blind selection of the best refined models still needs to be solved. Overall, the method can be useful for improved sampling for low-resolution models where certain portions of the structure are incorrectly modeled.
Affiliation(s)
- Avishek Kumar
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona
- Paul Campitelli
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona
- M F Thorpe
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona
  - Rudolf Peierls Center for Theoretical Physics, University of Oxford, Oxford, OX1 3NP, United Kingdom
- S Banu Ozkan
  - Department of Physics and Center for Biological Physics, Arizona State University, Tempe, Arizona
11. Márquez-Chamorro AE, Asencio-Cortés G, Santiesteban-Toca CE, Aguilar-Ruiz JS. Soft computing methods for the prediction of protein tertiary structures: A survey. Appl Soft Comput 2015. [DOI: 10.1016/j.asoc.2015.06.024] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 01/01/2023]
12. Vallat B, Madrid-Aliste C, Fiser A. Modularity of Protein Folds as a Tool for Template-Free Modeling of Structures. PLoS Comput Biol 2015; 11:e1004419. [PMID: 26252221 PMCID: PMC4529212 DOI: 10.1371/journal.pcbi.1004419] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 03/10/2015] [Accepted: 06/30/2015] [Indexed: 12/25/2022] Open
Abstract
Predicting the three-dimensional structure of proteins from their amino acid sequences remains a challenging problem in molecular biology. While the current structural coverage of proteins is almost exclusively provided by template-based techniques, the modeling of the rest of the protein sequences increasingly requires template-free methods. However, template-free modeling methods are much less reliable and are usually applicable to smaller proteins, leaving much space for improvement. We present here a novel computational method that uses a library of supersecondary structure fragments, known as Smotifs, to model protein structures. The library of Smotifs has saturated over time, providing a theoretical foundation for efficient modeling. The method relies on weak sequence signals from remotely related protein structures to create a library of Smotif fragments specific to the target protein sequence. This Smotif library is exploited in a fragment assembly protocol to sample decoys, which are assessed by a composite scoring function. Since the Smotif fragments are larger in size compared to the ones used in other fragment-based methods, the proposed modeling algorithm, SmotifTF, can employ an exhaustive sampling during decoy assembly. SmotifTF successfully predicts the overall fold of the target proteins in about 50% of the test cases and performs competitively when compared to other state-of-the-art prediction methods, especially when sequence signal to remote homologs is diminishing. Smotif-based modeling is complementary to current prediction methods and provides a promising direction in addressing the structure prediction problem, especially when targeting larger proteins for modeling.
Each protein folds into a unique three-dimensional structure that enables it to carry out its biological function. Knowledge of the atomic details of protein structures is therefore a key to understanding their function. Advances in high-throughput experimental technologies have led to an exponential increase in the availability of known protein sequences. Although strong progress has been made in experimental protein structure determination, it remains a fact that more than 99% of structural information is provided by computational modeling methods. We describe here a novel structure prediction method, SmotifTF, which uses a unique library of known protein fragments to assemble the three-dimensional structure of a sequence. The fragment library has saturated over time and therefore provides a complete set of building blocks required for model building. The method performs competitively compared to existing methods of structure prediction.
Affiliation(s)
- Brinda Vallat
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, New York, United States of America
- Carlos Madrid-Aliste
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, New York, United States of America
- Andras Fiser
  - Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, New York, New York, United States of America
13. Ruiz-Blanco YB, Marrero-Ponce Y, García Y, Puris A, Bello R, Green J, Sotomayor-Torres CM. A physics-based scoring function for protein structural decoys: Dynamic testing on targets of CASP-ROLL. Chem Phys Lett 2014. [DOI: 10.1016/j.cplett.2014.07.014] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 10/25/2022]
14. Maurice KJ. SSThread: Template-free protein structure prediction by threading pairs of contacting secondary structures followed by assembly of overlapping pairs. J Comput Chem 2014; 35:644-56. [PMID: 24523210 DOI: 10.1002/jcc.23543] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 09/02/2013] [Revised: 11/15/2013] [Accepted: 01/05/2014] [Indexed: 11/12/2022]
Abstract
Acquiring the three-dimensional structure of a protein from its amino acid sequence alone, despite a great deal of work and significant progress on the subject, is still an unsolved problem. SSThread, a new template-free algorithm is described here that consists of making several predictions of contacting pairs of α-helices and β-strands derived from a database of experimental structures using a knowledge-based potential, secondary structure prediction, and contact map prediction followed by assembly of overlapping pair predictions to create an ensemble of core structure predictions whose loops are then predicted. In a set of seven CASP10 targets SSThread outperformed the two leading methods for two targets each. The targets were all β-strand containing structures and most of them have a high relative contact order which demonstrates the advantages of SSThread. The primary bottlenecks based on sets of 74 and 21 test cases are the pair prediction and loop prediction stages.
15. Orevi T, Rahamim G, Shemesh S, Ben Ishay E, Amir D, Haas E. Fast closure of long loops at the initiation of the folding transition of globular proteins studied by time-resolved FRET-based methods. BIO-ALGORITHMS AND MED-SYSTEMS 2014. [DOI: 10.1515/bams-2014-0018] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/15/2022]
Abstract
The protein folding problem would be considered “solved” when it will be possible to “read genes”, i.e., to predict the native fold of proteins, their dynamics, and the mechanism of fast folding based solely on sequence data. The long-term goal should be the creation of an algorithm that would simulate the stepwise mechanism of folding, which constrains the conformational space and in which random search for stable interactions is possible. Here, we focus attention on the initial phases of the folding transition starting with the compact disordered collapsed ensemble, in search of the initial sub-domain structural biases that direct the otherwise stochastic dynamics of the backbone. Our studies are designed to test the “loop hypothesis”, which suggests that fast closure of long loop structures by non-local interactions between clusters of mainly non-polar residues is an essential conformational step at the initiation of the folding transition of globular proteins. We developed and applied experimental methods based on time-resolved resonance excitation energy transfer (trFRET) measurements combined with fast mixing methods and studied the initial phases of the folding of
16. Orevi T, Rahamim G, Hazan G, Amir D, Haas E. The loop hypothesis: contribution of early formed specific non-local interactions to the determination of protein folding pathways. Biophys Rev 2013; 5:85-98. [PMID: 28510159 DOI: 10.1007/s12551-013-0113-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 01/15/2013] [Accepted: 03/01/2013] [Indexed: 12/12/2022] Open
Abstract
The extremely fast and efficient folding transition (in seconds) of globular proteins led to the search for some unifying principles embedded in the physics of the folding polypeptides. Most of the proposed mechanisms highlight the role of local interactions that stabilize secondary structure elements or a folding nucleus as the starting point of the folding pathways, i.e., a "bottom-up" mechanism. Non-local interactions were assumed either to stabilize the nucleus or lead to the later steps of coalescence of the secondary structure elements. An alternative mechanism was proposed, an "up-down" mechanism in which it was assumed that folding starts with the formation of very few non-local interactions which form closed long loops at the initiation of folding. The possible biological advantage of this mechanism, the "loop hypothesis", is that the hydrophobic collapse is associated with ordered compactization which reduces the chance for degradation and misfolding. In the present review the experiments, simulations and theoretical consideration that either directly or indirectly support this mechanism are summarized. It is argued that experiments monitoring the time-dependent development of the formation of specifically targeted early-formed sub-domain structural elements, either long loops or secondary structure elements, are necessary. This can be achieved by the time-resolved FRET-based "double kinetics" method in combination with mutational studies. Yet, attempts to improve the time resolution of the folding initiation should be extended down to the sub-microsecond time regime in order to design experiments that would resolve the classes of proteins which first fold by local or non-local interactions.
Affiliation(s)
- Tomer Orevi
- The Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan, Israel, 52900
| | - Gil Rahamim
- The Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan, Israel, 52900
| | - Gershon Hazan
- The Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan, Israel, 52900
| | - Dan Amir
- The Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan, Israel, 52900
| | - Elisha Haas
- The Goodman Faculty of Life Sciences, Bar Ilan University, Ramat Gan, Israel, 52900.
|
17
|
Charge effects on the fibril-forming peptide KTVIIE: a two-dimensional replica exchange simulation study. Biophys J 2012; 102:1952-60. [PMID: 22768952 DOI: 10.1016/j.bpj.2012.03.019] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 12/13/2011] [Revised: 02/28/2012] [Accepted: 03/02/2012] [Indexed: 12/12/2022]
Abstract
The assembly of peptides into ordered nanostructures is increasingly recognized as both a bioengineering tool for generating new materials and a critical aspect of aggregation processes that underlie neurological diseases such as Alzheimer's disease, Parkinson's disease, and Huntington's disease. There is a major problem in understanding how extremely subtle sequence changes can lead to profound and often unexpected differences in self-assembly behavior. To better delineate the complex interplay of different microscopic driving forces in such cases, we develop a methodology to quantify and compare the propensity of different peptide sequences to form small oligomers during early self-assembly stages. This umbrella-sampling replica exchange molecular dynamics method performs a replica exchange molecular dynamics simulation along peptide association reaction coordinates using umbrella restraints. With this method, we study a set of sequence-similar peptides that differ in net charge: K(+)TVIIE(-), K(+)TVIIE, and (+)K(+)TVIIE. Interestingly, experiments show that only the monovalent peptide, K(+)TVIIE, forms fibrils, whereas the others do not. We examine dimer, trimer, and tetramer formation processes of these peptides, and compute high-accuracy potential of mean force association curves. The potentials of mean force recapitulate a higher stability and equilibrium constant of the fibril-forming peptide, similar to experiment, but reveal that entropic contributions to association free energies can play a surprisingly significant role. The simulations also show behavior reminiscent of experimental aggregate polymorphism, revealed in multiple stable conformational states and association pathways. Our results suggest that sequence changes can have significant effects on self-assembly through not only direct peptide-peptide interactions but conformational entropies and degeneracies as well.
|
18
|
Li DW, Brüschweiler R. Dynamic and Thermodynamic Signatures of Native and Non-Native Protein States with Application to the Improvement of Protein Structures. J Chem Theory Comput 2012; 8:2531-9. [PMID: 26588978 DOI: 10.1021/ct300358u] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 12/18/2022]
Abstract
Accurate knowledge of the 3D structural ensemble of proteins is important for understanding of their biological function. We report here the application of microsecond all-atom molecular dynamics (MD) simulations in explicit solvent for the improvement of the quality of low-resolution structures obtained by protein structure prediction (decoys). Seventy MD simulations of ∼1 μs average duration were performed on 13 different protein systems starting from X-ray crystal structures and decoys. Their behavior can be divided into three groups: 22 trajectories converged toward the native state, 27 trajectories displayed a quasi-equilibrium by populating mainly a single non-native free energy basin, and 21 trajectories drifted away from their initial decoy structure transiently visiting multiple free energy minima. To determine whether the native structure can be identified among non-native ensembles, the free energy was determined for each basin by the MM/GBSA method together with the von Mises entropy estimator in dihedral angle space. For the proteins studied here, it is found that the ensembles belonging to free energy basins with the lowest free energies and the longest residence times are most native-like. The results demonstrate that explicit solvent microsecond MD simulations using the latest generation of protein force fields and free energy metrics are sufficiently accurate to permit positive identification of native state ensembles against low-resolution structural models and decoys. The approach can be applied to the direct refinement of predicted or experimental low-resolution protein structures.
Affiliation(s)
- Da-Wei Li
- Chemical Sciences Laboratory, Department of Chemistry and Biochemistry and National High Magnetic Field Laboratory, Florida State University, Tallahassee, Florida 32306, United States
| | - Rafael Brüschweiler
- Chemical Sciences Laboratory, Department of Chemistry and Biochemistry and National High Magnetic Field Laboratory, Florida State University, Tallahassee, Florida 32306, United States
|
19
|
Glembo TJ, Farrell DW, Gerek ZN, Thorpe MF, Ozkan SB. Collective dynamics differentiates functional divergence in protein evolution. PLoS Comput Biol 2012; 8:e1002428. [PMID: 22479170 PMCID: PMC3315450 DOI: 10.1371/journal.pcbi.1002428] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Received: 10/11/2011] [Accepted: 01/30/2012] [Indexed: 12/29/2022]
Abstract
Protein evolution is most commonly studied by analyzing related protein sequences and generating ancestral sequences through Bayesian and Maximum Likelihood methods, and/or by resurrecting ancestral proteins in the lab and performing ligand binding studies to determine function. Structural and dynamic evolution have largely been left out of molecular evolution studies. Here we incorporate both structure and dynamics to elucidate the molecular principles behind the divergence in the evolutionary path of the steroid receptor proteins. We determine the likely structure of three evolutionarily diverged ancestral steroid receptor proteins using the Zipping and Assembly Method with FRODA (ZAMF). Our predictions are within ∼2.7 Å all-atom RMSD of the respective crystal structures of the ancestral steroid receptors. Beyond static structure prediction, a particular feature of ZAMF is that it generates protein dynamics information. We investigate the differences in conformational dynamics of diverged proteins by obtaining the most collective motion through essential dynamics. Strikingly, our analysis shows that evolutionarily diverged proteins of the same family do not share the same dynamic subspace, while those sharing the same function are simultaneously clustered together and distant from those that have functionally diverged. Dynamic analysis also enables those mutations that most affect dynamics to be identified. It correctly predicts all mutations (functional and permissive) necessary to evolve new function and ∼60% of permissive mutations necessary to recover ancestral function. Proteins are remarkable machines of the living systems that show diverse biochemical functions. Biochemical diversity has grown over time via molecular evolution. In order to understand how diversity arose, it is fundamental to understand how the earliest proteins evolved and served as templates for the present diverse proteome.
The one sequence - one structure - one function paradigm is being extended to a new view: an ensemble of different conformations in equilibrium can evolve new function and the analysis of inherent structural dynamics is crucial to give a more complete understanding of protein evolution. Therefore, we aim to bring structural dynamics into protein evolution through our zipping and assembly method with FRODA (ZAMF). We apply ZAMF to simultaneously obtain structures and structural dynamics of three ancestral sequences of steroid receptor proteins. By comparative dynamics analysis among the three ancestral steroid hormone receptors: (i) we show that changes in the structural dynamics indicate functional divergence and (ii) we identify all functionally critical and most of the permissive mutations necessary to evolve new function. Overall, all these findings suggest that conformational dynamics may play an important role where new functions evolve through novel molecular interactions.
Affiliation(s)
- Tyler J. Glembo
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona, United States of America
| | - Daniel W. Farrell
- Laufer Center for Physical and Quantitative Biology, Stony Brook University, Stony Brook, New York, United States of America
| | - Z. Nevin Gerek
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona, United States of America
| | - M. F. Thorpe
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona, United States of America
| | - S. Banu Ozkan
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona, United States of America
|
20
|
Voelz VA, Dill KA, Chorny I. Peptoid conformational free energy landscapes from implicit-solvent molecular simulations in AMBER. Biopolymers 2012; 96:639-50. [PMID: 21184487 DOI: 10.1002/bip.21575] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.5] [Indexed: 11/08/2022]
Abstract
To test the accuracy of existing AMBER force field models in predicting peptoid conformation and dynamics, we simulated a set of model peptoid molecules recently examined by Butterfoss et al. (JACS 2009, 131, 16798-16807) using QM methods as well as three peptoid sequences with experimentally determined structures. We found that AMBER force fields, when used with a Generalized Born/Surface Area (GBSA) implicit solvation model, could accurately reproduce the peptoid torsional landscape as well as the major conformers of known peptoid structures. Enhanced sampling by replica exchange molecular dynamics (REMD) using temperatures from 300 to 800 K was used to sample over cis-trans isomerization barriers. Compared to (Nrch)5 and cyclo-octasarcosyl, the free energy of N-(2-nitro-3-hydroxyl phenyl)glycine-N-(phenyl)glycine has the most "foldable" free energy landscape, due to deep trans-amide minima dictated by N-aryl sidechains. For peptoids with (S)-N-(1-phenylethyl) (Nspe) side chains, we observe a discrepancy in backbone dihedral propensities between molecular simulations and QM calculations, which may be due to force field effects or the inability to capture n→π* interactions. For these residues, an empirical phi-angle biasing potential can "rescue" the backbone propensities seen in QM. This approach can serve as a general strategy for addressing force fields without resorting to a complete reparameterization. Overall, this study demonstrates the utility of implicit-solvent REMD simulations for efficient sampling to predict peptoid conformational landscapes, providing a potential tool for first-principles design of sequences with specific folding properties.
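As an illustration of the replica exchange scheme mentioned in this abstract, the following is a minimal Python sketch, not the authors' protocol. It assumes a geometric temperature ladder and the standard Metropolis swap criterion for temperature REMD. The replica count (8), the example energies, and the function names are hypothetical; only the 300 to 800 K range comes from the abstract.

```python
import math

# Boltzmann constant in kcal/(mol*K), the energy unit commonly used with AMBER.
K_B = 0.0019872041


def geometric_ladder(t_min, t_max, n_replicas):
    """Return n_replicas temperatures spaced geometrically from t_min to t_max.

    The geometric spacing is an assumption made for this sketch; the abstract
    only states the temperature range.
    """
    if n_replicas < 2:
        raise ValueError("need at least two replicas")
    ratio = (t_max / t_min) ** (1.0 / (n_replicas - 1))
    return [t_min * ratio ** k for k in range(n_replicas)]


def exchange_probability(e_i, e_j, t_i, t_j):
    """Metropolis acceptance probability for swapping two replicas.

    e_i, e_j: potential energies (kcal/mol) of the configurations currently
              held at temperatures t_i and t_j (K).
    Accept with probability min(1, exp[(beta_i - beta_j) * (e_i - e_j)]).
    """
    beta_i = 1.0 / (K_B * t_i)
    beta_j = 1.0 / (K_B * t_j)
    delta = (beta_i - beta_j) * (e_i - e_j)
    if delta >= 0.0:
        return 1.0  # swap is always accepted; also avoids overflow in exp
    return math.exp(delta)


if __name__ == "__main__":
    ladder = geometric_ladder(300.0, 800.0, 8)  # 8 replicas is hypothetical
    print([round(t, 1) for t in ladder])
    # Hypothetical energies for the two coldest replicas.
    print(exchange_probability(-120.0, -100.0, ladder[0], ladder[1]))
```

Swaps are normally attempted only between neighbouring temperatures, because the acceptance probability falls quickly as the temperature gap grows.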
|
21
|
Zou T, Ozkan SB. Local and non-local native topologies reveal the underlying folding landscape of proteins. Phys Biol 2011; 8:066011. [DOI: 10.1088/1478-3975/8/6/066011] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Indexed: 12/22/2022]
|
22
|
An enumerative stepwise ansatz enables atomic-accuracy RNA loop modeling. Proc Natl Acad Sci U S A 2011; 108:20573-8. [PMID: 22143768 DOI: 10.1073/pnas.1106516108] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.1] [Indexed: 11/18/2022]
Abstract
Atomic-accuracy structure prediction of macromolecules should be achievable by optimizing a physically realistic energy function but is presently precluded by incomplete sampling of a biopolymer's many degrees of freedom. We present herein a working hypothesis, called the "stepwise ansatz," for recursively constructing well-packed atomic-detail models in small steps, enumerating several million conformations for each monomer, and covering all build-up paths. By making use of high-performance computing and the Rosetta framework, we provide first tests of this hypothesis on a benchmark of 15 RNA loop-modeling problems drawn from riboswitches, ribozymes, and the ribosome, including 10 cases that are not solvable by current knowledge-based modeling approaches. For each loop problem, this deterministic stepwise assembly method either reaches atomic accuracy or exposes flaws in Rosetta's all-atom energy function, indicating the resolution of the conformational sampling bottleneck. As a further rigorous test, we have carried out a blind all-atom prediction for a noncanonical RNA motif, the C7.2 tetraloop/receptor, and validated this model through nucleotide-resolution chemical mapping experiments. Stepwise assembly is an enumerative, ab initio build-up method that systematically outperforms existing Monte Carlo and knowledge-based methods for 3D structure prediction.
|
23
|
Zhao L, Liu Z, Cao Z, Liu H, Wang J. Determination of thermal intermediate state ensemble of box 5 with restrained molecular dynamics simulations. Comput Theor Chem 2011. [DOI: 10.1016/j.comptc.2011.10.004] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/28/2022]
|
24
|
Pritchard-Bell A, Shell MS. Smoothing protein energy landscapes by integrating folding models with structure prediction. Biophys J 2011; 101:2251-9. [PMID: 22067165 DOI: 10.1016/j.bpj.2011.09.036] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 08/02/2011] [Revised: 09/13/2011] [Accepted: 09/19/2011] [Indexed: 10/15/2022]
Abstract
Decades of work have investigated the energy landscapes of simple protein models, but what do the landscapes of real, large, atomically detailed proteins look like? We explore an approach to this problem that systematically extracts simple funnel models of actual proteins using ensembles of structure predictions and physics-based atomic force fields and sampling. Central to our effort are calculations of a quantity called the relative entropy, which quantifies the extent to which a given set of structure decoys and a putative native structure can be projected onto a theoretical funnel description. We examine 86 structure prediction targets and one coupled folding-binding system, and find that in a majority of cases the relative entropy robustly signals which structures are nearest to native (i.e., which appear to lie closest to a funnel bottom). Importantly, the landscape model improves substantially upon purely energetic measures in scoring decoys. Our results suggest that physics-based models, including both folding theories and all-atom force fields, may be successfully integrated with structure prediction efforts. Conversely, detailed predictions of structures and the relative entropy approach enable one to extract coarse topographic features of protein landscapes that may enhance the development and application of simpler folding models.
Affiliation(s)
- Ari Pritchard-Bell
- Department of Chemical Engineering, University of California Santa Barbara, Santa Barbara, California, USA
|
25
|
Petrella RJ. A versatile method for systematic conformational searches: application to CheY. J Comput Chem 2011; 32:2369-85. [PMID: 21557263 PMCID: PMC3298744 DOI: 10.1002/jcc.21817] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/15/2010] [Revised: 03/01/2011] [Accepted: 03/20/2011] [Indexed: 12/27/2022]
Abstract
A novel molecular structure prediction method, the Z Method, is described. It provides a versatile platform for the development and use of systematic, grid-based conformational search protocols, in which statistical information (i.e., rotamers) can also be included. The Z Method generates trial structures by applying many changes of the same type to a single starting structure, thereby sampling the conformation space in an unbiased way. The method, implemented in the CHARMM program as the Z Module, is applied here to an illustrative model problem in which rigid, systematic searches are performed in a 36-dimensional conformational space that describes the relative positions of the 10 secondary structural elements of the protein CheY. A polar hydrogen representation with an implicit solvation term (EEF1) is used to evaluate successively larger fragments of the protein generated in a hierarchical build-up procedure. After a final refinement stage, and a total computational time of about two-and-a-half CPU days on AMD Opteron processors, the prediction is within 1.56 Å of the native structure. The errors in the predicted backbone dihedral angles are found to approximately cancel. Monte Carlo and simulated annealing trials on the same or smaller versions of the problem, using the same atomic model and energy terms, are shown to result in less accurate predictions. Although the problem solved here is a limited one, the findings illustrate the utility of systematic searches with atom-based models for macromolecular structure prediction and the importance of unbiased sampling in structure prediction methods.
Affiliation(s)
- Robert J Petrella
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, USA.
|
26
|
Buck PM, Bystroff C. Constraining local structure can speed up folding by promoting structural polarization of the folding pathway. Protein Sci 2011; 20:959-69. [PMID: 21413096 DOI: 10.1002/pro.619] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/22/2010] [Revised: 02/20/2011] [Accepted: 02/22/2011] [Indexed: 11/08/2022]
Abstract
The pathway which proteins take to fold can be influenced from the earliest events of structure formation. In this light, it was both predicted and confirmed that increasing the stiffness of a beta hairpin turn decreased the size of the transition state ensemble (TSE), while increasing the folding rate. Thus, there appears to be a relationship between conformationally restricting the TSE and increasing the folding rate, at least for beta hairpin turns. In this study, we hypothesize that the enormous sampling necessary to fold even two-state folding proteins in silico could be reduced if local structure constraints were used to restrict structural heterogeneity by polarizing folding pathways or forcing folding into preferred routes. Using a Gō model, we fold Chymotrypsin Inhibitor 2 (CI-2) and the src SH3 domain after constraining local sequence windows to their native structure by rigid body dynamics (RBD). Trajectories were monitored for any changes to the folding pathway and differences in the kinetics compared with unconstrained simulations. Constraining local structure decreases folding time two-fold for 41% of src SH3 windows and 45% of CI-2 windows. For both proteins, folding times are never significantly increased after constraining any window. Structural polarization of the folding pathway appears to explain these rate increases. Folding rate enhancements are consistent with the goal to reduce sampling time necessary to reach native structures during folding simulations. As anticipated, not all constrained windows showed an equal decrease in folding time. We conclude by analyzing these differences and explain why RBD may be the preferred way to constrain structure.
Affiliation(s)
- Patrick M Buck
- Department of Biology, Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute, Troy, New York, USA
|
27
|
Gee J, Shell MS. Two-dimensional replica exchange approach for peptide–peptide interactions. J Chem Phys 2011; 134:064112. [DOI: 10.1063/1.3551576] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 11/14/2022]
|
28
|
Zhou Y, Duan Y, Yang Y, Faraggi E, Lei H. Trends in template/fragment-free protein structure prediction. Theor Chem Acc 2011; 128:3-16. [PMID: 21423322 PMCID: PMC3030773 DOI: 10.1007/s00214-010-0799-2] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 05/17/2010] [Accepted: 08/15/2010] [Indexed: 12/13/2022]
Abstract
Predicting the structure of a protein from its amino acid sequence is a long-standing unsolved problem in computational biology. Its solution would be of both fundamental and practical importance as the gap between the number of known sequences and the number of experimentally solved structures widens rapidly. Currently, the most successful approaches are based on fragment/template reassembly. Lacking progress in template-free structure prediction calls for novel ideas and approaches. This article reviews trends in the development of physical and specific knowledge-based energy functions as well as sampling techniques for fragment-free structure prediction. Recent physical- and knowledge-based studies demonstrated that it is possible to sample and predict highly accurate protein structures without borrowing native fragments from known protein structures. These emerging approaches with fully flexible sampling have the potential to move the field forward.
Affiliation(s)
- Yaoqi Zhou
- School of Informatics, Indiana Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, 719 Indiana Ave #319, Walker Plaza Building, Indianapolis, IN 46202 USA
| | - Yong Duan
- UC Davis Genome Center and Department of Applied Science, University of California, One Shields Avenue, Davis, CA USA
- College of Physics, Huazhong University of Science and Technology, 1037 Luoyu Road, 430074 Wuhan, China
| | - Yuedong Yang
- School of Informatics, Indiana Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, 719 Indiana Ave #319, Walker Plaza Building, Indianapolis, IN 46202 USA
| | - Eshel Faraggi
- School of Informatics, Indiana Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indiana University Purdue University, 719 Indiana Ave #319, Walker Plaza Building, Indianapolis, IN 46202 USA
| | - Hongxing Lei
- UC Davis Genome Center and Department of Applied Science, University of California, One Shields Avenue, Davis, CA USA
- Beijing Institute of Genomics, Chinese Academy of Sciences, 100029 Beijing, China
|
29
|
Mittal J, Best RB. Tackling force-field bias in protein folding simulations: folding of Villin HP35 and Pin WW domains in explicit water. Biophys J 2010; 99:L26-8. [PMID: 20682244 DOI: 10.1016/j.bpj.2010.05.005] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.5] [Received: 03/30/2010] [Revised: 05/01/2010] [Accepted: 05/04/2010] [Indexed: 11/28/2022]
Abstract
The ability to fold proteins on a computer has highlighted the fact that existing force fields tend to be biased toward a particular type of secondary structure. Consequently, force fields for folding simulations are often chosen according to the native structure, implying that they are not truly "transferable." Here we show that, while the AMBER ff03 potential is known to favor helical structures, a simple correction to the backbone potential (ff03*) results in an unbiased energy function. We take as examples the 35-residue alpha-helical Villin HP35 and 37-residue beta-sheet Pin WW domains, which had not previously been folded with the same force field. Starting from unfolded configurations, simulations of both proteins in Amber ff03* in explicit solvent fold to within 2.0 Å RMSD of the experimental structures. This demonstrates that a simple backbone correction results in a more transferable force field, an important requirement if simulations are to be used to interpret folding mechanism.
Affiliation(s)
- Jeetain Mittal
- Department of Chemical Engineering, Lehigh University, Bethlehem, Pennsylvania, USA.
|
30
|
Lin EI, Shell MS. Can Peptide Folding Simulations Provide Predictive Information for Aggregation Propensity? J Phys Chem B 2010; 114:11899-908. [DOI: 10.1021/jp104114n] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/29/2022]
Affiliation(s)
- Edmund I. Lin
- Department of Chemical Engineering, University of California Santa Barbara, Santa Barbara, California 93106-5080
| | - M. Scott Shell
- Department of Chemical Engineering, University of California Santa Barbara, Santa Barbara, California 93106-5080
|
31
|
Shell MS. A replica-exchange approach to computing peptide conformational free energies. Molecular Simulation 2010. [DOI: 10.1080/08927021003720546] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 10/19/2022]
|
32
|
Glembo TJ, Ozkan SB. Union of geometric constraint-based simulations with molecular dynamics for protein structure prediction. Biophys J 2010; 98:1046-54. [PMID: 20303862 PMCID: PMC2849074 DOI: 10.1016/j.bpj.2009.11.031] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 09/09/2009] [Revised: 11/05/2009] [Accepted: 11/17/2009] [Indexed: 10/19/2022]
Abstract
Although proteins are a fundamental unit in biology, the mechanism by which proteins fold into their native state is not well understood. In this work, we explore the assembly of secondary structure units via geometric constraint-based simulations and the effect of refinement of assembled structures using reservoir replica exchange molecular dynamics. Our approach uses two crucial features of these methods: i), geometric simulations speed up the search for nativelike topologies as there are no energy barriers to overcome; and ii), molecular dynamics identifies the low free energy structures and further refines these structures toward the actual native conformation. We use eight alpha-, beta-, and alpha/beta-proteins to test our method. The geometric simulations of our test set result in an average RMSD from native of 3.7 Å and this further reduces to 2.7 Å after refinement. We also explore the question of robustness of assembly for inaccurate (shifted and shortened) secondary structure. We find that the RMSD from native is highly dependent on the accuracy of secondary structure input, and even slightly shifting the location of secondary structure along the amino acid sequence can lead to a rapid decrease in RMSD to native due to incorrect packing.
Key Words
- CASP, critical assessment of techniques for protein structure prediction
- FRODA, framework rigidity optimized dynamics algorithm
- MD, molecular dynamics
- REMD, replica exchange molecular dynamics
- RMSD, root mean-square deviation
- R-REMD, reservoir replica exchange molecular dynamics
- ZAM, zipping and assembly method
- ZAMF, ZAM with FRODA
- 3-D, three-dimensional
- 1-D, one-dimensional
Affiliation(s)
| | - S. Banu Ozkan
- Center for Biological Physics, Department of Physics, Arizona State University, Tempe, Arizona
|
33
|
Jefferys BR, Kelley LA, Sternberg MJE. Protein folding requires crowd control in a simulated cell. J Mol Biol 2010; 397:1329-38. [PMID: 20149797 PMCID: PMC2891488 DOI: 10.1016/j.jmb.2010.01.074] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.7] [Received: 09/25/2009] [Revised: 12/30/2009] [Accepted: 01/02/2010] [Indexed: 11/09/2022]
Abstract
Macromolecular crowding has a profound effect upon biochemical processes in the cell. We have computationally studied the effect of crowding upon protein folding for 12 small domains in a simulated cell using a coarse-grained protein model, which is based upon Langevin dynamics, designed to unify the often disjoint goals of protein folding simulation and structure prediction. The model can make predictions of native conformation with accuracy comparable with that of the best current template-free models. It is fast enough to enable a more extensive analysis of crowding than previously attempted, studying several proteins at many crowding levels and further random repetitions designed to more closely approximate the ensemble of conformations. We found that when crowding approaches 40% excluded volume, the maximum level found in the cell, proteins fold to fewer native-like states. Notably, when crowding is increased beyond this level, there is a sudden failure of protein folding: proteins fix upon a structure more quickly and become trapped in extended conformations. These results suggest that the ability of small protein domains to fold without the help of chaperones may be an important factor in limiting the degree of macromolecular crowding in the cell. Here, we discuss the possible implications regarding the relationship between protein expression level, protein size, chaperone activity and aggregation.
Affiliation(s)
- Benjamin R Jefferys
- Division of Molecular Biosciences, Biochemistry Building, Imperial College London, South Kensington, London SW7 2AZ, UK.
|
34
|
Lin E, Shell MS. Convergence and Heterogeneity in Peptide Folding with Replica Exchange Molecular Dynamics. J Chem Theory Comput 2009; 5:2062-73. [DOI: 10.1021/ct900119n] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Indexed: 11/28/2022]
Affiliation(s)
- Edmund Lin
- Department of Chemical Engineering, University of California Santa Barbara, 552 University Road, Santa Barbara, California 93106-5080
| | - M. Scott Shell
- Department of Chemical Engineering, University of California Santa Barbara, 552 University Road, Santa Barbara, California 93106-5080
|
35
|
Abstract
New amino acid sequences of proteins are being learned at a rapid rate, thanks to modern genomics. The native structures and functions of those proteins can often be inferred using bioinformatics methods. We show here that it is also possible to infer the stabilities and thermal folding properties of proteins, given only simple genomics information: the chain length and the numbers of charged side chains. In particular, our model predicts ΔH(T), ΔS(T), ΔCp, and ΔF(T) (the folding enthalpy, entropy, heat capacity, and free energy) as functions of temperature T; the denaturant m values in guanidine and urea; the pH-temperature-salt phase diagrams; and the energy of confinement F(s) of the protein inside a cavity of radius s. All combinations of these phase equilibria can also then be computed from that information. As one illustration, we compute the pH and salt conditions that would denature a protein inside a small confined cavity. Because the model is analytical, it is computationally efficient enough that it could be used to automatically annotate whole proteomes with protein stability information.
36
Voelz VA, Shell MS, Dill KA. Predicting peptide structures in native proteins from physical simulations of fragments. PLoS Comput Biol 2009; 5:e1000281. [PMID: 19197352 PMCID: PMC2629132 DOI: 10.1371/journal.pcbi.1000281] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Received: 06/20/2008] [Accepted: 12/17/2008] [Indexed: 11/25/2022] Open
Abstract
It has long been proposed that much of the information encoding how a protein folds is contained locally in the peptide chain. Here we present a large-scale simulation study designed to examine the extent to which conformations of peptide fragments in water predict native conformations in proteins. We perform replica exchange molecular dynamics (REMD) simulations of 872 8-mer, 12-mer, and 16-mer peptide fragments from 13 proteins using the AMBER 96 force field and the OBC implicit solvent model. To analyze the simulations, we compute various contact-based metrics, such as contact probability, and then apply Bayesian classifier methods to infer which metastable contacts are likely to be native vs. non-native. We find that a simple measure, the observed contact probability, is largely more predictive of a peptide's native structure in the protein than combinations of metrics or multi-body components. Our best classification model is a logistic regression model that can achieve up to 63% correct classifications for 8-mers, 71% for 12-mers, and 76% for 16-mers. We validate these results on fragments of a protein outside our training set. We conclude that local structure provides information to solve some but not all of the conformational search problem. These results help improve our understanding of folding mechanisms, and have implications for improving physics-based conformational sampling and structure prediction using all-atom molecular simulations.
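The observed contact probability described in this abstract can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the cited study: the 8.0 Å distance cutoff, the minimum sequence separation of 3, the toy coordinates, and the logistic `weight` and `bias` values (which are hypothetical, not fitted parameters).

```python
import math
from itertools import combinations


def contact_probabilities(frames, cutoff=8.0, min_separation=3):
    """Fraction of frames in which each residue pair (i, j) lies within `cutoff`.

    `frames` is a list of conformations; each conformation is a list of
    (x, y, z) coordinates, one per residue. Cutoff and separation are
    illustrative assumptions.
    """
    n_res = len(frames[0])
    counts = {
        (i, j): 0
        for i, j in combinations(range(n_res), 2)
        if j - i >= min_separation
    }
    for frame in frames:
        for i, j in counts:
            if math.dist(frame[i], frame[j]) <= cutoff:
                counts[(i, j)] += 1
    return {pair: c / len(frames) for pair, c in counts.items()}


def native_probability(p_contact, weight=6.0, bias=-3.0):
    """Logistic model mapping a contact probability to P(contact is native).

    `weight` and `bias` are hypothetical placeholders, not fitted values.
    """
    return 1.0 / (1.0 + math.exp(-(weight * p_contact + bias)))


# Toy ensemble: one hairpin-like frame and one fully extended frame.
frames = [
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 3.8, 0.0), (3.8, 3.8, 0.0)],
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0), (15.2, 0.0, 0.0)],
]
probs = contact_probabilities(frames)
scores = {pair: native_probability(p) for pair, p in probs.items()}
```

In a real analysis the frames would come from simulation snapshots and the logistic parameters would be fitted against known native contacts; the sketch only shows how a per-pair contact frequency feeds a one-feature classifier.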
Affiliation(s)
- Vincent A Voelz
- Department of Chemistry, Stanford University, Stanford, CA, USA.