1
Caetano-Anollés K, Aziz MF, Mughal F, Caetano-Anollés G. On Protein Loops, Prior Molecular States and Common Ancestors of Life. J Mol Evol 2024:10.1007/s00239-024-10167-y. [PMID: 38652291 DOI: 10.1007/s00239-024-10167-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Accepted: 03/22/2024] [Indexed: 04/25/2024]
Abstract
The principle of continuity demands the existence of prior molecular states and common ancestors responsible for extant macromolecular structure. Here, we focus on the emergence and evolution of loop prototypes, the elemental architects of protein domain structure. Phylogenomic reconstruction spanning superkingdoms and viruses generated an evolutionary chronology of prototypes with six distinct evolutionary phases defining a most parsimonious evolutionary progression of cellular life. Each phase was marked by strategic prototype accumulation shaping the structures and functions of common ancestors. The last universal common ancestor (LUCA) of cells and viruses and the last universal cellular ancestor (LUCellA) defined stem lines that were structurally and functionally complex. The evolutionary saga highlighted transformative forces. LUCA lacked biosynthetic ribosomal machinery, while the pivotal LUCellA lacked essential DNA biosynthesis and modern transcription. Early proteins therefore relied on RNA for genetic information storage but appeared initially decoupled from it, hinting at transformative shifts of genetic processing. Urancestral loop types suggest advanced folding designs were present at an early evolutionary stage. An exploration of loop geometric properties revealed gradual replacement of prototypes with α-helix and β-strand bracing structures over time, paving the way for the dominance of other loop types. AlphaFold2-generated atomic models of prototype accretion described patterns of fold emergence. Our findings favor a 'processual' model of evolving stem lines aligned with Woese's vision of a communal world. This model prompts discussing the 'problem of ancestors' and the challenges that lie ahead for research in taxonomy, evolution and complexity.
Affiliation(s)
- Kelsey Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
  - Callout Biotech, Albuquerque, NM, 87112, USA
- M Fayez Aziz
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Fizza Mughal
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Gustavo Caetano-Anollés
  - Evolutionary Bioinformatics Laboratory, Department of Crop Sciences and Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
2
Stevens AO, He Y. Benchmarking the Accuracy of AlphaFold 2 in Loop Structure Prediction. Biomolecules 2022; 12:985. [PMID: 35883541 PMCID: PMC9312937 DOI: 10.3390/biom12070985] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 05/30/2022] [Revised: 07/05/2022] [Accepted: 07/12/2022] [Indexed: 01/22/2023] Open
Abstract
The inhibition of protein-protein interactions is a growing strategy in drug development. In addition to structured regions, many protein loop regions are involved in protein-protein interactions and thus have been identified as potential drug targets. To effectively target such regions, knowledge of protein structure is critical. Loop structure prediction is a challenging subgroup in the field of protein structure prediction because of the reduced level of conservation in protein sequences compared to the secondary structure elements. AlphaFold 2 has been suggested to be one of the greatest achievements in the field of protein structure prediction. AlphaFold 2 predicted protein structures at near X-ray resolution in the Critical Assessment of protein Structure Prediction (CASP 14) competition in 2020. The purpose of this work is to survey the performance of AlphaFold 2 in specifically predicting protein loop regions. We have constructed an independent dataset of 31,650 loop regions from 2613 proteins (deposited after AlphaFold 2 was trained) with both experimentally determined structures and AlphaFold 2 predicted structures. With extensive evaluation using our dataset, the results indicate that AlphaFold 2 is a good predictor of the structure of loop regions, especially for short loop regions. Loops less than 10 residues in length have an average Root Mean Square Deviation (RMSD) of 0.33 Å and an average Template Modeling score (TM-score) of 0.82. However, we see that as the number of residues in a given loop increases, the accuracy of AlphaFold 2's prediction decreases. Loops more than 20 residues in length have an average RMSD of 2.04 Å and an average TM-score of 0.55. Such a correlation between accuracy and length of the loop is directly linked to the increase in flexibility. Moreover, AlphaFold 2 does slightly over-predict α-helices and β-strands in proteins.
Affiliation(s)
- Amy O. Stevens
  - Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, NM 87131, USA
- Yi He
  - Department of Chemistry and Chemical Biology, University of New Mexico, Albuquerque, NM 87131, USA
  - Translational Informatics Division, Department of Internal Medicine, University of New Mexico, Albuquerque, NM 87131, USA
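The RMSD figures quoted in the Stevens and He abstract above compare predicted and experimental loop coordinates. As a minimal sketch of the quantity involved, the Python below computes a plain coordinate RMSD and assigns a loop to a length bin. It assumes the two coordinate sets are already superposed and matched atom for atom. The function names `rmsd` and `length_bin`, the middle "10-20" bin, and the toy coordinates are illustrative assumptions, not the authors' protocol or data.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


def length_bin(n_residues):
    """Bin a loop by length; the abstract reports results for <10 and >20 residues."""
    if n_residues < 10:
        return "<10"
    if n_residues <= 20:
        return "10-20"  # hypothetical middle bin, not named in the abstract
    return ">20"


# Toy data (hypothetical): three points and a copy shifted by 2 Å along z.
experimental = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
predicted = [(x, y, z + 2.0) for x, y, z in experimental]
loop_rmsd = rmsd(experimental, predicted)
print(length_bin(len(experimental)), loop_rmsd)
```

A real benchmark would first superpose the structures (for example on the loop's flanking residues) before measuring deviation; that step is deliberately left out of this sketch.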
3
Sakuma K, Minami S. Enumeration and comprehensive in-silico modeling of three-helix bundle structures composed of typical αα-hairpins. BMC Bioinformatics 2021; 22:465. [PMID: 34579643 PMCID: PMC8474748 DOI: 10.1186/s12859-021-04380-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/16/2021] [Accepted: 09/14/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The design of protein structures from scratch requires special attention to the combination of the types and lengths of the secondary structures and the loops required to build highly designable backbone structure models. However, it is difficult to predict the combinations that result in globular and protein-like conformations without simulations. In this study, we used single-chain three-helix bundles as simple models of protein tertiary structures and sought to thoroughly investigate the conditions required to construct them, starting from the identification of the typical αα-hairpin motifs. RESULTS First, by statistical analysis of naturally occurring protein structures, we identified three αα-hairpin motifs that were specifically related to the left- and right-handedness of helix-helix packing. Second, specifying these αα-hairpin motifs as junctions, we performed sequence-independent backbone-building simulations to comparatively build single-chain three-helix bundle structures and identified the promising combinations of α-helix length and αα-hairpin type that result in tight packing between the first and third α-helices. Third, using those single-chain three-helix bundle backbone structures as template structures, we designed amino acid sequences that were predicted to fold into the target topologies, which supports that the compact single-chain three-helix bundle structures that we sampled show sufficient quality to allow amino-acid sequence design. CONCLUSION The enumeration of the dominant subsets of possible backbone structures for small single-chain three-helical bundle topologies revealed that the compact foldable structures are discontinuously and sparsely distributed in the conformational space. Additionally, although the designs have not been experimentally validated in the present research, the comprehensive set of computational structural models generated also offers protein designers the opportunity to skip building similar structures by themselves and enables them to quickly focus on building specialized designs using the prebuilt structure models. The backbone and best design models in this study are publicly accessible from the following URL: https://doi.org/10.5281/zenodo.4321632 .
Affiliation(s)
- Koya Sakuma
  - SOKENDAI, The Graduate University for Advanced Studies, 38 Nishigonaka, Myodaiji, Okazaki, 444-8585, Japan
  - Institute for Molecular Science, 38 Nishigonaka, Myodaiji, Okazaki, 444-8585, Japan
4
Rumfeldt J, Kurttila M, Takala H, Ihalainen JA. The hairpin extension controls solvent access to the chromophore binding pocket in a bacterial phytochrome: a UV-vis absorption spectroscopy study. Photochem Photobiol Sci 2021; 20:1173-1181. [PMID: 34460093 DOI: 10.1007/s43630-021-00090-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/19/2021] [Accepted: 08/09/2021] [Indexed: 10/20/2022]
Abstract
Solvent access to the protein interior plays an important role in the function of many proteins. Phytochromes contain a specific structural feature, a hairpin extension that appears to relay structural information from the chromophore to the rest of the protein. The extension interacts with amino acids near the chromophore, and hence shields the chromophore from the surrounding solvent. We envision that the detachment of the extension from the protein surface allows solvent exchange reactions in the vicinity of the chromophore. This can facilitate, for example, proton transfer processes between solvent and the protein interior. To test this hypothesis, the kinetics of the protonation state of the biliverdin chromophore from Deinococcus radiodurans bacteriophytochrome, and thus, the pH of the surrounding solution, is determined. The observed absorbance changes are related to the solvent access of the chromophore binding pocket, gated by the hairpin extension. We therefore propose a model with an "open" (solvent-exposed, deprotonation-active on a (sub)second time scale) state and a "closed" (solvent-gated, deprotonation-inactive) state, where the hairpin fluctuates slowly between these conformations, thereby controlling the deprotonation process of the chromophore on a minute time scale. When the connection between the hairpin and the biliverdin surroundings is destabilized by a point mutation, the amplitude of the deprotonation phase increases considerably. In the absence of the extension, the chromophore deprotonates essentially without any "gating". Hence, we introduce a straightforward method to study the stability and fluctuation of the phytochrome hairpin in its photostationary state. This approach can be extended to other chromophore-protein systems where absorption changes reflect dynamic processes of the protein.
Affiliation(s)
- Jessica Rumfeldt
  - Nanoscience Center, Department of Biological and Environmental Science, University of Jyväskylä, 40014, Jyväskylä, Finland
- Moona Kurttila
  - Nanoscience Center, Department of Biological and Environmental Science, University of Jyväskylä, 40014, Jyväskylä, Finland
- Heikki Takala
  - Nanoscience Center, Department of Biological and Environmental Science, University of Jyväskylä, 40014, Jyväskylä, Finland
- Janne A Ihalainen
  - Nanoscience Center, Department of Biological and Environmental Science, University of Jyväskylä, 40014, Jyväskylä, Finland
5
Barozet A, Chacón P, Cortés J. Current approaches to flexible loop modeling. Curr Res Struct Biol 2021; 3:187-191. [PMID: 34409304 PMCID: PMC8361254 DOI: 10.1016/j.crstbi.2021.07.002] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 09/30/2020] [Revised: 06/30/2021] [Accepted: 07/25/2021] [Indexed: 01/14/2023] Open
Abstract
Loops are key components of protein structures, involved in many biological functions. Due to their conformational variability, the structural investigation of loops is a difficult topic, requiring a combination of experimental and computational methods. This paper provides a brief overview of current computational approaches to flexible loop modeling, and presents the main ingredients of the most standard protocols. Despite great progress in recent years, accurately modeling the conformational variability of long flexible loops remains a challenging problem. Future advances in this field will likely come from a tight coupling of experimental and computational techniques, which would enable a better understanding of the relationships between loop sequence, structural flexibility, and functional roles. In fine, accurate loop modeling will open the road to loop design problems of interest for applications in biomedicine and biotechnology.
Affiliation(s)
- Amélie Barozet
  - LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
- Pablo Chacón
  - Department of Biological Physical Chemistry, Rocasolano Physical Chemistry Institute C.S.I.C., Madrid, Spain
- Juan Cortés
  - LAAS-CNRS, Université de Toulouse, CNRS, Toulouse, France
6
Investigation of machine learning techniques on proteomics: A comprehensive survey. Progress in Biophysics and Molecular Biology 2019; 149:54-69. [PMID: 31568792 DOI: 10.1016/j.pbiomolbio.2019.09.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 06/05/2019] [Revised: 09/16/2019] [Accepted: 09/23/2019] [Indexed: 11/21/2022]
Abstract
Proteomics is the large-scale study of proteins, and it has enabled the identification of steadily increasing numbers of proteins. Proteins are an essential part of living organisms and serve numerous functions. The proteome is the complete set of proteins produced or modified by an organism or system. The proteome varies with time and with the distinct requirements, or stresses, that a cell or organism experiences. Proteomics is an interdisciplinary field that has drawn on the genetic information from various genome projects. Much proteomics data is collected with high-throughput techniques such as mass spectrometry and microarrays. Analyzing these data and performing comparisons by hand would typically take weeks or months. Therefore, researchers in the life sciences are collaborating with computer scientists and mathematicians to create programs and pipelines that analyze protein data computationally. Using bioinformatics techniques, researchers can analyze and store protein data more quickly. The goal of this paper is to review machine learning techniques and their application in the field of proteomics.
7
Karami Y, Guyon F, De Vries S, Tufféry P. DaReUS-Loop: accurate loop modeling using fragments from remote or unrelated proteins. Sci Rep 2018; 8:13673. [PMID: 30209260 PMCID: PMC6135855 DOI: 10.1038/s41598-018-32079-w] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Received: 06/20/2018] [Accepted: 08/31/2018] [Indexed: 11/08/2022] Open
Abstract
Despite efforts during the past decades, loop modeling remains a difficult part of protein structure modeling. Several approaches have been developed in the framework of crystal structures. However, for homology models, the modeling of loops is still far from being solved. We propose DaReUS-Loop, a data-based approach that identifies loop candidates mining the complete set of experimental structures available in the Protein Data Bank. Candidate filtering relies on local conformation profile-profile comparison, together with physico-chemical scoring. Applied to three different template-based test sets, DaReUS-Loop shows significant increase in the number of high-accuracy loops, and significant enhancement for modeling long loops. A special advantage is that our method proposes a prediction confidence score that correlates well with the expected accuracy of the loops. Strikingly, over 50% of successful loop models are derived from unrelated proteins, indicating that fragments under similar constraints tend to adopt similar structure, beyond mere homology.
Affiliation(s)
- Yasaman Karami
  - Molécules Thérapeutiques in silico, UMR-S973, Institut National de la Santé et de la Recherche Médicale (INSERM), Université Paris Diderot, Sorbonne Paris Cité, RPBS, 75013, Paris, France
- Frédéric Guyon
  - Molécules Thérapeutiques in silico, UMR-S973, Institut National de la Santé et de la Recherche Médicale (INSERM), Université Paris Diderot, Sorbonne Paris Cité, RPBS, 75013, Paris, France
- Sjoerd De Vries
  - Molécules Thérapeutiques in silico, UMR-S973, Institut National de la Santé et de la Recherche Médicale (INSERM), Université Paris Diderot, Sorbonne Paris Cité, RPBS, 75013, Paris, France
- Pierre Tufféry
  - Molécules Thérapeutiques in silico, UMR-S973, Institut National de la Santé et de la Recherche Médicale (INSERM), Université Paris Diderot, Sorbonne Paris Cité, RPBS, 75013, Paris, France
8
Li D, Hu X, Liu X, Feng Z, Ding C. Using feature optimization-based support vector machine method to recognize the β-hairpin motifs in enzymes. Saudi J Biol Sci 2016; 24:1361-1369. [PMID: 28855832 PMCID: PMC5562482 DOI: 10.1016/j.sjbs.2016.11.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/05/2016] [Revised: 11/16/2016] [Accepted: 11/17/2016] [Indexed: 11/28/2022] Open
Abstract
β-Hairpins in enzymes, a special class of proteins with catalytic functions, contain many binding sites that are essential for enzyme function. With the increasing number of observed enzyme protein sequences, it is especially important to use bioinformatics techniques to quickly and accurately identify β-hairpins in enzyme proteins for further annotation of enzyme structure and function. In this work, the proposed method was trained and tested on a non-redundant enzyme β-hairpin database containing 2818 β-hairpins and 1098 non-β-hairpins. With 5-fold cross-validation on the training dataset, an overall accuracy of 90.08% and a Matthews correlation coefficient (Mcc) of 0.74 were obtained, while on the independent test dataset, an overall accuracy of 88.93% and an Mcc of 0.76 were achieved. Furthermore, the method was validated on 845 β-hairpins with ligand binding sites. With 5-fold cross-validation on the training dataset and an independent test on the test dataset, the overall accuracies were 85.82% (Mcc of 0.71) and 84.78% (Mcc of 0.70), respectively. By integrating mRMR feature selection with the SVM algorithm, a reasonably high accuracy was achieved, indicating that the method is an effective tool for further studies of β-hairpins in enzyme structure. Additionally, as a novelty for function prediction of enzymes, β-hairpins with ligand binding sites were predicted. Based on this work, a web server was constructed to predict β-hairpin motifs in enzymes (http://202.207.29.251:8080/).
Affiliation(s)
- Dongmei Li
  - College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, China
- Xiuzhen Hu
  - College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, China
- Xingxing Liu
  - College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, China
- Zhenxing Feng
  - College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, China
- Changjiang Ding
  - College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, China
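The Li et al. abstract above reports overall accuracy together with the Matthews correlation coefficient, which is the more informative of the two when classes are imbalanced (2818 β-hairpins against 1098 non-β-hairpins). The following Python is a minimal sketch of both metrics computed from a binary confusion matrix. The function names and the confusion-matrix counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts, NOT results from the paper.
tp, tn, fp, fn = 90, 80, 20, 10
acc = accuracy(tp, tn, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
print(f"accuracy={acc:.4f} mcc={mcc:.4f}")
```

Reporting both values, as the abstract does, guards against a classifier that scores well on accuracy simply by favoring the majority class.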
9
Jin Y, Liu Z, Li Y, Liu W, Tao Y, Wang G. A structural and functional study on the 2-C-methyl-d-erythritol-4-phosphate cytidyltransferase (IspD) from Bacillus subtilis. Sci Rep 2016; 6:36379. [PMID: 27821871 PMCID: PMC5099578 DOI: 10.1038/srep36379] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 07/27/2016] [Accepted: 10/13/2016] [Indexed: 12/25/2022] Open
Abstract
2-C-Methyl-D-erythritol-4-phosphate cytidyltransferase (IspD) is an essential enzyme in the mevalonate-independent pathway of isoprenoid biosynthesis. This enzyme catalyzes the conversion of 2-C-methyl-D-erythritol 4-phosphate (MEP) and cytidine triphosphate (CTP) to 4-diphosphocytidyl-2-C-methyl-D-erythritol (CDPME) and inorganic pyrophosphate (PPi). Bacillus subtilis is an excellent isoprene producer. However, studies on the key enzymes of the MEP pathway in B. subtilis have been lacking. In this work, the crystal structures of IspD and of IspD complexed with CTP from B. subtilis were determined. For the first time, the intact P-loop was observed in the apo structure of an IspD enzyme. Structural comparisons revealed that the concerted movements of the P-loop and of loops close to the active site are essential in the reaction catalyzed by IspD. Meanwhile, kinetic analysis showed that the CTP hydrolytic activity of IspD from B. subtilis was over two times higher than that from Escherichia coli. These results will be useful for future target-based screening of potential inhibitors and for the metabolic engineering of isoprenoid biosynthesis.
Affiliation(s)
- Yun Jin
  - Key Laboratory of Environmental and Applied Microbiology, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610041, China
  - Key Laboratory of Environmental Microbiology of Sichuan Province, Chengdu, 610041, China
  - University of Chinese Academy of Sciences, Beijing, 100049, China
- Zhongchuan Liu
  - Key Laboratory of Environmental and Applied Microbiology, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610041, China
  - Key Laboratory of Environmental Microbiology of Sichuan Province, Chengdu, 610041, China
- Yanjie Li
  - Key Laboratory of Environmental and Applied Microbiology, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610041, China
  - Key Laboratory of Environmental Microbiology of Sichuan Province, Chengdu, 610041, China
- Weifeng Liu
  - Chinese Academy of Sciences Key Laboratory of Microbial Physiological and Metabolic Engineering, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, People's Republic of China
- Yong Tao
  - Chinese Academy of Sciences Key Laboratory of Microbial Physiological and Metabolic Engineering, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, People's Republic of China
- Ganggang Wang
  - Key Laboratory of Environmental and Applied Microbiology, Chengdu Institute of Biology, Chinese Academy of Sciences, Chengdu, 610041, China
  - Key Laboratory of Environmental Microbiology of Sichuan Province, Chengdu, 610041, China
10
Abstract
Comparative protein structure modeling predicts the three-dimensional structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and how to use the ModBase database of such models, and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. © 2016 by John Wiley & Sons, Inc.
Affiliation(s)
- Benjamin Webb
  - University of California at San Francisco, San Francisco, California
- Andrej Sali
  - University of California at San Francisco, San Francisco, California
11
Berezovsky IN, Guarnera E, Zheng Z. Basic units of protein structure, folding, and function. Progress in Biophysics and Molecular Biology 2016; 128:85-99. [PMID: 27697476 DOI: 10.1016/j.pbiomolbio.2016.09.009] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 07/29/2016] [Revised: 09/05/2016] [Accepted: 09/26/2016] [Indexed: 10/20/2022]
Abstract
Study of the hierarchy of domain structure with alternative sets of domains and analysis of discontinuous domains, consisting of remote segments of the polypeptide chain, raised a question about the minimal structural unit of the protein domain. The hypothesis on the decisive role of the polypeptide backbone in determining the elementary units of globular proteins has led to the discovery of closed loops. It is reviewed here how closed loops form the loop-n-lock structure of proteins, providing the foundation for stability and designability of protein folds/domains and underlying their co-translational folding. Simplified protein sequences are considered here with the aim of exploring the basic principles that presumably dominated the folding and stability of proteins in the early stages of structural evolution. Elementary functional loops (EFLs), closed loops with one or a few catalytic residues, are, in turn, units of protein function. They are apparent descendants of the prebiotic ring-like peptides, which gave rise to the first functional folds/domains by being fused at the beginning of the evolution of protein structure. It is also shown how evolutionary relations between protein functional superfamilies and folds delineated with the help of EFLs can contribute to establishing the rules for design of desired enzymatic functions. Generalized descriptors of the elementary functions are proposed to be used as basic units in future computational design.
Affiliation(s)
- Igor N Berezovsky
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
  - Department of Biological Sciences (DBS), National University of Singapore (NUS), 8 Medical Drive, 117579, Singapore
- Enrico Guarnera
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
- Zejun Zheng
  - Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR), 30 Biopolis Street, #07-01, Matrix, 138671, Singapore
12
Webb B, Sali A. Comparative Protein Structure Modeling Using MODELLER. Current Protocols in Bioinformatics 2016; 54:5.6.1-5.6.37. [PMID: 27322406 PMCID: PMC5031415 DOI: 10.1002/cpbi.3] [Citation(s) in RCA: 1820] [Impact Index Per Article: 227.5] [Indexed: 12/22/2022]
Abstract
Comparative protein structure modeling predicts the three-dimensional structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and how to use the ModBase database of such models, and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. © 2016 by John Wiley & Sons, Inc.
Affiliation(s)
- Benjamin Webb
  - University of California at San Francisco, San Francisco, California
- Andrej Sali
  - University of California at San Francisco, San Francisco, California
13
Planas-Iglesias J, Dwarakanath H, Mohammadyani D, Yanamala N, Kagan VE, Klein-Seetharaman J. Cardiolipin Interactions with Proteins. Biophys J 2015; 109:1282-94. [PMID: 26300339 DOI: 10.1016/j.bpj.2015.07.034] [Citation(s) in RCA: 97] [Impact Index Per Article: 10.8] [Received: 03/19/2015] [Revised: 06/18/2015] [Accepted: 07/13/2015] [Indexed: 10/23/2022] Open
Abstract
Cardiolipins (CL) represent unique phospholipids of bacteria and eukaryotic mitochondria with four acyl chains and two phosphate groups that have been implicated in numerous functions from energy metabolism to apoptosis. Many proteins are known to interact with CL, and several cocrystal structures of protein-CL complexes exist. In this work, we describe the collection of the first systematic and, to the best of our knowledge, the comprehensive gold standard data set of all known CL-binding proteins. There are 62 proteins in this data set, 21 of which have nonredundant crystal structures with bound CL molecules available. Using binding patch analysis of amino acid frequencies, secondary structures and loop supersecondary structures considering phosphate and acyl chain binding regions together and separately, we gained a detailed understanding of the general structural and dynamic features involved in CL binding to proteins. Exhaustive docking of CL to all known structures of proteins experimentally shown to interact with CL demonstrated the validity of the docking approach, and provides a rich source of information for experimentalists who may wish to validate predictions.
Affiliation(s)
- Joan Planas-Iglesias
  - Division of Metabolic and Vascular Health, Medical School, University of Warwick, Coventry, United Kingdom
- Himal Dwarakanath
  - Division of Metabolic and Vascular Health, Medical School, University of Warwick, Coventry, United Kingdom
- Dariush Mohammadyani
  - Department of Bioengineering, University of Pittsburgh, Pittsburgh, Pennsylvania
  - Department of Environmental and Occupational Health, University of Pittsburgh, Pittsburgh, Pennsylvania
- Naveena Yanamala
  - Department of Environmental and Occupational Health, University of Pittsburgh, Pittsburgh, Pennsylvania
- Valerian E Kagan
  - Department of Environmental and Occupational Health, University of Pittsburgh, Pittsburgh, Pennsylvania
- Judith Klein-Seetharaman
  - Division of Metabolic and Vascular Health, Medical School, University of Warwick, Coventry, United Kingdom
  - Department of Bioengineering, University of Pittsburgh, Pittsburgh, Pennsylvania
14
Messih MA, Lepore R, Tramontano A. LoopIng: a template-based tool for predicting the structure of protein loops. Bioinformatics 2015; 31:3767-72. [PMID: 26249814 PMCID: PMC4653384 DOI: 10.1093/bioinformatics/btv438] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 04/07/2015] [Accepted: 07/21/2015] [Indexed: 12/31/2022] Open
Abstract
Motivation: Predicting the structure of protein loops is very challenging, mainly because they are not necessarily subject to strong evolutionary pressure. This implies that, unlike the rest of the protein, standard homology modeling techniques are not very effective in modeling their structure. However, loops are often involved in protein function, hence inferring their structure is important for predicting protein structure as well as function. Results: We describe a method, LoopIng, based on the Random Forest automated learning technique, which, given a target loop, selects a structural template for it from a database of loop candidates. Compared to the most recently available methods, LoopIng is able to achieve similar accuracy for short loops (4–10 residues) and significant enhancements for long loops (11–20 residues). The quality of the predictions is robust to errors that unavoidably affect the stem regions when these are modeled. The method returns a confidence score for the predicted template loops and has the advantage of being very fast (on average: 1 min/loop). Availability and implementation: www.biocomputing.it/looping Contact: anna.tramontano@uniroma1.it Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rosalba Lepore
  - Department of Physics, Sapienza University, 00185 Rome, Italy
- Anna Tramontano
  - Department of Physics, Sapienza University, 00185 Rome, Italy and Istituto Pasteur-Fondazione Cenci Bolognetti, Viale Regina Elena 291, 00161 Rome, Italy
|
15
|
Pelay-Gimeno M, Glas A, Koch O, Grossmann TN. Structure-Based Design of Inhibitors of Protein-Protein Interactions: Mimicking Peptide Binding Epitopes. Angew Chem Int Ed Engl 2015; 54:8896-927. [PMID: 26119925 PMCID: PMC4557054 DOI: 10.1002/anie.201412070] [Citation(s) in RCA: 491] [Impact Index Per Article: 54.6] [Received: 12/16/2014] [Indexed: 12/15/2022]
Abstract
Protein-protein interactions (PPIs) are involved at all levels of cellular organization, thus making the development of PPI inhibitors extremely valuable. The identification of selective inhibitors is challenging because of the shallow and extended nature of PPI interfaces. Inhibitors can be obtained by mimicking peptide binding epitopes in their bioactive conformation. For this purpose, several strategies have been evolved to enable a projection of side chain functionalities in analogy to peptide secondary structures, thereby yielding molecules that are generally referred to as peptidomimetics. Herein, we introduce a new classification of peptidomimetics (classes A-D) that enables a clear assignment of available approaches. Based on this classification, the Review summarizes strategies that have been applied for the structure-based design of PPI inhibitors through stabilizing or mimicking turns, β-sheets, and helices.
Affiliation(s)
- Marta Pelay-Gimeno
  - Chemical Genomics Centre of the Max Planck Society, Otto-Hahn-Strasse 15, 44227 Dortmund (Germany)
- Adrian Glas
  - Chemical Genomics Centre of the Max Planck Society, Otto-Hahn-Strasse 15, 44227 Dortmund (Germany)
- Oliver Koch
  - TU Dortmund University, Department of Chemistry and Chemical Biology, Otto-Hahn-Strasse 6, 44227 Dortmund (Germany)
- Tom N Grossmann
  - Chemical Genomics Centre of the Max Planck Society, Otto-Hahn-Strasse 15, 44227 Dortmund (Germany)
  - TU Dortmund University, Department of Chemistry and Chemical Biology, Otto-Hahn-Strasse 6, 44227 Dortmund (Germany)
|
16
|
Pelay-Gimeno M, Glas A, Koch O, Grossmann TN. Strukturbasierte Entwicklung von Protein-Protein-Interaktionsinhibitoren: Stabilisierung und Nachahmung von Peptidliganden [Structure-based development of protein-protein interaction inhibitors: stabilization and mimicry of peptide ligands]. Angew Chem Int Ed Engl 2015. [DOI: 10.1002/ange.201412070] [Citation(s) in RCA: 80] [Impact Index Per Article: 8.9] [Indexed: 12/13/2022]
|
17
|
Abstract
Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.
Affiliation(s)
- Benjamin Webb
  - University of California at San Francisco, San Francisco, California
|
18
|
Hasenhindl C, Lai B, Delgado J, Traxlmayr MW, Stadlmayr G, Rüker F, Serrano L, Oostenbrink C, Obinger C. Creating stable stem regions for loop elongation in Fcabs - insights from combining yeast surface display, in silico loop reconstruction and molecular dynamics simulations. Biochimica et Biophysica Acta - Proteins and Proteomics 2014; 1844:1530-40. [PMID: 24792385 PMCID: PMC4118681 DOI: 10.1016/j.bbapap.2014.04.020] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 02/14/2014] [Revised: 04/01/2014] [Accepted: 04/24/2014] [Indexed: 11/05/2022]
Abstract
Fcabs (Fc antigen binding) are crystallizable fragments of IgG where the C-terminal structural loops of the CH3 domain are engineered for antigen binding. For the design of libraries it is beneficial to know positions that will permit loop elongation to increase the potential interaction surface with antigen. However, the insertion of additional loop residues might impair the immunoglobulin fold. In the present work we have probed whether stabilizing mutations flanking the randomized and elongated loop region improve the quality of Fcab libraries. In detail, 13 libraries were constructed having the C-terminal part of the EF loop randomized and carrying additional residues (1, 2, 3, 5 or 10, respectively) in the absence and presence of two flanking mutations. The latter have been demonstrated to increase the thermal stability of the CH3 domain of the respective solubly expressed proteins. Assessment of the stability of the libraries expressed on the surface of yeast cells by flow cytometry demonstrated that loop elongation was considerably better tolerated in the stabilized libraries. By using in silico loop reconstruction and mimicking randomization together with MD simulations the underlying molecular dynamics were investigated. In the presence of stabilizing stem residues the backbone flexibility of the engineered EF loop as well as the fluctuation between its accessible conformations were decreased. In addition the CD loop (but not the AB loop) and most of the framework regions were rigidified. The obtained data are discussed with respect to the design of Fcabs and available data on the relation between flexibility and affinity of CDR loops in Ig-like molecules. Characterization of EF loop libraries of IgG1-Fc displayed on yeast surface. Artificial stable stem regions increase tolerance to amino acid insertions. Combination of in silico loop elongation with MD simulations. Analysis of loop dynamics and conformational variability. 
Pronounced impact of loop stabilization on domain and loop dynamics.
Affiliation(s)
- Christoph Hasenhindl
  - Christian Doppler Laboratory for Antibody Engineering, Vienna Institute of BioTechnology, BOKU - University of Natural Resources and Life Sciences, Muthgasse 18, A-1190 Vienna, Austria; Department of Chemistry, Vienna Institute of BioTechnology, BOKU - University of Natural Resources and Life Sciences, Muthgasse 18, A-1190 Vienna, Austria
- Balder Lai
  - Institute of Molecular Modeling and Simulation, Department of Material Sciences and Process Engineering, BOKU - University of Natural Resources and Life Sciences, A-1190 Vienna, Austria
- Javier Delgado
  - Design of Biological Systems, Systems Biology Research Unit, Centre for Genomic Regulation-CRG, UPF, 08003 Barcelona, Spain
- Michael W Traxlmayr
  - Christian Doppler Laboratory for Antibody Engineering, Vienna Institute of BioTechnology, BOKU - University of Natural Resources and Life Sciences, Muthgasse 18, A-1190 Vienna, Austria; Department of Chemistry, Vienna Institute of BioTechnology, BOKU - University of Natural Resources and Life Sciences, Muthgasse 18, A-1190 Vienna, Austria
- Gerhard Stadlmayr
  - Christian Doppler Laboratory for Antibody Engineering, Vienna Institute of BioTechnology, BOKU - University of Natural Resources and Life Sciences, Muthgasse 18, A-1190 Vienna, Austria; Department of Chemistry, Vienna Institute of BioTechnology, BOKU - University of Natural Resources and Life Sciences, Muthgasse 18, A-1190 Vienna, Austria
- Florian Rüker
  - Christian Doppler Laboratory for Antibody Engineering, Vienna Institute of BioTechnology, BOKU - University of Natural Resources and Life Sciences, Muthgasse 18, A-1190 Vienna, Austria; Department of Biotechnology, Vienna Institute of BioTechnology, BOKU - University of Natural Resources and Life Sciences, Muthgasse 18, A-1190 Vienna, Austria
- Luis Serrano
  - Design of Biological Systems, Systems Biology Research Unit, Centre for Genomic Regulation-CRG, UPF, 08003 Barcelona, Spain
- Chris Oostenbrink
  - Institute of Molecular Modeling and Simulation, Department of Material Sciences and Process Engineering, BOKU - University of Natural Resources and Life Sciences, A-1190 Vienna, Austria
- Christian Obinger
  - Christian Doppler Laboratory for Antibody Engineering, Vienna Institute of BioTechnology, BOKU - University of Natural Resources and Life Sciences, Muthgasse 18, A-1190 Vienna, Austria; Department of Chemistry, Vienna Institute of BioTechnology, BOKU - University of Natural Resources and Life Sciences, Muthgasse 18, A-1190 Vienna, Austria.
|
19
|
Dasgupta B, Dey S, Chakrabarti P. Water and side-chain embedded π-turns. Biopolymers 2014; 101:441-53. [PMID: 23996674 DOI: 10.1002/bip.22401] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 05/20/2013] [Revised: 08/24/2013] [Accepted: 08/26/2013] [Indexed: 11/08/2022]
Affiliation(s)
- Bhaskar Dasgupta
  - Department of Biochemistry; Bose Institute; P-1/12 CIT Scheme VIIM Kolkata West Bengal 700 054 India
- Sucharita Dey
  - Bioinformatics Centre; Bose Institute; P-1/12 CIT Scheme VIIM Kolkata West Bengal 700 054 India
- Pinak Chakrabarti
  - Department of Biochemistry; Bose Institute; P-1/12 CIT Scheme VIIM Kolkata West Bengal 700 054 India
  - Bioinformatics Centre; Bose Institute; P-1/12 CIT Scheme VIIM Kolkata West Bengal 700 054 India
|
20
|
Abstract
Structural proteomics aims to understand the structural basis of protein interactions and functions. A prerequisite for this is the availability of 3D protein structures that mediate the biochemical interactions. The explosion in the number of available gene sequences set the stage for the next step in genome-scale projects: to obtain 3D structures for each protein. To achieve this ambitious goal, the slow and costly structure determination experiments are supplemented with theoretical approaches. The current state and recent advances in structure modeling approaches are reviewed here, with special emphasis on comparative protein structure modeling techniques.
Affiliation(s)
- András Fiser
  - Department of Biochemistry, Seaver Foundation Center for Bioinformatics, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA.
|
21
|
Webb B, Eswar N, Fan H, Khuri N, Pieper U, Dong G, Sali A. Comparative Modeling of Drug Target Proteins. Reference Module in Chemistry, Molecular Sciences and Chemical Engineering 2014. [PMCID: PMC7157477 DOI: 10.1016/b978-0-12-409547-2.11133-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
In this perspective, we begin by describing the comparative protein structure modeling technique and the accuracy of the corresponding models. We then discuss the significant role that comparative prediction plays in drug discovery. We focus on virtual ligand screening against comparative models and illustrate the state-of-the-art by a number of specific examples.
|
22
|
Tsuji M. Local motifs involved in the canonical structure of the ligand-binding domain in the nuclear receptor superfamily. J Struct Biol 2013; 185:355-65. [PMID: 24361687 DOI: 10.1016/j.jsb.2013.12.007] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 09/02/2013] [Revised: 12/14/2013] [Accepted: 12/16/2013] [Indexed: 11/19/2022]
Abstract
Structural and sequence alignment analyses have revealed the existence of class-dependent and -independent local motifs involved in the overall fold of the ligand-binding domain (LBD) in the nuclear receptor (NR) superfamily. Of these local motifs, three local motifs, i.e., AF-2 fixed motifs, were involved in the agonist conformation of the activation function-2 (AF-2) region of the LBD. Receptor-agonist interactions increased the stability of these AF-2 fixed motifs in the agonist conformation. In contrast, perturbation of the AF-2 fixed motifs by a ligand or another protein molecule led the AF-2 architecture to adopt an antagonist conformation. Knowledge of this process should provide us with novel insights into the 'agonism' and 'antagonism' of NRs.
Affiliation(s)
- Motonori Tsuji
  - Institute of Molecular Function, 2-105-14 Takasu, Misato-shi, Saitama 341-0037, Japan.
|
23
|
Bonet J, Planas-Iglesias J, Garcia-Garcia J, Marín-López MA, Fernandez-Fuentes N, Oliva B. ArchDB 2014: structural classification of loops in proteins. Nucleic Acids Res 2013; 42:D315-9. [PMID: 24265221 PMCID: PMC3964960 DOI: 10.1093/nar/gkt1189] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 01/07/2023] Open
Abstract
The function of a protein is determined by its three-dimensional structure, which is formed by regular (i.e. β-strands and α-helices) and non-periodic structural units such as loops. Compared to regular structural elements, non-periodic, non-repetitive conformational units enclose a much higher degree of variability—raising difficulties in the identification of regularities, and yet represent an important part of the structure of a protein. Indeed, loops often play a pivotal role in the function of a protein and different aspects of protein folding and dynamics. Therefore, the structural classification of protein loops is an important subject with clear applications in homology modelling, protein structure prediction, protein design (e.g. enzyme design and catalytic loops) and function prediction. ArchDB, the database presented here (freely available at http://sbi.imim.es/archdb), represents such a resource and has been an important asset for the scientific community throughout the years. In this article, we present a completely reworked and updated version of ArchDB. The new version of ArchDB features a novel, fast and user-friendly web-based interface, and a novel graph-based, computationally efficient, clustering algorithm. The current version of ArchDB classifies 149,134 loops in 5739 classes and 9608 subclasses.
Affiliation(s)
- Jaume Bonet
  - Structural Bioinformatics Lab (GRIB-IMIM), Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Barcelona, Catalonia, 08950, Spain and Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, SY23 3DA Aberystwyth, Ceredigion, UK
|
24
|
Soong TT, Hwang MJ, Chen CM. Discovery of Recurrent Structural Motifs for Approximating Three-Dimensional Protein Structures. J CHIN CHEM SOC-TAIP 2013. [DOI: 10.1002/jccs.200400164] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
|
25
|
Mishra S, Saxena A, Sangwan RS. Fundamentals of Homology Modeling Steps and Comparison among Important Bioinformatics Tools: An Overview. Science International 2013. [DOI: 10.17311/sciintl.2013.237.252] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 11/03/2022]
|
26
|
Abstract
Loops are irregular structures which connect two secondary structure elements in proteins. They often play important roles in function, including enzyme reactions and ligand binding. Despite their importance, their structure remains difficult to predict. Most protein loop structure prediction methods sample local loop segments and score them. In particular protein loop classifications and database search methods depend heavily on local properties of loops. Here we examine the distance between a loop's end points (span). We find that the distribution of loop span appears to be independent of the number of residues in the loop, in other words the separation between the anchors of a loop does not increase with an increase in the number of loop residues. Loop span is also unaffected by the secondary structures at the end points, unless the two anchors are part of an anti-parallel beta sheet. As loop span appears to be independent of global properties of the protein we suggest that its distribution can be described by a random fluctuation model based on the Maxwell-Boltzmann distribution. It is believed that the primary difficulty in protein loop structure prediction comes from the number of residues in the loop. Following the idea that loop span is an independent local property, we investigate its effect on protein loop structure prediction and show how normalised span (loop stretch) is related to the structural complexity of loops. Highly contracted loops are more difficult to predict than stretched loops.
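The two quantities named in this abstract, loop span and normalised span (loop stretch), can be illustrated with a minimal Python sketch. The normalisation used here is an assumption for illustration only: the span is divided by the length of a fully extended chain, taken as 3.8 Å per Cα–Cα virtual bond, which may differ from the definition used in the paper. The function names and the anchor coordinates are hypothetical.

```python
import math

# Typical distance between consecutive C-alpha atoms joined by a trans peptide bond (in Å).
CA_CA_DISTANCE = 3.8


def loop_span(anchor_start, anchor_end):
    """Euclidean distance (Å) between the two anchor C-alpha atoms of a loop."""
    return math.dist(anchor_start, anchor_end)


def loop_stretch(span, n_loop_residues):
    """Span divided by an ASSUMED maximum span for a fully extended chain.

    n loop residues between two anchors give n + 1 virtual C-alpha bonds.
    """
    max_span = CA_CA_DISTANCE * (n_loop_residues + 1)
    return span / max_span


# Hypothetical anchor coordinates (Å) for an 8-residue loop.
start = (0.0, 0.0, 0.0)
end = (6.0, 8.0, 0.0)

span = loop_span(start, end)
stretch = loop_stretch(span, 8)
print(f"span = {span:.1f} Å, stretch = {stretch:.2f}")  # span = 10.0 Å, stretch = 0.29
```

Under this assumed normalisation, a stretch near 1 corresponds to an almost fully extended loop, while small values correspond to the contracted loops that the abstract reports as harder to predict.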
Affiliation(s)
- Yoonjoo Choi
  - Department of Computer Science, Dartmouth College, Hanover, NH, USA
|
27
|
Torshin IY, Esipova NG, Tumanyan VG. Alternatingly twisted β-hairpins and nonglycine residues in the disallowed II′ region of the Ramachandran plot. J Biomol Struct Dyn 2013; 32:198-208. [DOI: 10.1080/07391102.2012.759451] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 10/27/2022]
|
28
|
Fernandez-Fuentes N, Fiser A. A modular perspective of protein structures: application to fragment based loop modeling. Methods Mol Biol 2013; 932:141-58. [PMID: 22987351 PMCID: PMC3635063 DOI: 10.1007/978-1-62703-065-6_9] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 01/30/2023]
Abstract
Proteins can be decomposed into supersecondary structure modules. We used a generic definition of supersecondary structure elements, so-called Smotifs, which are composed of two flanking regular secondary structures connected by a loop, to explore the evolution and current variety of structure building blocks. Here, we discuss recent observations about the saturation of Smotif geometries in protein structures and how it opens new avenues in protein structure modeling and design. As a first application of these observations we describe our loop conformation modeling algorithm, ArchPred, which takes advantage of the Smotif classification. In this application, instead of focusing on specific loop properties, the method narrows down possible template conformations in other, often non-homologous structures by identifying the most likely supersecondary structure environment that cradles the loop. Beyond identifying the correct starting supersecondary structure geometry, it takes into account the fit of anchor residues, steric clashes, the match of predicted and observed dihedral angle preferences, and local sequence signal.
Affiliation(s)
- Narcis Fernandez-Fuentes
  - Leeds Institute of Molecular Medicine, Section of Experimental Therapeutics, University of Leeds, St. James's University Hospital, Leeds LS9 7TF, UK
- Andras Fiser
  - Department of Systems and Computational Biology, Department of Biochemistry Albert Einstein College of Medicine, 1301 Morris Park Ave, Bronx, NY 10461, USA
|
29
|
Skliros A, Zimmermann MT, Chakraborty D, Saraswathi S, Katebi AR, Leelananda SP, Kloczkowski A, Jernigan RL. The importance of slow motions for protein functional loops. Phys Biol 2012; 9:014001. [PMID: 22314977 DOI: 10.1088/1478-3975/9/1/014001] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 11/12/2022]
Abstract
Loops in proteins that connect secondary structures such as α-helices and β-sheets are often on the surface and may play a critical role in some functions of a protein. The mobility of loops is central to the motional freedom and flexibility requirements of active-site loops and may be critical for some functions. The structures and behaviors of loops have not been studied much in the context of the whole structure and its overall motions, especially how these might be coupled. Here we investigate loop motions by using coarse-grained structures (Cα atoms only) and solving the motions of the system by applying Lagrange equations with elastic network models, to learn which loops move independently and which move in coordination with domain motions (faster and slower, respectively). The normal modes of the system are calculated using eigen-decomposition of the stiffness matrix. The contribution of individual modes and groups of modes is investigated for their effects on all residues in each loop by using Fourier analyses. Our results indicate that, overall, the motions of functional sets of loops behave similarly to those of the whole structure. However, only relatively few loops move in coordination with the dominant slow modes of motion, and these are often closely related to function.
Affiliation(s)
- Aris Skliros
  - L. H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA 50011, USA. Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
|
30
|
Price JL, Culyba EK, Chen W, Murray AN, Hanson SR, Wong CH, Powers ET, Kelly JW. N-glycosylation of enhanced aromatic sequons to increase glycoprotein stability. Biopolymers 2012; 98:195-211. [PMID: 22782562 PMCID: PMC3539202 DOI: 10.1002/bip.22030] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Received: 10/18/2011] [Revised: 01/17/2012] [Accepted: 01/26/2012] [Indexed: 11/12/2022]
Abstract
N-glycosylation can increase the rate of protein folding, enhance thermodynamic stability, and slow protein unfolding; however, the molecular basis for these effects is incompletely understood. Without clear engineering guidelines, attempts to use N-glycosylation as an approach for stabilizing proteins have resulted in unpredictable energetic consequences. Here, we review the recent development of three "enhanced aromatic sequons," which appear to facilitate stabilizing native-state interactions between Phe, Asn-GlcNAc and Thr when placed in an appropriate reverse turn context. It has proven to be straightforward to engineer a stabilizing enhanced aromatic sequon into glycosylation-naïve proteins that have not evolved to optimize specific protein-carbohydrate interactions. Incorporating these enhanced aromatic sequons into appropriate reverse turn types within proteins should enhance the well-known pharmacokinetic benefits of N-glycosylation-based stabilization by lowering the population of protease-susceptible unfolded and aggregation-prone misfolded states, thereby making such proteins more useful in research and pharmaceutical applications.
Affiliation(s)
- Joshua L. Price
  - Department of Chemistry and Biochemistry, Brigham Young University, Provo, UT 84602
- Elizabeth K. Culyba
  - Department of Chemistry, The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA 92037
  - The Skaggs Institute for Chemical Biology, The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA 92037
- Wentao Chen
  - Department of Chemistry, The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA 92037
  - The Skaggs Institute for Chemical Biology, The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA 92037
- Amber N. Murray
  - Department of Chemistry, The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA 92037
  - The Skaggs Institute for Chemical Biology, The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA 92037
- Sarah R. Hanson
  - Department of Chemistry, The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA 92037
  - The Skaggs Institute for Chemical Biology, The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA 92037
- Chi-Huey Wong
  - Department of Chemistry, The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA 92037
  - The Skaggs Institute for Chemical Biology, The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA 92037
- Evan T. Powers
  - Department of Chemistry, The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA 92037
  - The Skaggs Institute for Chemical Biology, The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA 92037
- Jeffery W. Kelly
  - Department of Chemistry, The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA 92037
  - The Skaggs Institute for Chemical Biology, The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA 92037
  - Department of Molecular and Experimental Medicine, The Scripps Research Institute, 10550 N. Torrey Pines Rd., La Jolla, CA 92037
|
31
|
Hollingsworth SA, Lewis MC, Berkholz DS, Wong WK, Karplus PA. (φ,ψ)₂ motifs: a purely conformation-based fine-grained enumeration of protein parts at the two-residue level. J Mol Biol 2011; 416:78-93. [PMID: 22198294 DOI: 10.1016/j.jmb.2011.12.022] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 10/21/2011] [Revised: 12/05/2011] [Accepted: 12/09/2011] [Indexed: 10/14/2022]
Abstract
A deep understanding of protein structure benefits from the use of a variety of classification strategies that enhance our ability to effectively describe local patterns of conformation. Here, we use a clustering algorithm to analyze 76,533 all-trans segments from protein structures solved at 1.2 Å resolution or better to create a purely φ,ψ-based comprehensive empirical categorization of common conformations adopted by two adjacent φ,ψ pairs (i.e., (φ,ψ)₂ motifs). The clustering algorithm works in an origin-shifted four-dimensional space based on the two φ,ψ pairs to yield a parameter-dependent list of (φ,ψ)₂ motifs, in order of their prominence. The results are remarkably distinct from and complementary to the standard hydrogen-bond-centered view of secondary structure. New insights include an unprecedented level of precision in describing the φ,ψ angles of both previously known and novel motifs, ordering of these motifs by their population density, a data-driven recommendation that the standard Cα(i)…Cα(i+3) < 7 Å criterion for defining turns be changed to 6.5 Å, identification of β-strand and turn capping motifs, and identification of conformational capping by residues in polypeptide II conformation. We further document that the conformational preferences of a residue are substantially influenced by the conformation of its neighbors, and we suggest that accounting for these dependencies will improve protein modeling accuracy. Although the CUEVAS-4D(r₁₀ε₁₄) 'parts list' presented here is only an initial exploration of the complex (φ,ψ)₂ landscape of proteins, it shows that there is value to be had from this approach, and it opens the door to more in-depth characterizations at the (φ,ψ)₂ level and at higher dimensions.
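The distance criterion for turns mentioned in this abstract, a Cα(i)…Cα(i+3) separation below 7 Å with 6.5 Å as the recommended replacement, can be sketched in a few lines of Python. This is a simplified illustration, not the authors' procedure: it applies the distance test only (turn definitions in practice usually also exclude helical segments), and the function name and the Cα trace are hypothetical.

```python
import math


def find_turns(ca_coords, cutoff=7.0):
    """Return start indices i where the C-alpha(i)...C-alpha(i+3) distance is below cutoff (Å)."""
    return [
        i
        for i in range(len(ca_coords) - 3)
        if math.dist(ca_coords[i], ca_coords[i + 3]) < cutoff
    ]


# Hypothetical C-alpha trace (Å), consecutive atoms roughly 3.8 Å apart.
trace = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (5.0, 3.6, 0.0),
    (1.8, 5.6, 0.0),
    (1.6, 5.4, 3.8),
    (1.6, 5.4, 7.6),
    (1.6, 5.4, 11.4),
]

print(find_turns(trace, cutoff=7.0))  # [0, 1]
print(find_turns(trace, cutoff=6.5))  # [0]
```

In this made-up trace the segment starting at index 1 has a Cα(i)…Cα(i+3) distance of about 6.96 Å, so it counts as a turn under the 7 Å cutoff but not under the 6.5 Å one, which shows how the choice of threshold changes the assignment.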
Affiliation(s)
- Scott A Hollingsworth
  - Department of Biochemistry and Biophysics, Oregon State University, Corvallis, OR 97331, USA
|
32
|
Wang LY. Covariation analysis of local amino acid sequences in recurrent protein local structures. J Bioinform Comput Biol 2011; 3:1391-409. [PMID: 16374913 DOI: 10.1142/s0219720005001648] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/04/2004] [Revised: 07/10/2005] [Accepted: 09/07/2005] [Indexed: 11/18/2022]
Abstract
Local structural information is thought to be frequently encoded in local amino acid sequences. Previous research indicated only that some positions have specific residue preferences in particular local structures. However, correlated pairwise replacements for interacting residues in recurrent local structural motifs from unrelated proteins have not been studied systematically. We introduced a new method fusing statistical covariation analysis and local structure-based alignment. Systematic analysis of structure-based multiple alignments of recurrent local structures from unrelated proteins in a representative subset of the Protein Data Bank indicates that covarying residue pairs with statistical significance exist in local structural motifs, in particular β-turns and helix caps. These residue pairs are mostly linked through polar functional groups with direct or indirect hydrogen bonding. Hydrophobic interaction is also a major factor in constraining pairwise amino acid residue replacement in recurrent local structures. We also found correlated residue pairs that are not clearly linked by through-space interactions. The physical constraints underlying these covariations are less clear. Overall, covarying residue pairs with statistical significance exist in local structures from unrelated proteins. The existence of sequence covariations in local structural motifs from unrelated proteins indicates that many relics of local relations are still retained in the tertiary structures after protein folding. It supports the notion that some local structural information is encoded in local sequences and that the local structural codes could play important roles in determining native state protein folding topology.
Affiliation(s)
- Lu-Yong Wang
- Integrated Data Systems Department, Siemens Corporate Research and Center for Computational Biology & Bioinformatics, Columbia University, 755, College Road East, Princeton, New Jersey 08540, USA.
|
33
|
Segura J, Oliva B, Fernandez-Fuentes N. CAPS-DB: a structural classification of helix-capping motifs. Nucleic Acids Res 2011; 40:D479-85. [PMID: 22021380 PMCID: PMC3245141 DOI: 10.1093/nar/gkr879] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 01/19/2023]
Abstract
The regions of the polypeptide chain immediately preceding or following an α-helix are known as Nt- and Ct-cappings, respectively. Cappings play a central role in stabilizing α-helices because intrahelical hydrogen bonds are lacking in the first and last turn. Sequence patterns of amino acid type preferences have been derived for cappings, but the structural motifs associated with them are still unclassified. CAPS-DB is a database of clusters of structural patterns of different capping types. The clustering algorithm is based on the geometry and the (ϕ–ψ)-space conformation of these regions. CAPS-DB is a relational database that allows the user to search, browse, inspect and retrieve structural data associated with cappings. The contents of CAPS-DB might be of interest to a wide range of scientists covering different areas such as protein design and engineering, structural biology and bioinformatics. The database is accessible at: http://www.bioinsilico.org/CAPSDB.
Affiliation(s)
- Joan Segura
- Leeds Institute of Molecular Medicine, Section of Experimental Therapeutics, University of Leeds, St James's University Hospital, Leeds LS9 7TF, UK
|
34
|
Joo H, Chavan AG, Day R, Lennox KP, Sukhanov P, Dahl DB, Vannucci M, Tsai J. Near-native protein loop sampling using nonparametric density estimation accommodating sparcity. PLoS Comput Biol 2011; 7:e1002234. [PMID: 22028638 PMCID: PMC3197639 DOI: 10.1371/journal.pcbi.1002234] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 03/03/2011] [Accepted: 09/01/2011] [Indexed: 11/29/2022]
Abstract
Unlike the core structural elements of a protein like regular secondary structure, template based modeling (TBM) has difficulty with loop regions due to their variability in sequence and structure as well as the sparse sampling from a limited number of homologous templates. We present a novel, knowledge-based method for loop sampling that leverages homologous torsion angle information to estimate a continuous joint backbone dihedral angle density at each loop position. The φ,ψ distributions are estimated via a Dirichlet process mixture of hidden Markov models (DPM-HMM). Models are quickly generated based on samples from these distributions and were enriched using an end-to-end distance filter. The performance of the DPM-HMM method was evaluated against a diverse test set in a leave-one-out approach. Candidates as low as 0.45 Å RMSD and with a worst case of 3.66 Å were produced. For the canonical loops like the immunoglobulin complementarity-determining regions (mean RMSD <2.0 Å), the DPM-HMM method performs as well or better than the best templates, demonstrating that our automated method recaptures these canonical loops without inclusion of any IgG specific terms or manual intervention. In cases with poor or few good templates (mean RMSD >7.0 Å), this sampling method produces a population of loop structures to around 3.66 Å for loops up to 17 residues. In a direct test of sampling to the Loopy algorithm, our method demonstrates the ability to sample nearer native structures for both the canonical CDRH1 and non-canonical CDRH3 loops. Lastly, in the realistic test conditions of the CASP9 experiment, successful application of DPM-HMM for 90 loops from 45 TBM targets shows the general applicability of our sampling method in loop modeling problem. These results demonstrate that our DPM-HMM produces an advantage by consistently sampling near native loop structure. 
The software used in this analysis is available for download at http://www.stat.tamu.edu/~dahl/software/cortorgles/. A protein's structure consists of elements of regular secondary structure connected by less regular stretches of loop segments. The irregularity of the loop structure makes loop modeling quite challenging. More accurate sampling of these loop conformations has a direct impact on protein modeling, design, function classification, as well as protein interactions. A method has been developed that extends a more comprehensive knowledge-based approach to producing models of the loop regions of protein structure. Most physical models cannot adequately sample the large conformational space, while the more discrete knowledge based libraries are conformationally limited. To address both of these problems, we introduce a novel statistical method that produces a continuous yet weighted estimation of loop conformational space from a discrete library of structures by using a Dirichlet process mixture of hidden Markov models (DPM-HMM). Applied to loop structure sampling, the results of a number of tests demonstrate that our approach quickly generates large numbers of candidates with near native loop conformations. Most significantly, in the cases where the template sampling is sparse and/or far from native conformations, the DPM-HMM method samples close to the native space and produces a population of accurate loop structures.
Affiliation(s)
- Hyun Joo
- Department of Chemistry, University of the Pacific, Stockton, California, United States of America
- Archana G. Chavan
- Department of Chemistry, University of the Pacific, Stockton, California, United States of America
- Ryan Day
- Department of Chemistry, University of the Pacific, Stockton, California, United States of America
- Kristin P. Lennox
- Department of Statistics, Texas A&M University, College Station, Texas, United States of America
- Paul Sukhanov
- Department of Chemistry, University of the Pacific, Stockton, California, United States of America
- David B. Dahl
- Department of Statistics, Texas A&M University, College Station, Texas, United States of America
- Marina Vannucci
- Department of Statistics, Rice University, Houston, Texas, United States of America
- Jerry Tsai
- Department of Chemistry, University of the Pacific, Stockton, California, United States of America
|
35
|
Glycosylation of the enhanced aromatic sequon is similarly stabilizing in three distinct reverse turn contexts. Proc Natl Acad Sci U S A 2011; 108:14127-32. [PMID: 21825145 DOI: 10.1073/pnas.1105880108] [Citation(s) in RCA: 52] [Impact Index Per Article: 4.0] [Indexed: 11/18/2022]
Abstract
Cotranslational N-glycosylation can accelerate protein folding, slow protein unfolding, and increase protein stability, but the molecular basis for these energetic effects is incompletely understood. N-glycosylation of proteins at naïve sites could be a useful strategy for stabilizing proteins in therapeutic and research applications, but without engineering guidelines, often results in unpredictable changes to protein energetics. We recently introduced the enhanced aromatic sequon as a family of portable structural motifs that are stabilized upon glycosylation in specific reverse turn contexts: a five-residue type I β-turn harboring a G1 β-bulge (using a Phe-Yyy-Asn-Xxx-Thr sequon) and a type II β-turn within a six-residue loop (using a Phe-Yyy-Zzz-Asn-Xxx-Thr sequon) [Culyba EK, et al. (2011) Science 331:571-575]. Here we show that glycosylating a new enhanced aromatic sequon, Phe-Asn-Xxx-Thr, in a type I' β-turn stabilizes the Pin 1 WW domain. Comparing the energetic effects of glycosylating these three enhanced aromatic sequons in the same host WW domain revealed that the glycosylation-mediated stabilization is greatest for the enhanced aromatic sequon complementary to the type I β-turn with a G1 β-bulge. However, the portion of the stabilization from the tripartite interaction between Phe, Asn(GlcNAc), and Thr is similar for each enhanced aromatic sequon in its respective reverse turn context. Adding the Phe-Asn-Xxx-Thr motif (in a type I' β-turn) to the enhanced aromatic sequon family doubles the number of proteins that can be stabilized by glycosylation without having to alter the native reverse turn type.
|
36
|
Kitao A. Transform and relax sampling for highly anisotropic systems: Application to protein domain motion and folding. J Chem Phys 2011; 135:045101. [DOI: 10.1063/1.3613676] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 11/14/2022]
|
37
|
Regad L, Martin J, Camproux AC. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs. BMC Bioinformatics 2011; 12:247. [PMID: 21689388 PMCID: PMC3158783 DOI: 10.1186/1471-2105-12-247] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 10/07/2010] [Accepted: 06/20/2011] [Indexed: 12/24/2022]
Abstract
Background One of the strategies for protein function annotation is to search particular structural motifs that are known to be shared by proteins with a given function. Results Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Conclusions Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.
|
38
|
Abstract
Unlike proteins, the RNA backbone has numerous degrees of freedom (eight, if one counts the sugar pucker), making RNA modeling, structure building and prediction a multidimensional problem of exceptionally high complexity. And yet RNA tertiary structures are not infinite in their structural morphology; rather, they are built from a limited set of discrete units. In order to reduce the dimensionality of the RNA backbone in a physically reasonable way, a shorthand notation was created that reduced the RNA backbone torsion angles to two (η and θ, analogous to φ and ψ in proteins). When these torsion angles are calculated for nucleotides in a crystallographic database and plotted against one another, one obtains a plot analogous to a Ramachandran plot (the η/θ plot), with highly populated and unpopulated regions. Nucleotides that occupy proximal positions on the plot have identical structures and are found in the same units of tertiary structure. In this review, we describe the statistical validation of the η/θ formalism and the exploration of features within the η/θ plot. We also describe the application of the η/θ formalism in RNA motif discovery, structural comparison, RNA structure building and tertiary structure prediction. More than a tool, however, the η/θ formalism has provided new insights into RNA structure itself, revealing its fundamental components and the factors underlying RNA architectural form.
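The η/θ formalism described above rests on ordinary torsion (dihedral) angles computed over four points, in the same way as φ and ψ in proteins. The following is a minimal sketch of that calculation, not the reviewed authors' implementation: the function name `dihedral_deg` and the helper names are hypothetical, and which four atoms are passed in (backbone atoms for φ/ψ, or the pseudo-bond atoms used for η/θ) is left to the caller as an assumption.

```python
import math

def _sub(a, b):
    """Component-wise difference of two 3D points."""
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_deg(p0, p1, p2, p3):
    """Torsion angle in degrees, in (-180, 180], about the p1-p2 axis.

    Hypothetical helper: the caller chooses which four atoms define
    the angle; nothing here is specific to proteins or RNA.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Remove the components of b0 and b2 that lie along the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

if __name__ == "__main__":
    # Four points arranged so that the outer bonds are perpendicular.
    print(dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Plotting two such angles per residue or nucleotide against each other gives a Ramachandran-style map; the pure-Python arithmetic is used here only to keep the sketch dependency-free.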
|
39
|
Agarwal G, Mahajan S, Srinivasan N, de Brevern AG. Identification of local conformational similarity in structurally variable regions of homologous proteins using protein blocks. PLoS One 2011; 6:e17826. [PMID: 21445259 PMCID: PMC3060819 DOI: 10.1371/journal.pone.0017826] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 09/09/2010] [Accepted: 02/15/2011] [Indexed: 11/18/2022]
Abstract
Structure comparison tools can be used to align related protein structures to identify structurally conserved and variable regions and to infer functional and evolutionary relationships. While the conserved regions often superimpose well, the variable regions appear non superimposable. Differences in homologous protein structures are thought to be due to evolutionary plasticity to accommodate diverged sequences during evolution. One of the kinds of differences between 3-D structures of homologous proteins is rigid body displacement. A glaring example is not well superimposed equivalent regions of homologous proteins corresponding to α-helical conformation with different spatial orientations. In a rigid body superimposition, these regions would appear variable although they may contain local similarity. Also, due to high spatial deviation in the variable region, one-to-one correspondence at the residue level cannot be determined accurately. Another kind of difference is conformational variability and the most common example is topologically equivalent loops of two homologues but with different conformations. In the current study, we present a refined view of the “structurally variable” regions which may contain local similarity obscured in global alignment of homologous protein structures. As structural alphabet is able to describe local structures of proteins precisely through Protein Blocks approach, conformational similarity has been identified in a substantial number of ‘variable’ regions in a large data set of protein structural alignments; optimal residue-residue equivalences could be achieved on the basis of Protein Blocks which led to improved local alignments. Also, through an example, we have demonstrated how the additional information on local backbone structures through protein blocks can aid in comparative modeling of a loop region. In addition, understanding on sequence-structure relationships can be enhanced through our approach. 
This has been illustrated through examples where the equivalent regions in homologous protein structures share sequence similarity to varied extent but do not preserve local structure.
Affiliation(s)
- Garima Agarwal
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- Swapnil Mahajan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, UAS-GKVK Campus, Bangalore, India
- Alexandre G. de Brevern
- Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), INSERM, U665, Paris, France
- Université Paris Diderot - Paris 7, UMR-S665, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
|
40
|
Zou D, He Z, He J, Xia Y. Supersecondary structure prediction using Chou's pseudo amino acid composition. J Comput Chem 2010; 32:271-8. [DOI: 10.1002/jcc.21616] [Citation(s) in RCA: 94] [Impact Index Per Article: 6.7] [Indexed: 11/10/2022]
|
41
|
Hollingsworth SA, Karplus PA. A fresh look at the Ramachandran plot and the occurrence of standard structures in proteins. Biomol Concepts 2010; 1:271-283. [PMID: 21436958 DOI: 10.1515/bmc.2010.022] [Citation(s) in RCA: 197] [Impact Index Per Article: 14.1] [Indexed: 01/27/2023]
Abstract
The Ramachandran plot is among the most central concepts in structural biology, seen in publications and textbooks alike. However, with the increasing numbers of known protein structures and greater accuracy of ultra-high resolution protein structures, we are still learning more about the basic principles of protein structure. Here we use high fidelity conformational information to explore novel ways, such as geo-style and wrapped Ramachandran plots, to convey some of the basic aspects of the Ramachandran plot and of protein conformation. We point out the pressing need for a standard nomenclature for peptide conformation and propose such a nomenclature. Finally, we summarize some recent conceptual advances related to the building blocks of protein structure. The results for linear groups imply the need for substantive revisions in how the basics of protein structure are handled.
Affiliation(s)
- Scott A Hollingsworth
- Department of Biochemistry & Biophysics, Oregon State University, Corvallis, OR 97331
|
42
|
Skliros A, Jernigan RL, Kloczkowski A. Models to Approximate the Motions of Protein Loops. J Chem Theory Comput 2010; 6:3249-3258. [PMID: 21031141 DOI: 10.1021/ct1001413] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
Abstract
We approximate the loop motions of various proteins by using a coarse-grained model and the theory of rubberlike elasticity of polymer chains. The loops are considered as chains where only the first and the last residues thereof are tethered by their connections to the main structure; while within the loop, the loop residues are connected only to their sequence neighbors. We applied these approximate models to five proteins. Our approximation shows that the loop motions can usually be computed locally which shows these motions are robust and not random. But most interestingly, the new method presented here can be used to compute the likely motions of loops that are missing in the structures.
Affiliation(s)
- Aris Skliros
- L. H. Baker Center for Bioinformatics and Biological Statistics, Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
|
43
|
Hayward S, Kitao A. The effect of end constraints on protein loop kinematics. Biophys J 2010; 98:1976-85. [PMID: 20441762 DOI: 10.1016/j.bpj.2010.01.017] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 09/21/2009] [Revised: 01/08/2010] [Accepted: 01/11/2010] [Indexed: 11/17/2022]
Abstract
Despite the prevalent involvement of loops in function, little is known about how the constraining of end groups influences their kinematics. Using a linear inverse-kinematics approach and assuming fixed bond lengths, bond angles, and peptide bond torsions, as well as ignoring molecular interactions to assess the effect of the end-constraint only, it is shown that the constraint creates a closed surface in torsion angle space. For pentapeptides, the constraint gives rise to inaccessible regions in a Ramachandran plot. This complex and tightly curved surface produces interesting effects that may play a functional role. For example, a small change in one torsion angle can radically change the behavior of the whole loop. The constraint also produces long-range correlations, and structures exist where the correlation coefficient is 1.0 or -1.0 between rotations about bonds separated by >30 Å. Another application allows some torsion angles to be targeted to specified values while others are constrained. When this application was used on key torsions in lactate dehydrogenase, it was found that the functional loop first folds forward and then moves sideways. For horse liver alcohol dehydrogenase, it was confirmed that the functional loop's Pro-Pro motif creates a rigid arm in an NAD-activated switch for domain closure.
Affiliation(s)
- Steven Hayward
- D'Arcy Thompson Centre for Computational Biology, School of Computing Sciences, University of East Anglia, Norwich, United Kingdom.
|
44
|
Application of biasing-potential replica-exchange simulations for loop modeling and refinement of proteins in explicit solvent. Proteins 2010; 78:2809-19. [DOI: 10.1002/prot.22796] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.9] [Indexed: 12/25/2022]
|
45
|
Fernandez-Fuentes N, Dybas JM, Fiser A. Structural characteristics of novel protein folds. PLoS Comput Biol 2010; 6:e1000750. [PMID: 20421995 PMCID: PMC2858679 DOI: 10.1371/journal.pcbi.1000750] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Received: 06/16/2009] [Accepted: 03/19/2010] [Indexed: 11/29/2022]
Abstract
Folds are the basic building blocks of protein structures. Understanding the emergence of novel protein folds is an important step towards understanding the rules governing the evolution of protein structure and function and for developing tools for protein structure modeling and design. We explored the frequency of occurrences of an exhaustively classified library of supersecondary structural elements (Smotifs), in protein structures, in order to identify features that would define a fold as novel compared to previously known structures. We found that a surprisingly small set of Smotifs is sufficient to describe all known folds. Furthermore, novel folds do not require novel Smotifs, but rather are a new combination of existing ones. Novel folds can be typified by the inclusion of a relatively higher number of rarely occurring Smotifs in their structures and, to a lesser extent, by a novel topological combination of commonly occurring Smotifs. When investigating the structural features of Smotifs, we found that the top 10% of most frequent ones have a higher fraction of internal contacts, while some of the most rare motifs are larger, and contain a longer loop region. Structural genomics efforts aim at exploring the repertoire of three-dimensional structures of protein molecules. While genome scale sequencing projects have already provided us with all the genes of many organisms, it is the three dimensional shape of gene encoded proteins that defines all the interactions among these components. Understanding the versatility and, ultimately, the role of all possible molecular shapes in the cell is a necessary step toward understanding how organisms function. In this work we explored the rules that identify certain shapes as novel compared to all already known structures. 
The findings of this work provide possible insights into the rules that can be used in future works to identify or design new molecular shapes or to relate folds with each other in a quantitative manner.
Affiliation(s)
- Narcis Fernandez-Fuentes
- University of Leeds, Leeds Institute of Molecular Medicine Section of Experimental Therapeutics, St. James's University Hospital, Leeds, United Kingdom
- Joseph M. Dybas
- Department of Systems and Computational Biology, Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York, United States of America
- Andras Fiser
- Department of Systems and Computational Biology, Department of Biochemistry, Albert Einstein College of Medicine, Bronx, New York, United States of America
|
46
|
Faraggi E, Yang Y, Zhang S, Zhou Y. Predicting continuous local structure and the effect of its substitution for secondary structure in fragment-free protein structure prediction. Structure 2010; 17:1515-27. [PMID: 19913486 DOI: 10.1016/j.str.2009.09.006] [Citation(s) in RCA: 91] [Impact Index Per Article: 6.5] [Received: 04/15/2009] [Revised: 09/01/2009] [Accepted: 09/03/2009] [Indexed: 11/30/2022]
Abstract
Local structures predicted from protein sequences are used extensively in every aspect of modeling and prediction of protein structure and function. For more than 50 years, they have been predicted at a low-resolution coarse-grained level (e.g., three-state secondary structure). Here, we combine a two-state classifier with a real-value predictor to predict local structure in continuous representation by backbone torsion angles. The accuracy of the angles predicted by this approach is close to that derived from NMR chemical shifts. Their substitution for predicted secondary structure as restraints for ab initio structure prediction doubles the success rate. This result demonstrates the potential of predicted local structure for fragment-free tertiary-structure prediction. It further implies potentially significant benefits from using predicted real-valued torsion angles as a replacement for or supplement to the secondary-structure prediction tools used almost exclusively in many computational methods ranging from sequence alignment to function prediction.
Affiliation(s)
- Eshel Faraggi
- Indiana University School of Informatics, Indiana University-Purdue University and Center for Computational Biology and Bioinformatics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
|
47
|
Mining protein loops using a structural alphabet and statistical exceptionality. BMC Bioinformatics 2010; 11:75. [PMID: 20132552 PMCID: PMC2833150 DOI: 10.1186/1471-2105-11-75] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 07/07/2009] [Accepted: 02/04/2010] [Indexed: 12/21/2022]
Abstract
Background Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g. binding site, catalytic pocket... However, the description of protein loops with conventional tools is a difficult task. Regular secondary structures, helices and strands, have been widely studied whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied. Results We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA allows the simplification of a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment, called structural letter. The difficult task of the structural grouping of huge data sets is thus easily accomplished by handling structural letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to the structural-letter sequence, named structural word. This approach permits a systematic analysis of loops of all sizes since we consider the structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop-lengths are covered by only 3310 highly recurrent structural words (out of 28274 observed words). These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference but interestingly, two thirds are shared by short (less than 12 residues) and long loops. 
Moreover, half of recurrent motifs exhibit a significant level of amino-acid conservation with at least four significant positions and 87% of long loops contain at least one such word. We complement our analysis with the detection of statistically over-represented patterns of structural letters as in conventional DNA sequence analysis. About 30% (930) of structural words are over-represented, and cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequential specificity, suggesting structural or functional constraints. Conclusions We developed a method to systematically decompose and study protein loops using recurrent structural motifs. This method is based on the structural alphabet HMM-SA and not on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, it is the first time that pattern mining helps to increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and might permit to decrease the complexity of long-loop analysis. Detailed results are available at http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/.
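The word-extraction step in this abstract can be sketched as a sliding window over structural-letter strings. This is a minimal illustration rather than the authors' pipeline: it assumes consecutive four-residue letters overlap by three residues, so that a seven-residue fragment corresponds to four letters, and the function names, the toy letter strings, and the small threshold in the usage example are all hypothetical (the abstract's threshold is more than 30 occurrences).

```python
from collections import Counter

# Assumption: consecutive four-residue letters overlap by three residues,
# so a seven-residue fragment is encoded by 7 - 3 = 4 structural letters.
LETTERS_PER_WORD = 4

def count_structural_words(loops, word_len=LETTERS_PER_WORD):
    """Count every overlapping word of `word_len` letters across all loops.

    `loops` is an iterable of strings, one structural-letter string per loop.
    Loops shorter than `word_len` contribute no words.
    """
    counts = Counter()
    for letters in loops:
        for i in range(len(letters) - word_len + 1):
            counts[letters[i:i + word_len]] += 1
    return counts

def recurrent_words(counts, min_count=30):
    """Keep words observed strictly more than `min_count` times."""
    return {word: n for word, n in counts.items() if n > min_count}

if __name__ == "__main__":
    # Toy letter strings, invented for illustration only.
    toy_loops = ["ABCDABCD", "ABCD", "XYZ"]
    counts = count_structural_words(toy_loops)
    print(recurrent_words(counts, min_count=2))
```

Because words are fixed-length substrings, loops of any length are handled uniformly, which is the property the abstract relies on to analyze long loops.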
|
48
|
Abstract
Functional characterization of a protein is often facilitated by its 3D structure. However, the fraction of experimentally known 3D models is currently less than 1% due to the inherently time-consuming and complicated nature of structure determination techniques. Computational approaches are employed to bridge the gap between the number of known sequences and that of 3D models. Template-based protein structure modeling techniques rely on the study of principles that dictate the 3D structure of natural proteins from the theory of evolution viewpoint. Strategies for template-based structure modeling will be discussed with a focus on comparative modeling, by reviewing techniques available for all the major steps involved in the comparative modeling pipeline.
Affiliation(s)
- Andras Fiser
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, USA
|
49
|
Kountouris P, Hirst JD. Prediction of backbone dihedral angles and protein secondary structure using support vector machines. BMC Bioinformatics 2009; 10:437. [PMID: 20025785 PMCID: PMC2811710 DOI: 10.1186/1471-2105-10-437] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.9] [Received: 07/09/2009] [Accepted: 12/22/2009] [Indexed: 11/26/2022]
Abstract
Background The prediction of the secondary structure of a protein is a critical step in the prediction of its tertiary structure and, potentially, its function. Moreover, the backbone dihedral angles, highly correlated with secondary structures, provide crucial information about the local three-dimensional structure. Results We predict independently both the secondary structure and the backbone dihedral angles and combine the results in a loop to enhance each prediction reciprocally. Support vector machines, a state-of-the-art supervised classification technique, achieve secondary structure predictive accuracy of 80% on a non-redundant set of 513 proteins, significantly higher than other methods on the same dataset. The dihedral angle space is divided into a number of regions using two unsupervised clustering techniques in order to predict the region in which a new residue belongs. The performance of our method is comparable to, and in some cases more accurate than, other multi-class dihedral prediction methods. Conclusions We have created an accurate predictor of backbone dihedral angles and secondary structure. Our method, called DISSPred, is available online at http://comp.chem.nottingham.ac.uk/disspred/.
Affiliation(s)
- Petros Kountouris
- School of Chemistry, University of Nottingham, University Park, Nottingham NG7 2RD, UK.
|
50
|
Zou D, He Z, He J. Beta-hairpin prediction with quadratic discriminant analysis using diversity measure. J Comput Chem 2009; 30:2277-84. [PMID: 19263434 DOI: 10.1002/jcc.21229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
On the basis of the features of protein sequential patterns, we used the method of increment of diversity combined with quadratic discriminant analysis (IDQD) to predict beta-hairpin motifs in protein sequences. Three rules are used to extract fixed-length sequential patterns of the raw beta-beta motifs. Amino acid basic compositions, dipeptide components, and amino acid composition distribution are combined to represent the compositional features. Eighteen feature variables on a sequential pattern to be predicted are defined in terms of ID. They are integrated in a single formal framework given by IDQD. The method is trained and tested on the ArchDB40 dataset containing 3088 proteins. The overall accuracy of prediction and the Matthews correlation coefficient for the independent testing dataset are 81.7% and 0.60, respectively. In addition, a higher accuracy of 84.5% and a Matthews correlation coefficient of 0.68 for the independent testing dataset are obtained on a dataset previously used by Kumar et al. (Nucleic Acids Res 2005, 33, 154), which contains 2088 proteins. For a fair assessment of our method, the performance is also evaluated on all 63 proteins used in CASP6. The overall accuracy of prediction is 74.2% for the independent testing dataset.
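The two figures of merit quoted in this abstract, overall accuracy and the Matthews correlation coefficient, follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function names and the confusion-matrix counts in the usage example are hypothetical and are not taken from the paper.

```python
import math

def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)

def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    tp, tn, fp, fn = 40, 40, 10, 10
    print(accuracy(tp, tn, fp, fn), matthews_cc(tp, tn, fp, fn))
```

Reporting both values is informative because accuracy alone can look high on an unbalanced dataset, whereas the correlation coefficient penalizes a classifier that mostly predicts the majority class.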
Affiliation(s)
- Dongsheng Zou
- College of Computer Science, Chongqing University, Chongqing 400044, China.
|