1
Górna MW, Merski M. Discovery and Analysis of Repeat and Low-Complexity Architectures in Proteins and Their Conserved Evolutionary Relationships Using Self-Homology Dot Plots. Methods Mol Biol 2025; 2870:95-116. [PMID: 39543033 DOI: 10.1007/978-1-0716-4213-9_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/17/2024]
Abstract
Proteins that contain sequence repetitions and low complexity regions can be analyzed using self-homology dot plot analysis. Dot plots can readily identify protein sequence repeats; the number of repeats and their length and location within the protein sequence are readily identifiable from the dot plots without the need to pre-define any of these attributes, making this method largely model-independent. We discuss the criteria for statistical identification of protein repeats and recommend simple ways of identifying protein repeats. While higher levels of sequence conservation within the repeats do make them easier to formally identify, this method can identify protein repeats with fairly low levels of conservation, as well as notably non-tandem repetitions with sizeable sections of complex, non-repeat sequence separating the individual repeat instances. Furthermore, even simple visual examination of these dot plots can discover conserved patterns within families of closely related proteins, and the level of this conservation can be readily quantified using a Jaccard index. Exhaustive pairwise comparisons can be assembled using hierarchical clustering methods to get a picture of the conserved repeat architectures within families of repeat proteins.
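The two quantities named in this abstract, a self-homology dot plot and a Jaccard index between dot plots, can be illustrated with a minimal sketch. It assumes a simple identity-based window score rather than whatever scoring the authors use, and it compares dot plots by raw residue coordinates; the function names `self_dot_plot` and `jaccard` and the parameters `window` and `min_matches` are hypothetical.

```python
def self_dot_plot(seq, window=3, min_matches=3):
    """Return the set of (i, j) with i < j where the windows starting at
    positions i and j share at least min_matches identical residues.
    Assumption: plain identity scoring, no substitution matrix."""
    n = len(seq) - window + 1
    dots = set()
    for i in range(n):
        for j in range(i + 1, n):
            matches = sum(a == b for a, b in zip(seq[i:i + window], seq[j:j + window]))
            if matches >= min_matches:
                dots.add((i, j))
    return dots


def jaccard(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| of two dot sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# A toy sequence with a repeat of period 4: dots fall on the diagonal j - i = 4.
dots = self_dot_plot("ABCDABCD")
```

In this toy case the off-diagonal offset of the dots gives the repeat period directly, without specifying the repeat length in advance.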
Affiliation(s)
- Maria W Górna
- Structural Biology Group, Biological and Chemical Research Centre, Faculty of Chemistry, University of Warsaw, Warsaw, Poland
- Matthew Merski
- i3S - Instituto de Investigação e Inovação em Saúde, Universidade do Porto, Porto, Portugal
2
Arrías PN, Osmanli Z, Peralta E, Chinestrad PM, Monzon AM, Tosatto SCE. Diversity and structural-functional insights of alpha-solenoid proteins. Protein Sci 2024; 33:e5189. [PMID: 39465903 PMCID: PMC11514114 DOI: 10.1002/pro.5189] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Revised: 09/25/2024] [Accepted: 09/29/2024] [Indexed: 10/29/2024]
Abstract
Alpha-solenoids are a significant and diverse subset of structured tandem repeat proteins (STRPs) that are important in various domains of life. This review examines their structural and functional diversity and highlights their role in critical cellular processes such as signaling, apoptosis, and transcriptional regulation. Alpha-solenoids can be classified into three geometric folds: low curvature, high curvature, and corkscrew, as well as eight subfolds: ankyrin repeats; Huntingtin, elongation factor 3, protein phosphatase 2A, and target of rapamycin; armadillo repeats; tetratricopeptide repeats; pentatricopeptide repeats; Pumilio repeats; transcription activator-like; and Sel-1 and Sel-1-like repeats. These subfolds represent distinct protein families with unique structural properties and functions, highlighting the versatility of alpha-solenoids. The review also discusses their association with disease, highlighting their potential as therapeutic targets and their role in protein design. Advances in state-of-the-art structure prediction methods provide new opportunities and challenges in the functional characterization and classification of this kind of fold, emphasizing the need for continued development of methods for their identification and proper data curation and deposition in the main databases.
Affiliation(s)
- Paula Nazarena Arrías
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Department of Protein Science, KTH Royal Institute of Technology, Stockholm, Sweden
- Zarifa Osmanli
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Estefanía Peralta
- Laboratorio de Investigación y Desarrollo de Bioactivos (LIDeB), Departamento de Ciencias Biológicas, Facultad de Ciencias Exactas, Universidad Nacional de La Plata, La Plata, Buenos Aires, Argentina
- Silvio C. E. Tosatto
- Department of Biomedical Sciences, University of Padova, Padova, Italy
- Institute of Biomembranes, Bioenergetics and Molecular Biotechnologies, National Research Council (CNR-IBIOM), Bari, Italy
3
Mezgec K, Snoj J, Ulčakar L, Ljubetič A, Tušek Žnidarič M, Škarabot M, Jerala R. Coupling of Spectrin Repeat Modules for the Assembly of Nanorods and Presentation of Protein Domains. ACS Nano 2024; 18:28748-28763. [PMID: 39392430 PMCID: PMC11503911 DOI: 10.1021/acsnano.4c07701] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 09/25/2024] [Accepted: 10/01/2024] [Indexed: 10/12/2024]
Abstract
Modular protein engineering is a powerful approach for fabricating high-molecular-weight assemblies and biomaterials with nanoscale precision. Herein, we address the challenge of designing an extended nanoscale filamentous architecture inspired by the central rod domain of human dystrophin, which protects sarcolemma during muscle contraction and consists of spectrin repeats composed of three-helical bundles. A module of three tandem spectrin repeats was used as a rigid building block self-assembling via coiled-coil (CC) dimer-forming peptides. CC peptides were precisely integrated to maintain the spectrin α-helix continuity in an appropriate frame to form extended nanorods. An orthogonal set of customizable CC heterodimers was harnessed for modular rigid domain association, which could be additionally regulated by metal ions and chelators. We achieved a robust assembly of rigid rods several micrometers in length, determined by atomic force microscopy and negative stain transmission electron microscopy. Furthermore, these rigid rods can serve as a scaffold for the decoration of diverse proteins or biologically active peptides along their length with adjustable spacing up to tens of nanometers, as confirmed by the DNA-PAINT super-resolution microscopy. This demonstrates the potential of modular bottom-up protein engineering and tunable CCs for the fabrication of functionalized protein biomaterials.
Affiliation(s)
- Klemen Mezgec
- Department of Synthetic Biology and Immunology, National Institute of Chemistry, SI-1000 Ljubljana, Slovenia
- Graduate School of Biomedicine, University of Ljubljana, SI-1000 Ljubljana, Slovenia
- Jaka Snoj
- Department of Synthetic Biology and Immunology, National Institute of Chemistry, SI-1000 Ljubljana, Slovenia
- Graduate School of Biomedicine, University of Ljubljana, SI-1000 Ljubljana, Slovenia
- Liza Ulčakar
- Department of Synthetic Biology and Immunology, National Institute of Chemistry, SI-1000 Ljubljana, Slovenia
- Graduate School of Biomedicine, University of Ljubljana, SI-1000 Ljubljana, Slovenia
- Ajasja Ljubetič
- Department of Synthetic Biology and Immunology, National Institute of Chemistry, SI-1000 Ljubljana, Slovenia
- EN-FIST Centre of Excellence, SI-1000 Ljubljana, Slovenia
- Magda Tušek Žnidarič
- Department of Biotechnology and Systems Biology, National Institute of Biology, SI-1000 Ljubljana, Slovenia
- Miha Škarabot
- Condensed Matter Department, Jozef Stefan Institute, SI-1000 Ljubljana, Slovenia
- Roman Jerala
- Department of Synthetic Biology and Immunology, National Institute of Chemistry, SI-1000 Ljubljana, Slovenia
- CTGCT, Centre of Technology of Gene and Cell Therapy, Hajdrihova 19, SI-1000 Ljubljana, Slovenia
4
Machulin AV, Deryusheva EI, Galzitskaya OV. Variation in base composition, structure-function relationships, and origins of structural repetition in bacterial rpsA gene. Biosystems 2024; 238:105196. [PMID: 38537772 DOI: 10.1016/j.biosystems.2024.105196] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2023] [Revised: 03/22/2024] [Accepted: 03/22/2024] [Indexed: 04/12/2024]
Abstract
Protein domain repeats are known to arise due to tandem duplications of internal genes. However, the understanding of the underlying mechanisms of this process is incomplete. The goal of this work was to investigate the mechanism of repeat expansion by studying the sequences of 1324 rpsA genes of bacterial S1 ribosomal proteins containing different numbers of S1 structural domains. The rpsA gene encodes ribosomal S1 protein, which is essential for cell viability as it interacts with both mRNA and proteins. Gene ontology (GO) analysis of S1 domains in ribosomal S1 proteins revealed that bacterial protein sequences in S1 mainly have 3 types of molecular functions: RNA binding activity, nucleic acid activity, and ribosome structural component. Our results show that the maximum value of rpsA gene identity for full-length proteins was found for S1 proteins containing six structural domains (58%). Analysis of consensus sequences showed that parts of the rpsA gene encoding separate S1 domains have no strictly repetitive structure between groups containing different numbers of S1 domains. At the same time, gene regions encoding some conserved residues that form the RNA-binding site remain conserved. The detected phylogenetic similarity suggests that the proposed fold of the rpsA translation initiation region of Escherichia coli has functional value and is important for translational control of rpsA gene expression in other bacterial phyla, not only in gamma Proteobacteria.
Affiliation(s)
- Andrey V Machulin
- Skryabin Institute of Biochemistry and Physiology of Microorganisms, Russian Academy of Sciences, Federal Research Center "Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences", 142290, Pushchino, Moscow Region, Russia
- Evgeniya I Deryusheva
- Institute for Biological Instrumentation, Federal Research Center "Pushchino Scientific Center for Biological Research of the Russian Academy of Sciences", 142290, Pushchino, Moscow Region, Russia
- Oxana V Galzitskaya
- Institute of Protein Research, Russian Academy of Sciences, 142290, Pushchino, Moscow Region, Russia; Institute of Theoretical and Experimental Biophysics, Russian Academy of Sciences, 142290, Pushchino, Moscow Region, Russia.
5
Nowakowska AW, Wojciechowski JW, Szulc N, Kotulska M. The role of tandem repeats in bacterial functional amyloids. J Struct Biol 2023; 215:108002. [PMID: 37482232 DOI: 10.1016/j.jsb.2023.108002] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/24/2023] [Revised: 07/05/2023] [Accepted: 07/20/2023] [Indexed: 07/25/2023]
Abstract
Repetitivity and modularity of proteins are two related notions incorporated into multiple evolutionary concepts. We discuss whether they may also be essential for functional amyloids. Amyloids are proteins that create very regular and usually highly insoluble fibrils, which are often associated with neurodegeneration. However, recent discoveries showed that amyloid structure of a protein could also be beneficial and desired, e.g., to promote cell adhesion. Functional amyloids are proteins which differ in their characteristics from pathological amyloids, so that the fibril formation could be more under control of an organism. We propose that repeats in the sequence could regulate the aggregation propensity of these proteins. The inclusion of multiple symmetric interactions, due to the presence of the repeats, could be supporting and strengthening the desirable structural properties of functional amyloids. Our results show that tandem repeats in bacterial functional amyloids have a distinct characteristic. The pattern of repeats supports the appropriate level of fibril formation and better controllability of fibril stability. The repeats tend to be more imperfect, which attenuates excessive aggregation propensity. Their desired structure and function are also reinforced by their amino acid profile. Although in the study we focused on bacterial functional amyloids, due to their importance in biofilm formation, we propose that similar mechanisms could be employed in other functional amyloids which are designed by evolution to aggregate in a desirable manner, but not necessarily in pathological amyloids.
Affiliation(s)
- Alicja W Nowakowska
- Wrocław University of Science and Technology, Department of Biomedical Engineering, Poland.
- Jakub W Wojciechowski
- Wrocław University of Science and Technology, Department of Biomedical Engineering, Poland
- Natalia Szulc
- Wrocław University of Science and Technology, Department of Biomedical Engineering, Poland; Wrocław University of Environmental and Life Sciences, Department of Physics and Biophysics, Poland; LPCT, CNRS, Universite de Lorraine, F-54000 Nancy, France
- Malgorzata Kotulska
- Wrocław University of Science and Technology, Department of Biomedical Engineering, Poland.
6
Mesdaghi S, Price RM, Madine J, Rigden DJ. Deep Learning-based structure modelling illuminates structure and function in uncharted regions of β-solenoid fold space. J Struct Biol 2023; 215:108010. [PMID: 37544372 DOI: 10.1016/j.jsb.2023.108010] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/26/2023] [Revised: 07/19/2023] [Accepted: 08/03/2023] [Indexed: 08/08/2023]
Abstract
Repeat proteins are common in all domains of life and exhibit a wide range of functions. One class of repeat protein contains solenoid folds where the repeating unit consists of β-strands separated by tight turns. β-solenoids have distinguishing structural features such as handedness, twist, oligomerisation state, coil shape and size which give rise to their diversity. Characterised β-solenoid repeat proteins are known to form regions in bacterial and viral virulence factors, antifreeze proteins and functional amyloids. For many of these proteins, the experimental structure has not been solved, as they are difficult to crystallise or model. Here we use various deep learning-based structure-modelling methods to discover novel predicted β-solenoids, perform structural database searches to mine further structural neighbours and relate their predicted structure to possible functions. We find both eukaryotic and prokaryotic adhesins, confirming a known functional linkage between adhesin function and the β-solenoid fold. We further identify exceptionally long, flat β-solenoid folds as possible structures of mucin tandem repeat regions and unprecedentedly small β-solenoid structures. Additionally, we characterise a novel β-solenoid coil shape, the FapC Greek key β-solenoid as well as plausible complexes between it and other proteins involved in Pseudomonas functional amyloid fibres.
Affiliation(s)
- Shahram Mesdaghi
- The University of Liverpool, Institute of Systems, Molecular & Integrative Biology, Biosciences Building, Crown Street, Liverpool L69 7ZB, United Kingdom; Computational Biology Facility, MerseyBio, University of Liverpool, Crown Street, Liverpool L69 7ZB, United Kingdom
- Rebecca M Price
- The University of Liverpool, Institute of Systems, Molecular & Integrative Biology, Biosciences Building, Crown Street, Liverpool L69 7ZB, United Kingdom
- Jillian Madine
- The University of Liverpool, Institute of Systems, Molecular & Integrative Biology, Biosciences Building, Crown Street, Liverpool L69 7ZB, United Kingdom.
- Daniel J Rigden
- The University of Liverpool, Institute of Systems, Molecular & Integrative Biology, Biosciences Building, Crown Street, Liverpool L69 7ZB, United Kingdom.
7
Manasra S, Kajava AV. Why does the first protein repeat often become the only one? J Struct Biol 2023; 215:108014. [PMID: 37567371 DOI: 10.1016/j.jsb.2023.108014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Revised: 08/06/2023] [Accepted: 08/09/2023] [Indexed: 08/13/2023]
Abstract
Proteins with two similar motifs in tandem are one of the most common cases of tandem repeat proteins. The question arises: why is the first emerged repeat frequently fixed in the process of evolution, despite the ample opportunities to continue its multiplication at the DNA level? To answer this question, we systematically analyzed the structure and function of these proteins. Our analysis showed that, in the vast majority of cases, the structural repetitive units have a two-fold (C2) internal symmetry. These closed structures provide an internal structural limitation for the subsequent growth of the repeat number. Frequently, the units "swap" their secondary structure elements with each other. Moreover, the duplicated domains, in contrast to other tandem repeat proteins, form binding sites for small molecules around the axis of C2 symmetry. Thus, the closure of the C2 structures and the emergence of new functional sites around the axis of C2 symmetry provide plausible explanations for why a repeat, once appeared, becomes fixed in the evolutionary process. We have placed these structures within the general structural classification of tandem repeat proteins, classifying them as either Class IV or V depending on the size of the repetitive unit.
Affiliation(s)
- Simona Manasra
- Institute of Bioengineering, ITMO University, Kronverksky Pr. 49, 197101 Saint Petersburg, Russia
- Andrey V Kajava
- Centre de Recherche en Biologie cellulaire de Montpellier (CRBM), UMR 5237 CNRS, Université Montpellier, 1919 Route de Mende, Cedex 5, 34293 Montpellier, France.
8
Mitochondrial COA7 is a heme-binding protein with disulfide reductase activity, which acts in the early stages of complex IV assembly. Proc Natl Acad Sci U S A 2022; 119:2110357119. [PMID: 35210360 PMCID: PMC8892353 DOI: 10.1073/pnas.2110357119] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Accepted: 01/11/2022] [Indexed: 12/12/2022]
Abstract
Assembly factors play key roles in the biogenesis of mitochondrial protein complexes, regulating their stabilities, activities, and incorporation of essential cofactors. Cytochrome c oxidase assembly factor 7 (COA7) is a metazoan-specific assembly factor, the absence or mutation of which in humans accompanies complex IV assembly defects and neurological conditions. Here, we report the crystal structure of COA7 to 2.4 Å resolution, revealing a banana-shaped molecule composed of five helix-turn-helix (α/α) repeats. COA7 binds heme with micromolar affinity, even though the protein structure does not resemble previously characterized heme-binding proteins. The heme-bound COA7 can redox cycle between oxidation states Fe(II) and Fe(III) and shows disulfide reductase activity toward copper binding assembly factors. We propose that COA7 functions to facilitate the biogenesis of the binuclear copper site (CuA) of complex IV.
Cytochrome c oxidase (COX) assembly factor 7 (COA7) is a metazoan-specific assembly factor, critical for the biogenesis of mitochondrial complex IV (cytochrome c oxidase). Although mutations in COA7 have been linked to complex IV assembly defects and neurological conditions such as peripheral neuropathy, ataxia, and leukoencephalopathy, the precise role COA7 plays in the biogenesis of complex IV is not known. Here, we show that loss of COA7 blocks complex IV assembly after the initial step where the COX1 module is built, progression from which requires the incorporation of copper and addition of the COX2 and COX3 modules. The crystal structure of COA7, determined to 2.4 Å resolution, reveals a banana-shaped molecule composed of five helix-turn-helix (α/α) repeats, tethered by disulfide bonds. COA7 interacts transiently with the copper metallochaperones SCO1 and SCO2 and catalyzes the reduction of disulfide bonds within these proteins, which are crucial for copper relay to COX2. COA7 binds heme with micromolar affinity, through axial ligation to the central iron atom by histidine and methionine residues. We therefore propose that COA7 is a heme-binding disulfide reductase for regenerating the copper relay system that underpins complex IV assembly.
9
Chakrabarty B, Parekh N. DbStRiPs: Database of structural repeats in proteins. Protein Sci 2022; 31:23-36. [PMID: 33641184 PMCID: PMC8740836 DOI: 10.1002/pro.4052] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/25/2020] [Revised: 02/11/2021] [Accepted: 02/15/2021] [Indexed: 01/03/2023]
Abstract
Recent interest in repeat proteins has arisen due to the stable structural folds, high evolutionary conservation and repertoire of functions provided by these proteins. However, repeat proteins are poorly characterized because of high sequence variation between repeating units, and structure-based identification and classification of repeats is desirable. Using a robust network-based pipeline, manual curation and Kajava's structure-based classification schema, we have developed a database of tandem structural repeats, Database of Structural Repeats in Proteins (DbStRiPs). A unique feature of this database is that available knowledge on sequence repeat families is incorporated by mapping the Pfam classification scheme onto the structural classification. Integration of sequence and structure-based classifications helps in identifying different functional groups within the same structural subclass, leading to refinement in the annotation of repeat proteins. Analysis of the complete Protein Data Bank revealed 16,472 repeat annotations in 15,141 protein chains, one previously uncharacterized novel protein repeat family (PRF), named left-handed beta helix, and 33 protein repeat clusters (PRCs). Based on their unique structural motif, ~79% of these repeat proteins are classified in one of the 14 PRFs or 33 PRCs, and the remaining are grouped as unclassified repeat proteins. Each repeat protein is provided with a detailed annotation in DbStRiPs that includes start and end boundaries of repeating units, copy number, secondary and tertiary structure view, repeat class/subclass, disease association, MSA of repeating units and cross-references to various protein pattern databases, human protein atlas and interaction resources. DbStRiPs provides easy search and download options to high-quality annotations of structural repeat proteins (URL: http://bioinf.iiit.ac.in/dbstrips/).
Affiliation(s)
- Broto Chakrabarty
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Nita Parekh
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
10
Deryusheva EI, Machulin AV, Galzitskaya OV. Structural, Functional, and Evolutionary Characteristics of Proteins with Repeats. Mol Biol 2021. [DOI: 10.1134/s0026893321040038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
11
Parisi G, Palopoli N, Tosatto SC, Fornasari MS, Tompa P. "Protein" no longer means what it used to. Curr Res Struct Biol 2021; 3:146-152. [PMID: 34308370 PMCID: PMC8283027 DOI: 10.1016/j.crstbi.2021.06.002] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2021] [Revised: 06/18/2021] [Accepted: 06/22/2021] [Indexed: 01/02/2023]
Abstract
Every biologist knows that the word protein describes a group of macromolecules essential to sustain life on Earth. As biologists, we are invariably trained under a protein paradigm established since the early twentieth century. However, in recent years, the term protein unveiled itself as a euphemism to describe the overwhelming heterogeneity of these compounds. Most of our current studies are targeted at carefully selected subsets of proteins, but we tend to think and write about these as representative of the whole population. Here we discuss how seeking universal definitions and general rules in any arbitrarily segmented study would lead to misleading conclusions. Of course, it is not our purpose to discourage the use of the word protein. Instead, we suggest embracing the extended universe of proteins to reach a deeper understanding of their full potential, realizing that the term encompasses a group of molecules very heterogeneous in terms of size, shape, chemistry and functions, i.e. the term protein no longer means what it used to.
Affiliation(s)
- Gustavo Parisi
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
- Nicolas Palopoli
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
- María Silvina Fornasari
- Departamento de Ciencia y Tecnología, Universidad Nacional de Quilmes, CONICET, Bernal, Buenos Aires, Argentina
- Peter Tompa
- VIB-VUB Center for Structural Biology (CSB), Brussels, Belgium
- Structural Biology Brussels (SBB), Vrije Universiteit Brussel (VUB), Brussels, Belgium
- Institute of Enzymology, Research Centre for Natural Sciences, Budapest, Hungary
12
Izert MA, Szybowska PE, Górna MW, Merski M. The Effect of Mutations in the TPR and Ankyrin Families of Alpha Solenoid Repeat Proteins. Frontiers in Bioinformatics 2021; 1:696368. [PMID: 36303725 PMCID: PMC9581033 DOI: 10.3389/fbinf.2021.696368] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2021] [Accepted: 06/22/2021] [Indexed: 11/20/2022]
Abstract
Protein repeats are short, highly similar peptide motifs that occur several times within a single protein, for example the TPR and Ankyrin repeats. Understanding the role of mutation in these proteins is complicated by two competing facts: 1) the repeats are much more restricted to a set sequence than non-repeat proteins, so mutations should be harmful much more often because more residues are heavily constrained by the need of the sequence to repeat, and 2) the symmetry of the repeats allows the distribution of functional contributions over a number of residues, so that sometimes no specific site is singularly responsible for function (unlike enzymatic active site catalytic residues). To address this issue, we review the effects of mutations in a number of natural repeat proteins from the tetratricopeptide and Ankyrin repeat families. We find that mutations are context dependent. Some mutations are indeed highly disruptive to the function of the protein repeats, while mutations in identical positions in other repeats in the same protein have little to no effect on structure or function.
Affiliation(s)
- Matthew Merski
- *Correspondence: Maria Wiktoria Górna; Matthew Merski
13
Rudenko V, Korotkov E. Search for Highly Divergent Tandem Repeats in Amino Acid Sequences. Int J Mol Sci 2021; 22:ijms22137096. [PMID: 34281150 PMCID: PMC8269118 DOI: 10.3390/ijms22137096] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/25/2021] [Revised: 06/25/2021] [Accepted: 06/28/2021] [Indexed: 11/29/2022]
Abstract
We report a Method to Search for Highly Divergent Tandem Repeats (MSHDTR) in protein sequences which considers pairwise correlations between adjacent residues. MSHDTR was compared with some previously developed methods for searching for tandem repeats (TRs) in amino acid sequences, such as T-REKS and XSTREAM, which focus on the identification of TRs with significant sequence similarity, whereas MSHDTR detects repeats that significantly diverged during evolution, accumulating deletions, insertions, and substitutions. The application of MSHDTR to a search of the Swiss-Prot databank revealed over 15 thousand TR-containing amino acid sequences that were difficult to find using the other methods. Among the detected TRs, the most representative were those with consensus lengths of two and seven residues; these TRs were subjected to cluster analysis and the classes of patterns were identified. All TRs detected in this study have been combined into a databank accessible over the WWW.
Affiliation(s)
- Valentina Rudenko
- Center of Bioengineering Research Center of Biotechnology RAS, 119071 Moscow, Russia
- Correspondence: Tel.: +7-926-7248271
- Eugene Korotkov
- Center of Bioengineering Research Center of Biotechnology RAS, 119071 Moscow, Russia
- Moscow Engineering Physics Institute, National Research Nuclear University MEPhI, 115409 Moscow, Russia
14
Accurate contact-based modelling of repeat proteins predicts the structure of new repeats protein families. PLoS Comput Biol 2021; 17:e1008798. [PMID: 33857128 PMCID: PMC8078820 DOI: 10.1371/journal.pcbi.1008798] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/03/2020] [Revised: 04/27/2021] [Accepted: 02/15/2021] [Indexed: 12/18/2022]
Abstract
Repeat proteins are abundant in eukaryotic proteomes. They are involved in many eukaryotic specific functions, including signalling. For many of these proteins, the structure is not known, as they are difficult to crystallise. Today, using direct coupling analysis and deep learning, it is often possible to predict a protein's structure. However, the unique sequence features present in repeat proteins have made it challenging to use direct coupling analysis for predicting contacts. Here, we show that deep learning-based methods (trRosetta, DeepMetaPsicov (DMP) and PconsC4) overcome this problem and can predict intra- and inter-unit contacts in repeat proteins. In a benchmark dataset of 815 repeat proteins, about 90% can be correctly modelled. Further, among 48 PFAM families lacking a protein structure, we produce models of forty-one families with estimated high accuracy.
Repeat proteins are widespread among organisms and particularly abundant in eukaryotic proteomes. Their primary sequence contains repetitions in the amino acid sequence that give rise to structures with repeated folds/domains. Although the repeated units can often be recognised from the sequence alone, structural information is often missing. Here, we used contact prediction to predict the structure of repeat proteins directly from their primary sequences. We benchmark the methods on a dataset comprising all the known repeat structures. We evaluate the contact predictions and the obtained models for different classes of repeat proteins. Further, we develop and benchmark a quality assessment (QA) method specific for repeat proteins. Finally, we used the prediction pipeline for all PFAM repeat families without resolved structures and found that forty-one of them could be modelled with high accuracy.
15
Abstract
Cooperativity is a hallmark of protein folding, but the thermodynamic origins of cooperativity are difficult to quantify. Tandem repeat proteins provide a unique experimental system to quantify cooperativity due to their internal symmetry and their tolerance of deletion, extension, and in some cases fragmentation into single repeats. Analysis of repeat proteins of different lengths with nearest-neighbor Ising models provides values for repeat folding (ΔGi) and inter-repeat coupling (ΔGi-1,i). In this article, we review the architecture of repeat proteins and classify them in terms of ΔGi and ΔGi-1,i; this classification scheme groups repeat proteins according to their degree of cooperativity. We then present various statistical thermodynamic models, based on the 1D-Ising model, for analysis of different classes of repeat proteins. We use these models to analyze data for highly and moderately cooperative and noncooperative repeat proteins and relate their fitted parameters to overall structural features.
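The nearest-neighbor Ising picture in this abstract can be illustrated with a small Python sketch. It is not the authors' fitting code. It assumes a homopolymeric array in which every repeat shares one intrinsic folding free energy (ΔGi) and every adjacent folded pair shares one coupling free energy (ΔGi-1,i), and it enumerates all folded/unfolded configurations by brute force instead of using a transfer matrix. The function name `fraction_folded`, the units (kcal/mol) and the example values are illustrative assumptions.

```python
import math
from itertools import product

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_folded(n_repeats, dg_intrinsic, dg_coupling, temperature=298.15):
    """Average fraction of folded repeats in a 1D nearest-neighbor Ising model.

    dg_intrinsic: free energy of folding one isolated repeat (assumed identical
                  for all repeats), in kcal/mol.
    dg_coupling:  free energy of the interface between two adjacent folded
                  repeats, in kcal/mol.
    """
    rt = R_KCAL * temperature
    kappa = math.exp(-dg_intrinsic / rt)  # statistical weight of a folded repeat
    tau = math.exp(-dg_coupling / rt)     # statistical weight of a folded-folded interface

    partition = 0.0
    folded_weighted = 0.0
    # Enumerate all 2**n configurations; 1 = folded, 0 = unfolded.
    for state in product((0, 1), repeat=n_repeats):
        n_folded = sum(state)
        n_interfaces = sum(1 for a, b in zip(state, state[1:]) if a and b)
        weight = kappa ** n_folded * tau ** n_interfaces
        partition += weight
        folded_weighted += weight * n_folded
    return folded_weighted / (n_repeats * partition)


if __name__ == "__main__":
    # Hypothetical parameters: unstable single repeats, stabilizing interfaces.
    for n in (2, 4, 6):
        print(n, round(fraction_folded(n, 1.0, -3.0), 3))
```

Brute-force enumeration scales as 2^n, so it is only practical for short arrays; it is used here because every term of the partition function is visible in the loop.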
Affiliation(s)
- Mark Petersen: Program in Molecular Biophysics, Johns Hopkins University, Baltimore, Maryland 21218, USA; T.C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland 21218, USA
- Doug Barrick: T.C. Jenkins Department of Biophysics, Johns Hopkins University, Baltimore, Maryland 21218, USA
16
17
Merski M, Młynarczyk K, Ludwiczak J, Skrzeczkowski J, Dunin-Horkawicz S, Górna MW. Self-analysis of repeat proteins reveals evolutionarily conserved patterns. BMC Bioinformatics 2020; 21:179. [PMID: 32381046 PMCID: PMC7204011 DOI: 10.1186/s12859-020-3493-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 09/25/2019] [Accepted: 04/15/2020] [Indexed: 11/26/2022]
Abstract
BACKGROUND: Protein repeats can confound sequence analyses because the repetitiveness of their amino acid sequences leads to difficulties in identifying whether similar repeats are due to convergent or divergent evolution. We noted that the patterns derived from traditional "dot plot" protein sequence self-similarity analysis tended to be conserved in sets of related repeat proteins and this conservation could be quantitated using a Jaccard metric.
RESULTS: Comparison of these dot plots obviated the issues due to sequence similarity for analysis of repeat proteins. A high Jaccard similarity score was suggestive of a conserved relationship between closely related repeat proteins. The dot plot patterns decayed quickly in the absence of selective pressure with an expected loss of 50% of Jaccard similarity due to a loss of 8.2% sequence identity. To perform method testing, we assembled a standard set of 79 repeat proteins representing all the subgroups in RepeatsDB. Comparison of known repeat and non-repeat proteins from the PDB suggested that the information content in dot plots could be used to identify repeat proteins from pure sequence with no requirement for structural information. Analysis of the UniRef90 database suggested that 16.9% of all known proteins could be classified as repeat proteins. These 13.3 million putative repeat protein chains were clustered, and a large proportion (82.9%) of the clusters containing between 5 and 200 members were of a single functional type.
CONCLUSIONS: Dot plot analysis of repeat proteins attempts to obviate issues that arise due to the sequence degeneracy of repeat proteins. These results show that this kind of analysis can efficiently be applied to analyze repeat proteins on a large scale.
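The idea of comparing self dot plots with a Jaccard metric can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the published method: it marks a dot wherever two windows of the same sequence are identical at a minimum number of positions (the article describes homology-based dot plots), and it compares the dot sets of two sequences directly by coordinate, which assumes equal-length sequences with no alignment step. The function names `self_dot_plot` and `jaccard`, the window size and the toy sequences are assumptions.

```python
def self_dot_plot(seq, window=3, min_matches=3):
    """Return the set of (i, j), i < j, where windows at i and j look alike.

    Two windows "look alike" when at least `min_matches` of their `window`
    positions carry the same residue (identity scoring, a simplification).
    """
    n_windows = len(seq) - window + 1
    dots = set()
    for i in range(n_windows):
        for j in range(i + 1, n_windows):
            matches = sum(1 for k in range(window) if seq[i + k] == seq[j + k])
            if matches >= min_matches:
                dots.add((i, j))
    return dots


def jaccard(dots_a, dots_b):
    """Jaccard index |A & B| / |A | B| of two dot sets (1.0 if both are empty)."""
    if not dots_a and not dots_b:
        return 1.0
    return len(dots_a & dots_b) / len(dots_a | dots_b)


if __name__ == "__main__":
    full_repeat = self_dot_plot("ABCABCABC")   # three copies of a toy repeat
    decayed = self_dot_plot("ABCABCXYZ")       # last copy mutated away
    print(len(full_repeat), len(decayed))      # 5 1
    print(jaccard(full_repeat, decayed))       # 0.2
```

In the toy example, mutating one of three repeat copies removes four of the five off-diagonal dots, so the Jaccard index drops to 0.2 even though two thirds of the residues are unchanged.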
Affiliation(s)
- Matthew Merski: Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
- Krzysztof Młynarczyk: Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
- Jan Ludwiczak: Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, Poland; Laboratory of Bioinformatics, Nencki Institute of Experimental Biology, Warsaw, Poland
- Jakub Skrzeczkowski: Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
- Stanisław Dunin-Horkawicz: Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Maria W. Górna: Structural Biology Group, Biological and Chemical Research Centre, Department of Chemistry, University of Warsaw, Warsaw, Poland
18
Perez-Riba A, Lowe AR, Main ERG, Itzhaki LS. Context-Dependent Energetics of Loop Extensions in a Family of Tandem-Repeat Proteins. Biophys J 2019; 114:2552-2562. [PMID: 29874606 PMCID: PMC6129472 DOI: 10.1016/j.bpj.2018.03.038] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 10/10/2017] [Revised: 02/28/2018] [Accepted: 03/29/2018] [Indexed: 11/16/2022]
Abstract
Consensus-designed tetratricopeptide repeat proteins are highly stable, modular proteins that are strikingly amenable to rational engineering. They therefore have tremendous potential as building blocks for biomaterials and biomedicine. Here, we explore the possibility of extending the loops between repeats to enable further diversification, and we investigate how this modification affects stability and folding cooperativity. We find that extending a single loop by up to 25 residues does not disrupt the overall protein structure, but, strikingly, the effect on stability is highly context-dependent: in a two-repeat array, destabilization is relatively small and can be accounted for purely in entropic terms, whereas extending a loop in the middle of a large array is much more costly because of weakening of the interaction between the repeats. Our findings provide important and, to our knowledge, new insights that increase our understanding of the structure, folding, and function of natural repeat proteins and the design of artificial repeat proteins in biotechnology.
Affiliation(s)
- Albert Perez-Riba: Department of Pharmacology, University of Cambridge, Cambridge, United Kingdom
- Alan R Lowe: London Centre for Nanotechnology, London, United Kingdom; Structural & Molecular Biology, University College London, London, United Kingdom; Department of Biological Sciences, Birkbeck College, University of London, London, United Kingdom
- Ewan R G Main: School of Biological and Chemical Sciences, Queen Mary University of London, London, United Kingdom
- Laura S Itzhaki: Department of Pharmacology, University of Cambridge, Cambridge, United Kingdom
19
A Graph-Based Approach for Detecting Sequence Homology in Highly Diverged Repeat Protein Families. Methods Mol Biol 2019. [PMID: 30298401 DOI: 10.1007/978-1-4939-8736-8_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Reconstructing evolutionary relationships in repeat proteins is notoriously difficult due to the high degree of sequence divergence that typically occurs between duplicated repeats. This is complicated further by the fact that proteins with a large number of similar repeats are more likely to produce significant local sequence alignments than proteins with fewer copies of the repeat motif. Furthermore, biologically correct sequence alignments are sometimes impossible to achieve in cases where insertion or translocation events disrupt the order of repeats in one of the sequences being aligned. Combined, these attributes make traditional phylogenetic methods for studying protein families unreliable for repeat proteins, due to the dependence of such methods on accurate sequence alignment. We present here a practical solution to this problem, making use of graph clustering combined with the open-source software package HH-suite, which enables highly sensitive detection of sequence relationships. Carrying out multiple rounds of homology searches via alignment of profile hidden Markov models, large sets of related proteins are generated. By representing the relationships between proteins in these sets as graphs, subsequent clustering with the Markov cluster algorithm enables robust detection of repeat protein subfamilies.
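The graph-clustering step mentioned in this abstract can be pictured with a toy version of Markov clustering. The sketch below is a minimal pure-Python illustration of the expansion/inflation loop. It is not the MCL program or HH-suite, and it omits pruning and convergence checks. The function names, the inflation value, the fixed iteration count and the toy similarity graph are assumptions made for illustration.

```python
def normalize_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(matrix):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(matrix, power):
    """Inflation step: raise entries to a power, then renormalize columns."""
    return normalize_columns([[x ** power for x in row] for row in matrix])


def markov_cluster(adjacency, inflation=2.0, iterations=50, threshold=1e-6):
    """Return clusters (frozensets of node indices) from a similarity graph."""
    n = len(adjacency)
    # Add self-loops, a common choice that stabilizes the iteration.
    matrix = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    matrix = normalize_columns(matrix)
    for _ in range(iterations):
        matrix = inflate(expand(matrix), inflation)
    clusters = set()
    for i in range(n):
        # Rows that keep non-zero mass act as attractors; their non-zero
        # columns are the members of one cluster.
        members = frozenset(j for j in range(n) if matrix[i][j] > threshold)
        if members:
            clusters.add(members)
    return clusters


if __name__ == "__main__":
    # Toy graph: two triangles of mutually similar proteins, no cross links.
    adjacency = [[0, 1, 1, 0, 0, 0],
                 [1, 0, 1, 0, 0, 0],
                 [1, 1, 0, 0, 0, 0],
                 [0, 0, 0, 0, 1, 1],
                 [0, 0, 0, 1, 0, 1],
                 [0, 0, 0, 1, 1, 0]]
    print(markov_cluster(adjacency))
```

In a real analysis the edge weights would come from the homology search scores, and the inflation parameter controls how finely the graph is split into subfamilies.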
20
Aires A, Llarena I, Moller M, Castro‐Smirnov J, Cabanillas‐Gonzalez J, Cortajarena AL. A Simple Approach to Design Proteins for the Sustainable Synthesis of Metal Nanoclusters. Angew Chem Int Ed Engl 2019. [DOI: 10.1002/ange.201813576] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/09/2022]
Affiliation(s)
- Antonio Aires: CIC biomaGUNE, Parque Tecnológico de San Sebastián, Paseo Miramón 182, 20014 Donostia-San Sebastián, Spain
- Irantzu Llarena: CIC biomaGUNE, Parque Tecnológico de San Sebastián, Paseo Miramón 182, 20014 Donostia-San Sebastián, Spain
- Marco Moller: CIC biomaGUNE, Parque Tecnológico de San Sebastián, Paseo Miramón 182, 20014 Donostia-San Sebastián, Spain
- Aitziber L. Cortajarena: CIC biomaGUNE, Parque Tecnológico de San Sebastián, Paseo Miramón 182, 20014 Donostia-San Sebastián, Spain; Ikerbasque, Basque Foundation for Science, Ma Díaz de Haro 3, 48013 Bilbao, Spain
21
Aires A, Llarena I, Moller M, Castro‐Smirnov J, Cabanillas‐Gonzalez J, Cortajarena AL. A Simple Approach to Design Proteins for the Sustainable Synthesis of Metal Nanoclusters. Angew Chem Int Ed Engl 2019; 58:6214-6219. [PMID: 30875448 PMCID: PMC6617723 DOI: 10.1002/anie.201813576] [Citation(s) in RCA: 54] [Impact Index Per Article: 9.0] [Received: 11/28/2018] [Revised: 02/19/2019] [Indexed: 12/24/2022]
Abstract
Metal nanoclusters (NCs) are considered ideal nanomaterials for biological applications owing to their strong photoluminescence (PL), excellent photostability, and good biocompatibility. This study presents a simple and versatile strategy to design proteins, via incorporation of a di-histidine cluster coordination site, for the sustainable synthesis and stabilization of metal NCs with different metal composition. The resulting protein-stabilized metal NCs (Prot-NCs) of gold, silver, and copper are highly photoluminescent and photostable, have a long shelf life, and are stable under physiological conditions. The biocompatibility of the clusters was demonstrated in cell cultures in which Prot-NCs showed efficient cell internalization without affecting cell viability or losing luminescence. Moreover, the approach is translatable to other proteins to obtain Prot-NCs for various biomedical applications such as cell imaging or labeling.
Affiliation(s)
- Antonio Aires: CIC biomaGUNE, Parque Tecnológico de San Sebastián, Paseo Miramón 182, 20014 Donostia-San Sebastián, Spain
- Irantzu Llarena: CIC biomaGUNE, Parque Tecnológico de San Sebastián, Paseo Miramón 182, 20014 Donostia-San Sebastián, Spain
- Marco Moller: CIC biomaGUNE, Parque Tecnológico de San Sebastián, Paseo Miramón 182, 20014 Donostia-San Sebastián, Spain
- Aitziber L. Cortajarena: CIC biomaGUNE, Parque Tecnológico de San Sebastián, Paseo Miramón 182, 20014 Donostia-San Sebastián, Spain; Ikerbasque, Basque Foundation for Science, Ma Díaz de Haro 3, 48013 Bilbao, Spain
22
Perez-Riba A, Synakewicz M, Itzhaki LS. Folding cooperativity and allosteric function in the tandem-repeat protein class. Philos Trans R Soc Lond B Biol Sci 2019; 373:rstb.2017.0188. [PMID: 29735741 DOI: 10.1098/rstb.2017.0188] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Accepted: 01/17/2018] [Indexed: 01/08/2023]
Abstract
The term allostery was originally developed to describe structural changes in one binding site induced by the interaction of a partner molecule with a distant binding site, and it has been studied in depth in the field of enzymology. Here, we discuss the concept of action at a distance in relation to the folding and function of the solenoid class of tandem-repeat proteins such as tetratricopeptide repeats (TPRs) and ankyrin repeats. Distantly located repeats fold cooperatively, even though only nearest-neighbour interactions exist in these proteins. A number of repeat-protein scaffolds have been reported to display allosteric effects, transferred through the repeat array, that enable them to direct the activity of the multi-subunit enzymes within which they reside. We also highlight a recently identified group of tandem-repeat proteins, the RRPNN subclass of TPRs, recent crystal structures of which indicate that they function as allosteric switches to modulate multiple bacterial quorum-sensing mechanisms. We believe that the folding cooperativity of tandem-repeat proteins and the biophysical mechanisms that transform them into allosteric switches are intimately intertwined. This opinion piece aims to combine our understanding of the two areas and develop ideas on their common underlying principles. This article is part of a discussion meeting issue 'Allostery and molecular machines'.
Affiliation(s)
- Albert Perez-Riba: Department of Pharmacology, University of Cambridge, Tennis Court Road, Cambridge CB2 1PD, UK
- Marie Synakewicz: Department of Pharmacology, University of Cambridge, Tennis Court Road, Cambridge CB2 1PD, UK
- Laura S Itzhaki: Department of Pharmacology, University of Cambridge, Tennis Court Road, Cambridge CB2 1PD, UK
23
Perez-Riba A, Itzhaki LS. The tetratricopeptide-repeat motif is a versatile platform that enables diverse modes of molecular recognition. Curr Opin Struct Biol 2019; 54:43-49. [PMID: 30708253 DOI: 10.1016/j.sbi.2018.12.004] [Citation(s) in RCA: 99] [Impact Index Per Article: 16.5] [Received: 11/07/2018] [Revised: 12/09/2018] [Accepted: 12/12/2018] [Indexed: 01/05/2023]
Abstract
Tetratricopeptide repeat (TPR) domains and TPR-like domains are widespread across nature. They are involved in varied cellular processes and have been traditionally associated with binding to short linear peptide motifs. However, examples of a much more diverse range of molecular recognition modes are increasing year by year. The Protein Data Bank has an ever-expanding collection of TPR proteins in complex with a myriad of different partners, ranging from short linear peptide motifs to large globular protein domains. In this review, we explore these varied binding modes. Additionally, we hope to highlight an emerging property of this simple, malleable fold: the potential for programmable complexity that can be achieved by acting as a scaffold for multiple binding partners.
Affiliation(s)
- Albert Perez-Riba: Donnelly Centre for Cellular & Biomolecular Research, University of Toronto, Toronto, Canada
- Laura S Itzhaki: Department of Pharmacology, University of Cambridge, Tennis Court Road, Cambridge, CB2 1PD, UK
24
Sanchez-deAlcazar D, Mejias SH, Erazo K, Sot B, Cortajarena AL. Self-assembly of repeat proteins: Concepts and design of new interfaces. J Struct Biol 2018; 201:118-129. [DOI: 10.1016/j.jsb.2017.09.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 06/13/2017] [Revised: 08/09/2017] [Accepted: 09/02/2017] [Indexed: 11/25/2022]
25
Rämisch S, Pramhed A, Tillgren V, Aspberg A, Logan DT. Crystal structure of human chondroadherin: solving a difficult molecular-replacement problem using de novo models. Acta Crystallographica Section D: Structural Biology 2017; 73:53-63. [DOI: 10.1107/s205979831601980x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 07/05/2016] [Accepted: 12/12/2016] [Indexed: 02/08/2023]
Abstract
Chondroadherin (CHAD) is a cartilage matrix protein that mediates the adhesion of isolated chondrocytes. Its protein core is composed of 11 leucine-rich repeats (LRR) flanked by cysteine-rich domains. CHAD makes important interactions with collagen as well as with cell-surface heparin sulfate proteoglycans and α2β1 integrins. The integrin-binding site is located in a region of hitherto unknown structure at the C-terminal end of CHAD. Peptides based on the C-terminal human CHAD (hCHAD) sequence have shown therapeutic potential for treating osteoporosis. This article describes a still-unconventional structure solution by phasing with de novo models, the first of a β-rich protein. Structure determination of hCHAD using traditional, though nonsystematic, molecular replacement was unsuccessful in the hands of the authors, possibly owing to a combination of low sequence identity to other LRR proteins, four copies in the asymmetric unit and weak translational pseudosymmetry. However, it was possible to solve the structure by generating a large number of de novo models for the central LRR domain using Rosetta and multiple parallel molecular-replacement attempts using AMPLE. The hCHAD structure reveals an ordered C-terminal domain belonging to the LRRCT fold, with the integrin-binding motif (WLEAK) being part of a regular α-helix, and suggests ways in which experimental therapeutic peptides can be improved. The crystal structure itself and docking simulations further support that hCHAD dimers form in a similar manner to other matrix LRR proteins.
26
Abstract
Repeats are ubiquitous elements of proteins and they play important roles for cellular function and during evolution. Repeats are, however, also notoriously difficult to capture computationally, and large-scale studies have so far had difficulties in linking genetic causes, structural properties and evolutionary trajectories of protein repeats. Here we apply recently developed methods for repeat detection and analysis to a large dataset comprising over a hundred metazoan genomes. We find that repeats in larger protein families generally experience very few insertions or deletions (indels) of repeat units, but there is also a significant fraction of noteworthy volatile outliers with very high indel rates. Analysis of structural data indicates that repeats with an open structure and independently folding units are more volatile and more likely to be intrinsically disordered. Such disordered repeats are also significantly enriched in sites with a high functional potential such as linear motifs. Furthermore, the most volatile repeats have a high sequence similarity between their units. Since many volatile repeats also show signs of recombination, we conclude they are often shaped by concerted evolution. Intriguingly, many of these conserved yet volatile repeats are involved in host-pathogen interactions where they might foster fast but subtle adaptation in biological arms races. KEY WORDS: protein evolution, domain rearrangements, protein repeats, concerted evolution.
Affiliation(s)
- Andreas Schüler: Institute for Evolution and Biodiversity, Westfalian Wilhelms University, Huefferstrasse 1, Muenster, Germany
- Erich Bornberg-Bauer: Institute for Evolution and Biodiversity, Westfalian Wilhelms University, Huefferstrasse 1, Muenster, Germany
27
Biomolecular templating of functional hybrid nanostructures using repeat protein scaffolds. Biochem Soc Trans 2016; 43:825-31. [PMID: 26517889 DOI: 10.1042/bst20150077] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 01/13/2023]
Abstract
The precise synthesis of materials and devices with tailored complex structures and properties is a requisite for the development of the next generation of products based on nanotechnology. Nowadays, the technology for the generation of this type of device lacks the precision to determine their properties and is accomplished mostly by 'trial and error' experimental approaches. The use of bottom-up approaches that rely on highly specific biomolecular interactions of small and simple components is an attractive approach for the templating of nanoscale elements. In nature, protein assemblies define complex structures and functions. Engineering novel bio-inspired assemblies by exploiting the same rules and interactions that encode the natural diversity is an emerging field that opens the door to create nanostructures with numerous potential applications in synthetic biology and nanotechnology. Self-assembly of biological molecules into defined functional structures has a tremendous potential in nano-patterning and the design of novel materials and functional devices. Molecular self-assembly is a process by which complex 3D structures with specified functions are constructed from simple molecular building blocks. Here we discuss the basis of biomolecular templating and the great potential of repeat proteins as building blocks for biomolecular templating and nano-patterning. In particular, we focus on the designed consensus tetratricopeptide repeats (CTPRs), the control over the assembly of these proteins into higher order structures and their potential as building blocks in order to generate functional nanostructures and materials.
28
Bending-Twisting Motions and Main Interactions in Nucleoplasmin Nuclear Import. PLoS One 2016; 11:e0157162. [PMID: 27258022 PMCID: PMC4892583 DOI: 10.1371/journal.pone.0157162] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/07/2016] [Accepted: 05/25/2016] [Indexed: 01/11/2023]
Abstract
Alpha solenoid proteins play a key role in regulating the classical nuclear import pathway, recognizing a target protein and transporting it into the nucleus. Importin-α (Impα) is the solenoid responsible for cargo protein recognition, and it has been extensively studied by X-ray crystallography to understand the binding specificity. To comprehend the main motions of Impα and to extend the information about the critical interactions during carrier-cargo recognition, we surveyed different conformational states based on molecular dynamics (MD) and normal mode (NM) analyses. Our model of study was a crystallographic structure of Impα complexed with the classical nuclear localization sequence (cNLS) from nucleoplasmin (Npl), which was subjected to multiple 100 ns MD simulations. Representative conformations were selected for calculating the 87 lowest-frequency NMs of vibration, and a displacement approach was applied along each NM. Based on geometric criteria, using the radius of curvature and inter-repeat angles as the reference metrics, the main motions of Impα were described. Moreover, we determined the salt bridges, hydrogen bonds and hydrophobic interactions in the Impα-NplNLS interface. Our results show the bending and twisting motions participating in the recognition of nuclear proteins, allowing the accommodation and adjustment of a classical bipartite NLS sequence. The essential contacts for the nuclear import were also described and were mostly in agreement with previous studies, suggesting that the residues in the cNLS linker region establish important contacts with Impα adjusting the cNLS backbone. The MD simulations combined with NM analysis can be applied to the Impα-NLS system to help understand interactions between Impα and cNLSs and the analysis of non-classic NLSs.
29
Identification of repetitive units in protein structures with ReUPred. Amino Acids 2016; 48:1391-400. [PMID: 26898549 DOI: 10.1007/s00726-016-2187-2] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 01/22/2016] [Accepted: 01/23/2016] [Indexed: 01/02/2023]
Abstract
Over the last decade, numerous studies have demonstrated the fundamental importance of tandem repeat (TR) proteins in many biological processes. A plethora of new repeat structures have also been solved. The recently published RepeatsDB provides information on TR proteins. However, a detailed structural characterization of repetitive elements is largely missing, as repeat unit annotation is manually curated and currently covers only 3% of the bona fide TR proteins. Repeat Protein Unit Predictor (ReUPred) is a novel method for the fast automatic prediction of repeat units and repeat classification using an extensive Structure Repeat Unit Library (SRUL) derived from RepeatsDB. ReUPred uses an iterative structural search against the SRUL to find repetitive units. On a test set of solenoid proteins, ReUPred is able to correctly detect 92% of the proteins. Unlike previous methods, it is also able to correctly classify solenoid repeats in 89% of cases. It also outperforms two recent state-of-the-art methods for the repeat unit identification problem. The accurate prediction of repeat units increases the number of annotated repeat units by an order of magnitude compared to the sequence-based Pfam classification. ReUPred is implemented in Python for Linux and freely available from the URL: http://protein.bio.unipd.it/reupred/.
30
Kinch LN, Li W, Schaeffer RD, Dunbrack RL, Monastyrskyy B, Kryshtafovych A, Grishin NV. CASP 11 target classification. Proteins 2016; 84 Suppl 1:20-33. [PMID: 26756794 DOI: 10.1002/prot.24982] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 09/23/2015] [Revised: 12/22/2015] [Accepted: 01/05/2016] [Indexed: 11/09/2022]
Abstract
Protein target structures for the Critical Assessment of Structure Prediction round 11 (CASP11) and CASP ROLL were split into domains and classified into categories suitable for assessment of template-based modeling (TBM) and free modeling (FM) based on their evolutionary relatedness to existing structures classified by the Evolutionary Classification of Protein Domains (ECOD) database. First, target structures were divided into domain-based evaluation units. Target splits were based on the domain organization of available templates as well as the performance of servers on whole targets compared to split target domains. Second, evaluation units were classified into TBM and FM categories using a combination of measures that evaluate prediction quality and template detectability. Generally, target domains with sequence-related templates and good server prediction performance were classified as TBM, whereas targets without sequence-identifiable templates and low server performance were classified as FM. As in previous CASP experiments, the boundaries for classification were blurred due to the presence of significant insertions and deteriorations in the targets with respect to homologous templates, as well as the presence of templates with partial coverage of new folds. The FM category included 45 target domains, which represents an unprecedented number of difficult CASP targets provided for modeling.
Affiliation(s)
- Lisa N Kinch: Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas 75390-9050
- Wenlin Li: Department of Biophysics, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas 75390-9050; Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas 75390-9050
- R Dustin Schaeffer: Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas 75390-9050
- Roland L Dunbrack: Institute for Cancer Research, Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, Pennsylvania 19111
- Bohdan Monastyrskyy: Genome Center, University of California, 451 Health Sciences Drive, Davis, California 95616
- Andriy Kryshtafovych: Genome Center, University of California, 451 Health Sciences Drive, Davis, California 95616
- Nick V Grishin: Howard Hughes Medical Institute, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas 75390-9050; Department of Biophysics, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas 75390-9050; Department of Biochemistry, University of Texas Southwestern Medical Center at Dallas, Dallas, Texas 75390-9050
31
Louros NN, Baltoumas FA, Hamodrakas SJ, Iconomidou VA. A β-solenoid model of the Pmel17 repeat domain: insights to the formation of functional amyloid fibrils. J Comput Aided Mol Des 2016; 30:153-64. [PMID: 26754844 DOI: 10.1007/s10822-015-9892-x] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.8] [Received: 07/29/2015] [Accepted: 12/21/2015] [Indexed: 10/22/2022]
Abstract
Pmel17 is a multidomain protein involved in biosynthesis of melanin. This process is facilitated by the formation of Pmel17 amyloid fibrils that serve as a scaffold, important for pigment deposition in melanosomes. A specific luminal domain of human Pmel17, containing 10 tandem imperfect repeats, designated as repeat domain (RPT), forms amyloid fibrils in a pH-controlled mechanism in vitro and has been proposed to be essential for the formation of the fibrillar matrix. Currently, no three-dimensional structure has been resolved for the RPT domain of Pmel17. Here, we examine the structure of the RPT domain by performing sequence threading. The resulting model was subjected to energy minimization and validated through extensive molecular dynamics simulations. Structural analysis indicated that the RPT model exhibits several distinct properties of β-solenoid structures, which have been proposed to be polymerizing components of amyloid fibrils. The derived model is stabilized by an extensive network of hydrogen bonds generated by stacking of highly conserved polar residues of the RPT domain. Furthermore, the key role of invariant glutamate residues is proposed, supporting a pH-dependent mechanism for RPT domain assembly. Conclusively, our work attempts to provide structural insights into the RPT domain structure and to elucidate its contribution to Pmel17 amyloid fibril formation.
Affiliation(s)
- Nikolaos N Louros: Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, 157 01, Athens, Greece
- Fotis A Baltoumas: Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, 157 01, Athens, Greece
- Stavros J Hamodrakas: Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, 157 01, Athens, Greece
- Vassiliki A Iconomidou: Department of Cell Biology and Biophysics, Faculty of Biology, University of Athens, Panepistimiopolis, 157 01, Athens, Greece
32
Designed Repeat Proteins as Building Blocks for Nanofabrication. Adv Exp Med Biol 2016; 940:61-81. [DOI: 10.1007/978-3-319-39196-0_4] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 12/12/2022]
|
33
|
Pellegrini M. Tandem Repeats in Proteins: Prediction Algorithms and Biological Role. Front Bioeng Biotechnol 2015; 3:143. [PMID: 26442257 PMCID: PMC4585158 DOI: 10.3389/fbioe.2015.00143] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 06/12/2015] [Accepted: 09/07/2015] [Indexed: 12/30/2022] Open
Abstract
Tandem repetition in protein sequence and structure is a fascinating subject of research that has been a focus of study since the late 1990s. In this survey, we give an overview of the multi-faceted aspects of research on protein tandem repeats (PTR for short), including prediction algorithms, databases, early classification efforts, mechanisms of PTR formation and evolution, and synthetic PTR design. We also touch on the rather open issue of the relationship between PTR and flexibility (or disorder) in proteins. Detection of PTR from either protein sequence or structure data is challenging due to the inherently low (biological) signal-to-noise ratio that is a key feature of this problem. As early in silico analytic tools have been key enablers for starting this field of study, we expect that current and future algorithmic and statistical breakthroughs will have a high impact on investigations of the biological role of PTR.
Affiliation(s)
- Marco Pellegrini
- Laboratory for Integrative Systems Medicine (LISM), Istituto di Informatica e Telematica, and Istituto di Fisiologia Clinica, Consiglio Nazionale delle Ricerche, Pisa, Italy
|
34
|
Chakrabarty B, Parekh N. PRIGSA: protein repeat identification by graph spectral analysis. J Bioinform Comput Biol 2015; 12:1442009. [PMID: 25385083 DOI: 10.1142/s0219720014420098] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022]
Abstract
Repetition of a structural motif within a protein is associated with a wide range of structural and functional roles. In most cases the repeating units are well conserved at the structural level, while at the sequence level they are mostly undetectable, suggesting the need for structure-based methods. Since most known methods require a training dataset, a de novo approach is desirable. Here, we propose an efficient graph-based approach for detecting structural repeats in proteins. In a protein structure represented as a graph, inter- and intra-repeat interactions are well captured by the eigen spectra of the adjacency matrix of the graph. These conserved interactions give rise to similar connections and a unique profile of the principal eigen spectra for each repeating unit. The efficacy of the approach is shown on eight repeat families annotated in UniProt, comprising both solenoid and nonsolenoid repeats with varied secondary structure architecture and repeat lengths. The approach is also tested on other known benchmark datasets, and its performance is compared with that of two repeat identification methods. For a known repeat type, the algorithm also identifies the type of repeat present in the protein. A web tool implementing the algorithm is available at the URL http://bioinf.iiit.ac.in/PRIGSA/.
Affiliation(s)
- Broto Chakrabarty
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
|
35
|
Do Viet P, Roche DB, Kajava AV. TAPO: A combined method for the identification of tandem repeats in protein structures. FEBS Lett 2015; 589:2611-9. [PMID: 26320412 DOI: 10.1016/j.febslet.2015.08.025] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 06/29/2015] [Revised: 08/10/2015] [Accepted: 08/13/2015] [Indexed: 10/23/2022]
Abstract
In recent years, there has been an emergence of new 3D structures of proteins containing tandem repeats (TRs), as a result of improved expression and crystallization strategies. Databases focused on structure classifications (PDB, SCOP, CATH) do not provide an easy solution for selection of these structures from PDB. Several approaches have been developed, but no best approach exists to identify the whole range of 3D TRs. Here we describe the TAndem PrOtein detector (TAPO) that uses periodicities of atomic coordinates and other types of structural representation, including strings generated by conformational alphabets, residue contact maps, and arrangements of vectors of secondary structure elements. The benchmarking shows the superior performance of TAPO over the existing programs. In accordance with our analysis of PDB using TAPO, 19% of proteins contain 3D TRs. This analysis allowed us to identify new families of 3D TRs, suggesting that TAPO can be used to regularly update the collection and classification of existing repetitive structures.
Affiliation(s)
- Phuong Do Viet
- Centre de Recherche de Biochimie Macromoléculaire, UMR 5237 CNRS, Université Montpellier, 1919, Route de Mende, 34293 Montpellier Cedex 5, France; Institut de Biologie Computationnelle, Université Montpellier, Bat. 5, 860, rue St Priest, 34095 Montpellier Cedex 5, France
- Daniel B Roche
- Centre de Recherche de Biochimie Macromoléculaire, UMR 5237 CNRS, Université Montpellier, 1919, Route de Mende, 34293 Montpellier Cedex 5, France; Institut de Biologie Computationnelle, Université Montpellier, Bat. 5, 860, rue St Priest, 34095 Montpellier Cedex 5, France
- Andrey V Kajava
- Centre de Recherche de Biochimie Macromoléculaire, UMR 5237 CNRS, Université Montpellier, 1919, Route de Mende, 34293 Montpellier Cedex 5, France; Institut de Biologie Computationnelle, Université Montpellier, Bat. 5, 860, rue St Priest, 34095 Montpellier Cedex 5, France
|
36
|
Chakrabarty B, Parekh N. Identifying tandem Ankyrin repeats in protein structures. BMC Bioinformatics 2014; 15:6599. [PMID: 25547411 PMCID: PMC4307672 DOI: 10.1186/s12859-014-0440-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 07/09/2014] [Accepted: 12/18/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Tandem repetition of structural motifs in proteins is frequently observed across all forms of life. The topology of the repeating unit and its frequency of occurrence are associated with a wide range of structural and functional roles in diverse proteins, and defects in repeat proteins have been associated with a number of diseases. It is thus desirable to accurately identify the specific repeat type and its copy number. Weak evolutionary constraints on repeat units and insertions/deletions between them make their identification difficult at the sequence level, so structure-based approaches are desired. The proposed graph spectral approach represents the protein structure as a graph in order to detect one of the most frequently observed structural repeats, the Ankyrin repeat.
RESULTS: It has been shown in a large number of studies that the 3-dimensional topology of a protein structure is well captured by a graph, making it possible to analyze a complex protein structure as a mathematical entity. In this study we show that the eigen spectra profile of a protein structure graph exhibits a unique repetitive profile for contiguous repeating units, enabling the detection of the repeat region and the repeat type. The proposed approach uses a non-redundant set of 58 Ankyrin proteins to define rules for the detection of Ankyrin repeat motifs. It is evaluated on a set of 370 proteins comprising 125 known Ankyrin proteins and remaining non-solenoid proteins, and the predictions are compared with UniProt annotation, a sequence-based approach, RADAR, and a structure-based approach, ConSole. To show the efficacy of the approach, we analyzed the complete PDB structural database and identified 641 previously unrecognized Ankyrin repeat proteins. We observe a unique eigen spectra profile for different repeat types and show that the method can be easily extended to detect other repeat types. It is implemented as a web server, AnkPred, freely available at 'bioinf.iiit.ac.in/AnkPred'.
CONCLUSIONS: AnkPred provides an elegant and computationally efficient graph-based approach for detecting Ankyrin structural repeats in proteins. By analyzing the eigen spectra of the protein structure graph and secondary structure information, characteristic features of a known repeat family are identified. This method is especially useful in correctly identifying new members of a repeat family.
Affiliation(s)
- Broto Chakrabarty
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Nita Parekh
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
|
37
|
Kaushik S, Sowdhamini R. Distribution, classification, domain architectures and evolution of prolyl oligopeptidases in prokaryotic lineages. BMC Genomics 2014; 15:985. [PMID: 25407321 PMCID: PMC4522959 DOI: 10.1186/1471-2164-15-985] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 07/22/2014] [Accepted: 10/09/2014] [Indexed: 11/30/2022] Open
Abstract
Background: Prolyl oligopeptidases (POPs) are proteolytic enzymes, widely distributed in all the kingdoms of life. Bacterial POPs are pharmaceutically important enzymes, yet their functional and evolutionary details are not fully explored. Therefore, the current analysis is aimed at understanding the distribution, domain architecture, probable biological functions and gene family expansion of POPs in bacterial and archaeal lineages. Results: Exhaustive sequence analysis of 1,202 bacterial and 91 archaeal genomes revealed ~3,000 POP homologs, with only 638 annotated POPs. We observed wide distribution of POPs in all the analysed bacterial lineages. Phylogenetic analysis and co-clustering of POPs of different phyla suggested their common functions in all the prokaryotic species. Further, on the basis of unique sequence motifs we could classify bacterial POPs into eight subtypes. Analysis of coexisting domains in POPs highlighted their involvement in protein-protein interactions and cellular signaling. We proposed a significant extension of this gene family by characterizing 39 new POPs and 158 new α/β hydrolase members. Conclusions: Our study reflects the diversity and functional importance of POPs in bacterial species. Many genomes with multiple POPs were identified with high sequence variations and different cellular localizations. Such anomalous distribution of POP genes in different bacterial genomes shows differential expansion of the POP gene family, primarily by multiple horizontal gene transfer events.
Affiliation(s)
- Swati Kaushik
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, GKVK Campus, Bellary Road, Bangalore, 560065, India; Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, CA, 94158, USA
- Ramanathan Sowdhamini
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, GKVK Campus, Bellary Road, Bangalore, 560065, India
|
38
|
Jost C, Plückthun A. Engineered proteins with desired specificity: DARPins, other alternative scaffolds and bispecific IgGs. Curr Opin Struct Biol 2014; 27:102-12. [PMID: 25033247 DOI: 10.1016/j.sbi.2014.05.011] [Citation(s) in RCA: 85] [Impact Index Per Article: 7.7] [Received: 03/23/2014] [Revised: 05/13/2014] [Accepted: 05/23/2014] [Indexed: 12/22/2022]
Abstract
Specific binding proteins have become essential for diagnostic and therapeutic applications, and traditionally these have been antibodies. Nowadays an increasing number of alternative scaffolds have joined these ranks. These additional folds have raised a lot of interest and expectations within the last decade. It appears that they have come of age and caught up with antibodies in many fields of applications. The last years have seen an exploration of possibilities in research, diagnostics and therapy. Some scaffolds have received further improvements broadening their fields of application, while others have started to occupy their respective niche. Protein engineering, the prerequisite for the advent of all alternative scaffolds, remains the driving force in this process, for both non-immunoglobulins and immunoglobulins alike.
Affiliation(s)
- Christian Jost
- Department of Biochemistry, University of Zürich, Winterthurerstr. 190, CH-8057 Zürich, Switzerland
- Andreas Plückthun
- Department of Biochemistry, University of Zürich, Winterthurerstr. 190, CH-8057 Zürich, Switzerland
|
39
|
Hrabe T, Godzik A. ConSole: using modularity of contact maps to locate solenoid domains in protein structures. BMC Bioinformatics 2014; 15:119. [PMID: 24766872 PMCID: PMC4021314 DOI: 10.1186/1471-2105-15-119] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 02/03/2014] [Accepted: 04/17/2014] [Indexed: 11/10/2022] Open
Abstract
Background: Periodic proteins, characterized by the presence of multiple repeats of short motifs, form an interesting and seldom-studied group. Due to often extreme divergence in sequence, detection and analysis of such motifs is performed more reliably on the structural level. Yet, few algorithms have been developed for the detection and analysis of structures of periodic proteins. Results: ConSole recognizes modularity in protein contact maps, allowing for precise identification of repeats in solenoid protein structures, an important subgroup of periodic proteins. Tests on benchmarks show that ConSole has higher recognition accuracy as compared to Raphael, the only other publicly available solenoid structure detection tool. As a next step of ConSole analysis, we show how detection of solenoid repeats in structures can be used to improve sequence recognition of these motifs and to detect subtle irregularities of repeat lengths in three solenoid protein families. Conclusions: The ConSole algorithm provides a fast and accurate tool to recognize solenoid protein structures as a whole and to identify individual solenoid repeat units from a structure. ConSole is available as a web-based, interactive server and is available for download at http://console.sanfordburnham.org.
Affiliation(s)
- Adam Godzik
- Program in Bioinformatics and Systems Biology, Sanford-Burnham Medical Research Institute, 92037 La Jolla, CA, USA
|
40
|
Jeoung M, Abdelmoti L, Jang ER, Vander Kooi CW, Galperin E. Functional Integration of the Conserved Domains of Shoc2 Scaffold. PLoS One 2013; 8:e66067. [PMID: 23805200 PMCID: PMC3689688 DOI: 10.1371/journal.pone.0066067] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Received: 03/07/2013] [Accepted: 05/05/2013] [Indexed: 01/25/2023] Open
Abstract
Shoc2 is a positive regulator of signaling to extracellular signal-regulated protein kinases 1 and 2 (ERK1/2). Shoc2 is also proposed to interact with RAS and Raf-1 in order to accelerate ERK1/2 activity. To understand the mechanisms by which Shoc2 regulates ERK1/2 activation by the epidermal growth factor receptor (EGFR), we dissected the role of Shoc2 structural domains in binding to its signaling partners and in regulating ERK1/2 activity. Shoc2 comprises two main domains: a core of 21 leucine-rich repeats (LRRs) and the N-terminal non-LRR domain. We demonstrated that the N-terminal domain mediates Shoc2 binding to both M-Ras and Raf-1, while the C-terminal part of Shoc2 contains a late endosomal targeting motif. We found that M-Ras binding to Shoc2 is independent of its GTPase activity. While overexpression of Shoc2 did not change the kinetics of ERK1/2 activity, both the N-terminal and the LRR-core domains were able to rescue ERK1/2 activity in cells depleted of Shoc2, suggesting that these Shoc2 domains are involved in modulating ERK1/2 activity.
Affiliation(s)
- Myoungkun Jeoung
- Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, Kentucky, United States of America
- Lina Abdelmoti
- Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, Kentucky, United States of America
- Eun Ryoung Jang
- Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, Kentucky, United States of America
- Craig W. Vander Kooi
- Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, Kentucky, United States of America
- Department of Molecular and Cellular Biochemistry and Center for Structural Biology, University of Kentucky, Lexington, Kentucky, United States of America
- Emilia Galperin
- Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, Kentucky, United States of America
|
41
|
Hoang TX, Trovato A, Seno F, Banavar JR, Maritan A. Sequence repeats and protein structure. Phys Rev E Stat Nonlin Soft Matter Phys 2012; 86:050901. [PMID: 23214731 DOI: 10.1103/physreve.86.050901] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/29/2012] [Indexed: 06/01/2023]
Abstract
Repeats are frequently found in known protein sequences. The level of sequence conservation in tandem repeats correlates with their propensities to be intrinsically disordered. We employ a coarse-grained model of a protein with a two-letter amino acid alphabet, hydrophobic (H) and polar (P), to examine the sequence-structure relationship in the realm of repeated sequences. A fraction of repeated sequences comprises a distinct class of bad folders, whose folding temperatures are much lower than those of random sequences. Imperfection in sequence repetition improves the folding properties of the bad folders while deteriorating those of the good folders. Our results may explain why nature has utilized repeated sequences for their versatility and especially to design functional proteins that are intrinsically unstructured at physiological temperatures.
Affiliation(s)
- Trinh X Hoang
- Institute of Physics, Vietnam Academy of Science and Technology, 10 Dao Tan, Hanoi 10000, Vietnam
|
42
|
Varadamsetty G, Tremmel D, Hansen S, Parmeggiani F, Plückthun A. Designed Armadillo Repeat Proteins: Library Generation, Characterization and Selection of Peptide Binders with High Specificity. J Mol Biol 2012; 424:68-87. [DOI: 10.1016/j.jmb.2012.08.029] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.2] [Received: 04/15/2012] [Revised: 08/06/2012] [Accepted: 08/23/2012] [Indexed: 11/16/2022]
|
43
|
Artificial proteins from combinatorial approaches. Trends Biotechnol 2012; 30:512-20. [DOI: 10.1016/j.tibtech.2012.06.001] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 03/21/2012] [Revised: 06/01/2012] [Accepted: 06/06/2012] [Indexed: 11/21/2022]
|
44
|
Walsh I, Sirocco FG, Minervini G, Di Domenico T, Ferrari C, Tosatto SCE. RAPHAEL: recognition, periodicity and insertion assignment of solenoid protein structures. Bioinformatics 2012; 28:3257-64. [PMID: 22962341 DOI: 10.1093/bioinformatics/bts550] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.0] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Repeat proteins form a distinct class of structures where folding is greatly simplified. Several classes have been defined, with solenoid repeats of periodicity between ca. 5 and 40 being the most challenging to detect. Such proteins evolve quickly and their periodicity may be rapidly hidden at sequence level. From a structural point of view, finding solenoids may be complicated by the presence of insertions or multiple domains. To the best of our knowledge, no automated methods are available to characterize solenoid repeats from structure. RESULTS: Here we introduce RAPHAEL, a novel method for the detection of solenoids in protein structures. It reliably solves three problems of increasing difficulty: (1) recognition of solenoid domains, (2) determination of their periodicity and (3) assignment of insertions. RAPHAEL uses a geometric approach mimicking manual classification, producing several numeric parameters that are optimized for maximum performance. The resulting method is very accurate, with 89.5% of solenoid proteins and 97.2% of non-solenoid proteins correctly classified. RAPHAEL periodicities have a Spearman correlation coefficient of 0.877 against the manually established ones. A baseline algorithm for insertion detection in identified solenoids has a Q(2) value of 79.8%, suggesting room for further improvement. RAPHAEL finds 1931 highly confident repeat structures not previously annotated as solenoids in the Protein Data Bank records.
Affiliation(s)
- Ian Walsh
- Department of Biology, University of Padua, Viale G. Colombo 3, 35131 Padova, Italy
|
45
|
Corradin G, Céspedes N, Verdini A, Kajava AV, Arévalo-Herrera M, Herrera S. Malaria vaccine development using synthetic peptides as a technical platform. Adv Immunol 2012; 114:107-49. [PMID: 22449780 DOI: 10.1016/b978-0-12-396548-6.00005-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 02/06/2023]
Abstract
The review covers the development of synthetic peptides as vaccine candidates for Plasmodium falciparum- and Plasmodium vivax-induced malaria from its beginnings to the present, and the concomitant progress of solid phase peptide synthesis (SPPS) that enables the routine production of long peptides. The review also stresses the development of other complementary tools and actions needed to achieve the long-sought goal of an efficacious malaria vaccine.
|
46
|
Kajava AV. Tandem repeats in proteins: from sequence to structure. J Struct Biol 2011; 179:279-88. [PMID: 21884799 DOI: 10.1016/j.jsb.2011.08.009] [Citation(s) in RCA: 169] [Impact Index Per Article: 12.1] [Received: 06/20/2011] [Revised: 08/15/2011] [Accepted: 08/17/2011] [Indexed: 10/17/2022]
Abstract
The bioinformatics analysis of proteins containing tandem repeats requires special computer programs and databases, since the conventional approaches predominantly developed for globular domains have limited success. Here, I survey bioinformatics tools which have been developed recently for identification and proteome-wide analysis of protein repeats. The last few years have also been marked by an emergence of new 3D structures of these proteins. Appraisal of the known structures and their classification uncovers a straightforward relationship between their architecture and the length of the repetitive units. This relationship and the repetitive character of structural folds suggest rules for better prediction of the 3D structures of such proteins. Furthermore, bioinformatics approaches combined with low resolution structural data, from biophysical techniques, especially, the recently emerged cryo-electron microscopy, lead to reliable prediction of the protein repeat structures and their mode of binding with partners within molecular complexes. This hybrid approach can actively be used for structural and functional annotations of proteomes.
Affiliation(s)
- Andrey V Kajava
- Centre de Recherches de Biochimie Macromoléculaire, CNRS, Université Montpellier 1 et 2, 1919 Route de Mende, 34293 Montpellier, Cedex 5, France
|
47
|
Structures and functions of autotransporter proteins in microbial pathogens. Int J Med Microbiol 2011; 301:461-8. [DOI: 10.1016/j.ijmm.2011.03.003] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.8] [Received: 02/06/2011] [Revised: 03/22/2011] [Accepted: 03/27/2011] [Indexed: 12/23/2022] Open
|
48
|
Jorda J, Kajava AV. Protein homorepeats sequences, structures, evolution, and functions. Adv Protein Chem Struct Biol 2011; 79:59-88. [PMID: 20621281 DOI: 10.1016/s1876-1623(10)79002-7] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.5] [Indexed: 11/25/2022]
Abstract
The vast majority of protein sequences are aperiodic; they do not have any strong bias in the amino acid composition, and they use a subtle mixture of all or most of the 20 amino acid residues to code a great number of various structures and functions. In this context, homorepeats, runs of a single amino acid residue, represent unusual, eye-catching motifs in proteins. Despite the sequence simplicity and relatively small size, the homorepeat runs have a strong potential for molecular interactions due to the excessively high local concentration of a certain physico-chemical property. Appearance of such runs within proteins may give them new structural and functional features. An increasing number of studies demonstrate the abundance of these motifs in proteins, their important roles in biological processes, and their link to a number of hereditary and age-related diseases. In this chapter, we summarize data on the distribution of homorepeats in proteomes and on their structural properties, evolution, and functions.
Affiliation(s)
- Julien Jorda
- Centre de Recherches de Biochimie Macromoléculaire UMR 5237, CNRS, University of Montpellier 1 and 2, Montpellier, France
|
49
|
Kleinau G, Mueller S, Jaeschke H, Grzesik P, Neumann S, Diehl A, Paschke R, Krause G. Defining structural and functional dimensions of the extracellular thyrotropin receptor region. J Biol Chem 2011; 286:22622-31. [PMID: 21525003 DOI: 10.1074/jbc.m110.211193] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 11/06/2022] Open
Abstract
The extracellular region of the thyrotropin receptor (TSHR) can be subdivided into the leucine-rich repeat domain (LRRD) and the hinge region. Both the LRRD and the hinge region interact with thyrotropin (TSH) or autoantibodies. Structural data for the TSHR LRRD were previously determined by crystallization (amino acids Glu(30)-Thr(257), 10 repeats), but the structure of the hinge region is still undefined. Of note, the amino acid sequence (Trp(258)-Tyr(279)) following the crystallized LRRD comprises a pattern typical for leucine-rich repeats, with conserved hydrophobic side chains stabilizing the repeat fold. Moreover, functional data for amino acids between the LRRD and the transmembrane domain were fragmentary. We therefore systematically investigated these TSHR regions by mutagenesis to gain insights into their functional contribution and potential structural features. We found that mutations of conserved hydrophobic residues between Thr(257) and Tyr(279) cause TSHR misfolding, which supports a structural fold of this peptide, probably as an additional leucine-rich repeat. Furthermore, we identified several new mutations of hydrophilic amino acids in the entire hinge region leading to partial TSHR inactivation, indicating that these positions are important for intramolecular signal transduction. In summary, we provide new information regarding the structural features and functionalities of extracellular TSHR regions. Based on these insights and in context with previous results, we suggest an extracellular activation mechanism that supports an intramolecular agonistic unit as a central switch for activating effects at the extracellular region toward the serpentine domain.
Affiliation(s)
- Gunnar Kleinau
- Department for Structural Biology, Leibniz-Institut für Molekulare Pharmakologie, D-13125 Berlin, Germany
|
50
|
Urvoas A, Guellouz A, Valerio-Lepiniec M, Graille M, Durand D, Desravines DC, van Tilbeurgh H, Desmadril M, Minard P. Design, Production and Molecular Structure of a New Family of Artificial Alpha-helicoidal Repeat Proteins (αRep) Based on Thermostable HEAT-like Repeats. J Mol Biol 2010; 404:307-27. [DOI: 10.1016/j.jmb.2010.09.048] [Citation(s) in RCA: 66] [Impact Index Per Article: 4.4] [Received: 02/16/2010] [Revised: 09/15/2010] [Accepted: 09/21/2010] [Indexed: 01/07/2023]
|