1
|
Milchevskiy YV, Milchevskaya VY, Nikitin AM, Kravatsky YV. Effective Local and Secondary Protein Structure Prediction by Combining a Neural Network-Based Approach with Extensive Feature Design and Selection without Reliance on Evolutionary Information. Int J Mol Sci 2023; 24:15656. [PMID: 37958639 PMCID: PMC10648199 DOI: 10.3390/ijms242115656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2023] [Revised: 10/24/2023] [Accepted: 10/25/2023] [Indexed: 11/15/2023] Open
Abstract
Protein structure prediction continues to pose multiple challenges despite outstanding progress that is largely attributable to the use of novel machine learning techniques. One of the widely used representations of local 3D structure-protein blocks (PBs)-can be treated in a similar way to secondary structure classes. Here, we present a new approach for predicting local conformation in terms of PB classes solely from amino acid sequences. We apply the RMSD metric to ensure unambiguous future 3D protein structure recovery. The selection of statistically assessed features is a key component of the proposed method. We suggest that ML input features should be created from the statistically significant predictors that are derived from the amino acids' physicochemical properties and the resolved structures' statistics. The statistical significance of the suggested features was assessed using a stepwise regression analysis that permitted the evaluation of the contribution and statistical significance of each predictor. We used the set of 380 statistically significant predictors as a learning model for the regression neural network that was trained using the PISCES30 dataset. When using the same dataset and metrics for benchmarking, our method outperformed all other methods reported in the literature for the CB513 nonredundant dataset (for the PBs, Q16 = 81.01%, and for the DSSP, Q3 = 85.99% and Q8 = 79.35%).
Collapse
Affiliation(s)
- Yury V. Milchevskiy
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia (Y.V.K.)
| | - Vladislava Y. Milchevskaya
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia (Y.V.K.)
- Institute of Medical Statistics and Bioinformatics, University of Cologne, 50931 Cologne, Germany
| | - Alexei M. Nikitin
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia (Y.V.K.)
| | - Yury V. Kravatsky
- Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, Vavilov Str., 32, 119991 Moscow, Russia (Y.V.K.)
- Center for Precision Genome Editing and Genetic Technologies for Biomedicine, Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 119991 Moscow, Russia
| |
Collapse
|
2
|
Cretin G, Galochkina T, de Brevern AG, Gelly JC. PYTHIA: Deep Learning Approach for Local Protein Conformation Prediction. Int J Mol Sci 2021; 22:ijms22168831. [PMID: 34445537 PMCID: PMC8396346 DOI: 10.3390/ijms22168831] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/06/2021] [Revised: 08/09/2021] [Accepted: 08/10/2021] [Indexed: 02/07/2023] Open
Abstract
Protein Blocks (PBs) are a widely used structural alphabet describing local protein backbone conformation in terms of 16 possible conformational states, adopted by five consecutive amino acids. The representation of complex protein 3D structures as 1D PB sequences was previously successfully applied to protein structure alignment and protein structure prediction. In the current study, we present a new model, PYTHIA (predicting any conformation at high accuracy), for the prediction of the protein local conformations in terms of PBs directly from the amino acid sequence. PYTHIA is based on a deep residual inception-inside-inception neural network with convolutional block attention modules, predicting 1 of 16 PB classes from evolutionary information combined to physicochemical properties of individual amino acids. PYTHIA clearly outperforms the LOCUSTRA reference method for all PB classes and demonstrates great performance for PB prediction on particularly challenging proteins from the CASP14 free modelling category.
Collapse
Affiliation(s)
- Gabriel Cretin
- Biologie Intégrée du Globule Rouge, Université de Paris, UMR_S1134, BIGR, INSERM, 75015 Paris, France; (G.C.); (T.G.); (A.G.d.B.)
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
| | - Tatiana Galochkina
- Biologie Intégrée du Globule Rouge, Université de Paris, UMR_S1134, BIGR, INSERM, 75015 Paris, France; (G.C.); (T.G.); (A.G.d.B.)
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
| | - Alexandre G. de Brevern
- Biologie Intégrée du Globule Rouge, Université de Paris, UMR_S1134, BIGR, INSERM, 75015 Paris, France; (G.C.); (T.G.); (A.G.d.B.)
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
| | - Jean-Christophe Gelly
- Biologie Intégrée du Globule Rouge, Université de Paris, UMR_S1134, BIGR, INSERM, 75015 Paris, France; (G.C.); (T.G.); (A.G.d.B.)
- Laboratoire d’Excellence GR-Ex, 75015 Paris, France
- Correspondence:
| |
Collapse
|
3
|
Narwani TJ, Craveur P, Shinada NK, Floch A, Santuz H, Vattekatte AM, Srinivasan N, Rebehmed J, Gelly JC, Etchebest C, de Brevern AG. Discrete analyses of protein dynamics. J Biomol Struct Dyn 2019; 38:2988-3002. [PMID: 31361191 DOI: 10.1080/07391102.2019.1650112] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023]
Abstract
Protein structures are highly dynamic macromolecules. This dynamics is often analysed through experimental and/or computational methods only for an isolated or a limited number of proteins. Here, we explore large-scale protein dynamics simulation to observe dynamics of local protein conformations using different perspectives. We analysed molecular dynamics to investigate protein flexibility locally, using classical approaches such as RMSf, solvent accessibility, but also innovative approaches such as local entropy. First, we focussed on classical secondary structures and analysed specifically how β-strand, β-turns, and bends evolve during molecular simulations. We underlined interesting specific bias between β-turns and bends, which are considered as the same category, while their dynamics show differences. Second, we used a structural alphabet that is able to approximate every part of the protein structures conformations, namely protein blocks (PBs) to analyse (i) how each initial local protein conformations evolve during dynamics and (ii) if some exchange can exist among these PBs. Interestingly, the results are largely complex than simple regular/rigid and coil/flexible exchange. AbbreviationsNeqnumber of equivalentPBProtein BlocksPDBProtein DataBankRMSfroot mean square fluctuationsCommunicated by Ramaswamy H. Sarma.
Collapse
Affiliation(s)
- Tarun Jairaj Narwani
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France.,Laboratoire D'Excellence GR-Ex, Paris, France.,Institut National de la Transfusion Sanguine (INTS), Paris, France
| | - Pierrick Craveur
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France.,Laboratoire D'Excellence GR-Ex, Paris, France.,Institut National de la Transfusion Sanguine (INTS), Paris, France.,Department of Integrative Structural and Computational Biology, The Scripps Research Institute, La Jolla, CA, USA
| | - Nicolas K Shinada
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France.,Laboratoire D'Excellence GR-Ex, Paris, France.,Institut National de la Transfusion Sanguine (INTS), Paris, France.,Discngine, SAS, Paris, France
| | - Aline Floch
- Laboratoire D'Excellence GR-Ex, Paris, France.,Etablissement Français du Sang Ile de France, Créteil, France.,IMRB - INSERM U955 Team 2 « Transfusion et Maladies du Globule Rouge », Paris Est- Créteil Univ, Créteil, France.,UPEC, Université Paris Est-Créteil, Créteil, France
| | - Hubert Santuz
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France.,Laboratoire D'Excellence GR-Ex, Paris, France.,Institut National de la Transfusion Sanguine (INTS), Paris, France
| | - Akhila Melarkode Vattekatte
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France.,Laboratoire D'Excellence GR-Ex, Paris, France.,Institut National de la Transfusion Sanguine (INTS), Paris, France.,Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France
| | | | - Joseph Rebehmed
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France.,Laboratoire D'Excellence GR-Ex, Paris, France.,Institut National de la Transfusion Sanguine (INTS), Paris, France.,Department of Computer Science and Mathematics, Lebanese American University, Byblos, Lebanon
| | - Jean-Christophe Gelly
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France.,Laboratoire D'Excellence GR-Ex, Paris, France.,Institut National de la Transfusion Sanguine (INTS), Paris, France.,Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France.,IBL, Paris, France
| | - Catherine Etchebest
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France.,Laboratoire D'Excellence GR-Ex, Paris, France.,Institut National de la Transfusion Sanguine (INTS), Paris, France.,Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France
| | - Alexandre G de Brevern
- Biologie Intégrée du Globule Rouge UMR_S1134, Inserm, Univ. Paris, Univ. de la Réunion, Univ. des Antilles, Paris, France.,Laboratoire D'Excellence GR-Ex, Paris, France.,Institut National de la Transfusion Sanguine (INTS), Paris, France.,Faculté Des Sciences et Technologies, Saint Denis Messag, La Réunion, France.,IBL, Paris, France
| |
Collapse
|
4
|
Mattei E, Ausiello G, Ferrè F, Helmer-Citterich M. A novel approach to represent and compare RNA secondary structures. Nucleic Acids Res 2014; 42:6146-57. [PMID: 24753415 PMCID: PMC4041456 DOI: 10.1093/nar/gku283] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2013] [Revised: 03/25/2014] [Accepted: 03/26/2014] [Indexed: 12/18/2022] Open
Abstract
Structural information is crucial in ribonucleic acid (RNA) analysis and functional annotation; nevertheless, how to include such structural data is still a debated problem. Dot-bracket notation is the most common and simple representation for RNA secondary structures but its simplicity leads also to ambiguity requiring further processing steps to dissolve. Here we present BEAR (Brand nEw Alphabet for RNA), a new context-aware structural encoding represented by a string of characters. Each character in BEAR encodes for a specific secondary structure element (loop, stem, bulge and internal loop) with specific length. Furthermore, exploiting this informative and yet simple encoding in multiple alignments of related RNAs, we captured how much structural variation is tolerated in RNA families and convert it into transition rates among secondary structure elements. This allowed us to compute a substitution matrix for secondary structure elements called MBR (Matrix of BEAR-encoded RNA secondary structures), of which we tested the ability in aligning RNA secondary structures. We propose BEAR and the MBR as powerful resources for the RNA secondary structure analysis, comparison and classification, motif finding and phylogeny.
Collapse
Affiliation(s)
- Eugenio Mattei
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome 'Tor Vergata', Via della Ricerca Scientifica snc, 00133 Rome, Italy
| | - Gabriele Ausiello
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome 'Tor Vergata', Via della Ricerca Scientifica snc, 00133 Rome, Italy
| | - Fabrizio Ferrè
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome 'Tor Vergata', Via della Ricerca Scientifica snc, 00133 Rome, Italy
| | - Manuela Helmer-Citterich
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome 'Tor Vergata', Via della Ricerca Scientifica snc, 00133 Rome, Italy
| |
Collapse
|
5
|
Joseph AP, de Brevern AG. From local structure to a global framework: recognition of protein folds. J R Soc Interface 2014; 11:20131147. [PMID: 24740960 DOI: 10.1098/rsif.2013.1147] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/21/2022] Open
Abstract
Protein folding has been a major area of research for many years. Nonetheless, the mechanisms leading to the formation of an active biological fold are still not fully apprehended. The huge amount of available sequence and structural information provides hints to identify the putative fold for a given sequence. Indeed, protein structures prefer a limited number of local backbone conformations, some being characterized by preferences for certain amino acids. These preferences largely depend on the local structural environment. The prediction of local backbone conformations has become an important factor to correctly identifying the global protein fold. Here, we review the developments in the field of local structure prediction and especially their implication in protein fold recognition.
Collapse
Affiliation(s)
- Agnel Praveen Joseph
- Science and Technology Facilities Council, Rutherford Appleton Laboratory, Harwell Oxford, , Didcot OX11 0QX, UK
| | | |
Collapse
|
6
|
Joseph AP, Valadié H, Srinivasan N, de Brevern AG. Local structural differences in homologous proteins: specificities in different SCOP classes. PLoS One 2012; 7:e38805. [PMID: 22745680 PMCID: PMC3382195 DOI: 10.1371/journal.pone.0038805] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/26/2011] [Accepted: 05/10/2012] [Indexed: 11/19/2022] Open
Abstract
The constant increase in the number of solved protein structures is of great help in understanding the basic principles behind protein folding and evolution. 3-D structural knowledge is valuable in designing and developing methods for comparison, modelling and prediction of protein structures. These approaches for structure analysis can be directly implicated in studying protein function and for drug design. The backbone of a protein structure favours certain local conformations which include α-helices, β-strands and turns. Libraries of limited number of local conformations (Structural Alphabets) were developed in the past to obtain a useful categorization of backbone conformation. Protein Block (PB) is one such Structural Alphabet that gave a reasonable structure approximation of 0.42 Å. In this study, we use PB description of local structures to analyse conformations that are preferred sites for structural variations and insertions, among group of related folds. This knowledge can be utilized in improving tools for structure comparison that work by analysing local structure similarities. Conformational differences between homologous proteins are known to occur often in the regions comprising turns and loops. Interestingly, these differences are found to have specific preferences depending upon the structural classes of proteins. Such class-specific preferences are mainly seen in the all-β class with changes involving short helical conformations and hairpin turns. A test carried out on a benchmark dataset also indicates that the use of knowledge on the class specific variations can improve the performance of a PB based structure comparison approach. The preference for the indel sites also seem to be confined to a few backbone conformations involving β-turns and helix C-caps. These are mainly associated with short loops joining the regular secondary structures that mediate a reversal in the chain direction. Rare β-turns of type I’ and II’ are also identified as preferred sites for insertions.
Collapse
Affiliation(s)
- Agnel Praveen Joseph
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMR 665, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
| | - Hélène Valadié
- INSERM UMR-S 726, DSIMB, Université Paris Diderot - Paris 7, Paris, France
| | | | - Alexandre G. de Brevern
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Paris, France
- Univ Paris Diderot, Sorbonne Paris Cité, UMR 665, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
- * E-mail:
| |
Collapse
|
7
|
Joseph AP, Srinivasan N, de Brevern AG. Improvement of protein structure comparison using a structural alphabet. Biochimie 2011; 93:1434-45. [PMID: 21569819 DOI: 10.1016/j.biochi.2011.04.010] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2010] [Accepted: 04/12/2011] [Indexed: 12/29/2022]
Abstract
The three dimensional structure of a protein provides major insights into its function. Protein structure comparison has implications in functional and evolutionary studies. A structural alphabet (SA) is a library of local protein structure prototypes that can abstract every part of protein main chain conformation. Protein Blocks (PBs) is a widely used SA, composed of 16 prototypes, each representing a pentapeptide backbone conformation defined in terms of dihedral angles. Through this description, the 3D structural information can be translated into a 1D sequence of PBs. In a previous study, we have used this approach to compare protein structures encoded in terms of PBs. A classical sequence alignment procedure based on dynamic programming was used, with a dedicated PB Substitution Matrix (SM). PB-based pairwise structural alignment method gave an excellent performance, when compared to other established methods for mining. In this study, we have (i) refined the SMs and (ii) improved the Protein Block Alignment methodology (named as iPBA). The SM was normalized in regards to sequence and structural similarity. Alignment of protein structures often involves similar structural regions separated by dissimilar stretches. A dynamic programming algorithm that weighs these local similar stretches has been designed. Amino acid substitutions scores were also coupled linearly with the PB substitutions. iPBA improves (i) the mining efficiency rate by 6.8% and (ii) more than 82% of the alignments have a better quality. A higher efficiency in aligning multi-domain proteins could be also demonstrated. The quality of alignment is better than DALI and MUSTANG in 81.3% of the cases. Thus our study has resulted in an impressive improvement in the quality of protein structural alignment.
Collapse
Affiliation(s)
- Agnel Praveen Joseph
- INSERM UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques, 6, rue Alexandre Cabanel, 75739 Paris Cedex 15, France.
| | | | | |
Collapse
|
8
|
Agarwal G, Mahajan S, Srinivasan N, de Brevern AG. Identification of local conformational similarity in structurally variable regions of homologous proteins using protein blocks. PLoS One 2011; 6:e17826. [PMID: 21445259 PMCID: PMC3060819 DOI: 10.1371/journal.pone.0017826] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2010] [Accepted: 02/15/2011] [Indexed: 11/18/2022] Open
Abstract
Structure comparison tools can be used to align related protein structures to identify structurally conserved and variable regions and to infer functional and evolutionary relationships. While the conserved regions often superimpose well, the variable regions appear non superimposable. Differences in homologous protein structures are thought to be due to evolutionary plasticity to accommodate diverged sequences during evolution. One of the kinds of differences between 3-D structures of homologous proteins is rigid body displacement. A glaring example is not well superimposed equivalent regions of homologous proteins corresponding to α-helical conformation with different spatial orientations. In a rigid body superimposition, these regions would appear variable although they may contain local similarity. Also, due to high spatial deviation in the variable region, one-to-one correspondence at the residue level cannot be determined accurately. Another kind of difference is conformational variability and the most common example is topologically equivalent loops of two homologues but with different conformations. In the current study, we present a refined view of the “structurally variable” regions which may contain local similarity obscured in global alignment of homologous protein structures. As structural alphabet is able to describe local structures of proteins precisely through Protein Blocks approach, conformational similarity has been identified in a substantial number of ‘variable’ regions in a large data set of protein structural alignments; optimal residue-residue equivalences could be achieved on the basis of Protein Blocks which led to improved local alignments. Also, through an example, we have demonstrated how the additional information on local backbone structures through protein blocks can aid in comparative modeling of a loop region. In addition, understanding on sequence-structure relationships can be enhanced through our approach. This has been illustrated through examples where the equivalent regions in homologous protein structures share sequence similarity to varied extent but do not preserve local structure.
Collapse
Affiliation(s)
- Garima Agarwal
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
| | - Swapnil Mahajan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- National Centre for Biological Sciences, Tata Institute of Fundamental Research, UAS-GKVK Campus, Bangalore, India
| | | | - Alexandre G. de Brevern
- Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), INSERM, U665, Paris, France
- Université Paris Diderot - Paris 7, UMR-S665, Paris, France
- Institut National de la Transfusion Sanguine (INTS), Paris, France
| |
Collapse
|
9
|
Joseph AP, Agarwal G, Mahajan S, Gelly JC, Swapna LS, Offmann B, Cadet F, Bornot A, Tyagi M, Valadié H, Schneider B, Etchebest C, Srinivasan N, De Brevern AG. A short survey on protein blocks. Biophys Rev 2010; 2:137-147. [PMID: 21731588 DOI: 10.1007/s12551-010-0036-1] [Citation(s) in RCA: 97] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Protein structures are classically described in terms of secondary structures. Even if the regular secondary structures have relevant physical meaning, their recognition from atomic coordinates has some important limitations such as uncertainties in the assignment of boundaries of helical and β-strand regions. Further, on an average about 50% of all residues are assigned to an irregular state, i.e., the coil. Thus different research teams have focused on abstracting conformation of protein backbone in the localized short stretches. Using different geometric measures, local stretches in protein structures are clustered in a chosen number of states. A prototype representative of the local structures in each cluster is generally defined. These libraries of local structures prototypes are named as "structural alphabets". We have developed a structural alphabet, named Protein Blocks, not only to approximate the protein structure, but also to predict them from sequence. Since its development, we and other teams have explored numerous new research fields using this structural alphabet. We review here some of the most interesting applications.
Collapse
Affiliation(s)
- Agnel Praveen Joseph
- DSIMB, Dynamique des Structures et Interactions des Macromolécules Biologiques Université Paris-Diderot - Paris VII INTS INSERM : U665 INTS, 6 rue Alexandre Cabanel, 75739 Paris Cedex 15 FRANCE,FR
| | | | | | | | | | | | | | | | | | | | | | | | | | | |
Collapse
|
10
|
Smolarek D, Bertrand O, Czerwinski M, Colin Y, Etchebest C, de Brevern AG. Multiple interests in structural models of DARC transmembrane protein. Transfus Clin Biol 2010; 17:184-96. [PMID: 20655787 DOI: 10.1016/j.tracli.2010.05.003] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2010] [Accepted: 05/21/2010] [Indexed: 12/23/2022]
Abstract
Duffy Antigen Receptor for Chemokines (DARC) is an unusual transmembrane chemokine receptor which (i) binds the two main chemokine families and (ii) does not transduct any signal as it lacks the DRY consensus sequence. It is considered as silent chemokine receptor, a tank useful for chemiotactism. DARC had been particularly studied as a major actor of malaria infection by Plasmodium vivax. It is also implicated in multiple chemokine inflammation, inflammatory diseases, in cancer and might play a role in HIV infection and AIDS. In this review, we focus on the interest to build structural model of DARC to understand more precisely its abilities to bind its physiological ligand CXCL8 and its malaria ligand. We also present innovative development on VHHs able to bind DARC protein. We underline difficulties and limitations of such bioinformatics approaches and highlight the crucial importance of biological data to conduct these kinds of researches.
Collapse
Affiliation(s)
- D Smolarek
- Inserm UMR-S 665, dynamique des structures et interactions des macromolecules biologiques (DSIMB), 6, rue Alexandre-Cabanel, 75739 Paris cedex 15, France
| | | | | | | | | | | |
Collapse
|
11
|
Influence of assignment on the prediction of transmembrane helices in protein structures. Amino Acids 2010; 39:1241-54. [DOI: 10.1007/s00726-010-0559-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2009] [Accepted: 03/08/2010] [Indexed: 02/01/2023]
|
12
|
Tyagi M, Bornot A, Offmann B, de Brevern AG. Analysis of loop boundaries using different local structure assignment methods. Protein Sci 2009; 18:1869-81. [PMID: 19606500 DOI: 10.1002/pro.198] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]
Abstract
Loops connect regular secondary structures. In many instances, they are known to play important biological roles. Analysis and prediction of loop conformations depend directly on the definition of repetitive structures. Nonetheless, the secondary structure assignment methods (SSAMs) often lead to divergent assignments. In this study, we analyzed, both structure and sequence point of views, how the divergence between different SSAMs affect boundary definitions of loops connecting regular secondary structures. The analysis of SSAMs underlines that no clear consensus between the different SSAMs can be easily found. Because these latter greatly influence the loop boundary definitions, important variations are indeed observed, that is, capping positions are shifted between different SSAMs. On the other hand, our results show that the sequence information in these capping regions are more stable than expected, and, classical and equivalent sequence patterns were found for most of the SSAMs. This is, to our knowledge, the most exhaustive survey in this field as (i) various databank have been used leading to similar results without implication of protein redundancy and (ii) the first time various SSAMs have been used. This work hence gives new insights into the difficult question of assignment of repetitive structures and addresses the issue of loop boundaries definition. Although SSAMs give very different local structure assignments capping sequence patterns remain efficiently stable.
Collapse
Affiliation(s)
- Manoj Tyagi
- Laboratoire de Biochimie et Génétique Moléculaire, Université de La Réunion, BP 7151, 15 avenue René Cassin, 97715 Saint Denis Messag Cedex 09, La Réunion, France
| | | | | | | |
Collapse
|
13
|
Bornot A, Etchebest C, de Brevern AG. A new prediction strategy for long local protein structures using an original description. Proteins 2009; 76:570-87. [PMID: 19241475 DOI: 10.1002/prot.22370] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
A relevant and accurate description of three-dimensional (3D) protein structures can be achieved by characterizing recurrent local structures. In a previous study, we developed a library of 120 3D structural prototypes encompassing all known 11-residues long local protein structures and ensuring a good quality of structural approximation. A local structure prediction method was also proposed. Here, overlapping properties of local protein structures in global ones are taken into account to characterize frequent local networks. At the same time, we propose a new long local structure prediction strategy which involves the use of evolutionary information coupled with Support Vector Machines (SVMs). Our prediction is evaluated by a stringent geometrical assessment. Every local structure prediction with a Calpha RMSD less than 2.5 A from the true local structure is considered as correct. A global prediction rate of 63.1% is then reached, corresponding to an improvement of 7.7 points compared with the previous strategy. In the same way, the prediction of 88.33% of the 120 structural classes is improved with 8.65% mean gain. 85.33% of proteins have better prediction results with a 9.43% average gain. An analysis of prediction rate per local network also supports the global improvement and gives insights into the potential of our method for predicting super local structures. Moreover, a confidence index for the direct estimation of prediction quality is proposed. Finally, our method is proved to be very competitive with cutting-edge strategies encompassing three categories of local structure predictions.
Collapse
Affiliation(s)
- Aurélie Bornot
- INSERM UMR-S, Université Paris Diderot, Institut National de la Transfusion Sanguine, France.
| | | | | |
Collapse
|
14
|
Deschavanne P, Tufféry P. Enhanced protein fold recognition using a structural alphabet. Proteins 2009; 76:129-37. [DOI: 10.1002/prot.22324] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
15
|
Tyagi M, Bornot A, Offmann B, de Brevern AG. Protein short loop prediction in terms of a structural alphabet. Comput Biol Chem 2009; 33:329-33. [PMID: 19625218 DOI: 10.1016/j.compbiolchem.2009.06.002] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/12/2008] [Revised: 06/17/2009] [Accepted: 06/17/2009] [Indexed: 11/20/2022]
Abstract
Loops connect regular secondary structures. In many instances, they are known to play crucial biological roles. To bypass the limitation of secondary structure description, we previously defined a structural alphabet composed of 16 structural prototypes, called Protein Blocks (PBs). It leads to an accurate description of every region of 3D protein backbones and has been used in local structure prediction. In the present study, we used our structural alphabet to predict the loops connecting two repetitive structures. Thus, we showed interest to take into account the flanking regions, leading to prediction rate improvement up to 19.8%, but we also underline the sensitivity of such an approach. This research can be used to propose different structures for the loops and to probe and sample their flexibility. It is a useful tool for ab initio loop prediction and leads to insights into flexible docking approach.
Collapse
Affiliation(s)
- Manoj Tyagi
- Laboratoire de Biochimie et Génétique Moléculaire, Université de La Réunion, BP 7151, 15 avenue René Cassin, 97715 Saint Denis Messag Cedex 09, La Réunion, France
| | | | | | | |
Collapse
|
16
|
Faure G, Bornot A, de Brevern AG. Analysis of protein contacts into Protein Units. Biochimie 2009; 91:876-87. [PMID: 19383526 DOI: 10.1016/j.biochi.2009.04.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/24/2008] [Accepted: 04/13/2009] [Indexed: 11/18/2022]
Abstract
Three-dimensional structures of proteins are the support of their biological functions. Their folds are maintained by inter-residue interactions which are one of the main focuses to understand the mechanisms of protein folding and stability. Furthermore, protein structures can be composed of single or multiple functional domains that can fold and function independently. Hence, dividing a protein into domains is useful for obtaining an accurate structure and function determination. In previous studies, we enlightened protein contact properties according to different definitions and developed a novel methodology named Protein Peeling. Within protein structures, Protein Peeling characterizes small successive compact units along the sequence called protein units (PUs). The cutting done by Protein Peeling maximizes the number of contacts within the PUs and minimizes the number of contacts between them. This method is so a relevant tool in the context of the protein folding research and particularly regarding the hierarchical model proposed by George Rose. Here, we accurately analyze the PUs at different levels of cutting, using a non-redundant protein databank. Distribution of PU sizes, number of PUs or their accessibility are screened to determine their common and different features. Moreover, we highlight the preferential amino acid interactions inside and between PUs. Our results show that PUs are clearly an intermediate level between secondary structures and protein structural domains.
Collapse
Affiliation(s)
- Guilhem Faure
- INSERM UMR-S 726, Equipe de Bioinformatique Génomique et Moléculaire (EBGM), DSIMB, Université Paris Diderot - Paris 7, case 7113, 2 place Jussieu, 75251 Paris, France
| | | | | |
Collapse
|
17
|
Benros C, de Brevern AG, Hazout S. Analyzing the sequence–structure relationship of a library of local structural prototypes. J Theor Biol 2009; 256:215-26. [DOI: 10.1016/j.jtbi.2008.08.032] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2008] [Revised: 08/23/2008] [Accepted: 08/31/2008] [Indexed: 10/21/2022]
|
18
|
Faure G, Bornot A, de Brevern AG. Protein contacts, inter-residue interactions and side-chain modelling. Biochimie 2008; 90:626-39. [DOI: 10.1016/j.biochi.2007.11.007] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/14/2007] [Accepted: 11/22/2007] [Indexed: 10/22/2022]
|
19
|
Lo WC, Huang PJ, Chang CH, Lyu PC. Protein structural similarity search by Ramachandran codes. BMC Bioinformatics 2007; 8:307. [PMID: 17716377 PMCID: PMC2194796 DOI: 10.1186/1471-2105-8-307] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2007] [Accepted: 08/23/2007] [Indexed: 11/13/2022] Open
Abstract
Background Protein structural data has increased exponentially, such that fast and accurate tools are necessary to access structure similarity search. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to Combinatorial Extension (CE) and works over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented into a web service and a stand-alone Java program that is able to run on many different platforms. Conclusion As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools are supposed applicable to automated and high-throughput functional annotations or predictions for the ever increasing number of published protein structures in this post-genomic era.
Collapse
Affiliation(s)
- Wei-Cheng Lo
- Institute of Bioinformatics and Structural Biology, National Tsing Hua University, 101, Section 2 Kuang Fu Road, Hsinchu 30013, Taiwan
| | - Po-Jung Huang
- Institute of Bioinformatics and Structural Biology, National Tsing Hua University, 101, Section 2 Kuang Fu Road, Hsinchu 30013, Taiwan
| | - Chih-Hung Chang
- Institute of Bioinformatics and Structural Biology, National Tsing Hua University, 101, Section 2 Kuang Fu Road, Hsinchu 30013, Taiwan
| | - Ping-Chiang Lyu
- Institute of Bioinformatics and Structural Biology, National Tsing Hua University, 101, Section 2 Kuang Fu Road, Hsinchu 30013, Taiwan
| |
Collapse
|
20
|
Etchebest C, Benros C, Bornot A, Camproux AC, de Brevern AG. A reduced amino acid alphabet for understanding and designing protein adaptation to mutation. EUROPEAN BIOPHYSICS JOURNAL: EBJ 2007; 36:1059-69. [PMID: 17565494 DOI: 10.1007/s00249-007-0188-5] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/13/2007] [Revised: 05/05/2007] [Accepted: 05/07/2007] [Indexed: 10/23/2022]
Abstract
Protein sequence world is considerably larger than structure world. In consequence, numerous non-related sequences may adopt similar 3D folds and different kinds of amino acids may thus be found in similar 3D structures. By grouping together the 20 amino acids into a smaller number of representative residues with similar features, sequence world simplification may be achieved. This clustering hence defines a reduced amino acid alphabet (reduced AAA). Numerous works have shown that protein 3D structures are composed of a limited number of building blocks, defining a structural alphabet. We previously identified such an alphabet composed of 16 representative structural motifs (5-residues length) called Protein Blocks (PBs). This alphabet permits to translate the structure (3D) in sequence of PBs (1D). Based on these two concepts, reduced AAA and PBs, we analyzed the distributions of the different kinds of amino acids and their equivalences in the structural context. Different reduced sets were considered. Recurrent amino acid associations were found in all the local structures while other were specific of some local structures (PBs) (e.g Cysteine, Histidine, Threonine and Serine for the alpha-helix Ncap). Some similar associations are found in other reduced AAAs, e.g Ile with Val, or hydrophobic aromatic residues Trp with Phe and Tyr. We put into evidence interesting alternative associations. This highlights the dependence on the information considered (sequence or structure). This approach, equivalent to a substitution matrix, could be useful for designing protein sequence with different features (for instance adaptation to environment) while preserving mainly the 3D fold.
Collapse
Affiliation(s)
- C Etchebest
- Equipe de Bioinformatique Génomique et Moléculaire (EBGM), INSERM UMR-S 726, Université Denis DIDEROT, Paris 7, case 7113, 2, place Jussieu, 75251, Paris, France
| | | | | | | | | |
Collapse
|