1
Abbass J, Parisi C. Machine learning-based prediction of proteins' architecture using sequences of amino acids and structural alphabets. J Biomol Struct Dyn 2024:1-16. [PMID: 38505995 DOI: 10.1080/07391102.2024.2328736] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2023] [Accepted: 03/05/2024] [Indexed: 03/21/2024]
Abstract
In addition to the growth of protein structures generated through wet laboratory experiments and deposited in the PDB repository, AlphaFold predictions have significantly contributed to the creation of a much larger database of protein structures. Annotating such a vast number of structures has become an increasingly challenging task. CATH is widely recognized as one of the most common platforms for addressing this challenge, as it classifies proteins based on their structural and evolutionary relationships, offering the scientific community an invaluable resource for uncovering various properties, including functional annotations. While CATH annotation involves, to some extent, human intervention, keeping up with the classification of the rapidly expanding repositories of protein structures has become exceedingly difficult. Therefore, there is a pressing need for a fully automated approach. On the other hand, the abundance of protein sequences stemming from next generation sequencing technologies, lacking structural annotations, presents an additional challenge to the scientific community. Consequently, 'pre-annotating' protein sequences with structural features, ensuring a high level of precision, could prove highly advantageous. In this paper, after a thorough investigation, we introduce a novel machine-learning model capable of classifying any protein domain, whether it has a known structure or not, into one of the 40 main CATH Architectures. We achieve an F1 Score of 0.92 using only the amino acid sequence and a score of 0.94 using both the sequence of amino acids and the sequence of structural alphabets. Communicated by Ramaswamy H. Sarma.
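For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how an F1 score can be computed for a multi-class problem such as architecture assignment. The abstract does not say which averaging the authors used; macro averaging is assumed here purely for illustration, and the labels are hypothetical CATH-style codes rather than data from the paper.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # label p was predicted but is wrong
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores

def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)

# Hypothetical architecture labels, for illustration only.
y_true = ["3.40", "3.40", "1.10", "2.60", "2.60", "1.10"]
y_pred = ["3.40", "1.10", "1.10", "2.60", "2.60", "1.10"]
print(round(macro_f1(y_true, y_pred), 3))
```

With 40 classes of very different sizes, macro and weighted averages can differ noticeably, so the averaging scheme matters when comparing reported scores.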
Affiliation(s)
- Jad Abbass
  - School of Computer Science and Mathematics, Kingston University, London, UK
- Charles Parisi
  - School of Computer Science and Mathematics, Kingston University, London, UK
  - Telecom Physique Strasbourg, Strasbourg University, Strasbourg, France
2
Biological Characterization of Natural Peptide BcI-1003 from Boana cordobae (anura): Role in Alzheimer’s Disease and Microbial Infections. Int J Pept Res Ther 2022. [DOI: 10.1007/s10989-022-10472-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
3
Konagurthu AS, Subramanian R, Allison L, Abramson D, Stuckey PJ, Garcia de la Banda M, Lesk AM. Universal Architectural Concepts Underlying Protein Folding Patterns. Front Mol Biosci 2021; 7:612920. [PMID: 33996891 PMCID: PMC8120156 DOI: 10.3389/fmolb.2020.612920] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/01/2020] [Accepted: 12/16/2020] [Indexed: 11/17/2022]
Abstract
What is the architectural “basis set” of the observed universe of protein structures? Using information-theoretic inference, we answer this question with a dictionary of 1,493 substructures—called concepts—typically at a subdomain level, based on an unbiased subset of known protein structures. Each concept represents a topologically conserved assembly of helices and strands that make contact. Any protein structure can be dissected into instances of concepts from this dictionary. We dissected the Protein Data Bank and completely inventoried all the concept instances. This yields many insights, including correlations between concepts and catalytic activities or binding sites, useful for rational drug design; local amino-acid sequence–structure correlations, useful for ab initio structure prediction methods; and information supporting the recognition and exploration of evolutionary relationships, useful for structural studies. An interactive site, Proçodic, at http://lcb.infotech.monash.edu.au/prosodic (click), provides access to and navigation of the entire dictionary of concepts and their usages, and all associated information. This report is part of a continuing programme with the goal of elucidating fundamental principles of protein architecture, in the spirit of the work of Cyrus Chothia.
Affiliation(s)
- Arun S Konagurthu
  - Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC, Australia
- Ramanan Subramanian
  - Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC, Australia
- Lloyd Allison
  - Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC, Australia
- David Abramson
  - Research Computing Center, University of Queensland, Brisbane, QLD, Australia
- Peter J Stuckey
  - Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC, Australia; School of Computing and Information Systems, University of Melbourne, Melbourne, VIC, Australia
- Maria Garcia de la Banda
  - Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC, Australia
- Arthur M Lesk
  - Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA, United States; MRC Laboratory of Molecular Biology, Cambridge, United Kingdom
4
Siano A, Humpola MV, de Oliveira E, Albericio F, Simonetta AC, Lajmanovich R, Tonarelli GG. Leptodactylus latrans Amphibian Skin Secretions as a Novel Source for the Isolation of Antibacterial Peptides. Molecules 2018; 23:molecules23112943. [PMID: 30423858 PMCID: PMC6278411 DOI: 10.3390/molecules23112943] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/27/2018] [Revised: 11/06/2018] [Accepted: 11/09/2018] [Indexed: 12/19/2022]
Abstract
Amphibians' skin produces a diverse array of antimicrobial peptides that play a crucial role as the first line of defense against microbial invasion. Despite the immense richness of wild amphibians in Argentina, current knowledge about the presence of peptides with antimicrobial properties is limited to only a few species. Here we used LC-MS-MS to identify antimicrobial peptides with masses ranging from 1000 to 4000 Da from samples of skin secretions of Leptodactylus latrans (Anura: Leptodactylidae). Three novel amino acid sequences were selected for chemical synthesis and further studies. The three synthetic peptides, named P1-Ll-1577, P2-Ll-1298, and P3-Ll-2085, inhibited the growth of two ATCC strains, namely Escherichia coli and Staphylococcus aureus. P3-Ll-2085 was the most active peptide. In the presence of trifluoroethanol (TFE) and anionic liposomes, it adopted an amphipathic α-helical structure. P2-Ll-1298 showed slightly lower activity than P3-Ll-2085. Comparison of the MIC values of these two peptides revealed that the addition of seven amino acid residues (GLLDFLK) on the N-terminal of P2-Ll-1298 significantly improved activity against both strains. P1-Ll-1577, which remarkably is an anionic peptide, showed interesting antimicrobial activity against the E. coli and S. aureus strains, with marked membrane selectivity and no hemolysis. For this reason, P1-Ll-1577 emerges as a potential candidate for the development of new antibacterial drugs.
Affiliation(s)
- Alvaro Siano
  - Departamento de Química Orgánica, Facultad de Bioquímica y Cs. Biológicas (FBCB), Universidad Nacional del Litoral (UNL), Ciudad Universitaria, 3000 Santa Fe, Argentina.
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), 1825 Buenos Aires, Argentina.
- Maria Veronica Humpola
  - Departamento de Química Orgánica, Facultad de Bioquímica y Cs. Biológicas (FBCB), Universidad Nacional del Litoral (UNL), Ciudad Universitaria, 3000 Santa Fe, Argentina.
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), 1825 Buenos Aires, Argentina.
- Eliandre de Oliveira
  - Proteomics Platform, Barcelona Science Park, Baldiri Reixac 10, 08028 Barcelona, Spain.
- Fernando Albericio
  - CIBER-BBN, Networking Centre on Bioengineering, Biomaterials and Nanomedicine, Barcelona Science Park, Baldiri Reixac 10, 08028 Barcelona, Spain.
  - Department of Organic Chemistry, University of Barcelona, 08028 Barcelona, Spain.
  - School of Chemistry and Physics, University of KwaZulu-Natal, 4000 Durban, South Africa.
- Arturo C Simonetta
  - Cátedras de Microbiología y Biotecnología, Departamento de Ingeniería en Alimentos, Facultad de Ingeniería Química, U.N.L. Santiago del Estero 2829, 3000 Santa Fe, Argentina.
- Rafael Lajmanovich
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), 1825 Buenos Aires, Argentina.
  - Cátedra de Ecotoxicología, Escuela Superior de Sanidad. FBCB, U.N.L. Ciudad Universitaria, 3000 Santa Fe, Argentina.
- Georgina G Tonarelli
  - Departamento de Química Orgánica, Facultad de Bioquímica y Cs. Biológicas (FBCB), Universidad Nacional del Litoral (UNL), Ciudad Universitaria, 3000 Santa Fe, Argentina.
5
SAFlex: A structural alphabet extension to integrate protein structural flexibility and missing data information. PLoS One 2018; 13:e0198854. [PMID: 29975698 PMCID: PMC6033379 DOI: 10.1371/journal.pone.0198854] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2017] [Accepted: 05/25/2018] [Indexed: 11/19/2022]
Abstract
In this paper, we describe SAFlex (Structural Alphabet Flexibility), an extension of an existing structural alphabet (HMM-SA), to better explore increasing protein three-dimensional structure information by encoding conformations of proteins in case of missing residues or uncertainties. An SA aims to reduce three-dimensional conformations of proteins as well as their analysis and comparison complexity by simplifying any conformation in a series of structural letters. Our methodology presents several novelties. Firstly, it can account for the encoding uncertainty by providing a wide range of encoding options: the maximum a posteriori, the marginal posterior distribution, and the effective number of letters at each given position. Secondly, our new algorithm deals with the missing data in the protein structure files (concerning more than 75% of the proteins from the Protein Data Bank) in a rigorous probabilistic framework. Thirdly, SAFlex is able to encode and to build a consensus encoding from different replicates of a single protein such as several homomer chains. This allows localizing structural differences between different chains and detecting structural variability, which is essential for protein flexibility identification. These improvements are illustrated on different proteins, such as the crystal structure of a eukaryotic small heat shock protein. They are promising to explore increasing protein redundancy data and obtain useful quantification of their flexibility.
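Two of the encoding options named in this abstract can be illustrated at a single position. The sketch below takes a marginal posterior distribution over structural letters, returns the most probable letter, and computes an effective number of letters as the exponential of the Shannon entropy. The abstract does not give the formula, so the exponential-of-entropy definition is an assumption, and the four-letter alphabet and probabilities are hypothetical.

```python
import math

def map_letter(posterior):
    """Most probable structural letter in one position's marginal posterior."""
    return max(posterior, key=posterior.get)

def effective_letters(posterior):
    """exp(Shannon entropy): 1 if one letter is certain, K if K letters tie.

    Assumed definition; the abstract names the quantity without a formula.
    """
    entropy = -sum(p * math.log(p) for p in posterior.values() if p > 0)
    return math.exp(entropy)

# Hypothetical marginal posteriors over a toy 4-letter alphabet.
confident = {"A": 1.0, "B": 0.0, "C": 0.0, "D": 0.0}
skewed = {"A": 0.1, "B": 0.7, "C": 0.1, "D": 0.1}
ambiguous = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}

for name, post in [("confident", confident), ("skewed", skewed), ("ambiguous", ambiguous)]:
    print(name, map_letter(post), round(effective_letters(post), 2))
```

Under this definition, a position with missing coordinates would tend toward a flat posterior and a high effective number of letters, which is one way encoding uncertainty could be reported alongside the single best letter.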
6
Regad L, Chéron JB, Triki D, Senac C, Flatters D, Camproux AC. Exploring the potential of a structural alphabet-based tool for mining multiple target conformations and target flexibility insight. PLoS One 2017; 12:e0182972. [PMID: 28817602 PMCID: PMC5560695 DOI: 10.1371/journal.pone.0182972] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 04/03/2017] [Accepted: 07/27/2017] [Indexed: 11/18/2022]
Abstract
Protein flexibility is often implied in binding with different partners and is essential for protein function. The growing number of macromolecular structures in the Protein Data Bank entries and their redundancy has become a major source of structural knowledge of the protein universe. The analysis of structural variability through available redundant structures of a target, called multiple target conformations (MTC), obtained using experimental or modeling methods and under different biological conditions or different sources is one way to explore protein flexibility. This analysis is essential to improve the understanding of various mechanisms associated with protein target function and flexibility. In this study, we explored structural variability of three biological targets by analyzing different MTC sets associated with these targets. To facilitate the study of these MTC sets, we have developed an efficient tool, SA-conf, dedicated to capturing and linking the amino acid and local structure variability and analyzing the target structural variability space. The advantage of SA-conf is that it could be applied to diverse sets composed of MTCs available in the PDB obtained using NMR and crystallography or homology models. This tool could also be applied to analyze MTC sets obtained by dynamics approaches. Our results showed that the SA-conf tool is effective at quantifying the structural variability of an MTC set and at localizing the structurally variable positions and regions of the target. By selecting adapted MTC subsets and comparing their variability detected by SA-conf, we highlighted different sources of target flexibility, such as flexibility induced by a binding partner or by mutation, and intrinsic flexibility. Our results support the interest of mining available structures associated with a target to offer valuable insight into target flexibility and interaction mechanisms.
The SA-conf executable script, with a set of pre-compiled binaries, is available at http://www.mti.univ-paris-diderot.fr/recherche/plateformes/logiciels.
Affiliation(s)
- Leslie Regad
  - Molécules thérapeutiques in silico (MTi), INSERM UMR-S973, Paris, France
  - Université Paris Diderot, Sorbonne Paris Cité, Paris, France
  - * E-mail: anne-claude.camproux@univ-paris-diderot (ACC); (LR)
- Jean-Baptiste Chéron
  - Molécules thérapeutiques in silico (MTi), INSERM UMR-S973, Paris, France
  - Université Paris Diderot, Sorbonne Paris Cité, Paris, France
  - Institut de Chimie de Nice, UMR-CNRS 7272, Faculté des Sciences, Université de Nice-Sophia Antipolis, Nice, France
- Dhoha Triki
  - Molécules thérapeutiques in silico (MTi), INSERM UMR-S973, Paris, France
  - Université Paris Diderot, Sorbonne Paris Cité, Paris, France
- Caroline Senac
  - Molécules thérapeutiques in silico (MTi), INSERM UMR-S973, Paris, France
  - Université Paris Diderot, Sorbonne Paris Cité, Paris, France
  - Sorbonne Universités, UPMC Univ Paris 06, CNRS, INSERM, Laboratoire d'Imagerie Biomédicale (LIB), Paris, France
- Delphine Flatters
  - Molécules thérapeutiques in silico (MTi), INSERM UMR-S973, Paris, France
  - Université Paris Diderot, Sorbonne Paris Cité, Paris, France
- Anne-Claude Camproux
  - Molécules thérapeutiques in silico (MTi), INSERM UMR-S973, Paris, France
  - Université Paris Diderot, Sorbonne Paris Cité, Paris, France
  - * E-mail: anne-claude.camproux@univ-paris-diderot (ACC); (LR)
7
Characterization and Prediction of Protein Flexibility Based on Structural Alphabets. BIOMED RESEARCH INTERNATIONAL 2016; 2016:4628025. [PMID: 27660756 PMCID: PMC5021887 DOI: 10.1155/2016/4628025] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 05/13/2016] [Accepted: 08/02/2016] [Indexed: 11/25/2022]
Abstract
Motivation. To assist efforts in determining and exploring the functional properties of proteins, it is desirable to characterize and predict protein flexibilities. Results. In this study, the conformational entropy is used as an indicator of the protein flexibility. We first explore whether the conformational change can capture the protein flexibility. The well-defined decoy structures are converted into one-dimensional series of letters from a structural alphabet. Four different structure alphabets, including the secondary structure in 3-class and 8-class, the PB structure alphabet (16-letter), and the DW structure alphabet (28-letter), are investigated. The conformational entropy is then calculated from the structure alphabet letters. Some of the proteins show high correlation between the conformational entropy and the protein flexibility. We then predict the protein flexibility from the basic amino acid sequence. The local structures are predicted by the dual-layer model and the conformational entropy of the predicted class distribution is then calculated. The results show that the conformational entropy is a good indicator of the protein flexibility, but false positives remain a problem. The DW structure alphabet performs the best, which means that more subtle local structures can be captured by a large number of structure alphabet letters. Overall this study provides a simple and efficient method for the characterization and prediction of the protein flexibility.
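The idea of deriving an entropy from structure alphabet letters can be sketched as follows. A set of decoys is assumed to be already encoded as equal-length strings, and the Shannon entropy of the observed letters is computed at each position. The abstract does not specify the exact estimator, so this per-position entropy in bits is an assumption, and the 3-class strings are hypothetical.

```python
import math
from collections import Counter

def positional_entropy(encoded):
    """Shannon entropy (bits) of the letter distribution at each position."""
    if len({len(s) for s in encoded}) != 1:
        raise ValueError("all encoded strings must have the same length")
    entropies = []
    for column in zip(*encoded):  # one column per residue position
        n = len(column)
        counts = Counter(column)
        entropies.append(-sum(c / n * math.log2(c / n) for c in counts.values()))
    return entropies

# Hypothetical 3-class secondary-structure encodings (H, E, C) of four decoys.
decoys = ["HHHC", "HHEC", "HHCC", "HHEC"]
print(positional_entropy(decoys))
```

Positions where every decoy carries the same letter get zero entropy, while positions that switch letters across decoys get higher values. A larger alphabet can split one coarse class into several letters, which is consistent with the abstract's remark that more letters capture subtler local structure.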
8
Lamiable A, Thevenet P, Tufféry P. A critical assessment of hidden Markov model sub-optimal sampling strategies applied to the generation of peptide 3D models. J Comput Chem 2016; 37:2006-16. [PMID: 27317417 DOI: 10.1002/jcc.24422] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 02/16/2016] [Revised: 05/03/2016] [Accepted: 05/17/2016] [Indexed: 12/23/2022]
Abstract
Hidden Markov Model derived structural alphabets are a probabilistic framework in which the complete conformational space of a peptidic chain is described in terms of probability distributions that can be sampled to identify conformations of largest probabilities. Here, we assess how three strategies to sample sub-optimal conformations (Viterbi k-best, forward backtrack and a taboo sampling approach) can lead to the efficient generation of peptide conformations. We show that the diversity of sampling is essential to compensate for biases introduced in the estimates of the probabilities, and we find that only the forward backtrack and taboo sampling strategies can efficiently generate native or near-native models. Finally, we also find such approaches are as efficient as former protocols, while being one order of magnitude faster, opening the door to the large scale de novo modeling of peptides and mini-proteins. © 2016 Wiley Periodicals, Inc.
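As a rough illustration of the forward backtrack strategy named above, the sketch below runs the forward algorithm on a toy two-state HMM and then samples a state path backwards, drawing each state in proportion to its forward probability times the transition into the state already chosen. The states, observations and probabilities are hypothetical and far smaller than any real structural alphabet, and this is the textbook procedure rather than the authors' implementation.

```python
import random

def forward(obs, start, trans, emit):
    """Forward table: alpha[t][s] = P(obs[0..t], state at t is s)."""
    states = list(start)
    alpha = [{s: start[s] * emit[s][obs[0]] for s in states}]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({
            s: emit[s][o] * sum(prev[r] * trans[r][s] for r in states)
            for s in states
        })
    return alpha

def sample_path(obs, start, trans, emit, rng):
    """Draw one state path from P(path | obs) by stochastic backtracking."""
    states = list(start)
    alpha = forward(obs, start, trans, emit)
    # Final state is drawn in proportion to the last forward column.
    path = [rng.choices(states, weights=[alpha[-1][s] for s in states])[0]]
    for t in range(len(obs) - 2, -1, -1):
        nxt = path[-1]
        weights = [alpha[t][s] * trans[s][nxt] for s in states]
        path.append(rng.choices(states, weights=weights)[0])
    path.reverse()
    return "".join(path)

# Toy model: hidden letters "a"/"b", observed residues "G"/"L" (all hypothetical).
start = {"a": 0.5, "b": 0.5}
trans = {"a": {"a": 0.8, "b": 0.2}, "b": {"a": 0.3, "b": 0.7}}
emit = {"a": {"G": 0.7, "L": 0.3}, "b": {"G": 0.2, "L": 0.8}}

rng = random.Random(0)
samples = [sample_path("GGLL", start, trans, emit, rng) for _ in range(5)]
print(samples)
```

Unlike Viterbi k-best, which enumerates the top-scoring paths, this sampler can return lower-probability paths as well, which relates to the diversity the abstract describes as essential. For long sequences the forward values would need rescaling or log-space arithmetic to avoid underflow.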
Affiliation(s)
- A Lamiable
  - INSERM UMR-S 973, Université Paris Diderot, Sorbonne Paris Cité
- P Thevenet
  - INSERM UMR-S 973, Université Paris Diderot, Sorbonne Paris Cité
- P Tufféry
  - INSERM UMR-S 973, Université Paris Diderot, Sorbonne Paris Cité
9
Craveur P, Joseph AP, Esque J, Narwani TJ, Noël F, Shinada N, Goguet M, Leonard S, Poulain P, Bertrand O, Faure G, Rebehmed J, Ghozlane A, Swapna LS, Bhaskara RM, Barnoud J, Téletchéa S, Jallu V, Cerny J, Schneider B, Etchebest C, Srinivasan N, Gelly JC, de Brevern AG. Protein flexibility in the light of structural alphabets. Front Mol Biosci 2015; 2:20. [PMID: 26075209 PMCID: PMC4445325 DOI: 10.3389/fmolb.2015.00020] [Citation(s) in RCA: 59] [Impact Index Per Article: 6.6] [Received: 02/28/2015] [Accepted: 04/30/2015] [Indexed: 01/01/2023]
Abstract
Protein structures are valuable tools to understand protein function. Nonetheless, proteins are often considered as rigid macromolecules while their structures exhibit specific flexibility, which is essential to complete their functions. Analyses of protein structures and dynamics are often performed with a simplified three-state description, i.e., the classical secondary structures. A more precise and complete description of protein backbone conformation can be obtained using libraries of small protein fragments that are able to approximate every part of protein structures. These libraries, called structural alphabets (SAs), have been widely used in the structure analysis field, from definition of ligand binding sites to superimposition of protein structures. SAs are also well suited to analyze the dynamics of protein structures. Here, we review innovative approaches that investigate protein flexibility based on SA descriptions. Coupled to various sources of experimental data (e.g., B-factor) and computational methodology (e.g., molecular dynamics simulation), SAs turn out to be powerful tools to analyze protein dynamics, e.g., to examine allosteric mechanisms in large sets of structures in complexes or to identify order/disorder transitions. SAs were also shown to be quite efficient to predict protein flexibility from amino-acid sequence. Finally, in this review, we exemplify the interest of SAs for studying flexibility with different cases of proteins implicated in pathologies and diseases.
Affiliation(s)
- Pierrick Craveur
  - Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France; Institut National de la Transfusion Sanguine, DSIMB Paris, France; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France
- Agnel P Joseph
  - Rutherford Appleton Laboratory, Science and Technology Facilities Council Didcot, UK
- Jeremy Esque
  - Institut National de la Santé et de la Recherche Médicale U964,7 UMR Centre National de la Recherche Scientifique 7104, IGBMC, Université de Strasbourg Illkirch, France
- Tarun J Narwani
  - Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France; Institut National de la Transfusion Sanguine, DSIMB Paris, France; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France
- Floriane Noël
  - Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France; Institut National de la Transfusion Sanguine, DSIMB Paris, France; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France
- Nicolas Shinada
  - Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France; Institut National de la Transfusion Sanguine, DSIMB Paris, France; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France
- Matthieu Goguet
  - Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France; Institut National de la Transfusion Sanguine, DSIMB Paris, France; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France
- Sylvain Leonard
  - Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France; Institut National de la Transfusion Sanguine, DSIMB Paris, France; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France
- Pierre Poulain
  - Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France; Institut National de la Transfusion Sanguine, DSIMB Paris, France; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France; Ets Poulain Pointe-Noire, Congo
- Olivier Bertrand
  - Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France; Institut National de la Transfusion Sanguine, DSIMB Paris, France; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France
- Guilhem Faure
  - National Library of Medicine, National Center for Biotechnology Information, National Institutes of Health Bethesda, MD, USA
- Joseph Rebehmed
  - Centre National de la Recherche Scientifique UMR7590, Sorbonne Universités, Université Pierre et Marie Curie - MNHN - IRD - IUC Paris, France
- Lakshmipuram S Swapna
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore Bangalore, India; Hospital for Sick Children, and Departments of Biochemistry and Molecular Genetics, University of Toronto Toronto, ON, Canada
- Ramachandra M Bhaskara
  - Molecular Biophysics Unit, Indian Institute of Science, Bangalore Bangalore, India; Department of Theoretical Biophysics, Max Planck Institute of Biophysics Frankfurt, Germany
- Jonathan Barnoud
  - Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France; Institut National de la Transfusion Sanguine, DSIMB Paris, France; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France; Laboratoire de Physique, École Normale Supérieure de Lyon, Université de Lyon, Centre National de la Recherche Scientifique UMR 5672 Lyon, France
- Stéphane Téletchéa
  - Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France; Institut National de la Transfusion Sanguine, DSIMB Paris, France; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France; Faculté des Sciences et Techniques, Université de Nantes, Unité Fonctionnalité et Ingénierie des Protéines, Centre National de la Recherche Scientifique UMR 6286, Université Nantes Nantes, France
- Vincent Jallu
  - Platelet Unit, Institut National de la Transfusion Sanguine Paris, France
- Jiri Cerny
  - Institute of Biotechnology, The Czech Academy of Sciences Prague, Czech Republic
- Bohdan Schneider
  - Institute of Biotechnology, The Czech Academy of Sciences Prague, Czech Republic
- Catherine Etchebest
  - Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France; Institut National de la Transfusion Sanguine, DSIMB Paris, France; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France
- Jean-Christophe Gelly
  - Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France; Institut National de la Transfusion Sanguine, DSIMB Paris, France; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France
- Alexandre G de Brevern
  - Institut National de la Santé et de la Recherche Médicale U 1134 Paris, France; UMR_S 1134, DSIMB, Université Paris Diderot, Sorbonne Paris Cite Paris, France; Institut National de la Transfusion Sanguine, DSIMB Paris, France; UMR_S 1134, DSIMB, Laboratory of Excellence GR-Ex Paris, France
10
Siano A, Húmpola MV, de Oliveira E, Albericio F, Simonetta AC, Lajmanovich R, Tonarelli GG. Antimicrobial peptides from skin secretions of Hypsiboas pulchellus (Anura: Hylidae). JOURNAL OF NATURAL PRODUCTS 2014; 77:831-841. [PMID: 24717080 DOI: 10.1021/np4009317] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 06/03/2023]
Abstract
The skin of many amphibians produces a large repertoire of antimicrobial peptides that are crucial in the first line of defense against microbial invasion. Despite the immense richness of wild amphibians in Argentina, knowledge about peptides with antimicrobial properties is limited to a few species. Here we used LC-MS-MS to analyze samples of Hypsiboas pulchellus skin with the aim to identify antimicrobial peptides in the mass range of 1000 to 2000 Da. Twenty-three novel sequences were identified by MS, three of which were selected for chemical synthesis and further studies. The three synthetic peptides, named P1-Hp-1971, P2-Hp-1935, and P3-Hp-1891, inhibited the growth of two ATCC strains: Escherichia coli (MIC: 16, 33, and 17 μM, respectively) and Staphylococcus aureus (MIC: 8, 66, and 17 μM, respectively). P1-Hp-1971 and P3-Hp-1891 were the most active peptides. P1-Hp-1971, which showed the highest therapeutic indices (40 for E. coli and 80 for S. aureus), is a proline-glycine-rich peptide with a highly unordered structure, while P3-Hp-1891 adopts an amphipathic α-helical structure in the presence of 2,2,2-trifluoroethanol and anionic liposomes. This is the first peptidomic study of Hypsiboas pulchellus skin secretions to allow the identification of antimicrobial peptides.
Affiliation(s)
- Alvaro Siano
  - Departamento de Química Orgánica, Facultad de Bioquímica y Cs. Biológicas (FBCB), Universidad Nacional del Litoral (UNL), Ciudad Universitaria, 3000, Santa Fe, Argentina
11
Ahmed MH, Kellogg GE, Selley DE, Safo MK, Zhang Y. Predicting the molecular interactions of CRIP1a-cannabinoid 1 receptor with integrated molecular modeling approaches. Bioorg Med Chem Lett 2014; 24:1158-65. [PMID: 24461351 PMCID: PMC4353595 DOI: 10.1016/j.bmcl.2013.12.119] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 10/11/2013] [Revised: 12/26/2013] [Accepted: 12/29/2013] [Indexed: 12/14/2022]
Abstract
Cannabinoid receptors are a family of G-protein coupled receptors that are involved in a wide variety of physiological processes and diseases. One of the key regulators that are unique to cannabinoid receptors is the cannabinoid receptor interacting proteins (CRIPs). Among them CRIP1a was found to decrease the constitutive activity of the cannabinoid type-1 receptor (CB1R). The aim of this study is to gain an understanding of the interaction between CRIP1a and CB1R through using different computational techniques. The generated model demonstrated several key putative interactions between CRIP1a and CB1R, including the critical involvement of Lys130 in CRIP1a.
Affiliation(s)
- Mostafa H Ahmed
  - Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, VA 23298, USA; Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University, Richmond, VA 23298, USA
- Glen E Kellogg
  - Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, VA 23298, USA; Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University, Richmond, VA 23298, USA; Center for the Study of Biological Complexity, Virginia Commonwealth University, Richmond, VA 23298, USA
- Dana E Selley
  - Department of Pharmacology and Toxicology, Virginia Commonwealth University, Richmond, VA 23298, USA
- Martin K Safo
  - Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, VA 23298, USA; Institute for Structural Biology and Drug Discovery, Virginia Commonwealth University, Richmond, VA 23298, USA
- Yan Zhang
  - Department of Medicinal Chemistry, Virginia Commonwealth University, Richmond, VA 23298, USA.
12
Ma J, Wang S. Algorithms, Applications, and Challenges of Protein Structure Alignment. ADVANCES IN PROTEIN CHEMISTRY AND STRUCTURAL BIOLOGY 2014; 94:121-75. [DOI: 10.1016/b978-0-12-800168-4.00005-6] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.1] [Indexed: 12/29/2022]
13
|
Maadooliat M, Gao X, Huang JZ. Assessing protein conformational sampling methods based on bivariate lag-distributions of backbone angles. Brief Bioinform 2012; 14:724-36. [PMID: 22926831 DOI: 10.1093/bib/bbs052] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023] Open
Abstract
Despite considerable progress in the past decades, protein structure prediction remains one of the major unsolved problems in computational biology. Angular-sampling-based methods have been extensively studied recently due to their ability to capture the continuous conformational space of protein structures. The literature has focused on using a variety of parametric models of the sequential dependencies between angle pairs along the protein chains. In this article, we present a thorough review of angular-sampling-based methods by assessing three main questions: What is the best distribution type to model the protein angles? What is a reasonable number of components in a mixture model that should be considered to accurately parameterize the joint distribution of the angles? And what is the order of the local sequence-structure dependency that should be considered by a prediction method? We assess the model fits for different methods using bivariate lag-distributions of the dihedral/planar angles. Moreover, the main information across the lags can be extracted using a technique called Lag singular value decomposition (LagSVD), which considers the joint distribution of the dihedral/planar angles over different lags using a nonparametric approach and monitors the behavior of the lag-distribution of the angles using singular value decomposition. As a result, we developed graphical tools and numerical measurements to compare and evaluate the performance of different model fits. Furthermore, we developed a web-tool (http://www.stat.tamu.edu/~madoliat/LagSVD) that can be used to produce informative animations.
Affiliation(s)
- Mehdi Maadooliat
- Mathematical and Computer Sciences and Engineering Division, 4700 King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia. Jianhua Z. Huang, Department of Statistics, 447 Blocker Building, Texas A&M University, 3143 TAMU, College Station, TX 77843-3143 (USA)
|
14
|
Belhouchet M, Mohd Jaafar F, Firth AE, Grimes JM, Mertens PPC, Attoui H. Detection of a fourth orbivirus non-structural protein. PLoS One 2011; 6:e25697. [PMID: 22022432 PMCID: PMC3192121 DOI: 10.1371/journal.pone.0025697] [Citation(s) in RCA: 158] [Impact Index Per Article: 12.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/08/2011] [Accepted: 09/08/2011] [Indexed: 12/22/2022] Open
Abstract
The genus Orbivirus includes both insect and tick-borne viruses. The orbivirus genome, composed of 10 segments of dsRNA, encodes 7 structural proteins (VP1–VP7) and 3 non-structural proteins (NS1–NS3). An open reading frame (ORF) that spans almost the entire length of genome segment-9 (Seg-9) encodes VP6 (the viral helicase). However, bioinformatic analysis recently identified an overlapping ORF (ORFX) in Seg-9. We show that ORFX encodes a new non-structural protein, identified here as NS4. Western blotting and confocal fluorescence microscopy, using antibodies raised against recombinant NS4 from Bluetongue virus (BTV, which is insect-borne), or Great Island virus (GIV, which is tick-borne), demonstrate that these proteins are synthesised in BTV or GIV infected mammalian cells, respectively. BTV NS4 is also expressed in Culicoides insect cells. NS4 forms aggregates throughout the cytoplasm as well as in the nucleus, consistent with identification of nuclear localisation signals within the NS4 sequence. Bioinformatic analyses indicate that NS4 contains coiled-coils, is related to proteins that bind nucleic acids, or are associated with membranes and shows similarities to nucleolar protein UTP20 (a processome subunit). Recombinant NS4 of GIV protects dsRNA from degradation by endoribonucleases of the RNAse III family, indicating that it interacts with dsRNA. However, BTV NS4, which is only half the putative size of the GIV NS4, did not protect dsRNA from RNAse III cleavage. NS4 of both GIV and BTV protect DNA from degradation by DNAse. NS4 was found to associate with lipid droplets in cells infected with BTV or GIV or transfected with a plasmid expressing NS4.
Affiliation(s)
- Mourad Belhouchet
- Vector-Borne Viral Diseases Programme, Institute for Animal Health, Pirbright, United Kingdom
- Division of Structural Biology, Henry Wellcome Building for Genomic Medicine, Oxford, United Kingdom
- Fauziah Mohd Jaafar
- Vector-Borne Viral Diseases Programme, Institute for Animal Health, Pirbright, United Kingdom
- Andrew E. Firth
- Division of Virology, Department of Pathology, University of Cambridge, Cambridge, United Kingdom
- Jonathan M. Grimes
- Division of Structural Biology, Henry Wellcome Building for Genomic Medicine, Oxford, United Kingdom
- Peter P. C. Mertens
- Vector-Borne Viral Diseases Programme, Institute for Animal Health, Pirbright, United Kingdom
- Houssam Attoui
- Vector-Borne Viral Diseases Programme, Institute for Animal Health, Pirbright, United Kingdom
|
15
|
Regad L, Martin J, Camproux AC. Dissecting protein loops with a statistical scalpel suggests a functional implication of some structural motifs. BMC Bioinformatics 2011; 12:247. [PMID: 21689388 PMCID: PMC3158783 DOI: 10.1186/1471-2105-12-247] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2010] [Accepted: 06/20/2011] [Indexed: 12/24/2022] Open
Abstract
Background: One of the strategies for protein function annotation is to search for particular structural motifs that are known to be shared by proteins with a given function. Results: Here, we present a systematic extraction of structural motifs of seven residues from protein loops and we explore their correspondence with functional sites. Our approach is based on the structural alphabet HMM-SA (Hidden Markov Model - Structural Alphabet), which allows simplification of protein structures into uni-dimensional sequences, and advanced pattern statistics adapted to short sequences. Structural motifs of interest are selected by looking for structural motifs significantly over-represented in SCOP superfamilies in protein loops. We discovered two types of structural motifs significantly over-represented in SCOP superfamilies: (i) ubiquitous motifs, shared by several superfamilies and (ii) superfamily-specific motifs, over-represented in few superfamilies. A comparison of ubiquitous words with known small structural motifs shows that they contain well-described motifs such as turn, niche or nest motifs. A comparison between superfamily-specific motifs and biological annotations of Swiss-Prot reveals that some of them actually correspond to functional sites involved in the binding sites of small ligands, such as ATP/GTP, NAD(P) and SAH/SAM. Conclusions: Our findings show that statistical over-representation in SCOP superfamilies is linked to functional features. The detection of over-represented motifs within structures simplified by HMM-SA is therefore a promising approach for prediction of functional sites and annotation of uncharacterized proteins.
|
16
|
Hu Y, Dong X, Wu A, Cao Y, Tian L, Jiang T. Incorporation of local structural preference potential improves fold recognition. PLoS One 2011; 6:e17215. [PMID: 21365008 PMCID: PMC3041821 DOI: 10.1371/journal.pone.0017215] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/01/2010] [Accepted: 01/25/2011] [Indexed: 11/19/2022] Open
Abstract
Fold recognition, or threading, is a popular protein structure modeling approach that uses known structure templates to build structures for proteins of unknown structure. The key to the success of fold recognition methods lies in the proper integration of sequence, physicochemical and structural information. Here we introduce another type of information, local structural preference potentials of 3-residue and 9-residue fragments, for fold recognition. By combining the two local structural preference potentials with the widely used sequence profile, secondary structure information and hydrophobic score, we have developed a new threading method called FR-t5 (fold recognition by use of 5 terms). In benchmark tests, we have found that the consideration of local structural preference potentials in FR-t5 not only greatly enhances the alignment accuracy and recognition sensitivity, but also significantly improves the quality of prediction models.
Affiliation(s)
- Yun Hu
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Graduate University of Chinese Academy of Sciences, Beijing, China
- Xiaoxi Dong
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Graduate University of Chinese Academy of Sciences, Beijing, China
- Aiping Wu
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Yang Cao
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Graduate University of Chinese Academy of Sciences, Beijing, China
- Liqing Tian
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
- Graduate University of Chinese Academy of Sciences, Beijing, China
- Taijiao Jiang
- National Laboratory of Biomacromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, China
|
17
|
Development of resistance against blackleg disease in Brassica oleracea var. botrytis through in silico methods. Fungal Genet Biol 2010; 47:800-8. [DOI: 10.1016/j.fgb.2010.06.014] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/29/2010] [Revised: 06/15/2010] [Accepted: 06/28/2010] [Indexed: 01/03/2023]
|
18
|
Zhai Y, Attoui H, Mohd Jaafar F, Wang HQ, Cao YX, Fan SP, Sun YX, Liu LD, Mertens PPC, Meng WS, Wang D, Liang G. Isolation and full-length sequence analysis of Armigeres subalbatus totivirus, the first totivirus isolate from mosquitoes representing a proposed novel genus (Artivirus) of the family Totiviridae. J Gen Virol 2010; 91:2836-45. [DOI: 10.1099/vir.0.024794-0] [Citation(s) in RCA: 77] [Impact Index Per Article: 5.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open
|
19
|
Maupetit J, Derreumaux P, Tufféry P. A fast method for large-scale de novo peptide and miniprotein structure prediction. J Comput Chem 2010; 31:726-38. [PMID: 19569182 DOI: 10.1002/jcc.21365] [Citation(s) in RCA: 91] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Although peptides have many biological and biomedical implications, an accurate method predicting their equilibrium structural ensembles from amino acid sequences and suitable for large-scale experiments is still missing. We introduce a new approach, PEP-FOLD, to the de novo prediction of peptides and miniproteins. It first predicts, in the terms of a Hidden Markov Model-derived structural alphabet, a limited number of local conformations at each position of the structure. It then performs their assembly using a greedy procedure driven by a coarse-grained energy score. On a benchmark of 52 peptides with 9-23 amino acids, PEP-FOLD generates lowest-energy conformations within 2.8 and 2.3 Å Cα root-mean-square deviation from the full nuclear magnetic resonance (NMR) structures and the NMR rigid cores, respectively, outperforming previous approaches. For 13 miniproteins with 27-49 amino acids, PEP-FOLD reaches an accuracy of 3.6 and 4.6 Å Cα root-mean-square deviation for the most-native and lowest-energy conformations, using the nonflexible regions identified by NMR. PEP-FOLD simulations are fast (a few minutes only), therefore opening the door to in silico large-scale rational design of new bioactive peptides and miniproteins.
Affiliation(s)
- Julien Maupetit
- MTi, INSERM UMR-S973 and RPBS, Université Paris Diderot - Paris 7, 5 rue Marie-Andrée Lagroua Weill-Halle, 75205 Paris, Cedex 13, France
|
20
|
Pandini A, Fornili A, Kleinjung J. Structural alphabets derived from attractors in conformational space. BMC Bioinformatics 2010; 11:97. [PMID: 20170534 PMCID: PMC2838871 DOI: 10.1186/1471-2105-11-97] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2009] [Accepted: 02/20/2010] [Indexed: 11/20/2022] Open
Abstract
Background: The hierarchical and partially redundant nature of protein structures justifies the definition of frequently occurring conformations of short fragments as 'states'. Collections of selected representatives for these states define Structural Alphabets, describing the most typical local conformations within protein structures. These alphabets form a bridge between the string-oriented methods of sequence analysis and the coordinate-oriented methods of protein structure analysis. Results: A Structural Alphabet has been derived by clustering all four-residue fragments of a high-resolution subset of the protein data bank and extracting the high-density states as representative conformational states. Each fragment is uniquely defined by a set of three independent angles corresponding to its degrees of freedom, capturing in simple and intuitive terms the properties of the conformational space. The fragments of the Structural Alphabet are equivalent to the conformational attractors and therefore yield a most informative encoding of proteins. Proteins can be reconstructed within the experimental uncertainty in structure determination and ensembles of structures can be encoded with accuracy and robustness. Conclusions: The density-based Structural Alphabet provides a novel tool to describe local conformations and it is specifically suitable for application in studies of protein dynamics.
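As an illustration of how a four-residue fragment can be reduced to three angles, the following is a minimal sketch, not the authors' implementation. It assumes the three angles are the two Cα pseudo-bond (bend) angles and the one pseudo-torsion angle of four consecutive Cα atoms; the function names and the toy coordinates are hypothetical.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _norm(a):
    return math.sqrt(_dot(a, a))

def bend_angle(p, q, r):
    """Angle at q, in degrees, between the pseudo-bonds q->p and q->r."""
    u, v = _sub(p, q), _sub(r, q)
    cos = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def torsion_angle(p0, p1, p2, p3):
    """Pseudo-torsion, in degrees, around the p1->p2 pseudo-bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    x = _dot(n1, n2)
    y = _norm(b2) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))

def fragment_angles(ca):
    """Assumed descriptor: (bend at residue 2, bend at residue 3, torsion)."""
    p0, p1, p2, p3 = ca
    return (bend_angle(p0, p1, p2),
            bend_angle(p1, p2, p3),
            torsion_angle(p0, p1, p2, p3))

# Toy coordinates (not real C-alpha positions): a planar, cis-like fragment.
cis_fragment = [(1, 1, 0), (1, 0, 0), (0, 0, 0), (0, 1, 0)]
angles = fragment_angles(cis_fragment)
```

Clustering such angle triplets over many fragments would then give the representative states; the clustering step itself is not shown here.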
Affiliation(s)
- Alessandro Pandini
- Division of Mathematical Biology, MRC National Institute for Medical Research, London, UK
|
21
|
Mining protein loops using a structural alphabet and statistical exceptionality. BMC Bioinformatics 2010; 11:75. [PMID: 20132552 PMCID: PMC2833150 DOI: 10.1186/1471-2105-11-75] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2009] [Accepted: 02/04/2010] [Indexed: 12/21/2022] Open
Abstract
Background: Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g. binding sites or catalytic pockets. However, the description of protein loops with conventional tools is a difficult task. Regular secondary structures, helices and strands, have been widely studied whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied. Results: We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA allows the simplification of a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment, called a structural letter. The difficult task of the structural grouping of huge data sets is thus easily accomplished by handling structural letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to the structural-letter sequence, named a structural word. This approach permits a systematic analysis of loops of all sizes since we consider the structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop-lengths are covered by only 3310 highly recurrent structural words (out of 28274 observed words). These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference but interestingly, two thirds are shared by short (less than 12 residues) and long loops.
Moreover, half of recurrent motifs exhibit a significant level of amino-acid conservation with at least four significant positions and 87% of long loops contain at least one such word. We complement our analysis with the detection of statistically over-represented patterns of structural letters as in conventional DNA sequence analysis. About 30% (930) of structural words are over-represented, and cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequential specificity, suggesting structural or functional constraints. Conclusions: We developed a method to systematically decompose and study protein loops using recurrent structural motifs. This method is based on the structural alphabet HMM-SA and not on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, it is the first time that pattern mining helps to increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and might help reduce the complexity of long-loop analysis. Detailed results are available at http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/.
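The word-extraction step described in this abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the published pipeline: loops are assumed to be already encoded as strings of structural letters, a seven-residue fragment is taken to correspond to four overlapping four-residue letters (7 - 4 + 1 = 4), and the function names and toy strings are hypothetical.

```python
from collections import Counter

def structural_words(letters, word_len=4):
    """All overlapping words of `word_len` structural letters in one loop."""
    return [letters[i:i + word_len] for i in range(len(letters) - word_len + 1)]

def recurrent_words(loops, min_count=30, word_len=4):
    """Words observed more than `min_count` times across all loops."""
    counts = Counter()
    for loop in loops:
        counts.update(structural_words(loop, word_len))
    return {word: n for word, n in counts.items() if n > min_count}

# Toy loop bank: made-up letter strings, not HMM-SA output.
toy_loops = ["ABCDE"] * 31 + ["ABCDF"]
frequent = recurrent_words(toy_loops)
```

With the toy bank above, only the words seen more than 30 times survive the filter, which mirrors the "highly recurrent words" criterion quoted in the abstract.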
|
22
|
Akhoon BA, Gupta SK, Verma V, Dhaliwal G, Srivastava M, Gupta SK, Ahmad RF. In silico designing and optimization of anti-breast cancer antibody mimetic oligopeptide targeting HER-2 in women. J Mol Graph Model 2010; 28:664-9. [PMID: 20149699 DOI: 10.1016/j.jmgm.2010.01.002] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2009] [Accepted: 01/09/2010] [Indexed: 01/06/2023]
Abstract
Overexpression of HER-2 is of frequent (20-30%) occurrence in breast cancer. Therapeutic targeting of HER-2 with humanized antibody derived oligopeptide may be a promising approach to the treatment of breast cancer. HER-2 gene is part of a family of genes that play critical roles in regulating transmembrane growth of breast cancer cells. Pertuzumab, a recombinant humanized monoclonal antibody (2C4), binds to extracellular domain II of the HER-2 receptor and inhibits its ability to dimerize with other HER receptors blocking the cell growth, signaling and apoptosis induction. The unique binding pocket on HER-2 for pertuzumab provides an important target domain for creation of new anticancer drugs. In the present work an efficient oligopeptide was designed by our computational method that interacts with pertuzumab binding sites of HER-2. In silico docking study demonstrated the best specific interaction of RASPADREV oligopeptide with the dimerization domain in the HER-2 molecule among various screened oligopeptides. ADMET and SAR properties prove the drug likeness of designed oligopeptide as having value 0.98.
Affiliation(s)
- Bashir A Akhoon
- Centre of Bioinformatics, Department of Biotechnology, SMVD University, Jammu, India.
|
23
|
In silico DNA vaccine designing against human papillomavirus (HPV) causing cervical cancer. Vaccine 2009; 28:120-31. [DOI: 10.1016/j.vaccine.2009.09.095] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/05/2009] [Revised: 09/17/2009] [Accepted: 09/22/2009] [Indexed: 12/15/2022]
|
24
|
Schön JC, Jansen M. Determination, prediction, and understanding of structures, using the energy landscapes of chemical systems – Part II. ACTA ACUST UNITED AC 2009. [DOI: 10.1524/zkri.216.7.361.20362] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Abstract
In the past decade, new theoretical approaches have been developed to determine, predict and understand the structure of chemical compounds. The central element of these methods has been the investigation of the energy landscape of chemical systems. Applications range from extended crystalline and amorphous compounds over clusters and molecular crystals to proteins. In this review, we are going to give an introduction to energy landscapes and methods for their investigation, together with a number of examples. These include structure prediction of extended and molecular crystals, structure prediction and folding of proteins, structure analysis of zeolites, and structure determination of crystals from powder diffraction data.
|
25
|
Deschavanne P, Tufféry P. Enhanced protein fold recognition using a structural alphabet. Proteins 2009; 76:129-37. [DOI: 10.1002/prot.22324] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
26
|
Wang S, Zheng WM. CLePAPS: fast pair alignment of protein structures based on conformational letters. J Bioinform Comput Biol 2008; 6:347-66. [PMID: 18464327 DOI: 10.1142/s0219720008003461] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/20/2007] [Revised: 11/22/2007] [Accepted: 12/05/2007] [Indexed: 11/18/2022]
Abstract
Fast, efficient, and reliable algorithms for pairwise alignment of protein structures are in ever-increasing demand for analyzing the rapidly growing data on protein structures. CLePAPS is a tool developed for this purpose. It distinguishes itself from other existing algorithms by the use of conformational letters, which are discretized states of 3D segmental structural states. A letter corresponds to a cluster of combinations of the three angles formed by Calpha pseudobonds of four contiguous residues. A substitution matrix called CLESUM is available to measure the similarity between any two such letters. CLePAPS regards an aligned fragment pair (AFP) as an ungapped string pair with a high sum of pairwise CLESUM scores. Using CLESUM scores as the similarity measure, CLePAPS searches for AFPs by simple string comparison. The transformation which best superimposes a highly similar AFP can be used to superimpose the structure pairs under comparison. A highly scored AFP which is consistent with several other AFPs determines an initial alignment. CLePAPS then joins consistent AFPs guided by their similarity scores to extend the alignment by several "zoom-in" iteration steps. A follow-up refinement produces the final alignment. CLePAPS does not implement dynamic programming. The utility of CLePAPS is tested on various protein structure pairs.
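The idea of finding aligned fragment pairs (AFPs) by plain string comparison can be sketched as below. This is a simplified illustration, not the CLePAPS code: the real CLESUM matrix is replaced by a toy match/mismatch score, and the function names, window length and threshold are assumptions chosen for the example.

```python
def toy_score(x, y):
    """Stand-in for a CLESUM lookup: +2 for identical letters, -1 otherwise."""
    return 2 if x == y else -1

def find_afps(a, b, score=toy_score, length=8, threshold=0):
    """Ungapped fragment pairs of `length` letters scoring at least `threshold`.

    Returns (start_in_a, start_in_b, summed_score) tuples, best first.
    """
    hits = []
    for i in range(len(a) - length + 1):
        for j in range(len(b) - length + 1):
            total = sum(score(a[i + k], b[j + k]) for k in range(length))
            if total >= threshold:
                hits.append((i, j, total))
    return sorted(hits, key=lambda hit: -hit[2])

# Toy conformational-letter strings (made up for illustration).
afps = find_afps("AAABCDAAA", "XXBCDXX", length=3, threshold=6)
```

In the method described above, the highest-scoring pairs would then seed a superposition and be extended iteratively; that part is not sketched here.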
Affiliation(s)
- Sheng Wang
- Institute of Theoretical Physics, Academia Sinica, Beijing 100080, China
|
27
|
Ku SY, Hu YJ. Protein structure search and local structure characterization. BMC Bioinformatics 2008; 9:349. [PMID: 18721472 PMCID: PMC2529324 DOI: 10.1186/1471-2105-9-349] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/09/2008] [Accepted: 08/22/2008] [Indexed: 11/10/2022] Open
Abstract
Background: Structural similarities among proteins can provide valuable insight into their functional mechanisms and relationships. As the number of available three-dimensional (3D) protein structures increases, a greater variety of studies can be conducted with increasing efficiency, among which is the design of protein structural alphabets. Structural alphabets allow us to characterize local structures of proteins and describe the global folding structure of a protein using a one-dimensional (1D) sequence. Thus, 1D sequences can be used to identify structural similarities among proteins using standard sequence alignment tools such as BLAST or FASTA. Results: We used self-organizing maps in combination with a minimum spanning tree algorithm to determine the optimum size of a structural alphabet and applied the k-means algorithm to group protein fragments into clusters. The centroids of these clusters defined the structural alphabet. We also developed a flexible matrix training system to build a substitution matrix (TRISUM-169) for our alphabet. Based on FASTA and using TRISUM-169 as the substitution matrix, we developed the SA-FAST alignment tool. We compared the performance of SA-FAST with that of various search tools in database-scale search tasks and found that SA-FAST was highly competitive in all tests conducted. Further, we evaluated the performance of our structural alphabet in recognizing specific structural domains of EGF and EGF-like proteins. Our method successfully recovered more EGF sub-domains using our structural alphabet than when using other structural alphabets. SA-FAST can be found at . Conclusion: The goal of this project was two-fold. First, we wanted to introduce a modular design pipeline to those who have been working with structural alphabets. Secondly, we wanted to open the door to researchers who have done substantial work in biological sequences but have yet to enter the field of protein structure research.
Our experiments showed that by transforming the structural representations from 3D to 1D, several 1D-based tools can be applied to structural analysis, including similarity searches and structural motif finding.
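The clustering and 3D-to-1D encoding steps mentioned in this abstract can be illustrated with a small sketch. It is not the authors' pipeline: the self-organizing map and minimum-spanning-tree steps that choose the alphabet size are omitted, fragments are assumed to be already reduced to numeric feature vectors, the k-means initialization (first k points) is a simplification, and all names and data are hypothetical.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    return centroids

def encode(fragments, centroids, letters="ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
    """Map each fragment to the letter of its nearest centroid (3D to 1D)."""
    return "".join(letters[nearest(f, centroids)] for f in fragments)

# Toy 2D feature vectors standing in for fragment descriptors.
toy_fragments = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
alphabet = kmeans(toy_fragments, k=2)
encoded = encode(toy_fragments, alphabet)
```

The resulting letter string is what a sequence tool such as FASTA would then align, given a substitution matrix trained for the alphabet.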
Affiliation(s)
- Shih-Yen Ku
- Department of Computer Science, National Chiao Tung University, 1001 University Rd. Hsinchu, Taiwan.
|
28
|
Li SC, Bu D, Xu J, Li M. Fragment-HMM: a new approach to protein structure prediction. Protein Sci 2008; 17:1925-34. [PMID: 18723665 DOI: 10.1110/ps.036442.108] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/21/2022]
Abstract
We designed a simple position-specific hidden Markov model to predict protein structure. Our new framework naturally repeats itself to converge to a final target, conglomerating fragment assembly, clustering, target selection, refinement, and consensus, all in one process. Our initial implementation of this theory converges to within 6 Å of the native structures for 100% of decoys on all six standard benchmark proteins used in ROSETTA (discussed by Simons and colleagues in a recent paper), which achieved only 14%-94% for the same data. The qualities of the best decoys and the final decoys our theory converges to are also notably better.
Affiliation(s)
- Shuai Cheng Li
- David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, Ontario N2L3G1, Canada
|
29
|
Dong Q, Wang X, Lin L. Prediction of protein local structures and folding fragments based on building-block library. Proteins 2008; 72:353-66. [PMID: 18214964 DOI: 10.1002/prot.21931] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
In recent years, protein structure prediction using local structure information has made great progress. In this study, a novel and effective method is developed to predict the local structure and the folding fragments of proteins. First, the proteins with known structures are split into fragments. Second, these fragments, represented by dihedrals, are clustered to produce the building blocks (BBs). Third, an efficient machine learning method is used to predict the local structures of proteins from sequence profiles. Finally, a bi-gram model, trained by an iterated algorithm, is introduced to simulate the interactions of these BBs. For test proteins, the building-block lattice is constructed, which contains all the folding fragments of the proteins. The local structures and the optimal fragments are then obtained by the dynamic programming algorithm. The experiment is performed on a subset of the PDB database with sequence identity less than 25%. The results show that the performance of the method is better than the method that uses only sequence information. When multiple paths are returned, the average classification accuracy of local structures is 72.27% and the average prediction accuracy of local structures is 67.72%, which is a significant improvement in comparison with previous studies. The method can predict not only the local structures but also the folding fragments of proteins. This work is helpful for the ab initio protein structure prediction and especially, the understanding of the folding process of proteins.
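The combination of per-position local predictions, a bi-gram model and dynamic programming described here resembles a Viterbi-style search, which can be sketched as follows. This is an illustration under assumptions, not the published method: scores are taken to be log-probabilities, the lattice is simplified to one building block per position, and the function name, labels and numbers are invented for the example.

```python
def best_path(local_scores, bigram, missing=-1e9):
    """Highest-scoring building-block sequence through a simple lattice.

    local_scores: one dict per position, {block: log-score from the predictor}.
    bigram: {(previous_block, block): log-score of the transition}.
    missing: score used for transitions absent from `bigram`.
    """
    best = dict(local_scores[0])
    back = []
    for scores in local_scores[1:]:
        new_best, pointers = {}, {}
        for cur, local in scores.items():
            prev = max(best, key=lambda p: best[p] + bigram.get((p, cur), missing))
            new_best[cur] = best[prev] + bigram.get((prev, cur), missing) + local
            pointers[cur] = prev
        best = new_best
        back.append(pointers)
    last = max(best, key=best.get)
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]

# Hypothetical scores for two building blocks, "H" and "E", at three positions.
toy_local = [{"H": -0.1, "E": -2.0}, {"H": -1.0, "E": -0.9}, {"H": -0.2, "E": -1.5}]
toy_bigram = {("H", "H"): -0.1, ("H", "E"): -2.0, ("E", "H"): -2.0, ("E", "E"): -0.1}
path, score = best_path(toy_local, toy_bigram)
```

In the toy case the transition scores outweigh the slightly better local score for "E" at the middle position, so the smooth path is preferred.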
Affiliation(s)
- Qiwen Dong
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.
|
30
|
Abstract
Despite significant progress in recent years, protein structure prediction maintains its status as one of the prime unsolved problems in computational biology. One of the key remaining challenges is an efficient probabilistic exploration of the structural space that correctly reflects the relative conformational stabilities. Here, we present a fully probabilistic, continuous model of local protein structure in atomic detail. The generative model makes efficient conformational sampling possible and provides a framework for the rigorous analysis of local sequence-structure correlations in the native state. Our method represents a significant theoretical and practical improvement over the widely used fragment assembly technique by avoiding the drawbacks associated with a discrete and nonprobabilistic approach.
|
31
|
Martin J, Regad L, Etchebest C, Camproux AC. Taking advantage of local structure descriptors to analyze interresidue contacts in protein structures and protein complexes. Proteins 2008; 73:672-89. [DOI: 10.1002/prot.22091] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]
|
32
|
Liu X, Zhao YP, Zheng WM. CLEMAPS: Multiple alignment of protein structures based on conformational letters. Proteins 2008; 71:728-36. [DOI: 10.1002/prot.21739] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
|
33
|
Martin J, de Brevern AG, Camproux AC. In silico local structure approach: a case study on outer membrane proteins. Proteins 2008; 71:92-109. [PMID: 17932925 DOI: 10.1002/prot.21659] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
The detection of Outer Membrane Proteins (OMP) in whole genomes is a topical question, and their sequence characteristics have thus been intensively studied. This class of protein displays a common beta-barrel architecture, formed by adjacent antiparallel strands. However, due to the lack of available structures, few structural studies have been made on this class of proteins. Here we propose a novel OMP local structure investigation, based on a structural alphabet approach, i.e., the decomposition of 3D structures using a library of four-residue protein fragments. The optimal decomposition of structures using a hidden Markov model results in a specific structural alphabet of 20 fragments, six of them dedicated to the decomposition of beta-strands. This optimal alphabet, called SA20-OMP, is analyzed in detail, in terms of local structures and transitions between fragments. It highlights a particular and strong organization of beta-strands as series of regular canonical structural fragments. The comparison with alphabets learned on globular structures indicates that the internal organization of OMP structures is more constrained than in globular structures. The analysis of OMP structures using SA20-OMP reveals some recurrent structural patterns. The preferred location of fragments in the distinct regions of the membrane is investigated. The study of pairwise specificity of fragments reveals that some contacts between structural fragments in beta-sheets are clearly favored whereas others are avoided. This contact specificity is stronger in OMP than in globular structures. Moreover, SA20-OMP also captures sequential information. This can be integrated in a scoring function for structural model ranking with very promising results.
Affiliation(s)
- Juliette Martin
- INSERM UMR-S 726/Université Denis Diderot Paris 7, Equipe de Bioinformatique Génomique et Moléculaire, F-75005 Paris
|
34
|
Schenk G, Margraf T, Torda AE. Protein sequence and structure alignments within one framework. Algorithms Mol Biol 2008; 3:4. [PMID: 18380904 PMCID: PMC2390564 DOI: 10.1186/1748-7188-3-4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 01/29/2008] [Accepted: 04/01/2008] [Indexed: 11/19/2022] Open
Abstract
Background: Protein structure alignments are usually based on very different techniques to sequence alignments. We propose a method which treats sequence, structure and even combined sequence + structure in a single framework. Using a probabilistic approach, we calculate a similarity measure which can be applied to fragments containing only protein sequence, structure or both simultaneously. Results: Proof-of-concept results are given for the different problems. For sequence alignments, the methodology is no better than conventional methods. For structure alignments, the techniques are very fast, reliable and tolerant of a range of alignment parameters. Combined sequence and structure alignments may provide a more reliable alignment for pairs of proteins where pure structural alignments can be misled by repetitive elements or apparent symmetries. Conclusion: The probabilistic framework has an elegance in principle, merging sequence and structure descriptors into a single framework. It has a practical use in fast structural alignments and a potential use in finding those examples where sequence and structural similarities apparently disagree.
|
35
|
Martin J, Regad L, Lecornet H, Camproux AC. Structural deformation upon protein-protein interaction: a structural alphabet approach. BMC Structural Biology 2008; 8:12. [PMID: 18307769 PMCID: PMC2315654 DOI: 10.1186/1472-6807-8-12] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 06/15/2007] [Accepted: 02/28/2008] [Indexed: 11/26/2022]
Abstract
Background: In a number of protein-protein complexes, the 3D structures of bound and unbound partners significantly differ, supporting the induced fit hypothesis for protein-protein binding. Results: In this study, we explore the induced fit modifications on a set of 124 proteins available in both bound and unbound forms, in terms of local structure. The local structure is described using a structural alphabet of 27 structural letters that allows a detailed description of the backbone. Using a control set to distinguish induced fit from experimental error and natural protein flexibility, we show that the fraction of structural letters modified upon binding is significantly greater than in the control set (36% versus 28%). This proportion is even greater in the interface regions (41%). Interface regions preferentially involve coils. Our analysis further reveals that some structural letters in coil are not favored in the interface. We show that certain structural letters in coil are particularly subject to modifications at the interface, and that the severity of structural change also varies. This information is used to derive a structural letter substitution matrix that summarizes the local structural changes observed in our data set. We also illustrate the usefulness of our approach to identify common binding motifs in unrelated proteins. Conclusion: Our study provides qualitative information about induced fit. These results could be of help for flexible docking.
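The "fraction of structural letters modified upon binding" reported in this abstract can be illustrated with a small sketch. It assumes the bound and unbound forms have already been encoded as equal-length, position-aligned strings of structural letters; the function name `fraction_modified`, the example strings and the interface positions are all hypothetical and are not taken from the paper.

```python
def fraction_modified(unbound, bound, interface=None):
    """Fraction of positions whose structural letter differs between two forms.

    unbound, bound: equal-length strings of structural letters (assumed aligned).
    interface:      optional set of 0-based positions; if given, only those
                    positions are compared.
    """
    if len(unbound) != len(bound):
        raise ValueError("letter strings must have the same length")
    positions = range(len(unbound)) if interface is None else sorted(interface)
    if not positions:
        raise ValueError("no positions to compare")
    changed = sum(1 for i in positions if unbound[i] != bound[i])
    return changed / len(positions)


# Toy example: one of five letters changes, and it lies in the interface.
overall = fraction_modified("AABCD", "AABDD")
at_interface = fraction_modified("AABCD", "AABDD", interface={3, 4})
```

Comparing the same quantity on a control set of repeated structures of the same protein, as the abstract describes, would separate binding-induced changes from experimental error and natural flexibility.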
Affiliation(s)
- Juliette Martin
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM UMRS726/Université Denis Diderot Paris 7, F-75005 Paris, France.
|
36
|
Regad L, Guyon F, Maupetit J, Tufféry P, Camproux A. A Hidden Markov Model applied to the protein 3D structure analysis. Comput Stat Data Anal 2008. [DOI: 10.1016/j.csda.2007.09.010] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 10/22/2022]
|
37
|
Dong Q, Wang X, Lin L, Wang Y. Analysis and prediction of protein local structure based on structure alphabets. Proteins 2008; 72:163-72. [DOI: 10.1002/prot.21904] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 12/25/2022]
|
38
|
Hidden Markov Models for prediction of protein features. Methods in Molecular Biology (Clifton, N.J.) 2008; 413:173-98. [PMID: 18075166 DOI: 10.1007/978-1-59745-574-9_7] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 12/04/2022]
Abstract
Hidden Markov Models (HMMs) are an extremely versatile statistical representation that can be used to model any set of one-dimensional discrete symbol data. HMMs can model protein sequences in many ways, depending on what features of the protein are represented by the Markov states. For protein structure prediction, states have been chosen to represent either homologous sequence positions, local or secondary structure types, or transmembrane locality. The resulting models can be used to predict common ancestry, secondary or local structure, or membrane topology by applying one of the two standard algorithms for comparing a sequence to a model. In this chapter, we review those algorithms and discuss how HMMs have been constructed and refined for the purpose of protein structure prediction.
|
39
|
De Brevern AG, Etchebest C, Benros C, Hazout S. "Pinning strategy": a novel approach for predicting the backbone structure in terms of protein blocks from sequence. J Biosci 2007; 32:51-70. [PMID: 17426380 DOI: 10.1007/s12038-007-0006-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/30/2022]
Abstract
The description of protein 3D structures can be performed through a library of 3D fragments, named a structural alphabet. Our structural alphabet is composed of 16 small protein fragments, each 5 Cα in length, called protein blocks (PBs). It allows an efficient approximation of the 3D protein structures and a correct prediction of the local structure. The 72 most frequent series of 5 consecutive PBs, called structural words (SWs), are able to cover more than 90% of the 3D structures. PBs are highly conditioned by the presence of a limited number of transitions between them. In this study, we propose a new method called "pinning strategy" that uses this specific feature to predict long protein fragments. Its goal is to define highly probable successions of PBs. It starts from the most probable SW and is then extended with overlapping SWs. Starting from an initial prediction rate of 34.4%, the use of the SWs instead of the PBs allows a gain of 4.5%. The pinning strategy simply applied to the SWs increases the prediction accuracy to 39.9%. In a second step, the sequence-structure relationship is optimized, and the prediction accuracy reaches 43.6%.
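The notion of structural words covering most of the structures can be sketched as follows. The sketch assumes structures are already encoded as strings of PB letters, and it defines coverage as the fraction of positions lying inside at least one of the most frequent words; that definition, the function name `word_coverage` and the toy strings are assumptions, not details given in the abstract.

```python
from collections import Counter


def word_coverage(pb_sequences, word_len=5, top_k=72):
    """Return (top_words, coverage) for the top_k most frequent PB words.

    Coverage here is the fraction of positions that fall inside at least one
    occurrence of a top word (an assumed definition for illustration).
    """
    # Count every overlapping window of word_len consecutive PBs.
    counts = Counter()
    for seq in pb_sequences:
        for i in range(len(seq) - word_len + 1):
            counts[seq[i:i + word_len]] += 1
    top_words = {word for word, _ in counts.most_common(top_k)}

    covered = total = 0
    for seq in pb_sequences:
        mask = [False] * len(seq)
        for i in range(len(seq) - word_len + 1):
            if seq[i:i + word_len] in top_words:
                for j in range(i, i + word_len):
                    mask[j] = True
        covered += sum(mask)
        total += len(seq)
    coverage = covered / total if total else 0.0
    return top_words, coverage


# Toy data: a helix-like run of PB "m" and a strand-like run of PB "d".
words, cov = word_coverage(["mmmmmmmm", "ddddd"], top_k=1)
```

With `top_k=1` only the most frequent word is kept, so the second toy sequence stays uncovered; raising `top_k` to 2 covers every position.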
Affiliation(s)
- A G De Brevern
- INSERM, U726, Equipe de Bioinformatique Genomique et Moleculaire (EBGM), Universite Paris 7, case 7113, 2, place Jussieu, 75251 Paris Cedex 05, France.
|
40
|
Lo WC, Huang PJ, Chang CH, Lyu PC. Protein structural similarity search by Ramachandran codes. BMC Bioinformatics 2007; 8:307. [PMID: 17716377 PMCID: PMC2194796 DOI: 10.1186/1471-2105-8-307] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Received: 04/26/2007] [Accepted: 08/23/2007] [Indexed: 11/13/2022] Open
Abstract
Background: Protein structural data has increased exponentially, such that fast and accurate tools are necessary for structure similarity search. To improve the search speed, several methods have been designed to reduce three-dimensional protein structures to one-dimensional text strings that are then analyzed by traditional sequence alignment methods; however, the accuracy is usually sacrificed and the speed is still unable to match sequence similarity search tools. Here, we aimed to improve the linear encoding methodology and develop efficient search tools that can rapidly retrieve structural homologs from large protein databases. Results: We propose a new linear encoding method, SARST (Structural similarity search Aided by Ramachandran Sequential Transformation). SARST transforms protein structures into text strings through a Ramachandran map organized by nearest-neighbor clustering and uses a regenerative approach to produce substitution matrices. Then, classical sequence similarity search methods can be applied to the structural similarity search. Its accuracy is similar to that of Combinatorial Extension (CE), and it runs over 243,000 times faster, searching 34,000 proteins in 0.34 sec with a 3.2-GHz CPU. SARST provides statistically meaningful expectation values to assess the retrieved information. It has been implemented as a web service and a stand-alone Java program that is able to run on many different platforms. Conclusion: As a database search method, SARST can rapidly distinguish high from low similarities and efficiently retrieve homologous structures. It demonstrates that the easily accessible linear encoding methodology has the potential to serve as a foundation for efficient protein structural similarity search tools. These search tools are expected to be applicable to automated and high-throughput functional annotations or predictions for the ever increasing number of published protein structures in this post-genomic era.
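The general idea of linear encoding, turning backbone dihedral angles into a text string, can be sketched as below. SARST organizes the Ramachandran map by nearest-neighbor clustering; this sketch swaps in a plain uniform grid instead, so it is an illustration of the encoding principle only. The function name `encode_backbone`, the 60-degree cell size and the letter assignment are assumptions.

```python
import string


def encode_backbone(phi_psi, cell_deg=60):
    """Map (phi, psi) pairs in degrees to one letter per residue.

    Uses a regular grid over the Ramachandran map (a stand-in for the
    clustered map used by SARST); each grid cell gets its own letter.
    """
    alphabet = string.ascii_uppercase + string.ascii_lowercase
    per_axis = 360 // cell_deg
    if 360 % cell_deg or per_axis ** 2 > len(alphabet):
        raise ValueError("cell_deg must divide 360 and give at most 52 cells")
    letters = []
    for phi, psi in phi_psi:
        # Shift angles from [-180, 180) to [0, 360) before binning.
        col = int((phi + 180) % 360 // cell_deg)
        row = int((psi + 180) % 360 // cell_deg)
        letters.append(alphabet[row * per_axis + col])
    return "".join(letters)


# Two helix-like residues followed by one strand-like residue.
code = encode_backbone([(-60, -45), (-57, -47), (-120, 130)])
```

Once structures are strings, ordinary sequence-search tooling can be applied to them, which is the source of the speed advantage the abstract reports; a substitution matrix suited to the structural letters would still be needed.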
Affiliation(s)
- Wei-Cheng Lo
- Institute of Bioinformatics and Structural Biology, National Tsing Hua University, 101, Section 2 Kuang Fu Road, Hsinchu 30013, Taiwan
- Po-Jung Huang
- Institute of Bioinformatics and Structural Biology, National Tsing Hua University, 101, Section 2 Kuang Fu Road, Hsinchu 30013, Taiwan
- Chih-Hung Chang
- Institute of Bioinformatics and Structural Biology, National Tsing Hua University, 101, Section 2 Kuang Fu Road, Hsinchu 30013, Taiwan
- Ping-Chiang Lyu
- Institute of Bioinformatics and Structural Biology, National Tsing Hua University, 101, Section 2 Kuang Fu Road, Hsinchu 30013, Taiwan
|
41
|
Etchebest C, Benros C, Bornot A, Camproux AC, de Brevern AG. A reduced amino acid alphabet for understanding and designing protein adaptation to mutation. European Biophysics Journal: EBJ 2007; 36:1059-69. [PMID: 17565494 DOI: 10.1007/s00249-007-0188-5] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.6] [Received: 02/13/2007] [Revised: 05/05/2007] [Accepted: 05/07/2007] [Indexed: 10/23/2022]
Abstract
The protein sequence world is considerably larger than the structure world. In consequence, numerous non-related sequences may adopt similar 3D folds, and different kinds of amino acids may thus be found in similar 3D structures. By grouping the 20 amino acids into a smaller number of representative residues with similar features, the sequence world can be simplified. This clustering hence defines a reduced amino acid alphabet (reduced AAA). Numerous works have shown that protein 3D structures are composed of a limited number of building blocks, defining a structural alphabet. We previously identified such an alphabet composed of 16 representative structural motifs (five residues in length) called Protein Blocks (PBs). This alphabet makes it possible to translate the structure (3D) into a sequence of PBs (1D). Based on these two concepts, reduced AAA and PBs, we analyzed the distributions of the different kinds of amino acids and their equivalences in the structural context. Different reduced sets were considered. Recurrent amino acid associations were found in all the local structures, while others were specific to some local structures (PBs) (e.g., Cysteine, Histidine, Threonine and Serine for the alpha-helix Ncap). Some similar associations are found in other reduced AAAs, e.g., Ile with Val, or hydrophobic aromatic residues Trp with Phe and Tyr. We highlight interesting alternative associations. This shows the dependence on the information considered (sequence or structure). This approach, equivalent to a substitution matrix, could be useful for designing protein sequences with different features (for instance adaptation to environment) while mainly preserving the 3D fold.
Affiliation(s)
- C Etchebest
- Equipe de Bioinformatique Génomique et Moléculaire (EBGM), INSERM UMR-S 726, Université Denis DIDEROT, Paris 7, case 7113, 2, place Jussieu, 75251, Paris, France
|
42
|
Dong QW, Wang XL, Lin L. Methods for optimizing the structure alphabet sequences of proteins. Comput Biol Med 2007; 37:1610-6. [PMID: 17493604 DOI: 10.1016/j.compbiomed.2007.03.002] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 07/17/2006] [Accepted: 03/16/2007] [Indexed: 11/24/2022]
Abstract
Protein structure prediction based on fragment assembly has made great progress in recent years. Local protein structure prediction is receiving increased attention. One essential step of local protein structure prediction methods is that the three-dimensional conformations must be compressed into a one-dimensional series of letters of a structural alphabet. The traditional method assigns each structure fragment the structure alphabet letter that has the best local structure similarity. However, such a locally optimal structure alphabet sequence is not guaranteed to produce the globally optimal structure. This study presents two efficient methods for finding the optimal structure alphabet sequence, which can model the native structures as accurately as possible. First, a 28-letter structure alphabet is derived by clustering fragments in Cartesian space with a fragment length of seven residues. The average quantization error of the 28 letters is 0.82 Å in terms of root mean square deviation. Then, two efficient methods are presented to encode the protein structures into series of structure alphabet letters, that is, the greedy and dynamic programming algorithms. They are tested on the PDB database using the structure alphabet developed in Cartesian coordinate space (our structure alphabet) and in torsion angle space (the PB structure alphabet), respectively. The experimental results show that these two methods can find the approximately optimal structure alphabet sequences by searching a small fraction of the modeling space. The traditional local-optimization method achieves a root mean square deviation of 26.27 Å between the reconstructed structures and the native ones, while the modeling accuracy is improved to 3.28 Å by the greedy algorithm. The results are helpful for local protein structure prediction.
Affiliation(s)
- Qi-wen Dong
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.
|
43
|
Hamelryck T, Kent JT, Krogh A. Sampling realistic protein conformations using local structural bias. PLoS Comput Biol 2006; 2:e131. [PMID: 17002495 PMCID: PMC1570370 DOI: 10.1371/journal.pcbi.0020131] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.9] [Received: 04/20/2006] [Accepted: 08/21/2006] [Indexed: 11/19/2022] Open
Abstract
The prediction of protein structure from sequence remains a major unsolved problem in biology. The most successful protein structure prediction methods make use of a divide-and-conquer strategy to attack the problem: a conformational sampling method generates plausible candidate structures, which are subsequently accepted or rejected using an energy function. Conceptually, this often corresponds to separating local structural bias from the long-range interactions that stabilize the compact, native state. However, sampling protein conformations that are compatible with the local structural bias encoded in a given protein sequence is a long-standing open problem, especially in continuous space. We describe an elegant and mathematically rigorous method to do this, and show that it readily generates native-like protein conformations simply by enforcing compactness. Our results have far-reaching implications for protein structure prediction, determination, simulation, and design. Protein structure prediction is one of the main unsolved problems in computational biology today. A common way to tackle the problem is to generate plausible protein conformations using a fairly inaccurate but fast method, and to evaluate the conformations using an accurate but slow method. The main bottleneck lies in the first step, that is, efficiently exploring protein conformational space. Currently, the best way to do this is to construct plausible structures by stringing together fragments from experimentally determined protein structures, a method called fragment assembly. Hamelryck, Kent, and Krogh present a new method that can efficiently generate protein conformations that are compatible with a given protein sequence. Unlike for existing methods, the generated conformations cover a continuous range and come with an associated probability. The method shows great promise for use in protein structure prediction, determination, simulation, and design.
Affiliation(s)
- Thomas Hamelryck
- Bioinformatics Center, Institute of Molecular Biology and Physiology, University of Copenhagen, Copenhagen, Denmark.
|
44
|
Benros C, de Brevern AG, Etchebest C, Hazout S. Assessing a novel approach for predicting local 3D protein structures from sequence. Proteins 2006; 62:865-80. [PMID: 16385557 DOI: 10.1002/prot.20815] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Indexed: 11/11/2022]
Abstract
We developed a novel approach for predicting local protein structure from sequence. It relies on the Hybrid Protein Model (HPM), an unsupervised clustering method we previously developed. This model learns three-dimensional protein fragments encoded into a structural alphabet of 16 protein blocks (PBs). Here, we focused on 11-residue fragments encoded as a series of seven PBs and used HPM to cluster them according to their local similarities. We thus built a library of 120 overlapping prototypes (mean fragments from each cluster), with good three-dimensional local approximation, i.e., a mean accuracy of 1.61 Å Cα root-mean-square distance. Our prediction method is intended to optimize the exploitation of the sequence-structure relations deduced from this library of long protein fragments. This was achieved by setting up a system of 120 experts, each defined by logistic regression to optimize the discrimination from sequence of a given prototype relative to the others. For a target sequence window, the experts computed probabilities of sequence-structure compatibility for the prototypes and ranked them, proposing the top scorers as structural candidates. Predictions were defined as successful when a prototype <2.5 Å from the true local structure was found among those proposed. Our strategy yielded a prediction rate of 51.2% for an average of 4.2 candidates per sequence window. We also proposed a confidence index to estimate prediction quality. Our approach predicts from sequence alone and will thus provide valuable information for proteins without structural homologs. Candidates will also contribute to global structure prediction by fragment assembly.
Affiliation(s)
- Cristina Benros
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM U726, Université Denis DIDEROT-Paris 7, Paris, France.
|
45
|
Etchebest C, Benros C, Hazout S, de Brevern AG. A structural alphabet for local protein structures: improved prediction methods. Proteins 2006; 59:810-27. [PMID: 15822101 DOI: 10.1002/prot.20458] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.7] [Indexed: 11/10/2022]
Abstract
Three-dimensional protein structures can be described with a library of 3D fragments that define a structural alphabet. We have previously proposed such an alphabet, composed of 16 patterns of five consecutive amino acids, called Protein Blocks (PBs). These PBs have been used to describe protein backbones and to predict local structures from protein sequences. The Q16 prediction rate reaches 40.7% with an optimization procedure. This article examines two aspects of PBs. First, we determine the effect of the enlargement of databanks on their definition. The results show that the geometrical features of the different PBs are preserved (local RMSD value equal to 0.41 Å on average) and sequence-structure specificities reinforced when databanks are enlarged. Second, we improve the methods for optimizing PB predictions from sequences, revisiting the optimization procedure and exploring different local prediction strategies. Use of a statistical optimization procedure for the sequence-local structure relation improves prediction accuracy by 8% (Q16 = 48.7%). Better recognition of repetitive structures occurs without losing the prediction efficiency of the other local folds. Adding secondary structure prediction improved the accuracy of Q16 by only 1%. An entropy index (Neq), strongly related to the RMSD value of the difference between predicted PBs and true local structures, is proposed to estimate prediction quality. The Neq is linearly correlated with the Q16 prediction rate distributions, computed for a large set of proteins. An "expected" prediction rate QE16 is deduced with a mean error of 5%.
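The entropy index mentioned in this abstract can be sketched numerically. The abstract does not give the formula, so the sketch assumes the common "equivalent number" form, the exponential of the Shannon entropy of the predicted PB probabilities at one position: Neq = exp(-Σ p ln p). The function name `neq` and the example distributions are hypothetical.

```python
import math


def neq(probabilities):
    """Equivalent number of states for one position (assumed definition).

    probabilities: non-negative scores for the candidate PBs at a position;
    they are normalized to sum to 1 before the entropy is computed.
    """
    total = sum(probabilities)
    if total <= 0:
        raise ValueError("probabilities must have a positive sum")
    entropy = 0.0
    for p in probabilities:
        if p > 0:  # terms with p == 0 contribute nothing to the entropy
            q = p / total
            entropy -= q * math.log(q)
    return math.exp(entropy)


# A confident prediction gives 1; a uniform one over 16 PBs gives 16.
confident = neq([1.0] + [0.0] * 15)
uniform = neq([1 / 16] * 16)
```

Under this assumed definition the index ranges from 1 (a single PB predicted with certainty) to 16 (all PBs equally likely), so lower values indicate more confident local predictions.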
Affiliation(s)
- Catherine Etchebest
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM U726, Université Denis DIDEROT-Paris, France
|
46
|
Mayewski S. A multibody, whole-residue potential for protein structures, with testing by Monte Carlo simulated annealing. Proteins 2006; 59:152-69. [PMID: 15723360 DOI: 10.1002/prot.20397] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/06/2022]
Abstract
A new multibody, whole-residue potential for protein tertiary structure is described. The potential is based on the local environment surrounding each main-chain alpha carbon (CA), defined as the set of all residues whose CA coordinates lie within a spherical volume of set radius in 3-dimensional (3D) space surrounding that position. It is shown that the relative positions of the CAs in these local environments belong to a set of preferred templates. The templates are derived by cluster analysis of the presently available database of over 3000 protein chains (750,000 residues) having not more than 30% sequence similarity. For each template, a set of residue propensities is also derived for each topological position in the template. Using lookup tables of these derived templates, it is then possible to calculate an energy for any conformation of a given protein sequence. The application of the potential to ab initio protein tertiary structure prediction is evaluated by performing Monte Carlo simulated annealing on test protein sequences.
Affiliation(s)
- Stefan Mayewski
- Max-Planck-Institut für Biochemie, 82152 Martinsried, Germany.
|
47
|
Sander O, Sommer I, Lengauer T. Local protein structure prediction using discriminative models. BMC Bioinformatics 2006; 7:14. [PMID: 16405736 PMCID: PMC1368994 DOI: 10.1186/1471-2105-7-14] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.9] [Received: 06/14/2005] [Accepted: 01/11/2006] [Indexed: 11/10/2022] Open
Abstract
Background: In recent years protein structure prediction methods using local structure information have shown promising improvements. The quality of new fold predictions has risen significantly and in fold recognition incorporation of local structure predictions led to improvements in the accuracy of results. We developed a local structure prediction method to be integrated into either fold recognition or new fold prediction methods. For each local sequence window of a protein sequence the method predicts probability estimates for the sequence to attain particular local structures from a set of predefined local structure candidates. The first step is to define a set of local structure representatives based on clustering recurrent local structures. In the second step a discriminative model is trained to predict the local structure representative given local sequence information. Results: The step of clustering local structures yields an average RMSD quantization error of 1.19 Å for 27 structural representatives (for a fragment length of 7 residues). In the prediction step the area under the ROC curve for detection of the 27 classes ranges from 0.68 to 0.88. Conclusion: The described method yields probability estimates for local protein structure candidates, giving signals for all kinds of local structure. These local structure predictions can be incorporated either into fold recognition algorithms to improve alignment quality and the overall prediction accuracy or into new fold prediction methods.
Affiliation(s)
- Oliver Sander
- Max-Planck-Institute for Informatics, Department of Computational Biology and Applied Algorithmics, Stuhlsatzenhausweg 85, D-66123 Saarbrücken, Germany
- Ingolf Sommer
- Max-Planck-Institute for Informatics, Department of Computational Biology and Applied Algorithmics, Stuhlsatzenhausweg 85, D-66123 Saarbrücken, Germany
- Thomas Lengauer
- Max-Planck-Institute for Informatics, Department of Computational Biology and Applied Algorithmics, Stuhlsatzenhausweg 85, D-66123 Saarbrücken, Germany
|
48
|
Abstract
The field of protein-structure prediction has been revolutionized by the application of "mix-and-match" methods both in template-based homology modeling and in template-free de novo folding. Consensus analysis and recombination of fragments copied from known protein structures is currently the only approach that allows the building of models that are closer to the native structure of the target protein than the structure of its closest homologue. It is also the most successful approach in cases in which the target protein exhibits a novel three-dimensional fold. This review summarizes the recent developments in both template-based and template-free protein structure modeling and compares the available methods for protein-structure prediction by recombination of fragments. A convergence between the "protein folding" and "protein evolution" schools of thought is postulated.
Affiliation(s)
- Janusz M Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw, Poland.
|
49
|
Camproux AC, Tufféry P. Hidden Markov model-derived structural alphabet for proteins: the learning of protein local shapes captures sequence specificity. Biochim Biophys Acta Gen Subj 2005; 1724:394-403. [PMID: 16040198 DOI: 10.1016/j.bbagen.2005.05.019] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 03/01/2005] [Revised: 05/10/2005] [Accepted: 05/11/2005] [Indexed: 11/19/2022]
Abstract
Understanding and predicting protein structures depend on the complexity and the accuracy of the models used to represent them. We have recently set up a Hidden Markov Model to optimally compress protein three-dimensional conformations into a one-dimensional series of letters of a structural alphabet. Such a model learns simultaneously the shape of representative structural letters describing the local conformation and the logic of their connections, i.e. the transition matrix between the letters. Here, we move one step further and report some evidence that such a model of protein local architecture also captures some accurate amino acid features. All the letters have specific and distinct amino acid distributions. Moreover, we show that words of amino acids can have significant propensities for some letters. Perspectives point towards the prediction of the series of letters describing the structure of a protein from its amino acid sequence.
Affiliation(s)
- A C Camproux
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM U726, Université Paris 7, case 7113, 2 place Jussieu, 75251 Paris, France.
|
50
|
Zheng WM, Liu X. A Protein Structural Alphabet and Its Substitution Matrix CLESUM. Transactions on Computational Systems Biology II 2005. [DOI: 10.1007/11567752_4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 01/07/2023]
|