1
Stank A, Richter S, Wade RC. ProSAT+: visualizing sequence annotations on 3D structure. Protein Eng Des Sel 2016; 29:281-4. [PMID: 27284084 DOI: 10.1093/protein/gzw021] [Received: 05/04/2016] [Accepted: 05/09/2016] [Indexed: 11/15/2022]
Abstract
PROtein Structure Annotation Tool-plus (ProSAT+) is a new web server for mapping protein sequence annotations onto a protein structure and visualizing them simultaneously with the structure. ProSAT+ incorporates many of the features of the preceding ProSAT and ProSAT2 tools but also provides new options for the visualization and sharing of protein annotations. Data are extracted from the UniProt KnowledgeBase, the RCSB PDB and the PDBe SIFTS resource, and visualization is performed using JSmol. User-defined sequence annotations can be added directly to the URL, thus enabling visualization and easy data sharing. ProSAT+ is available at http://prosat.h-its.org.
Affiliation(s)
- Antonia Stank
  - Heidelberg Institute for Theoretical Studies (HITS), Schloss-Wolfsbrunnenweg 35, Heidelberg 69118, Germany
  - Heidelberg Graduate School of Mathematical and Computational Methods for the Sciences, Im Neuenheimer Feld 205, Heidelberg 69120, Germany
- Stefan Richter
  - Heidelberg Institute for Theoretical Studies (HITS), Schloss-Wolfsbrunnenweg 35, Heidelberg 69118, Germany
- Rebecca C Wade
  - Heidelberg Institute for Theoretical Studies (HITS), Schloss-Wolfsbrunnenweg 35, Heidelberg 69118, Germany
  - Center for Molecular Biology at Heidelberg University (ZMBH), DKFZ-ZMBH Alliance and Interdisciplinary Center for Scientific Computing (IWR), Im Neuenheimer Feld 282, Heidelberg 69120, Germany
2
Esque J, Urbain A, Etchebest C, de Brevern AG. Sequence-structure relationship study in all-α transmembrane proteins using an unsupervised learning approach. Amino Acids 2015; 47:2303-22. [PMID: 26043903 DOI: 10.1007/s00726-015-2010-5] [Received: 10/31/2014] [Accepted: 05/15/2015] [Indexed: 01/28/2023]
Abstract
Transmembrane proteins (TMPs) are major drug targets, but knowledge of their precise topology remains highly limited compared with globular proteins. Despite the difficulties in obtaining their structures, a major effort has been made in recent years to increase their number by both experimental and computational means. Given this emerging challenge, developing computational methods to extract knowledge from these data is crucial for better understanding their functions and for improving the quality of structural models. Here, we revisit an efficient unsupervised learning procedure, called Hybrid Protein Model (HPM), which is applied to the analysis of transmembrane proteins belonging to the all-α structural class. The HPM method is an original classification procedure that efficiently combines sequence and structure learning. The procedure was initially applied to the analysis of globular proteins. In the present case, HPM classifies a set of overlapping protein fragments extracted from a non-redundant databank of TMP 3D structures. After fine-tuning of the learning parameters, the optimal classification results in 65 clusters. They represent at best similar relationships between sequence and local structure properties of TMPs. Interestingly, HPM distinguishes among the resulting clusters two helical regions with distinct hydrophobic patterns. This underlines the complexity of the topology of these proteins. The HPM classification highlights unusual relationships between amino acids in TMP fragments, which can be useful for building new amino acid substitution matrices. Finally, two challenging applications are described: the first aims at annotating protein function (channel or not), and the second intends to assess the quality of structures (X-ray or models) via a new scoring function deduced from the HPM classification.
Affiliation(s)
- Jérémy Esque
  - INSERM, U 1134, DSIMB, 75739, Paris, France
  - Univ. Paris Diderot, Sorbonne Paris Cité UMR-S 1134, 75739, Paris, France
  - Institut National de la Transfusion Sanguine (INTS), 75739, Paris, France
  - Laboratoire d'Excellence GR-Ex, 75739, Paris, France
  - Laboratoire d'Ingénierie des Fonctions Moléculaire (IFM), ISIS, UMR 7006, 67000, Strasbourg, France
  - Department of Integrative Structural Biology, INSERM U964, Institut de Génétique et de Biologie Moléculaire et Cellulaire (IGBMC), 67404, Illkirch, France
  - UMR7104, Centre National de la Recherche Scientifique (CNRS), 67404, Illkirch, France
  - Université de Strasbourg, 67404, Illkirch, France
- Aurélie Urbain
  - Institut Jean-Pierre Bourgin, INRA, UMR 1318, 78026, Versailles, France
- Catherine Etchebest
  - INSERM, U 1134, DSIMB, 75739, Paris, France
  - Univ. Paris Diderot, Sorbonne Paris Cité UMR-S 1134, 75739, Paris, France
  - Institut National de la Transfusion Sanguine (INTS), 75739, Paris, France
  - Laboratoire d'Excellence GR-Ex, 75739, Paris, France
- Alexandre G de Brevern
  - INSERM, U 1134, DSIMB, 75739, Paris, France
  - Univ. Paris Diderot, Sorbonne Paris Cité UMR-S 1134, 75739, Paris, France
  - Institut National de la Transfusion Sanguine (INTS), 75739, Paris, France
  - Laboratoire d'Excellence GR-Ex, 75739, Paris, France
3
Riazanov A, Laurila JB, Baker CJO. Deploying mutation impact text-mining software with the SADI Semantic Web Services framework. BMC Bioinformatics 2011; 12 Suppl 4:S6. [PMID: 21992079 PMCID: PMC3194198 DOI: 10.1186/1471-2105-12-s4-s6] [Indexed: 11/26/2022]
Abstract
BACKGROUND: Mutation impact extraction is an important task designed to harvest relevant annotations from scientific documents for reuse in multiple contexts. Our previous work on text mining for mutation impacts resulted in (i) the development of a GATE-based pipeline that mines texts for information about impacts of mutations on proteins, (ii) the population of this information into our OWL DL mutation impact ontology, and (iii) establishing an experimental semantic database for storing the results of text mining.
RESULTS: This article explores the possibility of using the SADI framework as a medium for publishing our mutation impact software and data. SADI is a set of conventions for creating web services with semantic descriptions that facilitate automatic discovery and orchestration. We describe a case study exploring and demonstrating the utility of the SADI approach in our context. We describe several SADI services we created based on our text mining API and data, and demonstrate how they can be used in a number of biologically meaningful scenarios through a SPARQL interface (SHARE) to SADI services. In all cases we pay special attention to the integration of mutation impact services with external SADI services providing information about related biological entities, such as proteins, pathways, and drugs.
CONCLUSION: We have identified that SADI provides an effective way of exposing our mutation impact data such that it can be leveraged by a variety of stakeholders in multiple use cases. The solutions we provide for our use cases can serve as examples to potential SADI adopters trying to solve similar integration problems.
Affiliation(s)
- Alexandre Riazanov
  - Department of Computer Science & Applied Statistics, University of New Brunswick, Saint John, New Brunswick, E2L 4L5, Canada
- Jonas Bergman Laurila
  - Department of Computer Science & Applied Statistics, University of New Brunswick, Saint John, New Brunswick, E2L 4L5, Canada
- Christopher JO Baker
  - Department of Computer Science & Applied Statistics, University of New Brunswick, Saint John, New Brunswick, E2L 4L5, Canada
4
Abstract
Structural biology is rapidly accumulating a wealth of detailed information about protein function, binding sites, RNA, large assemblies and molecular motions. These data are increasingly of interest to a broader community of life scientists, not just structural experts. Visualization is a primary means for accessing and using these data, yet visualization is also a stumbling block that prevents many life scientists from benefiting from three-dimensional structural data. In this review, we focus on key biological questions where visualizing three-dimensional structures can provide insight and describe available methods and tools.
5
Izarzugaza JMG, Baresic A, McMillan LEM, Yeats C, Clegg AB, Orengo CA, Martin ACR, Valencia A. An integrated approach to the interpretation of single amino acid polymorphisms within the framework of CATH and Gene3D. BMC Bioinformatics 2009; 10 Suppl 8:S5. [PMID: 19758469 PMCID: PMC2745587 DOI: 10.1186/1471-2105-10-s8-s5] [Indexed: 01/18/2023]
Abstract
BACKGROUND: The phenotypic effects of sequence variations in protein-coding regions come about primarily via their effects on the resulting structures, for example by disrupting active sites or affecting structural stability. To better understand the mechanisms behind known mutant phenotypes, and to predict the effects of novel variations, biologists need tools to gauge the impacts of DNA mutations in terms of their structural manifestation. Although many mutations occur within domains whose structure has been solved, many more occur within genes whose protein products have not been structurally characterized.
RESULTS: Here we present 3DSim (3D Structural Implication of Mutations), a database and web application facilitating the localization and visualization of single amino acid polymorphisms (SAAPs) mapped to protein structures even where the structure of the protein of interest is unknown. The server displays information on 6514 point mutations, 4865 of them known to be associated with disease. These polymorphisms are drawn from SAAPdb, which aggregates data from various sources including dbSNP and several pathogenic mutation databases. While the SAAPdb interface displays mutations on known structures, 3DSim projects mutations onto known sequence domains in Gene3D. This resource contains sequences annotated with domains predicted to belong to structural families in the CATH database. Mappings between domain sequences in Gene3D and known structures in CATH are obtained using a MUSCLE alignment. 1210 three-dimensional structures corresponding to CATH structural domains are currently included in 3DSim; these domains are distributed across 396 CATH superfamilies, and provide a comprehensive overview of the distribution of mutations in structural space.
CONCLUSION: The server is publicly available at http://3DSim.bioinfo.cnio.es/. In addition, the database containing the mapping between SAAPdb, Gene3D and CATH is available on request and most of the functionality is available through programmatic web service access.
Affiliation(s)
- Jose M G Izarzugaza
- Institute of Structural and Molecular Biology, University College London, UK.
6
Kanagasabai R, Choo KH, Ranganathan S, Baker CJO. A workflow for mutation extraction and structure annotation. J Bioinform Comput Biol 2008; 5:1319-37. [PMID: 18172931 DOI: 10.1142/s0219720007003119] [Received: 07/23/2007] [Revised: 09/11/2007] [Accepted: 09/30/2007] [Indexed: 11/18/2022]
Abstract
Rich information on point mutation studies is scattered across heterogeneous data sources. This paper presents an automated workflow for mining mutation annotations from full-text biomedical literature using natural language processing (NLP) techniques as well as for their subsequent reuse in protein structure annotation and visualization. This system, called mSTRAP (Mutation extraction and STRucture Annotation Pipeline), is designed for both information aggregation and subsequent brokerage of the mutation annotations. It facilitates the coordination of semantically related information from a series of text mining and sequence analysis steps into a formal OWL-DL ontology. The ontology is designed to support application-specific data management of sequence, structure, and literature annotations that are populated as instances of object and data type properties. mSTRAPviz is a subsystem that facilitates the brokerage of structure information and the associated mutations for visualization. For mutated sequences without any corresponding structure available in the Protein Data Bank (PDB), an automated pipeline for homology modeling is developed to generate the theoretical model. With mSTRAP, we demonstrate a workable system that can facilitate automation of the workflow for the retrieval, extraction, processing, and visualization of mutation annotations -- tasks which are well known to be tedious, time-consuming, complex, and error-prone. The ontology and visualization tool are available at (http://datam.i2r.a-star.edu.sg/mstrap).