1
|
Yamada I, Campbell MP, Edwards N, Castro LJ, Lisacek F, Mariethoz J, Ono T, Ranzinger R, Shinmachi D, Aoki-Kinoshita KF. The glycoconjugate ontology (GlycoCoO) for standardizing the annotation of glycoconjugate data and its application. Glycobiology 2021; 31:741-750. [PMID: 33677548 PMCID: PMC8351504 DOI: 10.1093/glycob/cwab013] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2020] [Revised: 12/31/2020] [Accepted: 01/01/2021] [Indexed: 01/19/2023] Open
Abstract
Recent years have seen great advances in the development of glycoproteomics protocols and methods resulting in a sustainable increase in the reporting proteins, their attached glycans and glycosylation sites. However, only very few of these reports find their way into databases or data repositories. One of the major reasons is the absence of digital standard to represent glycoproteins and the challenging annotations with glycans. Depending on the experimental method, such a standard must be able to represent glycans as complete structures or as compositions, store not just single glycans but also represent glycoforms on a specific glycosylation side, deal with partially missing site information if no site mapping was performed, and store abundances or ratios of glycans within a glycoform of a specific site. To support the above, we have developed the GlycoConjugate Ontology (GlycoCoO) as a standard semantic framework to describe and represent glycoproteomics data. GlycoCoO can be used to represent glycoproteomics data in triplestores and can serve as a basis for data exchange formats. The ontology, database providers and supporting documentation are available online (https://github.com/glycoinfo/GlycoCoO).
Collapse
Affiliation(s)
- Issaku Yamada
- Research Department, The Noguchi Institute, 1-9-7 Kaga, Itabashi, Tokyo 173-0003, Japan
| | - Matthew P Campbell
- Institute for Glycomics, Griffith University at Gold Coast, Southport, QLD 4215, Australia
| | - Nathan Edwards
- Department of Biochemistry, Molecular and Cellular Biology, Georgetown University Medical Center, Washington, D.C. 20007, USA
| | - Leyla Jael Castro
- ZB MED Information Centre for Life Sciences, Gleueler Str. 60, 50931 Cologne, Germany
| | - Frederique Lisacek
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Computer Science Department, University of Geneva, route de Drize 7, CH - 1227 Geneva Switzerland, and also Section of Biology, University of Geneva, Geneva, Switzerland
| | - Julien Mariethoz
- Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, 7 Route de Drize, 1227 Geneva, Switzerland
| | - Tamiko Ono
- Faculty of Science and Engineering, Soka University, 1-236 Tangi-machi, Hachioji, Tokyo 192-8577, Japan
| | - Rene Ranzinger
- Complex Carbohydrate Research Center, The University of Georgia, 315 Riverbend Rd, Athens, Georgia 30602, USA
| | - Daisuke Shinmachi
- R&D Department, SparqLite LLC., 1615-22 Ishikawamachi, Hachioji, Tokyo 192-0032, Japan
| | - Kiyoko F Aoki-Kinoshita
- Glycan & Life Science Integration Center (GaLSIC), Faculty of Science and Engineering, Soka University, 1-236 Tangi-machi, Hachioji, Tokyo 192-8577, Japan
| |
Collapse
|
2
|
Sima AC, Mendes de Farias T, Zbinden E, Anisimova M, Gil M, Stockinger H, Stockinger K, Robinson-Rechavi M, Dessimoz C. Enabling semantic queries across federated bioinformatics databases. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2019:5614223. [PMID: 31697362 PMCID: PMC6836710 DOI: 10.1093/database/baz106] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 06/28/2019] [Revised: 08/01/2019] [Accepted: 08/02/2019] [Indexed: 11/23/2022]
Abstract
Motivation: Data integration promises to be one of the main catalysts in enabling new insights to be drawn from the wealth of biological data available publicly. However, the heterogeneity of the different data sources, both at the syntactic and the semantic level, still poses significant challenges for achieving interoperability among biological databases. Results: We introduce an ontology-based federated approach for data integration. We applied this approach to three heterogeneous data stores that span different areas of biological knowledge: (i) Bgee, a gene expression relational database; (ii) Orthologous Matrix (OMA), a Hierarchical Data Format 5 orthology DS; and (iii) UniProtKB, a Resource Description Framework (RDF) store containing protein sequence and functional information. To enable federated queries across these sources, we first defined a new semantic model for gene expression called GenEx. We then show how the relational data in Bgee can be expressed as a virtual RDF graph, instantiating GenEx, through dedicated relational-to-RDF mappings. By applying these mappings, Bgee data are now accessible through a public SPARQL endpoint. Similarly, the materialized RDF data of OMA, expressed in terms of the Orthology ontology, is made available in a public SPARQL endpoint. We identified and formally described intersection points (i.e. virtual links) among the three data sources. These allow performing joint queries across the data stores. Finally, we lay the groundwork to enable nontechnical users to benefit from the integrated data, by providing a natural language template-based search interface.
Collapse
Affiliation(s)
- Ana Claudia Sima
- ZHAW Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur Switzerland.,Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland.,SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Tarcisio Mendes de Farias
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland.,SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland.,Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
| | - Erich Zbinden
- ZHAW Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur Switzerland.,SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Maria Anisimova
- ZHAW Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur Switzerland.,SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Manuel Gil
- ZHAW Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur Switzerland.,SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Heinz Stockinger
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland
| | - Kurt Stockinger
- ZHAW Zurich University of Applied Sciences, Obere Kirchgasse 2, 8400 Winterthur Switzerland
| | - Marc Robinson-Rechavi
- SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland.,Department of Ecology and Evolution, University of Lausanne, 1015 Lausanne, Switzerland
| | - Christophe Dessimoz
- Department of Computational Biology, University of Lausanne, 1015 Lausanne, Switzerland.,Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland.,SIB Swiss Institute of Bioinformatics, 1015 Lausanne, Switzerland.,Department of Genetics, Evolution, and Environment, University College London, Gower St, London WC1E 6BT, UK.,Department of Computer Science, University College London, Gower St, London WC1E 6BT, UK
| |
Collapse
|
3
|
Liu J, Yang M, Zhang L, Zhou W. An effective biomedical data migration tool from resource description framework to JSON. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION 2020; 2019:5538640. [PMID: 31343683 PMCID: PMC6657663 DOI: 10.1093/database/baz088] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/21/2019] [Revised: 06/06/2019] [Accepted: 06/09/2019] [Indexed: 12/28/2022]
Abstract
Resource Description Framework (RDF) is widely used for representing biomedical data in practical applications. With the increases of RDF-based applications, there is an emerging requirement of novel architectures to provide effective supports for the future RDF data explosion. Inspired by the success of the new designs in National Center for Biotechnology Information dbSNP (The Single Nucleotide Polymorphism Database) for managing the increasing data volumes using JSON (JavaScript Object Notation), in this paper we present an effective mapping tool that allows data migrations from RDF to JSON for supporting future massive data explosions and releases. We firstly introduce a set of mapping rules, which transform an RDF format into the JSON format, and then present the corresponding transformation algorithm. On this basis, we develop an effective and user-friendly tool called RDF2JSON, which enables automating the process of RDF data extractions and the corresponding JSON data generations.
Collapse
Affiliation(s)
- Jian Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Mo Yang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Lei Zhang
- Zhejiang University of Science and Technology, Hangzhou, China
| | - Weijun Zhou
- Department of Hematology, Zhujiang Hospital, Southern Medical University, Guangzhou, China
| |
Collapse
|