1
|
Corrigendum to: The glycoconjugate ontology (GlycoCoO) for standardizing the annotation of glycoconjugate data and its application. Glycobiology 2021; 32:909. [PMID: 34379754 PMCID: PMC9487897 DOI: 10.1093/glycob/cwab065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2020] [Revised: 12/31/2020] [Accepted: 01/01/2021] [Indexed: 12/01/2022] Open
|
2
|
Enhancing the interoperability of glycan data flow between ChEBI, PubChem, and GlyGen. Glycobiology 2021; 31:1510-1519. [PMID: 34314492 DOI: 10.1093/glycob/cwab078] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 03/26/2021] [Revised: 07/02/2021] [Accepted: 07/18/2021] [Indexed: 11/13/2022] Open
Abstract
Glycans play a vital role in health, disease, bioenergy, biomaterials, and bio-therapeutics. As a result, there is keen interest in identifying and increasing glycan data in bioinformatics databases such as ChEBI and PubChem, and in connecting them to resources at the EMBL-EBI and NCBI to facilitate access to important annotations at a global level. GlyTouCan is a comprehensive archival database that contains glycans obtained primarily through batch upload from glycan repositories, glycoprotein databases, and individual laboratories. In many instances, the glycan structures deposited in GlyTouCan may not be fully defined or may lack supporting experimental evidence and citations. Databases like ChEBI and PubChem were designed to accommodate complete atomistic structures with well-defined chemical linkages. As a result, they cannot easily accommodate the structural ambiguity inherent in glycan databases. Consequently, there is a need to organize glycan data more coherently to enhance connectivity across the major NCBI, EMBL-EBI, and glycoscience databases. This paper outlines a workflow developed in collaboration between GlyGen, ChEBI, and PubChem to improve the visibility and connectivity of glycan data across these resources. GlyGen hosts a subset of glycans (~29,000) from the GlyTouCan database and has submitted valuable glycan annotations to the PubChem database and integrated over 10,500 (including ambiguously defined) glycans into the ChEBI database. The integrated glycans were prioritized based on links to PubChem and connectivity to glycoprotein data. The pipeline provides a blueprint for how glycan data can be harmonized between different resources. The current PubChem, ChEBI, and GlyTouCan mappings can be downloaded from GlyGen (https://data.glygen.org).
|
3
|
The glycoconjugate ontology (GlycoCoO) for standardizing the annotation of glycoconjugate data and its application. Glycobiology 2021; 31:741-750. [PMID: 33677548 PMCID: PMC8351504 DOI: 10.1093/glycob/cwab013] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 04/03/2020] [Revised: 12/31/2020] [Accepted: 01/01/2021] [Indexed: 01/19/2023] Open
Abstract
Recent years have seen great advances in the development of glycoproteomics protocols and methods, resulting in a sustainable increase in the reporting of proteins, their attached glycans and glycosylation sites. However, only very few of these reports find their way into databases or data repositories. One of the major reasons is the absence of a digital standard to represent glycoproteins and the challenging annotations with glycans. Depending on the experimental method, such a standard must be able to represent glycans as complete structures or as compositions, store not just single glycans but also represent glycoforms on a specific glycosylation site, deal with partially missing site information if no site mapping was performed, and store abundances or ratios of glycans within a glycoform of a specific site. To support the above, we have developed the GlycoConjugate Ontology (GlycoCoO) as a standard semantic framework to describe and represent glycoproteomics data. GlycoCoO can be used to represent glycoproteomics data in triplestores and can serve as a basis for data exchange formats. The ontology, database providers and supporting documentation are available online (https://github.com/glycoinfo/GlycoCoO).
|
4
|
GlyGen data model and processing workflow. Bioinformatics 2020; 36:3941-3943. [PMID: 32324859 PMCID: PMC7320628 DOI: 10.1093/bioinformatics/btaa238] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Received: 01/29/2020] [Revised: 03/31/2020] [Accepted: 04/16/2020] [Indexed: 11/18/2022] Open
Abstract
Summary Glycoinformatics plays a major role in glycobiology research, and the development of a comprehensive glycoinformatics knowledgebase is critical. This application note describes the GlyGen data model, processing workflow and the data access interfaces featuring programmatic use case example queries based on specific biological questions. The GlyGen project is a data integration, harmonization and dissemination project for carbohydrate and glycoconjugate-related data retrieved from multiple international data sources including UniProtKB, GlyTouCan, UniCarbKB and other key resources. Availability and implementation GlyGen web portal is freely available to access at https://glygen.org. The data portal, web services, SPARQL endpoint and GitHub repository are also freely available at https://data.glygen.org, https://api.glygen.org, https://sparql.glygen.org and https://github.com/glygener, respectively. All code is released under license GNU General Public License version 3 (GNU GPLv3) and is available on GitHub https://github.com/glygener. The datasets are made available under Creative Commons Attribution 4.0 International (CC BY 4.0) license. Supplementary information Supplementary data are available at Bioinformatics online.
|
5
|
A consensus-based and readable extension of Linear Code for Reaction Rules (LiCoRR). Beilstein J Org Chem 2020; 16:2645-2662. [PMID: 33178355 PMCID: PMC7607430 DOI: 10.3762/bjoc.16.215] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 06/01/2020] [Accepted: 09/17/2020] [Indexed: 12/18/2022] Open
Abstract
Systems glycobiology aims to provide models and analysis tools that account for the biosynthesis, regulation, and interactions with glycoconjugates. To facilitate these methods, there is a need for a clear glycan representation accessible to both computers and humans. Linear Code, a linearized and readily parsable glycan structure representation, is such a language. For this reason, Linear Code was adapted to represent reaction rules, but the syntax has drifted from its original description to accommodate new and originally unforeseen challenges. Here, we delineate the consensuses and inconsistencies that have arisen through this adaptation. We recommend options for a consensus-based extension of Linear Code that can be used for reaction rule specification going forward. Through this extension and specification of Linear Code to reaction rules, we aim to minimize inconsistent symbology thereby making glycan database queries easier. With a clear guide for generating reaction rule descriptions, glycan synthesis models will be more interoperable and reproducible thereby moving glycoinformatics closer to compliance with FAIR standards. Here, we present Linear Code for Reaction Rules (LiCoRR), version 1.0, an unambiguous representation for describing glycosylation reactions in both literature and code.
|
6
|
GlyGen: Computational and Informatics Resources for Glycoscience. Glycobiology 2020; 30:72-73. [PMID: 31616925 DOI: 10.1093/glycob/cwz080] [Citation(s) in RCA: 87] [Impact Index Per Article: 21.8] [Received: 09/09/2019] [Revised: 09/19/2019] [Accepted: 09/19/2019] [Indexed: 11/13/2022] Open
|
7
|
SPRINT-Gly: predicting N- and O-linked glycosylation sites of human and mouse proteins by using sequence and predicted structural properties. Bioinformatics 2020; 35:4140-4146. [PMID: 30903686 DOI: 10.1093/bioinformatics/btz215] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 10/23/2018] [Revised: 03/03/2019] [Accepted: 03/21/2019] [Indexed: 12/19/2022] Open
Abstract
MOTIVATION Protein glycosylation is one of the most abundant post-translational modifications and plays an important role in immune responses, intercellular signaling, inflammation and host-pathogen interactions. However, due to the poor ionization efficiency and microheterogeneity of glycopeptides, identifying glycosylation sites is a challenging task, and there is a demand for computational methods. Here, we constructed the largest dataset of human and mouse glycosylation sites to train deep learning neural networks and support vector machine classifiers to predict N-/O-linked glycosylation sites, respectively. RESULTS The method, called SPRINT-Gly, achieved consistent results between ten-fold cross validation and independent test for predicting human and mouse glycosylation sites. For N-glycosylation, a mouse-trained model performs equally well in human glycoproteins and vice versa; however, due to significant differences in O-linked sites, separate models were generated. Overall, SPRINT-Gly is 18% and 50% higher in Matthews correlation coefficient than the next best method compared in N-linked and O-linked sites, respectively. This improved performance is due to the inclusion of novel structure and sequence-based features. AVAILABILITY AND IMPLEMENTATION http://sparks-lab.org/server/SPRINT-Gly/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
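For readers unfamiliar with the metric quoted in this abstract, the Matthews correlation coefficient (MCC) of a binary classifier can be computed from its confusion matrix roughly as sketched below. This is an illustration only: the function names are hypothetical and none of the numbers come from the SPRINT-Gly paper. Reading the reported 18% and 50% as relative differences in MCC is also an assumption.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from the four cells of a binary confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


def relative_gain(new_mcc: float, old_mcc: float) -> float:
    """Relative improvement of one MCC over another (assumed reading of 'x% higher')."""
    return (new_mcc - old_mcc) / old_mcc


# Hypothetical counts for a site predictor, chosen only to exercise the function.
mcc = matthews_corrcoef(tp=70, tn=900, fp=60, fn=30)
print(f"MCC = {mcc:.3f}")
```

MCC is often preferred over accuracy for glycosylation-site prediction because true sites are rare relative to candidate residues, and MCC stays informative under that class imbalance.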
|
8
|
Recent advances in glycoinformatic platforms for glycomics and glycoproteomics. Curr Opin Struct Biol 2020. [PMID: 31874386 DOI: 10.1016/j.sbi.2019.11.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/19/2023]
Abstract
Protein glycosylation is the most complex and prevalent post-translational modification in terms of the number of proteins modified and the diversity generated. To understand the functional roles of glycoproteins it is important to gain an insight into the repertoire of oligosaccharides present. The comparison and relative quantitation of glycoforms combined with site-specific identification and occupancy are necessary steps in this direction. Computational platforms have continued to mature, assisting researchers with the interpretation of such glycomics and glycoproteomics data sets, but they frequently support only dedicated workflows, and users rely on the manual interpretation of data to gain insights into the glycoproteome. The growth of site-specific knowledge has also led to the implementation of machine-learning algorithms to predict glycosylation, which is now being integrated into glycoproteomics pipelines. This short review describes commercial and open-access databases and software with an emphasis on those that are actively maintained and designed to support current analytical workflows.
|
9
|
Expanding the capillary electrophoresis-based glucose unit database of the GUcal app. Glycobiology 2020; 30:362-364. [PMID: 31829415 DOI: 10.1093/glycob/cwz102] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 09/03/2019] [Revised: 11/22/2019] [Accepted: 12/03/2019] [Indexed: 02/02/2023] Open
Abstract
GUcal is a standalone application for automatically calculating the glucose unit (GU) values for separated N-glycan components of interest in an electropherogram and suggests their tentative structures by utilizing an internal database. We have expanded the original database of GUcal by integrating all publicly available capillary electrophoresis (CE) data in the GlycoStore collection (https://www.glycostore.org) and with in-house measured GU values. The GUcal app is freely available online (https://www.gucal.hu) and readily facilitates CE-based high throughput GU value determination for first line structural elucidation.
|
10
|
Recent advances in glycoinformatic platforms for glycomics and glycoproteomics. Curr Opin Struct Biol 2019; 62:56-69. [PMID: 31874386 DOI: 10.1016/j.sbi.2019.11.009] [Citation(s) in RCA: 61] [Impact Index Per Article: 12.2] [Received: 09/22/2019] [Revised: 11/05/2019] [Accepted: 11/15/2019] [Indexed: 12/16/2022]
|
11
|
The minimum information required for a glycomics experiment (MIRAGE) project: LC guidelines. Glycobiology 2019; 29:349-354. [PMID: 30778580 DOI: 10.1093/glycob/cwz009] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 01/09/2019] [Revised: 02/11/2019] [Accepted: 02/13/2019] [Indexed: 11/13/2022] Open
Abstract
The Minimum Information Required for a Glycomics Experiment (MIRAGE) is an initiative created by experts in the fields of glycobiology, glycoanalytics and glycoinformatics to design guidelines that improve the reporting and reproducibility of glycoanalytical methods. Previously, the MIRAGE Commission has published guidelines for describing sample preparation methods and the reporting of glycan array and mass spectrometry techniques and data collections. Here, we present the first version of guidelines that aim to improve the quality of the reporting of liquid chromatography (LC) glycan data in the scientific literature. These guidelines cover all aspects of instrument setup and modality of data handling and manipulation and are cross-linked with other MIRAGE recommendations. The most recent version of the MIRAGE-LC guidelines is freely available at the MIRAGE project website doi:10.3762/mirage.4.
|
12
|
GlycoStore: a database of retention properties for glycan analysis. Bioinformatics 2019; 34:3231-3232. [PMID: 29897488 DOI: 10.1093/bioinformatics/bty319] [Citation(s) in RCA: 59] [Impact Index Per Article: 11.8] [Received: 10/10/2017] [Accepted: 04/19/2018] [Indexed: 11/12/2022] Open
Abstract
Summary GlycoStore is a curated chromatographic, electrophoretic and mass-spectrometry composition database of N-, O-, glycosphingolipid (GSL) glycans and free oligosaccharides associated with a range of glycoproteins, glycolipids and biotherapeutics. The database is built on publicly available experimental datasets from GlycoBase developed in the Oxford Glycobiology Institute and then the National Institute for Bioprocessing Research and Training (NIBRT). It has now been extended to include recently published and in-house data collections from the Bioprocessing Technology Institute (BTI) A*STAR, Macquarie University and Ludger Ltd. GlycoStore provides access to approximately 850 unique glycan structure entries supported by over 8500 retention positions determined by: (i) hydrophilic interaction chromatography (HILIC) ultra-high performance liquid chromatography (U/HPLC) and reversed phase (RP)-U/HPLC with fluorescent detection; (ii) porous graphitized carbon (PGC) chromatography in combination with ESI-MS/MS detection; and (iii) capillary electrophoresis with laser induced fluorescence detection (CE-LIF). GlycoStore enhances many features previously available in GlycoBase while addressing the limitations of the data collections and model of this popular resource. GlycoStore aims to support detailed glycan analysis by providing a resource that underpins current workflows. It will be regularly updated by expert annotation of published data and data obtained from the project partners. Availability and implementation http://www.glycostore.org. Supplementary information Supplementary data are available at Bioinformatics online.
|
13
|
GlyTouCan: an accessible glycan structure repository. Glycobiology 2017; 27:915-919. [PMID: 28922742 PMCID: PMC5881658 DOI: 10.1093/glycob/cwx066] [Citation(s) in RCA: 103] [Impact Index Per Article: 14.7] [Received: 06/02/2017] [Revised: 07/06/2017] [Accepted: 07/07/2017] [Indexed: 11/12/2022] Open
Abstract
Rapid and continued growth in the generation of glycomic data has revealed the need for enhanced development of basic infrastructure for presenting and interpreting these datasets in a manner that engages the broader biomedical research community. Early in their growth, the genomic and proteomic fields implemented mechanisms for assigning unique gene and protein identifiers that were essential for organizing data presentation and for enhancing bioinformatic approaches to extracting knowledge. Similar unique identifiers are currently absent from glycomic data. In order to facilitate continued growth and expanded accessibility of glycomic data, the authors strongly encourage the glycomics community to coordinate the submission of their glycan structures to the GlyTouCan Repository and to make use of GlyTouCan identifiers in their communications and publications. The authors also deeply encourage journals to recommend a submission workflow in which submitted publications utilize GlyTouCan identifiers as a standard reference for explicitly describing glycan structures cited in manuscripts.
|
14
|
Building a PGC-LC-MS N-glycan retention library and elution mapping resource. Glycoconj J 2017; 35:15-29. [PMID: 28905148 DOI: 10.1007/s10719-017-9793-4] [Citation(s) in RCA: 78] [Impact Index Per Article: 11.1] [Received: 05/01/2017] [Revised: 08/10/2017] [Accepted: 08/18/2017] [Indexed: 11/27/2022]
Abstract
Porous graphitised carbon-liquid chromatography (PGC-LC) has been proven to be a powerful technique for the analysis and characterisation of complex mixtures of isomeric and isobaric glycan structures. Here we evaluate the elution behaviour of N-glycans on PGC-LC and thereby provide the potential of using chromatographic separation properties, together with mass spectrometry (MS) fragmentation, to determine glycan structure assignments more easily. We used previously reported N-glycan structures released from the purified glycoproteins Immunoglobulin G (IgG), Immunoglobulin A (IgA), lactoferrin, α1-acid glycoprotein, Ribonuclease B (RNase B), fetuin and ovalbumin to profile their behaviour on capillary PGC-LC-MS. Over 100 glycan structures were determined by MS/MS, and together with targeted exoglycosidase digestions, created a N-glycan PGC retention library covering a full spectrum of biologically significant N-glycans from pauci mannose to sialylated tetra-antennary classes. The resultant PGC retention library ( http://www.glycostore.org/showPgc ) incorporates retention times and supporting fragmentation spectra including exoglycosidase digestion products, and provides detailed knowledge on the elution properties of N-glycans by PGC-LC. Consequently, this platform should serve as a valuable resource for facilitating the detailed analysis of the glycosylation of both purified recombinant, and complex mixtures of, glycoproteins using established workflows.
|
15
|
A Review of Software Applications and Databases for the Interpretation of Glycopeptide Data. TRENDS GLYCOSCI GLYC 2017. [DOI: 10.4052/tigg.1601.1e] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/20/2022]
|
16
|
Abstract
UniCarbKB ( http://unicarbkb.org ) is a comprehensive resource for mammalian glycoprotein and annotation data. In particular, the database provides information on the oligosaccharides characterized from a glycoprotein at either the global or site-specific level. This evidence is accumulated from a peer-reviewed and manually curated collection of information on oligosaccharides derived from membrane and secreted glycoproteins purified from biological fluids and/or tissues. This information is further supplemented with experimental method descriptions that summarize important sample preparation and analytical strategies. A new release of UniCarbKB is published every three months, each includes a collection of curated data and improvements to database functionality. In this Chapter, we outline the objectives of UniCarbKB, and describe a selection of step-by-step workflows for navigating the information available. We also provide a short description of web services available and future plans for improving data access. The information presented in this Chapter supplements content available in our knowledgebase including regular updates on interface improvements, new features, and revisions to the database content ( http://confluence.unicarbkb.org ).
|
17
|
Abstract
The access to biodatabases for glycomics and glycoproteomics has proven to be essential for current glycobiological research. This chapter presents available databases that are devoted to different aspects of glycobioinformatics. This includes oligosaccharide sequence databases, experimental databases, 3D structure databases (of both glycans and glycorelated proteins) and association of glycans with tissue, disease, and proteins. Specific search protocols are also provided using tools associated with experimental databases for converting primary glycoanalytical data to glycan structural information. In particular, researchers using glycoanalysis methods by U/HPLC (GlycoBase), MS (GlycoWorkbench, UniCarb-DB, GlycoDigest), and NMR (CASPER) will benefit from this chapter. In addition we also include information on how to utilize glycan structural information to query databases that associate glycans with proteins (UniCarbKB) and with interactions with pathogens (SugarBind).
|
18
|
Hall effect in charged conducting ferroelectric domain walls. Nat Commun 2016; 7:13764. [PMID: 27941794 PMCID: PMC5159852 DOI: 10.1038/ncomms13764] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Received: 05/19/2016] [Accepted: 10/31/2016] [Indexed: 11/17/2022] Open
Abstract
Enhanced conductivity at specific domain walls in ferroelectrics is now an established phenomenon. Surprisingly, however, little is known about the most fundamental aspects of conduction. Carrier types, densities and mobilities have not been determined and transport mechanisms are still a matter of guesswork. Here we demonstrate that intermittent-contact atomic force microscopy (AFM) can detect the Hall effect in conducting domain walls. Studying YbMnO₃ single crystals, we have confirmed that p-type conduction occurs in tail-to-tail charged domain walls. By calibration of the AFM signal, an upper estimate of ∼1 × 10¹⁶ cm⁻³ is calculated for the mobile carrier density in the wall, around four orders of magnitude below that required for complete screening of the polar discontinuity. A carrier mobility of ∼50 cm² V⁻¹ s⁻¹ is calculated, about an order of magnitude below equivalent carrier mobilities in p-type silicon, but sufficiently high to preclude carrier-lattice coupling associated with small polarons.
Conduction in ferroelectric domain walls is now an established phenomenon, yet fundamental aspects of transport physics remain elusive. Here, Campbell et al. report the type, density and mobility of carriers in conducting domain walls in ytterbium manganite using nanoscale Hall effect measurements.
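As a rough illustration of how the two quoted figures combine, the sketch below estimates a wall conductivity from the carrier density and mobility. It assumes a simple single-carrier drift relation, σ = n·e·μ, which the abstract does not state; the function and constant names are hypothetical. Because the density is reported as an upper estimate, the result should be read as an upper bound under that assumption.

```python
# Elementary charge in coulombs (exact SI value).
ELEMENTARY_CHARGE_C = 1.602176634e-19


def drift_conductivity(density_per_cm3: float, mobility_cm2_per_vs: float) -> float:
    """Single-carrier drift conductivity, sigma = n * e * mu, in S/cm.

    Assumption: one carrier type dominates, as in the p-type walls described.
    """
    return density_per_cm3 * ELEMENTARY_CHARGE_C * mobility_cm2_per_vs


# Values quoted in the abstract: n up to ~1e16 cm^-3, mu ~50 cm^2 V^-1 s^-1.
sigma = drift_conductivity(1e16, 50.0)
print(f"sigma <= {sigma:.3f} S/cm")  # about 0.08 S/cm
```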
|
19
|
The minimum information required for a glycomics experiment (MIRAGE) project: improving the standards for reporting glycan microarray-based data. Glycobiology 2016; 27:280-284. [PMID: 27993942 PMCID: PMC5444268 DOI: 10.1093/glycob/cww118] [Citation(s) in RCA: 50] [Impact Index Per Article: 6.3] [Received: 10/24/2016] [Revised: 11/14/2016] [Accepted: 11/21/2016] [Indexed: 11/12/2022] Open
Abstract
MIRAGE (Minimum Information Required for A Glycomics Experiment) is an initiative that was created by experts in the fields of glycobiology, glycoanalytics and glycoinformatics to produce guidelines for reporting results from the diverse types of experiments and analyses used in structural and functional studies of glycans in the scientific literature. As a sequel to the guidelines for sample preparation (Struwe et al. 2016, Glycobiology, 26:907–910) and mass spectrometry data (Kolarich et al. 2013, Mol. Cell Proteomics, 12:991–995), here we present the first version of guidelines intended to improve the standards for reporting data from glycan microarray analyses. For each of eight areas in the workflow of a glycan microarray experiment, we provide guidelines for the minimal information that should be provided in reporting results. We hope that the MIRAGE glycan microarray guidelines proposed here will gain broad acceptance by the community, and will facilitate interpretation and reproducibility of the glycan microarray results with implications in comparison of data from different laboratories and eventual deposition of glycan microarray data in international databases.
|
20
|
The minimum information required for a glycomics experiment (MIRAGE) project: sample preparation guidelines for reliable reporting of glycomics datasets. Glycobiology 2016; 26:907-910. [PMID: 27654115 DOI: 10.1093/glycob/cww082] [Citation(s) in RCA: 53] [Impact Index Per Article: 6.6] [Received: 11/06/2015] [Accepted: 04/22/2016] [Indexed: 11/13/2022] Open
Abstract
The minimum information required for a glycomics experiment (MIRAGE) project was established in 2011 to provide guidelines to aid in data reporting from all types of experiments in glycomics research including mass spectrometry (MS), liquid chromatography, glycan arrays, data handling and sample preparation. MIRAGE is a concerted effort of the wider glycomics community that considers the adaptation of reporting guidelines as an important step towards critical evaluation and dissemination of datasets as well as broadening of experimental techniques worldwide. The MIRAGE Commission published reporting guidelines for MS data and here we outline guidelines for sample preparation. The sample preparation guidelines include all aspects of sample generation, purification and modification from biological and/or synthetic carbohydrate material. The application of MIRAGE sample preparation guidelines will lead to improved recording of experimental protocols and reporting of understandable and reproducible glycomics datasets.
|
21
|
Comprehensive analysis of the N-glycan biosynthetic pathway using bioinformatics to generate UniCorn: A theoretical N-glycan structure database. Carbohydr Res 2016; 431:56-63. [PMID: 27318307 DOI: 10.1016/j.carres.2016.05.012] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 12/22/2015] [Revised: 05/23/2016] [Accepted: 05/29/2016] [Indexed: 02/06/2023]
Abstract
Glycan structures attached to proteins are comprised of diverse monosaccharide sequences and linkages that are produced from precursor nucleotide-sugars by a series of glycosyltransferases. Databases of these structures are an essential resource for the interpretation of analytical data and the development of bioinformatics tools. However, with no template to predict what structures are possible, the human glycan structure databases are incomplete and rely heavily on the curation of published, experimentally determined glycan structure data. In this work, a library of 45 human glycosyltransferases was used to generate a theoretical database of N-glycan structures comprised of 15 or fewer monosaccharide residues. Enzyme specificities were sourced from major online databases including Kyoto Encyclopedia of Genes and Genomes (KEGG) Glycan, Consortium for Functional Glycomics (CFG), Carbohydrate-Active enZymes (CAZy), GlycoGene DataBase (GGDB) and BRENDA. Based on the known activities, more than 1.1 million theoretical structures and 4.7 million synthetic reactions were generated and stored in our database called UniCorn. Furthermore, we analyzed the differences between the predicted glycan structures in UniCorn and those contained in UniCarbKB (www.unicarbkb.org), a database which stores experimentally described glycan structures reported in the literature, and demonstrate that UniCorn can be used to aid in the assignment of ambiguous structures whilst also serving as a discovery database.
|
22
|
UniCarbKB: New database features for integrating glycan structure abundance, compositional glycoproteomics data, and disease associations. Biochim Biophys Acta Gen Subj 2016; 1860:1669-75. [PMID: 26940363 DOI: 10.1016/j.bbagen.2016.02.016] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 01/07/2016] [Revised: 02/23/2016] [Accepted: 02/24/2016] [Indexed: 10/22/2022]
Abstract
BACKGROUND UniCarbKB aims to provide a resource for the representation of mammalian glycobiology knowledge by providing a curated database of structural and experimental data, supported by a web application that allows users to easily find and view richly annotated information. The database comprises two levels of annotation (i) global-specific data of oligosaccharides released and characterised from single purified glycoproteins and (ii) information pertaining to site-specific glycan heterogeneity. Additional, contextual information is provided including structural, bibliographic, and taxonomic information for each entry. METHODS Since the launch of UniCarbKB in 2012, we have continued to improve the organisation of our data model. Recently, we have extended our pipeline to collate structural and abundance changes of oligosaccharides in different human disease states and experimental models to extend our coverage of the human glycome. RESULTS In this manuscript, we demonstrate the capability of UniCarbKB to store and query relative glycan abundance data using a set of published colorectal and prostate cancer cell lines as examples. Furthermore, we outline our strategy for managing large-scale glycoproteomics data, site-specific and glycan compositional data, and how this information is adding value to UniCarbKB. Finally, we summarise our efforts to improve the efficient representation of disease terms and associated changes in glycan heterogeneity by integrating the Disease Ontology. CONCLUSIONS Updates and improvements to UniCarbKB have introduced unique features for storing and displaying glycosylation features of mammalian glycoproteins. The integration of site-specific glycosylation data obtained from large-scale glycoproteomics and introduction of cell line studies will improve the analysis of glycoproteins and entire glycomes. GENERAL SIGNIFICANCE Continuing advancements in analytical technologies and new data types are advancing disease-related glycomics. It is increasingly necessary to ensure all the data are comprehensively annotated. UniCarbKB was established with the mission of providing a resource for human glycobiology by capturing a wide range of data with corresponding annotations. This article is part of a Special Issue entitled "Glycans in personalised medicine" Guest Editor: Professor Gordan Lauc.
|
23
|
SugarBindDB, a resource of glycan-mediated host-pathogen interactions. Nucleic Acids Res 2016; 44:D1243-50. [PMID: 26578555 PMCID: PMC4702881 DOI: 10.1093/nar/gkv1247] [Citation(s) in RCA: 34] [Impact Index Per Article: 4.3] [Received: 08/27/2015] [Revised: 10/22/2015] [Accepted: 10/31/2015] [Indexed: 12/16/2022] Open
Abstract
The SugarBind Database (SugarBindDB) covers knowledge of glycan binding of human pathogen lectins and adhesins. It is a curated database; each glycan-protein binding pair is associated with at least one published reference. The core data element of SugarBindDB is a set of three inseparable components: the pathogenic agent, a lectin/adhesin and a glycan ligand. Each entity (agent, lectin or ligand) is described by a range of properties that are summarized in an entity-dedicated page. Several search, navigation and visualisation tools are implemented to investigate the functional role of glycans in pathogen binding. The database is cross-linked to protein and glycan-related resources such as UniProtKB and UniCarbKB. It is tightly bound to the latter via a substructure search tool that maps each ligand to full structures where it occurs. Thus, a glycan-lectin binding pair of SugarBindDB can lead to the identification of a glycan-mediated protein-protein interaction, that is, a lectin-glycoprotein interaction, via substructure search and the knowledge of site-specific glycosylation stored in UniCarbKB. SugarBindDB is accessible at: http://sugarbind.expasy.org.
|
24
|
Property Graph vs RDF Triple Store: A Comparison on Glycan Substructure Search. PLoS One 2015; 10:e0144578. [PMID: 26656740 PMCID: PMC4684231 DOI: 10.1371/journal.pone.0144578] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 07/16/2015] [Accepted: 11/22/2015] [Indexed: 11/18/2022] Open
Abstract
Resource description framework (RDF) and Property Graph databases are emerging technologies that are used for storing graph-structured data. We compare these technologies through a molecular biology use case: glycan substructure search. Glycans are branched tree-like molecules composed of building blocks linked together by chemical bonds. The molecular structure of a glycan can be encoded into a directed acyclic graph where each node represents a building block and each edge serves as a chemical linkage between two building blocks. In this context, Graph databases are possible software solutions for storing glycan structures and Graph query languages, such as SPARQL and Cypher, can be used to perform a substructure search. Glycan substructure searching is an important feature for querying structure and experimental glycan databases and retrieving biologically meaningful data. This applies, for example, to identifying a region of the glycan recognised by a glycan binding protein (GBP). In this study, 19,404 glycan structures were selected from GlycomeDB (www.glycome-db.org) and modelled for storage in an RDF triple store and a Property Graph. We then performed two different sets of searches and compared the query response times and the results from both technologies to assess performance and accuracy. The two implementations produced the same results, but interestingly we noted a difference in the query response times. Qualitative measures such as portability were also used to define further criteria for choosing the technology adapted to solving glycan substructure search and other comparable issues.
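The graph encoding described in this abstract can be illustrated with a small in-memory sketch. It is not the RDF or Property Graph model used in the study: the Residue class, the linkage labels and the example structures are all hypothetical, chosen only to show what "substructure search" means when each node is a building block and each edge is a linkage.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Residue:
    """One building block; children are (linkage label, child residue) pairs."""
    name: str
    children: List[Tuple[str, "Residue"]] = field(default_factory=list)


def matches_at(pattern: Residue, target: Residue) -> bool:
    """True if the pattern can be embedded with its root placed on this target node."""
    if pattern.name != target.name:
        return False
    return _assign(pattern.children, target.children)


def _assign(p_children, t_children) -> bool:
    """Backtracking: map every pattern branch onto a distinct target branch."""
    if not p_children:
        return True
    (p_link, p_node), rest = p_children[0], p_children[1:]
    for i, (t_link, t_node) in enumerate(t_children):
        if p_link == t_link and matches_at(p_node, t_node):
            if _assign(rest, t_children[:i] + t_children[i + 1:]):
                return True
    return False


def contains(target: Residue, pattern: Residue) -> bool:
    """True if the pattern occurs anywhere in the target tree."""
    if matches_at(pattern, target):
        return True
    return any(contains(child, pattern) for _, child in target.children)


# Hypothetical example: a branched core written from the reducing end outwards.
core = Residue("GlcNAc", [("b1-4", Residue("GlcNAc", [("b1-4", Residue("Man", [
    ("a1-3", Residue("Man")),
    ("a1-6", Residue("Man")),
]))]))])

branch_pattern = Residue("Man", [("a1-3", Residue("Man")), ("a1-6", Residue("Man"))])
absent_pattern = Residue("Man", [("a1-2", Residue("Man"))])

print(contains(core, branch_pattern))  # branched motif is present
print(contains(core, absent_pattern))  # a1-2 linkage does not occur
```

A graph database would express the same question declaratively (for instance as a SPARQL or Cypher pattern) rather than with explicit recursion; the sketch only makes the matching rule itself concrete.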
|
25
|
GlycoMob: an ion mobility-mass spectrometry collision cross section database for glycomics. Glycoconj J 2015; 33:399-404. [PMID: 26314736 DOI: 10.1007/s10719-015-9613-7] [Citation(s) in RCA: 61] [Impact Index Per Article: 6.8] [Received: 06/11/2015] [Revised: 07/21/2015] [Accepted: 07/27/2015] [Indexed: 12/29/2022]
Abstract
Ion mobility mass spectrometry (IM-MS) is a promising analytical technique for glycomics that separates glycan ions based on their collision cross section (CCS) and provides glycan precursor and fragment masses. It has been shown that isomeric oligosaccharide species can be separated by IM and identified on the basis of their CCS and fragmentation. These results indicate that adding CCS information for glycans and glycan fragments to searchable databases and analysis pipelines will increase identification confidence and accuracy. We have developed a freely accessible database, GlycoMob (http://www.glycomob.org), containing over 900 CCS values of glycans, oligosaccharide standards and their fragments that will be continually updated. We have measured the absolute CCSs of calibration standards, biologically derived and synthetic N-glycans ionized with various adducts in positive and negative mode or as protonated (positive ion) and deprotonated (negative ion) ions.
|
26
|
GlycoRDF: an ontology to standardize glycomics data in RDF. Bioinformatics 2015; 31:919-25. [PMID: 25388145 PMCID: PMC4380026 DOI: 10.1093/bioinformatics/btu732] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 08/28/2014] [Revised: 10/12/2014] [Accepted: 10/28/2014] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Over the last decades several glycomics-based bioinformatics resources and databases have been created and released to the public. Unfortunately, there is no common standard in the representation of the stored information or a common machine-readable interface allowing bioinformatics groups to easily extract and cross-reference the stored information. RESULTS An international group of bioinformatics experts in the field of glycomics have worked together to create a standard Resource Description Framework (RDF) representation for glycomics data, focused on glycan sequences and related biological source, publications and experimental data. This RDF standard is defined by the GlycoRDF ontology and will be used by database providers to generate common machine-readable exports of the data stored in their databases. AVAILABILITY AND IMPLEMENTATION The ontology, supporting documentation and source code used by database providers to generate standardized RDF are available online (http://www.glycoinfo.org/GlycoRDF/).
|
27
|
Abstract
The biological relevance of protein glycosylation has made glycomics, the comprehensive study to identify all glycans in an organism, indispensable in many research fields. Determining the structure and functional relationship of glycoproteins requires the comprehensive characterization of glycan structures by a range of analytical methods. High performance liquid chromatography (HPLC) is a well-established technology commonly used for the complete structural elucidation of N- and O-linked glycans; however, the analysis of data is a major bottleneck and robust bioinformatic solutions are required. This chapter describes the availability of databases and tools, GlycoBase and autoGU developed in conjunction with the EUROCarbDB initiative, to assist the interpretation of HPLC-glycan data collections.
|
28
|
GlycoDigest: a tool for the targeted use of exoglycosidase digestions in glycan structure determination. Bioinformatics 2014; 30:3131-3. [PMID: 25015990 DOI: 10.1093/bioinformatics/btu425] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Indexed: 11/14/2022] Open
Abstract
Sequencing oligosaccharides by exoglycosidases, either sequentially or in an array format, is a powerful tool to unambiguously determine the structure of complex N- and O-linked glycans. Here, we introduce GlycoDigest, a tool that simulates exoglycosidase digestion, based on controlled rules acquired from expert knowledge and experimental evidence available in GlycoBase. The tool allows the targeted design of glycosidase enzyme mixtures by allowing researchers to model the action of exoglycosidases, thereby validating and improving the efficiency and accuracy of glycan analysis. AVAILABILITY AND IMPLEMENTATION http://www.glycodigest.org.
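The idea of simulating exoglycosidase digestion with rules can be sketched as follows. This is not GlycoDigest's rule set or code: the tuple-based glycan encoding, the digest function and the two illustrative rules (remove any terminal Neu5Ac; remove terminal b1-4 linked Gal) are assumptions made for the example. The one property the sketch relies on is that an exoglycosidase acts only on terminal (non-reducing end) residues.

```python
# A glycan is encoded as (residue name, [(linkage, child glycan), ...]),
# written from the reducing end outwards. This encoding is hypothetical.

def digest_once(glycan, residue, linkage=None):
    """Remove terminal residues that match the rule, in a single pass."""
    name, children = glycan
    kept = []
    for link, child in children:
        child_name, grandchildren = child
        is_terminal = not grandchildren
        rule_applies = child_name == residue and (linkage is None or link == linkage)
        if is_terminal and rule_applies:
            continue  # the enzyme cleaves this exposed residue
        kept.append((link, digest_once(child, residue, linkage)))
    return (name, kept)


def digest(glycan, residue, linkage=None):
    """Repeat passes until nothing changes, since cleavage can expose new termini."""
    while True:
        trimmed = digest_once(glycan, residue, linkage)
        if trimmed == glycan:
            return trimmed
        glycan = trimmed


# Illustrative antenna: GlcNAc carrying Gal, capped by Neu5Ac.
antenna = ("GlcNAc", [("b1-4", ("Gal", [("a2-3", ("Neu5Ac", []))]))])

# Galactosidase rule alone does nothing: Gal is capped, so it is not terminal.
after_gal_only = digest(antenna, "Gal", "b1-4")

# Sialidase rule first, then the galactosidase rule, trims both residues.
after_sialidase = digest(antenna, "Neu5Ac")
after_both = digest(after_sialidase, "Gal", "b1-4")

print(after_gal_only)
print(after_sialidase)
print(after_both)
```

Comparing the products of different enzyme orders or mixtures in this way is what makes such a simulation useful for planning a digestion panel before running it.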
|
29
|
Abstract
The MIRAGE (minimum information required for a glycomics experiment) initiative was founded in Seattle, WA, in November 2011 in order to develop guidelines for reporting the qualitative and quantitative results obtained by diverse types of glycomics analyses, including the conditions and techniques that were applied to prepare the glycans for analysis and generate the primary data along with the tools and parameters that were used to process and annotate this data. These guidelines must address a broad range of issues, as glycomics data are inherently complex and are generated using diverse methods, including mass spectrometry (MS), chromatography, glycan array-binding assays, nuclear magnetic resonance (NMR) and other rapidly developing technologies. The acceptance of these guidelines by scientists conducting research on biological systems in which glycans have a significant role will facilitate the evaluation and reproduction of glycomics experiments and data that is reported in scientific journals and uploaded to glycomics databases. As a first step, MIRAGE guidelines for glycan analysis by MS have been recently published (Kolarich D, Rapp E, Struwe WB, Haslam SM, Zaia J., et al. 2013. The minimum information required for a glycomics experiment (MIRAGE) project – Improving the standards for reporting mass spectrometry-based glycoanalytic data. Mol. Cell Proteomics. 12:991–995), allowing them to be implemented and evaluated in the context of real-world glycobiology research. In this paper, we set out the historical context, organization structure and overarching objectives of the MIRAGE initiative.
|
30
|
BioHackathon series in 2011 and 2012: penetration of ontology and linked data in life science domains. J Biomed Semantics 2014; 5:5. [PMID: 24495517 PMCID: PMC3978116 DOI: 10.1186/2041-1480-5-5] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Received: 05/16/2013] [Accepted: 11/26/2013] [Indexed: 01/24/2023] Open
Abstract
The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.
|
31
|
Abstract
Background Recent progress in method development for characterising the branched structures of complex carbohydrates has now enabled higher throughput technology. Automation of structure analysis then calls for software development since adding meaning to large data collections in reasonable time requires corresponding bioinformatics methods and tools. Current glycobioinformatics resources do cover information on the structure and function of glycans, their interaction with proteins or their enzymatic synthesis. However, this information is partial, scattered and often difficult to find for non-glycobiologists. Methods Following our diagnosis of the causes of the slow development of glycobioinformatics, we review the "objective" difficulties encountered in defining adequate formats for representing complex entities and developing efficient analysis software. Results Various solutions already implemented and strategies defined to bridge glycobiology with different fields and integrate the heterogeneous glyco-related information are presented. Conclusions Despite the initial stage of our integrative efforts, this paper highlights the rapid expansion of glycomics, the validity of existing resources and the bright future of glycobioinformatics.
|
32
|
Abstract
BACKGROUND Glycoscience is a research field focusing on complex carbohydrates (otherwise known as glycans), which can, for example, serve as "switches" that toggle between different functions of a glycoprotein or glycolipid. Due to the advancement of glycomics technologies that are used to characterize glycan structures, many glycomics databases are now publicly available and provide useful information for glycoscience research. However, these databases have almost no link to other life science databases. RESULTS In order to implement support for the Semantic Web most efficiently for glycomics research, the developers of major glycomics databases agreed on a minimal standard for representing glycan structure and annotation information using RDF (Resource Description Framework). Moreover, all of the participants implemented this standard prototype and generated preliminary RDF versions of their data. To test the utility of the converted data, all of the data sets were uploaded into a Virtuoso triple store, and several SPARQL queries were tested as "proofs-of-concept" to illustrate the utility of the Semantic Web in querying across databases which were originally difficult to implement. CONCLUSIONS We were able to successfully retrieve information by linking UniCarbKB, GlycomeDB and JCGGDB in a single SPARQL query to obtain our target information. We also tested queries linking UniProt with GlycoEpitope as well as lectin data with GlycomeDB through PDB. As a result, we have been able to link proteomics data with glycomics data through the implementation of Semantic Web technologies, allowing for more flexible queries across these domains.
|
33
|
Abstract
The UniCarb KnowledgeBase (UniCarbKB; http://unicarbkb.org) offers public access to a growing, curated database of information on the glycan structures of glycoproteins. UniCarbKB is an international effort that aims to further our understanding of structures, pathways and networks involved in glycosylation and glyco-mediated processes by integrating structural, experimental and functional glycoscience information. This initiative builds upon the success of the glycan structure database GlycoSuiteDB, together with the informatic standards introduced by EUROCarbDB, to provide a high-quality and updated resource to support glycomics and glycoproteomics research. UniCarbKB provides comprehensive information concerning glycan structures, and published glycoprotein information including global and site-specific attachment information. For the first release over 890 references, 3740 glycan structure entries and 400 glycoproteins have been curated. Further, 598 protein glycosylation sites have been annotated with experimentally confirmed glycan structures from the literature. Among these are 35 glycoproteins, 502 structures and 60 publications previously not included in GlycoSuiteDB. This article provides an update on the transformation of GlycoSuiteDB (featured in previous NAR Database issues and hosted by ExPASy since 2009) to UniCarbKB and its integration with UniProtKB and GlycoMod. Here, we introduce a refactored database, supported by substantial new curated data collections and intuitive user-interfaces that improve database searching.
|
34
|
Structural feature ions for distinguishing N- and O-linked glycan isomers by LC-ESI-IT MS/MS. J Am Soc Mass Spectrom 2013; 24:895-906. [PMID: 23605685 DOI: 10.1007/s13361-013-0610-4] [Citation(s) in RCA: 91] [Impact Index Per Article: 8.3] [Received: 10/18/2012] [Revised: 02/22/2013] [Accepted: 02/28/2013] [Indexed: 05/13/2023]
Abstract
Glycomics is the comprehensive study of glycan expression in an organism, cell, or tissue that relies on effective analytical technologies to understand glycan structure-function relationships. Owing to the macro- and micro-heterogeneity of oligosaccharides, detailed structure characterization has required an orthogonal approach, such as a combination of specific exoglycosidase digestions, LC-MS/MS, and the development of bioinformatic resources to comprehensively profile a complex biological sample. Liquid chromatography-electrospray ionization-mass spectrometry (LC-ESI-MS/MS) has emerged as a key tool in the structural analysis of oligosaccharides because of its high sensitivity, resolution, and robustness. Here, we present a strategy that uses LC-ESI-MS/MS to characterize over 200 N- and O-glycans from human saliva glycoproteins, complemented by sequential exoglycosidase treatment, to further verify the annotated glycan structures. Fragment-specific substructure diagnostic ions were collated from an extensive screen of the literature available on the detailed structural characterization of oligosaccharides and, together with other specific glycan structure feature ions derived from cross-ring and glycosidic-linkage fragmentation, were used to characterize the glycans and differentiate isomers. The availability of such annotated mass spectrometric fragmentation spectral libraries of glycan structures, together with such substructure diagnostic ions, will be key inputs for the future development of the automated elucidation of oligosaccharide structures from MS/MS data.
|
35
|
Glycosylation status of serum in inflammatory arthritis in response to anti-TNF treatment. Rheumatology (Oxford) 2013; 52:1572-82. [PMID: 23681398 DOI: 10.1093/rheumatology/ket189] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Indexed: 11/14/2022] Open
Abstract
OBJECTIVE Glycosylation is the most common post-translational modification and is altered in disease. The typical glycosylation change in patients with inflammatory arthritis (IA) is a decrease in galactosylation levels on IgG. The aim of this study is to evaluate the effect of anti-TNF therapy on whole serum glycosylation from IA patients and determine whether these alterations in the glycome change upon treatment of the disease. METHODS Serum samples were collected from 54 IA patients before treatment and at 1 and 12 months after commencing anti-TNF therapy. N-linked glycans from whole serum samples were analysed using a high-throughput hydrophilic interaction liquid chromatography-based method. RESULTS Glycosylation on the serum proteins of IA patients changed significantly with anti-TNF treatment. We observed an increase in galactosylated glycans from IgG, also an increase in core-fucosylated biantennary galactosylated glycans and a decrease in sialylated triantennary glycans with and without outer arm fucose. This increase in galactosylated IgG glycans suggests a reversing of the N-glycome towards normal healthy profiles. These changes are strongly correlated with decreasing CRP, suggesting a link between glycosylation changes and decreases in inflammatory processes. CONCLUSION Glycosylation changes in the serum of IA patients on anti-TNF therapy are strongly associated with a decrease in inflammatory processes and reflect the effect of anti-TNF on the immune system.
|
36
|
Tandem mass spectra of glycan substructures enable the multistage mass spectrometric identification of determinants on oligosaccharides. Rapid Commun Mass Spectrom 2013; 27:931-939. [PMID: 23592194 DOI: 10.1002/rcm.6527] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 12/04/2012] [Accepted: 01/29/2013] [Indexed: 06/02/2023]
Abstract
RATIONALE Glycosylation of proteins and lipids affects many biological processes, such as host-pathogen interactions, cell communication, and initiation of the immune responses. Terminal glycan substructures, or determinants, often govern the function or recognition of the carrier glycoconjugate and modulate these processes. In this study we describe a strategy using multistage mass spectrometry to identify and confirm these glycan substructures. METHODS An online tandem mass spectrometry (MS²) spectral fragment library of glycan substructures that typically occur at the non-reducing terminus of glycoconjugates was created to enable the easier identification and confirmation of glycan determinants on oligosaccharides released from glycoproteins. Oligosaccharides were separated by porous graphitized carbon capillary chromatography and analysed by ion trap MS. Candidate product ions that constitute the glycan substructure mass were identified in the MS² product ion spectrum, and used as the precursor ion for subsequent MS³ fragmentation. The resulting MS³ spectrum was matched against the MS² spectral fragment library to identify the glycan substructure(s) that comprise the parent oligosaccharide. RESULTS Thirty biologically important terminal glycan determinants commonly observed on glycoconjugates were fragmented by positive and negative ion mass spectrometry and the MS² product ion masses manually annotated and stored in the UniCarb-DB online database. Negative ion tandem mass spectra were especially useful in assigning isobaric glycan structures. We have applied this strategy for the identification of the sulphation, blood group antigens and sialic acid linkages on complex N- and O-glycans released from glycoproteins. CONCLUSIONS We show the potential of these glycan substructure MS² spectra in the negative ionization mode to facilitate the assignment of determinants on N- and O-linked glycans released from glycoproteins.
Comparing the structural feature ions of known glycan reference substructures assists in the annotation of complex glycan product ion spectra, and can remove the need for other orthogonal confirmation analyses such as sequential glycosidase digestion.
|
37
|
Greedy feature selection for glycan chromatography data with the generalized Dirichlet distribution. BMC Bioinformatics 2013; 14:155. [PMID: 23651459 PMCID: PMC3703279 DOI: 10.1186/1471-2105-14-155] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/05/2012] [Accepted: 03/20/2013] [Indexed: 11/25/2022] Open
Abstract
Background Glycoproteins are involved in a diverse range of biochemical and biological processes. Changes in protein glycosylation are believed to occur in many diseases, particularly during cancer initiation and progression. The identification of biomarkers for human disease states is becoming increasingly important, as early detection is key to improving survival and recovery rates. To this end, the serum glycome has been proposed as a potential source of biomarkers for different types of cancers. High-throughput hydrophilic interaction liquid chromatography (HILIC) technology for glycan analysis allows for the detailed quantification of the glycan content in human serum. However, the experimental data from this analysis is compositional by nature. Compositional data are subject to a constant-sum constraint, which restricts the sample space to a simplex. Statistical analysis of glycan chromatography datasets should account for their unusual mathematical properties. As the volume of glycan HILIC data being produced increases, there is a considerable need for a framework to support appropriate statistical analysis. Proposed here is a methodology for feature selection in compositional data. The principal objective is to provide a template for the analysis of glycan chromatography data that may be used to identify potential glycan biomarkers. Results A greedy search algorithm, based on the generalized Dirichlet distribution, is carried out over the feature space to search for the set of “grouping variables” that best discriminate between known group structures in the data, modelling the compositional variables using beta distributions. The algorithm is applied to two glycan chromatography datasets. Statistical classification methods are used to test the ability of the selected features to differentiate between known groups in the data. Two well-known methods are used for comparison: correlation-based feature selection (CFS) and recursive partitioning (rpart). 
CFS is a feature selection method, while recursive partitioning is a learning tree algorithm that has been used for feature selection in the past. Conclusions The proposed feature selection method performs well for both glycan chromatography datasets. It is computationally slower, but results in a lower misclassification rate and a higher sensitivity rate than both correlation-based feature selection and the classification tree method.
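The constant-sum constraint mentioned in this abstract can be made concrete with a few lines of code. The sketch below is not the authors' feature selection algorithm; it only shows the closure step that turns raw peak areas into relative proportions, using made-up numbers, and why the resulting values are not free to vary independently.

```python
def closure(peak_areas):
    """Rescale non-negative peak areas so that they sum to 1 (a point on the simplex)."""
    total = sum(peak_areas)
    if total <= 0:
        raise ValueError("peak areas must have a positive total")
    return [area / total for area in peak_areas]


# Hypothetical chromatogram with three glycan peaks.
before = closure([30.0, 50.0, 20.0])

# Only the first peak changes in absolute terms...
after = closure([60.0, 50.0, 20.0])

# ...yet the relative abundances of the other two peaks fall as well,
# because all proportions are tied together by the unit-sum constraint.
print(before)
print(after)
```

This coupling is why standard methods that assume unconstrained variables can mislead on relative-abundance data, and why the abstract argues for models defined on the simplex.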
|
38
|
Validation of the curation pipeline of UniCarb-DB: building a global glycan reference MS/MS repository. Biochim Biophys Acta 2013; 1844:108-16. [PMID: 23624262 DOI: 10.1016/j.bbapap.2013.04.018] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 10/31/2012] [Revised: 04/01/2013] [Accepted: 04/16/2013] [Indexed: 10/26/2022]
Abstract
The UniCarb-DB database is an emerging public glycomics data repository, containing over 500 tandem mass spectra (as of March 2013) of glycans released from glycoproteins. A major challenge in glycomics research is to provide and maintain high-quality datasets that will offer the necessary diversity to support the development of accurate bioinformatics tools for data deposition and analysis. The role of UniCarb-DB, as an archival database, is to provide the glycomics community with open access to a comprehensive LC-MS/MS library of N- and O-linked glycans released from glycoproteins that have been annotated with glycosidic and cross-ring fragmentation ions, retention times, and associated experimental metadata descriptions. Here, we introduce the UniCarb-DB data submission pipeline and its practical application to construct a library of LC-MS/MS glycan standards that forms part of this database. In this context, an independent consortium of three laboratories was established to analyze the same 23 commercially available oligosaccharide standards, all by using graphitized carbon-liquid chromatography (LC) electrospray ionization (ESI) ion trap mass spectrometry in the negative ion mode. A dot product score was calculated for each spectrum in the three sets of data as a measure of the comparability that is necessary for use of such a collection in library-based spectral matching and glycan structural identification. The effects of charge state, de-isotoping and threshold levels on the quality of the input data are shown. The provision of well-characterized oligosaccharide fragmentation data provides the opportunity to identify determinants of specific glycan structures, and will contribute to the confidence level of algorithms that assign glycan structures to experimental MS/MS spectra. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
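A dot product score between two spectra, as mentioned in this abstract, is commonly computed as a normalized dot product (cosine similarity) of binned intensities. The abstract does not give the exact formula or preprocessing used by the authors, so the binning width, the function names and the peak lists below are assumptions for illustration only.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def dot_product_score(spectrum_a, spectrum_b, bin_width=1.0):
    """Normalized dot product: 1.0 for identical peak patterns, 0.0 for no shared bins."""
    a = bin_spectrum(spectrum_a, bin_width)
    b = bin_spectrum(spectrum_b, bin_width)
    numerator = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return numerator / (norm_a * norm_b)


# Hypothetical MS/MS peak lists as (m/z, intensity) pairs.
lab_one = [(204.1, 40.0), (366.1, 100.0), (528.2, 25.0)]
lab_two = [(204.1, 35.0), (366.1, 100.0), (528.2, 30.0)]
unrelated = [(150.0, 80.0), (290.1, 60.0)]

print(dot_product_score(lab_one, lab_two))    # close to 1: similar fragmentation
print(dot_product_score(lab_one, unrelated))  # 0.0: no bins in common
```

Because the score depends on which peaks survive preprocessing, choices such as intensity thresholds and de-isotoping, which the abstract lists as factors studied, change the value obtained for the same pair of spectra.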
|
39
|
The minimum information required for a glycomics experiment (MIRAGE) project: improving the standards for reporting mass-spectrometry-based glycoanalytic data. Mol Cell Proteomics 2013; 12:991-5. [PMID: 23378518 DOI: 10.1074/mcp.o112.026492] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.1] [Indexed: 11/06/2022] Open
Abstract
The MIRAGE guidelines are being developed in response to a critical need in the glycobiology community to clarify glycoanalytic results so that they are more readily evaluated (in terms of their scope and depth) and to facilitate the reproduction of important results in the laboratory. The molecular and biological complexity of the glycosylation process makes thorough reporting of the results of a glycomics experiment a highly challenging endeavor. The resulting data specify the identity and quantity of complex structures, the precise molecular features of which are sometimes inferred using prior knowledge, such as familiarity with a particular biosynthetic mechanism. Specifying the exact methods and assumptions that were used to assign and quantify reported structures allows the interested scientist to appreciate the scope and depth of the analysis. Mass spectrometry (MS) is the most widely used tool for glycomics experiments. The interpretation and reproducibility of MS-based glycomics data depend on comprehensive meta-data describing the instrumentation, instrument setup, and data acquisition protocols. The MIRAGE guidelines for MS-based glycomics have been designed to facilitate the collection and sharing of this critical information in order to assist the glycoanalyst in generating data sets with maximum information content and biological relevance.
|
40
|
Symbol nomenclature for representing glycan structures: Extension to cover different carbohydrate types. Proteomics 2011; 11:4291-5. [PMID: 21954138 DOI: 10.1002/pmic.201100300] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Received: 06/02/2011] [Revised: 08/08/2011] [Accepted: 08/10/2011] [Indexed: 11/10/2022]
Abstract
This Viewpoint article addresses comments made on our original article describing a symbolic system for the depiction of N- and O-linked carbohydrate structures and proposes a method for extending the symbol set to include monosaccharides commonly found in carbohydrates present in bacteria and plants. As before, basic monosaccharides are shown by shape with one or more additions such as solid fill or additions of lines, crosses or dots to represent functional groups. The use of colour to differentiate constituent monosaccharides is avoided, thus enabling the system to be used in a variety of formats. Linkage and anomericity are shown by the angle and type of line connecting the symbols. In this extended version, new symbols are proposed for additional hexoses and it is proposed that pyranose and furanose forms of the monosaccharides could be shown by solid or broken outlines to the symbols. Conventions for depicting the presence of multiple functional groups such as deoxy-(NH₂)₂ are also discussed. It is hoped that these proposals will stimulate discussion so that a consensus can be reached as to how the glycobiology community can best convey complex information in a simple manner.
|
41
|
UniCarbKB: Putting the pieces together for glycomics research. Proteomics 2011; 11:4117-21. [DOI: 10.1002/pmic.201100302] [Citation(s) in RCA: 48] [Impact Index Per Article: 3.7] [Received: 06/03/2011] [Revised: 07/27/2011] [Accepted: 07/29/2011] [Indexed: 12/21/2022]
|
42
|
N-glycans modulate the function of human corticosteroid-binding globulin. Mol Cell Proteomics 2011; 10:M111.009100. [PMID: 21558494 DOI: 10.1074/mcp.m111.009100] [Citation(s) in RCA: 57] [Impact Index Per Article: 4.4] [Indexed: 11/06/2022] Open
Abstract
Human corticosteroid-binding globulin (CBG), a heavily glycosylated protein containing six N-linked glycosylation sites, transports cortisol and other corticosteroids in blood circulation. Here, we investigate the biological importance of the N-glycans of CBG derived from human serum by performing a structural and functional characterization of CBG N-glycosylation. Liquid chromatography-tandem MS-based glycoproteomics and glycomics combined with exoglycosidase treatment revealed 26 complex type N-glycoforms, all of which were terminated with α2,3-linked neuraminic acid (NeuAc) residues. The CBG N-glycans showed predominantly bi- and tri-antennary branching, but higher branching was also observed. N-glycans from all six N-glycosylation sites were identified with high site occupancies (70.5-99.5%) and glycoforms from all sites contained a relatively low degree of core-fucosylation (0-34.9%). CBG showed site-specific glycosylation and the site-to-site differences in core-fucosylation and branching could be in silico correlated with the accessibility to the individual glycosylation sites on the maturely folded protein. Deglycosylated and desialylated CBG analogs were generated to investigate the biological importance of CBG N-glycans. As a functional assay, MCF-7 cells were challenged with native and glycan-modified CBG and the amount of cAMP, which is produced as a quantitative response upon CBG binding to its cell surface receptor, was used to evaluate the CBG:receptor interaction. The removal of both CBG N-glycans and NeuAc residues increased the production of cAMP significantly. This confirms that N-glycans are involved in the CBG:receptor interaction and indicates that the modulation is performed by steric and/or electrostatic means through the terminal NeuAc residues.
|
43
|
EUROCarbDB: An open-access platform for glycoinformatics. Glycobiology 2011; 21:493-502. [PMID: 21106561 PMCID: PMC3055595 DOI: 10.1093/glycob/cwq188] [Citation(s) in RCA: 104] [Impact Index Per Article: 8.0] [Received: 08/19/2010] [Revised: 11/03/2010] [Accepted: 11/03/2010] [Indexed: 01/03/2023] Open
Abstract
The EUROCarbDB project is a design study for a technical framework, which provides sophisticated, freely accessible, open-source informatics tools and databases to support glycobiology and glycomic research. EUROCarbDB is a relational database containing glycan structures, their biological context and, when available, primary and interpreted analytical data from high-performance liquid chromatography, mass spectrometry and nuclear magnetic resonance experiments. Database content can be accessed via a web-based user interface. The database is complemented by a suite of glycoinformatics tools, specifically designed to assist the elucidation and submission of glycan structure and experimental data when used in conjunction with contemporary carbohydrate research workflows. All software tools and source code are licensed under the terms of the Lesser General Public License, and publicly contributed structures and data are freely accessible. The public test version of the web interface to the EUROCarbDB can be found at http://www.ebi.ac.uk/eurocarb.
|
44
|
|
45
|
Connectivity and binding-site recognition: Applications relevant to drug design. J Comput Chem 2010; 31:2677-88. [DOI: 10.1002/jcc.21561] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/11/2022]
|
46
|
GlycoExtractor: A Web-Based Interface for High Throughput Processing of HPLC-Glycan Data. J Proteome Res 2010; 9:2037-41. [DOI: 10.1021/pr901213u] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Indexed: 11/28/2022]
|
47
|
Abstract
SUMMARY: The development of robust high-performance liquid chromatography (HPLC) technologies continues to improve the detailed analysis and sequencing of glycan structures released from glycoproteins. Here, we present a database (GlycoBase) and analytical tool (autoGU) to assist the interpretation and assignment of HPLC-glycan profiles. GlycoBase is a relational database which contains the HPLC elution positions for over 350 2-AB labelled N-glycan structures together with predicted products of exoglycosidase digestions. AutoGU assigns provisional structures to each integrated HPLC peak and, when used in combination with exoglycosidase digestions, progressively assigns each structure automatically based on the footprint data. These tools are potentially very promising and facilitate basic research as well as the quantitative high-throughput analysis of low concentrations of glycans released from glycoproteins. AVAILABILITY: http://glycobase.ucd.ie
|
48
|
HPLC-based analysis of serum N-glycans on a 96-well plate platform with dedicated database software. Anal Biochem 2007; 376:1-12. [PMID: 18194658 DOI: 10.1016/j.ab.2007.12.012] [Citation(s) in RCA: 373] [Impact Index Per Article: 21.9] [Received: 08/30/2007] [Revised: 12/04/2007] [Accepted: 12/06/2007] [Indexed: 11/29/2022]
Abstract
We present a robust, fully automatable technology platform that includes computer software for the detailed analysis of low femtomoles of N-linked sugars released from glycoproteins. Features include (i) sample immobilization in 96-well plates, glycan release, and fluorescent labeling; (ii) quantitative HPLC analysis, including monosaccharide sequence, linkage, and arm-specific information for charged and neutral glycans; (iii) automatic structural assignment of peaks from HPLC profiles via web-based software that accesses our database (GlycoBase) of more than 350 N-glycan structures, including 117 present in the human serum glycome; and (iv) software (autoGU) that progressively analyzes data from exoglycosidase digestions to produce a refined list of final structures. The N-glycans from a plate of 96 samples can be released and purified in 2 or 3 days and profiled in 2 days. This strategy can be used for (i) identification and screening of disease biomarkers and (ii) monitoring the production of therapeutic glycoproteins, allowing optimization of production conditions. This technology is also suitable for preparing released glycans for other analytical techniques. Here we demonstrate its application to rheumatoid arthritis using 5 µl of patient serum.
|
49
|
Abstract
Evolutionary trace (ET) and entropy are two related methods for analyzing a multiple sequence alignment to determine functionally important residues in proteins. In this article, these methods have been enhanced with a view to reinvestigate the issue of GPCR dimerization and oligomerization. In particular, cluster analysis has replaced the subjective visual analysis element of the original ET method. Previous applications of the ET method predicted two dimerization interfaces on the external transmembrane lipid-facing region of GPCRs; these were discussed in terms of dimerization and linear oligomers. Removing the subjective element of the ET method gives rise to the prediction of functionally important residues on the external face of each transmembrane helix for a large number of class A GPCRs. These results are consistent with a growing body of experimental information that, taken over many receptor subtypes, has implicated each transmembrane helix in dimeric interactions. In this application, entropy gave superior results to those obtained from the ET method in that its use gives rise to higher z-scores and fewer instances of z-scores below 3.
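As an illustration of the entropy idea mentioned in this abstract, the sketch below computes a per-column Shannon entropy for a multiple sequence alignment. It is a minimal example under stated assumptions: the abstract does not give the authors' exact formulation (log base, gap handling, clustering, or how z-scores are derived), and the toy alignment and the name `column_entropy` are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column.

    Assumption: every character, including any gap symbol, is treated as
    an ordinary residue; the abstract does not say how gaps are handled.
    """
    counts = Counter(column)
    n = len(column)
    # sum over residues of p * log2(1/p), with p = count / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Hypothetical toy alignment: four sequences, four columns.
alignment = ["ACDE", "ACDF", "ACEG", "ACEH"]

# zip(*alignment) yields the alignment one column at a time.
entropies = [column_entropy(col) for col in zip(*alignment)]
```

In this toy case the two fully conserved columns score 0 bits, the column split evenly between two residues scores 1 bit, and the column with four different residues scores 2 bits. Low entropy marks conserved positions, which is the usual starting point for flagging candidate functionally important residues.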
|
50
|
The impact of diaphragm management on prolonged ventilator support after thoracoabdominal aortic repair. J Vasc Surg 1999; 29:150-6. [PMID: 9882799 DOI: 10.1016/s0741-5214(99)70356-3] [Citation(s) in RCA: 46] [Impact Index Per Article: 1.8] [Indexed: 11/29/2022]
Abstract
PURPOSE: The relationship of the division of the diaphragm during thoracoabdominal aortic repair to prolonged ventilator support has not been studied. The purpose of this study was (1) to determine whether preservation of diaphragm integrity has a significant effect on postoperative ventilator duration and (2) to elucidate other pulmonary risk factors related to thoracoabdominal aortic surgery and to study the relationship of these factors to the intact diaphragm technique.
METHODS: Between February 1991 and January 1997, we repaired 397 descending and thoracoabdominal aortic aneurysms. Descending thoracic aneurysms were not included in the study because their repair does not include the diaphragm. A total of 256 patients participated in this study. The diaphragm was divided in 150 patients and left intact in 106 patients. Examined as potential risk factors were patient demographics, history and physical findings, aneurysm extent, urgency of the procedure, acute dissection, cross-clamp time, homologous and autologous blood product consumption, and adjunctive operative techniques. FEV1 also was considered in the 197 patients for whom preoperative spirometry was available. Prolonged mechanical ventilation was defined as ventilator support for >72 hours. Data were analyzed by univariate contingency table and multiple logistic regression methods.
RESULTS: Increasing age (odds ratio [OR], 1.02/y; P <.02), current smoking (OR, 2.6; P <.0008), total cross-clamp time (OR, 1.0/min; P <.008), units packed red blood cells transfused (OR, 1.06/unit; P <.008), and division of the diaphragm (OR, 2.03; P <.02) were significant, independent predictors of prolonged ventilation. Sixty-seven percent of patients (71 of 106) whose diaphragms were preserved were extubated in <72 hours compared with 52% of patients (78 of 150) who underwent diaphragm division (OR, 0.53; P <.02).
CONCLUSION: Independently of well known pulmonary risk factors, an intact diaphragm during thoracoabdominal aortic repair results in a higher probability of early ventilator weaning.
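The extubation counts quoted in this abstract form a 2×2 table, so a crude (unadjusted) odds ratio can be worked out directly. The sketch below is only an arithmetic illustration using those counts. The function name `odds_ratio` is hypothetical, and the abstract does not state whether its reported OR of 0.53 comes from the contingency table or from the logistic regression model.

```python
def odds_ratio(events_a, total_a, events_b, total_b):
    """Crude odds ratio of an event in group A relative to group B."""
    odds_a = events_a / (total_a - events_a)
    odds_b = events_b / (total_b - events_b)
    return odds_a / odds_b


# Counts quoted in the abstract: patients extubated in <72 hours.
divided_early, divided_total = 78, 150  # diaphragm divided
intact_early, intact_total = 71, 106    # diaphragm preserved

# Odds of early extubation with a divided diaphragm, relative to an intact one.
or_divided_vs_intact = odds_ratio(
    divided_early, divided_total, intact_early, intact_total
)
print(round(or_divided_vs_intact, 2))  # prints 0.53
```

The crude ratio, (78/72) / (71/35), rounds to the 0.53 given in the abstract, meaning the odds of early extubation were roughly halved when the diaphragm was divided.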
|