1
Carpenter EJ, Seth S, Yue N, Greiner R, Derda R. GlyNet: a multi-task neural network for predicting protein-glycan interactions. Chem Sci 2022; 13:6669-6686. [PMID: 35756507 PMCID: PMC9172296 DOI: 10.1039/d1sc05681f] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/15/2021] [Accepted: 05/02/2022] [Indexed: 12/14/2022] [Open Access]
Abstract
Advances in diagnostics, therapeutics, vaccines, transfusion, and organ transplantation build on a fundamental understanding of glycan-protein interactions. To aid this, we developed GlyNet, a model that accurately predicts interactions (relative binding strengths) between mammalian glycans and 352 glycan-binding proteins, many at multiple concentrations. For each glycan input, our model produces 1257 outputs, each representing the relative interaction strength between the input glycan and a particular protein sample. GlyNet learns these continuous values using relative fluorescence units (RFUs) measured on 599 glycans in the Consortium for Functional Glycomics glycan arrays and extrapolates these to RFUs from additional, untested glycans. GlyNet's output of continuous values provides more detailed results than the standard binary classification models. After incorporating a simple threshold to transform such continuous outputs, the resulting GlyNet classifier outperforms those standard classifiers. GlyNet is the first multi-output regression model for predicting protein-glycan interactions and serves as an important benchmark, facilitating development of quantitative computational glycobiology.
Affiliation(s)
- Eric J Carpenter
  - Department of Chemistry, University of Alberta, Edmonton, Alberta, Canada
- Shaurya Seth
  - Department of Chemistry, University of Alberta, Edmonton, Alberta, Canada
- Noel Yue
  - Department of Chemistry, University of Alberta, Edmonton, Alberta, Canada
- Russell Greiner
  - Department of Computing Science, University of Alberta, Edmonton, Alberta, Canada
  - Alberta Machine Intelligence Institute (AMII), Edmonton, Alberta, Canada
- Ratmir Derda
  - Department of Chemistry, University of Alberta, Edmonton, Alberta, Canada
2
Coff L, Chan J, Ramsland PA, Guy AJ. Identifying glycan motifs using a novel subtree mining approach. BMC Bioinformatics 2020; 21:42. [PMID: 32019496 PMCID: PMC7001330 DOI: 10.1186/s12859-020-3374-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 05/22/2019] [Accepted: 01/20/2020] [Indexed: 11/17/2022] [Open Access]
Abstract
Background: Glycans are complex sugar chains, crucial to many biological processes. By participating in binding interactions with proteins, glycans often play key roles in host–pathogen interactions. The specificities of glycan-binding proteins, such as lectins and antibodies, are governed by motifs within larger glycan structures, and improved characterisations of these determinants would aid research into human diseases. Identification of motifs has previously been approached as a frequent subtree mining problem, and we extend these approaches with a glycan notation that allows recognition of terminal motifs. Results: In this work, we customised a frequent subtree mining approach by altering the glycan notation to include information on terminal connections. This allows specific identification of terminal residues as potential motifs, better capturing the complexity of glycan-binding interactions. We achieved this by including additional nodes in a graph representation of the glycan structure to indicate the presence or absence of a linkage at particular backbone carbon positions. Combining this frequent subtree mining approach with a state-of-the-art feature selection algorithm termed minimum-redundancy, maximum-relevance (mRMR), we have generated a classification pipeline that is trained on data from a glycan microarray. When applied to a set of commonly used lectins, the identified motifs were consistent with known binding determinants. Furthermore, logistic regression classifiers trained using these motifs performed well across most lectins examined, with a median AUC value of 0.89. Conclusions: We present here a new subtree mining approach for the classification of glycan binding and identification of potential binding motifs. The Carbohydrate Classification Accounting for Restricted Linkages (CCARL) method will assist in the interpretation of glycan microarray experiments and will aid in the discovery of novel binding motifs for further experimental characterisation.
Affiliation(s)
- Lachlan Coff
  - School of Science, College of Science, Engineering and Health, RMIT University, 3000, Melbourne, Australia
- Jeffrey Chan
  - School of Science, College of Science, Engineering and Health, RMIT University, 3000, Melbourne, Australia
- Paul A Ramsland
  - School of Science, College of Science, Engineering and Health, RMIT University, 3000, Melbourne, Australia
  - Department of Immunology, Monash University, 3004, Melbourne, Australia
  - Department of Surgery Austin Health, University of Melbourne, 3084, Heidelberg, Australia
- Andrew J Guy
  - School of Science, College of Science, Engineering and Health, RMIT University, 3000, Melbourne, Australia
3
Haab BB, Klamer Z. Advances in Tools to Determine the Glycan-Binding Specificities of Lectins and Antibodies. Mol Cell Proteomics 2020; 19:224-232. [PMID: 31848260 PMCID: PMC7000120 DOI: 10.1074/mcp.r119.001836] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 10/23/2019] [Revised: 12/13/2019] [Indexed: 01/17/2023] [Open Access]
Abstract
Proteins that bind carbohydrate structures can serve as tools to quantify or localize specific glycans in biological specimens. Such proteins, including lectins and glycan-binding antibodies, are particularly valuable if accurate information is available about the glycans that a protein binds. Glycan arrays have been transformational for uncovering rich information about the nuances and complexities of glycan-binding specificity. A challenge, however, has been the analysis of the data. Because protein-glycan interactions are so complex, simplistic modes of analyzing the data and describing glycan-binding specificities have proven inadequate in many cases. This review surveys the methods for handling high-content data on protein-glycan interactions. We contrast the approaches that have been demonstrated and provide an overview of the resources that are available. We also give an outlook on the promising experimental technologies for generating new insights into protein-glycan interactions, as well as a perspective on the limitations that currently face the field.
4
Finite Dimension: A Mathematical Tool to Analise Glycans. Sci Rep 2018. [PMID: 29535393 PMCID: PMC5849774 DOI: 10.1038/s41598-018-22575-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022] [Open Access]
Abstract
There is a need to develop widely applicable tools to understand glycan organization, diversity and structure. We present a graph-theoretical study of a large sample of glycans in terms of finite dimension, a new metric which is an adaptation to finite sets of the classical Hausdorff “fractal” dimension. Every glycan in the sample is encoded, via finite dimension, as a point of Glycan Space, a new notion introduced in this paper. Two major outcomes were found: (a) the existence of universal bounds that restrict the universe of possible glycans and show, for instance, that the graphs of glycans are a very special type of chemical graph, and (b) how Glycan Space is related to biological domains associated to the analysed glycans. In addition, we discuss briefly how this encoding may help to improve search in glycan databases.
5
Takigawa I, Mamitsuka H. Generalized Sparse Learning of Linear Models Over the Complete Subgraph Feature Set. IEEE Transactions on Pattern Analysis and Machine Intelligence 2017; 39:617-624. [PMID: 27187949 DOI: 10.1109/tpami.2016.2567399] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/05/2023]
Abstract
Supervised learning over graphs is an intrinsically difficult problem: it requires simultaneous learning of relevant features from the complete subgraph feature set, where enumerating all subgraph features occurring in the given graphs is practically intractable due to combinatorial explosion. We 1) show that existing graph supervised learning methods, such as Adaboost, LPBoost, and LARS/LASSO, can be viewed as variations of a branch-and-bound algorithm with simple bounds, which we call Morishita-Kudo bounds; 2) present a direct sparse optimization algorithm for generalized problems with arbitrary twice-differentiable loss functions, to which Morishita-Kudo bounds cannot be directly applied; and 3) show experimentally that i) our direct optimization method improves the convergence rate and stability, ii) L1-penalized logistic regression (L1-LogReg) by our method identifies a smaller subgraph set while keeping competitive performance, and iii) the subgraphs learned by L1-LogReg are more size-balanced than those of competing methods, which are biased toward small subgraphs.
6
Bennun SV, Hizal DB, Heffner K, Can O, Zhang H, Betenbaugh MJ. Systems Glycobiology: Integrating Glycogenomics, Glycoproteomics, Glycomics, and Other ‘Omics Data Sets to Characterize Cellular Glycosylation Processes. J Mol Biol 2016; 428:3337-3352. [PMID: 27423401 DOI: 10.1016/j.jmb.2016.07.005] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 02/06/2016] [Revised: 07/05/2016] [Accepted: 07/07/2016] [Indexed: 12/17/2022]
7
Barnett CB, Aoki-Kinoshita KF, Naidoo KJ. The Glycome Analytics Platform: an integrative framework for glycobioinformatics. Bioinformatics 2016; 32:3005-11. [PMID: 27288496 DOI: 10.1093/bioinformatics/btw341] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 12/21/2015] [Accepted: 05/26/2016] [Indexed: 11/13/2022] [Open Access]
Abstract
MOTIVATION: Complex carbohydrates play a central role in cellular communication and in disease development. O- and N-glycans, which are post-translationally attached to proteins and lipids, are sugar chains that are rooted, tree structures. Independent efforts to develop computational tools for analyzing complex carbohydrate structures have been designed to exploit specific databases requiring unique formatting and limited transferability. Attempts have been made at integrating these resources, yet it remains difficult to communicate and share data across several online resources. A disadvantage of the lack of coordination between development efforts is the inability of the user community to create reproducible analyses (workflows). The latter results in the more serious unreliability of glycomics metadata. RESULTS: In this paper, we realize the significance of connecting multiple online glycan resources that can be used to design reproducible experiments for obtaining, generating and analyzing cell glycomes. To address this, a suite of tools and utilities has been integrated into the analytic functionality of the Galaxy bioinformatics platform to provide a Glycome Analytics Platform (GAP). Using this platform, users can design in silico workflows to manipulate various formats of glycan sequences and analyze glycomes through access to web data and services. We illustrate the central functionality and features of the GAP by way of example; we analyze and compare the features of the N-glycan glycome of monocytic cells sourced from two separate data depositions. This paper highlights the use of reproducible research methods for glycomics analysis and the GAP presents an opportunity for integrating tools in glycobioinformatics. AVAILABILITY AND IMPLEMENTATION: This software is open-source and available online at https://bitbucket.org/scientificomputing/glycome-analytics-platform. CONTACTS: chris.barnett@uct.ac.za or kevin.naidoo@uct.ac.za. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christopher B Barnett
  - Scientific Computing Research Unit and Department of Chemistry, University of Cape Town, Rondebosch 7701, South Africa
- Kiyoko F Aoki-Kinoshita
  - Department of Bioinformatics, Faculty of Engineering, Soka University, Hachioji, Tokyo 192-8577, Japan
- Kevin J Naidoo
  - Scientific Computing Research Unit and Department of Chemistry, University of Cape Town, Rondebosch 7701, South Africa
8
Zhao Y, Hayashida M, Cao Y, Hwang J, Akutsu T. Grammar-based compression approach to extraction of common rules among multiple trees of glycans and RNAs. BMC Bioinformatics 2015; 16:128. [PMID: 25907438 PMCID: PMC4419412 DOI: 10.1186/s12859-015-0558-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/26/2014] [Accepted: 03/30/2015] [Indexed: 11/17/2022] [Open Access]
Abstract
BACKGROUND: Many tree structures are found in nature and organisms. Such trees are believed to be constructed on the basis of certain rules. We have previously developed grammar-based compression methods for ordered and unordered single trees, based on bisection-type tree grammars. Here, these methods find construction rules for one single tree. On the other hand, specified construction rules can be utilized to generate multiple similar trees. RESULTS: Therefore, in this paper, we develop novel methods to discover common rules for the construction of multiple distinct trees, by improving and extending the previous methods using integer programming. We apply our proposed methods to several sets of glycans and RNA secondary structures, which play important roles in cellular systems, and can be regarded as tree structures. The results suggest that our method can be successfully applied to determining the minimum grammar and several common rules among glycans and RNAs. CONCLUSIONS: We propose integer programming-based methods MinSEOTGMul and MinSEUTGMul for the determination of the minimum grammars constructing multiple ordered and unordered trees, respectively. The proposed methods can provide clues for the determination of hierarchical structures contained in tree-structured biological data, beyond the extraction of frequent patterns.
Affiliation(s)
- Yang Zhao
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan
- Morihiro Hayashida
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan
- Yue Cao
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan
- Jaewook Hwang
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Japan
9
A comparison of N-glycan profiles in human plasma and vitreous fluid. Graefes Arch Clin Exp Ophthalmol 2014; 252:1235-43. [DOI: 10.1007/s00417-014-2671-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/18/2014] [Revised: 04/30/2014] [Accepted: 05/08/2014] [Indexed: 01/31/2023] [Open Access]
10
Ichimiya T, Nishihara S, Takase-Yoden S, Kida H, Aoki-Kinoshita K. Frequent glycan structure mining of influenza virus data revealed a sulfated glycan motif that increased viral infection. Bioinformatics 2013; 30:706-11. [DOI: 10.1093/bioinformatics/btt573] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022] [Open Access]
11
Cholleti SR, Agravat S, Morris T, Saltz JH, Song X, Cummings RD, Smith DF. Automated motif discovery from glycan array data. OMICS: A Journal of Integrative Biology 2012; 16:497-512. [PMID: 22877213 DOI: 10.1089/omi.2012.0013] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.7] [Indexed: 02/02/2023]
Abstract
Assessing interactions of a glycan-binding protein (GBP) or lectin with glycans on a microarray generates large datasets, making it difficult to identify a glycan structural motif or determinant associated with the highest apparent binding strength of the GBP. We have developed a computational method, termed GlycanMotifMiner, that uses the relative binding of a GBP with glycans within a glycan microarray to automatically reveal the glycan structural motifs recognized by a GBP. We implemented the software with a web-based graphical interface for users to explore and visualize the discovered motifs. The utility of GlycanMotifMiner was determined using five plant lectins, SNA, HPA, PNA, Con A, and UEA-I. Data from the analyses of the lectins at different protein concentrations were processed to rank the glycans based on their relative binding strengths. The motifs, defined as glycan substructures that exist in a large number of the bound glycans and few non-bound glycans, were then discovered by our algorithm and displayed in a web-based graphical user interface (http://glycanmotifminer.emory.edu). The information is used in defining the glycan-binding specificity of GBPs. The results were compared to the known glycan specificities of these lectins generated by manual methods. A more complex analysis was also carried out using glycan microarray data obtained for a recombinant form of human galectin-8. Results for all of these lectins show that GlycanMotifMiner identified the major motifs known in the literature along with some unexpected novel binding motifs.
Affiliation(s)
- Sharath R Cholleti
  - Center for Comprehensive Informatics, Emory University, Atlanta, Georgia, USA
12
Lütteke T. The use of glycoinformatics in glycochemistry. Beilstein J Org Chem 2012; 8:915-29. [PMID: 23015842 PMCID: PMC3388882 DOI: 10.3762/bjoc.8.104] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 03/07/2012] [Accepted: 05/29/2012] [Indexed: 01/10/2023] [Open Access]
Abstract
Glycoinformatics is a small but growing branch of bioinformatics and chemoinformatics. Various resources are now available that can be of use to glycobiologists, but also to chemists who work on the synthesis or analysis of carbohydrates. This article gives an overview of existing glyco-specific databases and tools, with a focus on their application to glycochemistry: Databases can provide information on candidate glycan structures for synthesis, or on glyco-enzymes that can be used to synthesize carbohydrates. Statistical analyses of glycan databases help to plan glycan synthesis experiments. 3D-Structural data of protein-carbohydrate complexes are used in targeted drug design, and tools to support glycan structure analysis aid with quality control. Specific problems of glycoinformatics compared to bioinformatics for genomics or proteomics, especially concerning integration and long-term maintenance of the existing glycan databases, are also discussed.
Affiliation(s)
- Thomas Lütteke
  - Justus-Liebig-University Gießen, Institute of Veterinary Physiology and Biochemistry, Frankfurter Str. 100, 35392 Gießen, Germany
13
14
Akune Y, Hosoda M, Kaiya S, Shinmachi D, Aoki-Kinoshita KF. The RINGS Resource for Glycome Informatics Analysis and Data Mining on the Web. OMICS: A Journal of Integrative Biology 2010; 14:475-86. [DOI: 10.1089/omi.2009.0129] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.1] [Indexed: 11/13/2022]
Affiliation(s)
- Yukie Akune
  - Department of Bioinformatics, Faculty of Engineering, Soka University, 1-236 Tangi-cho, Hachioji, Tokyo 192-8577, Japan
- Masae Hosoda
  - Department of Bioinformatics, Faculty of Engineering, Soka University, 1-236 Tangi-cho, Hachioji, Tokyo 192-8577, Japan
- Sakiko Kaiya
  - Department of Bioinformatics, Faculty of Engineering, Soka University, 1-236 Tangi-cho, Hachioji, Tokyo 192-8577, Japan
- Daisuke Shinmachi
  - Department of Bioinformatics, Faculty of Engineering, Soka University, 1-236 Tangi-cho, Hachioji, Tokyo 192-8577, Japan
- Kiyoko F. Aoki-Kinoshita
  - Department of Bioinformatics, Faculty of Engineering, Soka University, 1-236 Tangi-cho, Hachioji, Tokyo 192-8577, Japan
15
Frank M, Schloissnig S. Bioinformatics and molecular modeling in glycobiology. Cell Mol Life Sci 2010; 67:2749-72. [PMID: 20364395 PMCID: PMC2912727 DOI: 10.1007/s00018-010-0352-4] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.9] [Received: 12/23/2009] [Revised: 03/08/2010] [Accepted: 03/11/2010] [Indexed: 12/11/2022]
Abstract
The field of glycobiology is concerned with the study of the structure, properties, and biological functions of the family of biomolecules called carbohydrates. Bioinformatics for glycobiology is a particularly challenging field, because carbohydrates exhibit a high structural diversity and their chains are often branched. Significant improvements in experimental analytical methods over recent years have led to a tremendous increase in the amount of carbohydrate structure data generated. Consequently, the availability of databases and tools to store, retrieve and analyze these data in an efficient way is of fundamental importance to progress in glycobiology. In this review, the various graphical representations and sequence formats of carbohydrates are introduced, and an overview of newly developed databases, the latest developments in sequence alignment and data mining, and tools to support experimental glycan analysis are presented. Finally, the field of structural glycoinformatics and molecular modeling of carbohydrates, glycoproteins, and protein-carbohydrate interaction are reviewed.
Affiliation(s)
- Martin Frank
  - Molecular Structure Analysis Core Facility-W160, Deutsches Krebsforschungszentrum (German Cancer Research Centre), 69120 Heidelberg, Germany
16
Fukuzaki M, Seki M, Kashima H, Sese J. Finding Itemset-Sharing Patterns in a Large Itemset-Associated Graph. Advances in Knowledge Discovery and Data Mining 2010. [DOI: 10.1007/978-3-642-13672-6_15] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 12/12/2022]
17