Reference Citation Analysis

For: Yamanishi Y, Bach F, Vert JP. Glycan classification with tree kernels. Bioinformatics 2007;23:1211-6. [PMID: 17344232 DOI: 10.1093/bioinformatics/btm090] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.1] [Indexed: 11/14/2022]

Cited by Other Article(s)

Yom A, Chiang A, Lewis NE. Boltzmann Model Predicts Glycan Structures from Lectin Binding. Anal Chem 2024;96:8332-8341. [PMID: 38720429 PMCID: PMC11162346 DOI: 10.1021/acs.analchem.3c04992] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/20/2024]

Bojar D, Lisacek F. Glycoinformatics in the Artificial Intelligence Era. Chem Rev 2022;122:15971-15988. [PMID: 35961636 PMCID: PMC9615983 DOI: 10.1021/acs.chemrev.2c00110] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 02/10/2022] [Indexed: 11/29/2022]

Carpenter EJ, Seth S, Yue N, Greiner R, Derda R. GlyNet: a multi-task neural network for predicting protein-glycan interactions. Chem Sci 2022;13:6669-6686. [PMID: 35756507 PMCID: PMC9172296 DOI: 10.1039/d1sc05681f] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/15/2021] [Accepted: 05/02/2022] [Indexed: 12/14/2022]

Flevaris K, Kontoravdi C. Immunoglobulin G N-glycan Biomarkers for Autoimmune Diseases: Current State and a Glycoinformatics Perspective. Int J Mol Sci 2022;23:5180. [PMID: 35563570 PMCID: PMC9100869 DOI: 10.3390/ijms23095180] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 02/28/2022] [Revised: 05/02/2022] [Accepted: 05/04/2022] [Indexed: 02/04/2023]

A vectorial tree distance measure. Sci Rep 2022;12:5256. [PMID: 35347186 PMCID: PMC8960910 DOI: 10.1038/s41598-022-08360-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/27/2021] [Accepted: 02/28/2022] [Indexed: 11/08/2022]

Mohapatra S, An J, Gómez-Bombarelli R. Chemistry-informed macromolecule graph representation for similarity computation, unsupervised and supervised learning. Machine Learning: Science and Technology 2022. [DOI: 10.1088/2632-2153/ac545e] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022]

Coff L, Chan J, Ramsland PA, Guy AJ. Identifying glycan motifs using a novel subtree mining approach. BMC Bioinformatics 2020;21:42. [PMID: 32019496 PMCID: PMC7001330 DOI: 10.1186/s12859-020-3374-4] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 05/22/2019] [Accepted: 01/20/2020] [Indexed: 11/17/2022]

Abstract

Background

Glycans are complex sugar chains, crucial to many biological processes. By participating in binding interactions with proteins, glycans often play key roles in host–pathogen interactions. The specificities of glycan-binding proteins, such as lectins and antibodies, are governed by motifs within larger glycan structures, and improved characterisations of these determinants would aid research into human diseases. Identification of motifs has previously been approached as a frequent subtree mining problem, and we extend these approaches with a glycan notation that allows recognition of terminal motifs.

Results

In this work, we customised a frequent subtree mining approach by altering the glycan notation to include information on terminal connections. This allows specific identification of terminal residues as potential motifs, better capturing the complexity of glycan-binding interactions. We achieved this by including additional nodes in a graph representation of the glycan structure to indicate the presence or absence of a linkage at particular backbone carbon positions. Combining this frequent subtree mining approach with a state-of-the-art feature selection algorithm termed minimum-redundancy, maximum-relevance (mRMR), we have generated a classification pipeline that is trained on data from a glycan microarray. When applied to a set of commonly used lectins, the identified motifs were consistent with known binding determinants. Furthermore, logistic regression classifiers trained using these motifs performed well across most lectins examined, with a median AUC value of 0.89.

Conclusions

We present here a new subtree mining approach for the classification of glycan binding and identification of potential binding motifs. The Carbohydrate Classification Accounting for Restricted Linkages (CCARL) method will assist in the interpretation of glycan microarray experiments and will aid in the discovery of novel binding motifs for further experimental characterisation.
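
The notation change described in the Results above (extra nodes that mark whether a backbone carbon position carries a linkage) can be illustrated with a minimal sketch. The tree encoding, the `CANDIDATE_POSITIONS` tuple, and the `add_restriction_nodes` helper are all assumptions made for illustration; they are not taken from the CCARL implementation.

```python
from dataclasses import dataclass, field

# Hypothetical set of backbone carbon positions that could carry a linkage.
CANDIDATE_POSITIONS = (2, 3, 4, 6)


@dataclass
class Residue:
    name: str
    # Maps a carbon position on this residue to the child residue linked there.
    children: dict = field(default_factory=dict)


def add_restriction_nodes(residue, positions=CANDIDATE_POSITIONS):
    """Return a copy of the tree in which every unoccupied candidate
    position carries an explicit 'null' leaf (illustrative encoding)."""
    out = Residue(residue.name)
    # Keep existing linkages even if they fall outside the candidate set.
    for pos in sorted(set(positions) | set(residue.children)):
        child = residue.children.get(pos)
        if child is None:
            out.children[pos] = Residue("null")  # no linkage at this carbon
        else:
            out.children[pos] = add_restriction_nodes(child, positions)
    return out


# Toy glycan: a Gal residue attached at position 4 of GlcNAc.
glycan = Residue("GlcNAc", {4: Residue("Gal")})
marked = add_restriction_nodes(glycan)
```

In the marked tree, the Gal residue has only "null" children, so a subtree miner working on this encoding could distinguish a terminal Gal from an internal one.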

Haab BB, Klamer Z. Advances in Tools to Determine the Glycan-Binding Specificities of Lectins and Antibodies. Mol Cell Proteomics 2020;19:224-232. [PMID: 31848260 PMCID: PMC7000120 DOI: 10.1074/mcp.r119.001836] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Received: 10/23/2019] [Revised: 12/13/2019] [Indexed: 01/17/2023]

Akiyoshi S, Iwata M, Berenger F, Yamanishi Y. Omics-based Identification of Glycan Structures as Biomarkers for a Variety of Diseases. Mol Inform 2019;39:e1900112. [PMID: 31622036 DOI: 10.1002/minf.201900112] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 08/20/2019] [Accepted: 09/24/2019] [Indexed: 12/11/2022]

Hosoda M, Akune Y, Aoki-Kinoshita KF. Development and application of an algorithm to compute weighted multiple glycan alignments. Bioinformatics 2017;33:1317-1323. [PMID: 28093404 PMCID: PMC5408794 DOI: 10.1093/bioinformatics/btw827] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 09/15/2016] [Revised: 12/22/2016] [Accepted: 01/10/2017] [Indexed: 11/13/2022]

Takigawa I, Mamitsuka H. Generalized Sparse Learning of Linear Models Over the Complete Subgraph Feature Set. IEEE Transactions on Pattern Analysis and Machine Intelligence 2017;39:617-624. [PMID: 27187949 DOI: 10.1109/tpami.2016.2567399] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/05/2023]

Bennun SV, Hizal DB, Heffner K, Can O, Zhang H, Betenbaugh MJ. Systems Glycobiology: Integrating Glycogenomics, Glycoproteomics, Glycomics, and Other ‘Omics Data Sets to Characterize Cellular Glycosylation Processes. J Mol Biol 2016;428:3337-3352. [PMID: 27423401 DOI: 10.1016/j.jmb.2016.07.005] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Received: 02/06/2016] [Revised: 07/05/2016] [Accepted: 07/07/2016] [Indexed: 12/17/2022]

Shen D, Shen H, Bhamidi S, Maldonado YM, Kim Y, Marron JS. Functional Data Analysis of Tree Data Objects. J Comput Graph Stat 2014;23:418-438. [PMID: 25346588 DOI: 10.1080/10618600.2013.786943] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 10/26/2022]

Tang H, Mayampurath A, Yu C, Mechref Y. Bioinformatics Protocols in Glycomics and Glycoproteomics. Curr Protoc Protein Sci 2014;76:2.15.1-2.15.7. [DOI: 10.1002/0471140864.ps0215s76] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/11/2022]

Sánchez-Rodríguez MI, Caridad JM. Modeling and partial least squares approaches in OODA. Biom J 2014;56:771-3. [PMID: 24652826 DOI: 10.1002/bimj.201300178] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/04/2013] [Revised: 12/26/2013] [Accepted: 01/04/2014] [Indexed: 11/06/2022]

Aoki-Kinoshita KF. Using glycome databases for drug discovery. Expert Opin Drug Discov 2013;3:877-90. [PMID: 23484965 DOI: 10.1517/17460441.3.8.877] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 11/05/2022]

Tree Echo State Networks. Neurocomputing 2013. [DOI: 10.1016/j.neucom.2012.08.017] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Indexed: 11/20/2022]

Jiang H, Aoki-Kinoshita KF, Ching WK. Extracting glycan motifs using a biochemically weighted kernel. Bioinformation 2011;7:405-12. [PMID: 22347783 PMCID: PMC3280441 DOI: 10.6026/97320630007405] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 11/23/2011] [Accepted: 12/07/2011] [Indexed: 11/28/2022]

Abstract

Carbohydrates, or glycans, one of the most abundant and structurally diverse classes of biopolymers, constitute the third major class of biomolecules, following DNA and proteins. The study of carbohydrate sugar chains has nevertheless lagged behind that of DNA and proteins, mainly because of their inherent structural complexity. Their analysis is important because they serve various roles in biological processes, including signal transduction and cellular recognition. To shed some light on glycan function based on carbohydrate structure, kernel methods have been developed in the past, in particular to extract potential glycan biomarkers by classifying glycan structures found in different tissue samples. The recently developed weighted q-gram method (LK-method) exhibits good performance on glycan structure classification but has limitations in feature selection: it was unable to extract biologically meaningful features from the data. Therefore, we propose a biochemically weighted tree kernel (BioLK-method), which is based on a glycan similarity matrix and also incorporates biochemical information of individual q-grams in constructing the kernel matrix. We further applied our new method to the classification and recognition of motifs on publicly available glycan data. Our novel tree kernel (BioLK-method), used with a Support Vector Machine (SVM), is capable of detecting biologically important motifs accurately, while the LK-method failed to do so. It was tested on three glycan data sets from the Consortium for Functional Glycomics (CFG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) GLYCAN, and the results are consistent with the literature. The newly developed BioLK-method also maintains classification performance comparable to that of the LK-method. These results indicate that incorporating biochemical information of q-grams adds to the flexibility and capability of the novel kernel in feature extraction, which may aid in the prediction of glycan biomarkers.
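
To make the idea of a weighted q-gram kernel on glycan trees concrete, here is a minimal sketch. It is a simplified stand-in, not the LK-method or the BioLK-method: q-grams are taken to be downward label paths of q nodes, trees are encoded as hypothetical `(label, [children])` tuples, and the `weight` function is a placeholder for whatever biochemical weighting a real method would supply.

```python
from collections import Counter


def paths_from(tree, q):
    """All downward label paths of exactly q nodes that start at the root."""
    label, children = tree
    if q == 1:
        return [(label,)]
    return [(label,) + rest
            for child in children
            for rest in paths_from(child, q - 1)]


def qgrams(tree, q):
    """Count downward label paths of q nodes starting at every node."""
    counts = Counter(paths_from(tree, q))
    for child in tree[1]:
        counts.update(qgrams(child, q))  # adds the child's counts
    return counts


def qgram_kernel(t1, t2, q=2, weight=lambda gram: 1.0):
    """Weighted dot product of the two trees' q-gram count vectors."""
    c1, c2 = qgrams(t1, q), qgrams(t2, q)
    return sum(weight(g) * c1[g] * c2[g] for g in c1.keys() & c2.keys())


# Toy trees (residue names only; linkage information is omitted).
tree_a = ("Man", [("GlcNAc", [("Gal", [])]), ("GlcNAc", [])])
tree_b = ("Man", [("GlcNAc", [("Gal", [])])])

plain = qgram_kernel(tree_a, tree_b)
# Placeholder weighting: q-grams containing Gal count double.
boosted = qgram_kernel(tree_a, tree_b,
                       weight=lambda g: 2.0 if "Gal" in g else 1.0)
```

Because the kernel is a weighted inner product of count vectors with non-negative weights, the resulting kernel matrix is positive semidefinite and could be passed to an SVM as a precomputed kernel.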

Xuan P, Zhang Y, Tzeng TRJ, Wan XF, Luo F. A quantitative structure-activity relationship (QSAR) study on glycan array data to determine the specificities of glycan-binding proteins. Glycobiology 2011;22:552-60. [PMID: 22156918 DOI: 10.1093/glycob/cwr163] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Indexed: 12/15/2022]

Fukagawa D, Tamura T, Takasu A, Tomita E, Akutsu T. A clique-based method for the edit distance between unordered trees and its application to analysis of glycan structures. BMC Bioinformatics 2011;12 Suppl 1:S13. [PMID: 21342542 PMCID: PMC3044267 DOI: 10.1186/1471-2105-12-s1-s13] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022]

Frank M, Schloissnig S. Bioinformatics and molecular modeling in glycobiology. Cell Mol Life Sci 2010;67:2749-72. [PMID: 20364395 PMCID: PMC2912727 DOI: 10.1007/s00018-010-0352-4] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.9] [Received: 12/23/2009] [Revised: 03/08/2010] [Accepted: 03/11/2010] [Indexed: 12/11/2022]

Cerulo L, Elkan C, Ceccarelli M. Learning gene regulatory networks from only positive and unlabeled data. BMC Bioinformatics 2010;11:228. [PMID: 20444264 PMCID: PMC2887423 DOI: 10.1186/1471-2105-11-228] [Citation(s) in RCA: 70] [Impact Index Per Article: 4.7] [Received: 10/28/2009] [Accepted: 05/05/2010] [Indexed: 11/16/2022]

Li L, Ching WK, Yamaguchi T, Aoki-Kinoshita KF. A weighted q-gram method for glycan structure classification. BMC Bioinformatics 2010;11 Suppl 1:S33. [PMID: 20122206 PMCID: PMC3009505 DOI: 10.1186/1471-2105-11-s1-s33] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/24/2022]

A Survey of Graph Mining Techniques for Biological Datasets. Managing and Mining Graph Data 2010. [DOI: 10.1007/978-1-4419-6045-0_18] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 12/13/2022]

Aydın B, Pataki G, Wang H, Bullitt E, Marron JS. A principal component analysis for trees. Ann Appl Stat 2009. [DOI: 10.1214/09-aoas263] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Indexed: 11/19/2022]

Ozen A, Gönen M, Alpaydın E, Haliloğlu T. Machine learning integration for predicting the effect of single amino acid substitutions on protein stability. BMC Structural Biology 2009;9:66. [PMID: 19840377 PMCID: PMC2777163 DOI: 10.1186/1472-6807-9-66] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 02/03/2009] [Accepted: 10/19/2009] [Indexed: 11/10/2022]

Abstract

BACKGROUND

Computational prediction of protein stability change due to single-site amino acid substitutions is of interest in protein design and analysis. We consider the following four ways to improve the performance of the currently available predictors: (1) We include additional sequence- and structure-based features, namely, the amino acid substitution likelihoods, the equilibrium fluctuations of the alpha- and beta-carbon atoms, and the packing density. (2) By implementing different machine learning integration approaches, we combine information from different features or representations. (3) We compare classification vs. regression methods to predict the sign vs. the output of stability change. (4) We allow a reject option for doubtful cases where the risk of misclassification is high.

RESULTS

We investigate three different approaches: early, intermediate and late integration, which respectively combine features, kernels over feature subsets, and decisions. We perform simulations on two data sets: (1) S1615, which was used in previous studies, and (2) S2783, the updated version (as of July 2, 2009), also extracted from ProTherm. For the S1615 data set, our highest accuracy using both sequence and structure information is 0.842 on cross-validation and 0.904 on testing using early integration. Newly added features, namely, local compositional packing and the mobility extent of the mutated residues, improve accuracy significantly with intermediate integration. For the S2783 data set, we also train regression methods to estimate not only the sign but also the amount of stability change, and apply risk-based classification to reject when the learner has low confidence and the loss of misclassification is high. The highest accuracy is 0.835 on cross-validation and 0.832 on testing using only sequence information. The percentage of false positives can be decreased to less than 0.005 by rejecting 10 per cent using late integration.

CONCLUSION

We find that in both early and late integration, combining inputs or decisions is useful in increasing accuracy. Intermediate integration allows assessing the contributions of individual features by looking at the assigned weights. The overall accuracy of regression is not better than that of classification, but regression yields fewer false positives, especially when combined with the reject option. The server for stability prediction with the three integration approaches and the data sets are available at http://www.prc.boun.edu.tr/appserv/prc/mlsta.

Hashimoto K, Takigawa I, Shiga M, Kanehisa M, Mamitsuka H. Mining significant tree patterns in carbohydrate sugar chains. Bioinformatics 2008;24:i167-73. [PMID: 18689820 DOI: 10.1093/bioinformatics/btn293] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.7] [Indexed: 01/03/2023]

Aoki-Kinoshita KF. An introduction to bioinformatics for glycomics research. PLoS Comput Biol 2008;4:e1000075. [PMID: 18516240 PMCID: PMC2398734 DOI: 10.1371/journal.pcbi.1000075] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.4] [Indexed: 01/29/2023]