1
Lisacek F, Schnider B, Imberty A. Tools for structural lectinomics: From structures to lectomes. BBA ADVANCES 2025; 7:100154. [PMID: 40166736 PMCID: PMC11957679 DOI: 10.1016/j.bbadva.2025.100154] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2024] [Revised: 02/24/2025] [Accepted: 03/05/2025] [Indexed: 04/02/2025] Open
Abstract
Lectins are ubiquitous proteins that interact with glycans in a variety of molecular processes and, as such, also play a role in diseases, whether infectious, chronic or cancer-related. The systematic study of lectins is therefore essential, in particular for understanding cell-cell communication. Protein three-dimensional structural data accumulated over the past decades have boosted advances in AI-based prediction and opened up new options to characterise lectins, which are known to often be multimeric and multivalent. This article reviews the methods to obtain structures of lectins, the current data available for lectin 3D structures and their interactions, and how this knowledge is used to classify these proteins. It also shows that combining an array of bioinformatics tools should make the prediction of binding specificity possible in the near future.
Affiliation(s)
- Frédérique Lisacek
  - SIB Swiss Institute of Bioinformatics CH-1227 Geneva, Switzerland
  - Computer Science Department, UniGe CH-1227 Geneva, Switzerland
- Boris Schnider
  - SIB Swiss Institute of Bioinformatics CH-1227 Geneva, Switzerland
  - Computer Science Department, UniGe CH-1227 Geneva, Switzerland
- Anne Imberty
  - Univ. Grenoble Alpes, CNRS, CERMAV 38000 Grenoble, France
2
Liang G, Sha S, Wang Z, Liu H, Yoon S. Soft-sensor model development for CHO growth/production, intracellular metabolite, and glycan predictions. Front Mol Biosci 2024; 11:1441885. [PMID: 39502716 PMCID: PMC11535473 DOI: 10.3389/fmolb.2024.1441885] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2024] [Accepted: 09/30/2024] [Indexed: 11/08/2024] Open
Abstract
Efficaciously assessing product quality remains time- and resource-intensive. Online Process Analytical Technologies (PATs), encompassing real-time monitoring tools and soft-sensor models, are indispensable for understanding process effects and real-time product quality. This research study evaluated three modeling approaches for predicting CHO cell growth and production, metabolites (extracellular and nucleotide sugar donors (NSD)), and glycan profiles: mechanistic, based on first-principles Michaelis-Menten kinetics (MMK); data-driven orthogonal partial least squares (OPLS); and neural network machine learning (NN). Our experimental design involved galactose-fed batch cultures. MMK excelled in predicting growth and production, demonstrating its reliability in these aspects and reducing the data burden by requiring fewer inputs. However, it was less precise in simulating glycan profiles and intracellular metabolite trends. In contrast, NN and OPLS performed better for predicting precise glycan compositions but displayed shortcomings in accurately predicting growth and production. We utilized time in the training set to address NN and OPLS extrapolation challenges. OPLS and NN models demanded more extensive inputs with similar intracellular metabolite trend prediction. However, there was a significant reduction in time required to develop these two models. The guidance presented here can provide valuable insight into rapid development and application of soft-sensor models with PATs for improving real-time CHO therapeutic product quality. Coupled with emerging -omics technologies, NN and OPLS will benefit from massive data availability, and we foresee more robust prediction models that can be advantageous to kinetic or partial-kinetic (hybrid) models.
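As a rough illustration of the mechanistic approach named in the abstract above, the sketch below applies a Michaelis-Menten-type saturating rate law to a toy batch culture. Everything in it is hypothetical: the function names, the parameter values, the single-substrate assumption and the explicit Euler integration are assumptions made for illustration only, and it is not the model used by Liang et al.

```python
def michaelis_menten_rate(substrate, v_max, k_m):
    """Saturating rate law: v = v_max * S / (K_m + S)."""
    if substrate < 0 or k_m <= 0:
        raise ValueError("substrate must be >= 0 and k_m must be > 0")
    return v_max * substrate / (k_m + substrate)


def simulate_batch(x0, s0, mu_max, k_s, yield_xs, dt, steps):
    """Explicit Euler integration of biomass X and one limiting substrate S.

    Returns a list of (time, biomass, substrate) tuples.
    """
    x, s = x0, s0
    trajectory = [(0.0, x, s)]
    for step in range(1, steps + 1):
        mu = michaelis_menten_rate(s, mu_max, k_s)
        # Substrate needed for this step's growth, capped at what is left.
        consumed = min(mu * x * dt / yield_xs, s)
        x += yield_xs * consumed
        s -= consumed
        trajectory.append((step * dt, x, s))
    return trajectory


# Hypothetical parameters, chosen only so that the example runs.
trajectory = simulate_batch(x0=0.1, s0=10.0, mu_max=0.04, k_s=0.5,
                            yield_xs=0.5, dt=1.0, steps=200)
final_time, final_biomass, final_substrate = trajectory[-1]
print(f"t={final_time:.0f}  X={final_biomass:.3f}  S={final_substrate:.3f}")
```

A model of this kind needs only a handful of kinetic parameters, which is consistent with the abstract's remark that MMK reduces the data burden by requiring fewer inputs.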
Affiliation(s)
- Seongkyu Yoon
  - Department of Chemical Engineering, University of Massachusetts Lowell, Lowell, MA, United States
3
Shrivastava A, Nikita S, Rathore AS. Machine learning tool as an enabler for rapid quantification of monoclonal antibodies N-glycans using fluorescence detector. Int J Biol Macromol 2024; 271:132694. [PMID: 38810859 DOI: 10.1016/j.ijbiomac.2024.132694] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/08/2024] [Revised: 05/19/2024] [Accepted: 05/25/2024] [Indexed: 05/31/2024]
Abstract
Liquid chromatography-mass spectrometry (LC-MS) is widely used for identification and quantification of N-glycans of monoclonal antibodies (mAbs), owing to its high sensitivity and accuracy. However, its resource-intensive nature necessitates the development of rapid and cost-effective orthogonal analysis approaches. This study aims to develop an online method utilizing the Extreme Gradient Boosting (XGBoost) machine learning (ML) algorithm for real-time quantification of InstantPC-labelled N-glycans by liquid chromatography (LC) with a fluorescence detector (FLD). The LC-FLD profile is pre-processed for baseline correction and noise reduction before being fed to the ML algorithm. The algorithm has been successfully tested on commercial and in-house developed mAbs and validated using LC-MS quantification as reference. The values predicted by the LC-FLD-ML model were on par with the LC-MS values, with a root mean square error of <0.5 and R² of >0.95. The average error using the ML model (1.80 %) was reduced by a minimum of 28 % and 40 % for origin (1.5 %) and manual (1.07 %) based integration, respectively. The approach reduces the data analysis time per sample by ~70 % (from ~5 min to ~1.5 min), thereby offering a time- and resource-efficient orthogonality with LC-MS for quantification of N-glycans in mAbs.
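The agreement metrics quoted in the abstract above (root mean square error and R²) and the ~70 % time saving can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: the function names and the glycan abundance values are made up for the example and are not data from the study; only the 5 min and 1.5 min figures come from the abstract.

```python
import math


def rmse(reference, predicted):
    """Root mean square error between two equally long sequences."""
    if not reference or len(reference) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - p) ** 2 for r, p in zip(reference, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(reference, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_ref = sum(reference) / len(reference)
    ss_res = sum((r - p) ** 2 for r, p in zip(reference, predicted))
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    if ss_tot == 0:
        raise ValueError("reference values must not all be identical")
    return 1 - ss_res / ss_tot


def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100 * (before - after) / before


# Hypothetical relative abundances (%) of four glycans.
lc_ms_reference = [40.0, 35.0, 15.0, 10.0]
lc_fld_ml_predicted = [40.3, 34.7, 15.3, 9.7]

print(f"RMSE = {rmse(lc_ms_reference, lc_fld_ml_predicted):.2f}")
print(f"R^2  = {r_squared(lc_ms_reference, lc_fld_ml_predicted):.4f}")
# Analysis time per sample quoted in the abstract: ~5 min down to ~1.5 min.
print(f"time reduction = {percent_reduction(5.0, 1.5):.0f} %")
```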
Affiliation(s)
- Anuj Shrivastava
  - Department of Chemical Engineering, IIT Delhi, Hauz Khas, New Delhi 110016, India
- Saxena Nikita
  - Department of Chemical Engineering, IIT Delhi, Hauz Khas, New Delhi 110016, India
- Anurag S Rathore
  - Department of Chemical Engineering, IIT Delhi, Hauz Khas, New Delhi 110016, India
4
Kellman BP, Mariethoz J, Zhang Y, Shaul S, Alteri M, Sandoval D, Jeffris M, Armingol E, Bao B, Lisacek F, Bojar D, Lewis NE. Decoding glycosylation potential from protein structure across human glycoproteins with a multi-view recurrent neural network. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.05.15.594334. [PMID: 38798633 PMCID: PMC11118808 DOI: 10.1101/2024.05.15.594334] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/29/2024]
Abstract
Glycosylation is described as a non-templated biosynthesis. Yet, the template-free premise is antithetical to the observation that different N-glycans are consistently placed at specific sites. It has been proposed that glycosite-proximal protein structures could constrain glycosylation and explain the observed microheterogeneity. Using site-specific glycosylation data, we trained a hybrid neural network to parse glycosites (recurrent neural network) and match them to feasible N-glycosylation events (graph neural network). From glycosite-flanking sequences, the algorithm predicts most human N-glycosylation events documented in the GlyConnect database and proposed structures corresponding to observed monosaccharide composition of the glycans at these sites. The algorithm also recapitulated glycosylation in Enhanced Aromatic Sequons, SARS-CoV-2 spike, and IgG3 variants, thus demonstrating the ability of the algorithm to predict both glycan structure and abundance. Thus, protein structure constrains glycosylation, and the neural network enables predictive in silico glycosylation of uncharacterized or novel protein sequences and genetic variants.
Affiliation(s)
- Benjamin P. Kellman
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
  - Augment Biologics, La Jolla, CA 92092
  - Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, USA
- Julien Mariethoz
  - Proteome Informatics Group, Swiss Institute of Bioinformatics, CH-1227 Geneva, Switzerland
- Yujie Zhang
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
- Sigal Shaul
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Mia Alteri
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Daniel Sandoval
  - Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA 92093, USA
- Mia Jeffris
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Erick Armingol
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
- Bokan Bao
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
- Frederique Lisacek
  - Proteome Informatics Group, Swiss Institute of Bioinformatics, CH-1227 Geneva, Switzerland
  - Computer Science Department & Section of Biology, University of Geneva, route de Drize 7, CH-1227, Geneva, Switzerland
- Daniel Bojar
  - Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg 41390, Sweden
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg 41390, Sweden
- Nathan E. Lewis
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
  - Bioinformatics and Systems Biology Graduate Program, University of California, San Diego, La Jolla, CA 92093, USA
  - Ragon Institute of MGH, MIT, and Harvard, Cambridge, MA, USA
5
Li H, Peralta AG, Schoffelen S, Hansen AH, Arnsdorf J, Schinn SM, Skidmore J, Choudhury B, Paulchakrabarti M, Voldborg BG, Chiang AW, Lewis NE. LeGenD: determining N-glycoprofiles using an explainable AI-leveraged model with lectin profiling. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2024.03.27.587044. [PMID: 38585977 PMCID: PMC10996628 DOI: 10.1101/2024.03.27.587044] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/09/2024]
Abstract
Glycosylation affects many vital functions of organisms. Therefore, its surveillance is critical from basic science to biotechnology, including biopharmaceutical development and clinical diagnostics. However, conventional glycan structure analysis faces challenges with throughput and cost. Lectins offer an alternative approach for analyzing glycans, but they only provide glycan epitopes and not full glycan structure information. To overcome these limitations, we developed LeGenD, a lectin and AI-based approach to predict N-glycan structures and determine their relative abundance in purified proteins based on lectin-binding patterns. We trained the LeGenD model using 309 glycoprofiles from 10 recombinant proteins, produced in 30 glycoengineered CHO cell lines. Our approach accurately reconstructed experimentally-measured N-glycoprofiles of bovine Fetuin B and IgG from human sera. Explanatory AI analysis with SHapley Additive exPlanations (SHAP) helped identify the critical lectins for glycoprofile predictions. Our LeGenD approach thus presents an alternative approach for N-glycan analysis.
Affiliation(s)
- Haining Li
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Angelo G. Peralta
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
- Sanne Schoffelen
  - National Biologics Facility Department of Biotechnology and Biomedicine, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby Denmark
- Anders Holmgaard Hansen
  - National Biologics Facility Department of Biotechnology and Biomedicine, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby Denmark
- Johnny Arnsdorf
  - National Biologics Facility Department of Biotechnology and Biomedicine, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby Denmark
- Song-Min Schinn
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Jonathan Skidmore
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Microbiology and Molecular Biology, Brigham Young University, Provo, UT 84602, USA
- Biswa Choudhury
  - Glycobiology Research and Training Center, University of California, San Diego, La Jolla, CA 92093, USA
- Mousumi Paulchakrabarti
  - Glycobiology Research and Training Center, University of California, San Diego, La Jolla, CA 92093, USA
- Bjorn G. Voldborg
  - National Biologics Facility Department of Biotechnology and Biomedicine, Technical University of Denmark, Building 220, Kemitorvet, 2800 Kgs. Lyngby Denmark
- Austin W.T. Chiang
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
- Nathan E. Lewis
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
6
Thomès L, Karlsson V, Lundstrøm J, Bojar D. Mammalian milk glycomes: Connecting the dots between evolutionary conservation and biosynthetic pathways. Cell Rep 2023; 42:112710. [PMID: 37379211 DOI: 10.1016/j.celrep.2023.112710] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 02/22/2023] [Revised: 05/09/2023] [Accepted: 06/12/2023] [Indexed: 06/30/2023] Open
Abstract
Milk oligosaccharides (MOs) are among the most abundant constituents of breast milk and are essential for health and development. Biosynthesized from monosaccharides into complex sequences, MOs differ considerably between taxonomic groups. Even human MO biosynthesis is insufficiently understood, hampering evolutionary and functional analyses. Using a comprehensive resource of all published MOs from >100 mammals, we develop a pipeline for generating and analyzing MO biosynthetic networks. We then use evolutionary relationships and inferred intermediates of these networks to discover (1) systematic glycome biases, (2) biosynthetic restrictions, such as reaction path preference, and (3) conserved biosynthetic modules. This allows us to prune and pinpoint biosynthetic pathways despite missing information. Machine learning and network analysis cluster species by their milk glycome, identifying characteristic sequence relationships and evolutionary gains/losses of motifs, MOs, and biosynthetic modules. These resources and analyses will advance our understanding of glycan biosynthesis and the evolution of breast milk.
Affiliation(s)
- Luc Thomès
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
- Viktoria Karlsson
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
- Jon Lundstrøm
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
- Daniel Bojar
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
7
Rios-Martinez C, Bhattacharya N, Amini AP, Crawford L, Yang KK. Deep self-supervised learning for biosynthetic gene cluster detection and product classification. PLoS Comput Biol 2023; 19:e1011162. [PMID: 37220151 PMCID: PMC10241353 DOI: 10.1371/journal.pcbi.1011162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/24/2022] [Revised: 06/05/2023] [Accepted: 05/07/2023] [Indexed: 05/25/2023] Open
Abstract
Natural products are chemical compounds that form the basis of many therapeutics used in the pharmaceutical industry. In microbes, natural products are synthesized by groups of colocalized genes called biosynthetic gene clusters (BGCs). With advances in high-throughput sequencing, there has been an increase of complete microbial isolate genomes and metagenomes, from which a vast number of BGCs are undiscovered. Here, we introduce a self-supervised learning approach designed to identify and characterize BGCs from such data. To do this, we represent BGCs as chains of functional protein domains and train a masked language model on these domains. We assess the ability of our approach to detect BGCs and characterize BGC properties in bacterial genomes. We also demonstrate that our model can learn meaningful representations of BGCs and their constituent domains, detect BGCs in microbial genomes, and predict BGC product classes. These results highlight self-supervised neural networks as a promising framework for improving BGC prediction and classification.
Affiliation(s)
- Carolina Rios-Martinez
  - Microsoft Research New England, Cambridge, Massachusetts, United States of America
  - Department of Bioengineering, Stanford University, Stanford, California, United States of America
- Nicholas Bhattacharya
  - Microsoft Research New England, Cambridge, Massachusetts, United States of America
  - Department of Mathematics, University of California, Berkeley, Berkeley, California, United States of America
- Ava P. Amini
  - Microsoft Research New England, Cambridge, Massachusetts, United States of America
- Lorin Crawford
  - Microsoft Research New England, Cambridge, Massachusetts, United States of America
- Kevin K. Yang
  - Microsoft Research New England, Cambridge, Massachusetts, United States of America
8
Pegg CL, Schulz BL, Neely BA, Albery GF, Carlson CJ. Glycosylation and the global virome. Mol Ecol 2023; 32:37-44. [PMID: 36217579 PMCID: PMC10947461 DOI: 10.1111/mec.16731] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/25/2022] [Revised: 09/25/2022] [Accepted: 09/29/2022] [Indexed: 12/29/2022]
Abstract
The sugars that coat the outsides of viruses and host cells are key to successful disease transmission, but they remain understudied compared to other molecular features. Understanding the comparative zoology of glycosylation - and harnessing it for predictive science - could help close the molecular gap in zoonotic risk assessment.
Affiliation(s)
- Cassandra L. Pegg
  - School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, Queensland, Australia
- Benjamin L. Schulz
  - School of Chemistry and Molecular Biosciences, The University of Queensland, St Lucia, Queensland, Australia
- Benjamin A. Neely
  - National Institute of Standards and Technology, Charleston, South Carolina, USA
- Gregory F. Albery
  - Department of Biology, Georgetown University, Washington, District of Columbia, USA
- Colin J. Carlson
  - Department of Biology, Georgetown University, Washington, District of Columbia, USA
  - Department of Microbiology and Immunology, Georgetown University Medical Center, Washington, District of Columbia, USA
  - Center for Global Health Science and Security, Georgetown University Medical Center, Washington, District of Columbia, USA
9
Reiser P, Neubert M, Eberhard A, Torresi L, Zhou C, Shao C, Metni H, van Hoesel C, Schopmans H, Sommer T, Friederich P. Graph neural networks for materials science and chemistry. COMMUNICATIONS MATERIALS 2022; 3:93. [PMID: 36468086 PMCID: PMC9702700 DOI: 10.1038/s43246-022-00315-6] [Citation(s) in RCA: 134] [Impact Index Per Article: 44.7] [Received: 05/10/2022] [Accepted: 11/07/2022] [Indexed: 05/14/2023]
Abstract
Machine learning plays an increasingly important role in many areas of chemistry and materials science, being used to predict materials properties, accelerate simulations, design new structures, and predict synthesis routes of new materials. Graph neural networks (GNNs) are one of the fastest growing classes of machine learning models. They are of particular relevance for chemistry and materials science, as they directly work on a graph or structural representation of molecules and materials and therefore have full access to all relevant information required to characterize materials. In this Review, we provide an overview of the basic principles of GNNs, widely used datasets, and state-of-the-art architectures, followed by a discussion of a wide range of recent applications of GNNs in chemistry and materials science, and concluding with a road-map for the further development and application of GNNs.
Affiliation(s)
- Patrick Reiser
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
  - Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Marlen Neubert
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- André Eberhard
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Luca Torresi
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Chen Zhou
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
- Chen Shao
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
  - Present Address: Institute for Applied Informatics and Formal Description Systems, Karlsruhe Institute of Technology, Kaiserstr. 89, 76133 Karlsruhe, Germany
- Houssam Metni
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
  - ECPM, Université de Strasbourg, 25 Rue Becquerel, 67087 Strasbourg, France
- Clint van Hoesel
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
  - Department of Applied Physics, Eindhoven University of Technology, Groene Loper 19, 5612 AP Eindhoven, The Netherlands
- Henrik Schopmans
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
  - Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
- Timo Sommer
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
  - Institute for Theory of Condensed Matter, Karlsruhe Institute of Technology, Wolfgang-Gaede-Str. 1, 76131 Karlsruhe, Germany
  - Present Address: School of Chemistry, Trinity College Dublin, College Green, Dublin 2, Ireland
- Pascal Friederich
  - Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Am Fasanengarten 5, 76131 Karlsruhe, Germany
  - Institute of Nanotechnology, Karlsruhe Institute of Technology, Hermann-von-Helmholtz-Platz 1, 76344 Eggenstein-Leopoldshafen, Germany
10
Li H, Chiang AWT, Lewis NE. Artificial intelligence in the analysis of glycosylation data. Biotechnol Adv 2022; 60:108008. [PMID: 35738510 PMCID: PMC11157671 DOI: 10.1016/j.biotechadv.2022.108008] [Citation(s) in RCA: 18] [Impact Index Per Article: 6.0] [Received: 01/08/2022] [Revised: 06/15/2022] [Accepted: 06/16/2022] [Indexed: 11/18/2022]
Abstract
Glycans are complex, yet ubiquitous across biological systems. They are involved in diverse essential organismal functions. Aberrant glycosylation may lead to disease development, such as cancer, autoimmune diseases, and inflammatory diseases. Glycans, both normal and aberrant, are synthesized using extensive glycosylation machinery, and understanding this machinery can provide invaluable insights for diagnosis, prognosis, and treatment of various diseases. Increasing amounts of glycomics data are being generated thanks to advances in glycoanalytics technologies, but to maximize the value of such data, innovations are needed for analyzing and interpreting large-scale glycomics data. Artificial intelligence (AI) provides a powerful analysis toolbox in many scientific fields, and here we review state-of-the-art AI approaches on glycosylation analysis. We further discuss how models can be analyzed to gain mechanistic insights into glycosylation machinery and how the machinery shapes glycans under different scenarios. Finally, we propose how to leverage the gained knowledge for developing predictive AI-based models of glycosylation. Thus, guiding future research of AI-based glycosylation model development will provide valuable insights into glycosylation and glycan machinery.
Affiliation(s)
- Haining Li
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA
- Austin W T Chiang
  - Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
- Nathan E Lewis
  - Department of Bioengineering, University of California, San Diego, La Jolla, CA 92093, USA; Department of Pediatrics, University of California, San Diego, La Jolla, CA 92093, USA
11
Abstract
Artificial intelligence (AI) methods have been and are now being increasingly integrated in prediction software implemented in bioinformatics and its glycoscience branch known as glycoinformatics. AI techniques have evolved in the past decades, and their applications in glycoscience are not yet widespread. This limited use is partly explained by the peculiarities of glyco-data, which are notoriously hard to produce and analyze. Nonetheless, as time goes on, the accumulation of glycomics, glycoproteomics, and glycan-binding data has reached a point where even the most recent deep learning methods can provide predictors with good performance. We discuss the historical development of the application of various AI methods in the broader field of glycoinformatics. A particular focus is placed on shining a light on challenges in glyco-data handling, contextualized by lessons learnt from related disciplines. Ending on the discussion of state-of-the-art deep learning approaches in glycoinformatics, we also envision the future of glycoinformatics, including developments that need to occur in order to truly unleash the capabilities of glycoscience in the systems biology era.
Affiliation(s)
- Daniel Bojar
  - Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg 41390, Sweden
  - Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg 41390, Sweden
- Frederique Lisacek
  - Proteome Informatics Group, Swiss Institute of Bioinformatics, CH-1227 Geneva, Switzerland
  - Computer Science Department & Section of Biology, University of Geneva, route de Drize 7, CH-1227, Geneva, Switzerland
12
Akmal MA, Hassan MA, Muhammad S, Khurshid KS, Mohamed A. An analytical study on the identification of N-linked glycosylation sites using machine learning model. PeerJ Comput Sci 2022; 8:e1069. [PMID: 36262138 PMCID: PMC9575850 DOI: 10.7717/peerj-cs.1069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/19/2022] [Accepted: 07/25/2022] [Indexed: 06/16/2023]
Abstract
N-linked glycosylation is the most common type of glycosylation; it plays a significant role in identifying various diseases such as type I diabetes and cancer and helps in drug development. Most proteins cannot perform their biological and psychological functionalities without undergoing such modification. Therefore, it is essential to identify such sites by computational techniques because of experimental limitations. This study aims to analyze and synthesize the progress made in discovering N-linked sites using machine learning methods. It also explores the performance of currently available tools to predict such sites. Almost seventy research articles published in recognized journals of the N-linked glycosylation field have been shortlisted after a rigorous filtering process. The findings of the studies have been reported based on multiple aspects: publication channel, feature set construction method, training algorithm, and performance evaluation. Moreover, the literature survey has developed a taxonomy of N-linked sequence identification. Our study focuses on the performance evaluation criteria, and the importance of N-linked glycosylation motivates us to discover resources that use computational methods instead of the experimental method due to its limitations.
Affiliation(s)
- Muhammad Aizaz Akmal
  - Department of Computer Science, University of Engineering and Technology, KSK, Lahore, Punjab, Pakistan
- Muhammad Awais Hassan
  - Department of Computer Science, University of Engineering and Technology, Lahore, Punjab, Pakistan
- Shoaib Muhammad
  - Department of Computer Science, University of Engineering and Technology, Lahore, Punjab, Pakistan
- Khaldoon S. Khurshid
  - Department of Computer Science, University of Engineering and Technology, Lahore, Punjab, Pakistan
13
Carpenter EJ, Seth S, Yue N, Greiner R, Derda R. GlyNet: a multi-task neural network for predicting protein-glycan interactions. Chem Sci 2022; 13:6669-6686. [PMID: 35756507 PMCID: PMC9172296 DOI: 10.1039/d1sc05681f] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/15/2021] [Accepted: 05/02/2022] [Indexed: 12/14/2022] Open
Abstract
Advances in diagnostics, therapeutics, vaccines, transfusion, and organ transplantation build on a fundamental understanding of glycan-protein interactions. To aid this, we developed GlyNet, a model that accurately predicts interactions (relative binding strengths) between mammalian glycans and 352 glycan-binding proteins, many at multiple concentrations. For each glycan input, our model produces 1257 outputs, each representing the relative interaction strength between the input glycan and a particular protein sample. GlyNet learns these continuous values using relative fluorescence units (RFUs) measured on 599 glycans in the Consortium for Functional Glycomics glycan arrays and extrapolates these to RFUs from additional, untested glycans. GlyNet's output of continuous values provides more detailed results than the standard binary classification models. After incorporating a simple threshold to transform such continuous outputs, the resulting GlyNet classifier outperforms those standard classifiers. GlyNet is the first multi-output regression model for predicting protein-glycan interactions and serves as an important benchmark, facilitating development of quantitative computational glycobiology.
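The thresholding step mentioned in the abstract, turning continuous per-protein outputs into binder/non-binder calls, can be sketched as follows. This is a minimal illustration and not GlyNet's code: the function name `binarize`, the sample names, the predicted values, and the cut-off of 1.0 are all hypothetical, and the abstract does not state which threshold or output scale the authors used.

```python
from typing import Mapping


def binarize(predicted: Mapping[str, float], threshold: float) -> dict[str, bool]:
    """Map each protein sample's continuous predicted strength to a binary call.

    A sample is called a binder when its predicted value is at or above
    the threshold. Both the scale and the threshold are assumptions here.
    """
    return {sample: value >= threshold for sample, value in predicted.items()}


if __name__ == "__main__":
    # Hypothetical predicted interaction strengths for one input glycan.
    predictions = {"lectin_A_10ug": 0.1, "lectin_A_100ug": 2.5, "lectin_B_10ug": 1.0}
    print(binarize(predictions, threshold=1.0))
```

Keeping the continuous values and applying the threshold only at the end means the same regression output can be re-thresholded later without retraining.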
Affiliation(s)
- Eric J Carpenter
- Department of Chemistry, University of Alberta Edmonton Alberta Canada
| | - Shaurya Seth
- Department of Chemistry, University of Alberta Edmonton Alberta Canada
| | - Noel Yue
- Department of Chemistry, University of Alberta Edmonton Alberta Canada
| | - Russell Greiner
- Department of Computing Science, University of Alberta Edmonton Alberta Canada
- Alberta Machine Intelligence Institute (AMII) Edmonton Alberta Canada
| | - Ratmir Derda
- Department of Chemistry, University of Alberta Edmonton Alberta Canada
| |
|
14
|
Rickert CA, Lieleg O. Machine learning approaches for biomolecular, biophysical, and biomaterials research. BIOPHYSICS REVIEWS 2022; 3:021306. [PMID: 38505413 PMCID: PMC10914139 DOI: 10.1063/5.0082179] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 12/13/2021] [Accepted: 05/12/2022] [Indexed: 03/21/2024]
Abstract
A fluent conversation with a virtual assistant, person-tailored news feeds, and deep-fake images created within seconds: all those things that were unthinkable for a long time are now part of our everyday lives. What these examples have in common is that they are realized by different means of machine learning (ML), a technology that has fundamentally changed many aspects of the modern world. The possibility to process enormous amounts of data in multi-hierarchical, digital constructs has paved the way not only for creating intelligent systems but also for obtaining surprising new insight into many scientific problems. However, in the different areas of the biosciences, which typically rely heavily on the collection of time-consuming experimental data, applying ML methods is more challenging: here, difficulties can arise from small datasets and from the inherent, broad variability and complexity associated with studying biological objects and phenomena. In this Review, we give an overview of commonly used ML algorithms (which are often referred to as "machines") and learning strategies as well as their applications in different bio-disciplines such as molecular biology, drug development, biophysics, and biomaterials science. We highlight how selected research questions from those fields were successfully translated into machine-readable formats, discuss typical problems that can arise in this context, and provide an overview of how to resolve those difficulties.
|
15
|
Mohapatra S, An J, Gómez-Bombarelli R. Chemistry-informed macromolecule graph representation for similarity computation, unsupervised and supervised learning. MACHINE LEARNING: SCIENCE AND TECHNOLOGY 2022. [DOI: 10.1088/2632-2153/ac545e] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 12/17/2022] Open
Abstract
The near-infinite chemical diversity of natural and artificial macromolecules arises from the vast range of possible component monomers, linkages, and polymer topologies. This enormous variety contributes to the ubiquity and indispensability of macromolecules but hinders the development of general machine learning methods with macromolecules as input. To address this, we developed a chemistry-informed graph representation of macromolecules that enables the quantification of structural similarity and interpretable supervised learning for macromolecules. Our work enables quantitative chemistry-informed decision-making and iterative design in the macromolecular chemical space.
|
16
|
Lundstrøm J, Korhonen E, Lisacek F, Bojar D. LectinOracle: A Generalizable Deep Learning Model for Lectin-Glycan Binding Prediction. ADVANCED SCIENCE (WEINHEIM, BADEN-WURTTEMBERG, GERMANY) 2022; 9:e2103807. [PMID: 34862760 PMCID: PMC8728848 DOI: 10.1002/advs.202103807] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 08/30/2021] [Revised: 11/03/2021] [Indexed: 05/07/2023]
Abstract
Ranging from bacterial cell adhesion over viral cell entry to human innate immunity, glycan-binding proteins or lectins abound in nature. Widely used as staining and characterization reagents in cell biology and crucial for understanding the interactions in biological systems, lectins are a focal point of study in glycobiology. Yet the sheer breadth and depth of specificity for diverse oligosaccharide motifs has made studying lectins a largely piecemeal approach, with few options to generalize. Here, LectinOracle, a model combining transformer-based representations for proteins and graph convolutional neural networks for glycans to predict their interaction, is presented. Using a curated data set of 564,647 unique protein-glycan interactions, it is shown that LectinOracle predictions agree with literature-annotated specificities for a wide range of lectins. Using a range of specialized glycan arrays, it is shown that LectinOracle predictions generalize to new glycans and lectins, with qualitative and quantitative agreement with experimental data. It is further demonstrated that LectinOracle can be used to improve lectin classification, accelerate lectin directed evolution, predict epidemiological outcomes in the context of influenza virus, and analyze whole lectomes in host-microbe interactions. It is envisioned that the platform presented herein will advance both the study of lectins and their role in (glyco)biology.
Affiliation(s)
- Jon Lundstrøm
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg 41390, Sweden
- Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg 41390, Sweden
| | - Emma Korhonen
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg 41390, Sweden
- Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg 41390, Sweden
| | - Frédérique Lisacek
- Swiss Institute of Bioinformatics, Geneva 1227, Switzerland
- Computer Science Department, UniGe, Geneva 1227, Switzerland
- Section of Biology, UniGe, Geneva 1205, Switzerland
| | - Daniel Bojar
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg 41390, Sweden
- Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg 41390, Sweden
| |
|
17
|
Fischer S, Stegmann F, Gnanapragassam VS, Lepenies B. From structure to function – Ligand recognition by myeloid C-type lectin receptors. Comput Struct Biotechnol J 2022; 20:5790-5812. [DOI: 10.1016/j.csbj.2022.10.019] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/27/2022] [Revised: 10/14/2022] [Accepted: 10/14/2022] [Indexed: 11/29/2022] Open
|
18
|
Imberty A, Bonnardel F, Lisacek F. UniLectin, A One-Stop-Shop to Explore and Study Carbohydrate-Binding Proteins. Curr Protoc 2021; 1:e305. [PMID: 34826352 DOI: 10.1002/cpz1.305] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/15/2022]
Abstract
All eukaryotic cells are covered with a dense layer of glycoconjugates, and the cell walls of bacteria are made of various polysaccharides, putting glycans in key locations for mediating protein-protein interactions at cell interfaces. Glycan function is therefore mainly defined as binding to other molecules, and lectins are proteins that specifically recognize and interact non-covalently with glycans. UniLectin was designed based on insight into the knowledge of lectins, their classification, and their biological role. This modular platform provides a curated and periodically updated classification of lectins along with a set of comparative and visualization tools, as well as structured results of screening comprehensive sequence datasets. UniLectin can be used to explore lectins, find precise information on glycan-protein interactions, and mine the results of predictive tools based on HMM profiles. This usage is illustrated here with two protocols. The first one highlights the fine-tuned role of the O blood group antigen in distinctive pathogen recognition, while the second compares the various bacterial lectin arsenals that clearly depend on living conditions of species even in the same genus. © 2021 The Authors. Current Protocols published by Wiley Periodicals LLC. Basic Protocol 1: Searching for the structural details of lectins binding the O blood group antigen. Basic Protocol 2: Comparing the lectomes of related organisms in different environments.
Affiliation(s)
- Anne Imberty
- Université Grenoble Alpes, CNRS, CERMAV, Grenoble, France
| | - François Bonnardel
- Université Grenoble Alpes, CNRS, CERMAV, Grenoble, France; SIB Swiss Institute of Bioinformatics, Geneva, Switzerland; Computer Science Department, UniGe, Geneva, Switzerland
| | - Frédérique Lisacek
- SIB Swiss Institute of Bioinformatics, Geneva, Switzerland; Computer Science Department, UniGe, Geneva, Switzerland; Section of Biology, UniGe, Geneva, Switzerland
| |
|
19
|
Thomès L, Burkholz R, Bojar D. Glycowork: A Python package for glycan data science and machine learning. Glycobiology 2021; 31:1240-1244. [PMID: 34192308 PMCID: PMC8600276 DOI: 10.1093/glycob/cwab067] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 04/23/2021] [Revised: 06/02/2021] [Accepted: 06/25/2021] [Indexed: 12/14/2022] Open
Abstract
While glycans are crucial for biological processes, existing analysis modalities make it difficult for researchers with limited computational background to include these diverse carbohydrates into workflows. Here, we present glycowork, an open-source Python package designed for glycan-related data science and machine learning by end users. Glycowork includes functions to, for instance, automatically annotate glycan motifs and analyze their distributions via heatmaps and statistical enrichment. We also provide visualization methods, routines to interact with stored databases, trained machine learning models and learned glycan representations. We envision that glycowork can extract further insights from glycan datasets and demonstrate this with workflows that analyze glycan motifs in various biological contexts. Glycowork can be freely accessed at https://github.com/BojarLab/glycowork/.
Affiliation(s)
- Luc Thomès
- Department of Chemistry and Molecular Biology and Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, 41390 Gothenburg, Sweden
| | - Rebekka Burkholz
- Department of Biostatistics, Harvard School of Public Health, Boston, 02115 MA, USA
| | - Daniel Bojar
- Department of Chemistry and Molecular Biology and Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, 41390 Gothenburg, Sweden
| |
|
20
|
Thomès L, Bojar D. The Role of Fucose-Containing Glycan Motifs Across Taxonomic Kingdoms. Front Mol Biosci 2021; 8:755577. [PMID: 34631801 PMCID: PMC8492980 DOI: 10.3389/fmolb.2021.755577] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/09/2021] [Accepted: 09/10/2021] [Indexed: 11/13/2022] Open
Abstract
The extraordinary diversity of glycans leads to large differences in the glycomes of different kingdoms of life. Yet, while most monosaccharides are solely found in certain taxonomic groups, there is a small set of monosaccharides with widespread distribution across nearly all domains of life. These general monosaccharides are particularly relevant for glycan motifs, as they can readily be used by commensals and pathogens to mimic host glycans or hijack existing glycan recognition systems. Among these, the monosaccharide fucose is especially interesting, as it frequently presents itself as a terminal monosaccharide, primed for interaction with proteins. Here, we analyze fucose-containing glycan motifs across all taxonomic kingdoms. Using a hereby presented large species-specific glycan dataset and a plethora of methods for glycan-focused bioinformatics and machine learning, we identify characteristic as well as shared fucose-containing glycan motifs for various taxonomic groups, demonstrating clear differences in fucose usage. Even within domains, fucose is used differentially based on an organism’s physiology and habitat. We particularly highlight differences in fucose-containing motifs between vertebrates and invertebrates. With the example of pathogenic and non-pathogenic Escherichia coli strains, we also demonstrate the importance of fucose-containing motifs in molecular mimicry and thereby pathogenic potential. We envision that this study will shed light on an important class of glycan motifs, with potential new insights into the role of fucosylated glycans in symbiosis, pathogenicity, and immunity.
Affiliation(s)
- Luc Thomès
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
| | - Daniel Bojar
- Department of Chemistry and Molecular Biology, University of Gothenburg, Gothenburg, Sweden; Wallenberg Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg, Sweden
| |
|