1
|
Abstract
Artificial intelligence (AI) methods are increasingly integrated into prediction software in bioinformatics and its glycoscience branch, known as glycoinformatics. AI techniques have evolved over the past decades, yet their applications in glycoscience are not yet widespread. This limited use is partly explained by the peculiarities of glyco-data, which are notoriously hard to produce and analyze. Nonetheless, over time, the accumulation of glycomics, glycoproteomics, and glycan-binding data has reached a point where even the most recent deep learning methods can provide predictors with good performance. We discuss the historical development of the application of various AI methods in the broader field of glycoinformatics. A particular focus is placed on the challenges in glyco-data handling, contextualized by lessons learnt from related disciplines. Ending with a discussion of state-of-the-art deep learning approaches in glycoinformatics, we also envision the future of glycoinformatics, including developments that need to occur in order to truly unleash the capabilities of glycoscience in the systems biology era.
Affiliation(s)
- Daniel Bojar
- Department
of Chemistry and Molecular Biology, University
of Gothenburg, Gothenburg 41390, Sweden
- Wallenberg
Centre for Molecular and Translational Medicine, University of Gothenburg, Gothenburg 41390, Sweden
| | - Frederique Lisacek
- Proteome
Informatics Group, Swiss Institute of Bioinformatics, CH-1227 Geneva, Switzerland
- Computer
Science Department & Section of Biology, University of Geneva, route de Drize 7, CH-1227, Geneva, Switzerland
| |
|
2
|
Gong Y, Qin S, Dai L, Tian Z. The glycosylation in SARS-CoV-2 and its receptor ACE2. Signal Transduct Target Ther 2021; 6:396. [PMID: 34782609 PMCID: PMC8591162 DOI: 10.1038/s41392-021-00809-8] [Citation(s) in RCA: 104] [Impact Index Per Article: 34.7] [Received: 08/08/2021] [Revised: 10/10/2021] [Accepted: 10/24/2021] [Indexed: 02/05/2023]
Abstract
Coronavirus disease 2019 (COVID-19), a highly infectious disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), has infected more than 235 million individuals and led to more than 4.8 million deaths worldwide as of October 5, 2021. Cryo-electron microscopy and topology show that the SARS-CoV-2 genome encodes many highly glycosylated proteins, such as spike (S), envelope (E), membrane (M), and ORF3a proteins, which are responsible for host recognition, penetration, binding, recycling and pathogenesis. Here we review the detection, substrates, and biological functions of glycosylation in SARS-CoV-2 proteins as well as the human receptor ACE2, and also summarize the approved and in-development SARS-CoV-2 therapeutics associated with glycosylation. This review may not only broaden the understanding of viral glycobiology, but also provide key clues for the development of new preventive and therapeutic methodologies against SARS-CoV-2 and its variants.
Affiliation(s)
- Yanqiu Gong
- National Clinical Research Center for Geriatrics and Department of General Practice, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, and Collaborative Innovation Center of Biotherapy, 610041, Chengdu, China
| | - Suideng Qin
- School of Chemical Science & Engineering, Shanghai Key Laboratory of Chemical Assessment and Sustainability, Tongji University, 200092, Shanghai, China
| | - Lunzhi Dai
- National Clinical Research Center for Geriatrics and Department of General Practice, State Key Laboratory of Biotherapy, West China Hospital, Sichuan University, and Collaborative Innovation Center of Biotherapy, 610041, Chengdu, China.
| | - Zhixin Tian
- School of Chemical Science & Engineering, Shanghai Key Laboratory of Chemical Assessment and Sustainability, Tongji University, 200092, Shanghai, China.
| |
|
3
|
Illiano A, Pinto G, Melchiorre C, Carpentieri A, Faraco V, Amoresano A. Protein Glycosylation Investigated by Mass Spectrometry: An Overview. Cells 2020; 9:E1986. [PMID: 32872358 PMCID: PMC7564411 DOI: 10.3390/cells9091986] [Citation(s) in RCA: 36] [Impact Index Per Article: 9.0] [Received: 07/09/2020] [Revised: 08/14/2020] [Accepted: 08/24/2020] [Indexed: 12/16/2022]
Abstract
Protein glycosylation is a post-translational modification of crucial importance for its involvement in molecular recognition, protein trafficking, regulation, and inflammation. Indeed, abnormalities in protein glycosylation are correlated with several disease states such as cancer, inflammatory diseases, and congenital disorders. The understanding of cellular mechanisms through the elucidation of glycan composition encourages researchers to find analytical solutions for their detection. However, the multiplicity and diversity of glycan structures bound to proteins, the variations in polarity of the individual saccharide residues, and the poor ionization efficiencies make their detection much trickier than that of other kinds of biopolymers. An overview of the most prominent techniques based on mass spectrometry (MS) for protein glycosylation (glycoproteomics) studies is presented here. Sample pre-treatments are discussed as a crucial step preceding MS analysis to improve the glycan ionization efficiency. The different instrumental MS modes are also explored for the qualitative and quantitative analysis of glycopeptides and glycan structural composition, thus contributing to the elucidation of biological mechanisms.
Affiliation(s)
- Anna Illiano
- Department of Chemical Sciences, University of Naples Federico II, Via Cinthia 26, 80126 Napoles, Italy; (A.I.); (G.P.); (C.M.); (A.C.); (A.A.)
- CEINGE Advanced Biotechnology, University of Naples Federico II, Via Cinthia 26, 80126 Napoles, Italy
| | - Gabriella Pinto
- Department of Chemical Sciences, University of Naples Federico II, Via Cinthia 26, 80126 Napoles, Italy; (A.I.); (G.P.); (C.M.); (A.C.); (A.A.)
| | - Chiara Melchiorre
- Department of Chemical Sciences, University of Naples Federico II, Via Cinthia 26, 80126 Napoles, Italy; (A.I.); (G.P.); (C.M.); (A.C.); (A.A.)
| | - Andrea Carpentieri
- Department of Chemical Sciences, University of Naples Federico II, Via Cinthia 26, 80126 Napoles, Italy; (A.I.); (G.P.); (C.M.); (A.C.); (A.A.)
| | - Vincenza Faraco
- Department of Chemical Sciences, University of Naples Federico II, Via Cinthia 26, 80126 Napoles, Italy; (A.I.); (G.P.); (C.M.); (A.C.); (A.A.)
| | - Angela Amoresano
- Department of Chemical Sciences, University of Naples Federico II, Via Cinthia 26, 80126 Napoles, Italy; (A.I.); (G.P.); (C.M.); (A.C.); (A.A.)
- Istituto Nazionale Biostrutture e Biosistemi—Consorzio Interuniversitario, Viale delle Medaglie d’Oro, 305, 00136 Rome, Italy
| |
|
4
|
Jhong JH, Chi YH, Li WC, Lin TH, Huang KY, Lee TY. dbAMP: an integrated resource for exploring antimicrobial peptides with functional activities and physicochemical properties on transcriptome and proteome data. Nucleic Acids Res 2019; 47:D285-D297. [PMID: 30380085 PMCID: PMC6323920 DOI: 10.1093/nar/gky1030] [Citation(s) in RCA: 79] [Impact Index Per Article: 19.8] [Received: 08/15/2018] [Accepted: 10/24/2018] [Indexed: 02/04/2023]
Abstract
Antimicrobial peptides (AMPs), naturally encoded by genes and generally containing 10–100 amino acids, are crucial components of the innate immune system and can protect the host from various pathogenic bacteria, as well as viruses. In recent years, the widespread use of antibiotics has driven the rapid growth of antibiotic-resistant microorganisms that usually induce critical infection and pathogenesis. This has motivated increasing interest in exploring natural AMPs that enable the development of new antibiotics. Given the potential of AMPs as new drugs for multidrug-resistant pathogens, we were motivated to develop a database (dbAMP, http://csb.cse.yzu.edu.tw/dbAMP/) by accumulating comprehensive AMPs from the public domain and manually curating literature. Currently in dbAMP there are 12 389 unique entries, including 4271 experimentally verified AMPs and 8118 putative AMPs along with their functional activities, supported by 1924 research articles. The advent of high-throughput biotechnologies, such as mass spectrometry and next-generation sequencing, has led us to further expand dbAMP as a database-assisted platform providing comprehensive functional and physicochemical analyses for AMPs based on large-scale transcriptome and proteome data. Significant improvements available in dbAMP include information on AMP–protein interactions, antimicrobial potency analysis for ‘cryptic’ region detection, annotations of AMP target species, as well as AMP detection on transcriptome and proteome datasets. Additionally, a Docker container has been developed as a downloadable package for discovering known and novel AMPs in high-throughput omics data. User-friendly visualization interfaces have been created to facilitate peptide searching, browsing, and sequence alignment against dbAMP entries. All the facilities integrated into dbAMP can promote the functional analyses of AMPs and the discovery of new antimicrobial drugs.
Affiliation(s)
- Jhih-Hua Jhong
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
| | - Yu-Hsiang Chi
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
| | - Wen-Chi Li
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
| | - Tsai-Hsuan Lin
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
| | - Kai-Yao Huang
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
| | - Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
- To whom correspondence should be addressed. Tel: +86 75523519551;
| |
|
5
|
Wang HY, Li WC, Huang KY, Chung CR, Horng JT, Hsu JF, Lu JJ, Lee TY. Rapid classification of group B Streptococcus serotypes based on matrix-assisted laser desorption ionization-time of flight mass spectrometry and machine learning techniques. BMC Bioinformatics 2019; 20:703. [PMID: 31870283 PMCID: PMC6929280 DOI: 10.1186/s12859-019-3282-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 11/16/2019] [Accepted: 11/18/2019] [Indexed: 12/04/2022]
Abstract
Background: Group B streptococcus (GBS) is an important pathogen that is responsible for invasive infections, including sepsis and meningitis. GBS serotyping is an essential means for the investigation of possible infection outbreaks and can identify possible sources of infection. Although it is possible to determine GBS serotypes by either immuno-serotyping or geno-serotyping, both traditional methods are time-consuming and labor-intensive. In recent years, matrix-assisted laser desorption ionization-time of flight mass spectrometry (MALDI-TOF MS) has been reported as an effective tool for the determination of GBS serotypes in a more rapid and accurate manner. Thus, this work aims to investigate GBS serotypes by incorporating machine learning techniques with MALDI-TOF MS to carry out the identification. Results: In this study, a total of 787 GBS isolates, obtained from three research and teaching hospitals, were analyzed by MALDI-TOF MS, and the serotype of the GBS was determined by a geno-serotyping experiment. The peaks of mass-to-charge ratios were regarded as the attributes to characterize the various serotypes of GBS. Machine learning algorithms, such as support vector machine (SVM) and random forest (RF), were then used to construct predictive models for the five different serotypes (Types Ia, Ib, III, V, and VI). After optimization of feature selection and model generation based on training datasets, the accuracies of the selected models attained 54.9–87.1% for various serotypes based on independent testing data. Specifically, for the major serotypes, namely type III and type VI, the accuracies were 73.9 and 70.4%, respectively. Conclusion: The proposed models have been adopted to implement a web-based tool (GBSTyper), which is now freely accessible at http://csb.cse.yzu.edu.tw/GBSTyper/, for providing efficient and effective detection of GBS serotypes based on a MALDI-TOF MS spectrum. Overall, this work has demonstrated that the combination of MALDI-TOF MS and machine intelligence could provide a practical means of clinical pathogen testing.
Affiliation(s)
- Hsin-Yao Wang
- Department of Laboratory Medicine, Chang Gung Memorial Hospital at Linkou, Taoyuan, 33305, Taiwan; Program in Biomedical Engineering, Chang Gung University, Taoyuan City, Taiwan
| | - Wen-Chi Li
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 518172, China
| | - Kai-Yao Huang
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 518172, China
| | - Chia-Ru Chung
- Department of Computer Science and Information Engineering, National Central University, Taoyuan, 32001, Taiwan
| | - Jorng-Tzong Horng
- Department of Computer Science and Information Engineering, National Central University, Taoyuan, 32001, Taiwan; Department of Bioinformatics and Medical Engineering, Asia University, Taoyuan City, Taiwan
| | - Jen-Fu Hsu
- Division of Pediatric Neonatology, Department of Pediatrics, Chang Gung Memorial Hospital, Linkou, Taoyuan, 33305, Taiwan; School of Traditional Chinese Medicine, College of Medicine, Chang Gung University, Taoyuan, 33302, Taiwan
| | - Jang-Jih Lu
- Department of Laboratory Medicine, Chang Gung Memorial Hospital at Linkou, Taoyuan, 33305, Taiwan; Department of Medical Biotechnology and Laboratory Science, Chang Gung University, Taoyuan, Taiwan; Department of Medicine, College of Medicine, Chang Gung University, Taoyuan, Taiwan
| | - Tzong-Yi Lee
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 518172, China; School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen, 518172, China
| |
|
6
|
Abrahams JL, Taherzadeh G, Jarvas G, Guttman A, Zhou Y, Campbell MP. Recent advances in glycoinformatic platforms for glycomics and glycoproteomics. Curr Opin Struct Biol 2019; 62:56-69. [PMID: 31874386 DOI: 10.1016/j.sbi.2019.11.009] [Citation(s) in RCA: 65] [Impact Index Per Article: 13.0] [Received: 09/22/2019] [Revised: 11/05/2019] [Accepted: 11/15/2019] [Indexed: 12/16/2022]
Abstract
Protein glycosylation is the most complex and prevalent post-translational modification in terms of the number of proteins modified and the diversity generated. To understand the functional roles of glycoproteins, it is important to gain an insight into the repertoire of oligosaccharides present. The comparison and relative quantitation of glycoforms combined with site-specific identification and occupancy are necessary steps in this direction. Computational platforms have continued to mature, assisting researchers with the interpretation of such glycomics and glycoproteomics data sets, but they frequently support only dedicated workflows, and users rely on the manual interpretation of data to gain insights into the glycoproteome. The growth of site-specific knowledge has also led to the implementation of machine-learning algorithms to predict glycosylation, which are now being integrated into glycoproteomics pipelines. This short review describes commercial and open-access databases and software with an emphasis on those that are actively maintained and designed to support current analytical workflows.
Affiliation(s)
- Jodie L Abrahams
- Institute for Glycomics, Griffith University, Gold Coast, QLD, Australia
| | - Ghazaleh Taherzadeh
- School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
| | - Gabor Jarvas
- Translational Glycomics Research Group, Research Institute of Biomolecular and Chemical Engineering, University of Pannonia, Veszprém, Hungary; Horváth Csaba Laboratory of Bioseparation Sciences, Research Centre for Molecular Medicine, Faculty of Medicine, University of Debrecen, Debrecen, Hungary
| | - Andras Guttman
- Translational Glycomics Research Group, Research Institute of Biomolecular and Chemical Engineering, University of Pannonia, Veszprém, Hungary; Horváth Csaba Laboratory of Bioseparation Sciences, Research Centre for Molecular Medicine, Faculty of Medicine, University of Debrecen, Debrecen, Hungary; SCIEX, Brea, CA, USA
| | - Yaoqi Zhou
- School of Information and Communication Technology, Griffith University, Gold Coast, QLD, Australia
| | - Matthew P Campbell
- Institute for Glycomics, Griffith University, Gold Coast, QLD, Australia.
| |
|
7
|
Fan Z, Kong F, Zhou Y, Chen Y, Dai Y. Intelligence Algorithms for Protein Classification by Mass Spectrometry. Biomed Res Int 2018; 2018:2862458. [PMID: 30534555 PMCID: PMC6252195 DOI: 10.1155/2018/2862458] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 06/26/2018] [Revised: 09/27/2018] [Accepted: 10/29/2018] [Indexed: 11/17/2022]
Abstract
Mass spectrometry (MS) is an important technique in protein research. Effective methods for classifying MS data could contribute to early and less-invasive diagnosis and also facilitate developments in the bioinformatics field. As MS data are high-dimensional, appropriate methods that can effectively deal with large amounts of MS data have been widely studied. In this paper, the applications of methods based on intelligence algorithms are investigated. First, classification and biomarker analysis methods using typical machine learning approaches are discussed, followed by ensemble strategy algorithms. Simple and basic machine learning algorithms alone hardly address the various needs of protein MS classification. Preprocessing algorithms are also studied, as these methods are useful for feature selection or feature extraction to improve classification performance. As protein MS data grow larger and more complicated, improvements in classification methods in terms of classifier selection and combinations of different algorithms and preprocessing algorithms are emphasized for further work.
Affiliation(s)
- Zichuan Fan
- School of Computer and Information Science, Southwest University, Chongqing 400715, China
| | - Fanchen Kong
- School of Computer and Information Science, Southwest University, Chongqing 400715, China
| | - Yang Zhou
- School of Computer and Information Science, Southwest University, Chongqing 400715, China
| | - Yiqing Chen
- School of Computer and Information Science, Southwest University, Chongqing 400715, China
| | - Yalan Dai
- School of Computer and Information Science, Southwest University, Chongqing 400715, China
| |
|
8
|
Liu W, Liu C, Yu J, Zhang Y, Li J, Chen Y, Zheng L. Discrimination of geographical origin of extra virgin olive oils using terahertz spectroscopy combined with chemometrics. Food Chem 2018; 251:86-92. [DOI: 10.1016/j.foodchem.2018.01.081] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 08/14/2017] [Revised: 01/07/2018] [Accepted: 01/11/2018] [Indexed: 01/20/2023]
|
9
|
Dorl S, Winkler S, Mechtler K, Dorfer V. PhoStar: Identifying Tandem Mass Spectra of Phosphorylated Peptides before Database Search. J Proteome Res 2017; 17:290-295. [PMID: 29057658 DOI: 10.1021/acs.jproteome.7b00563] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 01/17/2023]
Abstract
Standard proteomics workflows use tandem mass spectrometry followed by sequence database search to analyze complex biological samples. The identification of proteins carrying post-translational modifications, for example, phosphorylation, is typically addressed by allowing variable modifications in the searched sequences. Accounting for these variations exponentially increases the combinatorial space in the database, which leads to increased processing times and more false positive identifications. The tool presented here, PhoStar, identifies spectra that originate from phosphorylated peptides before database search using a supervised machine learning approach. The model for the prediction of phosphorylation was trained and validated with an accuracy of 97.6% on a large set of high-confidence spectra collected from publicly available experimental data. Its power was further validated by predicting phosphorylation in the complete NIST human and mouse high collision-dissociation spectral libraries, achieving an accuracy of 98.2 and 97.9%, respectively. We demonstrate the application of PhoStar by using it for spectra filtering before database search. In database search of HeLa samples, the peptide search space was reduced by 27-66% while finding at least 97% of total peptide identifications (at 1% FDR) compared with a standard workflow.
Affiliation(s)
- Sebastian Dorl
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
| | - Stephan Winkler
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
| | - Karl Mechtler
- Research Institute of Molecular Pathology (IMP), Protein Chemistry, Campus-Vienna-Biocenter 1, 1030 Vienna, Austria; Institute of Molecular Biotechnology (IMBA), Protein Chemistry, Vienna Biocenter (VBC), Dr. Bohr-Gasse 3, 1030 Vienna, Austria
| | - Viktoria Dorfer
- University of Applied Sciences Upper Austria, Bioinformatics Research Group, Softwarepark 11, 4232 Hagenberg, Austria
| |
|
10
|
Liu G, Cheng K, Lo CY, Li J, Qu J, Neelamegham S. A Comprehensive, Open-source Platform for Mass Spectrometry-based Glycoproteomics Data Analysis. Mol Cell Proteomics 2017; 16:2032-2047. [PMID: 28887379 DOI: 10.1074/mcp.m117.068239] [Citation(s) in RCA: 36] [Impact Index Per Article: 5.1] [Received: 03/07/2017] [Revised: 08/23/2017] [Indexed: 12/12/2022]
Abstract
Glycosylation is among the most abundant and diverse protein post-translational modifications (PTMs) identified to date. The structural analysis of this PTM is challenging because of the diverse monosaccharides which are not conserved among organisms, the branched nature of glycans, their isomeric structures, and heterogeneity in the glycan distribution at a given site. Glycoproteomics experiments have adopted the traditional high-throughput LC-MSn proteomics workflow to analyze site-specific glycosylation. However, comprehensive computational platforms for data analyses are scarce. To address this limitation, we present a comprehensive, open-source, modular software for glycoproteomics data analysis called GlycoPAT (GlycoProteomics Analysis Toolbox; freely available from www.VirtualGlycome.org/glycopat). The program includes three major advances: (1) "SmallGlyPep," a minimal linear representation of glycopeptides for MSn data analysis. This format allows facile serial fragmentation of both the peptide backbone and PTM at one or more locations. (2) A novel scoring scheme based on calculation of the "Ensemble Score (ES)," a measure that scores and rank-orders MS/MS spectra for N- and O-linked glycopeptides using cross-correlation and probability-based analyses. (3) A false discovery rate (FDR) calculation scheme where decoy glycopeptides are created by simultaneously scrambling the amino acid sequence and by introducing artificial monosaccharides by perturbing the original sugar mass. Parallel computing facilities and user-friendly GUIs (Graphical User Interfaces) are also provided. GlycoPAT is used to catalogue site-specific glycosylation on simple glycoproteins, standard protein mixtures and human plasma cryoprecipitate samples in three common MS/MS fragmentation modes: CID, HCD and ETD. It is also used to identify 960 unique glycopeptides in cell lysates from prostate cancer cells. The results show that the simultaneous consideration of peptide and glycan fragmentation is necessary for high quality MSn spectrum annotation in CID and HCD fragmentation modes. Additionally, they confirm the suitability of GlycoPAT to analyze shotgun glycoproteomics data.
Affiliation(s)
- Gang Liu
- Chemical and Biological Engineering
| | - Kai Cheng
- Chemical and Biological Engineering; Clinical & Translational Research Center
| | - Chi Y Lo
- Chemical and Biological Engineering
| | - Jun Li
- Pharmaceutical Sciences; New York State Center for Excellence in Bioinformatics and Life Sciences, Buffalo, New York
| | - Jun Qu
- Pharmaceutical Sciences; New York State Center for Excellence in Bioinformatics and Life Sciences, Buffalo, New York
| | - Sriram Neelamegham
- Chemical and Biological Engineering; Clinical & Translational Research Center
| |
|
11
|
Campbell MP. A Review of Software Applications and Databases for the Interpretation of Glycopeptide Data. Trends Glycosci Glycotechnol 2017. [DOI: 10.4052/tigg.1601.1e] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Indexed: 11/20/2022]
|
12
|
Walsh I, Zhao S, Campbell M, Taron CH, Rudd PM. Quantitative profiling of glycans and glycopeptides: an informatics' perspective. Curr Opin Struct Biol 2016; 40:70-80. [PMID: 27522273 DOI: 10.1016/j.sbi.2016.07.022] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 05/06/2016] [Revised: 07/25/2016] [Accepted: 07/30/2016] [Indexed: 12/16/2022]
Abstract
Experimental techniques to identify and quantify glycan structures in a given sample are continuously improving. However, as they advance, data analysis and annotation seem to become more complex. To address this issue, much progress has been made in developing software for the interpretation of quantitative glycan profiles. Here, we focus on these informatics tools for high/ultra performance liquid chromatography (H/UPLC), mass spectrometry (MS), tandem mass spectrometry (MSn) and combinations thereof. Software for biomarker discovery and for pathway, genomic and disease analysis is also mentioned, along with a final note on some future prospects for glycoinformatics.
Affiliation(s)
- Ian Walsh
- Bioprocessing Technology Institute, Agency for Science, Technology and Research (A*STAR), 20 Biopolis Way, #06-01 Centros, Singapore 138668, Singapore; New England Biolabs, Ipswich, MA, United States
| | - Sophie Zhao
- Bioprocessing Technology Institute, Agency for Science, Technology and Research (A*STAR), 20 Biopolis Way, #06-01 Centros, Singapore 138668, Singapore
| | - Matthew Campbell
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW 2109, Australia
| | | | - Pauline M Rudd
- Bioprocessing Technology Institute, Agency for Science, Technology and Research (A*STAR), 20 Biopolis Way, #06-01 Centros, Singapore 138668, Singapore; National Institute for Bioprocessing Research & Training, Dublin, Ireland.
| |
|
13
|
Nasir W, Toledo AG, Noborn F, Nilsson J, Wang M, Bandeira N, Larson G. SweetNET: A Bioinformatics Workflow for Glycopeptide MS/MS Spectral Analysis. J Proteome Res 2016; 15:2826-40. [PMID: 27399812 DOI: 10.1021/acs.jproteome.6b00417] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Indexed: 02/06/2023]
Abstract
Glycoproteomics has rapidly become an independent analytical platform bridging the fields of glycomics and proteomics to address site-specific protein glycosylation and its impact in biology. Current glycopeptide characterization relies on time-consuming manual interpretations and demands high levels of personal expertise. Efficient data interpretation constitutes one of the major challenges to be overcome before true high-throughput glycopeptide analysis can be achieved. The development of new glyco-related bioinformatics tools is thus of crucial importance to fulfill this goal. Here we present SweetNET: a data-oriented bioinformatics workflow for efficient analysis of hundreds of thousands of glycopeptide MS/MS-spectra. We have analyzed MS data sets from two separate glycopeptide enrichment protocols targeting sialylated glycopeptides and chondroitin sulfate linkage region glycopeptides, respectively. Molecular networking was performed to organize the glycopeptide MS/MS data based on spectral similarities. The combination of spectral clustering, oxonium ion intensity profiles, and precursor ion m/z shift distributions provided typical signatures for the initial assignment of different N-, O- and CS-glycopeptide classes and their respective glycoforms. These signatures were further used to guide database searches leading to the identification and validation of a large number of glycopeptide variants including novel deoxyhexose (fucose) modifications in the linkage region of chondroitin sulfate proteoglycans.
Affiliation(s)
- Waqas Nasir
- Department of Clinical Chemistry and Transfusion Medicine, Institute of Biomedicine, Sahlgrenska Academy at the University of Gothenburg, SE 413 45 Gothenburg, Sweden
| | - Alejandro Gomez Toledo
- Department of Clinical Chemistry and Transfusion Medicine, Institute of Biomedicine, Sahlgrenska Academy at the University of Gothenburg, SE 413 45 Gothenburg, Sweden
| | - Fredrik Noborn
- Department of Clinical Chemistry and Transfusion Medicine, Institute of Biomedicine, Sahlgrenska Academy at the University of Gothenburg, SE 413 45 Gothenburg, Sweden
| | - Jonas Nilsson
- Department of Clinical Chemistry and Transfusion Medicine, Institute of Biomedicine, Sahlgrenska Academy at the University of Gothenburg, SE 413 45 Gothenburg, Sweden
| | - Mingxun Wang
- Department of Computer Science and Engineering, Center for Computational Mass Spectrometry, CSE, and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California 92093, United States
| | - Nuno Bandeira
- Department of Computer Science and Engineering, Center for Computational Mass Spectrometry, CSE, and Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, La Jolla, California 92093, United States
| | - Göran Larson
- Department of Clinical Chemistry and Transfusion Medicine, Institute of Biomedicine, Sahlgrenska Academy at the University of Gothenburg, SE 413 45 Gothenburg, Sweden
| |
|
14
|
Thaysen-Andersen M, Packer NH, Schulz BL. Maturing Glycoproteomics Technologies Provide Unique Structural Insights into the N-glycoproteome and Its Regulation in Health and Disease. Mol Cell Proteomics 2016; 15:1773-90. [PMID: 26929216 PMCID: PMC5083109 DOI: 10.1074/mcp.o115.057638] [Citation(s) in RCA: 137] [Impact Index Per Article: 17.1] [Received: 12/23/2015] [Revised: 02/09/2016] [Indexed: 12/21/2022]
Abstract
The glycoproteome remains severely understudied because of significant analytical challenges associated with glycoproteomics, the system-wide analysis of intact glycopeptides. This review introduces important structural aspects of protein N-glycosylation and summarizes the latest technological developments and applications in LC-MS/MS-based qualitative and quantitative N-glycoproteomics. These maturing technologies provide unique structural insights into the N-glycoproteome and its synthesis and regulation by complementing existing methods in glycoscience. Modern glycoproteomics is now sufficiently mature to initiate efforts to capture the molecular complexity displayed by the N-glycoproteome, opening exciting opportunities to increase our understanding of the functional roles of protein N-glycosylation in human health and disease.
Affiliation(s)
- Morten Thaysen-Andersen
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW, Australia
| | - Nicolle H Packer
- Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, NSW, Australia
| | - Benjamin L Schulz
- School of Chemistry & Molecular Biosciences, St Lucia, The University of Queensland, Brisbane, QLD, Australia
| |
|
15
|
Liquid chromatography-tandem mass spectrometry-based fragmentation analysis of glycopeptides. Glycoconj J 2016; 33:261-72. [PMID: 26780731 DOI: 10.1007/s10719-016-9649-3] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Received: 08/13/2015] [Revised: 12/23/2015] [Accepted: 01/04/2016] [Indexed: 02/08/2023]
Abstract
The use of liquid chromatography-electrospray ionization-tandem mass spectrometry (LC-ESI-MS(n)) for the glycoproteomic characterization of glycopeptides is a growing field of research. The N- and O-glycosylated peptides (N- and O-glycopeptides) analyzed typically originate from protease-digested glycoproteins where many of them are expected to be biomedically important. Examples of LC-MS(2) and MS(3) fragmentation strategies used to pursue glycan structure, peptide identity and attachment-site identification analyses of glycopeptides are described in this review. MS(2) spectra, using the CID and HCD fragmentation techniques of a complex biantennary N-glycopeptide and a core 1 O-glycopeptide, representing two examples of commonly studied glycopeptide types, are presented. A few practical tips for accomplishing glycopeptide analysis using reversed-phase LC-MS(n) shotgun proteomics settings, together with references to the latest glycoproteomic studies, are presented.
|
16
|
Liu G, Neelamegham S. Integration of systems glycobiology with bioinformatics toolboxes, glycoinformatics resources, and glycoproteomics data. Wiley Interdiscip Rev Syst Biol Med 2015; 7:163-81. [PMID: 25871730 DOI: 10.1002/wsbm.1296] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 12/24/2014] [Revised: 02/08/2015] [Accepted: 03/04/2015] [Indexed: 12/22/2022]
Abstract
The glycome constitutes the entire complement of free carbohydrates and glycoconjugates expressed on whole cells or tissues. 'Systems Glycobiology' is an emerging discipline that aims to quantitatively describe and analyse the glycome. Here, instead of developing a detailed understanding of single biochemical processes, a combination of computational and experimental tools is used to seek an integrated or 'systems-level' view. This can explain how multiple biochemical reactions and transport processes interact with each other to control glycome biosynthesis and function. Computational methods in this field commonly build in silico reaction network models to describe experimental data derived from structural studies that measure cell-surface glycan distribution. While considerable progress has been made, several challenges remain due to the complex and heterogeneous nature of this post-translational modification. First, for the in silico models to be standardized and shared among laboratories, it is necessary to integrate glycan structure information and glycosylation-related enzyme definitions into the mathematical models. Second, as glycoinformatics resources grow, it would be attractive to utilize 'Big Data' stored in these repositories for model construction and validation. Third, while the technology for profiling the glycome at the whole-cell level has been standardized, there is a need to integrate mass spectrometry derived site-specific glycosylation data into the models. The current review discusses progress that is being made to resolve the above bottlenecks. The focus is on how computational models can bridge the gap between 'data' generated in wet-laboratory studies and 'knowledge' that can enhance our understanding of the glycome.
Affiliation(s)
- Gang Liu
- Department of Chemical and Biological Engineering, State University of New York, Buffalo, NY, USA
| | - Sriram Neelamegham
- Department of Chemical and Biological Engineering, State University of New York, Buffalo, NY, USA
| |
|