1
Hong S, Chattaraj KG, Guo J, Trout BL, Braatz RD. Enhanced O-glycosylation site prediction using explainable machine learning technique with spatial local environment. Bioinformatics 2025; 41:btaf034. [PMID: 39878910 PMCID: PMC11814488 DOI: 10.1093/bioinformatics/btaf034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/03/2024] [Revised: 11/29/2024] [Accepted: 01/26/2025] [Indexed: 01/31/2025] Open
Abstract
MOTIVATION: The accurate prediction of O-GlcNAcylation sites is crucial for understanding disease mechanisms and developing effective treatments. Previous machine learning (ML) models primarily relied on primary or secondary protein structural and related properties, which have limitations in capturing the spatial interactions of neighboring amino acids. This study introduces local environmental features as a novel approach that incorporates three-dimensional spatial information, significantly improving model performance by considering the spatial context around the target site. Additionally, we utilize sparse recurrent neural networks to effectively capture the sequential nature of proteins and to identify key factors influencing O-GlcNAcylation as an explainable ML model.
RESULTS: Our findings demonstrate the effectiveness of the proposed features, with the model achieving an F1 score of 28.3%, and of feature selection, with the model using only the top 20% of features achieving the highest F1 score of 32.02%, a 1.4-fold improvement over existing PTM models. Statistical analysis of the top 20 features confirmed their consistency with the literature. This method not only boosts prediction accuracy but also paves the way for further research in understanding and targeting O-GlcNAcylation.
AVAILABILITY AND IMPLEMENTATION: The code, data, and features used in this study are available in the GitHub repository: https://github.com/pseokyoung/o-glcnac-prediction.
Affiliation(s)
- Seokyoung Hong
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
- Krishna Gopal Chattaraj
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
- Jing Guo
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
- Bernhardt L Trout
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, United States
- Richard D Braatz
  - Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, United States

2
He XF, Hu X, Wen GJ, Wang Z, Lin WJ. O-GlcNAcylation in cancer development and immunotherapy. Cancer Lett 2023; 566:216258. [PMID: 37279852 DOI: 10.1016/j.canlet.2023.216258] [Citation(s) in RCA: 39] [Impact Index Per Article: 19.5] [Received: 03/30/2023] [Revised: 05/03/2023] [Accepted: 05/30/2023] [Indexed: 06/08/2023]
Abstract
O-linked β-D-N-acetylglucosamine (O-GlcNAc) modification is a reversible posttranslational modification (PTM) in which O-GlcNAc transferase (OGT) attaches β-N-GlcNAc to Ser/Thr residues on specific proteins. O-GlcNAcase (OGA) removes the O-GlcNAc from O-GlcNAcylated proteins. O-GlcNAcylation regulates numerous cellular processes, including signal transduction, the cell cycle, metabolism, and energy homeostasis. Dysregulation of O-GlcNAcylation contributes to the development of various diseases, including cancers. Accumulating evidence has revealed that higher expression levels of OGT and hyper-O-GlcNAcylation are detected in many cancer types and govern glucose metabolism, proliferation, metastasis, invasion, angiogenesis, migration and drug resistance. In this review, we describe the biological functions and molecular mechanisms of OGT- or O-GlcNAcylation-mediated tumorigenesis. Moreover, we discuss the potential role of O-GlcNAcylation in tumor immunotherapy. Furthermore, we highlight that compounds can target O-GlcNAcylation by regulating OGT to suppress oncogenesis. Taken together, targeting protein O-GlcNAcylation might be a promising strategy for the treatment of human malignancies.
Affiliation(s)
- Xue-Fen He
  - Department of Obstetrics and Gynecology, Wenzhou Third Clinical Institute Affiliated to Wenzhou Medical University, Wenzhou People's Hospital, Wenzhou, 325000, Zhejiang, China
- Xiaoli Hu
  - Department of Gynecology, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, China
- Gao-Jing Wen
  - Department of Obstetrics and Gynecology, Wenzhou Third Clinical Institute Affiliated to Wenzhou Medical University, Wenzhou People's Hospital, Wenzhou, 325000, Zhejiang, China
- Zhiwei Wang
  - Department of Biochemistry and Molecular Biology, School of Laboratory Medicine, Bengbu Medical College, Anhui, China
  - Department of Obstetrics and Gynecology, The Second Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, China
- Wen-Jing Lin
  - Department of Obstetrics and Gynecology, Wenzhou Third Clinical Institute Affiliated to Wenzhou Medical University, Wenzhou People's Hospital, Wenzhou, 325000, Zhejiang, China

3
Cheng CC, Ke GM, Chu PY, Ke LY. Elucidating the Implications of Norovirus N- and O-Glycosylation, O-GlcNAcylation, and Phosphorylation. Viruses 2023; 15:v15030798. [PMID: 36992506 PMCID: PMC10054809 DOI: 10.3390/v15030798] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/16/2023] [Revised: 03/19/2023] [Accepted: 03/21/2023] [Indexed: 03/31/2023] Open
Abstract
Norovirus is the most common cause of foodborne gastroenteritis, affecting millions of people worldwide annually. Among the ten genotypes (GI-GX) of norovirus, only GI, GII, GIV, GVIII, and GIX infect humans. Some genotypes reportedly exhibit post-translational modifications (PTMs), including N- and O-glycosylation, O-GlcNAcylation, and phosphorylation, in their viral antigens. PTMs have been linked to increased viral genome replication, viral particle release, and virulence. Owing to breakthroughs in mass spectrometry (MS) technologies, more PTMs have been discovered in recent years and have contributed significantly to preventing and treating infectious diseases. However, the mechanisms by which PTMs act on noroviruses remain poorly understood. In this section, we outline the current knowledge of the three common types of PTM and investigate their impact on norovirus pathogenesis. Moreover, we summarize the strategies and techniques for the identification of PTMs.
Affiliation(s)
- Chia-Chi Cheng
  - Department of Medical Laboratory Science and Biotechnology, College of Health Sciences, Kaohsiung Medical University, Kaohsiung 807378, Taiwan
- Guan-Ming Ke
  - Graduate Institute of Animal Vaccine Technology, College of Veterinary Medicine, National Pingtung University of Science and Technology, Pingtung 912301, Taiwan
- Pei-Yu Chu
  - Department of Medical Laboratory Science and Biotechnology, College of Health Sciences, Kaohsiung Medical University, Kaohsiung 807378, Taiwan
- Liang-Yin Ke
  - Department of Medical Laboratory Science and Biotechnology, College of Health Sciences, Kaohsiung Medical University, Kaohsiung 807378, Taiwan
  - Graduate Institute of Animal Vaccine Technology, College of Veterinary Medicine, National Pingtung University of Science and Technology, Pingtung 912301, Taiwan
  - Center for Lipid Biosciences, Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung 807378, Taiwan
  - Graduate Institute of Medicine, College of Medicine, Kaohsiung Medical University, Kaohsiung 807378, Taiwan

4
Burt RA, Dejanovic B, Peckham HJ, Lee KA, Li X, Ounadjela JR, Rao A, Malaker SA, Carr SA, Myers SA. Novel Antibodies for the Simple and Efficient Enrichment of Native O-GlcNAc Modified Peptides. Mol Cell Proteomics 2021; 20:100167. [PMID: 34678516 PMCID: PMC8605273 DOI: 10.1016/j.mcpro.2021.100167] [Citation(s) in RCA: 34] [Impact Index Per Article: 8.5] [Received: 05/12/2021] [Revised: 09/26/2021] [Accepted: 10/18/2021] [Indexed: 01/03/2023] Open
Abstract
Antibodies against posttranslational modifications (PTMs) such as lysine acetylation, ubiquitin remnants, or phosphotyrosine have resulted in significant advances in our understanding of the fundamental roles of these PTMs in biology. However, the roles of a number of PTMs remain largely unexplored due to the lack of robust enrichment reagents. The addition of N-acetylglucosamine to serine and threonine residues (O-GlcNAc) by the O-GlcNAc transferase (OGT) is a PTM implicated in numerous biological processes and disease states but with limited techniques for its study. Here, we evaluate a new mixture of anti-O-GlcNAc monoclonal antibodies for the immunoprecipitation of native O-GlcNAcylated peptides from cells and tissues. The anti-O-GlcNAc antibodies display good sensitivity and high specificity toward O-GlcNAc-modified peptides and do not recognize O-GalNAc or GlcNAc in extended glycans. Applying this antibody-based enrichment strategy to synaptosomes from mouse brain tissue samples, we identified over 1300 unique O-GlcNAc-modified peptides and over 1000 sites using just a fraction of sample preparation and instrument time required in other landmark investigations of O-GlcNAcylation. Our rapid and robust method greatly simplifies the analysis of O-GlcNAc signaling and will help to elucidate the role of this challenging PTM in health and disease.
- Anti-O-GlcNAc antibodies are fast and simple enrichment reagents.
- Anti-O-GlcNAc antibodies are sensitive and achieve significant depth of coverage.
- Anti-O-GlcNAc antibodies are specific for singular O-GlcNAc modifications.
- Anti-O-GlcNAc antibody enrichment techniques can be applied to cells and tissues.
- HCD product-triggered EThcD data acquisition improves depth of coverage.
Affiliation(s)
- Rajan A Burt
  - The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Borislav Dejanovic
  - Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Kimberly A Lee
  - Cell Signaling Technology, Inc, Danvers, Massachusetts, USA
- Xiang Li
  - La Jolla Institute for Immunology, La Jolla, California, USA
- Anjana Rao
  - La Jolla Institute for Immunology, La Jolla, California, USA
  - Department of Pharmacology, University of California San Diego, La Jolla, California, USA
  - Moores Cancer Center, University of California San Diego, La Jolla, California, USA
- Stacy A Malaker
  - Department of Chemistry, Yale University, New Haven, Connecticut, USA
- Steven A Carr
  - The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Samuel A Myers
  - The Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
  - La Jolla Institute for Immunology, La Jolla, California, USA

5
Dou L, Yang F, Xu L, Zou Q. A comprehensive review of the imbalance classification of protein post-translational modifications. Brief Bioinform 2021; 22:6217722. [PMID: 33834199 DOI: 10.1093/bib/bbab089] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 01/10/2021] [Revised: 02/17/2021] [Accepted: 02/24/2021] [Indexed: 12/13/2022] Open
Abstract
Post-translational modifications (PTMs) play significant roles in regulating protein structure, activity and function, and they are closely involved in various pathologies. Therefore, the identification of associated PTMs is the foundation of in-depth research on related biological mechanisms, disease treatments and drug design. Due to the high cost and time consumption of high-throughput sequencing techniques, developing machine learning-based predictors has been considered an effective approach to rapidly recognize potential modified sites. However, the imbalanced distribution of true and false PTM sites, namely, the data imbalance problem, largely affects the reliability and application of prediction tools. In this article, we conduct a systematic survey of the research progress in imbalanced PTM classification. First, we describe the modeling process in detail and outline useful data imbalance solutions. Then, we summarize the recently proposed bioinformatics tools based on imbalanced PTM data and build a website, ImClassi_PTMs (available at lab.malab.cn/~dlj/ImbClassi_PTMs/), where researchers can browse them. Moreover, we analyze the challenges of current computational predictors and propose some suggestions to improve the efficiency of imbalance learning. We hope that this work will provide comprehensive knowledge of imbalanced PTM recognition and contribute to advanced predictors in the future.
Affiliation(s)
- Lijun Dou
  - University of Electronic Science and Technology of China and the Shenzhen Polytechnic, China
- Fenglong Yang
  - University of Electronic Science and Technology of China and the Shenzhen Polytechnic, China
- Lei Xu
  - School of Electronic and Communication Engineering, Shenzhen Polytechnic, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China

6
Zuo Y, Zou Q, Lin J, Jiang M, Liu X. 2lpiRNApred: a two-layered integrated algorithm for identifying piRNAs and their functions based on LFE-GM feature selection. RNA Biol 2020; 17:892-902. [PMID: 32138598 PMCID: PMC7549647 DOI: 10.1080/15476286.2020.1734382] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 09/05/2019] [Revised: 12/16/2019] [Accepted: 02/18/2020] [Indexed: 12/18/2022] Open
Abstract
Piwi-interacting RNAs (piRNAs) are indispensable in transposon silencing, including in germ cell formation, germline stem cell maintenance, spermatogenesis, and oogenesis. piRNA pathways are amongst the major genome defence mechanisms, which maintain genome integrity. They also have important functions in tumorigenesis, as indicated by aberrantly expressed piRNAs being recently shown to play roles in the process of cancer development. A number of computational methods for identifying piRNAs have recently been proposed, but they still have not yielded satisfactory predictive performance. Moreover, only one computational method that identifies whether piRNAs function in inducing target mRNA deadenylation has been reported in the literature. In this study, we developed a two-layered integrated classifier algorithm, 2lpiRNApred. It identifies piRNAs in the first layer and determines whether they function in inducing target mRNA deadenylation in the second layer. A new feature selection algorithm, which was based on Luca fuzzy entropy and Gaussian membership function (LFE-GM), was proposed to reduce the dimensionality of the features. Five feature extraction strategies, namely, Kmer, General parallel correlation pseudo-dinucleotide composition, General series correlation pseudo-dinucleotide composition, Normalized Moreau-Broto autocorrelation, and Geary autocorrelation, and two types of classifier, Sparse Representation Classifier (SRC) and support vector machine with Mahalanobis distance-based radial basis function (SVMMDRBF), were used to construct 2lpiRNApred. The results indicate that 2lpiRNApred performs significantly better than six other existing prediction tools.
Affiliation(s)
- Yun Zuo
  - Department of Computer Science, Xiamen University, Xiamen, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, China
- Jianyuan Lin
  - Department of Computer Science, Xiamen University, Xiamen, China
- Min Jiang
  - Department of Cognitive Science and Technology, Xiamen University, Xiamen, China
- Xiangrong Liu
  - Department of Computer Science, Xiamen University, Xiamen, China

7
Huang KY, Lee TY, Kao HJ, Ma CT, Lee CC, Lin TH, Chang WC, Huang HD. dbPTM in 2019: exploring disease association and cross-talk of post-translational modifications. Nucleic Acids Res 2020; 47:D298-D308. [PMID: 30418626 PMCID: PMC6323979 DOI: 10.1093/nar/gky1074] [Citation(s) in RCA: 159] [Impact Index Per Article: 31.8] [Received: 09/15/2018] [Accepted: 10/19/2018] [Indexed: 12/25/2022] Open
Abstract
The dbPTM (http://dbPTM.mbc.nctu.edu.tw/) has been maintained for over 10 years with the aim to provide functional and structural analyses for post-translational modifications (PTMs). In this update, dbPTM not only integrates more experimentally validated PTMs from available databases and through manual curation of literature but also provides PTM-disease associations based on non-synonymous single nucleotide polymorphisms (nsSNPs). The high-throughput deep sequencing technology has led to a surge in the data generated through analysis of association between SNPs and diseases, both in terms of growth amount and scope. This update thus integrated disease-associated nsSNPs from dbSNP based on genome-wide association studies. The PTM substrate sites located at a specified distance in terms of the amino acids encoded from nsSNPs were deemed to have an association with the involved diseases. In recent years, increasing evidence for crosstalk between PTMs has been reported. Although mass spectrometry-based proteomics has substantially improved our knowledge about substrate site specificity of single PTMs, the fact that the crosstalk of combinatorial PTMs may act in concert with the regulation of protein function and activity is neglected. Because of the relatively limited information about concurrent frequency and functional relevance of PTM crosstalk, in this update, the PTM sites neighboring other PTM sites in a specified window length were subjected to motif discovery and functional enrichment analysis. This update highlights the current challenges in PTM crosstalk investigation and breaks the bottleneck of how proteomics may contribute to understanding PTM codes, revealing the next level of data complexity and proteomic limitation in prospective PTM research.
Affiliation(s)
- Kai-Yao Huang
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Life and Health Science, The Chinese University of Hong Kong, Shenzhen 518172, China
- Tzong-Yi Lee
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Life and Health Science, The Chinese University of Hong Kong, Shenzhen 518172, China
- Hui-Ju Kao
  - Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 32003, Taiwan
- Chen-Tse Ma
  - Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 32003, Taiwan
- Chao-Chun Lee
  - Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 32003, Taiwan
- Tsai-Hsuan Lin
  - Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 32003, Taiwan
- Wen-Chi Chang
  - Institute of Tropical Plant Sciences, College of Biosciences and Biotechnology, National Cheng Kung University, Tainan 70101, Taiwan
- Hsien-Da Huang
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Life and Health Science, The Chinese University of Hong Kong, Shenzhen 518172, China

8
Li C, Zhang H, Chu D, Xu X. SRTM: a supervised relation topic model for multi-classification on large-scale document network. Neural Comput Appl 2019. [DOI: 10.1007/s00521-019-04145-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 02/05/2023]

9
Shi J, Ruijtenbeek R, Pieters RJ. Demystifying O-GlcNAcylation: hints from peptide substrates. Glycobiology 2019; 28:814-824. [PMID: 29635275 DOI: 10.1093/glycob/cwy031] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 11/14/2017] [Accepted: 03/21/2018] [Indexed: 12/20/2022] Open
Abstract
O-GlcNAcylation, analogous to phosphorylation, is an essential post-translational modification of proteins at Ser/Thr residues with a single β-N-acetylglucosamine moiety. This dynamic protein modification regulates many fundamental cellular processes and its deregulation has been linked to chronic diseases such as cancer, diabetes and neurodegenerative disorders. Reversible attachment and removal of O-GlcNAc is governed only by O-GlcNAc transferase and O-GlcNAcase, respectively. Peptide substrates, derived from natural O-GlcNAcylation targets, function in the catalytic cores of these two enzymes by maintaining interactions between enzyme and substrate, which makes them ideal models for the study of O-GlcNAcylation and deglycosylation. These peptides provide valuable tools for a deeper understanding of O-GlcNAc processing enzymes. By taking advantage of peptide chemistry, recent progress in the study of activity and regulatory mechanisms of these two enzymes has advanced our understanding of their fundamental specificities as well as their potential as therapeutic targets. Hence, this review summarizes the recent achievements on this modification studied at the peptide level, focusing on enzyme activity, enzyme specificity, direct function, site-specific antibodies and peptide substrate-inspired inhibitors.
Affiliation(s)
- Jie Shi
  - Department of Chemical Biology and Drug Discovery, Utrecht Institute for Pharmaceutical Sciences, Utrecht University, TB Utrecht, The Netherlands
- Rob Ruijtenbeek
  - Department of Chemical Biology and Drug Discovery, Utrecht Institute for Pharmaceutical Sciences, Utrecht University, TB Utrecht, The Netherlands
  - PamGene International BV, HH's-Hertogenbosch, The Netherlands
- Roland J Pieters
  - Department of Chemical Biology and Drug Discovery, Utrecht Institute for Pharmaceutical Sciences, Utrecht University, TB Utrecht, The Netherlands

10
Huang KY, Kao HJ, Hsu JBK, Weng SL, Lee TY. Characterization and identification of lysine glutarylation based on intrinsic interdependence between positions in the substrate sites. BMC Bioinformatics 2019; 19:384. [PMID: 30717647 PMCID: PMC7394328 DOI: 10.1186/s12859-018-2394-9] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 05/23/2018] [Accepted: 09/25/2018] [Indexed: 01/06/2023] Open
Abstract
Background: Glutarylation, the addition of a glutaryl group (five carbons) to a lysine residue of a protein molecule, is an important post-translational modification and plays a regulatory role in a variety of physiological and biological processes. As the number of experimentally identified glutarylated peptides increases, it becomes imperative to investigate substrate motifs to enhance the study of protein glutarylation. We carried out a bioinformatics investigation of glutarylation sites based on amino acid composition using a public database containing information on 430 non-homologous glutarylation sites.
Results: The TwoSampleLogo analysis indicates that positively charged and polar amino acids surrounding glutarylated sites may be associated with the specificity in substrate site of protein glutarylation. Additionally, the chi-squared test was utilized to explore the intrinsic interdependence between two positions around glutarylation sites. Further, maximal dependence decomposition (MDD), which consists of partitioning a large-scale dataset into subgroups with statistically significant amino acid conservation, was used to capture motif signatures of glutarylation sites. We considered single features, such as amino acid composition (AAC), amino acid pair composition (AAPC), and composition of k-spaced amino acid pairs (CKSAAP), as well as the effectiveness of incorporating MDD-identified substrate motifs into an integrated prediction model. Evaluation by five-fold cross-validation showed that AAC was most effective in discriminating between glutarylation and non-glutarylation sites, according to support vector machine (SVM).
Conclusions: The SVM model integrating MDD-identified substrate motifs performed well, with a sensitivity of 0.677, a specificity of 0.619, an accuracy of 0.638, and a Matthews Correlation Coefficient (MCC) value of 0.28. Using an independent testing dataset (46 glutarylated and 92 non-glutarylated sites) obtained from the literature, we demonstrated that the integrated SVM model could improve the predictive performance effectively, yielding a balanced sensitivity and specificity of 0.652 and 0.739, respectively. This integrated SVM model has been implemented as a web-based system (MDDGlutar), which is now freely available at http://csb.cse.yzu.edu.tw/MDDGlutar/.
Affiliation(s)
- Kai-Yao Huang
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 518172, China
- Hui-Ju Kao
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 518172, China
  - Department of Computer Science and Engineering, Yuan Ze University, Taoyuan city, 320, Taiwan
- Justin Bo-Kai Hsu
  - Department of Medical Research, Taipei Medical University Hospital, Taipei city, 110, Taiwan
- Shun-Long Weng
  - Department of Medicine, Mackay Medical College, New Taipei City, 252, Taiwan
  - Mackay Medicine, Nursing and Management College, Taipei, 112, Taiwan
  - Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsin-Chu, 300, Taiwan
- Tzong-Yi Lee
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 518172, China
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 518172, China

11
Abstract
Protein O-GlcNAcylation on serine and threonine residues is a significant posttranslational modification. Experimental techniques can uncover only a small portion of O-GlcNAcylation sites. Several computational algorithms have been proposed as necessary auxiliary tools to identify potential O-GlcNAcylation sites. This chapter discusses the metrics and procedures used to assess prediction tools and surveys six computational tools for the prediction of protein O-GlcNAcylation sites. Analyses of these tools using an independent test dataset indicated the advantages and disadvantages of the six existing prediction methods. We also discuss the challenges that may be faced while developing novel predictors in the future.
Affiliation(s)
- Cangzhi Jia
  - Department of Mathematics, Dalian Maritime University, Dalian, China
- Yun Zuo
  - Department of Mathematics, Dalian Maritime University, Dalian, China

12
Levine ZG, Fan C, Melicher MS, Orman M, Benjamin T, Walker S. O-GlcNAc Transferase Recognizes Protein Substrates Using an Asparagine Ladder in the Tetratricopeptide Repeat (TPR) Superhelix. J Am Chem Soc 2018; 140:3510-3513. [PMID: 29485866 PMCID: PMC5937710 DOI: 10.1021/jacs.7b13546] [Citation(s) in RCA: 70] [Impact Index Per Article: 10.0] [Indexed: 12/27/2022]
Abstract
The essential mammalian enzyme O-GlcNAc Transferase (OGT) is uniquely responsible for transferring N-acetylglucosamine to over a thousand nuclear and cytoplasmic proteins, yet there is no known consensus sequence and it remains unclear how OGT recognizes its substrates. To address this question, we developed a protein microarray assay that chemoenzymatically labels de novo sites of glycosylation with biotin, allowing us to simultaneously assess OGT activity across >6000 human proteins. With this assay we examined the contribution to substrate selection of a conserved asparagine ladder within the lumen of OGT's superhelical tetratricopeptide repeat (TPR) domain. When five asparagines were mutated, OGT retained significant activity against short peptides, but showed limited glycosylation of protein substrates on the microarray. O-GlcNAcylation of protein substrates in cell extracts was also greatly attenuated. We conclude that OGT recognizes the majority of its substrates by binding them to the asparagine ladder in the TPR lumen proximal to the catalytic domain.
Affiliation(s)
- Zebulon G. Levine
  - Department of Microbiology and Immunobiology, Harvard Medical School, Boston, Massachusetts 02115, United States
- Chenguang Fan
  - Department of Microbiology and Immunobiology, Harvard Medical School, Boston, Massachusetts 02115, United States
- Michael S. Melicher
  - Department of Microbiology and Immunobiology, Harvard Medical School, Boston, Massachusetts 02115, United States
- Marina Orman
  - Department of Microbiology and Immunobiology, Harvard Medical School, Boston, Massachusetts 02115, United States
- Tania Benjamin
  - Department of Microbiology and Immunobiology, Harvard Medical School, Boston, Massachusetts 02115, United States
- Suzanne Walker
  - Department of Microbiology and Immunobiology, Harvard Medical School, Boston, Massachusetts 02115, United States

13
Jia C, Zuo Y, Zou Q. O-GlcNAcPRED-II: an integrated classification algorithm for identifying O-GlcNAcylation sites based on fuzzy undersampling and a K-means PCA oversampling technique. Bioinformatics 2018; 34:2029-2036. [DOI: 10.1093/bioinformatics/bty039] [Citation(s) in RCA: 97] [Impact Index Per Article: 13.9] [Received: 07/19/2017] [Accepted: 02/05/2018] [Indexed: 11/13/2022] Open
Affiliation(s)
- Cangzhi Jia
  - Department of Mathematics, Dalian Maritime University, Dalian, China
- Yun Zuo
  - Department of Mathematics, Dalian Maritime University, Dalian, China
- Quan Zou
  - School of Computer Science and Technology, Tianjin University, Tianjin, China

14
Kumar S, Sharma A, Tsunoda T. An improved discriminative filter bank selection approach for motor imagery EEG signal classification using mutual information. BMC Bioinformatics 2017; 18:545. [PMID: 29297303 PMCID: PMC5751568 DOI: 10.1186/s12859-017-1964-6] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Common spatial pattern (CSP) has been an effective technique for feature extraction in electroencephalography (EEG) based brain computer interfaces (BCIs). However, motor imagery EEG signal feature extraction using CSP generally depends on the selection of the frequency bands to a great extent. METHODS In this study, we propose a mutual information based frequency band selection approach. The idea of the proposed method is to utilize the information from all the available channels for effectively selecting the most discriminative filter banks. CSP features are extracted from multiple overlapping sub-bands. An additional sub-band has been introduced that covers the wide frequency band (7-30 Hz), and two different types of features are extracted using CSP and common spatio-spectral pattern techniques, respectively. Mutual information is then computed from the extracted features of each of these bands and the top filter banks are selected for further processing. Linear discriminant analysis is applied to the features extracted from each of the filter banks. The scores are fused together, and classification is done using support vector machine. RESULTS The proposed method is evaluated using BCI Competition III dataset IVa, BCI Competition IV dataset I and BCI Competition IV dataset IIb, and it outperformed all other competing methods, achieving the lowest misclassification rate and the highest kappa coefficient on all three datasets. CONCLUSIONS Introducing a wide sub-band and using mutual information for selecting the most discriminative sub-bands, the proposed method shows improvement in motor imagery EEG signal classification.
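Note: the band-selection step described in this abstract (score each sub-band by mutual information with the class label, keep the top-scoring bands) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each band's feature has already been discretized into a small number of bins, and the names `mutual_information`, `select_top_bands`, and the band labels in the usage lines are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information, in bits, between two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of the feature values
    py = Counter(ys)            # marginal counts of the class labels
    pxy = Counter(zip(xs, ys))  # joint counts
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def select_top_bands(band_features, labels, k):
    """Return the k band names whose (discretized) feature shares the most
    information with the labels. `band_features` maps name -> sequence."""
    scores = {band: mutual_information(feats, labels)
              for band, feats in band_features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy usage with made-up, already-binned features for three bands.
labels = [0, 0, 1, 1]
bands = {
    "8-12Hz": [0, 0, 1, 1],   # perfectly informative
    "12-16Hz": [0, 1, 0, 1],  # independent of the label
    "16-20Hz": [0, 0, 0, 1],  # partially informative
}
top = select_top_bands(bands, labels, k=2)
```

Continuous features such as CSP outputs would need a binning or density-estimation step before a count-based estimate like this one applies.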
Affiliation(s)
- Shiu Kumar
- Department of Electronics, Instrumentation and Control Engineering, School of Electrical & Electronics Engineering, Fiji National University, Suva, Fiji; School of Engineering and Physics, Faculty of Science, Technology and Environment, The University of the South Pacific, Suva, Fiji
- Alok Sharma
- School of Engineering and Physics, Faculty of Science, Technology and Environment, The University of the South Pacific, Suva, Fiji; Institute for Integrated and Intelligent Systems (IIIS), Griffith University, Brisbane, Australia; RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan; CREST, JST, Yokohama, 230-0045, Japan
- Tatsuhiko Tsunoda
- RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan; CREST, JST, Yokohama, 230-0045, Japan; Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, 113-8510, Japan
15
Britto-Borges T, Barton GJ. A study of the structural properties of sites modified by the O-linked β-N-acetylglucosamine transferase. PLoS One 2017; 12:e0184405. [PMID: 28886091 PMCID: PMC5590929 DOI: 10.1371/journal.pone.0184405] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/21/2017] [Accepted: 08/23/2017] [Indexed: 01/17/2023]
Abstract
Protein O-GlcNAcylation (O-GlcNAc) is an essential post-translational modification (PTM) in higher eukaryotes. The O-linked β-N-acetylglucosamine transferase (OGT), targets specific Serines and Threonines (S/T) in intracellular proteins. However, unlike phosphorylation, fewer than 25% of known O-GlcNAc sites match a clear sequence pattern. Accordingly, the three-dimensional structures of O-GlcNAc sites were characterised to investigate the role of structure in molecular recognition. From 1,584 O-GlcNAc sites in 620 proteins, 143 were mapped to protein structures determined by X-ray crystallography. The modified S/T were 1.7 times more likely to be annotated in the REM465 field which defines missing residues in a protein structure, while 7 O-GlcNAc sites were solvent inaccessible and unlikely to be targeted by OGT. 132 sites with complete backbone atoms clustered into 10 groups, but these were indistinguishable from clusters from unmodified S/T. This suggests there is no prevalent three-dimensional motif for OGT recognition. Predicted features from the 620 proteins were compared to unmodified S/T in O-GlcNAcylated proteins and globular proteins. The Jpred4 predicted secondary structure shows that modified S/T were more likely to be coils. 5/6 methods to predict intrinsic disorder indicated O-GlcNAcylated S/T to be significantly more disordered than unmodified S/T. Although the analysis did not find a pattern in the site three-dimensional structure, it revealed the residues around the modification site are likely to be disordered and suggests a potential role of secondary structure elements in OGT site recognition.
Affiliation(s)
- Thiago Britto-Borges
- Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, United Kingdom
- Geoffrey J. Barton
- Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, United Kingdom
16
Weng SL, Kao HJ, Huang CH, Lee TY. MDD-Palm: Identification of protein S-palmitoylation sites with substrate motifs based on maximal dependence decomposition. PLoS One 2017; 12:e0179529. [PMID: 28662047 PMCID: PMC5491019 DOI: 10.1371/journal.pone.0179529] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Received: 11/04/2016] [Accepted: 05/31/2017] [Indexed: 12/14/2022]
Abstract
S-palmitoylation, the covalent attachment of 16-carbon palmitic acids to a cysteine residue via a thioester linkage, is an important reversible lipid modification that plays a regulatory role in a variety of physiological and biological processes. As the number of experimentally identified S-palmitoylated peptides increases, it is imperative to investigate substrate motifs to facilitate the study of protein S-palmitoylation. Based on 710 non-homologous S-palmitoylation sites obtained from published databases and the literature, we carried out a bioinformatics investigation of S-palmitoylation sites based on amino acid composition. Two Sample Logo indicates that positively charged and polar amino acids surrounding S-palmitoylated sites may be associated with the substrate site specificity of protein S-palmitoylation. Additionally, maximal dependence decomposition (MDD) was applied to explore the motif signatures of S-palmitoylation sites by categorizing a large-scale dataset into subgroups with statistically significant conservation of amino acids. Single features such as amino acid composition (AAC), amino acid pair composition (AAPC), position specific scoring matrix (PSSM), position weight matrix (PWM), amino acid substitution matrix (BLOSUM62), and accessible surface area (ASA) were considered, along with the effectiveness of incorporating MDD-identified substrate motifs into a two-layered prediction model. Evaluation by five-fold cross-validation showed that a hybrid of AAC and PSSM performs best at discriminating between S-palmitoylation and non-S-palmitoylation sites, according to the support vector machine (SVM). The two-layered SVM model integrating MDD-identified substrate motifs performed well, with a sensitivity of 0.79, specificity of 0.80, accuracy of 0.80, and Matthews Correlation Coefficient (MCC) value of 0.45. 
Using an independent testing dataset (613 S-palmitoylated and 5412 non-S-palmitoylated sites) obtained from the literature, we demonstrated that the two-layered SVM model could outperform other prediction tools, yielding a balanced sensitivity and specificity of 0.690 and 0.694, respectively. This two-layered SVM model has been implemented as a web-based system (MDD-Palm), which is now freely available at http://csb.cse.yzu.edu.tw/MDDPalm/.
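Note: the evaluation figures quoted in this abstract (sensitivity, specificity, accuracy, Matthews Correlation Coefficient) all derive from the four cells of a binary confusion matrix. The sketch below shows the standard formulas. The function name `binary_metrics` and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
from math import sqrt


def binary_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                  # true positive rate (recall)
    specificity = tn / (tn + fp)                  # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only.
m = binary_metrics(tp=80, fn=20, tn=90, fp=10)
```

On a heavily imbalanced test set, sensitivity and specificity can both look balanced while precision stays low, which is one reason MCC is often reported alongside accuracy.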
Affiliation(s)
- Shun-Long Weng
- Department of Medicine, Mackay Medical College, New Taipei City, Taiwan
- Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsinchu city, Taiwan
- Mackay Junior College of Medicine, Nursing and Management, Taipei, Taiwan
- Hui-Ju Kao
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan
- Chien-Hsun Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan
- Tao-Yuan Hospital, Ministry of Health & Welfare, Taoyuan, Taiwan
- * E-mail: (TYL); (CHH)
- Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, Taiwan
- Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, Taiwan
- * E-mail: (TYL); (CHH)
17
Shi J, Tomašič T, Sharif S, Brouwer AJ, Anderluh M, Ruijtenbeek R, Pieters RJ. Peptide microarray analysis of the cross-talk between O-GlcNAcylation and tyrosine phosphorylation. FEBS Lett 2017; 591:1872-1883. [DOI: 10.1002/1873-3468.12708] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 05/03/2017] [Accepted: 05/31/2017] [Indexed: 12/18/2022]
Affiliation(s)
- Jie Shi
- Department of Chemical Biology and Drug Discovery; Utrecht Institute for Pharmaceutical Sciences, Utrecht University; Utrecht The Netherlands
- Suhela Sharif
- Department of Chemical Biology and Drug Discovery; Utrecht Institute for Pharmaceutical Sciences, Utrecht University; Utrecht The Netherlands
- Arwin J. Brouwer
- Department of Chemical Biology and Drug Discovery; Utrecht Institute for Pharmaceutical Sciences, Utrecht University; Utrecht The Netherlands
- Rob Ruijtenbeek
- Department of Chemical Biology and Drug Discovery; Utrecht Institute for Pharmaceutical Sciences, Utrecht University; Utrecht The Netherlands
- PamGene International BV; ‘s-Hertogenbosch The Netherlands
- Roland J. Pieters
- Department of Chemical Biology and Drug Discovery; Utrecht Institute for Pharmaceutical Sciences, Utrecht University; Utrecht The Netherlands
18
Bouadjenek MR, Verspoor K, Zobel J. Automated detection of records in biological sequence databases that are inconsistent with the literature. J Biomed Inform 2017. [PMID: 28624643 DOI: 10.1016/j.jbi.2017.06.015] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 01/25/2023]
Abstract
We investigate and analyse the data quality of nucleotide sequence databases with the objective of automatic detection of data anomalies and suspicious records. Specifically, we demonstrate that the published literature associated with each data record can be used to automatically evaluate its quality, by cross-checking the consistency of the key content of the database record with the referenced publications. Focusing on GenBank, we describe a set of quality indicators based on the relevance paradigm of information retrieval (IR). Then, we use these quality indicators to train an anomaly detection algorithm to classify records as "confident" or "suspicious". Our experiments on the PubMed Central collection show assessing the coherence between the literature and database records, through our algorithms, is an effective mechanism for assisting curators to perform data cleansing. Although fewer than 0.25% of the records in our data set are known to be faulty, we would expect that there are many more in GenBank that have not yet been identified. By automated comparison with literature they can be identified with a precision of up to 10% and a recall of up to 30%, while strongly outperforming several baselines. While these results leave substantial room for improvement, they reflect both the very imbalanced nature of the data, and the limited explicitly labelled data that is available. Overall, the obtained results show promise for the development of a new kind of approach to detecting low-quality and suspicious sequence records based on literature analysis and consistency. From a practical point of view, this will greatly help curators in identifying inconsistent records in large-scale sequence databases by highlighting records that are likely to be inconsistent with the literature.
Affiliation(s)
- Mohamed Reda Bouadjenek
- Department of Computing and Information Systems, The University of Melbourne, Parkville 3053, Australia.
- Karin Verspoor
- Department of Computing and Information Systems, The University of Melbourne, Parkville 3053, Australia.
- Justin Zobel
- Department of Computing and Information Systems, The University of Melbourne, Parkville 3053, Australia.
19
Nguyen VN, Huang KY, Weng JTY, Lai KR, Lee TY. UbiNet: an online resource for exploring the functional associations and regulatory networks of protein ubiquitylation. Database: The Journal of Biological Databases and Curation 2016; 2016:baw054. [PMID: 27114492 PMCID: PMC4843525 DOI: 10.1093/database/baw054] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 08/18/2015] [Accepted: 03/20/2016] [Indexed: 12/19/2022]
Abstract
Protein ubiquitylation catalyzed by E3 ubiquitin ligases is crucial in the regulation of many cellular processes. Owing to the high throughput of mass spectrometry-based proteomics, a number of methods have been developed for the experimental determination of ubiquitylation sites, leading to a large collection of ubiquitylation data. However, there exist no resources for the exploration of E3-ligase-associated regulatory networks for ubiquitylated proteins in humans. Therefore, the UbiNet database was developed to provide a full investigation of protein ubiquitylation networks by incorporating experimentally verified E3 ligases, ubiquitylated substrates and protein-protein interactions (PPIs). To date, UbiNet has accumulated 43 948 experimentally verified ubiquitylation sites from 14 692 ubiquitylated proteins of humans. Additionally, we have manually curated 499 E3 ligases as well as two E1 activating and 46 E2 conjugating enzymes. To delineate the regulatory networks among E3 ligases and ubiquitylated proteins, a total of 430 530 PPIs were integrated into UbiNet for the exploration of ubiquitylation networks with an interactive network viewer. A case study demonstrated that UbiNet was able to decipher a scheme for the ubiquitylation of tumor proteins p63 and p73 that is consistent with their functions. Although the essential role of Mdm2 in p53 regulation is well studied, UbiNet revealed that Mdm2 and additional E3 ligases might be implicated in the regulation of other tumor proteins by protein ubiquitylation. Moreover, UbiNet could identify potential substrates for a specific E3 ligase based on PPIs and substrate motifs. With limited knowledge about the mechanisms through which ubiquitylated proteins are regulated by E3 ligases, UbiNet offers users an effective means for conducting preliminary analyses of protein ubiquitylation. The UbiNet database is now freely accessible via http://csb.cse.yzu.edu.tw/UbiNet/. The content is regularly updated with the literature and newly released data. Database URL: http://csb.cse.yzu.edu.tw/UbiNet/.
Affiliation(s)
- Van-Nui Nguyen
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan; University of Information and Communication Technology, Thai Nguyen University, Vietnam
- Kai-Yao Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan
- Julia Tzu-Ya Weng
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan; Innovation Center for Big Data and Digital Convergence, Yuan Ze University, 320, Taiwan
- K Robert Lai
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan; Innovation Center for Big Data and Digital Convergence, Yuan Ze University, 320, Taiwan
- Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan; Innovation Center for Big Data and Digital Convergence, Yuan Ze University, 320, Taiwan
20
Shi J, Sharif S, Ruijtenbeek R, Pieters RJ. Activity Based High-Throughput Screening for Novel O-GlcNAc Transferase Substrates Using a Dynamic Peptide Microarray. PLoS One 2016; 11:e0151085. [PMID: 26960196 PMCID: PMC4784888 DOI: 10.1371/journal.pone.0151085] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 01/04/2016] [Accepted: 02/23/2016] [Indexed: 11/22/2022]
Abstract
O-GlcNAcylation is a reversible and dynamic protein post-translational modification in mammalian cells. The O-GlcNAc cycle is catalyzed by O-GlcNAc transferase (OGT) and O-GlcNAcase (OGA). O-GlcNAcylation plays an important role in many vital cellular events including transcription, cell cycle regulation, stress response and protein degradation, and altered O-GlcNAcylation has long been implicated in cancer, diabetes and neurodegenerative diseases. Recently, numerous approaches have been developed to identify OGT substrates and study their function, but there is still a strong demand for highly efficient techniques. Here we demonstrated the utility of the peptide microarray approach to discover novel OGT substrates and study its specificity. Interestingly, the protein RBL-2, which is a key regulator of entry into cell division and may function as a tumor suppressor, was identified as a substrate for three isoforms of OGT. Using peptide Ala scanning, we found Ser 420 is one possible O-GlcNAc site in RBL-2. Moreover, substitution of Ser 420, on its own, inhibited OGT activity, raising the possibility of mechanism-based development for selective OGT inhibitors. This approach will prove useful for both discovery of novel OGT substrates and studying OGT specificity.
Affiliation(s)
- Jie Shi
- Department of Medicinal Chemistry and Chemical Biology, Utrecht University, Utrecht, The Netherlands
- Suhela Sharif
- Department of Medicinal Chemistry and Chemical Biology, Utrecht University, Utrecht, The Netherlands
- Rob Ruijtenbeek
- Department of Medicinal Chemistry and Chemical Biology, Utrecht University, Utrecht, The Netherlands
- PamGene International BV, ‘s-Hertogenbosch, The Netherlands
- Roland J. Pieters
- Department of Medicinal Chemistry and Chemical Biology, Utrecht University, Utrecht, The Netherlands
21
Huang CH, Su MG, Kao HJ, Jhong JH, Weng SL, Lee TY. UbiSite: incorporating two-layered machine learning method with substrate motifs to predict ubiquitin-conjugation site on lysines. BMC Systems Biology 2016; 10 Suppl 1:6. [PMID: 26818456 PMCID: PMC4895383 DOI: 10.1186/s12918-015-0246-z] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.3] [Indexed: 01/20/2023]
Abstract
Background The conjugation of ubiquitin to a substrate protein (protein ubiquitylation), which involves a sequential process – E1 activation, E2 conjugation and E3 ligation, is crucial to the regulation of protein function and activity in eukaryotes. This ubiquitin-conjugation process typically binds the last amino acid of ubiquitin (glycine 76) to a lysine residue of a target protein. The high-throughput of mass spectrometry-based proteomics has stimulated a large-scale identification of ubiquitin-conjugated peptides. Hence, a new web resource, UbiSite, was developed to identify ubiquitin-conjugation site on lysines based on large-scale proteome dataset. Results Given a total of 37,647 ubiquitin-conjugated proteins, including 128026 ubiquitylated peptides, obtained from various resources, this study carries out a large-scale investigation on ubiquitin-conjugation sites based on sequenced and structural characteristics. A TwoSampleLogo reveals that a significant depletion of histidine (H), arginine (R) and cysteine (C) residues around ubiquitylation sites may impact the conjugation of ubiquitins in closed three-dimensional environments. Based on the large-scale ubiquitylation dataset, a motif discovery tool, MDDLogo, has been adopted to characterize the potential substrate motifs for ubiquitin conjugation. Not only are single features such as amino acid composition (AAC), positional weighted matrix (PWM), position-specific scoring matrix (PSSM) and solvent-accessible surface area (SASA) considered, but also the effectiveness of incorporating MDDLogo-identified substrate motifs into a two-layered prediction model is taken into account. Evaluation by five-fold cross-validation showed that PSSM is the best feature in discriminating between ubiquitylation and non-ubiquitylation sites, based on support vector machine (SVM). 
Additionally, the two-layered SVM model integrating MDDLogo-identified substrate motifs could obtain a promising accuracy and the Matthews Correlation Coefficient (MCC) at 81.06 % and 0.586, respectively. Furthermore, the independent testing showed that the two-layered SVM model could outperform other prediction tools, reaching at 85.10 % sensitivity, 69.69 % specificity, 73.69 % accuracy and the 0.483 of MCC value. Conclusion The independent testing result indicated the effectiveness of incorporating MDDLogo-identified motifs into the prediction of ubiquitylation sites. In order to provide meaningful assistance to researchers interested in large-scale ubiquitinome data, the two-layered SVM model has been implemented onto a web-based system (UbiSite), which is freely available at http://csb.cse.yzu.edu.tw/UbiSite/. Two cases given in the UbiSite provide a demonstration of effective identification of ubiquitylation sites with reference to substrate motifs. Electronic supplementary material The online version of this article (doi:10.1186/s12918-015-0246-z) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Chien-Hsun Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan; Ministry of Health & Welfare, Tao-Yuan Hospital, Taoyuan, 320, Taiwan.
- Min-Gang Su
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan.
- Hui-Ju Kao
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan.
- Jhih-Hua Jhong
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan.
- Shun-Long Weng
- Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsin-Chu, 300, Taiwan; Mackay Junior College of Medicine, Nursing and Management, Taipei, 112, Taiwan; Department of Medicine, Mackay Medical College, New Taipei City, 252, Taiwan.
- Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan; Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, 320, Taiwan.
22
Huang KY, Weng JTY, Lee TY, Weng SL. A new scheme to discover functional associations and regulatory networks of E3 ubiquitin ligases. BMC Systems Biology 2016; 10 Suppl 1:3. [PMID: 26818115 PMCID: PMC4895279 DOI: 10.1186/s12918-015-0244-1] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
Background Protein ubiquitination catalyzed by E3 ubiquitin ligases play important modulatory roles in various biological processes. With the emergence of high-throughput mass spectrometry technology, the proteomics research community embraced the development of numerous experimental methods for the determination of ubiquitination sites. The result is an accumulation of ubiquitinome data, coupled with a lack of available resources for investigating the regulatory networks among E3 ligases and ubiquitinated proteins. In this study, by integrating existing ubiquitinome data, experimentally validated E3 ligases and established protein-protein interactions, we have devised a strategy to construct a comprehensive map of protein ubiquitination networks. Results In total, 41,392 experimentally verified ubiquitination sites from 12,786 ubiquitinated proteins of humans have been obtained for this study. Additional 494 E3 ligases along with 1220 functional annotations and 28588 protein domains were manually curated. To characterize the regulatory networks among E3 ligases and ubiquitinated proteins, a well-established network viewer was utilized for the exploration of ubiquitination networks from 40892 protein-protein interactions. The effectiveness of the proposed approach was demonstrated in a case study examining E3 ligases involved in the ubiquitination of tumor suppressor p53. In addition to Mdm2, a known regulator of p53, the investigation also revealed other potential E3 ligases that may participate in the ubiquitination of p53. Conclusion Aside from the ability to facilitate comprehensive investigations of protein ubiquitination networks, by integrating information regarding protein-protein interactions and substrate specificities, the proposed method could discover potential E3 ligases for ubiquitinated proteins. 
Our strategy presents an efficient means for the preliminary screen of ubiquitination networks and overcomes the challenge as a result of limited knowledge about E3 ligase-regulated ubiquitination. Electronic supplementary material The online version of this article (doi:10.1186/s12918-015-0244-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Kai-Yao Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan.
- Julia Tzu-Ya Weng
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan; Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, 320, Taiwan.
- Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan, 320, Taiwan; Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan, 320, Taiwan.
- Shun-Long Weng
- Department of Obstetrics and Gynecology, Hsinchu Mackay Memorial Hospital, Hsin-Chu, 300, Taiwan; Mackay Junior College of Medicine, Nursing and Management, Taipei, 112, Taiwan; Department of Medicine, Mackay Medical College, New Taipei City, 252, Taiwan.
23
Kao HJ, Huang CH, Bretaña NA, Lu CT, Huang KY, Weng SL, Lee TY. A two-layered machine learning method to identify protein O-GlcNAcylation sites with O-GlcNAc transferase substrate motifs. BMC Bioinformatics 2015; 16 Suppl 18:S10. [PMID: 26680539 PMCID: PMC4682369 DOI: 10.1186/1471-2105-16-s18-s10] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Indexed: 12/21/2022]
Abstract
Protein O-GlcNAcylation, involving the β-attachment of single N-acetylglucosamine (GlcNAc) to the hydroxyl group of serine or threonine residues, is an O-linked glycosylation catalyzed by O-GlcNAc transferase (OGT). Molecular level investigation of the basis for OGT's substrate specificity should aid understanding how O-GlcNAc contributes to diverse cellular processes. Due to an increasing number of O-GlcNAcylated peptides with site-specific information identified by mass spectrometry (MS)-based proteomics, we were motivated to characterize substrate site motifs of O-GlcNAc transferases. In this investigation, a non-redundant dataset of 410 experimentally verified O-GlcNAcylation sites were manually extracted from dbOGAP, OGlycBase and UniProtKB. After detection of conserved motifs by using maximal dependence decomposition, profile hidden Markov model (profile HMM) was adopted to learn a first-layered model for each identified OGT substrate motif. Support Vector Machine (SVM) was then used to generate a second-layered model learned from the output values of profile HMMs in first layer. The two-layered predictive model was evaluated using a five-fold cross validation which yielded a sensitivity of 85.4%, a specificity of 84.1%, and an accuracy of 84.7%. Additionally, an independent testing set from PhosphoSitePlus, which was really non-homologous to the training data of predictive model, was used to demonstrate that the proposed method could provide a promising accuracy (84.05%) and outperform other O-GlcNAcylation site prediction tools. A case study indicated that the proposed method could be a feasible means of conducting preliminary analyses of protein O-GlcNAcylation and has been implemented as a web-based system, OGTSite, which is now freely available at http://csb.cse.yzu.edu.tw/OGTSite/.
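Note: the five-fold cross-validation mentioned in this abstract partitions the data into five folds, each serving once as the held-out test set. The sketch below shows only the index bookkeeping, under the simplifying assumptions of contiguous folds with no shuffling or class stratification. The function name `k_fold_indices` is hypothetical, and the abstract does not say how the authors assigned sites to folds.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs for k contiguous, near-equal folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# Usage: 410 samples (the dataset size quoted above) split into five folds.
folds = list(k_fold_indices(410, k=5))
```

In practice the sample order would usually be shuffled first, and positive and negative sites balanced across folds, before a model is trained on each `train_idx` and scored on the matching `test_idx`.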
24
Huang KY, Su MG, Kao HJ, Hsieh YC, Jhong JH, Cheng KH, Huang HD, Lee TY. dbPTM 2016: 10-year anniversary of a resource for post-translational modification of proteins. Nucleic Acids Res 2015; 44:D435-46. [PMID: 26578568 PMCID: PMC4702878 DOI: 10.1093/nar/gkv1240] [Citation(s) in RCA: 136] [Impact Index Per Article: 13.6] [Received: 09/16/2015] [Accepted: 11/02/2015] [Indexed: 01/23/2023]
Abstract
Owing to the importance of the post-translational modifications (PTMs) of proteins in regulating biological processes, the dbPTM (http://dbPTM.mbc.nctu.edu.tw/) was developed as a comprehensive database of experimentally verified PTMs from several databases with annotations of potential PTMs for all UniProtKB protein entries. For this 10th anniversary of dbPTM, the updated resource provides not only a comprehensive dataset of experimentally verified PTMs, supported by the literature, but also an integrative interface for accessing all available databases and tools that are associated with PTM analysis. As well as collecting experimental PTM data from 14 public databases, this update manually curates over 12 000 modified peptides, including the emerging S-nitrosylation, S-glutathionylation and succinylation, from approximately 500 research articles, which were retrieved by text mining. As the number of available PTM prediction methods increases, this work compiles a non-homologous benchmark dataset to evaluate the predictive power of online PTM prediction tools. An increasing interest in the structural investigation of PTM substrate sites motivated the mapping of all experimental PTM peptides to protein entries of Protein Data Bank (PDB) based on database identifier and sequence identity, which enables users to examine spatially neighboring amino acids, solvent-accessible surface area and side-chain orientations for PTM substrate sites on tertiary structures. Since drug binding in PDB is annotated, this update identified over 1100 PTM sites that are associated with drug binding. The update also integrates metabolic pathways and protein-protein interactions to support the PTM network analysis for a group of proteins. Finally, the web interface is redesigned and enhanced to facilitate access to this resource.
Affiliation(s)
- Kai-Yao Huang
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
- Min-Gang Su
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
- Hui-Ju Kao
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
- Yun-Chung Hsieh
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
- Jhih-Hua Jhong
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
- Kuang-Hao Cheng
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
- Hsien-Da Huang
- Department of Biological Science and Technology, National Chiao Tung University, Hsinchu 300, Taiwan
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu 300, Taiwan
- Tzong-Yi Lee
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 320, Taiwan
- Innovation Center for Big Data and Digital Convergence, Yuan Ze University, Taoyuan 320, Taiwan
25
Ranganathan S, Tan TW, Schönbach C. InCoB2014: bioinformatics to tackle the data to knowledge challenge. Introduction. BMC Bioinformatics 2014; 15 Suppl 16:I1. [PMID: 25521055 PMCID: PMC4290632 DOI: 10.1186/1471-2105-15-s16-i1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022] Open
Abstract
Since 2006, the International Conference on Bioinformatics (InCoB) has been publishing selected papers in BMC Bioinformatics. Papers within the scope of the journal from the 13th InCoB, held from 31 July to 2 August 2014 in Sydney, Australia, have been compiled in this supplement. These span protein and proteome informatics, structural bioinformatics, software development and bioimaging to pharmacoinformatics and disease informatics, representing the breadth of bioinformatics research in the Asia-Pacific.
Affiliation(s)
- Shoba Ranganathan
- Department of Chemistry and Biomolecular Sciences and ARC Centre of Excellence in Bioinformatics, Macquarie University, Sydney NSW 2109, Australia
- Tin Wee Tan
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, Singapore 117599
- Christian Schönbach
- Department of Biology, School of Science and Technology, Nazarbayev University, Astana 010000, Republic of Kazakhstan
- Center for AIDS Research, Kumamoto University, Kumamoto 860-0811, Japan