1
Liu X, Zhu B, Dai XW, Xu ZA, Li R, Qian Y, Lu YP, Zhang W, Liu Y, Zheng J. GBDT_KgluSite: An improved computational prediction model for lysine glutarylation sites based on feature fusion and GBDT classifier. BMC Genomics 2023; 24:765. [PMID: 38082413 PMCID: PMC10712101 DOI: 10.1186/s12864-023-09834-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2023] [Accepted: 11/23/2023] [Indexed: 12/18/2023] Open
Abstract
BACKGROUND Lysine glutarylation (Kglu) is one of the most important post-translational modifications (PTMs) and plays significant roles in various cellular functions, including metabolism, mitochondrial processes, and translation. Therefore, accurate identification of Kglu sites is important for elucidating protein molecular function. Because traditional biological experiments are time-consuming and expensive, computational Kglu site prediction is gaining increasing attention. RESULTS In this paper, we proposed GBDT_KgluSite, a novel Kglu site prediction model based on GBDT and appropriate feature combinations, which achieved satisfactory performance. Specifically, seven features including sequence-based features, physicochemical property-based features, structural-based features, and evolutionary-derived features were used to characterize proteins. NearMiss-3 and Elastic Net were applied to address data imbalance and feature redundancy issues, respectively. The experimental results show that GBDT_KgluSite has good robustness and generalization ability, with accuracy and AUC values of 93.73% and 98.14% on five-fold cross-validation and 90.11% and 96.75% on the independent test dataset, respectively. CONCLUSION GBDT_KgluSite is an effective computational method for identifying Kglu sites in protein sequences. It has good stability and generalization ability and could be useful for the identification of new Kglu sites in the future. The relevant code and dataset are available at https://github.com/flyinsky6/GBDT_KgluSite.
Affiliation(s)
- Xin Liu
  - School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Bao Zhu
  - Cancer Institute, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Xia-Wei Dai
  - School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Zhi-Ao Xu
  - School of Life Sciences, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Rui Li
  - School of Life Sciences, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Yuting Qian
  - Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Ya-Ping Lu
  - School of Humanities and Arts, China University of Mining and Technology, Xuzhou, Jiangsu, 221116, China
- Wenqing Zhang
  - School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Yong Liu
  - Cancer Institute, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
- Junnian Zheng
  - Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Xuzhou Medical University, Xuzhou, Jiangsu, 221004, China
  - Center of Clinical Oncology, The Affiliated Hospital of Xuzhou Medical University, Xuzhou, Jiangsu, 221002, China
2
Kumari S, Gupta R, Ambasta RK, Kumar P. Emerging trends in post-translational modification: Shedding light on Glioblastoma multiforme. Biochim Biophys Acta Rev Cancer 2023; 1878:188999. [PMID: 37858622 DOI: 10.1016/j.bbcan.2023.188999] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 10/06/2023] [Accepted: 10/06/2023] [Indexed: 10/21/2023]
Abstract
Recent multi-omics studies, including proteomics, transcriptomics, genomics, and metabolomics have revealed the critical role of post-translational modifications (PTMs) in the progression and pathogenesis of Glioblastoma multiforme (GBM). Further, PTMs alter the oncogenic signaling events and offer a novel avenue in GBM therapeutics research through PTM enzymes as potential biomarkers for drug targeting. In addition, PTMs are critical regulators of chromatin architecture, gene expression, and tumor microenvironment (TME), that play a crucial function in tumorigenesis. Moreover, the implementation of artificial intelligence and machine learning algorithms enhances GBM therapeutics research through the identification of novel PTM enzymes and residues. Herein, we briefly explain the mechanism of protein modifications in GBM etiology, and in altering the biologics of GBM cells through chromatin remodeling, modulation of the TME, and signaling pathways. In addition, we highlighted the importance of PTM enzymes as therapeutic biomarkers and the role of artificial intelligence and machine learning in protein PTM prediction.
Affiliation(s)
- Smita Kumari
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India
- Rohan Gupta
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India; School of Medicine, University of South Carolina, Columbia, SC, United States of America
- Rashmi K Ambasta
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India; Department of Biotechnology and Microbiology, SRM University, Sonepat, Haryana, India
- Pravir Kumar
  - Molecular Neuroscience and Functional Genomics Laboratory, Department of Biotechnology, Delhi Technological University, India
3
Wang X, Ding Z, Wang R, Lin X. Deepro-Glu: combination of convolutional neural network and Bi-LSTM models using ProtBert and handcrafted features to identify lysine glutarylation sites. Brief Bioinform 2023; 24:6991122. [PMID: 36653898 DOI: 10.1093/bib/bbac631] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 10/23/2022] [Revised: 12/11/2022] [Accepted: 12/28/2022] [Indexed: 01/20/2023] Open
Abstract
Lysine glutarylation (Kglu) is a newly discovered post-translational modification of proteins with important roles in mitochondrial functions, oxidative damage, etc. The established biological experimental methods to identify glutarylation sites are often time-consuming and costly. Therefore, there is an urgent need to develop computational methods for efficient and accurate identification of glutarylation sites. Most of the existing computational methods only utilize handcrafted features to construct the prediction model and do not consider the positive impact of pre-trained protein language models on prediction performance. Based on this, we develop an ensemble deep-learning predictor, Deepro-Glu, that combines a convolutional neural network and a bidirectional long short-term memory network using deep learning features and traditional handcrafted features to predict lysine glutarylation sites. The deep learning features are generated from the pre-trained protein language model called ProtBert, and the handcrafted features consist of sequence-based features, physicochemical property-based features and evolution information-based features. Furthermore, the attention mechanism is used to efficiently integrate the deep learning features and the handcrafted features by learning the appropriate attention weights. 10-fold cross-validation and independent tests demonstrate that Deepro-Glu achieves performance competitive with or superior to the state-of-the-art methods. The source codes and data are publicly available at https://github.com/xwanggroup/Deepro-Glu.
Affiliation(s)
- Xiao Wang
  - School of Computer and Communication Engineering, Zhengzhou University of Light Industry, No. 136, Science Avenue, 450002, Zhengzhou, China
- Zhaoyuan Ding
  - School of Computer and Communication Engineering, Zhengzhou University of Light Industry, No. 136, Science Avenue, 450002, Zhengzhou, China
- Rong Wang
  - School of Computer and Communication Engineering, Zhengzhou University of Light Industry, No. 136, Science Avenue, 450002, Zhengzhou, China
- Xi Lin
  - Institute of Artificial Intelligence, Xiamen University, No. 4221, Xiang'an South Road, 361000, Xiamen, China
4
Jia J, Sun M, Wu G, Qiu W. DeepDN_iGlu: prediction of lysine glutarylation sites based on attention residual learning method and DenseNet. Mathematical Biosciences and Engineering: MBE 2023; 20:2815-2830. [PMID: 36899559 DOI: 10.3934/mbe.2023132] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 06/18/2023]
Abstract
As a key issue in orchestrating various biological processes and functions, protein post-translational modification (PTM) occurs widely in the mechanisms of protein function in animals and plants. Glutarylation is a type of post-translational modification that occurs at active ε-amino groups of specific lysine residues in proteins and is associated with various human diseases, including diabetes, cancer, and glutaric aciduria type I. Therefore, the prediction of glutarylation sites is particularly important. This study developed a new deep learning-based prediction model for glutarylation sites named DeepDN_iGlu by adopting an attention residual learning method and DenseNet. The focal loss function is used in place of the traditional cross-entropy loss function to address the substantial imbalance in the number of positive and negative samples. With the straightforward one-hot encoding method, DeepDN_iGlu achieves Sensitivity (Sn), Specificity (Sp), Accuracy (ACC), Matthews Correlation Coefficient (MCC), and Area Under Curve (AUC) of 89.29%, 61.97%, 65.15%, 0.33, and 0.80, respectively, on the independent test set. To the best of the authors' knowledge, this is the first time that DenseNet has been used for the prediction of glutarylation sites. DeepDN_iGlu has been deployed as a web server (https://bioinfo.wugenqiang.top/~smw/DeepDN_iGlu/) that is available to make glutarylation site prediction data more accessible.
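The idea of replacing cross-entropy with focal loss for imbalanced classes can be illustrated with a minimal sketch. It uses the standard binary focal loss form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t); the function names and the values gamma = 2.0 and alpha = 0.25 are illustrative assumptions, not settings taken from DeepDN_iGlu.

```python
import math

def binary_cross_entropy(p, y, eps=1e-12):
    """Cross-entropy for one sample; p is the predicted probability of class 1."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -math.log(p if y == 1 else 1.0 - p)

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one sample (gamma and alpha are assumed example values)."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A well-classified positive (p = 0.9) is down-weighted far more strongly
# than a badly classified one (p = 0.1).
easy = binary_focal_loss(0.9, 1)
hard = binary_focal_loss(0.1, 1)
```

The factor (1 - p_t)^gamma shrinks the contribution of samples the model already classifies well, so the abundant easy negatives dominate the total loss less than they would under plain cross-entropy.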
Affiliation(s)
- Jianhua Jia
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Mingwei Sun
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Genqiang Wu
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
- Wangren Qiu
  - School of Information Engineering, Jingdezhen Ceramic University, Jingdezhen 333403, China
5
Naseer S, Ali RF, Khan YD, Dominic PDD. iGluK-Deep: computational identification of lysine glutarylation sites using deep neural networks with general pseudo amino acid compositions. J Biomol Struct Dyn 2022; 40:11691-11704. [PMID: 34396935 DOI: 10.1080/07391102.2021.1962738] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Indexed: 12/24/2022]
Abstract
Lysine glutarylation is a post-translational modification which plays an important regulatory role in a variety of physiological and enzymatic processes including mitochondrial functions and metabolic processes both in eukaryotic and prokaryotic cells. This post-translational modification influences chromatin structure and thereby results in global regulation of transcription, defects in cell-cycle progression, DNA damage repair, and telomere silencing. To better understand the mechanism of lysine glutarylation, its identification in a protein is necessary; however, experimental methods are time-consuming and labor-intensive. Herein, we propose a new computational approach to supplement experimental methods for the identification of lysine glutarylation sites using deep neural networks and Chou's Pseudo Amino Acid Composition (PseAAC). We employed well-known deep neural networks for feature representation learning and classification of peptide sequences. Our approach uses raw pseudo amino acid compositions and removes the need to separately perform costly and cumbersome feature extraction and selection. Among the developed deep learning-based predictors, the standard neural network-based predictor demonstrated the highest scores in terms of accuracy and all other performance evaluation measures and outperformed the majority of previously reported predictors without requiring an expensive feature extraction process. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Sheraz Naseer
  - Department of Computer Science, University of Management and Technology, Lahore, Pakistan
- Rao Faizan Ali
  - Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Perak Darul Ridzuan, Malaysia
- Yaser Daanial Khan
  - Department of Computer Science, University of Management and Technology, Lahore, Pakistan
- P D D Dominic
  - Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Perak Darul Ridzuan, Malaysia
6
Ning Q, Zhao X, Ma Z. A Novel Method for Identification of Glutarylation Sites Combining Borderline-SMOTE With Tomek Links Technique in Imbalanced Data. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:2632-2641. [PMID: 34236968 DOI: 10.1109/tcbb.2021.3095482] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Indexed: 06/13/2023]
Abstract
Glutarylation is a type of post-translational modification that occurs on lysine residues. It plays an irreplaceable role in various cellular functions. Therefore, identification of glutarylation sites is significant for understanding the molecular mechanism of glutarylation. In this study, we proposed a method named DEXGB_Glu to identify lysine glutarylation sites using XGBoost as classifier which was optimized by differential evolution algorithm. Aiming at the imbalance between positive samples and negative samples, Borderline-SMOTE method was employed to synthesize positive samples, increasing their amount equal to negative samples. Then, Tomek links technique was applied to filter out noise data. Analysis of this method and its results showed that differential evolution algorithm obviously improved the performance and the combination of Borderline-SMOTE and Tomek links effectively solved the imbalance between positive samples and negative samples. Finally, the performance of this method was much better than other methods in prediction of glutarylation sites. The data and code are available on https://github.com/ningq669/DEXGB_Glu.
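The Tomek links cleaning step described in this abstract can be illustrated with a minimal sketch. A Tomek link is a pair of samples from different classes that are each other's nearest neighbor. The code below is pure Python with hypothetical function names and toy data; dropping both members of each link is an assumption made here for illustration, and the Borderline-SMOTE oversampling step is not shown. It is not the DEXGB_Glu implementation.

```python
import math

def nearest_neighbor(i, points):
    """Return the index of the point closest to points[i] (Euclidean distance)."""
    best_j, best_d = -1, math.inf
    for j, q in enumerate(points):
        if j != i:
            d = math.dist(points[i], q)
            if d < best_d:
                best_j, best_d = j, d
    return best_j

def tomek_links(points, labels):
    """Find pairs (i, j) that are mutual nearest neighbors with different labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]

def drop_tomek_links(points, labels):
    """Remove both members of every Tomek link (an assumed cleaning policy)."""
    linked = {k for pair in tomek_links(points, labels) for k in pair}
    keep = [i for i in range(len(points)) if i not in linked]
    return [points[i] for i in keep], [labels[i] for i in keep]

# Toy 2-D data: samples 0 and 1 sit next to each other but carry different labels.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 10.0)]
labels = [0, 1, 1, 1, 0]
links = tomek_links(points, labels)
clean_points, clean_labels = drop_tomek_links(points, labels)
```

In this toy set only samples 0 and 1 form a link; samples 2 and 3 are mutual nearest neighbors but share a label, so they are kept.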
7
Liu CM, Ta VD, Le NQK, Tadesse DA, Shi C. Deep Neural Network Framework Based on Word Embedding for Protein Glutarylation Sites Prediction. Life (Basel) 2022; 12:life12081213. [PMID: 36013392 PMCID: PMC9410500 DOI: 10.3390/life12081213] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/27/2022] [Revised: 08/03/2022] [Accepted: 08/05/2022] [Indexed: 04/08/2023] Open
Abstract
In recent years, much research has found that dysregulation of glutarylation is associated with many human diseases, such as diabetes, cancer, and glutaric aciduria type I. Therefore, glutarylation identification and characterization are essential tasks for determining modification-specific proteomics. This study aims to propose a novel deep neural network framework based on word embedding techniques for glutarylation sites prediction. Multiple deep neural network models are implemented to evaluate the performance of glutarylation sites prediction. Furthermore, an extensive experimental comparison of word embedding techniques is conducted to utilize the most efficient method for improving protein sequence data representation. The results suggest that the proposed deep neural networks not only improve protein sequence representation but also work effectively in glutarylation sites prediction by obtaining a higher accuracy and confidence rate compared to the previous work. Moreover, embedding techniques were proven to be more productive than the pre-trained word embedding techniques for glutarylation sequence representation. Our proposed method has significantly outperformed all traditional performance metrics compared to the advanced integrated vector support, with accuracy, specificity, sensitivity, and correlation coefficient of 0.79, 0.89, 0.59, and 0.51, respectively. It shows the potential to detect new glutarylation sites and uncover the relationships between glutarylation and well-known lysine modification.
Affiliation(s)
- Chuan-Ming Liu
  - Department of Computer Science and Information Engineering, National Taipei University of Technology (Taipei Tech), Taipei City 106, Taiwan
  - Correspondence: (C.-M.L.); (C.S.); Tel.: +886-2-2771-2171 (ext. 4251) (C.-M.L.)
- Van-Dai Ta
  - Samsung Display Vietnam (SDV), Yen Phong Industrial Park, Bac Ninh 16000, Vietnam
- Nguyen Quoc Khanh Le
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei City 106, Taiwan
- Chongyang Shi
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing 102488, China
  - Correspondence: (C.-M.L.); (C.S.); Tel.: +886-2-2771-2171 (ext. 4251) (C.-M.L.)
8
Indriani F, Mahmudah KR, Purnama B, Satou K. ProtTrans-Glutar: Incorporating Features From Pre-trained Transformer-Based Models for Predicting Glutarylation Sites. Front Genet 2022; 13:885929. [PMID: 35711929 PMCID: PMC9194472 DOI: 10.3389/fgene.2022.885929] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/28/2022] [Accepted: 04/26/2022] [Indexed: 11/16/2022] Open
Abstract
Lysine glutarylation is a post-translational modification (PTM) that plays a regulatory role in various physiological and biological processes. Identifying glutarylated peptides using proteomic techniques is expensive and time-consuming. Therefore, developing computational models and predictors can prove useful for rapid identification of glutarylation. In this study, we propose a model called ProtTrans-Glutar to classify a protein sequence into positive or negative glutarylation site by combining traditional sequence-based features with features derived from a pre-trained transformer-based protein model. The features of the model were constructed by combining several feature sets, namely the distribution feature (from composition/transition/distribution encoding), enhanced amino acid composition (EAAC), and features derived from the ProtT5-XL-UniRef50 model. Combined with random under-sampling and XGBoost classification method, our model obtained recall, specificity, and AUC scores of 0.7864, 0.6286, and 0.7075 respectively on an independent test set. The recall and AUC scores were notably higher than those of the previous glutarylation prediction models using the same dataset. This high recall score suggests that our method has the potential to identify new glutarylation sites and facilitate further research on the glutarylation process.
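Random under-sampling, which this abstract pairs with XGBoost, can be sketched in a few lines: majority-class samples are discarded at random until every class has as many samples as the smallest one. The function name, the fixed seed, and the toy data below are assumptions for illustration and do not reproduce the ProtTrans-Glutar pipeline.

```python
import random

def random_undersample(samples, labels, seed=0):
    """Randomly drop samples so that every class has the size of the smallest class."""
    rng = random.Random(seed)  # fixed seed (assumed) for reproducibility
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    n_min = min(len(idxs) for idxs in by_class.values())
    keep = []
    for idxs in by_class.values():
        keep.extend(rng.sample(idxs, n_min))  # sample without replacement
    keep.sort()  # preserve the original ordering of the retained samples
    return [samples[i] for i in keep], [labels[i] for i in keep]

# Toy data: 3 positive sites versus 10 negative sites.
samples = [f"peptide_{i}" for i in range(13)]
labels = [1] * 3 + [0] * 10
balanced_samples, balanced_labels = random_undersample(samples, labels)
```

The trade-off is that information in the discarded majority samples is lost, which is one reason other entries in this list use guided alternatives such as NearMiss-3 or Tomek links instead.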
Affiliation(s)
- Fatma Indriani
  - Graduate School of Natural Science and Technology, Kanazawa University, Kanazawa, Japan
  - Department of Computer Science, Lambung Mangkurat University, Banjarmasin, Indonesia
- Kunti Robiatul Mahmudah
  - Department of Postgraduate of Mathematics Education, Universitas Ahmad Dahlan, Yogyakarta, Indonesia
- Bedy Purnama
  - School of Computing, Telkom University, Bandung, Indonesia
- Kenji Satou
  - Institute of Science and Engineering, Kanazawa University, Kanazawa, Japan
9
Ning Q, Ma Z, Zhao X, Yin M. SSKM_Succ: A Novel Succinylation Sites Prediction Method Incorporating K-Means Clustering With a New Semi-Supervised Learning Algorithm. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:643-652. [PMID: 32750881 DOI: 10.1109/tcbb.2020.3006144] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 06/11/2023]
Abstract
Protein succinylation is a type of post-translational modification (PTM) that occurs on lysine sites and plays a key role in protein conformation regulation and cellular function control. When training a computational method, it is difficult to designate negative samples because of the uncertainty of non-succinylation lysine sites, and if not handled properly, this may affect the performance of computational models dramatically. Therefore, we propose a new semi-supervised learning method to identify reliable non-succinylation lysine sites as negative samples. This method, named SSKM_Succ, also employs K-means clustering to divide data into 5 clusters. In addition, information on proximal PTMs and three kinds of sequence features (grey pseudo amino acid composition, K-space and position-specific amino acid propensity) are used to represent proteins. Then, we perform a two-step feature selection to remove redundant features and construct the optimization model for each cluster. Finally, a support vector machine is applied to construct a prediction model for each cluster. Promising results are obtained by this method with an accuracy of 80.18 percent for succinylation sites on the independent testing dataset. Meanwhile, we compare the result with other existing tools, and it shows that our method is promising for predicting succinylation sites. Through analysis, we further verify that succinylated protein has potential effects on amino acid degradation and fatty acid metabolism, and speculate that protein succinylation may be closely related to neurodegenerative diseases. The code of SSKM_Succ is available on the web https://github.com/yangyq505/SSKM_Succ.git.
10
Li W, Li F, Zhang X, Lin HK, Xu C. Insights into the post-translational modification and its emerging role in shaping the tumor microenvironment. Signal Transduct Target Ther 2021; 6:422. [PMID: 34924561 PMCID: PMC8685280 DOI: 10.1038/s41392-021-00825-8] [Citation(s) in RCA: 108] [Impact Index Per Article: 27.0] [Received: 07/02/2021] [Revised: 11/02/2021] [Accepted: 11/05/2021] [Indexed: 12/11/2022] Open
Abstract
More and more in-depth studies have revealed that the occurrence and development of tumors depend on gene mutation and tumor heterogeneity. The most important manifestation of tumor heterogeneity is the dynamic change of tumor microenvironment (TME) heterogeneity. This depends not only on the tumor cells themselves but also on the microenvironment, where the infiltrating immune cells and matrix together form an antitumor and/or pro-tumor network. TME has resulted in novel therapeutic interventions as a place beyond tumor beds. The malignant cancer cells, tumor-infiltrating immune cells, angiogenic vascular cells, lymphatic endothelial cells, cancer-associated fibroblastic cells, and the released factors including intracellular metabolites, hormonal signals and inflammatory mediators all contribute actively to cancer progression. Protein post-translational modification (PTM) is often regarded as a degradative mechanism in protein destruction or turnover to maintain physiological homeostasis. Advances in quantitative transcriptomics, proteomics, and nuclease-based gene editing are now paving the global ways for exploring PTMs. In this review, we focus on recent developments in the PTM area and speculate on their importance as a critical functional readout for the regulation of TME. A wealth of information has been emerging to prove useful in the search for conventional therapies and the development of global therapeutic strategies.
Affiliation(s)
- Wen Li
  - Integrative Cancer Center & Cancer Clinical Research Center, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, School of Medicine, University of Electronic Science and Technology of China, 610042, Chengdu, P. R. China
- Feifei Li
  - Integrative Cancer Center & Cancer Clinical Research Center, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, School of Medicine, University of Electronic Science and Technology of China, 610042, Chengdu, P. R. China
  - Guangxi Collaborative Innovation Center for Biomedicine (Guangxi-ASEAN Collaborative Innovation Center for Major Disease Prevention and Treatment), Guangxi Medical University, 530021, Nanning, Guangxi, China
- Xia Zhang
  - Institute of Pathology and Southwest Cancer Center, Southwest Hospital, Third Military Medical University (Army Medical University), 400038, Chongqing, China
- Hui-Kuan Lin
  - Department of Cancer Biology, Wake Forest Baptist Medical Center, Wake Forest University, Winston Salem, NC, 27101, USA
- Chuan Xu
  - Integrative Cancer Center & Cancer Clinical Research Center, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, School of Medicine, University of Electronic Science and Technology of China, 610042, Chengdu, P. R. China
  - Department of Cancer Biology, Wake Forest Baptist Medical Center, Wake Forest University, Winston Salem, NC, 27101, USA
11
Akbar S, Ahmad A, Hayat M, Rehman AU, Khan S, Ali F. iAtbP-Hyb-EnC: Prediction of antitubercular peptides via heterogeneous feature representation and genetic algorithm based ensemble learning model. Comput Biol Med 2021; 137:104778. [PMID: 34481183 DOI: 10.1016/j.compbiomed.2021.104778] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 06/13/2021] [Revised: 08/16/2021] [Accepted: 08/17/2021] [Indexed: 11/26/2022]
Abstract
Tuberculosis (TB) is a worldwide illness caused by the bacterium Mycobacterium tuberculosis. Owing to the high prevalence of multidrug-resistant tuberculosis, numerous traditional strategies for developing novel alternative therapies have been presented. The effectiveness and dependability of these procedures are not always consistent. Peptide-based therapy has recently been regarded as a preferable alternative due to its excellent selectivity in targeting specific cells without affecting the normal cells. However, due to the rapid growth of the peptide samples, predicting TB accurately has become a challenging task. To effectively identify antitubercular peptides, an intelligent and reliable prediction model is indispensable. An ensemble learning approach was used in this study to improve expected results by compensating for the shortcomings of individual classification algorithms. Initially, three distinct representation approaches were used to formulate the training samples: k-space amino acid composition, composite physicochemical properties, and one-hot encoding. The feature vectors of the applied feature extraction methods are then combined to generate a heterogeneous vector. Finally, using the individual and heterogeneous vectors, five classification models of distinct nature were used to evaluate prediction rates. In addition, a genetic algorithm-based ensemble model was used to improve the suggested model's prediction and training capabilities. Using training and independent datasets, the proposed ensemble model achieved an accuracy of 94.47% and 92.68%, respectively. It was observed that our proposed "iAtbP-Hyb-EnC" model outperformed existing predictors, with ~10% higher training accuracy. The "iAtbP-Hyb-EnC" model is suggested to be a reliable tool for scientists and might play a valuable role in academic research and drug discovery. The source code and all datasets are publicly available at https://github.com/Farman335/iAtbP-Hyb-EnC.
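Of the three representations named in this abstract, one-hot encoding is the simplest to illustrate. The sketch below maps each residue of a peptide to a 20-dimensional indicator vector; the residue ordering, the function name, and the choice to encode unknown residues (such as a padding character) as all-zero rows are assumptions for illustration, not details from iAtbP-Hyb-EnC.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, assumed ordering
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

def one_hot_encode(peptide):
    """Return one 20-dimensional 0/1 row per residue of the peptide."""
    rows = []
    for residue in peptide.upper():
        row = [0] * len(AMINO_ACIDS)
        if residue in AA_INDEX:        # unknown residues stay all-zero (assumption)
            row[AA_INDEX[residue]] = 1
        rows.append(row)
    return rows

def flatten(rows):
    """Concatenate the per-residue rows into one feature vector."""
    return [value for row in rows for value in row]

encoded = one_hot_encode("AKX")
vector = flatten(encoded)
```

A peptide of length L therefore yields a vector of length 20 × L, which can be concatenated with other feature vectors to form the kind of heterogeneous representation the abstract describes.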
Affiliation(s)
- Shahid Akbar
  - Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan
- Ashfaq Ahmad
  - Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan
- Maqsood Hayat
  - Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan
- Ateeq Ur Rehman
  - Department of Information Technology, The University of Haripur, KP, Pakistan
- Salman Khan
  - Department of Computer Science, Abdul Wali Khan University, Mardan, KP, 23200, Pakistan
- Farman Ali
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, China
12
Xie L, Xiao Y, Meng F, Li Y, Shi Z, Qian K. Functions and Mechanisms of Lysine Glutarylation in Eukaryotes. Front Cell Dev Biol 2021; 9:667684. [PMID: 34249920 PMCID: PMC8264553 DOI: 10.3389/fcell.2021.667684] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 02/14/2021] [Accepted: 06/01/2021] [Indexed: 01/22/2023] Open
Abstract
Lysine glutarylation (Kglu) is a newly discovered post-translational modification (PTM), which is considered to be reversible, dynamic, and conserved in prokaryotes and eukaryotes. Recent developments in the identification of Kglu by mass spectrometry have shown that Kglu is mainly involved in the regulation of metabolism, oxidative damage, chromatin dynamics and is associated with various diseases. In this review, we firstly summarize the development history of glutarylation, the biochemical processes of glutarylation and deglutarylation. Then we focus on the pathophysiological functions such as glutaric acidemia 1, asthenospermia, etc. Finally, the current computational tools for predicting glutarylation sites are discussed. These emerging findings point to new functions for lysine glutarylation and related enzymes, and also highlight the mechanisms by which glutarylation regulates diverse cellular processes.
Collapse
Affiliation(s)
- Longxiang Xie
- Institute of Biomedical Informatics, Cell Signal Transduction Laboratory, Bioinformatics Center, Henan Provincial Engineering Center for Tumor Molecular Medicine, School of Basic Medical Sciences, Huaihe Hospital, Henan University, Kaifeng, China
| | - Yafei Xiao
- Institute of Biomedical Informatics, Cell Signal Transduction Laboratory, Bioinformatics Center, Henan Provincial Engineering Center for Tumor Molecular Medicine, School of Basic Medical Sciences, Huaihe Hospital, Henan University, Kaifeng, China
| | - Fucheng Meng
- Institute of Biomedical Informatics, Cell Signal Transduction Laboratory, Bioinformatics Center, Henan Provincial Engineering Center for Tumor Molecular Medicine, School of Basic Medical Sciences, Huaihe Hospital, Henan University, Kaifeng, China
| | - Yongqiang Li
- Institute of Biomedical Informatics, Cell Signal Transduction Laboratory, Bioinformatics Center, Henan Provincial Engineering Center for Tumor Molecular Medicine, School of Basic Medical Sciences, Huaihe Hospital, Henan University, Kaifeng, China
| | - Zhenyu Shi
- Institute of Biomedical Informatics, Cell Signal Transduction Laboratory, Bioinformatics Center, Henan Provincial Engineering Center for Tumor Molecular Medicine, School of Basic Medical Sciences, Huaihe Hospital, Henan University, Kaifeng, China
| | - Keli Qian
- Infection Control Department, The First Affiliated Hospital of Chongqing Medical University, Chongqing, China
| |
|
13
|
Gonzalez Melo M, Remacle N, Cudré-Cung HP, Roux C, Poms M, Cudalbu C, Barroso M, Gersting SW, Feichtinger RG, Mayr JA, Costanzo M, Caterino M, Ruoppolo M, Rüfenacht V, Häberle J, Braissant O, Ballhausen D. The first knock-in rat model for glutaric aciduria type I allows further insights into pathophysiology in brain and periphery. Mol Genet Metab 2021; 133:157-181. [PMID: 33965309 DOI: 10.1016/j.ymgme.2021.03.017] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/04/2021] [Revised: 03/10/2021] [Accepted: 03/30/2021] [Indexed: 02/08/2023]
Abstract
Glutaric aciduria type I (GA-I, OMIM # 231670) is an inborn error of metabolism caused by a deficiency of glutaryl-CoA dehydrogenase (GCDH). Patients develop acute encephalopathic crises (AEC) with striatal injury most often triggered by catabolic stress. The pathophysiology of GA-I, particularly in brain, is still not fully understood. We generated the first knock-in rat model for GA-I by introduction of the mutation p.R411W, the rat sequence homologue of the most common Caucasian mutation p.R402W, into the Gcdh gene of Sprague Dawley rats by CRISPR/CAS9 technology. Homozygous Gcdhki/ki rats revealed a high excretor phenotype, but did not present any signs of AEC under normal diet (ND). Exposure to a high lysine diet (HLD, 4.7%) after weaning resulted in clinical and biochemical signs of AEC. A significant increase of plasma ammonium concentrations was found in Gcdhki/ki rats under HLD, accompanied by a decrease of urea concentrations and a concomitant increase of arginine excretion. This might indicate an inhibition of the urea cycle. Gcdhki/ki rats exposed to HLD showed highly diminished food intake resulting in severely decreased weight gain and moderate reduction of body mass index (BMI). This constellation suggests a loss of appetite. Under HLD, pipecolic acid increased significantly in cerebral and extra-cerebral fluids and tissues of Gcdhki/ki rats, but not in WT rats. It seems that Gcdhki/ki rats under HLD activate the pipecolate pathway for lysine degradation. Gcdhki/ki rat brains revealed depletion of free carnitine, microglial activation, astrogliosis, astrocytic death by apoptosis, increased vacuole numbers, impaired OXPHOS activities and neuronal damage. Under HLD, Gcdhki/ki rats showed imbalance of intra- and extracellular creatine concentrations and indirect signs of an intracerebral ammonium accumulation. We successfully created the first rat model for GA-I. Characterization of this Gcdhki/ki strain confirmed that it is a suitable model not only for the study of pathophysiological processes, but also for the development of new therapeutic interventions. We also gained new insights into the pathophysiology of GA-I in brain and periphery.
Affiliation(s)
- Mary Gonzalez Melo
- Pediatric Metabolic Unit, Pediatrics, Woman-Mother-Child Department, University of Lausanne and University Hospital of Lausanne, Switzerland.
| | - Noémie Remacle
- Pediatric Metabolic Unit, Pediatrics, Woman-Mother-Child Department, University of Lausanne and University Hospital of Lausanne, Switzerland
| | - Hong-Phuc Cudré-Cung
- Pediatric Metabolic Unit, Pediatrics, Woman-Mother-Child Department, University of Lausanne and University Hospital of Lausanne, Switzerland
| | - Clothilde Roux
- Service of Clinical Chemistry, University of Lausanne and University Hospital of Lausanne, Switzerland.
| | - Martin Poms
- Klinische Chemie und Biochemie Universitäts-Kinderspital Zürich, Switzerland.
| | - Cristina Cudalbu
- CIBM Center for Biomedical Imaging, Switzerland; Animal Imaging and Technology, Ecole Polytechnique Fédérale de Lausanne, Lausanne, Switzerland.
| | - Madalena Barroso
- University Children's Research, UCR@Kinder-UKE, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.
| | - Søren Waldemar Gersting
- University Children's Research, UCR@Kinder-UKE, University Medical Center Hamburg-Eppendorf, Hamburg, Germany.
| | - René Günther Feichtinger
- Department of Pediatrics, University Hospital Salzburg, Paracelsus Medical University, Salzburg, Austria.
| | - Johannes Adalbert Mayr
- Department of Pediatrics, University Hospital Salzburg, Paracelsus Medical University, Salzburg, Austria.
| | - Michele Costanzo
- Department of Molecular Medicine and Medical Biotechnology, School of Medicine, University of Naples Federico II, 80131 Naples, Italy; CEINGE - Biotecnologie Avanzate s.c.ar.l., 80145 Naples, Italy.
| | - Marianna Caterino
- Department of Molecular Medicine and Medical Biotechnology, School of Medicine, University of Naples Federico II, 80131 Naples, Italy; CEINGE - Biotecnologie Avanzate s.c.ar.l., 80145 Naples, Italy.
| | - Margherita Ruoppolo
- Department of Molecular Medicine and Medical Biotechnology, School of Medicine, University of Naples Federico II, 80131 Naples, Italy; CEINGE - Biotecnologie Avanzate s.c.ar.l., 80145 Naples, Italy.
| | - Véronique Rüfenacht
- Division of Metabolism and Children's Research Center, University Children's Hospital Zurich, Zurich, Switzerland.
| | - Johannes Häberle
- Division of Metabolism and Children's Research Center, University Children's Hospital Zurich, Zurich, Switzerland.
| | - Olivier Braissant
- Service of Clinical Chemistry, University of Lausanne and University Hospital of Lausanne, Switzerland.
| | - Diana Ballhausen
- Pediatric Metabolic Unit, Pediatrics, Woman-Mother-Child Department, University of Lausanne and University Hospital of Lausanne, Switzerland.
| |
|
14
|
iPTT(2L)-CNN: A Two-Layer Predictor for Identifying Promoters and Their Types in Plant Genomes by Convolutional Neural Network. Comput Math Methods Med 2021; 2021:6636350. [PMID: 33488763 PMCID: PMC7803414 DOI: 10.1155/2021/6636350] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/24/2020] [Revised: 12/13/2020] [Accepted: 12/16/2020] [Indexed: 11/18/2022]
Abstract
A promoter is a short DNA sequence near the start codon, responsible for initiating transcription of a specific gene in the genome. Accurate recognition of promoters is of great significance for a better understanding of transcriptional regulation. Because of their importance in transcriptional regulation, there is an urgent need to develop in silico tools that identify promoters and their types quickly and accurately. A number of prediction methods have been developed in this regard; however, almost all of them were used only for identifying promoters and their strength or sigma types. Because the TATA box region in TATA promoters influences posttranscriptional processes, in the current study we developed a two-layer predictor called iPTT(2L)-CNN, using a convolutional neural network (CNN), for identifying TATA and TATA-less promoters. The first layer identifies a given DNA sequence as a promoter or nonpromoter. The second layer identifies whether a recognized promoter is a TATA promoter or not. The 5-fold cross-validation and independent testing results demonstrate that the constructed predictor is promising for identifying promoters and classifying TATA and TATA-less promoters. Furthermore, to make it easier for most experimental scientists to get the results they need, a user-friendly web server has been established at http://www.jci-bioinfo.cn/iPPT(2L)-CNN.
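The two-layer decision flow described in this abstract can be sketched as a simple cascade. This is only an illustration: the one-hot encoding, the 0.5 threshold, and the names `one_hot_dna`, `two_layer_predict`, `promoter_model`, and `tata_model` are assumptions made here, and the two models are placeholders for trained CNNs whose architecture the abstract does not specify.

```python
from typing import Callable

BASES = "ACGT"
Encoded = list[list[int]]


def one_hot_dna(seq: str) -> Encoded:
    """Encode each base as a 4-element 0/1 vector; unknown bases become all zeros."""
    return [[int(base == b) for b in BASES] for base in seq.upper()]


def two_layer_predict(
    seq: str,
    promoter_model: Callable[[Encoded], float],  # hypothetical layer-1 scorer
    tata_model: Callable[[Encoded], float],      # hypothetical layer-2 scorer
    threshold: float = 0.5,                      # assumed decision cutoff
) -> str:
    """Layer 1 decides promoter vs non-promoter; layer 2 runs only on promoters."""
    x = one_hot_dna(seq)
    if promoter_model(x) < threshold:
        return "non-promoter"
    if tata_model(x) >= threshold:
        return "TATA promoter"
    return "TATA-less promoter"
```

Running the second model only on sequences accepted by the first keeps the two tasks separate, so each model can be trained and evaluated on its own dataset.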
|
15
|
Dou L, Li X, Zhang L, Xiang H, Xu L. iGlu_AdaBoost: Identification of Lysine Glutarylation Using the AdaBoost Classifier. J Proteome Res 2020; 20:191-201. [PMID: 33090794 DOI: 10.1021/acs.jproteome.0c00314] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]
Abstract
Lysine glutarylation is a newly reported post-translational modification (PTM) that plays significant roles in regulating metabolic and mitochondrial processes. Accurate identification of protein glutarylation is the first step toward investigating its molecular functions and applications. Because traditional biological sequencing techniques are time-consuming and expensive, and because protein data are growing explosively, building precise computational models to rapidly identify glutarylation is a popular and feasible solution. In this work, we proposed a novel AdaBoost-based predictor called iGlu_AdaBoost to distinguish glutarylation and non-glutarylation sequences. Here, the top 37 features were chosen from a total of 1768 combined features using Chi2 followed by incremental feature selection (IFS) to build the model, including 188D, the composition of k-spaced amino acid pairs (CKSAAP), and enhanced amino acid composition (EAAC). With the help of the hybrid-sampling method SMOTE-Tomek, the AdaBoost algorithm achieved satisfactory recall, specificity, and AUC values of 87.48%, 72.49%, and 0.89 over 10-fold cross validation as well as 72.73%, 71.92%, and 0.63 over independent test, respectively. Further feature analysis suggested that the positively charged amino acids R and K play critical roles in glutarylation recognition. Our model showed good generalization ability and consistent predictions for positive and negative samples, and is comparable to four published tools. The proposed predictor is an efficient tool to find potential glutarylation sites and provides helpful suggestions for further research on glutarylation mechanisms and related disease treatments.
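As an illustration of one of the feature families named in this abstract, the following is a minimal sketch of a CKSAAP-style encoding. It is not the authors' implementation: the function name `cksaap`, the default `k_max`, the key format, and the choice to normalize by the number of valid pairs (skipping any padding or non-standard residue) are all assumptions made for this example.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cksaap(peptide: str, k_max: int = 2) -> dict[str, float]:
    """Return normalized k-spaced amino acid pair frequencies.

    Keys look like "AK.gap1": residue A, one residue skipped, then residue K.
    """
    features: dict[str, float] = {}
    for k in range(k_max + 1):
        # Start each of the 400 possible pairs at zero for this gap size.
        counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
        total = 0
        for i in range(len(peptide) - k - 1):
            pair = peptide[i] + peptide[i + k + 1]
            if pair in counts:  # ignore padding or non-standard residues
                counts[pair] += 1
                total += 1
        for pair, n in counts.items():
            features[f"{pair}.gap{k}"] = n / total if total else 0.0
    return features
```

Each gap size contributes 400 values, so a window encoded with `k_max=2` yields 1200 features before any selection step.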
Affiliation(s)
- Lijun Dou
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen 518055, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
| | - Xiaoling Li
- Department of Oncology, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin 150000, China
| | - Lichao Zhang
- School of Intelligent Manufacturing and Equipment, Shenzhen Institute of Information Technology, Shenzhen 518172, China
| | - Huaikun Xiang
- School of Automotive and Transportation Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
| | - Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
| |
|
16
|
Arafat ME, Ahmad MW, Shovan S, Dehzangi A, Dipta SR, Hasan MAM, Taherzadeh G, Shatabda S, Sharma A. Accurately Predicting Glutarylation Sites Using Sequential Bi-Peptide-Based Evolutionary Features. Genes (Basel) 2020; 11:E1023. [PMID: 32878321 PMCID: PMC7565944 DOI: 10.3390/genes11091023] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2020] [Revised: 08/19/2020] [Accepted: 08/27/2020] [Indexed: 02/07/2023] Open
Abstract
Post-translational modification (PTM) is defined as the alteration of protein sequence upon interaction with different macromolecules after the translation process. Glutarylation is considered one of the most important PTMs and is associated with a wide range of cellular functions, including metabolism, translation, and specific subcellular localizations. During the past few years, a wide range of computational approaches has been proposed to predict glutarylation sites. However, despite these efforts, the prediction performance for glutarylation sites has remained limited. One of the main challenges is to extract features with significant discriminatory information. To address this issue, we propose a new machine learning method called BiPepGlut using the concept of a bi-peptide-based evolutionary method for feature extraction. To build this model, we also use the Extra-Trees (ET) classifier, which, to the best of our knowledge, has never been used for this task. Our results demonstrate that BiPepGlut significantly outperforms previously proposed models for this problem. BiPepGlut achieves 92.0%, 84.8%, 95.6%, 0.82, and 0.88 in accuracy, sensitivity, specificity, Matthews correlation coefficient, and F1-score, respectively. BiPepGlut is implemented as a publicly available online predictor.
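For readers comparing the figures reported across these abstracts, the five metrics listed above all follow from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name `binary_metrics` and the example counts in the usage note are illustrative only and are not taken from any of the papers.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute standard binary-classification metrics from confusion counts.

    Assumes at least one positive and one negative sample are present.
    """
    sensitivity = tp / (tp + fn)           # recall on true sites
    specificity = tn / (tn + fp)           # recall on non-sites
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)       # harmonic mean of precision and recall
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
        "f1": f1,
    }
```

For example, `binary_metrics(tp=40, fp=5, tn=45, fn=10)` describes a hypothetical test set of 50 sites and 50 non-sites. Unlike accuracy, MCC uses all four counts, which is why it is often reported alongside accuracy on imbalanced PTM datasets.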
Affiliation(s)
- Md. Easin Arafat
- Department of Computer Science and Engineering, United International University, Dhaka 1212, Bangladesh
| | - Md. Wakil Ahmad
- Department of Computer Science and Engineering, United International University, Dhaka 1212, Bangladesh
| | - S.M. Shovan
- Department of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi 6204, Bangladesh
| | - Abdollah Dehzangi
- Department of Computer Science, Rutgers University, Camden, NJ 08102, USA
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA
| | - Shubhashis Roy Dipta
- Department of Computer Science and Engineering, United International University, Dhaka 1212, Bangladesh
| | - Md. Al Mehedi Hasan
- Department of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi 6204, Bangladesh
| | - Ghazaleh Taherzadeh
- Institute for Bioscience and Biotechnology Research, University of Maryland, College Park, MD 20742, USA
| | - Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, Dhaka 1212, Bangladesh
| | - Alok Sharma
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD 4111, Australia
- Department of Medical Science Mathematics, Tokyo Medical and Dental University (TMDU), Tokyo 113-8510, Japan
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa 230-0045, Japan
- School of Engineering and Physics, Faculty of Science Technology and Environment, University of the South Pacific, Suva, Fiji
| |
|
17
|
Ju Z, Wang SY. Computational Identification of Lysine Glutarylation Sites Using Positive-Unlabeled Learning. Curr Genomics 2020; 21:204-211. [PMID: 33071614 PMCID: PMC7521029 DOI: 10.2174/1389202921666200511072327] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/25/2019] [Revised: 04/12/2020] [Accepted: 04/13/2020] [Indexed: 12/27/2022] Open
Abstract
Background
As a new type of protein acylation modification, lysine glutarylation has been found to play a crucial role in metabolic processes and mitochondrial functions. To further explore the biological mechanisms and functions of glutarylation, it is important to predict potential glutarylation sites. In the existing glutarylation site predictors, experimentally verified glutarylation sites are treated as positive samples and non-verified lysine sites as negative samples to train predictors. However, the non-verified lysine sites may contain some glutarylation sites that have not yet been experimentally identified.
Methods
In this study, experimentally verified glutarylation sites are treated as the positive samples, whereas the remaining non-verified lysine sites are treated as unlabeled samples. A bioinformatics tool named PUL-GLU was developed to identify glutarylation sites using a positive-unlabeled learning algorithm.
Results
Experimental results show that PUL-GLU significantly outperforms the current glutarylation site predictors. Therefore, PUL-GLU can be a powerful tool for accurate identification of protein glutarylation sites.
Conclusion
A user-friendly web server for PUL-GLU is available at http://bioinform.cn/pul_glu/.
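The abstract does not say which positive-unlabeled algorithm PUL-GLU uses, so the following is not its method. It is a minimal sketch of one generic PU technique, the Elkan-Noto score correction, under the assumption that labeled positives are a random sample of all true positives. The function names and the example scores are hypothetical; the input scores are assumed to come from any classifier trained to separate labeled from unlabeled sites.

```python
def estimate_label_frequency(scores_on_known_positives: list[float]) -> float:
    """Estimate c = P(labeled | truly positive).

    Under the random-labeling assumption, c is approximated by the mean
    classifier score on held-out, experimentally verified positives.
    """
    if not scores_on_known_positives:
        raise ValueError("need at least one held-out positive score")
    return sum(scores_on_known_positives) / len(scores_on_known_positives)


def adjust_scores(scores: list[float], c: float) -> list[float]:
    """Convert P(labeled | x) into an estimate of P(positive | x) = score / c."""
    if not 0.0 < c <= 1.0:
        raise ValueError("c must be in (0, 1]")
    return [min(1.0, s / c) for s in scores]
```

If held-out verified sites score 0.4, 0.6, and 0.5, then c is estimated as 0.5, and an unlabeled lysine scored at 0.25 is re-estimated as a 0.5 probability of being a true site. This reflects the abstract's point that unlabeled lysines are not necessarily negatives.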
Affiliation(s)
- Zhe Ju
- College of Science, Shenyang Aerospace University, Shenyang 110136, P.R. China
| | - Shi-Yun Wang
- College of Science, Shenyang Aerospace University, Shenyang 110136, P.R. China
| |
|
18
|
AL-barakati HJ, Saigo H, Newman RH, KC DB. RF-GlutarySite: a random forest based predictor for glutarylation sites. Mol Omics 2019; 15:189-204. [DOI: 10.1039/c9mo00028c] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Glutarylation, which is a newly identified posttranslational modification that occurs on lysine residues, has recently emerged as an important regulator of several metabolic and mitochondrial processes. Here, we describe the development of RF-GlutarySite, a random forest-based predictor designed to predict glutarylation sites based on protein primary amino acid sequence.
Affiliation(s)
- Hussam J. AL-barakati
- Department of Computational Science and Engineering, North Carolina Agricultural & Technical State University, Greensboro, USA
| | - Hiroto Saigo
- Department of Informatics, Kyushu University, Fukuoka 819-0395, Japan
| | - Robert H. Newman
- Department of Biology, North Carolina Agricultural & Technical State University, Greensboro, USA
| | - Dukka B. KC
- Department of Computational Science and Engineering, North Carolina Agricultural & Technical State University, Greensboro, USA
| |
|