1
Zhu L, Zhang Z, Yang S. BioSeq_Ksite: Multi-perspective feature-driven prediction of protein succinylation based on an adaptive attention module with SSBCE loss strategy. Int J Biol Macromol 2025; 310:143601. [PMID: 40306513 DOI: 10.1016/j.ijbiomac.2025.143601] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/03/2024] [Revised: 04/23/2025] [Accepted: 04/26/2025] [Indexed: 05/02/2025]
Abstract
Succinylation is a post-translational modification in which a succinyl group is transferred to the lysine residue of a protein, playing a crucial role in regulating both protein structure and cellular function. This paper introduces a novel sequential model, BioSeq_Ksite, designed to enhance succinylation prediction accuracy by integrating an adaptive attention mechanism and a joint loss function. This study first presents a new hybrid feature, ProtFusion, which combines the physicochemical properties of amino acids with pretrained models. Next, this paper introduces an adaptive attention module that enables the model to autonomously identify important features during training. Additionally, a gated network architecture is adopted to create a dual-branch sequential model. Finally, by combining sensitivity, specificity, and cross-entropy loss, a new joint loss function is proposed, which is used for succinylation prediction for the first time and significantly enhances the model's ability to handle class-imbalanced data. Evaluation on the test dataset shows that BioSeq_Ksite outperforms other models in MCC, Sn, AUC, and F1-Score, with a 7.68 % improvement in MCC over the second-best model. It provides an efficient and reliable tool for succinylation research and application. BioSeq_Ksite can be accessed at https://github.com/zzq1124ZHZ/BioSeq_Ksite.
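The abstract does not state the form of the joint loss. As a rough sketch only, one way to combine sensitivity, specificity, and cross-entropy into a single objective is to add "soft" (probability-weighted) sensitivity and specificity penalties to binary cross-entropy. The function name, the weights `alpha` and `beta`, and the soft-count formulation are assumptions made for illustration; they are not the SSBCE definition from the paper.

```python
import math


def joint_sn_sp_bce_loss(y_true, y_prob, alpha=1.0, beta=1.0, eps=1e-7):
    """Hypothetical joint loss: BCE + alpha*(1 - soft Sn) + beta*(1 - soft Sp).

    y_true: list of 0/1 labels; y_prob: list of predicted P(label == 1).
    """
    # Clip probabilities so that log() never receives 0.
    probs = [min(max(p, eps), 1.0 - eps) for p in y_prob]
    n = len(y_true)

    # Standard binary cross-entropy, averaged over samples.
    bce = -sum(
        t * math.log(p) + (1 - t) * math.log(1.0 - p)
        for t, p in zip(y_true, probs)
    ) / n

    n_pos = sum(y_true)
    n_neg = n - n_pos

    # Soft counts: probabilities stand in for hard 0/1 decisions.
    soft_tp = sum(t * p for t, p in zip(y_true, probs))
    soft_tn = sum((1 - t) * (1.0 - p) for t, p in zip(y_true, probs))

    # If a class is absent, its soft rate is 0 and the full penalty applies.
    soft_sn = soft_tp / (n_pos + eps)
    soft_sp = soft_tn / (n_neg + eps)

    return bce + alpha * (1.0 - soft_sn) + beta * (1.0 - soft_sp)


if __name__ == "__main__":
    labels = [1, 0, 0, 0]  # imbalanced toy batch
    print(joint_sn_sp_bce_loss(labels, [0.9, 0.1, 0.1, 0.1]))
    print(joint_sn_sp_bce_loss(labels, [0.1, 0.1, 0.1, 0.1]))
```

Because the sensitivity term is normalized by the number of positives alone, missing the single positive in the toy batch is penalized heavily even though it is only one sample in four, which is the intuition behind using such terms on class-imbalanced data.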
Affiliation(s)
- Lun Zhu
- School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou 213164, China; The Affiliated Changzhou No.2 People's Hospital of Nanjing Medical University, Changzhou 213164, China
- Ziqi Zhang
- School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou 213164, China
- Sen Yang
- School of Computer Science and Artificial Intelligence, Aliyun School of Big Data, School of Software, Changzhou University, Changzhou 213164, China; The Affiliated Changzhou No.2 People's Hospital of Nanjing Medical University, Changzhou 213164, China.
2
Tran TX, Khanh Le NQ, Nguyen VN. Integrating CNN and Bi-LSTM for protein succinylation sites prediction based on Natural Language Processing technique. Comput Biol Med 2025; 186:109664. [PMID: 39798505 DOI: 10.1016/j.compbiomed.2025.109664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/04/2024] [Revised: 12/10/2024] [Accepted: 01/06/2025] [Indexed: 01/15/2025]
Abstract
Protein succinylation, a post-translational modification wherein a succinyl group (-CO-CH₂-CH₂-CO-) attaches to lysine residues, plays a critical regulatory role in cellular processes. Dysregulated succinylation has been implicated in the onset and progression of various diseases, including liver, cardiac, pulmonary, and neurological disorders. However, identifying succinylation sites through experimental methods is often labor-intensive, costly, and technically challenging. To address this, we introduce CBiLSuccSite, an approach that integrates Convolutional Neural Networks (CNN) with Bidirectional Long Short-Term Memory (Bi-LSTM) networks for the accurate prediction of protein succinylation sites. Our approach employs a word embedding layer to encode protein sequences, enabling the automatic learning of intricate patterns and dependencies without manual feature extraction. In 10-fold cross-validation, CBiLSuccSite achieved superior predictive performance, with an Area Under the Curve (AUC) of 0.826 and a Matthews Correlation Coefficient (MCC) of 0.502. Independent testing further validated its robustness, yielding an AUC of 0.818 and an MCC of 0.53. The integration of CNN and Bi-LSTM leverages the strengths of both architectures, establishing CBiLSuccSite as an effective tool for protein language processing and succinylation site prediction. Our model and code are publicly accessible at: https://github.com/nuinvtnu/CBiLSuccSite.
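Before a word embedding layer can consume a protein sequence, each candidate site has to be turned into a fixed-length list of integer tokens. The sketch below shows one common way to do this: cut a window centred on every lysine and map residues to indices. The flank size of 15, the padding symbol `X`, and the function names are illustrative assumptions, not details taken from the abstract.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # assumed padding symbol for positions beyond the sequence ends

# Index 0 is reserved for padding/unknown residues; amino acids get 1..20.
TOKEN_INDEX = {PAD: 0}
TOKEN_INDEX.update({aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)})


def lysine_windows(sequence, flank=15):
    """Return (1-based position, window) for every lysine in the sequence.

    Each window has length 2*flank + 1 with the lysine at its centre.
    """
    sequence = sequence.upper()
    padded = PAD * flank + sequence + PAD * flank
    windows = []
    for pos, residue in enumerate(sequence):
        if residue == "K":
            # Residue `pos` sits at index pos + flank in the padded string,
            # so the slice below is centred on it.
            windows.append((pos + 1, padded[pos:pos + 2 * flank + 1]))
    return windows


def encode(window):
    """Map a window to integer tokens suitable for an embedding lookup."""
    return [TOKEN_INDEX.get(residue, 0) for residue in window]


if __name__ == "__main__":
    for position, window in lysine_windows("MKAK", flank=2):
        print(position, window, encode(window))
```

The integer lists produced by `encode` are what an embedding layer would index into; the CNN and Bi-LSTM layers described in the abstract would then operate on the resulting embedding matrix.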
Affiliation(s)
- Thi-Xuan Tran
- Thai Nguyen University of Economics and Business Administration, Thai Nguyen City, Viet Nam.
- Nguyen Quoc Khanh Le
- In-Service Master Program in Artificial Intelligence in Medicine, Taipei Medical University, Taiwan; AIBioMed Research Group, Taipei Medical University, Taiwan.
- Van-Nui Nguyen
- Thai Nguyen University of Information and Communication Technology, Thai Nguyen City, Viet Nam.
3
Raju C, Sankaranarayanan K. Insights on post-translational modifications in fatty liver and fibrosis progression. Biochim Biophys Acta Mol Basis Dis 2025; 1871:167659. [PMID: 39788217 DOI: 10.1016/j.bbadis.2025.167659] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/06/2024] [Revised: 12/20/2024] [Accepted: 01/02/2025] [Indexed: 01/12/2025]
Abstract
Metabolic dysfunction-associated steatotic liver disease [MASLD] is a pervasive multifactorial health burden. Post-translational modifications [PTMs] of amino acid residues in protein domains play pivotal roles in imparting dynamic alterations to the cellular micromilieu. Identifying novel druggable targets relies on comprehensively studying the etiology of metabolic disorders. This review article presents how the chemical moieties of various PTMs, such as phosphorylation, methylation, ubiquitination, glutathionylation, neddylation, acetylation, SUMOylation, lactylation, crotonylation, hydroxylation, glycosylation, citrullination, S-sulfhydration, and succinylation, contribute causally to the MASLD spectrum. Additionally, the therapeutic prospects of managing liver steatosis and hepatic fibrosis by targeting PTMs and regulatory enzymes are summarized. This review seeks to understand the function of protein modifications in disease progression and to promote the discovery of diagnostic and prognostic markers and drug targets for MASLD management, which could also halt the progression of a range of related diseases.
Affiliation(s)
- Chithra Raju
- Ion Channel Biology Laboratory, AU-KBC Research Centre, Madras Institute of Technology Campus, Anna University, Chrompet, Chennai 600 044, Tamil Nadu, India
- Kavitha Sankaranarayanan
- Ion Channel Biology Laboratory, AU-KBC Research Centre, Madras Institute of Technology Campus, Anna University, Chrompet, Chennai 600 044, Tamil Nadu, India.
4
Zeng K, Yin H. KAT2A changes the function of endometrial stromal cells via regulating the succinylation of ENO1. Open Life Sci 2024; 19:20220785. [PMID: 38585644 PMCID: PMC10997078 DOI: 10.1515/biol-2022-0785] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/03/2023] [Revised: 02/29/2024] [Accepted: 03/05/2024] [Indexed: 04/09/2024] Open
Abstract
Endometriosis is increasingly affecting women worldwide and research is focusing on identifying key targets in its pathogenesis. Changes in succinylation genes regulate the function of this protein and further influence the development of the disease. However, the role of succinylation genes in endometriosis is not clear from current studies. The expression of succinylation genes was determined in ectopic endometrium (EC) and ectopic patients with uterine fibroids (EN) by real-time quantitative PCR (qRT-PCR) and Western blot. Cell Counting Kit-8, transwell assays, and flow cytometry were used to assess endometrial stromal cell (ESC) proliferation, apoptosis, migration, and invasion. The association between KAT2A and ENO1 was detected by qRT-PCR, immunofluorescence, and CoIP. We found that gene and protein levels of KAT2A were significantly increased in EC group tissues compared to EN group tissues. KAT2A silencing inhibited cell proliferation, migration, and invasion and promoted apoptosis. Western blot results showed that the expression of ENO1 and its succinylation were significantly upregulated in ESCs after KAT2A overexpression. CoIP results showed that KAT2A binds to ENO1. Immunofluorescence also showed co-localized expression of KAT2A with ENO1. Furthermore, ENO1 overexpression reversed the effects of KAT2A silencing on the malignant behavior of ESCs. In summary, we found that succinylation of ENO1 mediated by KAT2A played a role in promoting the progression of endometriosis.
Affiliation(s)
- Kangkang Zeng
- Department of Obstetrics and Gynecology, Taihe Hospital, Hubei University of Medicine, 32 Renmin South Road, Maojian District, Shiyan 442000, Hubei, China
- Hao Yin
- Department of Obstetrics and Gynecology, Taihe Hospital, Hubei University of Medicine, 32 Renmin South Road, Maojian District, Shiyan 442000, Hubei, China
5
Adejor J, Tumukunde E, Li G, Lin H, Xie R, Wang S. Impact of Lysine Succinylation on the Biology of Fungi. Curr Issues Mol Biol 2024; 46:1020-1046. [PMID: 38392183 PMCID: PMC10888112 DOI: 10.3390/cimb46020065] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/09/2023] [Revised: 01/02/2024] [Accepted: 01/03/2024] [Indexed: 02/24/2024] Open
Abstract
Post-translational modifications (PTMs) play a crucial role in protein functionality and the control of various cellular processes and secondary metabolites (SMs) in fungi. Lysine succinylation (Ksuc) is an emerging protein PTM characterized by the addition of a succinyl group to a lysine residue, which induces substantial alteration in the chemical and structural properties of the affected protein. This chemical alteration is reversible, dynamic in nature, and evolutionarily conserved. Recent investigations of numerous proteins that undergo significant succinylation have underscored the potential significance of Ksuc in various biological processes, encompassing normal physiological functions and the development of certain pathological processes and metabolites. This review aims to elucidate the molecular mechanisms underlying Ksuc and its diverse functions in fungi. Both conventional investigation techniques and predictive tools for identifying Ksuc sites are also considered. A more profound comprehension of Ksuc and its impact on the biology of fungi has the potential to unveil new insights into post-translational modification and may pave the way for innovative approaches that can be applied across various clinical contexts in the management of mycotoxins.
Affiliation(s)
- John Adejor
- Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Key Laboratory of Biopesticide and Chemical Biology of Education Ministry, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Elisabeth Tumukunde
- Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Key Laboratory of Biopesticide and Chemical Biology of Education Ministry, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Guoqi Li
- Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Key Laboratory of Biopesticide and Chemical Biology of Education Ministry, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Hong Lin
- Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Key Laboratory of Biopesticide and Chemical Biology of Education Ministry, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Rui Xie
- Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Key Laboratory of Biopesticide and Chemical Biology of Education Ministry, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Shihua Wang
- Key Laboratory of Pathogenic Fungi and Mycotoxins of Fujian Province, Key Laboratory of Biopesticide and Chemical Biology of Education Ministry, College of Life Sciences, Fujian Agriculture and Forestry University, Fuzhou 350002, China
6
PD-BertEDL: An Ensemble Deep Learning Method Using BERT and Multivariate Representation to Predict Peptide Detectability. Int J Mol Sci 2022; 23:12385. [PMID: 36293242 PMCID: PMC9604182 DOI: 10.3390/ijms232012385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Revised: 10/11/2022] [Accepted: 10/12/2022] [Indexed: 12/03/2022] Open
Abstract
Peptide detectability is defined as the probability of identifying a peptide from a mixture of standard samples, which is a key step in protein identification and analysis. Exploring effective methods for predicting peptide detectability is helpful for disease treatment and clinical research. However, most existing computational methods for predicting peptide detectability rely on a single type of information. With the increasing complexity of feature representation, it is necessary to explore the influence of multivariate information on peptide detectability. Thus, we propose an ensemble deep learning method, PD-BertEDL. Bidirectional encoder representations from transformers (BERT) is introduced to capture the context information of peptides. Context information, sequence information, and physicochemical information of peptides are combined to construct the multivariate feature space of peptides. We use different deep learning methods to capture high-quality features from the different categories of peptide information and use an average fusion strategy to integrate the three model prediction results, which addresses the heterogeneity problem and enhances the robustness and adaptability of the model. The experimental results show that PD-BertEDL is superior to existing prediction methods; it can effectively predict peptide detectability and provide strong support for protein identification and quantitative analysis, as well as disease treatment.
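The average fusion step mentioned in the abstract can be pictured as an element-wise mean over the probability outputs of the individual models. The sketch below is a minimal illustration under that reading; the function names, the 0.5 decision threshold, and the toy probabilities are assumptions, not values from the paper.

```python
def average_fusion(*model_probs):
    """Element-wise mean of per-sample probabilities from several models.

    Each argument is a list of probabilities, one per sample, in the same order.
    """
    if not model_probs:
        raise ValueError("at least one model output is required")
    n_samples = len(model_probs[0])
    if any(len(probs) != n_samples for probs in model_probs):
        raise ValueError("all models must score the same number of samples")
    n_models = len(model_probs)
    # zip(*...) regroups the scores so each tuple holds one sample's scores.
    return [sum(scores) / n_models for scores in zip(*model_probs)]


def to_labels(probs, threshold=0.5):
    """Turn fused probabilities into 0/1 predictions (threshold is assumed)."""
    return [1 if p >= threshold else 0 for p in probs]


if __name__ == "__main__":
    # Hypothetical outputs of three models for two peptides.
    context_model = [0.9, 0.2]
    sequence_model = [0.6, 0.4]
    physchem_model = [0.3, 0.0]
    fused = average_fusion(context_model, sequence_model, physchem_model)
    print(fused, to_labels(fused))
```

Averaging probabilities rather than hard votes lets a confident model outweigh two uncertain ones, which is one reason this kind of fusion is often preferred when the base models see different feature types.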
7
Liu X, Xu LL, Lu YP, Yang T, Gu XY, Wang L, Liu Y. Deep_KsuccSite: A novel deep learning method for the identification of lysine succinylation sites. Front Genet 2022; 13:1007618. [PMID: 36246655 PMCID: PMC9557156 DOI: 10.3389/fgene.2022.1007618] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2022] [Accepted: 09/08/2022] [Indexed: 11/13/2022] Open
Abstract
Identification of lysine (symbol Lys or K) succinylation (Ksucc) sites is the basis for uncovering the mechanism and function of lysine succinylation modifications. Traditional experimental methods for Ksucc site identification are often costly and time-consuming. Therefore, it is necessary to construct an efficient computational method to predict the presence of Ksucc sites in protein sequences. In this study, we propose a novel and effective deep learning predictor for the identification of Ksucc sites, termed Deep_KsuccSite. The predictor adopts Composition, Transition, and Distribution (CTD) Composition (CTDC), Enhanced Grouped Amino Acid Composition (EGAAC), Amphiphilic Pseudo-Amino Acid Composition (APAAC), and Embedding Encoding methods to encode peptides, then constructs three base classifiers using one-dimensional (1D) and two-dimensional (2D) convolutional neural networks (CNNs), and finally uses a voting method to obtain the final results. K-fold cross-validation and independent testing showed that Deep_KsuccSite could serve as an effective tool to identify Ksucc sites in protein sequences. In addition, ablation experiments on voting, feature combination, and model architecture showed that Deep_KsuccSite could make full use of the information in different features to construct an effective classifier. Taken together, we developed Deep_KsuccSite, which is based on deep learning and could achieve better prediction accuracy than current methods for lysine succinylation sites. The code and dataset involved in this methodological study are permanently available at https://github.com/flyinsky6/Deep_KsuccSite.
Affiliation(s)
- Xin Liu
- School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, China
- *Correspondence: Xin Liu; Liang Wang; Yong Liu
- Lin-Lin Xu
- School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, China
- Ya-Ping Lu
- College of Computer Science and Technology, China University of Mining and Technology, Xuzhou, China
- Ting Yang
- School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, China
- Xin-Yu Gu
- School of Medical Informatics and Engineering, Xuzhou Medical University, Xuzhou, China
- Liang Wang
- Laboratory Medicine, Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences, Guangzhou, China
- Yong Liu
- Jiangsu Center for the Collaboration and Innovation of Cancer Biotherapy, Cancer Institute, Xuzhou Medical University, Xuzhou, Jiangsu, China
8
Zeng Y, Chen Y, Yuan Z. iSuc-ChiDT: a computational method for identifying succinylation sites using statistical difference table encoding and the chi-square decision table classifier. BioData Min 2022; 15:3. [PMID: 35144656 PMCID: PMC8832670 DOI: 10.1186/s13040-022-00290-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/03/2021] [Accepted: 01/30/2022] [Indexed: 11/23/2022] Open
Abstract
BACKGROUND: Lysine succinylation is a type of protein post-translational modification which is widely involved in cell differentiation, cell metabolism and other important physiological activities. To study the molecular mechanism of succinylation in depth, succinylation sites need to be accurately identified, and because experimental approaches are costly and time-consuming, there is a great demand for reliable computational methods. Feature extraction is a key step in building succinylation site prediction models, and the development of effective new features improves predictive accuracy. Because the number of false succinylation sites far exceeds that of true sites, traditional classifiers perform poorly, and designing a classifier to effectively handle highly imbalanced datasets has always been a challenge. RESULTS: A new computational method, iSuc-ChiDT, is proposed to identify succinylation sites in proteins. In iSuc-ChiDT, chi-square statistical difference table encoding is developed to extract positional features, and has a higher predictive accuracy and fewer features compared to common position-based encoding schemes such as binary encoding and physicochemical property encoding. Single amino acid and undirected pair-coupled amino acid composition features are supplemented to improve the fault tolerance for residue insertions and deletions. After feature selection by Chi-MIC-share algorithm, the chi-square decision table (ChiDT) classifier is constructed for imbalanced classification. With a training set of 4748:50,551 (true:false sites), ChiDT clearly outperforms traditional classifiers in predictive accuracy, and runs fast. Using an independent testing set of experimentally identified succinylation sites, iSuc-ChiDT achieves a sensitivity of 70.47%, a specificity of 66.27%, a Matthews correlation coefficient of 0.205, and a global accuracy index Q9 of 0.683, showing a significant improvement in sensitivity and overall accuracy compared to PSuccE, Success, SuccinSite, and other existing succinylation site predictors. CONCLUSIONS: iSuc-ChiDT shows great promise in predicting succinylation sites and is expected to facilitate further experimental investigation of protein succinylation.
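Sensitivity, specificity, and the Matthews correlation coefficient (MCC) reported in these abstracts all derive from the same confusion-matrix counts. The sketch below computes them with the standard definitions and compares two hypothetical test sets that share roughly the sensitivity and specificity quoted above but differ in class balance. The counts are invented for illustration and are not the paper's data.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return sensitivity, specificity, mcc


if __name__ == "__main__":
    # Hypothetical balanced test set: 100 true sites, 100 false sites.
    balanced = confusion_metrics(tp=70, fp=34, tn=66, fn=30)
    # Hypothetical imbalanced test set: 100 true sites, 1000 false sites,
    # with the same sensitivity (0.70) and specificity (0.66).
    imbalanced = confusion_metrics(tp=70, fp=340, tn=660, fn=30)
    print("balanced:   Sn=%.2f Sp=%.2f MCC=%.3f" % balanced)
    print("imbalanced: Sn=%.2f Sp=%.2f MCC=%.3f" % imbalanced)
```

With sensitivity and specificity held fixed, the MCC falls as false sites outnumber true sites, because false positives then dominate the predicted-positive set. This is why MCC values should be compared only between predictors evaluated on test sets with similar class ratios.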
Affiliation(s)
- Ying Zeng
- School of Computer and Communication, Hunan Institute of Engineering, Xiangtan, 411104 Hunan China
- Yuan Chen
- Hunan Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-making, Hunan Agricultural University, Changsha, 410128 Hunan China
- Zheming Yuan
- Hunan Engineering & Technology Research Center for Agricultural Big Data Analysis & Decision-making, Hunan Agricultural University, Changsha, 410128 Hunan China
9
Ning Q, Ma Z, Zhao X, Yin M. SSKM_Succ: A Novel Succinylation Sites Prediction Method Incorporating K-Means Clustering With a New Semi-Supervised Learning Algorithm. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:643-652. [PMID: 32750881 DOI: 10.1109/tcbb.2020.3006144] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Indexed: 06/11/2023]
Abstract
Protein succinylation is a type of post-translational modification (PTM) that occurs on lysine sites and plays a key role in protein conformation regulation and cellular function control. When training a computational model, it is difficult to designate negative samples because of the uncertainty of non-succinylation lysine sites, and if not handled properly, this may dramatically affect model performance. Therefore, we propose a new semi-supervised learning method to identify reliable non-succinylation lysine sites as negative samples. This method, named SSKM_Succ, also employs K-means clustering to divide data into 5 clusters. In addition, information on proximal PTMs and three kinds of sequence features (grey pseudo amino acid composition, K-space and position-specific amino acid propensity) are used to represent proteins. Then, we perform a two-step feature selection to remove redundant features and construct the optimization model for each cluster. Finally, a support vector machine is applied to construct a prediction model for each cluster. Promising results are obtained by this method, with an accuracy of 80.18 percent for succinylation sites on the independent testing dataset. We also compare the results with other existing tools, and the comparison shows that our method is promising for predicting succinylation sites. Through analysis, we further verify that succinylated proteins have potential effects on amino acid degradation and fatty acid metabolism, and speculate that protein succinylation may be closely related to neurodegenerative diseases. The code of SSKM_Succ is available at https://github.com/yangyq505/SSKM_Succ.git.
10
Structure, Biosynthesis, and Biological Activity of Succinylated Forms of Bacteriocin BacSp222. Int J Mol Sci 2021; 22:6256. [PMID: 34200765 PMCID: PMC8230399 DOI: 10.3390/ijms22126256] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 05/20/2021] [Revised: 06/04/2021] [Accepted: 06/07/2021] [Indexed: 01/21/2023] Open
Abstract
BacSp222 is a multifunctional peptide produced by Staphylococcus pseudintermedius 222. This 50-amino acid long peptide belongs to subclass IId of bacteriocins and forms a four-helix bundle molecule. In addition to bactericidal functions, BacSp222 also possesses features of a virulence factor, manifested in immunomodulatory and cytotoxic activities toward eukaryotic cells. In the present study, we demonstrate that BacSp222 is produced in several post-translationally modified forms, succinylated at the ε-amino group of lysine residues. Such modifications have not been previously described for any bacteriocins. NMR and circular dichroism spectroscopy studies have shown that the modifications do not alter the spatial structure of the peptide. At the same time, succinylation significantly diminishes its bactericidal and cytotoxic potential. We demonstrate that the modification of the bacteriocin results from a non-enzymatic reaction with a highly reactive intracellular metabolite, i.e., succinyl-coenzyme A. The production of succinylated forms of the bacteriocin depends on environmental factors and on the access of bacteria to nutrients. Our study indicates that the production of succinylated forms of bacteriocin occurs in response to the changing environment, protects producer cells against the autotoxicity of the excreted peptide, and limits the pathogenicity of the strain.
11
LSTMCNNsucc: A Bidirectional LSTM and CNN-Based Deep Learning Method for Predicting Lysine Succinylation Sites. Biomed Res Int 2021; 2021:9923112. [PMID: 34159204 PMCID: PMC8188601 DOI: 10.1155/2021/9923112] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 03/15/2021] [Revised: 04/25/2021] [Accepted: 05/03/2021] [Indexed: 11/17/2022]
Abstract
Lysine succinylation is a typical protein post-translational modification and plays a crucial regulatory role in cellular processes. Identifying succinylation sites is fundamental to exploring its functions. Although many computational methods have been developed to deal with this challenge, few have considered the semantic relationships between residues. We combined long short-term memory (LSTM) and convolutional neural network (CNN) into a deep learning method for predicting succinylation sites. The proposed method obtained a Matthews correlation coefficient of 0.2508 on the independent test, outperforming state-of-the-art methods. We also performed an enrichment analysis of succinylated proteins. The results showed that the functions of succinylation were conserved across species but differed to a certain extent between species. On the basis of the proposed method, we developed a user-friendly web server for predicting succinylation sites.
12
Yang Y, Wang H, Li W, Wang X, Wei S, Liu Y, Xu Y. Prediction and analysis of multiple protein lysine modified sites based on conditional wasserstein generative adversarial networks. BMC Bioinformatics 2021; 22:171. [PMID: 33789579 PMCID: PMC8010967 DOI: 10.1186/s12859-021-04101-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/28/2020] [Accepted: 03/23/2021] [Indexed: 01/05/2023] Open
Abstract
BACKGROUND: Protein post-translational modification (PTM) is key to investigating the mechanisms of protein function. With the rapid development of proteomics technology, a large amount of protein sequence data has been generated, which highlights the importance of the in-depth study and analysis of PTMs in proteins. METHOD: We proposed a new multi-classification machine learning pipeline, MultiLyGAN, to identify seven types of lysine modified sites. Using eight sequence-based and five structure-based feature construction methods, 1497 valid features remained after filtering by Pearson correlation coefficient. To solve the data imbalance problem, Conditional Generative Adversarial Network (CGAN) and Conditional Wasserstein Generative Adversarial Network (CWGAN), two influential deep generative methods, were leveraged and compared to generate new samples for the types with fewer samples. Finally, the random forest algorithm was used to predict the seven categories. RESULTS: In the tenfold cross-validation, accuracy (Acc) and Matthews correlation coefficient (MCC) were 0.8589 and 0.8376, respectively. In the independent test, Acc and MCC were 0.8549 and 0.8330, respectively. The results indicated that CWGAN better solved the existing data imbalance and stabilized the training error. In addition, an accumulated feature importance analysis reported that CKSAAP, PWM and structural features were the three most important feature-encoding schemes. MultiLyGAN can be found at https://github.com/Lab-Xu/MultiLyGAN. CONCLUSIONS: The CWGAN greatly improved the predictive performance in all experiments. Features derived from the CKSAAP, PWM and structure schemes were the most informative and made the greatest contribution to the prediction of PTM.
Affiliation(s)
- Yingxi Yang
- Department of Information and Computer Science, University of Science and Technology Beijing, Beijing, 100083, China
- Hui Wang
- Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100080, China
- Wen Li
- Department of Information and Computer Science, University of Science and Technology Beijing, Beijing, 100083, China
- Xiaobo Wang
- Department of Information and Computer Science, University of Science and Technology Beijing, Beijing, 100083, China
- Shizhao Wei
- No. 15 Research Institute, China Electronics Technology Group Corporation, Beijing, 100083, China
- Yulong Liu
- No. 15 Research Institute, China Electronics Technology Group Corporation, Beijing, 100083, China
- Yan Xu
- Department of Information and Computer Science, University of Science and Technology Beijing, Beijing, 100083, China.
13
Wang H, Xi Q, Liang P, Zheng L, Hong Y, Zuo Y. IHEC_RAAC: a online platform for identifying human enzyme classes via reduced amino acid cluster strategy. Amino Acids 2021; 53:239-251. [PMID: 33486591 DOI: 10.1007/s00726-021-02941-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 10/17/2020] [Accepted: 01/11/2021] [Indexed: 12/18/2022]
Abstract
Enzymes have been proven to play considerable roles in disease diagnosis and biological functions. Feature extraction that truly reflects the intrinsic properties of a protein is the most critical step in the automatic identification of enzymes. Although many feature extraction methods have been proposed, some challenges remain. In this study, we developed a predictor called IHEC_RAAC, which can identify whether a protein is a human enzyme and distinguish the function of the human enzyme. To improve the feature representation ability, protein sequences were encoded by a new feature vector called 'reduced amino acid cluster'. We calculated 673 amino acid reduction alphabets to determine the optimal feature representation scheme. The tenfold cross-validation test showed that IHEC_RAAC identified human enzymes with an accuracy of 74.66% and further discriminated human enzyme classes with an accuracy of 54.78%, which were 2.06% and 8.68% higher than the state-of-the-art predictors, respectively. Additionally, the results from the independent dataset indicated that IHEC_RAAC can effectively predict human enzymes and human enzyme classes to further provide guidance for protein research. A user-friendly web server, IHEC_RAAC, is freely accessible at http://bioinfor.imu.edu.cn/ihecraac.
Affiliation(s)
- Hao Wang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Qilemuge Xi
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Pengfei Liang
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Lei Zheng
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Yan Hong
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China
- Yongchun Zuo
- State Key Laboratory of Reproductive Regulation and Breeding of Grassland Livestock, College of Life Sciences, Inner Mongolia University, Hohhot, 010070, China.
14
Zhang L, Liu M, Qin X, Liu G. Succinylation Site Prediction Based on Protein Sequences Using the IFS-LightGBM (BO) Model. Computational and Mathematical Methods in Medicine 2020; 2020:8858489. [PMID: 33224267 PMCID: PMC7673955 DOI: 10.1155/2020/8858489] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 08/31/2020] [Revised: 09/25/2020] [Accepted: 10/24/2020] [Indexed: 01/08/2023]
Abstract
Succinylation is an important posttranslational modification of proteins, which plays a key role in protein conformation regulation and cellular function control. Many studies have shown that succinylation of protein lysine residues is closely related to the occurrence of many diseases. To understand the mechanism of succinylation in depth, it is necessary to identify succinylation sites in proteins accurately. In this study, we develop a new model, IFS-LightGBM (BO), which utilizes the incremental feature selection (IFS) method, the LightGBM feature selection method, the Bayesian optimization algorithm, and the LightGBM classifier, to predict succinylation sites in proteins. Specifically, pseudo amino acid composition (PseAAC), position-specific scoring matrix (PSSM), disorder status, and Composition of k-spaced Amino Acid Pairs (CKSAAP) are first employed to extract feature information. Then, the combination of the LightGBM feature selection method and the IFS method is used to select the optimal feature subset for the LightGBM classifier. Finally, to increase prediction accuracy and reduce the computation load, the Bayesian optimization algorithm is used to optimize the parameters of the LightGBM classifier. The results reveal that the IFS-LightGBM (BO)-based prediction model performs better when evaluated by common metrics such as accuracy, recall, precision, Matthews Correlation Coefficient (MCC), and F-measure.
Affiliation(s)
- Lu Zhang
- College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai 201306, China
- Min Liu
- College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai 201306, China
- Xinyi Qin
- College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai 201306, China
- Guangzhong Liu
- College of Information Engineering, Shanghai Maritime University, 1550 Haigang Ave., Shanghai 201306, China
15
HybridSucc: A Hybrid-learning Architecture for General and Species-specific Succinylation Site Prediction. Genomics Proteomics & Bioinformatics 2020; 18:194-207. [PMID: 32861878 PMCID: PMC7647696 DOI: 10.1016/j.gpb.2019.11.010] [Citation(s) in RCA: 25] [Impact Index Per Article: 5.0] [Received: 03/28/2019] [Revised: 09/17/2019] [Accepted: 11/13/2019] [Indexed: 11/21/2022]
Abstract
As an important protein acylation modification, lysine succinylation (Ksucc) is involved in diverse biological processes, and participates in human tumorigenesis. Here, we collected 26,243 non-redundant known Ksucc sites from 13 species as the benchmark data set, combined 10 types of informative features, and implemented a hybrid-learning architecture by integrating deep-learning and conventional machine-learning algorithms into a single framework. We constructed a new tool named HybridSucc, which achieved area under curve (AUC) values of 0.885 and 0.952 for general and human-specific prediction of Ksucc sites, respectively. In comparison, the accuracy of HybridSucc was 17.84%–50.62% better than that of other existing tools. Using HybridSucc, we conducted a proteome-wide prediction and prioritized 370 cancer mutations that change Ksucc states of 218 important proteins, including PKM2, SHMT2, and IDH2. We not only developed a high-profile tool for predicting Ksucc sites, but also generated useful candidates for further experimental consideration. The online service of HybridSucc can be freely accessed for academic research at http://hybridsucc.biocuckoo.org/.
16
Kao HJ, Nguyen VN, Huang KY, Chang WC, Lee TY. SuccSite: Incorporating Amino Acid Composition and Informative k-spaced Amino Acid Pairs to Identify Protein Succinylation Sites. Genomics Proteomics & Bioinformatics 2020; 18:208-219. [PMID: 32592791 PMCID: PMC7647693 DOI: 10.1016/j.gpb.2018.10.010] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 03/08/2018] [Revised: 10/01/2018] [Accepted: 10/11/2018] [Indexed: 12/14/2022]
Abstract
Protein succinylation is a biochemical reaction in which a succinyl group (-CO-CH2-CH2-CO-) is attached to the lysine residue of a protein molecule. Lysine succinylation plays important regulatory roles in living cells. However, studies in this field are limited by the difficulty in experimentally identifying the substrate site specificity of lysine succinylation. To facilitate this process, several tools have been proposed for the computational identification of succinylated lysine sites. In this study, we developed an approach to investigate the substrate specificity of lysine succinylated sites based on amino acid composition. Using experimentally verified lysine succinylated sites collected from public resources, the significant differences in position-specific amino acid composition between succinylated and non-succinylated sites were represented using the Two Sample Logo program. These findings enabled the adoption of an effective machine learning method, support vector machine, to train a predictive model with not only the amino acid composition, but also the composition of k-spaced amino acid pairs. After the selection of the best model using a ten-fold cross-validation approach, the selected model significantly outperformed existing tools based on an independent dataset manually extracted from published research articles. Finally, the selected model was used to develop a web-based tool, SuccSite, to aid the study of protein succinylation. Two proteins were used as case studies on the website to demonstrate the effective prediction of succinylation sites. We will regularly update SuccSite by integrating more experimental datasets. SuccSite is freely accessible at http://csb.cse.yzu.edu.tw/SuccSite/.
Affiliation(s)
- Hui-Ju Kao
- Department of Computer Science and Engineering, Yuan Ze University, Taoyuan 32003, Taiwan, China
- Van-Nui Nguyen
- Department of Information Technology, University of Information and Communication Technology, Thai Nguyen 1000, Vietnam
- Kai-Yao Huang
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China; Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China
- Wen-Chi Chang
- Institute of Tropical Plant Sciences, Cheng Kung University, Tainan 701, Taiwan, China
- Tzong-Yi Lee
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China; Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen 518172, China.
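The SuccSite entry above names the composition of k-spaced amino acid pairs as one of its features. As a rough illustration only, the sketch below counts residue pairs separated by exactly k positions in a peptide window and turns the counts into frequencies. The function name `cksaap`, the 20-letter alphabet constant, and the choice to normalize by the number of counted pairs are assumptions made for this example; they are not taken from the SuccSite paper or its web server.

```python
from itertools import product

# Assumed alphabet: the 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def cksaap(window: str, k: int) -> dict[str, float]:
    """Return frequencies of the 400 residue pairs separated by k positions.

    `window` is a peptide fragment (hypothetical input format). Pairs that
    contain a character outside AMINO_ACIDS, such as padding, are skipped.
    """
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    # Residue i is paired with residue i + k + 1 (k residues in between).
    for i in range(len(window) - k - 1):
        pair = window[i] + window[i + k + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return {pair: 0.0 for pair in counts}
    # Assumption: normalize by the number of pairs actually counted.
    return {pair: c / total for pair, c in counts.items()}


features = cksaap("AKAKA", k=1)
print(features["AA"], features["KK"])
```

For the toy window `AKAKA` with k = 1, the three pairs are AA, KK and AA, so the two non-zero frequencies are 2/3 and 1/3. Concatenating the vectors for several values of k gives the kind of multi-gap encoding the abstract alludes to.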
17
Xu H, Jia P, Zhao Z. Deep4mC: systematic assessment and computational prediction for DNA N4-methylcytosine sites by deep learning. Brief Bioinform 2020; 22:5856341. [PMID: 32578842 DOI: 10.1093/bib/bbaa099] [Citation(s) in RCA: 51] [Impact Index Per Article: 10.2] [Received: 02/24/2020] [Revised: 04/16/2020] [Accepted: 05/02/2020] [Indexed: 12/11/2022] Open
Abstract
DNA N4-methylcytosine (4mC) modification represents a novel epigenetic regulation. It is involved in various cellular processes, including DNA replication, cell cycle and gene expression, among others. In addition to experimental identification of 4mC sites, in silico prediction of 4mC sites in the genome has emerged as an alternative and promising approach. In this study, we first reviewed the current progress in the computational prediction of 4mC sites and systematically evaluated the predictive capacity of eight conventional machine learning algorithms as well as 12 feature types commonly used in previous studies in six species. Using a representative benchmark dataset, we investigated the contribution of feature selection and the stacking approach to model construction, and found that feature optimization and proper reinforcement learning could improve the performance. We next collected newly added 4mC sites in the six species' genomes and developed a novel deep learning-based 4mC site predictor, namely Deep4mC. Deep4mC applies convolutional neural networks with four representative features. For species with small numbers of samples, we extended our deep learning framework with a bootstrapping method. Our evaluation indicated that Deep4mC could obtain high accuracy and robust performance, with average area under curve (AUC) values greater than 0.9 in all species (range: 0.9005-0.9722). In comparison with previous tools, Deep4mC improved the AUC value by 10.14% to 46.21% in these six species. A user-friendly web server (https://bioinfo.uth.edu/Deep4mC) was built for predicting putative 4mC sites in a genome.
Affiliation(s)
- Haodong Xu
- Center for Precision Health, School of Biomedical Informatics
- Peilin Jia
- Center for Precision Health, School of Biomedical Informatics
- Zhongming Zhao
- Center for Precision Health, School of Biomedical Informatics
18
Huang G, Zheng Y, Wu YQ, Han GS, Yu ZG. An Information Entropy-Based Approach for Computationally Identifying Histone Lysine Butyrylation. Front Genet 2020; 10:1325. [PMID: 32117407 PMCID: PMC7033570 DOI: 10.3389/fgene.2019.01325] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 09/12/2019] [Accepted: 12/05/2019] [Indexed: 12/14/2022] Open
Abstract
Butyrylation plays a crucial role in cellular processes. Owing to technical limitations, identifying histone butyrylation sites on a large scale is a challenging task. To fill this gap, we propose an approach based on information entropy and machine learning for computationally identifying histone butyrylation sites. The proposed method achieves an area under the receiver operating characteristic (ROC) curve of 0.92 on the training set by 3-fold cross validation and 0.80 on the testing set by independent test. Feature analysis implies that amino acid residues downstream and upstream of butyrylation sites exhibit specific sequence motifs to a certain extent. Functional analysis suggests that histone butyrylation is most likely associated with four pathways (systemic lupus erythematosus, alcoholism, viral carcinogenesis and transcriptional misregulation in cancer), is involved in binding with other molecules and in processes of biosynthesis, assembly, arrangement or disassembly, and is located in complexes consisting of DNA, RNA, protein, etc. The proposed method is useful for predicting histone butyrylation sites. Analysis of features and functions improves understanding of histone butyrylation and increases knowledge of the functions of butyrylated histones.
Affiliation(s)
- Guohua Huang
- Provincial Key Laboratory of Informational Service for Rural Area of Southwestern Hunan, Shaoyang University, Shaoyang, China
- Yang Zheng
- Provincial Key Laboratory of Informational Service for Rural Area of Southwestern Hunan, Shaoyang University, Shaoyang, China
- Yao-Qun Wu
- Provincial Key Laboratory of Informational Service for Rural Area of Southwestern Hunan, Shaoyang University, Shaoyang, China; Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, China
- Guo-Sheng Han
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, China
- Zu-Guo Yu
- Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education and Hunan Key Laboratory for Computation and Simulation in Science and Engineering, Xiangtan University, Xiangtan, China; School of Electrical Engineering and Computer Science, Queensland University of Technology, Brisbane, QLD, Australia
19
Huang KY, Hsu JBK, Lee TY. Characterization and Identification of Lysine Succinylation Sites based on Deep Learning Method. Sci Rep 2019; 9:16175. [PMID: 31700141 PMCID: PMC6838336 DOI: 10.1038/s41598-019-52552-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 07/09/2019] [Accepted: 10/18/2019] [Indexed: 12/14/2022] Open
Abstract
Succinylation is a type of protein post-translational modification (PTM), which can play important roles in a variety of cellular processes. Due to an increasing number of site-specific succinylated peptides obtained from high-throughput mass spectrometry (MS), various tools have been developed for computationally identifying succinylated sites on proteins. However, most of these tools predict succinylation sites based on traditional machine learning methods. Hence, this work aimed to carry out succinylation site prediction based on a deep learning model. The abundance of MS-verified succinylated peptides enabled the investigation of substrate site specificity of succinylation sites through sequence-based attributes, such as position-specific amino acid composition, the composition of k-spaced amino acid pairs (CKSAAP), and position-specific scoring matrix (PSSM). Additionally, maximal dependence decomposition (MDD) was adopted to detect the substrate signatures of lysine succinylation sites by dividing all succinylated sequences into several groups with conserved substrate motifs. According to the results of ten-fold cross-validation, the deep learning model trained using PSSM and informative CKSAAP attributes reached the best predictive performance and also performed better than traditional machine-learning methods. Moreover, an independent testing dataset with no overlap with the training dataset was used to compare the proposed method with six existing prediction tools. The testing dataset comprised 218 positive and 2621 negative instances, and the proposed model yielded a promising performance with 84.40% sensitivity, 86.99% specificity, 86.79% accuracy, and an MCC value of 0.489. Finally, the proposed method has been implemented as a web-based prediction tool (CNN-SuccSite), which is now freely accessible at http://csb.cse.yzu.edu.tw/CNN-SuccSite/.
Affiliation(s)
- Kai-Yao Huang
- Department of Medical Research, Hsinchu Mackay Memorial Hospital, Hsinchu city, 300, Taiwan
- Justin Bo-Kai Hsu
- Department of Medical Research, Taipei Medical University Hospital, Taipei city, 110, Taiwan
- Tzong-Yi Lee
- Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 518172, China; School of Life and Health Sciences, The Chinese University of Hong Kong, Shenzhen, 518172, China.
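The CNN-SuccSite entry above reports sensitivity, specificity, accuracy and MCC on a test set of 218 positive and 2621 negative instances. The sketch below shows how those four figures relate: it rebuilds an approximate confusion matrix from the reported rates and class sizes, then recomputes accuracy and the Matthews correlation coefficient with the standard formulas. The function names are made up for this example, and rounding the counts to whole instances is an assumption, since the abstract does not list the raw counts.

```python
from math import sqrt


def confusion_from_rates(n_pos, n_neg, sensitivity, specificity):
    """Rebuild (TP, FN, TN, FP) from class sizes and reported rates.

    Counts are rounded to the nearest integer (an assumption; the
    abstract gives only percentages).
    """
    tp = round(sensitivity * n_pos)
    tn = round(specificity * n_neg)
    return tp, n_pos - tp, tn, n_neg - tn


def accuracy(tp, fn, tn, fp):
    return (tp + tn) / (tp + fn + tn + fp)


def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Figures quoted in the abstract: 218 positives, 2621 negatives,
# 84.40% sensitivity, 86.99% specificity.
tp, fn, tn, fp = confusion_from_rates(218, 2621, 0.8440, 0.8699)
print(tp, fn, tn, fp)
print(round(accuracy(tp, fn, tn, fp), 4), round(mcc(tp, fn, tn, fp), 3))
```

Under these assumptions the recomputed accuracy is about 0.868 and the MCC about 0.49, in line with the 86.79% and 0.489 quoted in the abstract. The gap between the high accuracy and the moderate MCC reflects the roughly 1:12 class imbalance: 341 false positives outnumber the 184 true positives.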
20
Nandi SK, Rakete S, Nahomi RB, Michel C, Dunbar A, Fritz KS, Nagaraj RH. Succinylation Is a Gain-of-Function Modification in Human Lens αB-Crystallin. Biochemistry 2019; 58:1260-1274. [PMID: 30758948 DOI: 10.1021/acs.biochem.8b01053] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Indexed: 11/30/2022]
Abstract
Acylation of lysine residues is a common post-translational modification of cellular proteins. Here, we show that lysine succinylation, a type of acylation, occurs in human lens proteins. All of the major crystallins exhibited Nε-succinyllysine (SuccK) residues. Quantification of SuccK in human lens proteins (from donors between the ages of 20 and 73 years) by LC-MS/MS showed a range between 1.2 and 14.3 pmol/mg lens protein. The total SuccK levels were slightly reduced in aged lenses (age > 60 years) relative to young lenses (age < 30 years). Immunohistochemical analyses revealed that SuccK was present in epithelium and fiber cells. Western blotting and immunoprecipitation experiments revealed that SuccK is particularly prominent in αB-crystallin, and succinylation in vitro revealed that αB-crystallin is more prone to succinylation than αA-crystallin. Mass spectrometric analyses showed succinylation at K72, K90, K92, K166, K175, and potentially K174 in human lens αB-crystallin. We detected succinylation at K72, K82, K90, K92, K103, K121, K150, K166, K175, and potentially K174 by mass spectrometry in mildly succinylated αB-crystallin. Mild succinylation improved the chaperone activity of αB-crystallin along with minor perturbation in tertiary and quaternary structure of the protein. These observations imply that succinylation is beneficial to αB-crystallin by improving its chaperone activity with only mild conformational alterations.
21
Hasan MM, Khatun MS, Kurata H. Large-Scale Assessment of Bioinformatics Tools for Lysine Succinylation Sites. Cells 2019; 8:cells8020095. [PMID: 30696115 PMCID: PMC6406724 DOI: 10.3390/cells8020095] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Received: 12/26/2018] [Revised: 01/24/2019] [Accepted: 01/24/2019] [Indexed: 12/19/2022] Open
Abstract
Lysine succinylation is a form of posttranslational modification of proteins that plays an essential functional role in every aspect of cell metabolism in both prokaryotes and eukaryotes. Aside from experimental identification of succinylation sites, there has been an intense effort geared towards the development of sequence-based prediction through machine learning, which promises to be highly accurate, robust and cost-effective. In spite of these advantages, several problems need attention in the design and development of succinylation site predictors. Notwithstanding the many studies employing machine learning approaches, few articles have examined this bioinformatics field in a systematic manner. Thus, we review the advancements in current state-of-the-art prediction models, datasets, and online resources, and illustrate the challenges and limitations, to present a useful guideline for developing powerful succinylation site prediction tools.
Affiliation(s)
- Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
- Mst Shamima Khatun
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
- Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
- Biomedical Informatics R&D Center, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan.
22
Hasan MM, Kurata H. GPSuc: Global Prediction of Generic and Species-specific Succinylation Sites by aggregating multiple sequence features. PLoS One 2018; 13:e0200283. [PMID: 30312302 PMCID: PMC6193575 DOI: 10.1371/journal.pone.0200283] [Citation(s) in RCA: 47] [Impact Index Per Article: 6.7] [Received: 03/15/2018] [Accepted: 06/22/2018] [Indexed: 01/09/2023] Open
Abstract
Lysine succinylation is one of the dominant post-translational modifications of proteins and contributes to many biological processes, including cell cycle, growth and signal transduction pathways. Identification of succinylation sites is an important step in understanding the function of proteins. The complicated sequence patterns of protein succinylation revealed by proteomic studies highlight the necessity of developing effective species-specific in silico strategies for global prediction of succinylation sites. Here we have developed generic and nine species-specific succinylation site classifiers by aggregating multiple complementary features. We optimized the consecutive features using the Wilcoxon-rank feature selection scheme. The final feature vectors were used to train a random forest (RF) classifier. With an integration of RF scores via logistic regression, the resulting predictor, termed GPSuc, achieved better performance than other existing generic and species-specific succinylation site predictors. Our predictor serves as a valuable resource for revealing the mechanism of succinylation and assisting hypothesis-driven experimental design. To support large-scale datasets, a web application was developed at http://kurata14.bio.kyutech.ac.jp/GPSuc/.
Affiliation(s)
- Md. Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Kawazu, Iizuka, Fukuoka, Japan
- Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, Kawazu, Iizuka, Fukuoka, Japan
- Biomedical Informatics R&D Center, Kyushu Institute of Technology, Kawazu, Iizuka, Fukuoka, Japan
23
Xu Y, Yang Y, Ding J, Li C. iGlu-Lys: A Predictor for Lysine Glutarylation Through Amino Acid Pair Order Features. IEEE Trans Nanobioscience 2018; 17:394-401. [DOI: 10.1109/tnb.2018.2848673] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 11/09/2022]
24
SVM-SulfoSite: A support vector machine based predictor for sulfenylation sites. Sci Rep 2018; 8:11288. [PMID: 30050050 PMCID: PMC6062547 DOI: 10.1038/s41598-018-29126-x] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Received: 04/23/2018] [Accepted: 07/02/2018] [Indexed: 12/15/2022] Open
Abstract
Protein S-sulfenylation, which results from oxidation of free thiols on cysteine residues, has recently emerged as an important post-translational modification that regulates the structure and function of proteins involved in a variety of physiological and pathological processes. By altering the size and physicochemical properties of modified cysteine residues, sulfenylation can impact the cellular function of proteins in several different ways. Thus, the ability to rapidly and accurately identify putative sulfenylation sites in proteins will provide important insights into redox-dependent regulation of protein function in a variety of cellular contexts. Though bottom-up proteomic approaches, such as tandem mass spectrometry (MS/MS), provide a wealth of information about global changes in the sulfenylation state of proteins, MS/MS-based experiments are often labor-intensive, costly and technically challenging. Therefore, to complement existing proteomic approaches, researchers have developed a series of computational tools to identify putative sulfenylation sites on proteins. However, existing methods often suffer from low accuracy, specificity, and/or sensitivity. In this study, we developed SVM-SulfoSite, a novel sulfenylation prediction tool that uses support vector machines (SVM) to identify key determinants of sulfenylation among five feature classes: binary code, physicochemical properties, k-space amino acid pairs, amino acid composition and high-quality physicochemical indices. Using 10-fold cross-validation, SVM-SulfoSite achieved 95% sensitivity and 83% specificity, with an overall accuracy of 89% and Matthews correlation coefficient (MCC) of 0.79. Likewise, using an independent test set of experimentally identified sulfenylation sites, our method achieved scores of 74%, 62%, 80% and 0.42 for accuracy, sensitivity, specificity and MCC, with an area under the receiver operator characteristic (ROC) curve of 0.81. Moreover, in side-by-side comparisons, SVM-SulfoSite performed as well as or better than existing sulfenylation prediction tools. Together, these results suggest that our method represents a robust and complementary technique for advanced exploration of protein S-sulfenylation.
25
Ning Q, Zhao X, Bao L, Ma Z, Zhao X. Detecting Succinylation sites from protein sequences using ensemble support vector machine. BMC Bioinformatics 2018; 19:237. [PMID: 29940836 PMCID: PMC6016146 DOI: 10.1186/s12859-018-2249-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 09/26/2017] [Accepted: 06/14/2018] [Indexed: 12/14/2022] Open
Abstract
Background: Lysine succinylation is a new kind of post-translational modification which plays a key role in protein conformation regulation and cellular function control. To understand the mechanism of succinylation in depth, it is necessary to identify succinylation sites in proteins accurately. However, traditional experimental approaches are labor-intensive and time-consuming. Computational prediction methods have been proposed in recent years, and they are popular because of their convenience and high speed. In this study, we developed a new method to predict succinylation sites in proteins by combining multiple features, including amino acid composition, binary encoding, physicochemical property and grey pseudo amino acid composition, with a feature selection scheme (information gain). The model was then trained using SVM (Support Vector Machine) and an ensemble learning algorithm. Results: The method achieved an accuracy of 89.14% and an MCC (Matthews correlation coefficient) of 0.79 using 10-fold cross validation on the training dataset, and an accuracy of 84.5% and an MCC of 0.2 on the independent dataset. Conclusions: The conclusions made from this study can help to understand more of the succinylation mechanism. These results suggest that our method is very promising for predicting succinylation sites. The source code and data of this paper are freely available at https://github.com/ningq669/PSuccE. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2249-4) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Qiao Ning
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
- Xiaosa Zhao
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
- Lingling Bao
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China
- Zhiqiang Ma
- School of Information Science and Technology, Northeast Normal University, Changchun, 130117, China.
- Xiaowei Zhao
- Key Laboratory of Intelligent Information Processing of Jilin Universities, Northeast Normal University, Changchun, 130117, China.
26
Dehzangi A, López Y, Lal SP, Taherzadeh G, Sattar A, Tsunoda T, Sharma A. Improving succinylation prediction accuracy by incorporating the secondary structure via helix, strand and coil, and evolutionary information from profile bigrams. PLoS One 2018; 13:e0191900. [PMID: 29432431 PMCID: PMC5809022 DOI: 10.1371/journal.pone.0191900] [Citation(s) in RCA: 43] [Impact Index Per Article: 6.1] [Received: 09/02/2017] [Accepted: 01/12/2018] [Indexed: 11/18/2022] Open
Abstract
Post-translational modification refers to the biological mechanism involved in the enzymatic modification of proteins after being translated in the ribosome. This mechanism comprises a wide range of structural modifications, which bring dramatic variations to the biological function of proteins. One of the recently discovered modifications is succinylation. Although succinylation can be detected through mass spectrometry, its current experimental detection turns out to be a time-consuming process unable to keep pace with the exponential growth of sequenced proteins. Therefore, the implementation of fast and accurate computational methods has emerged as a feasible solution. This paper proposes a novel classification approach, which effectively incorporates the secondary structure and evolutionary information of proteins through profile bigrams for succinylation prediction. The proposed predictor, abbreviated as SSEvol-Suc, made use of the above features for training an AdaBoost classifier and consequently predicting succinylated lysine residues. When SSEvol-Suc was compared with four benchmark predictors, it outperformed them in metrics such as sensitivity (0.909), accuracy (0.875) and Matthews correlation coefficient (0.75).
Collapse
Affiliation(s)
- Abdollah Dehzangi
  - Department of Computer Science, Morgan State University, Baltimore, Maryland, United States of America
- Yosvany López
  - Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
  - Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Sunil Pranit Lal
  - School of Engineering & Advanced Technology, Massey University, Palmerston North, New Zealand
- Ghazaleh Taherzadeh
  - School of Information and Communication Technology, Griffith University, Queensland, Australia
- Abdul Sattar
  - School of Information and Communication Technology, Griffith University, Queensland, Australia
  - Institute for Integrated and Intelligent Systems, Griffith University, Queensland, Australia
- Tatsuhiko Tsunoda
  - Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
  - Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
  - CREST, JST, Tokyo, Japan
- Alok Sharma
  - Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
  - Institute for Integrated and Intelligent Systems, Griffith University, Queensland, Australia
  - School of Engineering & Physics, University of the South Pacific, Suva, Fiji
|
27
|
López Y, Sharma A, Dehzangi A, Lal SP, Taherzadeh G, Sattar A, Tsunoda T. Success: evolutionary and structural properties of amino acids prove effective for succinylation site prediction. BMC Genomics 2018; 19:923. [PMID: 29363424 PMCID: PMC5781056 DOI: 10.1186/s12864-017-4336-8] [Citation(s) in RCA: 45] [Impact Index Per Article: 6.4] [Indexed: 12/27/2022]
Abstract
BACKGROUND: Post-translational modification is considered an important biological mechanism with critical impact on the diversification of the proteome. Although a long list of such modifications has been studied, succinylation of lysine residues has recently attracted the interest of the scientific community. The experimental detection of succinylation sites is an expensive process, which consumes a lot of time and resources. Therefore, computational predictors of this covalent modification have emerged as a last resort for tackling lysine succinylation.
RESULTS: In this paper, we propose a novel computational predictor called 'Success', which efficiently uses the structural and evolutionary information of amino acids for predicting succinylation sites. To do this, each lysine was described as a vector that combined the above information for the surrounding amino acids. We then designed a support vector machine with a radial basis function kernel for discriminating between succinylated and non-succinylated residues. We finally compared the Success predictor with three state-of-the-art predictors in the literature. Our proposed predictor showed a significant improvement over the compared predictors in statistical metrics such as sensitivity (0.866), accuracy (0.838) and Matthews correlation coefficient (0.677) on a benchmark dataset.
CONCLUSIONS: The proposed predictor effectively uses the structural and evolutionary information of the amino acids surrounding a lysine. The bigram feature extraction approach, while retaining the same number of features, facilitates a better description of lysines. A support vector machine with a radial basis function kernel was used to discriminate between modified and unmodified lysines. These aspects make the Success predictor outperform three state-of-the-art predictors in succinylation detection.
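The metrics quoted in the abstract (sensitivity, accuracy, Matthews correlation coefficient) all derive from the confusion matrix of a binary classifier. The sketch below uses the standard definitions; the counts are invented for illustration and are not results from the Success paper, and the function name `binary_metrics` is hypothetical.

```python
import math


def _ratio(numerator, denominator):
    # Return 0.0 instead of raising when a class is absent from the data.
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Compute common site-prediction metrics from confusion-matrix counts."""
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": _ratio(tp, tp + fn),  # recall on modified sites
        "specificity": _ratio(tn, tn + fp),  # recall on unmodified sites
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Illustrative counts only (assumed): 100 positive and 100 negative sites.
metrics = binary_metrics(tp=90, fp=20, tn=80, fn=10)
```

Unlike accuracy, the Matthews correlation coefficient stays informative when positive and negative sites are imbalanced, which is why succinylation predictors typically report both.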
Affiliation(s)
- Yosvany López
  - Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
  - Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
- Alok Sharma
  - Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
  - Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
  - School of Engineering & Physics, University of the South Pacific, Suva, Fiji
- Abdollah Dehzangi
  - Department of Computer Science, School of Computer, Mathematical, and Natural Sciences, Morgan State University, Baltimore, Maryland, USA
- Sunil Pranit Lal
  - School of Engineering & Advanced Technology, Massey University, Palmerston North, New Zealand
- Ghazaleh Taherzadeh
  - School of Information and Communication Technology, Griffith University, Brisbane, Australia
- Abdul Sattar
  - Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
  - School of Information and Communication Technology, Griffith University, Brisbane, Australia
- Tatsuhiko Tsunoda
  - Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University, Tokyo, Japan
  - Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan
  - CREST, JST, Tokyo, 113-8510, Japan
|
28
|
Wang LN, Shi SP, Wen PP, Zhou ZY, Qiu JD. Computing Prediction and Functional Analysis of Prokaryotic Propionylation. J Chem Inf Model 2017; 57:2896-2904. [DOI: 10.1021/acs.jcim.7b00482] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/22/2022]
Affiliation(s)
- Li-Na Wang
  - College of Chemistry and Institute for Advanced Study, Nanchang University, Nanchang 330031, China
  - Department of Sciences, Nanchang Institute of Technology, Nanchang 330099, China
- Shao-Ping Shi
  - College of Chemistry and Institute for Advanced Study, Nanchang University, Nanchang 330031, China
- Ping-Ping Wen
  - College of Chemistry and Institute for Advanced Study, Nanchang University, Nanchang 330031, China
- Zhi-You Zhou
  - College of Chemistry and Institute for Advanced Study, Nanchang University, Nanchang 330031, China
- Jian-Ding Qiu
  - College of Chemistry and Institute for Advanced Study, Nanchang University, Nanchang 330031, China
  - Department of Materials and Chemical Engineering, Pingxiang University, Pingxiang 337055, China
|
29
|
Hasan MM, Khatun MS, Mollah MNH, Yong C, Guo D. A systematic identification of species-specific protein succinylation sites using joint element features information. Int J Nanomedicine 2017; 12:6303-6315. [PMID: 28894368 PMCID: PMC5584904 DOI: 10.2147/ijn.s140875] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.0] [Indexed: 01/07/2023]
Abstract
Lysine succinylation, an important type of protein posttranslational modification, plays significant roles in many cellular processes. Accurate identification of succinylation sites can facilitate our understanding of the molecular mechanism and potential roles of lysine succinylation. However, even in well-studied systems, a majority of the succinylation sites remain undetected because the traditional experimental approaches to succinylation site identification are often costly, time-consuming, and laborious. An in silico approach, on the other hand, is potentially an alternative strategy to predict succinylation substrates. In this paper, a novel computational predictor, SuccinSite2.0, was developed for predicting generic and species-specific protein succinylation sites. This predictor takes the composition of profile-based amino acid and orthogonal binary features, which were used to train a random forest classifier. We demonstrated that the proposed SuccinSite2.0 predictor outperformed other currently existing implementations on a complementarily independent dataset. Furthermore, the important features that make visible contributions to species-specific and cross-species-specific prediction of protein succinylation sites were analyzed. The proposed predictor is anticipated to be a useful computational resource for lysine succinylation site prediction. The integrated species-specific online tool of SuccinSite2.0 is publicly accessible.
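Orthogonal binary encoding, one of the feature types named in the abstract, is commonly implemented as a one-hot vector per residue in a window centred on the candidate lysine. A minimal sketch follows; the flank size, the `X` padding for windows that run past a terminus, and the function names are assumptions made for illustration, not details of SuccinSite2.0.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def lysine_windows(sequence, flank):
    """Yield (position, window) for every lysine (K) in the sequence.

    Windows that extend past either terminus are padded with 'X'
    (an assumed convention).
    """
    for i, residue in enumerate(sequence):
        if residue != "K":
            continue
        left = sequence[max(0, i - flank):i].rjust(flank, "X")
        right = sequence[i + 1:i + 1 + flank].ljust(flank, "X")
        yield i, left + "K" + right


def orthogonal_binary(window):
    """One-hot encode a window; padding residues become all-zero blocks."""
    vector = []
    for residue in window:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


# Toy sequence (assumed) with two candidate lysines.
windows = list(lysine_windows("MKAK", flank=2))
encoded = [orthogonal_binary(window) for _, window in windows]
```

Each window of length 2 × flank + 1 becomes a vector of 20 × (2 × flank + 1) binary values, which could then be passed to a classifier such as a random forest.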
Affiliation(s)
- Md Mehedi Hasan
  - School of Life Sciences and the State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, New Territory, Hong Kong, People's Republic of China
- Mst Shamima Khatun
  - Laboratory of Bioinformatics, Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh
- Md Nurul Haque Mollah
  - Laboratory of Bioinformatics, Department of Statistics, University of Rajshahi, Rajshahi, Bangladesh
- Cao Yong
  - Department of Mechanical Engineering and Automation, Harbin Institute of Technology, Shenzhen Graduate School, Shenzhen, People's Republic of China
- Dianjing Guo
  - School of Life Sciences and the State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, New Territory, Hong Kong, People's Republic of China
|
30
|
Du Y, Zhai Z, Li Y, Lu M, Cai T, Zhou B, Huang L, Wei T, Li T. Prediction of Protein Lysine Acylation by Integrating Primary Sequence Information with Multiple Functional Features. J Proteome Res 2016; 15:4234-4244. [PMID: 27774790 DOI: 10.1021/acs.jproteome.6b00240] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Indexed: 01/05/2023]
Abstract
Liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based proteomic methods have been widely used to identify lysine acylation proteins. However, these experimental approaches often fail to detect proteins that are in low abundance or absent in specific biological samples. To circumvent these problems, we developed a computational method to predict lysine acylation, including acetylation, malonylation, succinylation, and glutarylation. The prediction algorithm integrated flanking primary sequence determinants and evolutionary conservation of acylated lysine as well as multiple protein functional annotation features including gene ontology, conserved domains, and protein-protein interactions. The inclusion of functional annotation features increases predictive power over simple sequence considerations for four of the acylation species evaluated. For example, the Matthews correlation coefficient (MCC) for the prediction of malonylation increased from 0.26 to 0.73. The performance of prediction was validated against an independent data set for malonylation. Likewise, when tested with independent data sets, the algorithm displayed improved sensitivity and specificity over existing methods. Experimental validation by Western blot experiments and LC-MS/MS detection further attested to the performance of prediction. We then applied our algorithm to the mouse proteome and reported the global-scale prediction of lysine acetylation, malonylation, succinylation, and glutarylation, which should serve as a valuable resource for future functional studies.
Affiliation(s)
- Bo Zhou
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Lei Huang
  - College of Information Science and Engineering, Ocean University of China, Qingdao, China
|
31
|
Xu Y, Ding J, Wu LY. iSulf-Cys: Prediction of S-sulfenylation Sites in Proteins with Physicochemical Properties of Amino Acids. PLoS One 2016; 11:e0154237. [PMID: 27104833 PMCID: PMC4841585 DOI: 10.1371/journal.pone.0154237] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Received: 12/28/2015] [Accepted: 04/10/2016] [Indexed: 02/07/2023]
Abstract
Cysteine S-sulfenylation is an important post-translational modification (PTM) in proteins, and provides redox regulation of protein functions. Bioinformatics and structural analyses indicated that S-sulfenylation could impact many biological and functional categories and had distinct structural features. However, experimental identification of cysteine S-sulfenylation is expensive and low-throughput. In view of this situation, the establishment of a useful computational method and the development of an efficient predictor are highly desired. In this study, a predictor, iSulf-Cys, which incorporates 14 kinds of physicochemical properties of amino acids, was proposed. With 10-fold cross-validation repeated 20 times on the training dataset, the area under the curve (AUC) was 0.7155 ± 0.0085 and the MCC was 0.3122 ± 0.0144. iSulf-Cys also showed satisfactory performance on the independent testing dataset, with an AUC of 0.7343 and an MCC of 0.3315. Features constructed from physicochemical properties and position were carefully analyzed. A user-friendly web server for iSulf-Cys is accessible at http://app.aporc.org/iSulf-Cys/.
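The evaluation protocol in the abstract, 10-fold cross-validation repeated 20 times and reported as mean ± standard deviation, can be sketched as follows. The fold construction, the per-repeat seeds, the choice to take the standard deviation across repeats, and the placeholder scorer `held_out_fraction` are all assumptions for illustration; the abstract does not describe how iSulf-Cys partitioned its data.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed):
    """Yield (train, test) index lists for one shuffled k-fold split."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [
            idx
            for j, fold in enumerate(folds)
            if j != held_out
            for idx in fold
        ]
        yield train, test


def repeated_cv(n_samples, score_fold, k=10, repeats=20):
    """Return (mean, standard deviation) of the per-repeat average score."""
    repeat_means = []
    for repeat in range(repeats):
        fold_scores = [
            score_fold(train, test)
            for train, test in k_fold_indices(n_samples, k, seed=repeat)
        ]
        repeat_means.append(statistics.mean(fold_scores))
    return statistics.mean(repeat_means), statistics.stdev(repeat_means)


# Placeholder scorer (not a real model): fraction of samples held out.
def held_out_fraction(train, test):
    return len(test) / (len(train) + len(test))


mean_score, std_score = repeated_cv(100, held_out_fraction)
```

In practice `score_fold` would train a classifier on the `train` indices and return a metric such as AUC or MCC computed on the `test` indices.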
Affiliation(s)
- Yan Xu
  - Department of Information and Computer Science, University of Science and Technology Beijing, Beijing 100083, China
- Jun Ding
  - Department of Information and Computer Science, University of Science and Technology Beijing, Beijing 100083, China
- Ling-Yun Wu
  - Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, China
|
32
|
Jia J, Liu Z, Xiao X, Liu B, Chou KC. pSuc-Lys: Predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach. J Theor Biol 2016; 394:223-230. [DOI: 10.1016/j.jtbi.2016.01.020] [Citation(s) in RCA: 231] [Impact Index Per Article: 25.7] [Received: 11/20/2015] [Revised: 01/06/2016] [Accepted: 01/07/2016] [Indexed: 10/22/2022]
|