1
Ahmed S, Rahman A, Hasan MAM, Rahman J, Islam MKB, Ahmad S. predML-Site: Predicting Multiple Lysine PTM Sites With Optimal Feature Representation and Data Imbalance Minimization. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3624-3634. [PMID: 34546927 DOI: 10.1109/tcbb.2021.3114349] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Indexed: 06/13/2023]
Abstract
Identifying post-translational modifications (PTMs) is crucial in the study of computational proteomics, cell biology, pathogenesis, and drug development due to their role in many bio-molecular mechanisms. Computational methods for predicting multiple PTMs at the same lysine residue, often referred to as K-PTM, are still evolving. This paper presents a novel computational tool, abbreviated as predML-Site, for predicting K-PTMs, such as acetylation, crotonylation, methylation, and succinylation, from an uncategorized peptide sample involving single, multiple, or no modification. For informative feature representation, multiple sequence encoding schemes, such as sequence-coupling, binary encoding, k-spaced amino acid pairs, and amino acid factor, have been used with ANOVA and incremental feature selection. As a core predictor, a cost-sensitive SVM classifier has been adopted, which effectively mitigates the effect of class-label imbalance in the dataset. predML-Site predicts multi-label PTM sites with 84.18% accuracy using the top 91 features. It has also achieved an 85.34% aiming rate and an 86.58% coverage rate, which are much better than those of the existing state-of-the-art predictors on the same rigorous validation test. This performance indicates that predML-Site can be used as a supportive tool for further K-PTM study. For the convenience of experimental scientists, predML-Site has been deployed as a user-friendly web-server at http://103.99.176.239/predML-Site.
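The aiming rate, coverage rate, and accuracy reported above are multi-label metrics. The abstract does not spell out their formulas, so the sketch below uses the definitions that are common in the multi-label PTM literature (per-sample overlap divided by the predicted set, the true set, and their union, respectively). The function name, the toy label sets, and the handling of samples with no modification are assumptions made for illustration.

```python
from typing import Dict, List, Set


def multilabel_rates(true_sets: List[Set[str]],
                     pred_sets: List[Set[str]]) -> Dict[str, float]:
    """Average aiming, coverage, and accuracy over all samples."""
    if len(true_sets) != len(pred_sets) or not true_sets:
        raise ValueError("need two non-empty lists of equal length")
    aiming = coverage = accuracy = 0.0
    for y, z in zip(true_sets, pred_sets):
        if not y and not z:
            # Assumption: correctly predicting "no modification" counts as
            # a perfect score on all three metrics.
            aiming += 1.0
            coverage += 1.0
            accuracy += 1.0
            continue
        hit = len(y & z)                         # labels predicted correctly
        aiming += hit / len(z) if z else 0.0     # share of predictions that are right
        coverage += hit / len(y) if y else 0.0   # share of true labels recovered
        accuracy += hit / len(y | z)             # overlap over union
    n = len(true_sets)
    return {"aiming": aiming / n, "coverage": coverage / n, "accuracy": accuracy / n}


# Toy example with made-up labels (not data from the paper).
true_sets = [{"acetylation", "methylation"}, {"succinylation"}, set()]
pred_sets = [{"acetylation"}, {"succinylation", "crotonylation"}, set()]
rates = multilabel_rates(true_sets, pred_sets)
print(rates)
```

In this toy case the first sample misses one true label (lower coverage) and the second adds one wrong label (lower aiming), which is why the two rates are reported separately.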
2
Rahman A, Ahmed S, Al Mehedi Hasan M, Ahmad S, Dehzangi I. Accurately predicting nitrosylated tyrosine sites using probabilistic sequence information. Gene 2022; 826:146445. [PMID: 35358650 DOI: 10.1016/j.gene.2022.146445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2021] [Revised: 02/16/2022] [Accepted: 03/18/2022] [Indexed: 11/04/2022]
Abstract
Post-translational modification (PTM) is defined as the enzymatic change of proteins after the translation process in protein biosynthesis. Nitrotyrosine, one of the most important modifications of proteins, is mediated by active nitrogen molecules. It is known to be associated with different diseases, including autoimmune diseases characterized by chronic inflammation and cell damage. Currently, nitrotyrosine sites are identified using experimental approaches, which are laborious and costly. In this study, we propose a new machine learning method called PredNitro to accurately predict nitrotyrosine sites. To build PredNitro, we use sequence coupling information from the neighboring amino acids of tyrosine residues along with a support vector machine as our classification technique. Our results demonstrate that PredNitro achieves 98.0% accuracy with more than 0.96 MCC and 0.99 AUC in both 5-fold cross-validation and jackknife cross-validation tests, which are significantly better than those reported in previous studies. PredNitro is publicly available as an online predictor at: http://103.99.176.239/PredNitro.
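For readers unfamiliar with the Matthews correlation coefficient (MCC) quoted alongside accuracy, the following minimal sketch computes both from a binary confusion matrix using the standard formulas. The function name and the counts are hypothetical and are not taken from the paper.

```python
import math
from typing import Tuple


def accuracy_and_mcc(tp: int, tn: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (accuracy, MCC) for binary confusion-matrix counts."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return accuracy, mcc


# Made-up counts for illustration: 50 positives and 50 negatives,
# with one error in each class.
acc, mcc = accuracy_and_mcc(tp=49, tn=49, fp=1, fn=1)
print(f"accuracy={acc:.2f}, MCC={mcc:.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when positive and negative sites are imbalanced.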
Affiliation(s)
- Afrida Rahman
  - Department of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Sabit Ahmed
  - Department of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Md Al Mehedi Hasan
  - Department of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Shamim Ahmad
  - Department of Computer Science and Engineering, University of Rajshahi, Rajshahi, Bangladesh
- Iman Dehzangi
  - Department of Computer Science, Rutgers University, Camden, NJ 08102, USA; Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA.
3
Zhou H, Wang H, Ding Y, Tang J. Multivariate Information Fusion for Identifying Antifungal Peptides with Hilbert-Schmidt Independence Criterion. Curr Bioinform 2022. [DOI: 10.2174/1574893616666210727161003] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 11/22/2022]
Abstract
Background:
Antifungal peptides (AFPs) have been found to be effective against many fungal infections.
Objective:
However, AFPs are difficult to identify. It is therefore of great practical significance to identify AFPs via machine learning methods (with sequence information).
Method:
In this study, a Multi-Kernel Support Vector Machine (MKSVM) with the Hilbert-Schmidt Independence Criterion (HSIC) is proposed. Proteins are encoded with five types of features (188-bit, AAC, ASDC, CKSAAP, DPC), and kernels are then constructed using the Gaussian kernel function. HSIC is used to combine the kernels, and a multi-kernel SVM model is built.
Results:
Our model performed well on three AFP datasets, and its performance is better than or comparable to that of other state-of-the-art predictive models.
Conclusion:
Our method will be a useful tool for identifying antifungal peptides.
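The abstract does not say how HSIC is used to weight the kernels. As a rough illustration only, the sketch below builds Gaussian kernels for two feature views, scores each one by its biased empirical HSIC against a label kernel, and normalizes the scores into combination weights. The weighting rule, the function names, the `gamma` value, and the toy data are all assumptions, not the authors' optimization.

```python
import math
from typing import List

Matrix = List[List[float]]


def gaussian_kernel(X: Matrix, gamma: float) -> Matrix:
    """K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(xi, xj)))
             for xj in X] for xi in X]


def center(K: Matrix) -> Matrix:
    """Double-center a symmetric kernel matrix, i.e. compute H K H."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]


def hsic(K: Matrix, L: Matrix) -> float:
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(K)
    Kc, Lc = center(K), center(L)
    return sum(Kc[i][j] * Lc[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


def hsic_weights(kernels: List[Matrix], y: List[int]) -> List[float]:
    """Assumed rule: weight each kernel by its HSIC with the label kernel y y^T."""
    L = [[float(a * b) for b in y] for a in y]
    scores = [max(hsic(K, L), 0.0) for K in kernels]
    s = sum(scores)
    return [v / s for v in scores] if s > 0 else [1.0 / len(kernels)] * len(kernels)


# Toy data: view 1 separates the two classes, view 2 does not.
y = [-1, -1, 1, 1]
view1 = [[0.0], [0.1], [1.0], [1.1]]
view2 = [[0.0], [1.0], [0.1], [1.1]]
kernels = [gaussian_kernel(view1, gamma=1.0), gaussian_kernel(view2, gamma=1.0)]
weights = hsic_weights(kernels, y)
n = len(y)
combined = [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]
print(weights)
```

The combined matrix could then be passed to any SVM implementation that accepts a precomputed kernel.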
Affiliation(s)
- Haohao Zhou
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, 300354, China
- Hao Wang
  - School of Computer Science and Technology, College of Intelligence and Computing, Tianjin University, Tianjin, 300354, China
- Yijie Ding
  - School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou, 215009, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
- Jijun Tang
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, USA
  - Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
4
Ahmed S, Rahman A, Hasan MAM, Ahmad S, Shovan SM. Computational identification of multiple lysine PTM sites by analyzing the instance hardness and feature importance. Sci Rep 2021; 11:18882. [PMID: 34556767 PMCID: PMC8460736 DOI: 10.1038/s41598-021-98458-y] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 06/29/2021] [Accepted: 09/08/2021] [Indexed: 02/08/2023]
Abstract
Identification of post-translational modifications (PTMs) is significant in the study of computational proteomics, cell biology, pathogenesis, and drug development due to their role in many bio-molecular mechanisms. Though there are several computational tools to identify individual PTMs, only three predictors have been established to predict multiple PTMs at the same lysine residue. Furthermore, detailed analysis and assessment of dataset balancing and of the significance of different feature encoding techniques for a suitable multi-PTM prediction model are still lacking. This study introduces a computational method named 'iMul-kSite' for predicting acetylation, crotonylation, methylation, succinylation, and glutarylation from an unrecognized peptide sample with one, multiple, or no modifications. After successfully eliminating the redundant data samples from the majority class by analyzing the hardness of the sequence-coupling information, feature representation has been optimized by combining the ANOVA F-test with an incremental feature selection approach. The proposed predictor predicts multi-label PTM sites with 92.83% accuracy using the top 100 features. It has also achieved a 93.36% aiming rate and a 96.23% coverage rate, which are much better than those of the existing state-of-the-art predictors on the validation test. This performance indicates that 'iMul-kSite' can be used as a supportive tool for further K-PTM study. For the convenience of experimental scientists, 'iMul-kSite' has been deployed as a user-friendly web-server at http://103.99.176.239/iMul-kSite.
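The combination of ANOVA F-test ranking and incremental feature selection mentioned above can be sketched as follows. This is a generic illustration, not the authors' code: the one-way F-statistic is the standard formula, while the function names, the `evaluate` callback (a stand-in for cross-validated model performance), and the toy data are hypothetical.

```python
from typing import Callable, List, Sequence


def anova_f(values: Sequence[float], labels: Sequence[int]) -> float:
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    n, g = len(values), len(groups)
    grand = sum(values) / n
    means = {c: sum(vs) / len(vs) for c, vs in groups.items()}
    between = sum(len(vs) * (means[c] - grand) ** 2 for c, vs in groups.items())
    within = sum((v - means[c]) ** 2 for c, vs in groups.items() for v in vs)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return (between / (g - 1)) / (within / (n - g))


def rank_features(X: List[List[float]], y: Sequence[int]) -> List[int]:
    """Feature indices sorted from most to least discriminative."""
    scores = [anova_f([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def incremental_selection(ranked: List[int],
                          evaluate: Callable[[List[int]], float]) -> List[int]:
    """Grow the feature set in ranked order and keep the best-scoring prefix."""
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        score = evaluate(ranked[:k])
        if score > best_score:
            best_k, best_score = k, score
    return ranked[:best_k]


# Toy data: feature 0 separates the classes, feature 1 is mostly noise.
X = [[1.0, 5.0], [1.2, 3.0], [3.0, 4.0], [3.2, 4.2]]
y = [0, 0, 1, 1]
ranked = rank_features(X, y)
# Hypothetical evaluator that penalizes every feature beyond the first.
selected = incremental_selection(ranked, evaluate=lambda subset: 1.0 - 0.1 * (len(subset) - 1))
print(ranked, selected)
```

In practice `evaluate` would train and validate the classifier on the chosen subset, which is how a cut-off such as "top 100 features" would be found.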
Affiliation(s)
- Sabit Ahmed
  - Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, 6204 Bangladesh (grid.443086.d, 0000 0004 1755 355X)
- Afrida Rahman
  - Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, 6204 Bangladesh (grid.443086.d, 0000 0004 1755 355X)
- Md. Al Mehedi Hasan
  - Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, 6204 Bangladesh (grid.443086.d, 0000 0004 1755 355X)
- Shamim Ahmad
  - Computer Science and Engineering, University of Rajshahi, Rajshahi, 6205 Bangladesh (grid.412656.2, 0000 0004 0451 7306)
- S. M. Shovan
  - Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, 6204 Bangladesh (grid.443086.d, 0000 0004 1755 355X)
5
Islam MKB, Rahman J, Hasan MAM, Ahmad S. predForm-Site: Formylation site prediction by incorporating multiple features and resolving data imbalance. Comput Biol Chem 2021; 94:107553. [PMID: 34384997 DOI: 10.1016/j.compbiolchem.2021.107553] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2020] [Revised: 06/22/2021] [Accepted: 07/28/2021] [Indexed: 10/20/2022]
Abstract
Formylation is one of the newly discovered post-translational modifications of lysine residues and is responsible for different kinds of diseases. In this work, a novel predictor, named predForm-Site, has been developed to predict formylation sites with higher accuracy. We have integrated multiple sequence features to develop a more informative representation of formylation sites. Moreover, the decision function of the underlying classifier has been optimized on the skewed formylation dataset during prediction model training to improve prediction quality. On the dataset used by the LFPred and Formator predictors, predForm-Site achieved 99.5% sensitivity, 99.8% specificity, and 99.8% overall accuracy with an AUC of 0.999 in the jackknife test. In the independent test, it has also achieved more than 97% sensitivity and 99% specificity. Similarly, in benchmarking against the recent method CKSAAP_FormSite, the proposed predictor performed significantly better on all measures, particularly sensitivity by around 20%, specificity by nearly 30%, and overall accuracy by more than 22%. These experimental results show that the proposed predForm-Site can be used as a complementary tool for the fast exploration of formylation sites. For the convenience of the scientific community, predForm-Site has been deployed as an online tool, accessible at http://103.99.176.239:8080/predForm-Site.
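The abstract does not describe how the decision function was tuned for the skewed data. One common approach, shown here purely as an assumed illustration, is to move the decision threshold on the classifier's scores so that sensitivity and specificity are balanced. The function names and the scores are hypothetical.

```python
from typing import List, Tuple


def sens_spec(scores: List[float], labels: List[int], threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity when predicting positive for score >= threshold."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= threshold)
    fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < threshold)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < threshold)
    fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= threshold)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec


def best_threshold(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, balanced accuracy) maximizing mean(sensitivity, specificity)."""
    best_t, best_bal = 0.0, -1.0
    for t in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, t)
        bal = (sens + spec) / 2
        if bal > best_bal:
            best_t, best_bal = t, bal
    return best_t, best_bal


# Made-up decision scores from a classifier biased toward the majority (negative) class.
scores = [-1.2, -0.8, -0.5, -0.3, -0.1, 0.4]
labels = [0, 0, 0, 0, 1, 1]
default = sens_spec(scores, labels, threshold=0.0)
tuned = best_threshold(scores, labels)
print(default, tuned)
```

With the default threshold of 0 one of the two positives is missed, while the shifted threshold recovers it without adding false positives in this toy case.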
Affiliation(s)
- Md Khaled Ben Islam
  - Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia; Department of Computer Science & Engineering, Pabna University of Science and Technology, Pabna, Bangladesh.
- Julia Rahman
  - Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia; Department of Computer Science & Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh.
- Md Al Mehedi Hasan
  - Department of Computer Science & Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Shamim Ahmad
  - Department of Computer Science & Engineering, Rajshahi University, Rajshahi, Bangladesh
6
Ahmed S, Rahman A, Hasan MAM, Islam MKB, Rahman J, Ahmad S. predPhogly-Site: Predicting phosphoglycerylation sites by incorporating probabilistic sequence-coupling information into PseAAC and addressing data imbalance. PLoS One 2021; 16:e0249396. [PMID: 33793659 PMCID: PMC8016359 DOI: 10.1371/journal.pone.0249396] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/01/2020] [Accepted: 03/18/2021] [Indexed: 12/14/2022]
Abstract
Post-translational modification (PTM) involves covalent modification after the biosynthesis process and plays an essential role in the study of cell biology. Lysine phosphoglycerylation is a newly discovered reversible type of PTM that affects glycolytic enzyme activities and is responsible for a wide variety of diseases, such as heart failure, arthritis, and degeneration of the nervous system. Our goal is to computationally characterize potential phosphoglycerylation sites to understand their functionality and causality more accurately. In this study, a novel computational tool, referred to as predPhogly-Site, has been developed to predict phosphoglycerylation sites in proteins. It effectively utilizes the probabilistic sequence-coupling information among the amino acid residues near phosphoglycerylation sites, along with a variable cost adjustment for the skewed training dataset, to enhance the prediction characteristics. It has achieved around 99% accuracy with more than 0.96 MCC and 0.97 AUC in both 10-fold cross-validation and an independent test. Even the standard deviation in 10-fold cross-validation is almost negligible. This performance indicates that predPhogly-Site remarkably outperformed the existing prediction tools and can be used as a promising predictor, preferably with its web interface at http://103.99.176.239/predPhogly-Site.
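The "variable cost adjustment" is not specified further in the abstract. A widely used heuristic for cost-sensitive training on skewed data, assumed here for illustration, sets each class's misclassification cost inversely proportional to its frequency. The function name and the 90:10 split are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight for class c = n_samples / (n_classes * count(c))."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}


# Made-up skewed dataset: 10 modified sites (1) versus 90 unmodified sites (0).
labels = [1] * 10 + [0] * 90
weights = balanced_class_weights(labels)
print(weights)
```

Weights of this kind are typically passed to the classifier as per-class penalty multipliers, so errors on the rare positive class cost more than errors on the majority class.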
Affiliation(s)
- Sabit Ahmed
  - Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Afrida Rahman
  - Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Md. Al Mehedi Hasan
  - Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Md Khaled Ben Islam
  - Computer Science and Engineering, Pabna University of Science and Technology, Pabna, Bangladesh
- Julia Rahman
  - Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
- Shamim Ahmad
  - Computer Science and Engineering, University of Rajshahi, Rajshahi, Bangladesh
7
Wu M, Lu P, Yang Y, Liu L, Wang H, Xu Y, Chu J. LipoSVM: Prediction of Lysine Lipoylation in Proteins based on the Support Vector Machine. Curr Genomics 2019; 20:362-370. [PMID: 32476993 PMCID: PMC7235397 DOI: 10.2174/1389202919666191014092843] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/25/2019] [Revised: 08/09/2019] [Accepted: 09/05/2019] [Indexed: 12/21/2022]
Abstract
Background: Lysine lipoylation, a rare and highly conserved post-translational modification of proteins, has been considered one of the most important processes in the biological field. To obtain a comprehensive understanding of the regulatory mechanism of lysine lipoylation, the key is to identify lysine lipoylated sites. Because experimental methods are expensive, laborious, and complex, it is urgent to develop computational ways to predict lipoylation sites. Methodology: In this work, a predictor named LipoSVM is developed to accurately predict lipoylation sites. To overcome the problem of unbalanced samples, the synthetic minority over-sampling technique (SMOTE) is utilized to balance negative and positive samples. Furthermore, different ratios of positive and negative samples are chosen as training sets. Results: By comparing five different encoding schemes and five classification algorithms, LipoSVM is finally constructed using a training set with a positive-to-negative sample ratio of 1:1, combined with a position-specific scoring matrix and a support vector machine. The best performance achieves an accuracy of 99.98% and an AUC of 0.9996 in 10-fold cross-validation. The AUC on the independent test set reaches 0.9997, which demonstrates the robustness of LipoSVM. The analysis of lysine lipoylation and non-lipoylation fragments shows significant statistical differences. Conclusion: A good predictor for lysine lipoylation is built based on a position-specific scoring matrix and a support vector machine. Meanwhile, the LipoSVM webserver can be freely downloaded from https://github.com/stars20180811/LipoSVM.
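To make the SMOTE step concrete, here is a minimal sketch of its core idea: each synthetic minority sample is a random interpolation between a real minority sample and one of its nearest minority neighbors. This is a simplified illustration rather than the implementation used for LipoSVM; the function name, the choice of `k`, and the toy points are assumptions.

```python
import math
import random
from typing import List


def smote(minority: List[List[float]], n_new: int, k: int = 2, seed: int = 0) -> List[List[float]]:
    """Generate n_new synthetic points by interpolating between minority neighbors."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority samples to the chosen base point (excluding itself).
        others = [p for p in minority if p is not base]
        neighbors = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment base -> neighbor
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbor)])
    return synthetic


# Toy minority class with three samples (made-up feature vectors).
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority, n_new=5)
print(new_points)
```

Generating enough synthetic positives to match the number of negatives yields the 1:1 ratio described above. Oversampling should be applied only to the training folds so that evaluation data stays free of synthetic samples.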
Affiliation(s)
- Meiqi Wu
  - Department of Applied Mathematics, University of Science and Technology Beijing, Beijing 100083, China
- Pengchao Lu
  - Equipment Leasing Company of China Petroleum Pipeline Engineering Co., Ltd. 065000 Langfang City, Hebei Province, China
- Yingxi Yang
  - Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, Hong Kong, China
- Liwen Liu
  - Department of Applied Mathematics, University of Science and Technology Beijing, Beijing 100083, China
- Hui Wang
  - Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
- Yan Xu
  - Department of Applied Mathematics, University of Science and Technology Beijing, Beijing 100083, China
- Jixun Chu
  - Department of Applied Mathematics, University of Science and Technology Beijing, Beijing 100083, China
8
Yang Y, Wang H, Ding J, Xu Y. iAcet-Sumo: Identification of lysine acetylation and sumoylation sites in proteins by multi-class transformation methods. Comput Biol Med 2018; 100:144-151. [DOI: 10.1016/j.compbiomed.2018.07.006] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 05/07/2018] [Revised: 06/30/2018] [Accepted: 07/08/2018] [Indexed: 11/16/2022]