Zeng Y, Liu D, Wang Y. Identification of phosphorylation site using S-padding strategy based convolutional neural network. Health Inf Sci Syst 2022;10:29. [PMID: 36124094 PMCID: PMC9481819 DOI: 10.1007/s13755-022-00196-6]
[Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 07/08/2022] [Accepted: 08/25/2022] [Indexed: 10/14/2022]
Abstract
Purpose
Abnormal phosphorylation has been shown to be associated with a variety of human diseases, and the identification of phosphorylation sites is an active research topic in healthcare. Deep learning approaches to phosphorylation site prediction often introduce many kinds of additional information, and their reliance on complex models limits the scenarios in which they can be used.
Methods
This paper proposes an enhanced deep learning method that combines an S-padding strategy with a convolutional neural network. The S-padding strategy builds a three-dimensional matrix containing extension information from the original amino acid sequences, and a corresponding 2D-CNN model is designed to extract comprehensive features of the phosphorylation site region in protein sequences.
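The abstract does not specify how S-padding is constructed, so the following is only a generic illustration of the preprocessing step that sequence-based site predictors commonly start from: cutting a fixed-length window around a candidate residue and encoding it as a matrix. It is not the S-padding strategy itself. The names `extract_window`, `one_hot`, the pad symbol `X`, and the toy sequence are all hypothetical.

```python
# Generic window extraction and one-hot encoding around a candidate site.
# NOT the paper's S-padding strategy; all names and values are illustrative.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # assumed placeholder for positions beyond the sequence ends


def extract_window(sequence, site, flank):
    """Return the 2*flank+1 residues centered on `site` (0-based), padded with PAD."""
    residues = []
    for pos in range(site - flank, site + flank + 1):
        if 0 <= pos < len(sequence):
            residues.append(sequence[pos])
        else:
            residues.append(PAD)
    return "".join(residues)


def one_hot(window):
    """Encode a window as one 20-dimensional row per residue; PAD rows are all zeros."""
    matrix = []
    for residue in window:
        row = [0] * len(AMINO_ACIDS)
        idx = AMINO_ACIDS.find(residue)
        if idx >= 0:
            row[idx] = 1
        matrix.append(row)
    return matrix


# Toy sequence with a serine at index 2; a flank of 3 gives a 7-residue window.
window = extract_window("MKSPTAY", site=2, flank=3)
encoded = one_hot(window)
```

Stacking several such matrices along a third axis is one way a three-dimensional input for a 2D convolution could be assembled, but how the paper actually forms its matrix is not stated in the abstract.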
Results
Fivefold cross-validation experiments show that, on the human dataset, the proposed method achieves an accuracy of 89.68% on serine/threonine sites and 88.16% on tyrosine sites. Furthermore, phosphorylation site prediction on different organisms yields accuracy, sensitivity, and specificity above 0.85, indicating that the method has potential for the phosphorylation site prediction task. Comparison with existing models shows that the proposed method achieves better accuracy and AUC, and that its performance can improve further with sufficient training data.
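For readers unfamiliar with the reported metrics, the sketch below shows the standard definitions of accuracy, sensitivity, and specificity for a binary classifier, computed from true and predicted labels. The function name `classification_metrics` and the toy labels are assumptions for illustration; they are not data from the paper.

```python
# Standard binary-classification metrics from label lists.
# Labels: 1 = phosphorylation site, 0 = non-site. Toy data only.

def classification_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity as a dict."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total if total else float("nan"),
        # Sensitivity: fraction of real sites that were predicted as sites.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: fraction of non-sites that were predicted as non-sites.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

In a fivefold cross-validation setting, these values would typically be computed on each held-out fold and then averaged across the five folds.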
Conclusion
This method enables proteome-wide predictions via models trained on a large amount of phosphorylation data, further exploiting the potential of protein phosphorylation site identification, and helping to provide insights into phosphorylation mechanisms.