1
Liang Y, Li M. A deep learning model for prediction of lysine crotonylation sites by fusing multi-features based on multi-head self-attention mechanism. Sci Rep 2025; 15:18940. [PMID: 40442183 PMCID: PMC12122789 DOI: 10.1038/s41598-025-04058-5] [Received: 01/08/2025] [Accepted: 05/23/2025] [Indexed: 06/02/2025]
Abstract
Lysine crotonylation (Kcr) is an important post-translational modification that is present in both histone and non-histone proteins and plays a key role in a variety of biological processes such as metabolism and cell differentiation. Therefore, rapid and accurate identification of this modification has become a key task in studying its biological effects. In the past few years, several computational methods have been developed, but there is room for improvement in prediction performance. In this paper, we propose an effective model named DeepMM-Kcr, which is based on multiple features and an innovative deep learning framework. The features comprise natural language processing features and hand-crafted features: the natural language processing features include token embedding and positional embedding encoded by a transformer, and the hand-crafted features include one-hot, amino acid index and position-weighted amino acid composition, which are encoded by a bidirectional long short-term memory network. The natural language processing features and hand-crafted features are then fused by a multi-head self-attention mechanism. Finally, a deep learning framework is constructed based on a convolutional neural network, a bidirectional gated recurrent unit and a multilayer perceptron for robust prediction of Kcr sites. On the independent test set, the accuracy of DeepMM-Kcr is the highest among existing models. The experimental results show that our model has very good performance in predicting Kcr sites. The source datasets and codes (in Python) are publicly available at https://github.com/yunyunliang88/DeepMM-Kcr.
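The one-hot component of the hand-crafted features can be illustrated with a minimal sketch. It assumes fixed-length windows centered on each lysine and a padding symbol `X` at the sequence termini; the window size, the padding convention, and the function names `lysine_windows` and `one_hot` are illustrative assumptions, not details taken from DeepMM-Kcr.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = "X"  # hypothetical padding symbol for positions beyond the termini


def lysine_windows(sequence, flank=3):
    """Yield (position, window) for every lysine (K) in the sequence.

    Each window has length 2 * flank + 1 with the lysine at its center.
    """
    padded = PAD * flank + sequence + PAD * flank
    for i, residue in enumerate(sequence):
        if residue == "K":
            yield i, padded[i:i + 2 * flank + 1]


def one_hot(window):
    """Encode each residue as a 20-dimensional indicator vector.

    Residues outside AMINO_ACIDS (including PAD) map to all zeros.
    """
    return [[1 if aa == residue else 0 for aa in AMINO_ACIDS]
            for residue in window]


if __name__ == "__main__":
    for pos, window in lysine_windows("MKTAYK", flank=2):
        print(pos, window, len(one_hot(window)))
```

Each window becomes a (2 * flank + 1) x 20 matrix, which is the kind of input a recurrent or convolutional encoder could consume.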
Affiliation(s)
- Yunyun Liang
- School of Science, Xi'an Polytechnic University, Xi'an, 710048, People's Republic of China.
- Minwei Li
- School of Science, Xi'an Polytechnic University, Xi'an, 710048, People's Republic of China
2
Wang Y, Wang B, Zou J, Wu A, Liu Y, Wan Y, Luo J, Wu J. Capsule neural network and its applications in drug discovery. iScience 2025; 28:112217. [PMID: 40241764 PMCID: PMC12002614 DOI: 10.1016/j.isci.2025.112217] [Indexed: 04/18/2025]
Abstract
Deep learning holds great promise in drug discovery, yet its application is hindered by high labeling costs and limited datasets. Developing algorithms that effectively learn from sparsely labeled data is crucial. Capsule networks (CapsNet), introduced in 2017, solve the spatial information loss in traditional neural networks and excel in handling small datasets by capturing spatial hierarchical relationships among features. This capability makes CapsNet particularly promising for drug discovery, where data scarcity is a common challenge. Various modified CapsNet architectures have been successfully applied to drug design and discovery tasks. This review provides a comprehensive analysis of CapsNet's theoretical foundations, its current applications in drug discovery, and its performance in addressing key challenges in the field. Additionally, the study highlights the limitations of CapsNet and outlines potential future research directions to further enhance its utility in drug discovery, offering valuable insights for researchers in both computational and pharmaceutical sciences.
Affiliation(s)
- Yiwei Wang
- School of Basic Medical Sciences, Southwest Medical University, Luzhou 646000, China
- Key Laboratory of Medical Electrophysiology, Ministry of Education & Medical Electrophysiological Key Laboratory of Sichuan Province, Institute of Cardiovascular Research, Southwest Medical University, Luzhou 646000, China
- Binyou Wang
- School of Basic Medical Sciences, Southwest Medical University, Luzhou 646000, China
- Jun Zou
- State Key Laboratory of Biotherapy and Cancer Center, West China Hospital, Sichuan University, Chengdu 610041, China
- Anguo Wu
- Sichuan Key Medical Laboratory of New Drug Discovery and Druggability Evaluation, Luzhou Key Laboratory of Activity Screening and Druggability Evaluation for Chinese Materia Medica, School of Pharmacy, Southwest Medical University, Luzhou 646000, China
- Yuan Liu
- School of Basic Medical Sciences, Southwest Medical University, Luzhou 646000, China
- Ying Wan
- School of Basic Medical Sciences, Southwest Medical University, Luzhou 646000, China
- Jiesi Luo
- School of Basic Medical Sciences, Southwest Medical University, Luzhou 646000, China
- Jianming Wu
- School of Basic Medical Sciences, Southwest Medical University, Luzhou 646000, China
- Key Laboratory of Medical Electrophysiology, Ministry of Education & Medical Electrophysiological Key Laboratory of Sichuan Province, Institute of Cardiovascular Research, Southwest Medical University, Luzhou 646000, China
- Sichuan Key Medical Laboratory of New Drug Discovery and Druggability Evaluation, Luzhou Key Laboratory of Activity Screening and Druggability Evaluation for Chinese Materia Medica, School of Pharmacy, Southwest Medical University, Luzhou 646000, China
3
Akbar S, Raza A, Awan HH, Zou Q, Alghamdi W, Saeed A. pNPs-CapsNet: Predicting Neuropeptides Using Protein Language Models and FastText Encoding-Based Weighted Multi-View Feature Integration with Deep Capsule Neural Network. ACS Omega 2025; 10:12403-12416. [PMID: 40191328 PMCID: PMC11966582 DOI: 10.1021/acsomega.4c11449] [Received: 12/24/2024] [Revised: 02/04/2025] [Accepted: 03/07/2025] [Indexed: 04/09/2025]
Abstract
Neuropeptides (NPs) are critical signaling molecules that are essential in numerous physiological processes and possess significant therapeutic potential. Computational prediction of NPs has emerged as a promising alternative to traditional experimental methods, often labor-intensive, time-consuming, and expensive. Recent advancements in computational peptide models provide a cost-effective approach to identifying NPs, characterized by high selectivity toward target cells and minimal side effects. In this study, we propose a novel deep capsule neural network-based computational model, namely pNPs-CapsNet, to predict NPs and non-NPs accurately. Input samples are numerically encoded using pretrained protein language models, including ESM, ProtBERT-BFD, and ProtT5, to extract attention mechanism-based contextual and semantic features. A differential evolution-based weighted feature integration method is utilized to construct a multiview vector. Additionally, a two-tier feature selection strategy, comprising MRMD and SHAP analysis, is developed to identify and select optimal features. Finally, the novel capsule neural network (CapsNet) is trained using the selected optimal feature set. The proposed pNPs-CapsNet model achieved a remarkable predictive accuracy of 98.10% and an AUC of 0.98. To validate the generalization capability of the pNPs-CapsNet model, independent samples reported an accuracy of 95.21% and an AUC of 0.96. The pNPs-CapsNet model outperforms existing state-of-the-art models, demonstrating 4% and 2.5% improved predictive accuracy for training and independent data sets, respectively. The demonstrated efficacy and consistency of pNPs-CapsNet underline its potential as a valuable and robust tool for advancing drug discovery and academic research.
Affiliation(s)
- Shahid Akbar
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan 23200, Khyber Pakhtunkhwa, Pakistan
- Ali Raza
- Department of Computer Science, Bahria University, Islamabad 44220, Pakistan
- Hamid Hussain Awan
- Department of Computer Science, Rawalpindi Women University, Rawalpindi 46300, Punjab, Pakistan
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, PR China
- Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Aamir Saeed
- Department of Computer Science and IT, University of Engineering and Technology, Jalozai Campus, Peshawar 25000, Pakistan
4
Yao L, Xie P, Dong D, Guo Y, Guan J, Zhang W, Chung CR, Zhao Z, Chiang YC, Lee TY. Caps-ac4C: An effective computational framework for identifying N4-acetylcytidine sites in human mRNA based on deep learning. J Mol Biol 2025; 437:168961. [PMID: 39884569 DOI: 10.1016/j.jmb.2025.168961] [Received: 09/14/2024] [Revised: 01/20/2025] [Accepted: 01/21/2025] [Indexed: 02/01/2025]
Abstract
N4-acetylcytidine (ac4C) is a crucial post-transcriptional modification in human mRNA, involving the acetylation of the nitrogen atom at the fourth position of cytidine. This modification, catalyzed by N-acetyltransferases such as NAT10, is primarily found in mRNA's coding regions and enhances translation efficiency and mRNA stability. ac4C is closely associated with various diseases, including cancer. Therefore, accurately identifying ac4C in human mRNA is essential for gaining deeper insights into disease pathogenesis and provides potential pathways for the development of novel medical interventions. In silico methods for identifying ac4C are gaining increasing attention due to their cost-effectiveness, requiring minimal human and material resources. In this study, we propose an efficient and accurate computational framework, Caps-ac4C, for the precise detection of ac4C in human mRNA. Caps-ac4C utilizes chaos game representation to encode RNA sequences into "images" and employs capsule networks to learn global and local features from these RNA "images". Experimental results demonstrate that Caps-ac4C achieves state-of-the-art performance, with 95.47% accuracy and 0.912 MCC on the test set, surpassing the current best methods by 10.69% accuracy and 0.216 MCC. In summary, Caps-ac4C represents the most accurate tool for predicting ac4C sites in human mRNA, highlighting its significant contribution to RNA modification research. For user convenience, we developed a user-friendly web server, which can be accessed for free at: https://awi.cuhk.edu.cn/~Caps-ac4C/index.php.
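The idea of turning an RNA sequence into an "image" via chaos game representation can be sketched as follows. This is a generic illustration under stated assumptions: the corner assignment for A, C, G and U, the grid size, and the function names `cgr_points` and `cgr_image` are choices made here and are not confirmed as the exact variant used by Caps-ac4C.

```python
# Assumed corner layout of the unit square; conventions differ between papers.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "U": (1.0, 0.0)}


def cgr_points(rna):
    """Return the chaos game trajectory for an RNA string.

    Start at the center of the unit square and, for each base, move halfway
    toward that base's corner. Raises KeyError for symbols other than A/C/G/U.
    """
    x, y = 0.5, 0.5
    points = []
    for base in rna:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def cgr_image(rna, size=8):
    """Bin the trajectory into a size x size grid of visit counts."""
    image = [[0] * size for _ in range(size)]
    for x, y in cgr_points(rna):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        image[row][col] += 1
    return image


if __name__ == "__main__":
    for row in cgr_image("ACGUACGUGGCA", size=4):
        print(row)
```

The resulting count grid is a single-channel image whose cells correspond to short sequence suffixes, which is why image models such as capsule networks can be applied to it.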
Affiliation(s)
- Lantian Yao
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China; School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China.
- Peilin Xie
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China; School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Danhong Dong
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Yilin Guo
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Jiahui Guan
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China; School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Wenyang Zhang
- School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Chia-Ru Chung
- Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
- Zhihao Zhao
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China
- Ying-Chih Chiang
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China; School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, 518172 Shenzhen, China.
- Tzong-Yi Lee
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu, Taiwan.
5
Raju C, Sankaranarayanan K. Insights on post-translational modifications in fatty liver and fibrosis progression. Biochim Biophys Acta Mol Basis Dis 2025; 1871:167659. [PMID: 39788217 DOI: 10.1016/j.bbadis.2025.167659] [Received: 09/06/2024] [Revised: 12/20/2024] [Accepted: 01/02/2025] [Indexed: 01/12/2025]
Abstract
Metabolic dysfunction-associated steatotic liver disease [MASLD] is a pervasive multifactorial health burden. Post-translational modifications [PTMs] of amino acid residues in protein domains play pivotal roles in imparting dynamic alterations in the cellular micro milieu. The crux of identifying novel druggable targets relies on comprehensively studying the etiology of metabolic disorders. This review article presents how the different chemical moieties of various PTMs, such as phosphorylation, methylation, ubiquitination, glutathionylation, neddylation, acetylation, SUMOylation, lactylation, crotonylation, hydroxylation, glycosylation, citrullination, S-sulfhydration and succinylation, contribute as cause and effect to the MASLD spectrum. Additionally, the therapeutic prospects in the management of liver steatosis and hepatic fibrosis via targeting PTMs and regulatory enzymes are also encapsulated. This review seeks to understand the function of protein modifications in disease progression and to promote the discovery of diagnostic and prognostic markers and drug targets for MASLD management, which could also halt the progression of a catalogue of related diseases.
Affiliation(s)
- Chithra Raju
- Ion Channel Biology Laboratory, AU-KBC Research Centre, Madras Institute of Technology Campus, Anna University, Chrompet, Chennai 600 044, Tamil Nadu, India
- Kavitha Sankaranarayanan
- Ion Channel Biology Laboratory, AU-KBC Research Centre, Madras Institute of Technology Campus, Anna University, Chrompet, Chennai 600 044, Tamil Nadu, India.
6
Liang Y, Ma X, Li J, Zhang S. iACVP-MR: Accurate Identification of Anti-coronavirus Peptide based on Multiple Features Information and Recurrent Neural Network. Curr Med Chem 2025; 32:2055-2067. [PMID: 38549527 DOI: 10.2174/0109298673277663240101111507] [Received: 09/30/2023] [Revised: 11/26/2023] [Accepted: 11/30/2023] [Indexed: 05/14/2024]
Abstract
BACKGROUND Over the years, viruses have caused human illness and threatened human health. Therefore, it is pressing to develop anti-coronavirus infection drugs with clear function, low cost, and high safety. Anti-coronavirus peptide (ACVP) is a key therapeutic agent against coronavirus. Traditional methods for finding ACVP need a great deal of money and manpower. Hence, it is a significant task to establish intelligent computational tools that enable rapid, efficient and accurate identification of ACVP. METHODS In this paper, we construct a model named iACVP-MR to identify ACVP based on multiple features and recurrent neural networks. Multiple features are extracted by using reduced amino acid component and dipeptide component, compositions of k-spaced amino acid pairs, BLOSUM62 encoder according to the N5C5 sequence, as well as a second-order moving average approach based on 16 physicochemical properties. Then, two recurrent neural networks, namely long short-term memory (LSTM) and bidirectional gated recurrent unit (BiGRU), combined with an attention mechanism, are used for feature fusion and classification, respectively. RESULTS The accuracies on the ENNAVIA-C and ENNAVIA-D datasets under 10-fold cross-validation are 99.15% and 98.92%, respectively, and other evaluation indexes have also obtained satisfactory results. The experimental results show that our model is superior to other existing models. CONCLUSION The iACVP-MR model can be viewed as a powerful and intelligent tool for the accurate identification of ACVP. The datasets and source codes for iACVP-MR can be freely downloaded at https://github.com/yunyunliang88/iACVP-MR.
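Of the features listed above, the composition of k-spaced amino acid pairs is easy to illustrate. The sketch below uses the usual definition (frequency of each ordered residue pair separated by exactly k intervening positions); the function name `cksaap` and the handling of non-standard residues are assumptions made here, not details from iACVP-MR.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def cksaap(sequence, k):
    """Return a dict of 400 pair frequencies for gap size k.

    A pair is (sequence[i], sequence[i + k + 1]); counts are divided by the
    number of such positions. Pairs containing non-standard residues are
    skipped (an assumption of this sketch).
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = len(sequence) - k - 1
    if total <= 0:
        # Sequence too short for this gap: return an all-zero vector.
        return {pair: 0.0 for pair in pairs}
    for i in range(total):
        pair = sequence[i] + sequence[i + k + 1]
        if pair in counts:
            counts[pair] += 1
    return {pair: count / total for pair, count in counts.items()}


if __name__ == "__main__":
    features = cksaap("ACAC", k=1)
    print(features["AA"], features["CC"])
```

Concatenating the 400-dimensional vectors for several values of k gives a fixed-length descriptor regardless of peptide length.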
Affiliation(s)
- Yunyun Liang
- School of Science, Xi'an Polytechnic University, Xi'an, 710048, P.R. China
- Xinyan Ma
- School of Science, Xi'an Polytechnic University, Xi'an, 710048, P.R. China
- Jin Li
- School of Science, Xi'an Polytechnic University, Xi'an, 710048, P.R. China
- Shengli Zhang
- School of Mathematics and Statistics, Xidian University, Xi'an, 710071, P.R. China
7
Pratyush P, Kc DB. Advances in Prediction of Posttranslational Modification Sites Known to Localize in Protein Supersecondary Structures. Methods Mol Biol 2025; 2870:117-151. [PMID: 39543034 DOI: 10.1007/978-1-0716-4213-9_8] [Indexed: 11/17/2024]
Abstract
Posttranslational modifications (PTMs) play a crucial role in modulating the structure, function, localization, and interactions of proteins, with many PTMs being localized within supersecondary structures, such as helical pairs. These modifications can significantly influence the conformation and stability of these structures. For instance, phosphorylation introduces negative charges that alter electrostatic interactions, while acetylation or methylation of lysine residues affects the stability and interactions of alpha helices or beta strands. Given the pivotal role of supersecondary structures in the overall protein architecture, their modulation by PTMs is essential for protein functionality. This chapter explores the latest advancements in predicting sites for the five PTMs (phosphorylation, acetylation, glycosylation, methylation, and ubiquitination) known to be localized within supersecondary structures. The chapter highlights the recent advances in the prediction of these PTM sites, including the use of global contextualized embeddings from protein language models, integration of structural information, utilization of reliable positive and negative sites, and application of contrastive learning. These methodologies and emerging trends offer a roadmap for novel innovations in addressing PTM prediction challenges, particularly those linked to supersecondary structures.
Affiliation(s)
- Pawel Pratyush
- Computer Science Department, Michigan Technological University, Houghton, MI, USA
- Computer Science Department, Rochester Institute of Technology, Henrietta, NY, USA
- Dukka B Kc
- Computer Science Department, Michigan Technological University, Houghton, MI, USA.
- Computer Science Department, Rochester Institute of Technology, Henrietta, NY, USA.
8
Chen Y, Sheng G, Wang G. CapsNet-TIS: Predicting translation initiation site based on multi-feature fusion and improved capsule network. Gene 2024; 924:148598. [PMID: 38782224 DOI: 10.1016/j.gene.2024.148598] [Received: 01/23/2024] [Revised: 04/22/2024] [Accepted: 05/20/2024] [Indexed: 05/25/2024]
Abstract
Genes are the basic units of protein synthesis in organisms, and accurately identifying the translation initiation site (TIS) of genes is crucial for understanding the regulation, transcription, and translation processes of genes. However, the existing models cannot adequately extract the feature information in TIS sequences, and they also inadequately capture the complex hierarchical relationships among features. Therefore, a novel predictor named CapsNet-TIS is proposed in this paper. CapsNet-TIS first fully extracts the TIS sequence information using four encoding methods, including One-hot encoding, physical structure property (PSP) encoding, nucleotide chemical property (NCP) encoding, and nucleotide density (ND) encoding. Next, multi-scale convolutional neural networks are used to perform feature fusion of the encoded features to enhance the comprehensiveness of the feature representation. Finally, the fused features are classified using capsule network as the main network of the classification model to capture the complex hierarchical relationships among the features. Moreover, we improve the capsule network by introducing residual block, channel attention, and BiLSTM to enhance the model's feature extraction and sequence data modeling capabilities. In this paper, the performance of CapsNet-TIS is evaluated using TIS datasets from four species: human, mouse, bovine, and fruit fly, and the effectiveness of each part is demonstrated by performing ablation experiments. By comparing the experimental results with models proposed by other researchers, the results demonstrate the superior performance of CapsNet-TIS.
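Two of the four encodings named above, nucleotide chemical property (NCP) and nucleotide density (ND), can be sketched together. The NCP table below follows a commonly used convention (ring structure, functional group, hydrogen bonding) and is an assumption here, as is the function name `ncp_nd_encode`; the abstract does not specify the exact mapping used by CapsNet-TIS.

```python
# Assumed NCP mapping: (ring structure, functional group, hydrogen bonding).
# This is a widely used convention, not confirmed for CapsNet-TIS.
NCP = {"A": (1, 1, 1), "C": (0, 1, 0), "G": (1, 0, 0), "T": (0, 0, 1)}


def ncp_nd_encode(dna):
    """Encode each position as its 3-bit NCP vector plus its density.

    The density at 1-based position i is the number of occurrences of the
    current base in the prefix dna[:i], divided by i.
    Raises KeyError for symbols other than A/C/G/T.
    """
    seen = {}
    encoded = []
    for i, base in enumerate(dna, start=1):
        seen[base] = seen.get(base, 0) + 1
        density = seen[base] / i
        encoded.append(NCP[base] + (density,))
    return encoded


if __name__ == "__main__":
    for row in ncp_nd_encode("ATGGCA"):
        print(row)
```

Each position yields a 4-dimensional vector, so a sequence of length L becomes an L x 4 matrix that can be stacked with one-hot or other per-position encodings.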
Affiliation(s)
- Yu Chen
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China.
- Guojun Sheng
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
- Gang Wang
- College of Computer and Control Engineering, Northeast Forestry University, Harbin 150040, China
9
Yao L, Xie P, Guan J, Chung CR, Huang Y, Pang Y, Wu H, Chiang YC, Lee TY. CapsEnhancer: An Effective Computational Framework for Identifying Enhancers Based on Chaos Game Representation and Capsule Network. J Chem Inf Model 2024; 64:5725-5736. [PMID: 38946113 PMCID: PMC11267569 DOI: 10.1021/acs.jcim.4c00546] [Received: 03/29/2024] [Revised: 06/21/2024] [Accepted: 06/21/2024] [Indexed: 07/02/2024]
Abstract
Enhancers are a class of noncoding DNA, serving as crucial regulatory elements in governing gene expression by binding to transcription factors. The identification of enhancers holds paramount importance in the field of biology. However, traditional experimental methods for enhancer identification demand substantial human and material resources. Consequently, there is a growing interest in employing computational methods for enhancer prediction. In this study, we propose a two-stage framework based on deep learning, termed CapsEnhancer, for the identification of enhancers and their strengths. CapsEnhancer utilizes chaos game representation to encode DNA sequences into unique images and employs a capsule network to extract local and global features from sequence "images". Experimental results demonstrate that CapsEnhancer achieves state-of-the-art performance in both stages. In the first and second stages, the accuracy surpasses the previous best methods by 8 and 3.5%, reaching accuracies of 94.5 and 95%, respectively. Notably, this study represents the pioneering application of computer vision methods to enhancer identification tasks. Our work not only contributes novel insights to enhancer identification but also provides a fresh perspective for other biological sequence analysis tasks.
Affiliation(s)
- Lantian Yao
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
- Peilin Xie
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Jiahui Guan
- School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Chia-Ru Chung
- Department of Computer Science and Information Engineering, National Central University, Taoyuan 320317, Taiwan
- Yixian Huang
- School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Yuxuan Pang
- Division of Health Medical Intelligence, Human Genome Center, The Institute of Medical Science, The University of Tokyo, Tokyo 108-8639, Japan
- Huacong Wu
- School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Ying-Chih Chiang
- Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Tzong-Yi Lee
- Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300093, Taiwan
- Center for Intelligent Drug Systems and Smart Bio-devices (IDS2B), National Yang Ming Chiao Tung University, Hsinchu 300093, Taiwan
10
Hu F, Gao J, Zheng J, Kwoh C, Jia C. N-GlycoPred: A hybrid deep learning model for accurate identification of N-glycosylation sites. Methods 2024; 227:48-57. [PMID: 38734394 DOI: 10.1016/j.ymeth.2024.05.002] [Received: 02/28/2024] [Revised: 04/16/2024] [Accepted: 05/03/2024] [Indexed: 05/13/2024]
Abstract
Studies have shown that protein glycosylation in cells reflects the real-time dynamics of biological processes, and the occurrence and development of many diseases are closely related to protein glycosylation. Abnormal protein glycosylation can be used as a potential diagnostic and prognostic marker of a disease, as well as a therapeutic target and a new breakthrough point for exploring pathogenesis. To address the issue of significant differences in the prediction results of previous models for different species, we constructed a hybrid deep learning model N-GlycoPred on the basis of dual-layer convolution, a paired attention mechanism and BiLSTM for accurate identification of N-glycosylation sites. By adopting one-hot encoding or the AAindex, we specifically selected the optimum combination of features and deep learning frameworks for human and mouse to refine the models. Based on six independent test datasets, our N-GlycoPred model achieved an average AUC of 0.9553, which is 0.23% higher than MusiteDeep. The comparison results indicate that our model can serve as a powerful tool for N-glycosylation site prescreening for biological researchers.
Affiliation(s)
- Fengzhu Hu
- School of Science, Dalian Maritime University, Dalian 116026, China
- Jie Gao
- School of Science, Dalian Maritime University, Dalian 116026, China
- Jia Zheng
- School of Science, Dalian Maritime University, Dalian 116026, China
- Cheekeong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, Singapore
- Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian 116026, China.
11
Pratyush P, Bahmani S, Pokharel S, Ismail HD, KC DB. LMCrot: an enhanced protein crotonylation site predictor by leveraging an interpretable window-level embedding from a transformer-based protein language model. Bioinformatics 2024; 40:btae290. [PMID: 38662579 PMCID: PMC11088740 DOI: 10.1093/bioinformatics/btae290] [Received: 11/14/2023] [Revised: 02/13/2024] [Accepted: 04/24/2024] [Indexed: 05/13/2024]
Abstract
MOTIVATION Recent advancements in natural language processing have highlighted the effectiveness of global contextualized representations from protein language models (pLMs) in numerous downstream tasks. Nonetheless, strategies to encode the site-of-interest leveraging pLMs for per-residue prediction tasks, such as crotonylation (Kcr) prediction, remain largely uncharted. RESULTS Herein, we adopt a range of approaches for utilizing pLMs by experimenting with different input sequence types (full-length protein sequence versus window sequence), assessing the implications of utilizing per-residue embedding of the site-of-interest as well as embeddings of window residues centered around it. Building upon these insights, we developed a novel residual ConvBiLSTM network designed to process window-level embeddings of the site-of-interest generated by the ProtT5-XL-UniRef50 pLM using full-length sequences as input. This model, termed T5ResConvBiLSTM, surpasses existing state-of-the-art Kcr predictors in performance across three diverse datasets. To validate our approach of utilizing full sequence-based window-level embeddings, we also delved into the interpretability of ProtT5-derived embedding tensors in two ways: firstly, by scrutinizing the attention weights obtained from the transformer's encoder block; and secondly, by computing SHAP values for these tensors, providing a model-agnostic interpretation of the prediction results. Additionally, we enhance the latent representation of ProtT5 by incorporating two additional local representations, one derived from amino acid properties and the other from supervised embedding layer, through an intermediate fusion stacked generalization approach, using an n-mer window sequence (or, peptide/fragment). The resultant stacked model, dubbed LMCrot, exhibits a more pronounced improvement in predictive performance across the tested datasets. 
AVAILABILITY AND IMPLEMENTATION LMCrot is publicly available at https://github.com/KCLabMTU/LMCrot.
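The notion of a window-level embedding taken from a full-length per-residue embedding can be sketched without any model at all. The snippet assumes the per-residue embeddings already exist as a list of equal-length vectors (for example, from a protein language model run on the whole sequence) and that positions outside the protein are zero-padded; the function name `window_embedding` and the padding rule are assumptions, not the LMCrot implementation.

```python
def window_embedding(per_residue, site, flank):
    """Slice a (2 * flank + 1) x dim window centered on `site`.

    per_residue: list of per-residue vectors computed on the full sequence.
    site: 0-based index of the residue of interest (e.g. a lysine).
    Positions that fall outside the protein are filled with zero vectors.
    """
    dim = len(per_residue[0])
    window = []
    for j in range(site - flank, site + flank + 1):
        if 0 <= j < len(per_residue):
            window.append(list(per_residue[j]))
        else:
            window.append([0.0] * dim)
    return window


if __name__ == "__main__":
    # Toy 2-dimensional "embeddings" for a 3-residue protein.
    toy = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    print(window_embedding(toy, site=0, flank=1))
```

Because the vectors are computed on the full sequence before slicing, each row in the window can still carry context from residues far outside the window, which is the contrast with embedding the window sequence alone.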
Affiliation(s)
- Pawel Pratyush
- Department of Computer Science, Michigan Technological University, Houghton, MI 49931, United States
- Soufia Bahmani
- Department of Computer Science, Michigan Technological University, Houghton, MI 49931, United States
- Suresh Pokharel
- Department of Computer Science, Michigan Technological University, Houghton, MI 49931, United States
- Hamid D Ismail
- Department of Computer Science, Michigan Technological University, Houghton, MI 49931, United States
- Dukka B KC
- Department of Computer Science, Michigan Technological University, Houghton, MI 49931, United States
12
Jiang Y, Yan R, Wang X. PlantNh-Kcr: a deep learning model for predicting non-histone crotonylation sites in plants. Plant Methods 2024; 20:28. [PMID: 38360730 PMCID: PMC10870457 DOI: 10.1186/s13007-024-01157-8] [Received: 12/06/2023] [Accepted: 02/07/2024] [Indexed: 02/17/2024]
Abstract
BACKGROUND Lysine crotonylation (Kcr) is a crucial protein post-translational modification found in histone and non-histone proteins. It plays a pivotal role in regulating diverse biological processes in both animals and plants, including gene transcription and replication, cell metabolism and differentiation, as well as photosynthesis. Despite the significance of Kcr, detection of Kcr sites through biological experiments is often time-consuming and expensive, and only a fraction of crotonylated peptides can be identified. This reality highlights the need for efficient and rapid prediction of Kcr sites through computational methods. Currently, several machine learning models exist for predicting Kcr sites in humans, yet models tailored for plants are rare. Furthermore, no downloadable Kcr site predictors or datasets have been developed specifically for plants. To address this gap, it is imperative to integrate existing Kcr sites detected in plant experiments and establish a dedicated computational model for plants.
RESULTS Most plant Kcr sites are located on non-histones. In this study, we collected non-histone Kcr sites from five plants: wheat, tobacco, rice, peanut, and papaya. We then conducted a comprehensive analysis of the amino acid distribution surrounding these sites. To develop a predictive model for plant non-histone Kcr sites, we combined a convolutional neural network (CNN), a bidirectional long short-term memory network (BiLSTM), and an attention mechanism to build a deep learning model called PlantNh-Kcr. In both five-fold cross-validation and independent tests, PlantNh-Kcr outperformed multiple conventional machine learning models and other deep learning models. Furthermore, we analyzed species-specific effects on the PlantNh-Kcr model and found that a general model trained using data from multiple species outperforms species-specific models.
CONCLUSION PlantNh-Kcr represents a valuable tool for predicting plant non-histone Kcr sites. We expect that this model will aid in addressing key challenges and tasks in the study of plant crotonylation sites.
Affiliation(s)
- Yanming Jiang
  - College of Mathematics and Computer Sciences, Shanxi Normal University, Taiyuan, 030031, China
- Renxiang Yan
  - The Key Laboratory of Marine Enzyme Engineering of Fujian Province, Fuzhou University, Fuzhou, 350002, China
  - College of Biological Science and Engineering, Fuzhou University, Fuzhou, 350002, China
- Xiaofeng Wang
  - College of Mathematics and Computer Sciences, Shanxi Normal University, Taiyuan, 030031, China
13
Dhakal P, Tayara H, Chong KT. An ensemble of stacking classifiers for improved prediction of miRNA-mRNA interactions. Comput Biol Med 2023; 164:107242. [PMID: 37473564 DOI: 10.1016/j.compbiomed.2023.107242] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/01/2023] [Revised: 06/21/2023] [Accepted: 07/07/2023] [Indexed: 07/22/2023]
Abstract
MicroRNAs (miRNAs) are small non-coding RNA molecules that play a crucial role in regulating gene expression at the post-transcriptional level by binding to potential target sites of messenger RNAs (mRNAs), facilitated by the Argonaute family of proteins. Selecting the conservative candidate target sites (CTS) is a challenging step, considering that most of the existing computational algorithms primarily focus on canonical site types, which is a time-consuming and inefficient utilization of miRNA target site interactions. We developed a stacking classifier algorithm that addresses the CTS selection criteria using feature-encoding techniques that generate feature vectors, including k-mer nucleotide composition, dinucleotide composition, pseudo-nucleotide composition, and sequence order coupling. This stacking classifier algorithm surpassed previous state-of-the-art algorithms in predicting functional miRNA targets. We evaluated the performance of the proposed model on 10 independent test datasets and obtained an average accuracy of 79.77%, an improvement of 7.26% over previous models. This improvement shows that the proposed method has great potential for distinguishing highly functional miRNA targets and can serve as a valuable tool in biomedical and drug development research.
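As an illustration of the k-mer nucleotide composition encoding named in the abstract above, here is a minimal sketch in plain Python. It shows the generic technique only; the function name `kmer_composition`, the RNA alphabet `ACGU`, the choice to skip k-mers containing other symbols, and the toy sequence are assumptions, not the authors' implementation.

```python
from itertools import product
from typing import List


def kmer_composition(seq: str, k: int = 2, alphabet: str = "ACGU") -> List[float]:
    """Return the normalized frequency of every k-mer over `alphabet`.

    The vector has len(alphabet) ** k entries in lexicographic product order.
    k-mers containing symbols outside the alphabet are skipped (an assumption).
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = {kmer: 0 for kmer in kmers}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


# Toy RNA fragment; with k=2 this is the dinucleotide composition (16 features).
features = kmer_composition("AUGGCUAU", k=2)
```

With `k=2` the same function yields the dinucleotide composition, so the two encodings listed in the abstract differ only in the value of `k` under this sketch.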
Affiliation(s)
- Priyash Dhakal
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju-si, 54896, Jeollabuk-do, Republic of Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju-si, 54896, Jeollabuk-do, Republic of Korea
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju-si, 54896, Jeollabuk-do, Republic of Korea
  - Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju-si, 54896, Jeollabuk-do, Republic of Korea
14
Zhang Z, Li F, Zhao J, Zheng C. CapsNetYY1: identifying YY1-mediated chromatin loops based on a capsule network architecture. BMC Genomics 2023; 24:448. [PMID: 37559017 PMCID: PMC10410878 DOI: 10.1186/s12864-023-09217-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/29/2022] [Accepted: 02/28/2023] [Indexed: 08/11/2023] Open
Abstract
BACKGROUND Previous studies have identified that chromosome structure plays a very important role in gene control. The transcription factor Yin Yang 1 (YY1), a multifunctional DNA binding protein, could form a dimer to mediate chromatin loops and active enhancer-promoter interactions. The deletion of YY1 or point mutations at the YY1 binding sites significantly inhibit the enhancer-promoter interactions and affect gene expression. To date, only a few computational methods are available for identifying YY1-mediated chromatin loops.
RESULTS We proposed a novel model named CapsNetYY1, which is based on a capsule network architecture, to identify whether a pair of YY1 motifs can form a chromatin loop. First, we encode the DNA sequence using one-hot encoding. Second, a multi-scale convolution layer is used to extract local features of the sequence, and a bidirectional gated recurrent unit is used to learn the features across time steps. Finally, capsule networks (a convolution capsule layer and a digital capsule layer) are used to extract higher-level features and recognize YY1-mediated chromatin loops. Compared with DeepYY1, the only existing predictor for YY1-mediated chromatin loops, our model CapsNetYY1 achieved better performance on the independent datasets (AUC [Formula: see text]).
CONCLUSION The results indicate that CapsNetYY1 is an excellent method for identifying YY1-mediated chromatin loops. We believe that the CapsNetYY1 method will be used for predictive classification of other DNA sequences.
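The one-hot encoding step mentioned in the abstract above can be sketched as follows. The column order `ACGT` and the all-zero row for ambiguous bases such as `N` are common conventions assumed here; the function name `one_hot_dna` is hypothetical and the paper's exact handling may differ.

```python
from typing import List


def one_hot_dna(seq: str) -> List[List[int]]:
    """Encode a DNA string as one row of four 0/1 values per base.

    Columns follow the order A, C, G, T. Any other symbol (for example N)
    becomes an all-zero row, which is an assumption of this sketch.
    """
    order = "ACGT"
    return [[1 if base == b else 0 for b in order] for base in seq.upper()]


encoded = one_hot_dna("ACGTN")
```

The resulting matrix has one row per base, which is the shape a convolution layer sliding along the sequence would typically consume.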
Affiliation(s)
- Zhimin Zhang
  - College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
- Fenglin Li
  - College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
- Jianping Zhao
  - College of Mathematics and System Sciences, Xinjiang University, Urumqi, China
- Chunhou Zheng
  - Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Information Materials and Intelligent Sensing Laboratory of Anhui Province, and School of Artificial Intelligence, Anhui University, Hefei, China
15
Wang R, Chung CR, Huang HD, Lee TY. Identification of species-specific RNA N6-methyladinosine modification sites from RNA sequences. Brief Bioinform 2023; 24:7008797. [PMID: 36715277 DOI: 10.1093/bib/bbac573] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/07/2022] [Revised: 11/11/2022] [Accepted: 11/24/2022] [Indexed: 01/31/2023] Open
Abstract
N6-methyladenosine (m6A) modification is the most abundant co-transcriptional modification in eukaryotic RNA and plays important roles in cellular regulation. Traditional high-throughput sequencing experiments used to explore functional mechanisms are time-consuming and labor-intensive, and most of the proposed methods focus on a limited number of species. To further understand the relevant biological mechanisms among different species with the same RNA modification, it is necessary to develop a computational scheme that can be applied to different species. To achieve this, we proposed an attention-based deep learning method, adaptive-m6A, which consists of a convolutional neural network, a bi-directional long short-term memory network and an attention mechanism, to identify m6A sites in multiple species. In addition, three conventional machine learning (ML) methods, namely support vector machine, random forest and logistic regression classifiers, were considered in this work. In addition to the performance of ML methods for multi-species prediction, the optimal performance of adaptive-m6A yielded an accuracy of 0.9832 and an area under the receiver operating characteristic curve of 0.98. Moreover, motif analysis and cross-validation among different species were conducted to test the robustness of one model towards multiple species, which helped improve our understanding of the sequence characteristics and biological functions of RNA modifications in different species.
Affiliation(s)
- Rulan Wang
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
- Chia-Ru Chung
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
  - School of Life Sciences, University of Science and Technology of China, 230026, Hefei, Anhui, P.R. China
- Hsien-Da Huang
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
- Tzong-Yi Lee
  - Warshel Institute for Computational Biology, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen, 2001 Longxiang Road, Longgang District, 51872, Shenzhen, P.R. China
16
Hu W, Zhang W, Zhou Y, Luo Y, Sun X, Xu H, Shi S, Li T, Xu Y, Yang Q, Qiu Y, Zhu F, Dai H. MecDDI: Clarified Drug-Drug Interaction Mechanism Facilitating Rational Drug Use and Potential Drug-Drug Interaction Prediction. J Chem Inf Model 2023; 63:1626-1636. [PMID: 36802582 DOI: 10.1021/acs.jcim.2c01656] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 02/22/2023]
Abstract
Drug-drug interactions (DDIs) are a major concern in clinical practice and have been recognized as one of the key threats to public health. To address such a critical threat, many studies have been conducted to clarify the mechanism underlying each DDI, based on which alternative therapeutic strategies have been successfully proposed. Moreover, artificial intelligence-based models for predicting DDIs, especially multilabel classification models, are highly dependent on a reliable DDI data set with clear mechanistic information. These successes highlight the pressing need for a platform providing mechanistic clarifications for a large number of existing DDIs. However, no such platform is available yet. In this study, a platform entitled "MecDDI" was therefore introduced to systematically clarify the mechanisms underlying the existing DDIs. This platform is unique in (a) clarifying the mechanisms underlying over 178,000 DDIs by explicit descriptions and graphic illustrations and (b) providing a systematic classification for all collected DDIs based on the clarified mechanisms. Given the long-lasting threats of DDIs to public health, MecDDI could offer medical scientists a clear explanation of DDI mechanisms, help healthcare professionals identify alternative therapeutics, and prepare data for algorithm scientists to predict new DDIs. MecDDI is expected to be an indispensable complement to the available pharmaceutical platforms and is freely accessible at: https://idrblab.org/mecddi/.
Affiliation(s)
- Wei Hu
  - Department of Pharmacy, Center of Clinical Pharmacology, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310009, China
- Wei Zhang
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Ying Zhou
  - State Key Laboratory for Diagnosis and Treatment of Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou 310000, China
- Yongchao Luo
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Xiuna Sun
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Huimin Xu
  - Department of Pharmacy, Center of Clinical Pharmacology, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310009, China
- Shuiyang Shi
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Teng Li
  - Department of Pharmacy, Center of Clinical Pharmacology, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310009, China
- Yichao Xu
  - Department of Pharmacy, Center of Clinical Pharmacology, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310009, China
- Qianqian Yang
  - Department of Pharmacy, Affiliated Hangzhou First Peoples Hospital, Zhejiang University School of Medicine, Hangzhou 310006, China
  - Clinical Pharmacy Research Center, Zhejiang University School of Medicine, Hangzhou 310009, China
- Yunqing Qiu
  - State Key Laboratory for Diagnosis and Treatment of Infectious Disease, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, Zhejiang Provincial Key Laboratory for Drug Clinical Research and Evaluation, The First Affiliated Hospital, Zhejiang University, Hangzhou 310000, China
- Feng Zhu
  - Department of Pharmacy, Center of Clinical Pharmacology, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310009, China
  - College of Pharmaceutical Sciences, Zhejiang University, Hangzhou 310058, China
  - Innovation Institute for Artificial Intelligence in Medicine of Zhejiang University, Alibaba-Zhejiang University Joint Research Center of Future Digital Healthcare, Hangzhou 330110, China
- Haibin Dai
  - Department of Pharmacy, Center of Clinical Pharmacology, The Second Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou 310009, China
  - Clinical Pharmacy Research Center, Zhejiang University School of Medicine, Hangzhou 310009, China
17
Li W, Wang J, Luo Y, Bezabih TT. Multi-dimensional feature recognition model based on capsule network for ubiquitination site prediction. PeerJ 2022; 10:e14427. [PMID: 36523471 PMCID: PMC9745908 DOI: 10.7717/peerj.14427] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/23/2022] [Accepted: 10/30/2022] [Indexed: 12/12/2022] Open
Abstract
Ubiquitination is an important post-translational modification of proteins that regulates many cellular activities. Traditional experimental methods for identification are costly and time-consuming, so many researchers have proposed computational methods for ubiquitination site prediction in recent years. However, traditional machine learning methods focus on feature engineering and are not suitable for large-scale proteomic data. In addition, deep learning methods are mostly based on convolutional neural networks and fuse multiple coding approaches to achieve classification prediction. This cannot effectively identify potential fine-grained features of the input data and has limitations in the representation of dependencies between low-level features and high-level features. A multi-dimensional feature recognition model based on a capsule network (MDCapsUbi) was proposed to predict protein ubiquitination sites. The proposed module consisting of convolution operations and channel attention was used to recognize coarse-grained features in the sequence dimension and the feature map dimension. The capsule network module consisting of capsule vectors was used to identify fine-grained features and classify ubiquitinated sites. With ten-fold cross-validation, the MDCapsUbi achieved 91.82% accuracy, 91.39% sensitivity, 92.24% specificity, 0.837 MCC, 0.918 F-Score and 0.97 AUC. Experimental results indicated that the proposed method outperformed other ubiquitination site prediction technologies.
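The evaluation measures reported in the abstract above (accuracy, sensitivity, specificity, MCC, F-score) all derive from the four cells of a binary confusion matrix. The sketch below uses the standard textbook definitions and assumes the F-score is F1; the counts are made-up toy values, not data from the paper, and the function name `binary_metrics` is hypothetical.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute common binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
        "mcc": mcc,
    }


# Toy counts chosen only for illustration.
m = binary_metrics(tp=90, fp=10, tn=85, fn=15)
```

MCC uses all four cells, which is why it is often reported alongside accuracy when the positive and negative classes may be imbalanced.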
Affiliation(s)
- Weimin Li
  - School of Computer Engineering and Science, Shanghai University, Shanghai, China
- Jie Wang
  - School of Computer Engineering and Science, Shanghai University, Shanghai, China
- Yin Luo
  - School of Life Sciences, East China Normal University, Shanghai, China
18
Mansoor M, Nauman M, Rehman HU, Omar M. Gene Ontology Capsule GAN: an improved architecture for protein function prediction. PeerJ Comput Sci 2022; 8:e1014. [PMID: 36092003 PMCID: PMC9454774 DOI: 10.7717/peerj-cs.1014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/11/2022] [Accepted: 05/31/2022] [Indexed: 06/15/2023]
Abstract
Proteins are the core of all functions pertaining to living things. They consist of an extended amino acid chain folding into a three-dimensional shape that dictates their behavior. Currently, convolutional neural networks (CNNs) have been pivotal in predicting protein functions based on protein sequences. While it is a technology crucial to the niche, the computation cost and translational invariance associated with CNN make it impossible to detect spatial hierarchies between complex and simpler objects. Therefore, this research utilizes capsule networks to capture spatial information as opposed to CNNs. Since capsule networks focus on hierarchical links, they have a lot of potential for solving structural biology challenges. In comparison to the standard CNNs, our results exhibit an improvement in accuracy. Gene Ontology Capsule GAN (GOCAPGAN) achieved an F1 score of 82.6%, a precision score of 90.4% and recall score of 76.1%.
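The three figures reported in the abstract above are mutually consistent if the F1 score is taken as the harmonic mean of precision and recall, which is the usual definition and is assumed here. A quick arithmetic check:

```python
# Precision and recall as reported in the abstract.
precision = 0.904
recall = 0.761

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"{f1:.1%}")  # prints 82.6%
```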
Affiliation(s)
- Musadaq Mansoor
  - National University of Computer and Emerging Sciences, Islamabad, Peshawar, KPK, Pakistan
- Mohammad Nauman
  - National University of Computer and Emerging Sciences, Islamabad, Peshawar, KPK, Pakistan
- Hafeez Ur Rehman
  - National University of Computer and Emerging Sciences, Islamabad, Peshawar, KPK, Pakistan
- Maryam Omar
  - National University of Computer and Emerging Sciences, Islamabad, Peshawar, KPK, Pakistan
19
Liang Y, Wu Y, Zhang Z, Liu N, Peng J, Tang J. Hyb4mC: a hybrid DNA2vec-based model for DNA N4-methylcytosine sites prediction. BMC Bioinformatics 2022; 23:258. [PMID: 35768759 PMCID: PMC9241225 DOI: 10.1186/s12859-022-04789-6] [Citation(s) in RCA: 13] [Impact Index Per Article: 4.3] [Received: 12/10/2021] [Accepted: 06/10/2022] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND DNA N4-methylcytosine (4mC) is part of the restriction-modification system, which works by regulating some biological processes, for example, the initiation of DNA replication, mismatch repair and inactivation of transposons. However, using experimental methods to detect 4mC sites is time-consuming and expensive. Besides, considering the huge differences in the number of 4mC samples among different species, it is challenging to achieve robust multi-species 4mC site prediction performance. Hence, it is of great significance to develop effective computational tools to identify 4mC sites.
RESULTS This work proposes a flexible deep learning-based framework to predict 4mC sites, called Hyb4mC. Hyb4mC adopts the DNA2vec method for sequence embedding, which captures more efficient and comprehensive information compared with the sequence-based feature method. Then, two different subnets are used for further analysis: Hyb_Caps and Hyb_Conv. Hyb_Caps is composed of a capsule neural network and can generalize from fewer samples. Hyb_Conv combines the attention mechanism with a text convolutional neural network for further feature learning.
CONCLUSIONS Extensive benchmark tests have shown that Hyb4mC can significantly enhance the performance of predicting 4mC sites compared with the recently proposed methods.
Affiliation(s)
- Ying Liang
  - College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
- Yanan Wu
  - College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
- Zequn Zhang
  - College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
- Niannian Liu
  - College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
- Jun Peng
  - College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
- Jianjun Tang
  - College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China