1
Li J, Xiong S, Shi H, Cui F, Zhang Z, Wei L. NeuroPred-AIMP: Multimodal Deep Learning for Neuropeptide Prediction via Protein Language Modeling and Temporal Convolutional Networks. J Chem Inf Model 2025; 65:4740-4750. [PMID: 40258183 DOI: 10.1021/acs.jcim.5c00444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2025]
Abstract
Neuropeptides are key signaling molecules that regulate fundamental physiological processes ranging from metabolism to cognitive function. Accurate identification of neuropeptides is critical for advancing neurological disease therapeutics and peptide-based drug design, but it remains a major challenge because of sequence heterogeneity, obscured functional motifs, and limited experimentally validated data. Existing neuropeptide identification methods rely on manually designed features combined with traditional machine learning, which struggle to capture deep sequence patterns. To address these limitations, we propose NeuroPred-AIMP (adaptive integrated multimodal predictor), an interpretable model that combines the global semantic representation of a protein language model (ESM) with the multiscale structural features of a temporal convolutional network (TCN). The model introduces a residual-enhanced adaptive feature fusion mechanism that dynamically recalibrates feature contributions to achieve robust integration of evolutionary and local sequence information. On the independent test set, the proposed model achieved an accuracy of 92.3% and an AUROC of 0.974. The model was also well balanced in identifying positive and negative samples, with a sensitivity of 92.6% and a specificity of 92.1%, a difference of less than 0.5%. These results confirm the effectiveness of the multimodal feature strategy for neuropeptide recognition.
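The accuracy, sensitivity, and specificity reported in this abstract are standard confusion-matrix quantities. As a minimal sketch of how they relate to one another, the function below computes them from raw counts. The function name `binary_metrics` and the counts in the usage lines are hypothetical illustrations and are not taken from the paper.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp/fn: positives (e.g. neuropeptides) predicted correctly / missed.
    tn/fp: negatives predicted correctly / wrongly flagged as positive.
    """
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,   # fraction of all samples classified correctly
        "sensitivity": tp / (tp + fn),   # true positive rate (recall on positives)
        "specificity": tn / (tn + fp),   # true negative rate (recall on negatives)
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = binary_metrics(tp=90, fn=10, tn=80, fp=20)
balance_gap = abs(metrics["sensitivity"] - metrics["specificity"])
print(metrics, balance_gap)
```

A small gap between sensitivity and specificity, as the abstract emphasizes, indicates that the classifier is not favoring one class over the other.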
Affiliation(s)
- Jinjin Li
  - Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China
- Shuwen Xiong
  - Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China
- Hua Shi
  - School of Optoelectronic and Communication Engineering, Xiamen University of Technology, Xiamen 361024, China
- Feifei Cui
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Zilong Zhang
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Leyi Wei
  - Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China
  - School of Software, Shandong University, Jinan 250101, China
2
Akbar S, Raza A, Awan HH, Zou Q, Alghamdi W, Saeed A. pNPs-CapsNet: Predicting Neuropeptides Using Protein Language Models and FastText Encoding-Based Weighted Multi-View Feature Integration with Deep Capsule Neural Network. ACS Omega 2025; 10:12403-12416. [PMID: 40191328 PMCID: PMC11966582 DOI: 10.1021/acsomega.4c11449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2024] [Revised: 02/04/2025] [Accepted: 03/07/2025] [Indexed: 04/09/2025]
Abstract
Neuropeptides (NPs) are critical signaling molecules that are essential in numerous physiological processes and possess significant therapeutic potential. Computational prediction of NPs has emerged as a promising alternative to traditional experimental methods, which are often labor-intensive, time-consuming, and expensive. Recent advancements in computational peptide models provide a cost-effective approach to identifying NPs, which are characterized by high selectivity toward target cells and minimal side effects. In this study, we propose a novel deep capsule neural network-based computational model, namely pNPs-CapsNet, to predict NPs and non-NPs accurately. Input samples are numerically encoded using pretrained protein language models, including ESM, ProtBERT-BFD, and ProtT5, to extract attention mechanism-based contextual and semantic features. A differential evolution-based weighted feature integration method is utilized to construct a multiview vector. Additionally, a two-tier feature selection strategy, comprising MRMD and SHAP analysis, is developed to identify and select optimal features. Finally, a capsule neural network (CapsNet) is trained using the selected optimal feature set. The proposed pNPs-CapsNet model achieved a predictive accuracy of 98.10% and an AUC of 0.98. To validate the generalization capability of the pNPs-CapsNet model, it was evaluated on independent samples, yielding an accuracy of 95.21% and an AUC of 0.96. The pNPs-CapsNet model outperforms existing state-of-the-art models, demonstrating 4% and 2.5% improved predictive accuracy for training and independent data sets, respectively. The demonstrated efficacy and consistency of pNPs-CapsNet underline its potential as a valuable and robust tool for advancing drug discovery and academic research.
Affiliation(s)
- Shahid Akbar
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Department of Computer Science, Abdul Wali Khan University Mardan, Mardan 23200, Khyber Pakhtunkhwa, Pakistan
- Ali Raza
  - Department of Computer Science, Bahria University, Islamabad 44220, Pakistan
- Hamid Hussain Awan
  - Department of Computer Science, Rawalpindi Women University, Rawalpindi 46300, Punjab, Pakistan
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, PR China
- Wajdi Alghamdi
  - Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Aamir Saeed
  - Department of Computer Science and IT, University of Engineering and Technology, Jalozai Campus, Peshawar 25000, Pakistan
3
Rahmani R, Kalankesh LR, Ferdousi R. Computational approaches for identifying neuropeptides: A comprehensive review. Molecular Therapy. Nucleic Acids 2025; 36:102409. [PMID: 40171446 PMCID: PMC11960512 DOI: 10.1016/j.omtn.2024.102409] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 04/03/2025]
Abstract
Neuropeptides (NPs) are key signaling molecules that interact with G protein-coupled receptors, influencing neuronal activities and developmental pathways, as well as the endocrine and immune systems. They are significant in disease contexts, offering potential therapeutic targets for conditions such as anxiety, neurological disorders, cardiovascular health, and diabetes. Understanding and detecting NPs is crucial because of their complex functions in health and disease. Historically, identifying NPs via wet lab techniques has been time-consuming and costly. However, integrating computational methods has shown the potential to improve efficiency, accuracy, and cost-effectiveness. Computational techniques, such as artificial intelligence and machine learning, have been extensively researched in recent years for the identification of NPs. This review explores the application of machine learning (ML) techniques in predicting various aspects of NPs, including their sequences, cleavage sites, and precursors. Additionally, it provides insights into databases containing NP metadata and specialized tools used in this domain.
Affiliation(s)
- Roya Rahmani
  - Student Research Committee, Tabriz University of Medical Sciences, Tabriz, Iran
  - Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
- Leila R. Kalankesh
  - Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
  - Tabriz University of Medical Sciences, Research Center of Psychiatry and Behavioral Sciences, Tabriz, East Azerbaijan, Iran
- Reza Ferdousi
  - Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
4
Liang Y, Cao M, Zhang S. NeuroPred-ResSE: Predicting neuropeptides by integrating residual block and squeeze-excitation attention mechanism. Anal Biochem 2024; 695:115648. [PMID: 39154878 DOI: 10.1016/j.ab.2024.115648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 07/31/2024] [Accepted: 08/15/2024] [Indexed: 08/20/2024]
Abstract
Neuropeptides act as signaling molecules and play crucial roles in regulating neurological function, which provides new opportunities for developing drugs to treat neurological diseases. It is therefore necessary to develop a rapid and accurate prediction model for neuropeptides. Although a few prediction tools have been developed, there is room to improve prediction accuracy using deep learning approaches. In this paper, we establish the NeuroPred-ResSE model based on a residual block and a squeeze-excitation attention mechanism. First, we extract multiple features by using one-hot coding based on the NT5CT5 sequence, dipeptide deviation from expected mean, and natural vector. Then, we integrate the residual block and the squeeze-excitation attention mechanism, which can capture and identify the most relevant attribute features. Finally, the accuracies on the training set and test set are 97.16% and 96.60%, based on 5-fold cross-validation and an independent test, respectively, and other evaluation metrics are also satisfactory. The experimental results show that the NeuroPred-ResSE model outperforms existing state-of-the-art models, and that it is an effective, intelligent, and robust prediction tool. The datasets and source codes are available at https://github.com/yunyunliang88/NeuroPred-ResSE.
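For readers unfamiliar with the squeeze-excitation mechanism named in this abstract, the following is a minimal, dependency-free sketch of the generic technique: each channel is averaged ("squeeze"), passed through a small bottleneck with ReLU and sigmoid ("excitation"), and the resulting gate rescales the channel. It is not the authors' implementation; the function name, the weight matrices `w_reduce` and `w_expand`, the omission of bias terms, and the toy input are all assumptions made for illustration, and the residual connection used in the paper is not shown.

```python
import math


def squeeze_excitation(feature_map, w_reduce, w_expand):
    """Apply a squeeze-excitation gate to a list of channels.

    feature_map: list of C channels, each a list of floats.
    w_reduce:    H x C weight matrix (bottleneck layer, hypothetical values).
    w_expand:    C x H weight matrix (restores one gate per channel).
    Returns (rescaled feature map, per-channel gates).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    squeezed = [sum(ch) / len(ch) for ch in feature_map]
    # Excitation, step 1: bottleneck linear layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    # Excitation, step 2: linear layer followed by sigmoid, one gate in (0, 1) per channel.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    # Scale: multiply every value in a channel by that channel's gate.
    scaled = [[g * x for x in ch] for g, ch in zip(gates, feature_map)]
    return scaled, gates


# Toy input: 2 channels of length 2, bottleneck of size 1 (all values hypothetical).
features = [[1.0, 3.0], [2.0, 2.0]]
scaled, gates = squeeze_excitation(features, w_reduce=[[0.5, 0.5]], w_expand=[[1.0], [-1.0]])
print(gates)
```

In a trained network the two weight matrices are learned, so the gates end up emphasizing channels that are informative for the classification task.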
Affiliation(s)
- Yunyun Liang
  - School of Science, Xi'an Polytechnic University, Xi'an, 710048, PR China
- Mengyi Cao
  - School of Science, Xi'an Polytechnic University, Xi'an, 710048, PR China
- Shengli Zhang
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
5
Xia H, Dong C, Chen X, Wei Z, Gu L, Zhu X. SGTCDA: Prediction of circRNA-drug sensitivity associations with interpretable graph transformers and effective assessment. BMC Genomics 2024; 25:1113. [PMID: 39567908 PMCID: PMC11577602 DOI: 10.1186/s12864-024-11022-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/07/2024] [Accepted: 11/08/2024] [Indexed: 11/22/2024] [Open Access]
Abstract
CircRNAs are a type of circular non-coding RNA whose associations with drug sensitivities have been demonstrated in recent studies. Due to the high cost of biomedical experiments for detecting the associations between circRNAs and drug sensitivities, several computational methods have been developed. However, these methods were evaluated mainly based on 5- or 10-fold cross-validation, which is often over-optimistic. Furthermore, there are technical issues with these models, such as over-smoothing and over-squashing. To address these issues, we propose a strategy to evaluate models based on independent test sets for association prediction-related studies. In the light of this effective assessment, we constructed a model, SGTCDA, by integrating structural deep network embedding (SDNE) and a graph transformer to predict the potential associations of circRNA-drug sensitivity, which can efficiently capture long-range dependencies and local structural information of nodes. Our results on the training sets and the independent test sets indicate that SGTCDA outperforms the other state-of-the-art models, demonstrating its capacity for accurate prediction of circRNA-drug sensitivity. Moreover, we leveraged EdgeSHAPer to explain the performance of the proposed SGTCDA model, which illustrates that the edges between drugs are more important than other edges for the performance of the model. The source code and dataset of SGTCDA are available at: https://github.com/hwxia/SGTCDA.
Affiliation(s)
- Hongwei Xia
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, 230036, China
  - Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Hefei, Anhui, 230036, China
  - Research Center for Agricultural Information Perception and Intelligent Computing Engineering of Anhui Province, Hefei, Anhui, 230036, China
- Caiyue Dong
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, 230036, China
  - Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Hefei, Anhui, 230036, China
  - Research Center for Agricultural Information Perception and Intelligent Computing Engineering of Anhui Province, Hefei, Anhui, 230036, China
- Xinxing Chen
  - School of Life Sciences, Anhui Agricultural University, Hefei, Anhui, 230036, China
- Zhuoyu Wei
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, 230036, China
  - Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Hefei, Anhui, 230036, China
  - Research Center for Agricultural Information Perception and Intelligent Computing Engineering of Anhui Province, Hefei, Anhui, 230036, China
- Lichuan Gu
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, 230036, China
  - Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Hefei, Anhui, 230036, China
  - Research Center for Agricultural Information Perception and Intelligent Computing Engineering of Anhui Province, Hefei, Anhui, 230036, China
- Xiaolei Zhu
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui, 230036, China
  - Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Hefei, Anhui, 230036, China
  - Research Center for Agricultural Information Perception and Intelligent Computing Engineering of Anhui Province, Hefei, Anhui, 230036, China
6
Gao W, Zhao J, Gui J, Wang Z, Chen J, Yue Z. Comprehensive Assessment of BERT-Based Methods for Predicting Antimicrobial Peptides. J Chem Inf Model 2024; 64:7772-7785. [PMID: 39316765 DOI: 10.1021/acs.jcim.4c00507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2024]
Abstract
In recent years, the prediction of antimicrobial peptides (AMPs) has gained prominence due to their high antibacterial activity and reduced susceptibility to drug resistance, making them potential antibiotic substitutes. To advance the field of AMP recognition, an increasing number of natural language processing methods are being applied. These methods exhibit diversity in terms of pretraining models, pretraining data sets, word vector embeddings, feature encoding methods, and downstream classification models. Here, we provide a comprehensive survey of current BERT-based methods for AMP prediction. An independent benchmark test data set is constructed to evaluate the predictive capabilities of the surveyed tools. Furthermore, we compared the predictive performance of these computational methods based on six different AMP public databases. LM_pred (BFD) outperformed all other surveyed tools owing to its abundant pretraining data and its unique vector embedding approach. To avoid the impact of varying training data sets used by different methods on prediction performance, we retrained the models and performed 5-fold cross-validation experiments using the same data set. Additionally, to explore the applicability and generalization ability of the models, we constructed a short peptide data set and an external data set to test the retrained models. Although these BERT-based prediction methods can achieve good prediction performance, there is still room for improvement in recognition accuracy. Given the continued improvement of protein language models, we propose an AMP prediction method based on the ESM-2 pretrained model, called iAMP-bert. Experimental results demonstrate that iAMP-bert outperforms other approaches. iAMP-bert is freely accessible to the public at http://iamp.aielab.cc/.
Affiliation(s)
- Wanling Gao
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui 230036, China
- Jun Zhao
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui 230036, China
- Jianfeng Gui
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui 230036, China
- Zehan Wang
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui 230036, China
- Jie Chen
  - National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, Guangdong 518060, China
- Zhenyu Yue
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, Anhui 230036, China
7
Wen J, Ding Z, Wei Z, Xia H, Zhang Y, Zhu X. NeuroPpred-SHE: An interpretable neuropeptides prediction model based on selected features from hand-crafted features and embeddings of T5 model. Comput Biol Med 2024; 181:109048. [PMID: 39182368 DOI: 10.1016/j.compbiomed.2024.109048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 08/13/2024] [Accepted: 08/18/2024] [Indexed: 08/27/2024]
Abstract
Neuropeptides are the most ubiquitous neurotransmitters in the immune system, regulating various biological processes. Neuropeptides play a significant role in the discovery of new drugs and targets for nervous system disorders. Traditional experimental methods for identifying neuropeptides are time-consuming and costly. Although several computational methods have been developed to predict neuropeptides, their accuracy is still not satisfactory because of the limited representability of the extracted features. In this work, we propose an efficient and interpretable model, NeuroPpred-SHE, for predicting neuropeptides by selecting the optimal feature subset from both hand-crafted features and embeddings of a protein language model. Specifically, we first employed a pre-trained T5 protein language model to extract embedding features and twelve other encoding methods to extract hand-crafted features from peptide sequences. Second, we fused the embedding features and hand-crafted features to enhance feature representability. Third, we utilized random forest (RF), Max-Relevance and Min-Redundancy (mRMR), and eXtreme Gradient Boosting (XGBoost) methods to select the optimal feature subset from the fused features. Finally, we employed five machine learning methods (GBDT, XGBoost, SVM, MLP, and LightGBM) to build the models. Our results show that the model based on GBDT achieves the best performance. Furthermore, our final model was compared with other state-of-the-art methods on an independent test set; the results indicate that our model achieves an AUROC of 97.8%, which is higher than all the other state-of-the-art predictors. Our model is available at: https://github.com/wenjean/NeuroPpred-SHE.
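The fuse-then-select pattern described in this abstract can be sketched in a few lines. The example below concatenates two feature views per sample and keeps the top-k features ranked by a simple class-mean difference. That score is a deliberately simple stand-in and is not one of the selectors used in the paper (RF, mRMR, XGBoost); the function names and toy data are likewise hypothetical.

```python
def fuse(embedding_features, handcrafted_features):
    """Concatenate two per-sample feature vectors into one fused vector."""
    return [e + h for e, h in zip(embedding_features, handcrafted_features)]


def select_top_k(X, y, k):
    """Keep the k features with the largest absolute difference in class means.

    X: list of samples (each a list of floats); y: binary labels (1 or 0).
    Assumes both classes are present. Returns (kept column indices, reduced X).
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        # Stand-in relevance score: gap between the positive and negative class means.
        scores.append((abs(sum(pos) / len(pos) - sum(neg) / len(neg)), j))
    # Highest score first; ties broken by column index for determinism.
    top = sorted(scores, key=lambda s: (-s[0], s[1]))[:k]
    keep = sorted(j for _, j in top)
    return keep, [[row[j] for j in keep] for row in X]


# Toy data: 2 embedding dimensions plus 1 hand-crafted feature per peptide (hypothetical).
embeddings = [[0.1, 0.9], [0.2, 0.1]]
handcrafted = [[5.0], [5.0]]
labels = [1, 0]
fused = fuse(embeddings, handcrafted)
kept, reduced = select_top_k(fused, labels, k=2)
print(kept, reduced)
```

In this toy case the hand-crafted feature is identical across classes, so it carries no signal and is dropped; a downstream classifier would then be trained on the reduced matrix.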
Affiliation(s)
- Jian Wen
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
- Zhijie Ding
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
- Zhuoyu Wei
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
- Hongwei Xia
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
- Yong Zhang
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
- Xiaolei Zhu
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
8
Liu S, Shi T, Yu J, Li R, Lin H, Deng K. Research on Bitter Peptides in the Field of Bioinformatics: A Comprehensive Review. Int J Mol Sci 2024; 25:9844. [PMID: 39337334 PMCID: PMC11432553 DOI: 10.3390/ijms25189844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2024] [Revised: 09/06/2024] [Accepted: 09/09/2024] [Indexed: 09/30/2024] [Open Access]
Abstract
Bitter peptides are small molecular peptides produced by the hydrolysis of proteins under acidic, alkaline, or enzymatic conditions. These peptides can enhance food flavor and offer various health benefits, with attributes such as antihypertensive, antidiabetic, antioxidant, antibacterial, and immune-regulating properties. They show significant potential in the development of functional foods and the prevention and treatment of diseases. This review introduces the diverse sources of bitter peptides and discusses the mechanisms of bitterness generation and their physiological functions in the taste system. Additionally, it emphasizes the application of bioinformatics in bitter peptide research, including the establishment and improvement of bitter peptide databases, the use of quantitative structure-activity relationship (QSAR) models to predict bitterness thresholds, and the latest advancements in classification prediction models built using machine learning and deep learning algorithms for bitter peptide identification. Future research directions include enhancing databases, diversifying models, and applying generative models to deepen bitter peptide research and uncover more practical applications.
Affiliation(s)
- Hao Lin
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; (S.L.); (T.S.); (J.Y.); (R.L.)
- Kejun Deng
  - School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; (S.L.); (T.S.); (J.Y.); (R.L.)
9
Li H, Meng J, Wang Z, Tang Y, Xia S, Wang Y, Qin Z, Luan Y. miPEPPred-FRL: A Novel Method for Predicting Plant MiRNA-Encoded Peptides Using Adaptive Feature Representation Learning. J Chem Inf Model 2024; 64:2889-2900. [PMID: 37733290 DOI: 10.1021/acs.jcim.3c01020] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 09/22/2023]
Abstract
MicroRNAs (miRNAs) are an essential type of small RNA molecule that play significant regulatory roles in organisms. Recent studies have demonstrated that small open reading frames (sORFs) harbored in primary miRNAs (pri-miRNAs) can encode small peptides, known as miPEPs. Plant miPEPs can increase the abundance and activity of cognate miRNAs by promoting the transcription of their corresponding pri-miRNAs, thereby modulating plant traits. Biological experiments are the most effective way to accurately identify miPEPs; however, they are time-consuming and expensive. Hence, an efficient computational method for the identification of miPEPs on a large scale is highly desirable. Up to now, there have been no specialized computational tools for identifying miPEPs. In this work, a novel predictor named miPEPPred-FRL is proposed, based on an adaptive feature representation learning framework that consists of a feature transformation module and a cascade architecture. The feature transformation module, which integrates a newly designed feature selection method and classifier selection rule, converts sequence-based features into primary class and probabilistic features, which are then fed into the improved cascade architecture to obtain more stable and discriminative augmented features. Finally, the augmented features are utilized to construct the final predictor. Cross-validation experiments illustrate that the novel feature selection method and classifier selection rule contribute to boosting the feature representation ability of the framework. Furthermore, the high accuracy of miPEPPred-FRL on independent testing data suggests that it is a trustworthy and valuable tool for the identification of miPEPs.
Affiliation(s)
- Haibin Li
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
- Jun Meng
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
- Zhaowei Wang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
- Youwei Tang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
- Shihao Xia
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
- Yu Wang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
- Zhaojing Qin
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
- Yushi Luan
  - School of Bioengineering, Dalian University of Technology, Dalian, Liaoning 116024, China
10
Le NQK. Leveraging transformers-based language models in proteome bioinformatics. Proteomics 2023; 23:e2300011. [PMID: 37381841 DOI: 10.1002/pmic.202300011] [Citation(s) in RCA: 21] [Impact Index Per Article: 10.5] [Received: 05/14/2023] [Revised: 06/13/2023] [Accepted: 06/13/2023] [Indexed: 06/30/2023]
Abstract
In recent years, the rapid growth of biological data has increased interest in using bioinformatics to analyze and interpret this data. Proteomics, which studies the structure, function, and interactions of proteins, is a crucial area of bioinformatics. Using natural language processing (NLP) techniques in proteomics is an emerging field that combines machine learning and text mining to analyze biological data. Recently, transformer-based NLP models have gained significant attention for their ability to process variable-length input sequences in parallel, using self-attention mechanisms to capture long-range dependencies. In this review paper, we discuss the recent advancements in transformer-based NLP models in proteome bioinformatics and examine their advantages, limitations, and potential applications to improve the accuracy and efficiency of various tasks. Additionally, we highlight the challenges and future directions of using these models in proteome bioinformatics research. Overall, this review provides valuable insights into the potential of transformer-based NLP models to revolutionize proteome bioinformatics.
Affiliation(s)
- Nguyen Quoc Khanh Le
  - Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
  - AIBioMed Research Group, Taipei Medical University, Taipei, Taiwan
  - Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei, Taiwan
  - Translational Imaging Research Center, Taipei Medical University Hospital, Taipei, Taiwan