1
Zhang H, Liu S, Su W, Xie X, Yu J, Dao F, Yang M, Lyu H, Lin H. NeuroScale: evolutional scale-based protein language models enable prediction of neuropeptides. BMC Biol 2025; 23:142. [PMID: 40437538 PMCID: PMC12121104 DOI: 10.1186/s12915-025-02243-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/22/2025] [Accepted: 05/12/2025] [Indexed: 06/01/2025] Open
Abstract
BACKGROUND Neuropeptides (NPs) are critical signaling molecules involved in various physiological and behavioral processes, including development, metabolism, and memory. They function within both the nervous and endocrine systems and have emerged as promising therapeutic targets for a range of diseases. Despite their significance, the accurate identification of NPs remains a challenge, necessitating the development of more effective computational approaches. RESULTS In this study, we introduce NeuroScale, a multi-channel neural network model leveraging evolutionary scale modeling (ESM) for the precise prediction of NPs. By integrating the GoogLeNet framework, NeuroScale effectively captures multi-scale NP features, enabling robust and accurate classification. Extensive benchmarking demonstrates its superior performance, consistently achieving an area under the receiver operating characteristic curve (AUC) exceeding 0.97. Additionally, we systematically analyzed the impact of protein sequence similarity thresholds and multi-scale sequence lengths on model performance, further validating NeuroScale's robustness and generalizability. CONCLUSIONS NeuroScale represents a significant advancement in neuropeptide prediction, offering both high accuracy and adaptability to diverse sequence characteristics. Its ability to generalize across different sequence similarity thresholds and lengths underscores its potential as a reliable tool for neuropeptide discovery and peptide-based drug development. By providing a scalable and efficient deep learning framework, NeuroScale paves the way for future research in neuropeptide function, disease mechanisms, and therapeutic applications.
Affiliation(s)
- Hongqi Zhang
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China
- Shanghua Liu
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China
- Wei Su
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China
- Xueqin Xie
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China
- Junwen Yu
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China
- Fuying Dao
- School of Biological Sciences, Nanyang Technological University, Singapore, 639798, Singapore
- Mi Yang
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China
- Hao Lyu
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China
- Hao Lin
- The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China

2
Yao Y, Zhang D, Fan H, Wu T, Su Y, Bin Y. Prediction of Chemically Modified Antimicrobial Peptides and Their Sub-functional Activities Using Hybrid Features. Probiotics Antimicrob Proteins 2025:10.1007/s12602-025-10575-6. [PMID: 40397268 DOI: 10.1007/s12602-025-10575-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 04/29/2025] [Indexed: 05/22/2025]
Abstract
Antimicrobial peptides (AMPs) demonstrate a broad spectrum of activities against various pathogens, thereby offering a promising strategy to mitigate the urgent challenge of antimicrobial resistance. Recent studies indicate that chemically modified AMPs (cmAMPs), which contain chemically modified amino acids, have the potential to alleviate the adverse effects commonly associated with conventional AMPs. Nevertheless, there remains a notable lack of computational methods specifically designed for the analysis and prediction of cmAMPs and their sub-functions. In this study, we propose a two-layer model, termed iCMAMP, for the identification of cmAMPs and their sub-functional activities. The first layer, referred to as iCMAMP-1L, integrates three categories encompassing seven distinct groups of features, in conjunction with an ensemble method designed to enhance predictive accuracy for cmAMPs. This ensemble approach effectively extracts relevant insights from a heterogeneous array of feature sets while addressing potential dimensionality challenges. On the test dataset, iCMAMP-1L achieved an ACC of 0.934 and an MCC of 0.868, representing improvements of 3.4% and 6.8%, respectively, over AntiMPmod, which is the sole existing method for predicting cmAMPs. A comparative analysis between cmAMPs and their corresponding AMPs revealed that chemical modifications can significantly reduce the hemolysis and toxicity associated with AMPs, while the functional characteristics of the peptides are primarily determined by their sequences. The second layer of our model, designated iCMAMP-2L, employs a multi-label classification approach to predict the sub-functional activities of cmAMPs, with a specific focus on dipeptide composition-based features. On the test dataset, iCMAMP-2L achieved an Accuracy of 0.390 and an Absolute true of 0.621. The data and Python code used in the iCMAMP model are available at https://github.com/swicher123/iCMAMP/tree/master.
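As a rough illustration of the dipeptide composition features mentioned in this abstract, the sketch below computes the relative frequency of each of the 400 ordered pairs of standard amino acids in a sequence. It is not taken from the iCMAMP repository: the function name `dipeptide_composition`, the toy sequence, and the choice to skip pairs containing non-standard symbols (which a model for chemically modified residues would presumably need to handle differently) are all assumptions made for this example.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def dipeptide_composition(sequence):
    """Map each of the 400 dipeptides to its relative frequency in `sequence`."""
    seq = sequence.upper()
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    # Slide a window of width 2 over the sequence.
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # assumption: pairs with non-standard symbols are skipped
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short (or no valid pairs): return an all-zero vector.
        return {pair: 0.0 for pair in counts}
    return {pair: n / total for pair, n in counts.items()}


# Toy sequence for illustration only, not from the paper.
features = dipeptide_composition("ACAC")
print(len(features), features["AC"], features["CA"])
```

The resulting 400-dimensional vector has a fixed length regardless of peptide length, which is one reason composition-based encodings are commonly fed to conventional classifiers.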
Affiliation(s)
- Yujie Yao
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
- Daijun Zhang
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
- Henghui Fan
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
- Ting Wu
- Department of Infectious Diseases & Anhui Province Key Laboratory of Infectious Diseases, The First Affiliated Hospital of Anhui Medical University, Hefei, 230022, Anhui, China
- Institute of Bacterial Resistance & Anhui Center for Surveillance of Bacterial Resistance, Anhui Medical University, Hefei, 230022, Anhui, China
- Yansen Su
- School of Artificial Intelligence, Anhui University, Hefei, 230601, Anhui, China
- Yannan Bin
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China

3
Li J, Xiong S, Shi H, Cui F, Zhang Z, Wei L. NeuroPred-AIMP: Multimodal Deep Learning for Neuropeptide Prediction via Protein Language Modeling and Temporal Convolutional Networks. J Chem Inf Model 2025; 65:4740-4750. [PMID: 40258183 DOI: 10.1021/acs.jcim.5c00444] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/23/2025]
Abstract
Neuropeptides are key signaling molecules that regulate fundamental physiological processes ranging from metabolism to cognitive function. However, accurate identification is highly challenging due to sequence heterogeneity, obscured functional motifs, and limited experimentally validated data. Accurate identification of neuropeptides is critical for advancing neurological disease therapeutics and peptide-based drug design. Existing neuropeptide identification methods rely on manual features combined with traditional machine learning methods, which struggle to capture the deep patterns of sequences. To address these limitations, we propose NeuroPred-AIMP (adaptive integrated multimodal predictor), an interpretable model that combines the global semantic representation of the protein language model (ESM) with the multiscale structural features of the temporal convolutional network (TCN). The model introduces a residual-enhanced adaptive feature fusion mechanism that dynamically recalibrates feature contributions to achieve robust integration of evolutionary and local sequence information. The experimental results demonstrated that the proposed model showed excellent overall performance on the independent test set, with an accuracy of 92.3% and an AUROC of 0.974. The model was also well balanced in its ability to identify positive and negative samples, with a sensitivity of 92.6% and a specificity of 92.1%, a difference of less than 0.5%. These results confirm the effectiveness of the multimodal feature strategy in the task of neuropeptide recognition.
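For readers less familiar with the metrics quoted in this abstract, the following sketch shows how accuracy, sensitivity, and specificity follow from confusion-matrix counts, and how the gap between sensitivity and specificity can serve as a simple balance check. The helper name `binary_metrics` and the counts are hypothetical and chosen for illustration; they are not the paper's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    positives = tp + fn   # all truly positive samples
    negatives = tn + fp   # all truly negative samples
    total = positives + negatives
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": tp / positives if positives else 0.0,  # true positive rate
        "specificity": tn / negatives if negatives else 0.0,  # true negative rate
    }


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=90, fp=20, tn=80, fn=10)
# A small gap suggests the classifier treats both classes similarly.
balance_gap = abs(metrics["sensitivity"] - metrics["specificity"])
print(metrics, balance_gap)
```

Reporting sensitivity and specificity alongside accuracy matters most when the positive and negative classes are imbalanced, since accuracy alone can hide poor performance on the minority class.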
Affiliation(s)
- Jinjin Li
- Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China
- Shuwen Xiong
- Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China
- Hua Shi
- School of Optoelectronic and Communication Engineering, Xiamen University of Technology, Xiamen 361024, China
- Feifei Cui
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Zilong Zhang
- School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Leyi Wei
- Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China
- School of Software, Shandong University, Jinan 250101, China

4
Fields L, Dang TC, Tran VNH, Ibarra AE, Li L. Decoding Neuropeptide Complexity: Advancing Neurobiological Insights from Invertebrates to Vertebrates through Evolutionary Perspectives. ACS Chem Neurosci 2025; 16:1662-1679. [PMID: 40261092 DOI: 10.1021/acschemneuro.5c00053] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/24/2025] Open
Abstract
Neuropeptides are vital signaling molecules involved in neural communication, hormonal regulation, and stress response across diverse taxa. Despite their critical roles, neuropeptide research remains challenging due to their low abundance, complex post-translational modifications (PTMs), and dynamic expression patterns. Mass spectrometry (MS)-based neuropeptidomics has revolutionized peptide identification and quantification, enabling the high-throughput characterization of neuropeptides and their PTMs. However, the complexity of vertebrate neural networks poses significant challenges for functional studies. Invertebrate models, such as Cancer borealis, Drosophila melanogaster, and Caenorhabditis elegans, offer simplified neural circuits, well-characterized systems, and experimental tools for elucidating the functional roles of neuropeptides. These models have revealed conserved neuropeptide families, including allatostatins, RFamides, and tachykinin-related peptides, whose vertebrate homologues regulate analogous physiological functions. Recent advancements in MS techniques, including ion mobility spectrometry and MALDI MS imaging, have further enhanced the spatial and temporal resolution of neuropeptide analysis, allowing for insights into peptide signaling systems. Invertebrate neuropeptide research not only expands our understanding of conserved neuropeptide functions but also informs translational applications including the development of peptide-based therapeutics. This review highlights the utility of invertebrate models in neuropeptide discovery, emphasizing their contributions to uncovering fundamental biological principles and their relevance to vertebrate systems.
Affiliation(s)
- Lauren Fields
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Tina C Dang
- School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, Wisconsin 53705, United States
- Vu Ngoc Huong Tran
- School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, Wisconsin 53705, United States
- Angel E Ibarra
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
- Lingjun Li
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
- School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, Wisconsin 53705, United States
- Lachman Institute for Pharmaceutical Development, School of Pharmacy, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States
- Wisconsin Center for NanoBioSystems, School of Pharmacy, University of Wisconsin-Madison, Madison, Wisconsin 53705, United States

5
Akbar S, Raza A, Awan HH, Zou Q, Alghamdi W, Saeed A. pNPs-CapsNet: Predicting Neuropeptides Using Protein Language Models and FastText Encoding-Based Weighted Multi-View Feature Integration with Deep Capsule Neural Network. ACS Omega 2025; 10:12403-12416. [PMID: 40191328 PMCID: PMC11966582 DOI: 10.1021/acsomega.4c11449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2024] [Revised: 02/04/2025] [Accepted: 03/07/2025] [Indexed: 04/09/2025]
Abstract
Neuropeptides (NPs) are critical signaling molecules that are essential in numerous physiological processes and possess significant therapeutic potential. Computational prediction of NPs has emerged as a promising alternative to traditional experimental methods, often labor-intensive, time-consuming, and expensive. Recent advancements in computational peptide models provide a cost-effective approach to identifying NPs, characterized by high selectivity toward target cells and minimal side effects. In this study, we propose a novel deep capsule neural network-based computational model, namely pNPs-CapsNet, to predict NPs and non-NPs accurately. Input samples are numerically encoded using pretrained protein language models, including ESM, ProtBERT-BFD, and ProtT5, to extract attention mechanism-based contextual and semantic features. A differential evolution-based weighted feature integration method is utilized to construct a multiview vector. Additionally, a two-tier feature selection strategy, comprising MRMD and SHAP analysis, is developed to identify and select optimal features. Finally, the novel capsule neural network (CapsNet) is trained using the selected optimal feature set. The proposed pNPs-CapsNet model achieved a remarkable predictive accuracy of 98.10% and an AUC of 0.98. To validate the generalization capability of the pNPs-CapsNet model, independent samples reported an accuracy of 95.21% and an AUC of 0.96. The pNPs-CapsNet model outperforms existing state-of-the-art models, demonstrating 4% and 2.5% improved predictive accuracy for training and independent data sets, respectively. The demonstrated efficacy and consistency of pNPs-CapsNet underline its potential as a valuable and robust tool for advancing drug discovery and academic research.
Affiliation(s)
- Shahid Akbar
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan 23200, Khyber Pakhtunkhwa, Pakistan
- Ali Raza
- Department of Computer Science, Bahria University, Islamabad 44220, Pakistan
- Hamid Hussain Awan
- Department of Computer Science, Rawalpindi Women University, Rawalpindi 46300, Punjab, Pakistan
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, PR China
- Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Aamir Saeed
- Department of Computer Science and IT, University of Engineering and Technology, Jalozai Campus, Peshawar 25000, Pakistan

6
Asim MN, Asif T, Mehmood F, Dengel A. Peptide classification landscape: An in-depth systematic literature review on peptide types, databases, datasets, predictors architectures and performance. Comput Biol Med 2025; 188:109821. [PMID: 39987697 DOI: 10.1016/j.compbiomed.2025.109821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Revised: 02/03/2025] [Accepted: 02/05/2025] [Indexed: 02/25/2025]
Abstract
Peptides are gaining significant attention in diverse fields; for example, the pharmaceutical market has seen a steady rise in peptide-based therapeutics over the past six decades. Peptides have been utilized in the development of distinct applications, including inhibitors of SARS-CoV-2 and treatments for conditions like cancer and diabetes. Distinct types of peptides possess unique characteristics, and the development of peptide-specific applications requires the discrimination of one peptide type from others. To the best of our knowledge, approximately 230 Artificial Intelligence (AI) driven applications have been developed for 22 distinct types of peptides, yet there remains significant room for development of new predictors. This comprehensive review addresses this gap by providing a consolidated platform for the development of AI-driven peptide classification applications. This paper offers several key contributions, including presenting the biological foundations of 22 unique peptide types and categorizing them into four main classes: Regulatory, Therapeutic, Nutritional, and Delivery Peptides. It offers an in-depth overview of 47 databases that have been used to develop peptide classification benchmark datasets. It summarizes details of 288 benchmark datasets that are used in the development of diverse types of AI-driven peptide classification applications. It provides a detailed summary of 197 sequence representation learning methods and 94 classifiers that have been used to develop 230 distinct AI-driven peptide classification applications. Across 22 distinct types of peptide classification tasks related to 288 benchmark datasets, it reports performance values of 230 AI-driven peptide classification applications. It summarizes experimental settings and various evaluation measures that have been employed to assess the performance of AI-driven peptide classification applications. The primary focus of this manuscript is to consolidate scattered information into a single comprehensive platform. This resource will greatly assist researchers who are interested in developing new AI-driven peptide classification applications.
Affiliation(s)
- Muhammad Nabeel Asim
- German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany; Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany
- Tayyaba Asif
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Faiza Mehmood
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; Institute of Data Sciences, University of Engineering and Technology, Lahore, Pakistan
- Andreas Dengel
- German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany; Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany

7
Yue Y, Fan H, Zhao J, Xia J. Protein language model-based prediction for plant miRNA encoded peptides. PeerJ Comput Sci 2025; 11:e2733. [PMID: 40134870 PMCID: PMC11935769 DOI: 10.7717/peerj-cs.2733] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2024] [Accepted: 02/05/2025] [Indexed: 03/27/2025]
Abstract
Plant miRNA encoded peptides (miPEPs), which are short peptides derived from small open reading frames within primary miRNAs, play a crucial role in regulating diverse plant traits. Identifying plant miPEPs is challenging because few known miPEPs are available for training. Existing prediction methods, including miPEPPred-FRL, rely on manually encoded features to infer plant miPEPs. Recent advances in deep learning modeling of protein sequences provide an opportunity to improve the representation of key features, leveraging large datasets of protein sequences. In this study, we propose an accurate prediction model, called pLM4PEP, which integrates ESM2 peptide embedding with machine learning methods. Our model not only demonstrates precise identification capabilities for plant miPEPs, but also achieves remarkable results across diverse datasets that include other bioactive peptides. The source code and datasets of pLM4PEP are available at https://github.com/xialab-ahu/pLM4PEP.
Affiliation(s)
- Yishan Yue
- College of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang, China
- Henghui Fan
- Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China
- Jianping Zhao
- College of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang, China
- Junfeng Xia
- Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China

8
Zhenghui L, Wenxing H, Yan W, Jihong Z, Xiaojun X, Lixin G, Mengshan L. Ensemble learning based on bi-directional gated recurrent unit and convolutional neural network with word embedding module for bioactive peptide prediction. Food Chem 2025; 468:142464. [PMID: 39675273 DOI: 10.1016/j.foodchem.2024.142464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2024] [Revised: 11/12/2024] [Accepted: 12/11/2024] [Indexed: 12/17/2024]
Abstract
Bioactive peptides, as small protein fragments, are essential mediators of diverse physiological activities, such as antimicrobial, anti-inflammatory, anticancer, antioxidant, and immunomodulatory functions. Despite their substantial potential in pharmaceuticals and the food industry, conventional methods for peptide classification and activity prediction are limited by high costs, time-intensive procedures, and extensive data processing requirements. Here, we present BioPepPred-DLEmb, a novel computational model integrating Convolutional Neural Networks (CNNs) and Bidirectional Gated Recurrent Units (BiGRUs), augmented with natural language processing to encode amino acids into information-dense vectors. Evaluated across nine bioactive peptide datasets, BioPepPred-DLEmb demonstrates superior predictive accuracy (0.909) and sensitivity (0.911) compared to traditional methods. Through UMAP visualization and Kplogo analysis, the model effectively differentiates peptide activity states and identifies key biomarkers. The predicted antimicrobial peptides (Pred-AMPs) exhibit potent efficacy in vitro, achieving low micromolar inhibitory concentrations (2-16 μmol/L) against pathogens such as Escherichia coli and Acinetobacter baumannii. These findings establish a robust foundation for bioactive peptide development, with implications for advancements in precision medicine, personalized therapies, and functional food innovations.
Affiliation(s)
- Lai Zhenghui
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
- Hu Wenxing
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
- Wu Yan
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
- Zhu Jihong
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
- Xie Xiaojun
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
- Guan Lixin
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China
- Li Mengshan
- College of Physics and Electronic Information, Gannan Normal University, Ganzhou 341000, Jiangxi, China

9
Rahmani R, Kalankesh LR, Ferdousi R. Computational approaches for identifying neuropeptides: A comprehensive review. Mol Ther Nucleic Acids 2025; 36:102409. [PMID: 40171446 PMCID: PMC11960512 DOI: 10.1016/j.omtn.2024.102409] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 04/03/2025]
Abstract
Neuropeptides (NPs) are key signaling molecules that interact with G protein-coupled receptors, influencing neuronal activities and developmental pathways, as well as the endocrine and immune systems. They are significant in disease contexts, offering potential therapeutic targets for conditions such as anxiety, neurological disorders, cardiovascular health, and diabetes. Understanding and detecting NPs is crucial because of their complex functions in health and disease. Historically, identifying NPs via wet lab techniques has been time-consuming and costly. However, integrating computational methods has shown the potential to improve efficiency, accuracy, and cost-effectiveness. Computational techniques, such as artificial intelligence and machine learning, have been extensively researched in recent years for the identification of NP. This review explores the application of machine learning (ML) techniques in predicting various aspects of NPs, including their sequences, cleavage sites, and precursors. Additionally, it provides insights into databases containing NP metadata and specialized tools used in this domain.
Affiliation(s)
- Roya Rahmani
- Student Research Committee, Tabriz University of Medical Science, Tabriz, Iran
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
- Leila R. Kalankesh
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran
- Tabriz University of Medical Sciences, Research Center of Psychiatry and Behavioral Sciences Tabriz, East Azerbaijan, Iran
- Reza Ferdousi
- Department of Health Information Technology, School of Management and Medical Informatics, Tabriz University of Medical Sciences, Tabriz, Iran

10
Liang Y, Ma X, Li J, Zhang S. iACVP-MR: Accurate Identification of Anti-coronavirus Peptide based on Multiple Features Information and Recurrent Neural Network. Curr Med Chem 2025; 32:2055-2067. [PMID: 38549527 DOI: 10.2174/0109298673277663240101111507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 11/26/2023] [Accepted: 11/30/2023] [Indexed: 05/14/2024]
Abstract
BACKGROUND Over the years, viruses have caused human illness and threatened human health. Therefore, it is pressing to develop anti-coronavirus infection drugs with clear function, low cost, and high safety. Anti-coronavirus peptide (ACVP) is a key therapeutic agent against coronavirus. Traditional methods for finding ACVP require a great deal of money and manpower. Hence, it is a significant task to establish intelligent computational tools that enable rapid, efficient, and accurate identification of ACVP. METHODS In this paper, we construct a model named iACVP-MR to identify ACVP based on multiple features and recurrent neural networks. Multiple features are extracted by using reduced amino acid component and dipeptide component, compositions of k-spaced amino acid pairs, BLOSUM62 encoder according to the N5C5 sequence, as well as a second-order moving average approach based on 16 physicochemical properties. Then, two recurrent neural networks, long short-term memory (LSTM) and bidirectional gated recurrent unit (BiGRU) combined with an attention mechanism, are used for feature fusion and classification, respectively. RESULTS The accuracies on the ENNAVIA-C and ENNAVIA-D datasets under 10-fold cross-validation are 99.15% and 98.92%, respectively, and other evaluation indexes have also obtained satisfactory results. The experimental results show that our model is superior to other existing models. CONCLUSION The iACVP-MR model can be viewed as a powerful and intelligent tool for the accurate identification of ACVP. The datasets and source codes for iACVP-MR can be freely downloaded at https://github.com/yunyunliang88/iACVP-MR.
Affiliation(s)
- Yunyun Liang
- School of Science, Xi'an Polytechnic University, Xi'an, 710048, P.R. China
- Xinyan Ma
- School of Science, Xi'an Polytechnic University, Xi'an, 710048, P.R. China
- Jin Li
- School of Science, Xi'an Polytechnic University, Xi'an, 710048, P.R. China
- Shengli Zhang
- School of Mathematics and Statistics, Xidian University, Xi'an, 710071, P.R. China

11
Saraswat A, Sharma U, Gandotra A, Wasan L, Artham S, Maitra A, Singh B. Pred-AHCP: Robust Feature Selection-Enabled Sequence-Specific Prediction of Anti-Hepatitis C Peptides via Machine Learning. J Chem Inf Model 2024; 64:9111-9124. [PMID: 39505690 DOI: 10.1021/acs.jcim.4c00900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2024]
Abstract
Every year, an estimated 1.5 million people worldwide contract Hepatitis C, a significant contributor to liver problems. Although many studies have explored machine learning's potential to predict antiviral peptides, very few have addressed the problem of predicting peptides against specific viruses such as Hepatitis C. In this study, we demonstrate the application and fine-tuning of machine learning (ML) algorithms to predict peptides that are effective against Hepatitis C virus (HCV). We developed a fine-tuned and explainable ML model that harnesses the amino acid sequence of a peptide to predict its anti-hepatitis C potential. Specifically, features were computed based on sequence and physicochemical properties. The feature selection was performed using a combined strategy of mutual information and variance inflation factor. This facilitated the removal of redundant and multicollinear features, enhancing the model's generalizability in predicting anti-hepatitis C peptides (AHCPs). The model using the random forest algorithm produced the best performance with an accuracy of about 92%. The feature analysis highlights that the distributions of hydrophobicity, polarizability, coil-forming residues, frequency of glycine residues and the existence of dipeptide motifs VL, LV, and CC emerged as the key predictors for identifying AHCPs targeting different components of HCV. The developed model can be accessed through the Pred-AHCP web server, provided at http://tinyurl.com/web-Pred-AHCP. This resource facilitates the prediction and re-engineering of AHCPs for designing peptide-based therapeutics while also proposing an exploration of similar strategies for designing peptide inhibitors effective against other viruses. The developed ML model can also be used for validating peptide sequences generated using generative artificial intelligence methods for further optimization.
Affiliation(s)
- Akash Saraswat
  - Department of Applied Sciences, School of Engineering and Technology, BML Munjal University, Gurugram, Haryana 122413, India
- Utsav Sharma
  - Department of Computer Science and Engineering, School of Engineering and Technology, BML Munjal University, Gurugram, Haryana 122413, India
- Aryan Gandotra
  - Department of Computer Science and Engineering, School of Engineering and Technology, BML Munjal University, Gurugram, Haryana 122413, India
- Lakshit Wasan
  - Department of Computer Science and Engineering, School of Engineering and Technology, BML Munjal University, Gurugram, Haryana 122413, India
- Sainithin Artham
  - Department of Computer Science and Engineering, School of Engineering and Technology, BML Munjal University, Gurugram, Haryana 122413, India
- Arijit Maitra
  - Department of Applied Sciences, School of Engineering and Technology, BML Munjal University, Gurugram, Haryana 122413, India
- Bipin Singh
  - Centre for Life Sciences, Mahindra University, Hyderabad, Telangana 500043, India
|
12
|
Liang Y, Cao M, Zhang S. NeuroPred-ResSE: Predicting neuropeptides by integrating residual block and squeeze-excitation attention mechanism. Anal Biochem 2024; 695:115648. [PMID: 39154878 DOI: 10.1016/j.ab.2024.115648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/10/2024] [Revised: 07/31/2024] [Accepted: 08/15/2024] [Indexed: 08/20/2024]
Abstract
Neuropeptides play crucial roles in regulating neurological function as signaling molecules, which provides new opportunities for developing drugs to treat neurological diseases. A rapid and accurate prediction model for neuropeptides is therefore needed. Although a few prediction tools have been developed, there is room to improve prediction accuracy by using a deep learning approach. In this paper, we establish the NeuroPred-ResSE model based on a residual block and a squeeze-excitation attention mechanism. First, we extract multiple features using one-hot coding based on the NT5CT5 sequence, dipeptide deviation from expected mean, and natural vector. Then, we integrate the residual block and the squeeze-excitation attention mechanism, which can capture and identify the most relevant attribute features. Finally, the accuracies on the training set and test set are 97.16% and 96.60%, based on 5-fold cross-validation and an independent test, respectively, and the other evaluation metrics are also satisfactory. The experimental results show that NeuroPred-ResSE outperforms existing state-of-the-art models and is an effective, intelligent, and robust prediction tool. The datasets and source code are available at https://github.com/yunyunliang88/NeuroPred-ResSE.
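The one-hot coding step mentioned in this abstract can be sketched as follows. The sketch assumes that "NT5CT5" means the first five N-terminal residues concatenated with the last five C-terminal residues; the abstract does not define the term, so this reading and the function names are assumptions rather than the authors' code.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def nt5ct5(seq, n=5):
    """Concatenate the first n and last n residues (assumed meaning of NT5CT5)."""
    seq = seq.upper()
    if len(seq) < n:
        raise ValueError("sequence is shorter than the terminal window")
    return seq[:n] + seq[-n:]


def one_hot(seq):
    """Flatten a sequence into a 20-per-residue binary vector.

    Non-standard residues are encoded as an all-zero row.
    """
    vector = []
    for residue in seq:
        row = [0] * len(AMINO_ACIDS)
        index = AMINO_ACIDS.find(residue)
        if index >= 0:
            row[index] = 1
        vector.extend(row)
    return vector


terminal = nt5ct5("ACDEFGHIKLMN")
encoded = one_hot(terminal)
print(terminal, len(encoded))
```

With a window of five residues at each terminus, every peptide maps to a fixed 200-dimensional vector regardless of its original length, which is what makes the representation convenient as network input.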
Affiliation(s)
- Yunyun Liang
  - School of Science, Xi'an Polytechnic University, Xi'an, 710048, PR China
- Mengyi Cao
  - School of Science, Xi'an Polytechnic University, Xi'an, 710048, PR China
- Shengli Zhang
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
|
13
|
Wen J, Ding Z, Wei Z, Xia H, Zhang Y, Zhu X. NeuroPpred-SHE: An interpretable neuropeptides prediction model based on selected features from hand-crafted features and embeddings of T5 model. Comput Biol Med 2024; 181:109048. [PMID: 39182368 DOI: 10.1016/j.compbiomed.2024.109048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/14/2023] [Revised: 08/13/2024] [Accepted: 08/18/2024] [Indexed: 08/27/2024]
Abstract
Neuropeptides are the most ubiquitous neurotransmitters in the immune system, regulating various biological processes. Neuropeptides play a significant role in the discovery of new drugs and targets for nervous system disorders. Traditional experimental methods for identifying neuropeptides are time-consuming and costly. Although several computational methods have been developed to predict neuropeptides, their accuracy is still unsatisfactory because of the limited representability of the extracted features. In this work, we propose an efficient and interpretable model, NeuroPpred-SHE, for predicting neuropeptides by selecting the optimal feature subset from both hand-crafted features and embeddings of a protein language model. Specifically, we first employed a pre-trained T5 protein language model to extract embedding features and twelve other encoding methods to extract hand-crafted features from peptide sequences. Second, we fused the embedding features and hand-crafted features to enhance feature representability. Third, we utilized random forest (RF), Max-Relevance and Min-Redundancy (mRMR), and eXtreme Gradient Boosting (XGBoost) methods to select the optimal feature subset from the fused features. Finally, we employed five machine learning methods (GBDT, XGBoost, SVM, MLP, and LightGBM) to build the models. Our results show that the model based on GBDT achieves the best performance. Furthermore, our final model was compared with other state-of-the-art methods on an independent test set; the results indicate that it achieves an AUROC of 97.8%, which is higher than that of all the other state-of-the-art predictors. Our model is available at: https://github.com/wenjean/NeuroPpred-SHE.
Affiliation(s)
- Jian Wen
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
- Zhijie Ding
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
- Zhuoyu Wei
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
- Hongwei Xia
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
- Yong Zhang
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
- Xiaolei Zhu
  - School of Information and Artificial Intelligence, Anhui Agricultural University, Hefei, 230036, China
|
14
|
Fernández-Díaz R, Cossio-Pérez R, Agoni C, Lam HT, Lopez V, Shields DC. AutoPeptideML: a study on how to build more trustworthy peptide bioactivity predictors. Bioinformatics 2024; 40:btae555. [PMID: 39292535 PMCID: PMC11438549 DOI: 10.1093/bioinformatics/btae555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/24/2024] [Revised: 08/08/2024] [Accepted: 09/17/2024] [Indexed: 09/20/2024]
Abstract
MOTIVATION Automated machine learning (AutoML) solutions can bridge the gap between new computational advances and their real-world applications by enabling experimental scientists to build their own custom models. We examine different steps in the development life-cycle of peptide bioactivity binary predictors and identify key steps where automation can result not only in a more accessible method, but also in a more robust and interpretable evaluation, leading to more trustworthy models. RESULTS We present a new automated method for drawing negative peptides that achieves better balance between specificity and generalization than current alternatives. We study the effect of homology-based partitioning for generating the training and testing data subsets and demonstrate that model performance is overestimated when no such homology correction is used, which indicates that prior studies may have overestimated their performance when applied to new peptide sequences. We also conduct a systematic analysis of different protein language models as peptide representation methods and find that they can serve as better descriptors than a naive alternative, but that there is no significant difference across models with different sizes or algorithms. Finally, we demonstrate that an ensemble of optimized traditional machine learning algorithms can compete with more complex neural network models, while being more computationally efficient. We integrate these findings into AutoPeptideML, an easy-to-use AutoML tool to allow researchers without a computational background to build new predictive models for peptide bioactivity in a matter of minutes. AVAILABILITY AND IMPLEMENTATION Source code, documentation, and data are available at https://github.com/IBM/AutoPeptideML and a dedicated web-server at http://peptide.ucd.ie/AutoPeptideML. A static version of the software to ensure the reproduction of the results is available at https://zenodo.org/records/13363975.
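The idea behind homology-based partitioning can be sketched in a few lines: group similar sequences first, then assign whole groups to either the training or the testing subset so that near-duplicates never straddle the split. The sketch below is illustrative only. It uses `difflib.SequenceMatcher` from the Python standard library as a crude stand-in for sequence identity, whereas real pipelines typically rely on alignment-based tools; the threshold, function names, and toy peptides are assumptions and do not reproduce AutoPeptideML.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Crude similarity in [0, 1]; a stand-in for alignment-based identity."""
    return SequenceMatcher(None, a, b).ratio()


def greedy_clusters(sequences, threshold=0.6):
    """Assign each sequence to the first cluster whose representative it resembles."""
    clusters = []  # each cluster is a list; its first element is the representative
    for seq in sequences:
        for cluster in clusters:
            if similarity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


def split_by_cluster(clusters, test_fraction=0.25):
    """Fill the test set with whole clusters (smallest first) up to test_fraction."""
    total = sum(len(cluster) for cluster in clusters)
    train, test = [], []
    for cluster in sorted(clusters, key=len):
        if len(test) + len(cluster) <= test_fraction * total:
            test.extend(cluster)
        else:
            train.extend(cluster)
    return train, test


# Hypothetical toy peptides: two pairs of near-duplicates and one singleton.
peptides = ["AAAAGGGG", "AAAAGGGA", "WWWWYYYY", "WWWWYYYF", "KKKKRRRR"]
clusters = greedy_clusters(peptides)
train, test = split_by_cluster(clusters)
print(clusters, train, test)
```

A random split of the same five peptides could place `AAAAGGGG` in training and `AAAAGGGA` in testing, which is the kind of leakage the abstract reports as inflating performance estimates.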
Affiliation(s)
- Raúl Fernández-Díaz
  - IBM Research, Dublin, Dublin D15 HN66, Ireland
  - School of Medicine, University College Dublin, Dublin D04 C1P1, Ireland
  - Conway Institute of Biomolecular and Biomedical Science, University College Dublin, Dublin D04 C1P, Ireland
  - The SFI Centre for Research Training in Genomics Data Science, Ireland
- Rodrigo Cossio-Pérez
  - School of Medicine, University College Dublin, Dublin D04 C1P1, Ireland
  - Conway Institute of Biomolecular and Biomedical Science, University College Dublin, Dublin D04 C1P, Ireland
  - Department of Science and Technology, National University of Quilmes, Bernal B1876, Provincia de Buenos Aires, Argentina
- Clement Agoni
  - School of Medicine, University College Dublin, Dublin D04 C1P1, Ireland
  - Conway Institute of Biomolecular and Biomedical Science, University College Dublin, Dublin D04 C1P, Ireland
  - Discipline of Pharmaceutical Sciences, School of Health Sciences, University of KwaZulu-Natal, Durban 4000, South Africa
- Denis C Shields
  - School of Medicine, University College Dublin, Dublin D04 C1P1, Ireland
  - Conway Institute of Biomolecular and Biomedical Science, University College Dublin, Dublin D04 C1P, Ireland
|
15
|
Li H, Jiang L, Yang K, Shang S, Li M, Lv Z. iNP_ESM: Neuropeptide Identification Based on Evolutionary Scale Modeling and Unified Representation Embedding Features. Int J Mol Sci 2024; 25:7049. [PMID: 39000158 PMCID: PMC11240975 DOI: 10.3390/ijms25137049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2024] [Revised: 06/17/2024] [Accepted: 06/25/2024] [Indexed: 07/16/2024] Open
Abstract
Neuropeptides are biomolecules with crucial physiological functions. Accurate identification of neuropeptides is essential for understanding nervous system regulatory mechanisms. However, traditional analysis methods are expensive and laborious, and the development of effective machine learning models continues to be a subject of current research. Hence, in this research, we constructed an SVM-based machine learning neuropeptide predictor, iNP_ESM, by integrating protein language models Evolutionary Scale Modeling (ESM) and Unified Representation (UniRep) for the first time. Our model utilized feature fusion and feature selection strategies to improve prediction accuracy during optimization. In addition, we validated the effectiveness of the optimization strategy with UMAP (Uniform Manifold Approximation and Projection) visualization. iNP_ESM outperforms existing models on a variety of machine learning evaluation metrics, with an accuracy of up to 0.937 in cross-validation and 0.928 in independent testing, demonstrating optimal neuropeptide recognition capabilities. We anticipate improved neuropeptide data in the future, and we believe that the iNP_ESM model will have broader applications in the research and clinical treatment of neurological diseases.
Affiliation(s)
- Honghao Li
  - College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
- Liangzhen Jiang
  - College of Food and Biological Engineering, Chengdu University, Chengdu 610106, China
  - Country Key Laboratory of Coarse Cereal Processing, Ministry of Agriculture and Rural Affairs, Chengdu 610106, China
- Kaixiang Yang
  - College of Software Engineering, Sichuan University, Chengdu 610041, China
- Shulin Shang
  - College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
- Mingxin Li
  - College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
- Zhibin Lv
  - College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
|
16
|
Singh V, Singh SK, Sharma R. A novel framework based on explainable AI and genetic algorithms for designing neurological medicines. Sci Rep 2024; 14:12807. [PMID: 38834718 DOI: 10.1038/s41598-024-63561-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 05/30/2024] [Indexed: 06/06/2024] Open
Abstract
The advent of the fourth industrial revolution, characterized by artificial intelligence (AI) as its central component, has resulted in the mechanization of numerous previously labor-intensive activities. The use of in silico tools has become prevalent in the design of biopharmaceuticals. Upon conducting a comprehensive analysis of the genomes of many organisms, it has been discovered that their tissues can generate specific peptides that confer protection against certain diseases. This study aims to identify a selected group of neuropeptides (NPs) possessing favorable characteristics that render them ideal for production as neurological biopharmaceuticals. Until now, the construction of NP classifiers has been the primary focus, neglecting to optimize these characteristics. Therefore, in this study, the task of creating ideal NPs has been formulated as a multi-objective optimization problem. The proposed framework, NPpred, comprises two distinct components: NSGA-NeuroPred and BERT-NeuroPred. The former employs the NSGA-II algorithm to explore and change a population of NPs, while the latter is an interpretable deep learning-based model. The utilization of explainable AI and motifs has led to the proposal of two novel operators, namely p-crossover and p-mutation. An online application has been deployed at https://neuropred.anvil.app for designing an ideal collection of synthesizable NPs from protein sequences.
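The multi-objective formulation in this abstract rests on Pareto dominance, the comparison NSGA-II uses to rank candidates. A minimal sketch of extracting the first non-dominated front is given below; the two objectives and their scores are hypothetical, all objectives are assumed to be maximized, and crowding distance and the p-crossover and p-mutation operators are not shown.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def first_front(scores):
    """Indices of candidates that no other candidate dominates."""
    front = []
    for i, candidate in enumerate(scores):
        dominated = any(
            dominates(other, candidate) for j, other in enumerate(scores) if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical (objective_1, objective_2) scores for four candidate peptides.
scores = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.2, 0.9)]
front = first_front(scores)
print(front)
```

The candidate scoring `(0.4, 0.4)` is excluded because `(0.5, 0.5)` beats it on both objectives; the remaining three represent different trade-offs, and none can be called better than the others without a preference between objectives.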
Affiliation(s)
- Vishakha Singh
  - Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, 221005, Uttar Pradesh, India
- Sanjay Kumar Singh
  - Department of Computer Science and Engineering, Indian Institute of Technology (BHU), Varanasi, 221005, Uttar Pradesh, India
- Ritesh Sharma
  - Department of ICT, Manipal Institute of Technology, Manipal, 576104, Uttar Pradesh, India
|
17
|
Liao YH, Chen SZ, Bin YN, Zhao JP, Feng XL, Zheng CH. UsIL-6: An unbalanced learning strategy for identifying IL-6 inducing peptides by undersampling technique. Comput Methods Programs Biomed 2024; 250:108176. [PMID: 38677081 DOI: 10.1016/j.cmpb.2024.108176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 03/26/2024] [Accepted: 04/11/2024] [Indexed: 04/29/2024]
Abstract
BACKGROUND AND OBJECTIVE Interleukin-6 (IL-6) is the critical factor of early warning, monitoring, and prognosis in the inflammatory storm of COVID-19 cases. IL-6 inducing peptides, which can induce cytokine IL-6 production, are very important for the development of diagnosis and immunotherapy. Although the existing methods have had some success in predicting IL-6 inducing peptides, there is still room to improve their performance in practical application. METHODS In this study, we proposed UsIL-6, a high-performance bioinformatics tool for identifying IL-6 inducing peptides. First, we extracted five groups of physicochemical properties and sequence structural information from IL-6 inducing peptide sequences and obtained a 636-dimensional feature vector. We also employed the NearMiss3 undersampling method and the StandardScaler normalization method to process the data. Then, a 40-dimensional optimal feature vector was obtained with the Boruta feature selection method. Finally, we combined this feature vector with an extremely randomized trees classifier to build the final model, UsIL-6. RESULTS The AUC value of UsIL-6 on the independent test dataset was 0.87, and the BACC value was 0.808, which indicated that UsIL-6 performed better than the existing methods in IL-6 inducing peptide recognition. CONCLUSIONS The performance comparison on the independent test dataset confirmed that UsIL-6 achieved the highest performance, best robustness, and best generalization ability. We hope that UsIL-6 will become a valuable method to identify, annotate, and characterize new IL-6 inducing peptides.
Affiliation(s)
- Yan-Hong Liao
  - School of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang 830017, China
- Shou-Zhi Chen
  - School of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang 830017, China
- Yan-Nan Bin
  - School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
- Jian-Ping Zhao
  - School of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang 830017, China
- Xin-Long Feng
  - School of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang 830017, China
- Chun-Hou Zheng
  - School of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang 830017, China; School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
|
18
|
Li H, Meng J, Wang Z, Tang Y, Xia S, Wang Y, Qin Z, Luan Y. miPEPPred-FRL: A Novel Method for Predicting Plant MiRNA-Encoded Peptides Using Adaptive Feature Representation Learning. J Chem Inf Model 2024; 64:2889-2900. [PMID: 37733290 DOI: 10.1021/acs.jcim.3c01020] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Indexed: 09/22/2023]
Abstract
MicroRNAs (miRNAs) are an essential type of small molecule RNAs that play significant regulatory roles in organisms. Recent studies have demonstrated that small open reading frames (sORFs) harbored in primary miRNAs (pri-miRNAs) can encode small peptides, known as miPEPs. Plant miPEPs can increase the abundance and activity of cognate miRNAs by promoting the transcription of their corresponding pri-miRNAs, thereby modulating plant traits. Biological experiments are the most effective way to accurately identify miPEPs; however, they are time-consuming and expensive. Hence, an efficient computational method for the identification of miPEPs on a large scale is highly desirable. Up to now, there have been no specialized computational tools for identifying miPEPs. In this work, a novel predictor named miPEPPred-FRL based on an adaptive feature representation learning framework that consists of the feature transformation module and the cascade architecture has been proposed. The feature transformation module integrating a newly designed feature selection method and classifier selection rule is developed to convert sequence-based features into primary class and probabilistic features, which are then fed into the improved cascade architecture to obtain more stable and discriminative augmented features. Finally, the augmented features are utilized to construct the final predictor. Cross-validation experiments illustrate that the novel feature selection method and classifier selection rule contribute to boosting the feature representation ability of the framework. Furthermore, the high accuracy of miPEPPred-FRL on independent testing data suggests that it is a trustworthy and valuable tool for the identification of miPEPs.
Affiliation(s)
- Haibin Li
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
- Jun Meng
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
- Zhaowei Wang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
- Youwei Tang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
- Shihao Xia
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
- Yu Wang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
- Zhaojing Qin
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
- Yushi Luan
  - School of Bioengineering, Dalian University of Technology, Dalian, Liaoning 116024, China
|
19
|
Wang M, Wang L, Xu W, Chu Z, Wang H, Lu J, Xue Z, Wang Y. NeuroPep 2.0: An Updated Database Dedicated to Neuropeptide and Its Receptor Annotations. J Mol Biol 2024; 436:168416. [PMID: 38143020 DOI: 10.1016/j.jmb.2023.168416] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 12/13/2023] [Accepted: 12/19/2023] [Indexed: 12/26/2023]
Abstract
Neuropeptides not only work through the nervous system; some of them also work peripherally to regulate numerous physiological processes, including growth, reproduction, social behavior, inflammation, fluid homeostasis, cardiovascular function, and energy homeostasis. The various roles of neuropeptides make them promising candidates for prospective therapeutics for different diseases. NeuroPep has now been updated to version 2.0 and holds 11,417 unique neuropeptide entries, nearly double the number in the first version of NeuroPep. When available, we collected information about the receptor for each neuropeptide entry, and we predicted the 3D structures of neuropeptides without a known experimental structure using AlphaFold2 or APPTEST, according to the peptide sequence length. In addition, DeepNeuropePred and NeuroPred-PLM, two neuropeptide prediction tools that we developed recently, were integrated into NeuroPep 2.0 to facilitate the identification of new neuropeptides. NeuroPep 2.0 is freely accessible at https://isyslab.info/NeuroPepV2/.
Affiliation(s)
- Mingxia Wang
  - Institute of Medical Artificial Intelligence, Binzhou Medical University, Yantai, Shandong 264003, China
- Lei Wang
  - School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Wei Xu
  - Institute of Medical Artificial Intelligence, Binzhou Medical University, Yantai, Shandong 264003, China
- Ziqiang Chu
  - School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Hengzhi Wang
  - School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Jingxiang Lu
  - Institute of Medical Artificial Intelligence, Binzhou Medical University, Yantai, Shandong 264003, China
- Zhidong Xue
  - Institute of Medical Artificial Intelligence, Binzhou Medical University, Yantai, Shandong 264003, China; School of Software Engineering, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
- Yan Wang
  - Institute of Medical Artificial Intelligence, Binzhou Medical University, Yantai, Shandong 264003, China; School of Life Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
|
20
|
Du Z, Ding X, Hsu W, Munir A, Xu Y, Li Y. pLM4ACE: A protein language model based predictor for antihypertensive peptide screening. Food Chem 2024; 431:137162. [PMID: 37604011 DOI: 10.1016/j.foodchem.2023.137162] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 04/13/2023] [Revised: 08/09/2023] [Accepted: 08/13/2023] [Indexed: 08/23/2023]
Abstract
Angiotensin-I converting enzyme (ACE) regulates the renin-angiotensin system and is a drug target in clinical treatment for hypertension. This study aims to develop a protein language model (pLM) with evolutionary scale modeling (ESM-2) embeddings that is trained on experimental data to screen peptides with strong ACE inhibitory activity. Twelve conventional peptide embedding approaches and five machine learning (ML) modeling methods were also tested for performance comparison. Among the 65 classifiers tested, logistic regression with ESM-2 embeddings showed the best performance, with balanced accuracy (BACC), Matthews correlation coefficient (MCC), and area under the curve of 0.883 ± 0.017, 0.77 ± 0.032, and 0.96 ± 0.009, respectively. Multilayer perceptron and support vector machine also exhibited great compatibility with ESM-2 embeddings. The ESM-2 embeddings showed superior performance in enhancing the prediction model compared to the 12 traditional embedding methods. A user-friendly webserver (https://sqzujiduce.us-east-1.awsapprunner.com) with the top three models is now freely available.
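Two of the metrics reported in this abstract, balanced accuracy (BACC) and the Matthews correlation coefficient (MCC), follow directly from the binary confusion matrix. The sketch below uses their standard definitions; the labels and predictions are made-up toy values, not data from the study.

```python
import math


def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 1 (positive) and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def matthews_corrcoef(y_true, y_pred):
    """MCC; returns 0.0 when any marginal count is zero."""
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Toy example: one false negative and one false positive out of eight peptides.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
bacc = balanced_accuracy(y_true, y_pred)
mcc = matthews_corrcoef(y_true, y_pred)
print(bacc, mcc)
```

Both metrics are less sensitive to class imbalance than plain accuracy, which is one reason peptide-screening studies commonly report them alongside AUC.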
Affiliation(s)
- Zhenjiao Du
  - Department of Grain Science and Industry, Kansas State University, Manhattan, KS 66506, USA
- Xingjian Ding
  - Department of Computer Science, Kansas State University, Manhattan, KS 66506, USA
- William Hsu
  - Department of Computer Science, Kansas State University, Manhattan, KS 66506, USA
- Arslan Munir
  - Department of Computer Science, Kansas State University, Manhattan, KS 66506, USA
- Yixiang Xu
  - Healthy Processed Foods Research Unit, Western Regional Research Center, USDA-ARS, 800 Buchanan Street, Albany, CA 94710, USA
- Yonghui Li
  - Department of Grain Science and Industry, Kansas State University, Manhattan, KS 66506, USA
|
21
|
Chen S, Liao Y, Zhao J, Bin Y, Zheng C. PACVP: Prediction of Anti-Coronavirus Peptides Using a Stacking Learning Strategy With Effective Feature Representation. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:3106-3116. [PMID: 37022025 DOI: 10.1109/tcbb.2023.3238370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Due to the global outbreak of COVID-19 and its variants, antiviral peptides with anti-coronavirus activity (ACVPs) represent promising new drug candidates for the treatment of coronavirus infection. Several computational tools have been developed to identify ACVPs, but their overall prediction performance still falls short of what therapeutic application requires. In this study, we constructed an efficient and reliable prediction model, PACVP (Prediction of Anti-CoronaVirus Peptides), for identifying ACVPs based on effective feature representation and a two-layer stacking learning framework. In the first layer, we use nine feature encoding methods with different feature representation angles to characterize the rich sequence information and fuse them into a feature matrix. Data normalization and unbalanced data processing are then carried out. Next, 12 baseline models are constructed by combining three feature selection methods and four machine learning classification algorithms. In the second layer, we input the optimal probability features into the logistic regression (LR) algorithm to train the final model, PACVP. The experiments show that PACVP achieves favorable prediction performance on the independent test dataset, with an ACC of 0.9208 and an AUC of 0.9465. We hope that PACVP will become a useful method for identifying, annotating, and characterizing novel ACVPs.
|
22
|
Liu X, Jin H, Xu G, Lai R, Wang A. Bioactive Peptides from Barnacles and Their Potential for Antifouling Development. Mar Drugs 2023; 21:480. [PMID: 37755093 PMCID: PMC10532818 DOI: 10.3390/md21090480] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Revised: 08/26/2023] [Accepted: 08/29/2023] [Indexed: 09/28/2023] Open
Abstract
Barnacles, prevalent fouling organisms in intertidal zones, have long been a source of annoyance because of the significant economic losses and ecological impacts they cause. Numerous antifouling approaches have been explored, including extensive research on antifouling chemicals. However, the excessive use of small-molecule chemicals appears to give rise to new environmental concerns. Therefore, it is imperative to develop new strategies. Barnacles exhibit appropriate responses to environmental challenges with complex physiological processes and unique sensory systems. Given the assumed crucial role of bioactive peptides, an increasing number of peptides with diverse activities are being discovered in barnacles. Fouling-related processes have been identified as potential targets for antifouling strategies. In this paper, we present a comprehensive review of peptides derived from barnacles, aiming to underscore their significant potential in the quest for innovative solutions in biofouling prevention and drug discovery.
Affiliation(s)
- Xuan Liu
  - Center for Evolution and Conservation Biology, Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou 511458, China
- Hui Jin
  - Center for Evolution and Conservation Biology, Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou 511458, China
- Gaochi Xu
  - Center for Evolution and Conservation Biology, Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou 511458, China
- Ren Lai
  - Center for Evolution and Conservation Biology, Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou 511458, China
  - Key Laboratory of Bioactive Peptides of Yunnan Province, KIZ-CUHK Joint Laboratory of Bioresources and Molecular Research in Common Diseases, National Resource Center for Non-Human Primates, Kunming Primate Research Center, National Research Facility for Phenotypic & Genetic Analysis of Model Animals (Primate Facility), Sino-African Joint Research Center and Engineering Laboratory of Peptides, Kunming Institute of Zoology, Kunming 650107, China
  - University of Chinese Academy of Sciences, Beijing 100049, China
- Aili Wang
  - Center for Evolution and Conservation Biology, Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou), Guangzhou 511458, China
|
23
|
Liu D, Lin Z, Jia C. NeuroCNN_GNB: an ensemble model to predict neuropeptides based on a convolution neural network and Gaussian naive Bayes. Front Genet 2023; 14:1226905. [PMID: 37576553 PMCID: PMC10414792 DOI: 10.3389/fgene.2023.1226905] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/22/2023] [Accepted: 06/30/2023] [Indexed: 08/15/2023] Open
Abstract
Neuropeptides contain more chemical information than other classical neurotransmitters and have multiple receptor recognition sites. These characteristics allow neuropeptides to have a correspondingly higher selectivity for nerve receptors and fewer side effects. Traditional experimental methods, such as mass spectrometry and liquid chromatography technology, still need the support of a complete neuropeptide precursor database and the basic characteristics of neuropeptides. Incomplete neuropeptide precursor and information databases will lead to false-positives or reduce the sensitivity of recognition. In recent years, studies have proven that machine learning methods can rapidly and effectively predict neuropeptides. In this work, we have made a systematic attempt to create an ensemble tool based on four convolution neural network models. These baseline models were separately trained on one-hot encoding, AAIndex, G-gap dipeptide encoding and word2vec and integrated using Gaussian Naive Bayes (NB) to construct our predictor designated NeuroCNN_GNB. Both 5-fold cross-validation tests using benchmark datasets and independent tests showed that NeuroCNN_GNB outperformed other state-of-the-art methods. Furthermore, this novel framework provides essential interpretations that aid the understanding of model success by leveraging the powerful Shapley Additive exPlanation (SHAP) algorithm, thereby highlighting the most important features relevant for predicting neuropeptides.
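Two of the encodings named in this abstract, one-hot and g-gap dipeptide encoding, can be illustrated with a minimal sketch. The alphabet ordering, the zero-padding to `max_len`, and the function names below are assumptions made for illustration; the abstract does not specify how NeuroCNN_GNB implements them.

```python
from itertools import product

# The 20 standard amino acids; this ordering is an arbitrary choice for the sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq, max_len):
    """Encode a peptide as max_len rows of 20 binary values, zero-padded."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    rows = []
    for pos in range(max_len):
        row = [0] * len(AMINO_ACIDS)
        if pos < len(seq) and seq[pos] in index:
            row[index[seq[pos]]] = 1
        rows.append(row)
    return rows


def g_gap_dipeptide(seq, gap):
    """Frequency of each residue pair separated by `gap` intervening residues."""
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in pairs]
```

With `gap=0` the second function reduces to ordinary dipeptide composition; larger gaps capture pairs of residues that are further apart in the sequence.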
Affiliation(s)
- Di Liu
- Information Science and Technology College, Dalian Maritime University, Dalian, China
| | - Zhengkui Lin
- Information Science and Technology College, Dalian Maritime University, Dalian, China
| | - Cangzhi Jia
- School of Science, Dalian Maritime University, Dalian, China
|
24
|
Liu Y, Wang S, Li X, Liu Y, Zhu X. NeuroPpred-SVM: A New Model for Predicting Neuropeptides Based on Embeddings of BERT. J Proteome Res 2023; 22:718-728. [PMID: 36749151 DOI: 10.1021/acs.jproteome.2c00363] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Indexed: 02/08/2023]
Abstract
Neuropeptides play pivotal roles in many physiological processes and are associated with various diseases. Identifying neuropeptides is of great benefit for studying the mechanisms of these processes and for treating neurological disorders. Several state-of-the-art neuropeptide predictors have been developed using a two-layer stacking ensemble algorithm. Although two-layer stacking can improve feature representation, the resulting models are complex and less efficient than models based on a single classifier. In this study, we propose a new model, NeuroPpred-SVM, which predicts neuropeptides from embeddings of Bidirectional Encoder Representations from Transformers (BERT) and other sequence features using a support vector machine (SVM). Our model achieved a cross-validation area under the receiver operating characteristic curve (AUROC) of 0.969 on the training data set and an AUROC of 0.966 on the independent test set. Compared with four other state-of-the-art models (NeuroPIpred, PredNeuroP, NeuroPpred-Fuse, and NeuroPpred-FRL) on the independent test set, our model achieved the highest AUROC, Matthews correlation coefficient, accuracy, and specificity, indicating that it outperforms the existing models. We believe that NeuroPpred-SVM could be a useful tool for identifying neuropeptides with high accuracy and low cost. The data sets and Python code are available at https://github.com/liuyf-a/NeuroPpred-SVM.
Affiliation(s)
- Yufeng Liu
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
| | - Shuyu Wang
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
| | - Xiang Li
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
| | - Yinbo Liu
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
| | - Xiaolei Zhu
- School of Sciences, Anhui Agricultural University, Hefei, Anhui 230036, China
|
25
|
Vu NQ, Yen HC, Fields L, Cao W, Li L. HyPep: An Open-Source Software for Identification and Discovery of Neuropeptides Using Sequence Homology Search. J Proteome Res 2023; 22:420-431. [PMID: 36696582 PMCID: PMC10160011 DOI: 10.1021/acs.jproteome.2c00597] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 01/26/2023]
Abstract
Neuropeptides are a class of endogenous peptides that have key regulatory roles in biochemical, physiological, and behavioral processes. Mass spectrometry analyses of neuropeptides often rely on protein informatics tools for database searching and peptide identification. As neuropeptide databases are typically experimentally built and comprised of short sequences with high sequence similarity to each other, we developed a novel database searching tool, HyPep, which utilizes sequence homology searching for peptide identification. HyPep aligns de novo sequenced peptides, generated through PEAKS software, with neuropeptide database sequences and identifies neuropeptides based on the alignment score. HyPep performance was optimized using LC-MS/MS measurements of peptide extracts from various Callinectes sapidus neuronal tissue types and compared with a commercial database searching software, PEAKS DB. HyPep identified more neuropeptides from each tissue type than PEAKS DB at 1% false discovery rate, and the false match rate from both programs was 2%. In addition to identification, this report describes how HyPep can aid in the discovery of novel neuropeptides.
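The idea of identifying a peptide by its alignment score against database sequences can be sketched with a generic Smith-Waterman local alignment. This is only an illustration: the scoring values (`match`, `mismatch`, `gap`) and both function names are assumptions, and the abstract does not describe HyPep's actual alignment algorithm or scoring scheme.

```python
def local_alignment_score(query, target, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between two sequences."""
    prev = [0] * (len(target) + 1)
    best = 0
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def best_database_hit(query, database):
    """Return (name, score) of the database entry that aligns best to the query."""
    scored = [(local_alignment_score(query, seq), name) for name, seq in database.items()]
    score, name = max(scored)
    return name, score
```

In a workflow like the one described, `query` would be a de novo sequenced peptide and `database` a mapping from neuropeptide names to sequences; a score threshold would then decide whether the hit counts as an identification.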
Affiliation(s)
- Nhu Q Vu
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
| | - Hsu-Ching Yen
- Department of Biochemistry, University of Wisconsin-Madison, 433 Babcock Drive, Madison, Wisconsin 53706, United States
| | - Lauren Fields
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
| | - Weifeng Cao
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States
| | - Lingjun Li
- Department of Chemistry, University of Wisconsin-Madison, 1101 University Avenue, Madison, Wisconsin 53706, United States; School of Pharmacy, University of Wisconsin-Madison, 777 Highland Avenue, Madison, Wisconsin 53705, United States
|
26
|
Liu Y, Liu Y, Wang S, Zhu X. LBCE-XGB: A XGBoost Model for Predicting Linear B-Cell Epitopes Based on BERT Embeddings. Interdiscip Sci 2023; 15:293-305. [PMID: 36646842 DOI: 10.1007/s12539-023-00549-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Revised: 12/28/2022] [Accepted: 01/03/2023] [Indexed: 01/18/2023]
Abstract
Accurately detecting linear B-cell epitopes (BCEs) is important for vaccine design, immunodiagnostic testing, antibody production, and disease prevention and treatment. Wet-lab experiments for determining linear BCEs are expensive and laborious, and cannot keep pace with the volume of modern protein sequence data. Computational methods, by contrast, can identify linear BCEs efficiently and at low cost. Although several computational methods are available, their performance is still unsatisfactory. We therefore propose a new method, LBCE-XGB, which predicts linear BCEs using the XGBoost algorithm. To represent the biological information in peptide sequences, residue embeddings were obtained from a pre-trained domain-specific BERT model. In addition, five other types of features, including amino acid composition and amino acid antigenicity scale, were extracted. The best feature combination was determined from the cross-validation results. Compared with models built using other deep learning and machine learning algorithms, LBCE-XGB achieves the top performance, with an AUROC of 0.845 in fivefold cross-validation. On the independent test set, our model attains an AUROC of 0.838, which is substantially higher than that of other state-of-the-art methods. These results indicate that BERT representations can be an effective feature for predicting linear BCEs, and we believe that LBCE-XGB could be a useful tool for detecting linear B-cell epitopes with high accuracy and low cost.
Affiliation(s)
- Yufeng Liu
- School of Sciences, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Yinbo Liu
- School of Sciences, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Shuyu Wang
- School of Sciences, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Xiaolei Zhu
- School of Sciences, Anhui Agricultural University, Hefei, 230036, Anhui, China.
|
27
|
Chen S, Li Q, Zhao J, Bin Y, Zheng C. NeuroPred-CLQ: incorporating deep temporal convolutional networks and multi-head attention mechanism to predict neuropeptides. Brief Bioinform 2022; 23:6672901. [DOI: 10.1093/bib/bbac319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2022] [Revised: 06/27/2022] [Accepted: 07/14/2022] [Indexed: 11/13/2022] Open
Abstract
Neuropeptides (NPs) are a particular class of informative substances in the immune system and physiological regulation. They play a crucial role in regulating physiological functions at various stages of biological growth and development. In addition, NPs are crucial for developing new drugs for the treatment of neurological diseases. With the development of molecular biology techniques, some data-driven tools have emerged to predict NPs. However, the predictive performance of these tools still needs to be improved. In this study, we developed a deep learning model (NeuroPred-CLQ) based on the temporal convolutional network (TCN) and multi-head attention mechanism to identify NPs effectively, translating the internal relationships of peptide sequences into numerical features with the Word2vec algorithm. The experimental results show that NeuroPred-CLQ learns data information effectively, achieving 93.6% accuracy and 98.8% AUC on the independent test set. The model performs better in identifying NPs than the state-of-the-art predictors. Visualization of features using t-distributed stochastic neighbor embedding (t-SNE) shows that NeuroPred-CLQ can clearly distinguish the positive NPs from the negative ones. We believe NeuroPred-CLQ can facilitate drug development and clinical trial studies to treat neurological disorders.
Affiliation(s)
- Shouzhi Chen
- School of Mathematics and System Science, Xinjiang University, Urumqi, China
| | - Qing Li
- School of Mathematics and System Science, Xinjiang University, Urumqi, China
| | - Jianping Zhao
- School of Mathematics and System Science, Xinjiang University, Urumqi, China
| | - Yannan Bin
- School of Computer Science and Technology, Anhui University, Hefei, China
| | - Chunhou Zheng
- School of Mathematics and System Science, Xinjiang University, Urumqi, China
- School of Computer Science and Technology, Anhui University, Hefei, China
|
28
|
Liu S, Cui C, Chen H, Liu T. Ensemble Learning-Based Feature Selection for Phage Protein Prediction. Front Microbiol 2022; 13:932661. [PMID: 35910662 PMCID: PMC9335128 DOI: 10.3389/fmicb.2022.932661] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2022] [Accepted: 06/14/2022] [Indexed: 11/14/2022] Open
Abstract
Phage has high specificity for its host recognition. As a natural enemy of bacteria, it has been used to treat super bacteria many times. Identifying phage proteins from the original sequence is very important for understanding the relationship between phage and host bacteria and developing new antimicrobial agents. However, traditional experimental methods are both expensive and time-consuming. In this study, an ensemble learning-based feature selection method is proposed to find important features for phage protein identification. The method uses four types of protein sequence-derived features, quantifies the importance of each feature by adding perturbations to the features to influence the results, and finally splices the important features among the four types of features. In addition, we analyzed the selected features and their biological significance.
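Quantifying a feature's importance by perturbing it and observing the effect on the results can be sketched as a generic permutation-importance routine. The shuffling strategy, the accuracy metric, and all names below are assumptions for illustration; the abstract does not say which perturbation or which ensemble the authors used.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction equals the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(labels)


def perturbation_importance(model, rows, labels, feature, repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(repeats):
        column = [r[feature] for r in rows]
        rng.shuffle(column)
        # Rebuild each row with the shuffled value in place of the original one.
        perturbed = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
        drops.append(baseline - accuracy(model, perturbed, labels))
    return sum(drops) / repeats
```

Here `model` is any callable that maps a feature row (a list) to a predicted label. A feature the model ignores yields an importance of exactly zero, while a feature it depends on tends to yield a positive value.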
Affiliation(s)
- Songbo Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Chengmin Cui
- Beijing Institute of Control Engineering, China Academy of Space Technology, Beijing, China
| | - Huipeng Chen
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- *Correspondence: Huipeng Chen
| | - Tong Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
|
29
|
Li X, Ma S, Yan W, Wu Y, Kong H, Zhang M, Luo X, Xia J. dbBIP: a comprehensive bipolar disorder database for genetic research. Database (Oxford) 2022; 2022:baac049. [PMID: 35779245 PMCID: PMC9250320 DOI: 10.1093/database/baac049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2022] [Revised: 05/28/2022] [Accepted: 06/11/2022] [Indexed: 11/17/2022]
Abstract
Bipolar disorder (BIP) is one of the most common hereditary psychiatric disorders worldwide. Elucidating the genetic basis of BIP will play a pivotal role in mechanistic delineation. Genome-wide association studies (GWAS) have successfully reported multiple susceptibility loci conferring BIP risk, thus providing insight into the effects of its underlying pathobiology. However, difficulties remain in the extrication of important and biologically relevant data from genetic discoveries related to psychiatric disorders such as BIP. There is an urgent need for an integrated and comprehensive online database with unified access to genetic and multi-omics data for in-depth data mining. Here, we developed the dbBIP, a database for BIP genetic research based on published data. The dbBIP consists of several modules, i.e.: (i) single nucleotide polymorphism (SNP) module, containing large-scale GWAS genetic summary statistics and functional annotation information relevant to risk variants; (ii) gene module, containing BIP-related candidate risk genes from various sources and (iii) analysis module, providing a simple and user-friendly interface to analyze one's own data. We also conducted extensive analyses, including functional SNP annotation, integration (including summary-data-based Mendelian randomization and transcriptome-wide association studies), co-expression, gene expression, tissue expression, protein-protein interaction and brain expression quantitative trait loci analyses, thus shedding light on the genetic causes of BIP. Finally, we developed a graphical browser with powerful search tools to facilitate data navigation and access. The dbBIP provides a comprehensive resource for BIP genetic research as well as an integrated analysis platform for researchers and can be accessed online at http://dbbip.xialab.info. Database URL: http://dbbip.xialab.info.
Affiliation(s)
- Xiaoyan Li
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education and Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Shushan District, Hefei, Anhui 230601, China
| | - Shunshuai Ma
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education and Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Shushan District, Hefei, Anhui 230601, China
| | - Wenhui Yan
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education and Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Shushan District, Hefei, Anhui 230601, China
| | - Yong Wu
- Affiliated Wuhan Mental Health Center, Tongji Medical College, Huazhong University of Science and Technology, 93 Youyi Road, Qiaokou District, Wuhan, Hubei 430030, China
| | - Hui Kong
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education and Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Shushan District, Hefei, Anhui 230601, China
| | - Mingshan Zhang
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education and Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Shushan District, Hefei, Anhui 230601, China
| | - Xiongjian Luo
- Key Laboratory of Animal Models and Human Disease Mechanisms of the Chinese Academy of Sciences & Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, 32 Jiaochang East Road, Wuhua District, Kunming, Yunnan 650223, China
- Kunming College of Life Science, University of Chinese Academy of Sciences, 19 Qingsong Road, Panlong District, Kunming, Yunnan 650204, China
| | - Junfeng Xia
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education and Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, 111 Jiulong Road, Shushan District, Hefei, Anhui 230601, China
|
30
|
Rahman A, Ahmed S, Al Mehedi Hasan M, Ahmad S, Dehzangi I. Accurately predicting nitrosylated tyrosine sites using probabilistic sequence information. Gene 2022; 826:146445. [PMID: 35358650 DOI: 10.1016/j.gene.2022.146445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/08/2021] [Revised: 02/16/2022] [Accepted: 03/18/2022] [Indexed: 11/04/2022]
Abstract
Post-translational modification (PTM) is defined as the enzymatic modification of proteins after translation during protein biosynthesis. Nitrotyrosine, one of the most important protein modifications, is mediated by the active nitrogen molecule. It is known to be associated with various diseases, including autoimmune diseases characterized by chronic inflammation and cell damage. Currently, nitrotyrosine sites are identified using experimental approaches that are laborious and costly. In this study, we propose a new machine learning method called PredNitro to accurately predict nitrotyrosine sites. To build PredNitro, we use sequence coupling information from the neighboring amino acids of tyrosine residues along with a support vector machine as our classification technique. Our results demonstrate that PredNitro achieves 98.0% accuracy with more than 0.96 MCC and 0.99 AUC in both 5-fold cross-validation and jackknife cross-validation tests, which is significantly better than the results reported in previous studies. PredNitro is publicly available as an online predictor at: http://103.99.176.239/PredNitro.
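Site predictors of this kind typically start by cutting a fixed-length window around each candidate residue. The sketch below extracts tyrosine-centered windows; the flank size, the `X` padding character, and the function name are assumptions, since the abstract does not state the window length or how sequence ends are handled in PredNitro.

```python
def tyrosine_windows(sequence, flank=3, pad="X"):
    """Return (1-based position, window) for every tyrosine in the sequence.

    Each window holds the tyrosine plus `flank` residues on either side;
    positions that fall beyond the sequence ends are filled with `pad`.
    """
    padded = pad * flank + sequence + pad * flank
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "Y":
            # Index i in the sequence is index i + flank in the padded string,
            # so the window centered there starts at index i.
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows
```

Each window would then be turned into numeric features (for example, position-specific residue statistics) before being passed to a classifier such as a support vector machine.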
Affiliation(s)
- Afrida Rahman
- Department of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
| | - Sabit Ahmed
- Department of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
| | - Md Al Mehedi Hasan
- Department of Computer Science and Engineering, Rajshahi University of Engineering and Technology, Rajshahi, Bangladesh
| | - Shamim Ahmad
- Department of Computer Science and Engineering, University of Rajshahi, Rajshahi, Bangladesh
| | - Iman Dehzangi
- Department of Computer Science, Rutgers University, Camden, NJ 08102, USA; Center for Computational and Integrative Biology, Rutgers University, Camden, NJ 08102, USA.
|
31
|
Zou H. iAHTP-LH: Integrating Low-Order and High-Order Correlation Information for Identifying Antihypertensive Peptides. Int J Pept Res Ther 2022. [DOI: 10.1007/s10989-022-10414-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
|
32
|
Zou H, Yang F, Yin Z. iTTCA-MFF: identifying tumor T cell antigens based on multiple feature fusion. Immunogenetics 2022; 74:447-454. [PMID: 35246701 DOI: 10.1007/s00251-022-01258-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 01/04/2022] [Accepted: 02/26/2022] [Indexed: 11/05/2022]
Abstract
Cancer is a terrible disease, and recent studies have reported that tumor T cell antigens (TTCAs) may play a promising role in cancer treatment. Since experimental methods are still expensive and time-consuming, it is highly desirable to develop automatic computational methods to identify tumor T cell antigens from the huge number of natural and synthetic peptides. Hence, in this study, a novel computational model called iTTCA-MFF was proposed to identify TTCAs. To describe the sequence effectively, the physicochemical (PC) properties of amino acids and the residue pairwise energy content matrix (RECM) were first employed to encode peptide sequences. Then, two different approaches, covariance and Pearson's correlation coefficient (PCC), were used to collect discriminative information from the PC and RECM matrices. Next, an effective feature selection approach, the least absolute shrinkage and selection operator (LASSO), was adopted to select the optimal features. These selected features were fed into a support vector machine (SVM) to identify TTCAs. We performed experiments on two different datasets, and the results indicated that the proposed method is promising and may play a complementary role to existing methods for identifying TTCAs. The datasets and codes are available at https://figshare.com/articles/online_resource/iTTCA-MFF/17636120.
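The step of collecting covariance and Pearson's correlation coefficient (PCC) information from an encoded peptide can be sketched as follows. The property values are toy numbers invented for this example (not real physicochemical scales), and the choice to correlate whole property profiles pairwise is an assumption; the abstract does not specify how iTTCA-MFF computes these features.

```python
from math import sqrt

# Toy property scales invented for illustration; not real physicochemical data.
TOY_SCALES = [
    {"A": 0.1, "C": 0.4, "D": -0.9, "K": -0.7, "L": 0.8},
    {"A": 0.3, "C": 0.2, "D": 0.9, "K": 0.6, "L": -0.5},
    {"A": 0.5, "C": 0.1, "D": 0.2, "K": 0.9, "L": 0.4},
]


def encode(peptide, scales):
    """One numeric profile per property scale, with one value per residue."""
    return [[scale[aa] for aa in peptide] for scale in scales]


def covariance(x, y):
    """Population covariance of two equal-length profiles."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def pearson(x, y):
    """Pearson's correlation coefficient; 0.0 if either profile is constant."""
    sx, sy = sqrt(covariance(x, x)), sqrt(covariance(y, y))
    if sx == 0 or sy == 0:
        return 0.0
    return covariance(x, y) / (sx * sy)


def pairwise_features(profiles, measure):
    """Apply `measure` to every unordered pair of property profiles."""
    n = len(profiles)
    return [measure(profiles[i], profiles[j]) for i in range(n) for j in range(i + 1, n)]
```

Calling `pairwise_features(encode("ACDKL", TOY_SCALES), pearson)` gives one value per pair of scales, so the feature vector length is independent of peptide length, which is what makes such descriptors convenient inputs for feature selection and an SVM.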
Affiliation(s)
- Hongliang Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, 330003, China.
| | - Fan Yang
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, 330003, China
| | - Zhijian Yin
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, 330003, China
|
33
|
Grønning AGB, Kacprowski T, Schéele C. MultiPep: a hierarchical deep learning approach for multi-label classification of peptide bioactivities. Biol Methods Protoc 2021; 6:bpab021. [PMID: 34909478 PMCID: PMC8665375 DOI: 10.1093/biomethods/bpab021] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 09/08/2021] [Revised: 10/28/2021] [Accepted: 11/17/2021] [Indexed: 11/14/2022] Open
Abstract
Peptide-based therapeutics are here to stay and will prosper in the future. A key step in identifying novel peptide drugs is the determination of their bioactivities. Recent advances in peptidomics screening approaches hold promise as a strategy for identifying novel drug targets. However, these screenings typically generate an immense number of peptides, and tools for ranking these peptides prior to planning functional studies are warranted. Although a couple of tools in the literature predict multiple classes, they are constructed from multiple binary classifiers. We here aimed to use an innovative deep learning approach to generate an improved peptide bioactivity classifier capable of distinguishing between multiple classes. We present MultiPep: a deep learning multi-label classifier that assigns peptides to zero or more of 20 bioactivity classes. We train and test MultiPep on data from several publicly available databases. The same data are used for a hierarchical clustering, whose dendrogram shapes the architecture of MultiPep. We test a new loss function that combines a customized version of the Matthews correlation coefficient with binary cross entropy (BCE), and show that this is better than using class-weighted BCE as the loss function. Further, we show that MultiPep surpasses state-of-the-art peptide bioactivity classifiers and that it predicts known and novel bioactivities of FDA-approved therapeutic peptides. In conclusion, we present innovative machine learning techniques used to produce a peptide prediction tool to aid peptide-based therapy development and hypothesis generation.
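A loss that combines a Matthews correlation coefficient (MCC) term with binary cross entropy can be sketched in plain Python. The version below uses a generic "soft" MCC computed from predicted probabilities and a simple weighted sum; both the soft-MCC formulation and the `weight` parameter are assumptions, because the abstract does not describe the customized MCC that MultiPep uses or how the two terms are combined.

```python
from math import log, sqrt


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= t * log(p) + (1 - t) * log(1 - p)
    return total / len(y_true)


def soft_mcc(y_true, y_prob, eps=1e-7):
    """MCC computed from probabilities instead of hard 0/1 predictions."""
    tp = sum(t * p for t, p in zip(y_true, y_prob))
    tn = sum((1 - t) * (1 - p) for t, p in zip(y_true, y_prob))
    fp = sum((1 - t) * p for t, p in zip(y_true, y_prob))
    fn = sum(t * (1 - p) for t, p in zip(y_true, y_prob))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / (denom + eps)


def combined_loss(y_true, y_prob, weight=0.5):
    """Weighted sum of (1 - soft MCC) and BCE; lower is better."""
    mcc_term = 1.0 - soft_mcc(y_true, y_prob)
    return weight * mcc_term + (1 - weight) * binary_cross_entropy(y_true, y_prob)
```

Because the MCC term accounts for all four confusion-matrix cells, it is less dominated by the majority class than plain BCE, which is one plausible motivation for mixing the two in an imbalanced multi-label setting.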
Affiliation(s)
- Alexander G B Grønning
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
| | - Tim Kacprowski
- Division Data Science in Biomedicine, Peter L. Reichertz Institute for Medical Informatics, TU Braunschweig and Hannover Medical School, 38106 Braunschweig, Germany; Braunschweig Integrated Centre for Systems Biology (BRICS), 38106 Braunschweig, Germany
| | - Camilla Schéele
- Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
|
34
|
Zou H. Identifying blood‐brain barrier peptides by using amino acids physicochemical properties and features fusion method. Pept Sci (Hoboken) 2021. [DOI: 10.1002/pep2.24247] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/22/2022]
Affiliation(s)
- Hongliang Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
|
35
|
Zhang W, Xia E, Dai R, Tang W, Bin Y, Xia J. PredAPP: Predicting Anti-Parasitic Peptides with Undersampling and Ensemble Approaches. Interdiscip Sci 2021; 14:258-268. [PMID: 34608613 DOI: 10.1007/s12539-021-00484-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 06/01/2021] [Revised: 09/15/2021] [Accepted: 09/15/2021] [Indexed: 12/12/2022]
Abstract
Anti-parasitic peptides (APPs) are regarded as promising therapeutic candidates against parasitic diseases. Because experimental techniques for identifying APPs are expensive and time-consuming, there is an urgent need for a computational approach that predicts APPs on a large scale. In this study, we present a computational method, termed PredAPP (Prediction of Anti-Parasitic Peptides), that effectively identifies APPs using an ensemble of well-performing machine learning (ML) classifiers. First, to address the class imbalance problem, a balanced training dataset was generated by undersampling. We found that the balanced dataset based on cluster centroids achieved the best performance. Then, nine groups of features and six ML algorithms were combined to generate 54 classifiers, whose outputs formed 54 feature representations; within each feature group, we selected the best-performing feature representation for classification. Finally, the selected feature representations were integrated using the logistic regression algorithm to construct the prediction model PredAPP. On the independent dataset, PredAPP achieved an accuracy of 0.880 and an AUC of 0.922, compared with 0.739 and 0.873 for AMPfun, a state-of-the-art method for predicting APPs. The web server of PredAPP is freely accessible at http://predapp.xialab.info and https://github.com/xialab-ahu/PredAPP.
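The selection step described here (nine feature groups crossed with six algorithms gives 54 classifiers, from which the best one per feature group is kept) can be sketched as a small bookkeeping routine. The group and algorithm names are placeholders, and the scores are supplied by a caller-provided function; none of these come from the paper.

```python
# Placeholder names; the abstract does not list the actual groups or algorithms.
FEATURE_GROUPS = [f"features_{i}" for i in range(1, 10)]
ALGORITHMS = [f"algorithm_{j}" for j in range(1, 7)]


def build_grid(feature_groups, algorithms, score_fn):
    """Score every (feature group, algorithm) combination."""
    return {(g, a): score_fn(g, a) for g in feature_groups for a in algorithms}


def select_best_per_group(scores):
    """Keep the highest-scoring algorithm for each feature group.

    `scores` maps (feature_group, algorithm) to a validation score.
    Ties are resolved in favor of the combination seen first.
    """
    best = {}
    for (group, algorithm), score in scores.items():
        if group not in best or score > best[group][1]:
            best[group] = (algorithm, score)
    return best
```

In a full pipeline, `score_fn` would train a classifier on the given feature group and return a cross-validated metric; the outputs of the nine selected classifiers would then serve as inputs to a second-stage model such as logistic regression.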
Affiliation(s)
- Wei Zhang
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education and Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China; State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Enhua Xia
- State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, 230036, Anhui, China
| | - Ruyu Dai
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education and Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
| | - Wending Tang
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education and Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
| | - Yannan Bin
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education and Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China; Anhui Key Laboratory of Modern Biomanufacturing, Anhui University, Hefei, 230601, Anhui, China
| | - Junfeng Xia
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education and Information Materials and Intelligent Sensing Laboratory of Anhui Province, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China; State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, 230036, Anhui, China
|
36
|
Identifying Dipeptidyl Peptidase-IV Inhibitory Peptides Based on Correlation Information of Physicochemical Properties. Int J Pept Res Ther 2021. [DOI: 10.1007/s10989-021-10280-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/22/2022]
|
37
|
Jiang M, Zhao B, Luo S, Wang Q, Chu Y, Chen T, Mao X, Liu Y, Wang Y, Jiang X, Wei DQ, Xiong Y. NeuroPpred-Fuse: an interpretable stacking model for prediction of neuropeptides by fusing sequence information and feature selection methods. Brief Bioinform 2021; 22:6350884. [PMID: 34396388 DOI: 10.1093/bib/bbab310] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 04/16/2021] [Revised: 07/01/2021] [Accepted: 07/18/2021] [Indexed: 12/13/2022] Open
Abstract
Neuropeptides, which act as signaling molecules in the nervous systems of various animals, play crucial roles in a wide range of physiological functions and hormone regulation behaviors. Neuropeptides offer many opportunities for the discovery of new drugs and targets for the treatment of neurological diseases. In recent years, several data-driven computational predictors have been developed for various types of bioactive peptides, but little work has so far addressed neuropeptides. In this work, we developed an interpretable stacking model, named NeuroPpred-Fuse, for the prediction of neuropeptides by fusing a variety of sequence-derived features and feature selection methods. Specifically, we used six types of sequence-derived features to encode the peptide sequences and then combined them. In the first layer, we ensembled three base classifiers and four feature selection algorithms, which select non-redundant important features in a complementary way. In the second layer, the output of the first layer was merged and fed into a logistic regression (LR) classifier to train the model. Moreover, we analyzed the selected features and explained why they are suitable. Experimental results show that our model achieved 90.6% accuracy and 95.8% AUC on the independent test set, outperforming the state-of-the-art models. In addition, we examined the distribution of the features selected by these tree models and compared the results on the training set with those on the test set. These results indicate that our model generalizes to unseen data. We therefore expect that our model will support the discovery of neuropeptides as new drugs for the treatment of neurological diseases.
Affiliation(s)
- Mingming Jiang
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Bowen Zhao
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Shenggan Luo
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Qiankun Wang
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yanyi Chu
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Tianhang Chen
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Xueying Mao
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yatong Liu
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yanjing Wang
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Xue Jiang
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, China
38
Cao R, Wang M, Bin Y, Zheng C. DLFF-ACP: prediction of ACPs based on deep learning and multi-view features fusion. PeerJ 2021; 9:e11906. [PMID: 34414035 PMCID: PMC8344685 DOI: 10.7717/peerj.11906] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/04/2021] [Accepted: 07/14/2021] [Indexed: 01/10/2023]
Abstract
An emerging type of therapeutic agent, anticancer peptides (ACPs), has attracted attention because of its lower risk of toxic side effects. However, the process of identifying ACPs using experimental methods is both time-consuming and laborious. In this study, we developed a new and efficient algorithm that predicts ACPs by fusing multi-view features based on a dual-channel deep neural network ensemble model. In the model, one channel used a convolutional neural network (CNN) to automatically extract the potential spatial features of a sequence. The other channel was used to process and extract more effective features from handcrafted features. Additionally, an effective feature fusion method was explored for the mutual fusion of different features. Finally, we adopted a neural network to predict ACPs based on the fused features. The performance comparisons across the single and fusion features showed that the fusion of multi-view features could effectively improve the model's predictive ability. Among these, the fusion of the features extracted by the CNN and the composition of k-spaced amino acid group pairs achieved the best performance. To further validate the performance of our model, we compared it with other existing methods using two independent test sets. The results showed that our model's area under curve was 0.90, which was higher than that of the other existing methods on the first test set and higher than most of the other existing methods on the second test set. The source code and datasets are available at https://github.com/wame-ng/DLFF-ACP.
Affiliation(s)
- Ruifen Cao
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
- Engineering Research Center of Big Data Application in Private Health Medicine, Fujian Province University, Putian, Fujian, China
- Meng Wang
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
- Yannan Bin
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
- Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China
- Chunhou Zheng
- Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, Anhui, China
- Engineering Research Center of Big Data Application in Private Health Medicine, Fujian Province University, Putian, Fujian, China
39
Hasan MM, Alam MA, Shoombuatong W, Deng HW, Manavalan B, Kurata H. NeuroPred-FRL: an interpretable prediction model for identifying neuropeptide using feature representation learning. Brief Bioinform 2021; 22:6272801. [PMID: 33975333 DOI: 10.1093/bib/bbab167] [Citation(s) in RCA: 65] [Impact Index Per Article: 16.3] [Received: 02/07/2021] [Revised: 03/23/2021] [Accepted: 04/09/2021] [Indexed: 12/13/2022]
Abstract
Neuropeptides (NPs) are the most versatile neurotransmitters in the immune systems that regulate various central anxious hormones. An efficient and effective bioinformatics tool for rapid and accurate large-scale identification of NPs is critical in immunoinformatics, which is indispensable for basic research and drug development. Although a few NP prediction tools have been developed, their prediction performance still needs to be improved. In this study, we have developed a machine learning-based meta-predictor called NeuroPred-FRL by employing the feature representation learning approach. First, we generated 66 optimal baseline models by employing 11 different encodings, six different classifiers and a two-step feature selection approach. The predicted probability scores of NPs from the 66 baseline models were combined and used as the input feature vector. Second, in order to enhance the feature representation ability, we applied the two-step feature selection approach to optimize the 66-D probability feature vector and then fed the optimal one into a random forest classifier to construct the final meta-model (NeuroPred-FRL). Benchmarking experiments based on both cross-validation and independent tests indicate that NeuroPred-FRL achieves superior NP prediction performance compared with the other state-of-the-art predictors. We believe that the proposed NeuroPred-FRL can serve as a powerful tool for large-scale identification of NPs, facilitating the characterization of their functional mechanisms and expediting their applications in clinical therapy. Moreover, we interpreted some model mechanisms of NeuroPred-FRL by leveraging the robust SHapley Additive exPlanation algorithm.
Affiliation(s)
- Md Mehedi Hasan
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
- Japan Society for the Promotion of Science, 5-3-1 Kojimachi, Chiyoda-ku, Tokyo 102-0083, Japan
- Md Ashad Alam
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Hong-Wen Deng
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
- Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan