1
Li J, Xiong S, Shi H, Cui F, Zhang Z, Wei L. NeuroPred-AIMP: Multimodal Deep Learning for Neuropeptide Prediction via Protein Language Modeling and Temporal Convolutional Networks. J Chem Inf Model 2025; 65:4740-4750. [PMID: 40258183 DOI: 10.1021/acs.jcim.5c00444] [Indexed: 04/23/2025]
Abstract
Neuropeptides are key signaling molecules that regulate fundamental physiological processes ranging from metabolism to cognitive function. However, accurate identification is a major challenge due to sequence heterogeneity, obscured functional motifs, and limited experimentally validated data. Accurate identification of neuropeptides is critical for advancing neurological disease therapeutics and peptide-based drug design. Existing neuropeptide identification methods rely on manual features combined with traditional machine learning methods, which struggle to capture deep sequence patterns. To address these limitations, we propose NeuroPred-AIMP (adaptive integrated multimodal predictor), an interpretable model that combines the global semantic representation of a protein language model (ESM) with the multiscale structural features of a temporal convolutional network (TCN). The model introduces a residual-enhanced adaptive feature fusion mechanism that dynamically recalibrates feature contributions to achieve robust integration of evolutionary and local sequence information. The experimental results demonstrated that the proposed model showed excellent overall performance on the independent test set, with an accuracy of 92.3% and an AUROC of 0.974. The model was also well balanced in its ability to identify positive and negative samples, with a sensitivity of 92.6% and a specificity of 92.1%, with a difference of less than 0.5%. The result confirms the effectiveness of the multimodal feature strategy in the task of neuropeptide recognition.
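The accuracy, sensitivity, and specificity quoted in this abstract are all derived from confusion-matrix counts. The following is a minimal sketch of that arithmetic; the function name and the counts are hypothetical and are not taken from the paper.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, and accuracy from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                 # share of true positives recovered
    specificity = tn / (tn + fp)                 # share of true negatives recovered
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # share of all samples classified correctly
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }


# Hypothetical counts for illustration only (NOT the paper's data).
metrics = classification_metrics(tp=90, fn=10, tn=85, fp=15)

# Gap between the two class-wise rates; a small gap indicates balanced performance.
gap = abs(metrics["sensitivity"] - metrics["specificity"])
```

A small `gap` is what the abstract means by a good balance between positive and negative samples.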
Affiliation(s)
- Jinjin Li
  - Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China
- Shuwen Xiong
  - Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China
- Hua Shi
  - School of Optoelectronic and Communication Engineering, Xiamen University of Technology, Xiamen 361024, China
- Feifei Cui
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Zilong Zhang
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Leyi Wei
  - Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macao 999078, China
  - School of Software, Shandong University, Jinan 250101, China
2
Hasnat S, Rahman MM, Yeasmin F, Jubair M, Helmy YA, Islam T, Hoque MN. Genomic and Computational Analysis Unveils Bacteriocin Based Therapeutics against Clinical Mastitis Pathogens in Dairy Cows. Probiotics Antimicrob Proteins 2025:10.1007/s12602-025-10563-w. [PMID: 40295467 DOI: 10.1007/s12602-025-10563-w] [Accepted: 04/24/2025] [Indexed: 04/30/2025]
Abstract
Clinical mastitis (CM) remains a critical challenge in dairy production, exacerbated by the global rise of antibiotic-resistant pathogens, which threatens herd health and productivity. This study pioneers a dual genomic-computational strategy to develop bacteriocin-based therapeutics, a promising alternative to conventional antibiotics, by targeting conserved virulence mechanisms in CM-causing pathogens. We aimed to (i) identify essential core proteins in CM-causing pathogens of dairy cows using the genomic approach; and (ii) assess the efficacy of bacteriocin peptides (BPs) as novel therapeutic agents targeting the selected core proteins for sustainable management of mastitis. Through pan-genomic analysis of 16 clinically relevant pathogens, including Staphylococcus aureus, S. warneri, Streptococcus agalactiae, S. uberis, Escherichia coli, Klebsiella pneumoniae, Pseudomonas aeruginosa, P. putida, and P. asiatica, we identified 65 evolutionarily conserved core proteins. Prioritization based on essentiality, virulence, and resistance potential revealed Rho (transcription termination factor) and HupB (nucleoid-associated protein) as high-value therapeutic targets due to their critical roles in bacterial survival and pathogenicity. A computational screen of 70 BPs identified 14 candidates with high binding affinity for both Rho and HupB proteins. Molecular dynamics simulations demonstrated that BP8, a novel dual-action bacteriocin, competitively inhibits Rho-mediated transcription termination and disrupts HupB-DNA interactions, effectively crippling bacterial replication and virulence. BP8 exhibited superior structural stability and binding efficacy compared to other candidates, positioning it as a potent broad-spectrum agent against diverse CM pathogens, including multidrug-resistant strains. Our study underscores the untapped potential of bacteriocins in veterinary medicine, offering a sustainable solution to mitigate antibiotic overuse and resistance.
The computational validation of BP8 provides a foundational framework for developing targeted therapies, with implications for reducing dairy industry losses and improving animal welfare. Further in vitro and in vivo studies are warranted to translate these insights into practical therapeutics.
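The idea of "core proteins" in a pan-genome analysis can be pictured as a set intersection: a protein family is core if it occurs in every genome examined. The sketch below shows only that idea and is not the authors' pipeline. The genome labels and family names are hypothetical, and a real analysis would first cluster orthologous proteins with a dedicated tool.

```python
def core_proteins(genomes):
    """Return the protein families present in every genome (the 'core' set).

    `genomes` maps a genome label to an iterable of protein-family identifiers.
    """
    family_sets = [set(families) for families in genomes.values()]
    if not family_sets:
        return set()
    return set.intersection(*family_sets)


# Hypothetical toy inventory; labels and family names are illustrative only.
genomes = {
    "genome_A": ["fam1", "fam2", "fam3"],
    "genome_B": ["fam1", "fam2", "fam4"],
    "genome_C": ["fam2", "fam1"],
}

core = core_proteins(genomes)  # families shared by all three toy genomes
```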
Affiliation(s)
- Soharth Hasnat
  - Molecular Biology and Bioinformatics Laboratory, Department of Gynecology, Obstetrics and Reproductive Health, Gazipur Agricultural University, Gazipur, 1706, Bangladesh
- Md Morshedur Rahman
  - Molecular Biology and Bioinformatics Laboratory, Department of Gynecology, Obstetrics and Reproductive Health, Gazipur Agricultural University, Gazipur, 1706, Bangladesh
- Farzana Yeasmin
  - Institute of Biotechnology and Genetic Engineering, Gazipur Agricultural University, Gazipur, 1706, Bangladesh
- Mohammad Jubair
  - icddr,b (International Centre for Diarrhoeal Disease Research, Bangladesh), Dhaka, 1212, Bangladesh
- Yosra A Helmy
  - Department of Veterinary Science, University of Kentucky, 1400 Nicholasville Rd., Lexington, KY, 40546-0099, USA
- Tofazzal Islam
  - Institute of Biotechnology and Genetic Engineering, Gazipur Agricultural University, Gazipur, 1706, Bangladesh
- M Nazmul Hoque
  - Molecular Biology and Bioinformatics Laboratory, Department of Gynecology, Obstetrics and Reproductive Health, Gazipur Agricultural University, Gazipur, 1706, Bangladesh
3
Yue Y, Fan H, Zhao J, Xia J. Protein language model-based prediction for plant miRNA encoded peptides. PeerJ Comput Sci 2025; 11:e2733. [PMID: 40134870 PMCID: PMC11935769 DOI: 10.7717/peerj-cs.2733] [Received: 10/18/2024] [Accepted: 02/05/2025] [Indexed: 03/27/2025]
Abstract
Plant miRNA encoded peptides (miPEPs), which are short peptides derived from small open reading frames within primary miRNAs, play a crucial role in regulating diverse plant traits. Identifying plant miPEPs is challenging because few known miPEPs are available for training. Existing prediction methods, including miPEPPred-FRL, rely on manually encoded features to infer plant miPEPs. Recent advances in deep learning modeling of protein sequences provide an opportunity to improve the representation of key features, leveraging large datasets of protein sequences. In this study, we propose an accurate prediction model, called pLM4PEP, which integrates ESM2 peptide embedding with machine learning methods. Our model not only demonstrates precise identification capabilities for plant miPEPs, but also achieves remarkable results across diverse datasets that include other bioactive peptides. The source code and datasets of pLM4PEP are available at https://github.com/xialab-ahu/pLM4PEP.
Affiliation(s)
- Yishan Yue
  - College of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang, China
- Henghui Fan
  - Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China
- Jianping Zhao
  - College of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang, China
- Junfeng Xia
  - Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China
4
Iglesias V, Bárcenas O, Pintado‐Grima C, Burdukiewicz M, Ventura S. Structural information in therapeutic peptides: Emerging applications in biomedicine. FEBS Open Bio 2025; 15:254-268. [PMID: 38877295 PMCID: PMC11788753 DOI: 10.1002/2211-5463.13847] [Received: 02/29/2024] [Revised: 05/08/2024] [Accepted: 05/27/2024] [Indexed: 06/16/2024]
Abstract
Peptides are attracting a growing interest as therapeutic agents. This trend stems from their cost-effectiveness and reduced immunogenicity, compared to antibodies or recombinant proteins, but also from their ability to dock and interfere with large protein-protein interaction surfaces, and their higher specificity and better biocompatibility relative to organic molecules. Many tools have been developed to understand, predict, and engineer peptide function. However, most state-of-the-art approaches treat peptides only as linear entities and disregard their structural arrangement. Yet, structural details are critical for peptide properties such as solubility, stability, or binding affinities. Recent advances in peptide structure prediction have successfully addressed the scarcity of confidently determined peptide structures. This review will explore different therapeutic and biotechnological applications of peptides and their assemblies, emphasizing the importance of integrating structural information to advance these endeavors effectively.
Affiliation(s)
- Valentín Iglesias
  - Institut de Biotecnologia i de Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
  - Clinical Research Centre, Medical University of Białystok, Białystok, Poland
- Oriol Bárcenas
  - Institut de Biotecnologia i de Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
  - Institute of Advanced Chemistry of Catalonia (IQAC), CSIC, Barcelona, Spain
- Carlos Pintado‐Grima
  - Institut de Biotecnologia i de Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
- Michał Burdukiewicz
  - Institut de Biotecnologia i de Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
  - Clinical Research Centre, Medical University of Białystok, Białystok, Poland
- Salvador Ventura
  - Institut de Biotecnologia i de Biomedicina and Departament de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Barcelona, Spain
5
Vrbnjak K, Sewduth RN. Recent Advances in Peptide Drug Discovery: Novel Strategies and Targeted Protein Degradation. Pharmaceutics 2024; 16:1486. [PMID: 39598608 PMCID: PMC11597556 DOI: 10.3390/pharmaceutics16111486] [Received: 10/01/2024] [Revised: 11/19/2024] [Accepted: 11/20/2024] [Indexed: 11/29/2024]
Abstract
Recent technological advancements, including computer-assisted drug discovery, gene-editing techniques, and high-throughput screening approaches, have greatly expanded the palette of methods for the discovery of peptides available to researchers. These emerging strategies, driven by recent advances in bioinformatics and multi-omics, have significantly improved the efficiency of peptide drug discovery when compared with traditional in vitro and in vivo methods, cutting costs and improving their reliability. An added benefit of peptide-based drugs is the ability to precisely target protein-protein interactions, which are normally a particularly challenging aspect of drug discovery. Another recent breakthrough in this field is targeted protein degradation through proteolysis-targeting chimeras. These revolutionary compounds represent a noteworthy advancement over traditional small-molecule inhibitors due to their unique mechanism of action, which allows for the degradation of specific proteins with unprecedented specificity. The inclusion of a peptide as a protein-of-interest-targeting moiety allows for improved versatility and the possibility of targeting otherwise undruggable proteins. In this review, we discuss various novel wet-lab and computational multi-omic methods for peptide drug discovery, provide an overview of therapeutic agents discovered through these cutting-edge techniques, and discuss the potential for the therapeutic delivery of peptide-based drugs.
Affiliation(s)
- Katarina Vrbnjak
  - VIB-KU Leuven Center for Cancer Biology (VIB), 3000 Leuven, Belgium
6
Hu J, Chen KX, Rao B, Ni JY, Thafar MA, Albaradei S, Arif M. Protein-peptide binding residue prediction based on protein language models and cross-attention mechanism. Anal Biochem 2024; 694:115637. [PMID: 39121938 DOI: 10.1016/j.ab.2024.115637] [Received: 04/27/2024] [Revised: 07/28/2024] [Accepted: 08/06/2024] [Indexed: 08/12/2024]
Abstract
Accurate identification of protein-peptide binding residues is essential for understanding protein-peptide interactions and advancing drug discovery. To address this problem, extensive research efforts have been made to design more discriminative feature representations. However, extracting these explicit features usually depends on third-party tools, resulting in low computational efficiency and low predictive performance. In this study, we design an end-to-end deep learning-based method, E2EPep, for protein-peptide binding residue prediction using protein sequence only. E2EPep first employs and fine-tunes two state-of-the-art pre-trained protein language models that can extract two different high-latent feature representations from protein sequences relevant for protein structures and functions. A novel feature fusion module is then designed in E2EPep to fuse and optimize the above two feature representations of binding residues. In addition, we have also designed E2EPep+, which integrates the E2EPep and PepBCL models, to improve the prediction performance. Experimental results on two independent testing data sets demonstrate that E2EPep and E2EPep+ could achieve average AUC values of 0.846 and 0.842 while achieving an average Matthews correlation coefficient value that is significantly higher than that of most existing sequence-based methods and comparable to that of the state-of-the-art structure-based predictors. Detailed data analysis shows that the primary strength of E2EPep lies in the effectiveness of feature representation using a cross-attention mechanism to fuse the embeddings generated by two fine-tuned protein language models. The standalone package of E2EPep and E2EPep+ can be obtained at https://github.com/ckx259/E2EPep.git for academic use only.
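To make the cross-attention fusion mentioned in this abstract concrete, here is a minimal sketch of single-head scaled dot-product cross-attention between two per-residue embedding matrices. It is a simplified illustration, not the E2EPep module. It omits the learned query/key/value projections a real model would use, and all names and numbers are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys_values):
    """Fuse two embedding matrices: rows of `queries` attend over `keys_values`.

    Simplification (assumption): keys and values are the same vectors and no
    learned projection matrices are applied.
    """
    dim = len(queries[0])
    fused = []
    for q in queries:
        # Scaled dot product between this query and every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys_values]
        weights = softmax(scores)
        # Attention output: weighted average of the value vectors.
        fused.append([sum(w * v[j] for w, v in zip(weights, keys_values))
                      for j in range(dim)])
    return fused


# Toy embeddings standing in for the outputs of two language models
# (3 residues, 2 dimensions each); values are illustrative only.
emb_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
emb_b = [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]]

fused = cross_attention(emb_a, emb_b)
```

Each row of `fused` is a weighted average of the second model's vectors, with weights set by similarity to the first model's vector for that residue.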
Affiliation(s)
- Jun Hu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China; Center for AI and Computational Biology, Suzhou Institution of Systems Medicine, Suzhou, 215123, China
- Kai-Xin Chen
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Bing Rao
  - School of Information & Electrical Engineering, Hangzhou City University, Hangzhou, 310015, China
- Jing-Yuan Ni
  - NUIST Reading Academy, Nanjing University of Information Science & Technology, Nanjing, 210044, China
- Maha A Thafar
  - Department of Computer Science, College of Computers and Information Technology, Taif University, Taif, 21944, Saudi Arabia
- Somayah Albaradei
  - Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Muhammad Arif
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha, 34110, Qatar
7
Guo X, Zheng Z, Cheong KH, Zou Q, Tiwari P, Ding Y. Sequence homology score-based deep fuzzy network for identifying therapeutic peptides. Neural Netw 2024; 178:106458. [PMID: 38901093 DOI: 10.1016/j.neunet.2024.106458] [Received: 12/18/2023] [Revised: 05/29/2024] [Accepted: 06/09/2024] [Indexed: 06/22/2024]
Abstract
The detection of therapeutic peptides is a topic of immense interest in the biomedical field. Conventional biochemical experiment-based detection techniques are tedious and time-consuming. Computational biology has become a useful tool for improving the detection efficiency of therapeutic peptides. Most computational methods do not consider the deviation caused by noise. To improve the generalization performance of therapeutic peptide prediction methods, this work presents a sequence homology score-based deep fuzzy echo-state network with maximizing mixture correntropy (SHS-DFESN-MMC) model. Our method is compared with the existing methods on eight types of therapeutic peptide datasets. The model parameters are determined by 10-fold cross-validation on the training sets and verified on independent test sets. Across the eight datasets, the average area under the receiver operating characteristic curve (AUC) values of SHS-DFESN-MMC are the highest on both the training (0.926) and independent sets (0.923).
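The 10-fold cross-validation used for parameter selection splits the training set into ten near-equal parts, each serving once as the validation fold. The sketch below shows only the index bookkeeping. The function name is hypothetical, and it omits the shuffling and stratification a real pipeline would normally add.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds.

    Assumption: samples are already in random order; no shuffling is done here.
    """
    # The first (n_samples % k) folds get one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


# Toy example: 25 samples split into 10 folds.
folds = list(k_fold_indices(n_samples=25, k=10))
```

A model would be trained on each `train_idx` and scored on the matching `val_idx`, with the scores averaged to pick hyperparameters.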
Affiliation(s)
- Xiaoyi Guo
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, PR China; Quzhou People's Hospital, Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou, 324000, PR China; Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, S637371, Singapore
- Ziyu Zheng
  - Department of Mathematical Sciences, University of Nottingham Ningbo, Ningbo, 315100, PR China
- Kang Hao Cheong
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, S637371, Singapore; College of Computing and Data Science, Nanyang Technological University, S639798, Singapore
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, PR China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, PR China
- Prayag Tiwari
  - School of Information Technology, Halmstad University, Sweden
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, PR China
8
Han J, Kong T, Liu J. PepNet: an interpretable neural network for anti-inflammatory and antimicrobial peptides prediction using a pre-trained protein language model. Commun Biol 2024; 7:1198. [PMID: 39341947 PMCID: PMC11438969 DOI: 10.1038/s42003-024-06911-1] [Received: 05/28/2024] [Accepted: 09/17/2024] [Indexed: 10/01/2024]
Abstract
Identifying anti-inflammatory peptides (AIPs) and antimicrobial peptides (AMPs) is crucial for the discovery of innovative and effective peptide-based therapies targeting inflammation and microbial infections. However, accurate identification of AIPs and AMPs remains a computational challenge mainly due to limited utilization of peptide sequence information. Here, we propose PepNet, an interpretable neural network for predicting both AIPs and AMPs by applying a pre-trained protein language model to fully utilize the peptide sequence information. It first captures the information of residue arrangements and physicochemical properties using a residual dilated convolution block, and then seizes the function-related diverse information by introducing a residual Transformer block to characterize the residue representations generated by a pre-trained protein language model. After training and testing, PepNet demonstrates great superiority over other leading AIP and AMP predictors and shows strong interpretability of its learned peptide representations. A user-friendly web server for PepNet is freely available at http://liulab.top/PepNet/server.
Affiliation(s)
- Jiyun Han
  - School of Mathematics and Statistics, Shandong University, 264209, Weihai, China
- Tongxin Kong
  - School of Mathematics and Statistics, Shandong University, 264209, Weihai, China
- Juntao Liu
  - School of Mathematics and Statistics, Shandong University, 264209, Weihai, China
9
Zhang W, Ding Y, Wei L, Guo X, Ni F. Therapeutic peptides identification via kernel risk sensitive loss-based k-nearest neighbor model and multi-Laplacian regularization. Brief Bioinform 2024; 25:bbae534. [PMID: 39438076 PMCID: PMC11495874 DOI: 10.1093/bib/bbae534] [Received: 07/14/2024] [Revised: 08/30/2024] [Accepted: 10/08/2024] [Indexed: 10/25/2024]
Abstract
Therapeutic peptides are therapeutic agents synthesized from natural amino acids, which can be used as carriers for precisely transporting drugs and can activate the immune system for preventing and treating various diseases. However, screening therapeutic peptides using biochemical assays is expensive, time-consuming, and limited by experimental conditions and biological samples, and there may be ethical considerations in the clinical stage. In contrast, screening therapeutic peptides using machine learning and computational methods is efficient, automated, and can accurately predict potential therapeutic peptides. In this study, a k-nearest neighbor model based on multi-Laplacian and kernel risk sensitive loss was proposed, which introduces a kernel risk loss function derived from the K-local hyperplane distance nearest neighbor model as well as combining the Laplacian regularization method to predict therapeutic peptides. The findings indicated that the suggested approach achieved satisfactory results and could effectively predict therapeutic peptide sequences.
Affiliation(s)
- Wenyu Zhang
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, No. 2006 Xiyuan Avenue, High tech Zone, Chengdu 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, No.1 Chengdian Road, Kecheng District, Quzhou 324000, China
- Yijie Ding
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, No.1 Chengdian Road, Kecheng District, Quzhou 324000, China
- Leyi Wei
  - Macao Polytechnic University, Gomes Street, Macau Peninsula, Macau 999078, China
- Xiaoyi Guo
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, No.1 Chengdian Road, Kecheng District, Quzhou 324000, China
- Fengming Ni
  - Department of Gastroenterology, The First Hospital of Jilin University, No. 71 Xinmin Street, Chaoyang District, Changchun 130021, China
10
Isaac KS, Combe M, Potter G, Sokolenko S. Machine learning tools for peptide bioactivity evaluation - Implications for cell culture media optimization and the broader cultivated meat industry. Curr Res Food Sci 2024; 9:100842. [PMID: 39435450 PMCID: PMC11491887 DOI: 10.1016/j.crfs.2024.100842] [Received: 05/31/2024] [Accepted: 09/07/2024] [Indexed: 10/23/2024]
Abstract
Although bioactive peptides have traditionally been studied for their health-promoting qualities in the context of nutrition and medicine, the past twenty years have seen a steady increase in their application to cell culture media optimization. Complex natural sources of bioactive peptides, such as hydrolysates, offer a sustainable and cost-effective means of promoting cellular growth, making them an essential component of scaling-up cultivated meat production. However, the sheer diversity of hydrolysates makes product selection difficult, highlighting the need for functional characterization. Traditional wet-lab techniques for isolating and estimating peptide bioactivity cannot keep pace with peptide identification using high-throughput tools such as mass spectrometry, requiring the development and use of machine learning-based classifiers. This review provides a comprehensive list of available software tools to evaluate peptide bioactivity, classified and compared based on the algorithm, training set, functionality, and limitations of the underlying models. We curated independent test sets to compare the predictive performance of different models based on specific bioactivity classification relevant to promoting cell culture growth: antioxidant and anti-inflammatory. A comprehensive screening of all bioactivity classifiers revealed that while there are approximately fifty tools to elucidate antimicrobial activity and sixteen that predict anti-inflammatory activity, fewer tools are available for other functionalities related to cell growth - five that predict antioxidant activity and two for growth factor and/or cell signaling prediction. A thorough evaluation of the available tools revealed significant issues with sensitivity, specificity, and overall accuracy. 
Despite the overall interest in estimating peptide bioactivity, our work highlights key gaps in the broader adoption of existing software for the specific application of cell culture media optimization in the context of cultivated meat and beyond.
Affiliation(s)
- Kathy Sharon Isaac
  - Process Engineering and Applied Science, Dalhousie University, 5273 DaCosta Row, PO Box 15000, Halifax, B3H 4R2, NS, Canada
- Michelle Combe
  - Process Engineering and Applied Science, Dalhousie University, 5273 DaCosta Row, PO Box 15000, Halifax, B3H 4R2, NS, Canada
- Stanislav Sokolenko
  - Process Engineering and Applied Science, Dalhousie University, 5273 DaCosta Row, PO Box 15000, Halifax, B3H 4R2, NS, Canada
11
Xu Y, Zhang S, Zhu F, Liang Y. A deep learning model for anti-inflammatory peptides identification based on deep variational autoencoder and contrastive learning. Sci Rep 2024; 14:18451. [PMID: 39117712 PMCID: PMC11310449 DOI: 10.1038/s41598-024-69419-y] [Received: 05/24/2024] [Accepted: 08/05/2024] [Indexed: 08/10/2024]
Abstract
As a class of biologically active molecules with significant immunomodulatory and anti-inflammatory effects, anti-inflammatory peptides have important application value in the medical and biotechnology fields due to their unique biological functions. Research on the identification of anti-inflammatory peptides provides important theoretical foundations and practical value for a deeper understanding of the biological mechanisms of inflammation and immune regulation, as well as for the development of new drugs and biotechnological applications. Therefore, it is necessary to develop more advanced computational models for identifying anti-inflammatory peptides. In this study, we propose a deep learning model named DAC-AIPs based on variational autoencoder and contrastive learning for accurate identification of anti-inflammatory peptides. In the sequence encoding part, the incorporation of multi-hot encoding helps capture richer sequence information. The autoencoder, composed of convolutional layers and linear layers, can learn latent features and reconstruct features, with variational inference enhancing the representation capability of latent features. Additionally, the introduction of contrastive learning aims to improve the model's classification ability. Through cross-validation and independent dataset testing experiments, DAC-AIPs achieves superior performance compared to existing state-of-the-art models. In cross-validation, the classification accuracy of DAC-AIPs reached around 88%, which is 7% higher than previous models. Furthermore, various ablation experiments and interpretability experiments validate the effectiveness of DAC-AIPs. Finally, a user-friendly online predictor is designed to enhance the practicality of the model, and the server is freely accessible at http://dac-aips.online.
Affiliation(s)
- Yujie Xu
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, People's Republic of China
- Shengli Zhang
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, People's Republic of China
- Feng Zhu
  - Center for Translational Medicine, The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, 710061, People's Republic of China
- Yunyun Liang
  - School of Science, Xi'an Polytechnic University, Xi'an, 710048, People's Republic of China
12
Kang Y, Zhang H, Wang X, Yang Y, Jia Q. MMDB: Multimodal dual-branch model for multi-functional bioactive peptide prediction. Anal Biochem 2024; 690:115491. [PMID: 38460901 DOI: 10.1016/j.ab.2024.115491] [Received: 11/12/2023] [Revised: 01/21/2024] [Accepted: 02/19/2024] [Indexed: 03/11/2024]
Abstract
Bioactive peptides can hinder oxidative processes and microbial spoilage in foodstuffs and play important roles in treating diverse diseases and disorders. While most methods focus on single-functional bioactive peptides and have obtained promising prediction performance, it is still a significant challenge to accurately detect complex and diverse functions simultaneously given the rapid increase in multi-functional bioactive peptides. In contrast to previous research on multi-functional bioactive peptide prediction based solely on sequence, we propose a novel multimodal dual-branch (MMDB) lightweight deep learning model that designs two different branches to effectively capture the complementary information of peptide sequence and structural properties. Specifically, a multi-scale dilated convolution with Bi-LSTM branch is presented to effectively model peptide sequence properties at different scales, while a multi-layer convolution branch is proposed to capture structural information. To the best of our knowledge, this is the first effective extraction of peptide sequence features using multi-scale dilated convolution without parameter increase. Multimodal features from both branches are integrated via a fully connected layer for multi-label classification. Compared to state-of-the-art methods, our MMDB model exhibits competitive results across metrics, with a 9.1% Coverage increase and 5.3% and 3.5% improvements in Precision and Accuracy, respectively.
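The phrase "multi-scale dilated convolution without parameter increase" rests on a general property of dilation: the same kernel weights cover a wider span of the sequence as the dilation rate grows. The sketch below illustrates that property with a plain 1D convolution. It is not the MMDB branch itself, and the signal, kernel, and dilation rates are hypothetical.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D dilated convolution (cross-correlation, as in deep learning).

    The kernel taps are spaced `dilation` positions apart, so the number of
    weights stays fixed while the receptive field grows.
    """
    span = (len(kernel) - 1) * dilation + 1  # receptive field of one output
    outputs = []
    for start in range(len(signal) - span + 1):
        outputs.append(sum(w * signal[start + i * dilation]
                           for i, w in enumerate(kernel)))
    return outputs


# Toy input and a 3-tap kernel; the same three weights are reused at every scale.
signal = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, 0, -1]

# One feature map per dilation rate: receptive fields of 3, 5, and 7 positions.
multi_scale = {d: dilated_conv1d(signal, kernel, dilation=d) for d in (1, 2, 3)}
```

With this kernel each output is the difference between two positions `2 * dilation` apart, so larger rates respond to longer-range patterns at no extra parameter cost.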
Affiliation(s)
- Yan Kang
  - National Pilot School of Software, Yunnan University, Kunming, 650091, Yunnan, China; Yunnan Key Laboratory of Software Engineering, China
- Huadong Zhang
  - National Pilot School of Software, Yunnan University, Kunming, 650091, Yunnan, China
- Xinchao Wang
  - National Pilot School of Software, Yunnan University, Kunming, 650091, Yunnan, China
- Yun Yang
  - National Pilot School of Software, Yunnan University, Kunming, 650091, Yunnan, China; Yunnan Key Laboratory of Software Engineering, China
- Qi Jia
  - School of Information Science, Yunnan University, Kunming, 650091, Yunnan, China
13
Zhang D, Wang Z, Zhao D, Li J. DRGATAN: Directed relation graph attention aware network for asymmetric drug-drug interaction prediction. iScience 2024; 27:109943. [PMID: 38868194 PMCID: PMC11167430 DOI: 10.1016/j.isci.2024.109943] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 03/21/2024] [Accepted: 05/06/2024] [Indexed: 06/14/2024] Open
Abstract
In scenarios involving the treatment of complex or coexisting diseases with multiple drugs, the potential for severe adverse drug reactions in patients necessitates the identification of potential drug-drug interactions (DDIs). Most existing computational methods have not taken into account the asymmetry and relation types of drug interactions caused by the relation information between drugs, which may lead to missing information in embedded learning. Therefore, this paper proposes a directed relation graph attention aware network (DRGATAN) to predict asymmetric drug interactions. DRGATAN leverages an encoder to learn multi-relational role embeddings of drugs across different types of relations. The experimental results show that DRGATAN's performance is superior to recognized advanced methods. The visualization demonstrates the effect of utilizing asymmetric information, and the case analysis validates the reliability of the proposed method. This study provides guidance for predicting asymmetric drug interactions.
Affiliation(s)
- Dehai Zhang
  - The Key Laboratory of Software Engineering of Yunnan Province, School of Software, Yunnan University, Kunming 650091, P.R. China
- Zhengwu Wang
  - The Key Laboratory of Software Engineering of Yunnan Province, School of Software, Yunnan University, Kunming 650091, P.R. China
- Di Zhao
  - The Key Laboratory of Software Engineering of Yunnan Province, School of Software, Yunnan University, Kunming 650091, P.R. China
- Jin Li
  - The Key Laboratory of Software Engineering of Yunnan Province, School of Software, Yunnan University, Kunming 650091, P.R. China
14
Ghafoor H, Asim MN, Ibrahim MA, Ahmed S, Dengel A. CAPTURE: Comprehensive anti-cancer peptide predictor with a unique amino acid sequence encoder. Comput Biol Med 2024; 176:108538. [PMID: 38759585 DOI: 10.1016/j.compbiomed.2024.108538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 04/26/2024] [Accepted: 04/28/2024] [Indexed: 05/19/2024]
Abstract
The key properties of anticancer peptides (ACPs), including bioactivity, high efficacy, low toxicity, and lack of drug resistance, make them ideal candidates for cancer therapies. To explore the potential of ACPs and accelerate the development of cancer therapies, 53 artificial intelligence-supported computational predictors have been developed for ACP versus non-ACP classification, but only one predictor has been developed for annotating ACP functional types. Moreover, these predictors extract amino acid distribution patterns to transform peptide sequences into statistical vectors, which are then fed to classifiers that discriminate peptide sequences and annotate peptide functional classes. Overall, these predictors fail to extract diverse types of amino acid distribution patterns from peptide sequences. This paper presents a unique CARE encoder that transforms peptide sequences into statistical vectors by extracting four different types of distribution patterns: correlation, distribution, composition, and transition. On a public benchmark dataset, the potential of the proposed encoder is explored under two evaluation settings, intrinsic and extrinsic. Extrinsic evaluation indicates that 12 different machine learning classifiers achieve superior performance with the proposed encoder compared to 55 existing encoders. Furthermore, intrinsic evaluation reveals that, unlike existing encoders, the proposed encoder generates more discriminative clusters for the ACP and non-ACP classes. Across 8 public benchmark ACP and non-ACP classification datasets, the CAPTURE predictor, based on the proposed encoder and an AdaBoost classifier, outperforms existing predictors by an average of 1%, 4%, and 2% in accuracy, recall, and MCC, respectively. In a generalizability case study across 7 benchmark antimicrobial peptide classification datasets, CAPTURE surpasses existing predictors by an average AU-ROC of 2%. The CAPTURE predictive pipeline combined with the label powerset method outperforms the state-of-the-art ACP functional type predictor by 5%, 5%, 5%, 6%, and 3% in average accuracy, subset accuracy, precision, recall, and F1, respectively. The CAPTURE web application is available at https://sds_genetic_analysis.opendfki.de/CAPTURE.
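The label powerset method mentioned above turns a multi-label task into a single multi-class task by treating each distinct combination of labels as its own class. The following is a minimal sketch of that transformation only; the function names and the functional-type labels are hypothetical and are not taken from the CAPTURE pipeline.

```python
def label_powerset_fit(label_sets):
    """Assign one integer class id to each distinct label combination."""
    mapping = {}
    for labels in label_sets:
        key = frozenset(labels)
        if key not in mapping:
            mapping[key] = len(mapping)
    return mapping


def label_powerset_transform(label_sets, mapping):
    """Replace each label set by the class id of its combination."""
    return [mapping[frozenset(labels)] for labels in label_sets]


def label_powerset_inverse(class_ids, mapping):
    """Map predicted class ids back to label sets."""
    inverse = {class_id: key for key, class_id in mapping.items()}
    return [set(inverse[class_id]) for class_id in class_ids]


# Hypothetical functional-type annotations for four peptides.
y = [{"breast"}, {"breast", "lung"}, {"lung", "breast"}, {"colon"}]
mapping = label_powerset_fit(y)
y_single = label_powerset_transform(y, mapping)
print(y_single)
print(label_powerset_inverse(y_single, mapping))
```

Any ordinary multi-class classifier can then be trained on the transformed targets; the trade-off is that label combinations never seen in training cannot be predicted.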
Affiliation(s)
- Hina Ghafoor
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Muhammad Ali Ibrahim
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Sheraz Ahmed
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Andreas Dengel
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
15
Coelho LP, Santos-Júnior CD, de la Fuente-Nunez C. Challenges in computational discovery of bioactive peptides in 'omics data. Proteomics 2024; 24:e2300105. [PMID: 38458994 PMCID: PMC11537280 DOI: 10.1002/pmic.202300105] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2023] [Revised: 02/06/2024] [Accepted: 02/06/2024] [Indexed: 03/10/2024]
Abstract
Peptides have a plethora of activities in biological systems that can potentially be exploited biotechnologically. Several peptides are used clinically, as well as in industry and agriculture. The increase in available 'omics data has recently provided a large opportunity for mining novel enzymes, biosynthetic gene clusters, and molecules. While these data primarily consist of DNA sequences, other types of data provide important complementary information. Due to their size, the approaches proven successful at discovering novel proteins of canonical size cannot be naïvely applied to the discovery of peptides. Peptides can be encoded directly in the genome as short open reading frames (smORFs), or they can be derived from larger proteins by proteolysis. Both of these peptide classes pose challenges as simple methods for their prediction result in large numbers of false positives. Similarly, functional annotation of larger proteins, traditionally based on sequence similarity to infer orthology and then transferring functions between characterized proteins and uncharacterized ones, cannot be applied for short sequences. The use of these techniques is much more limited and alternative approaches based on machine learning are used instead. Here, we review the limitations of traditional methods as well as the alternative methods that have recently been developed for discovering novel bioactive peptides with a focus on prokaryotic genomes and metagenomes.
Affiliation(s)
- Luis Pedro Coelho
  - Centre for Microbiome Research, School of Biomedical Sciences, Queensland University of Technology, Woolloongabba, Queensland, Australia
  - Institute of Science and Technology for Brain-Inspired Intelligence – ISTBI, Fudan University, Shanghai, China
- Célio Dias Santos-Júnior
  - Institute of Science and Technology for Brain-Inspired Intelligence – ISTBI, Fudan University, Shanghai, China
  - Laboratory of Microbial Processes & Biodiversity – LMPB, Hydrobiology Department, Federal University of São Carlos – UFSCar, São Paulo, Brazil
- Cesar de la Fuente-Nunez
  - Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, Pennsylvania, USA
  - Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, Pennsylvania, USA
16
Lee B, Shin D. Contrastive learning for enhancing feature extraction in anticancer peptides. Brief Bioinform 2024; 25:bbae220. [PMID: 38725157 PMCID: PMC11082072 DOI: 10.1093/bib/bbae220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 03/28/2024] [Accepted: 04/21/2024] [Indexed: 05/13/2024] Open
Abstract
Cancer, recognized as a primary cause of death worldwide, has profound health implications and incurs a substantial social burden. Numerous efforts have been made to develop cancer treatments, among which anticancer peptides (ACPs) are garnering recognition for their potential applications. While ACP screening is time-consuming and costly, in silico prediction tools provide a way to overcome these challenges. Herein, we present a deep learning model designed to screen ACPs using peptide sequences only. A contrastive learning technique was applied to enhance model performance, yielding better results than a model trained solely on binary classification loss. Furthermore, two independent encoders were employed as a replacement for data augmentation, a technique commonly used in contrastive learning. Our model achieved superior performance on five of six benchmark datasets against previous state-of-the-art models. As prediction tools advance, the potential in peptide-based cancer therapeutics increases, promising a brighter future for oncology research and patient care.
Affiliation(s)
- Byungjo Lee
  - Research Institute, National Cancer Center, 323, Ilsan-ro, Ilsandong-gu, Goyang, 10408, Republic of Korea
- Dongkwan Shin
  - Research Institute, National Cancer Center, 323, Ilsan-ro, Ilsandong-gu, Goyang, 10408, Republic of Korea
  - Department of Cancer Biomedical Science, National Cancer Center Graduate School of Cancer Science and Policy, 323, Ilsan-ro, Ilsandong-gu, Goyang, 10408, Republic of Korea
17
Wang C, Wang Y, Ding P, Li S, Yu X, Yu B. ML-FGAT: Identification of multi-label protein subcellular localization by interpretable graph attention networks and feature-generative adversarial networks. Comput Biol Med 2024; 170:107944. [PMID: 38215617 DOI: 10.1016/j.compbiomed.2024.107944] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 12/08/2023] [Accepted: 01/01/2024] [Indexed: 01/14/2024]
Abstract
The prediction of multi-label protein subcellular localization (SCL) is a pivotal area in bioinformatics research. Recent advancements in protein structure research have facilitated the application of graph neural networks. This paper introduces a novel approach termed ML-FGAT. The approach begins by extracting node information of proteins from sequence data, physical-chemical properties, evolutionary insights, and structural details. Subsequently, various evolutionary techniques are integrated to consolidate multi-view information. A linear discriminant analysis framework, grounded on entropy weight, is then employed to reduce the dimensionality of the merged features. To enhance the robustness of the model, the training dataset is augmented using feature-generative adversarial networks. For the primary prediction step, graph attention networks are employed to determine multi-label protein SCL, leveraging both node and neighboring information. The interpretability is enhanced by analyzing the attention weight parameters. The training is based on the Gram-positive bacteria dataset, while validation employs newly constructed datasets: human, virus, Gram-negative bacteria, plant, and SARS-CoV-2. Following a leave-one-out cross-validation procedure, ML-FGAT demonstrates noteworthy superiority in this domain.
Affiliation(s)
- Congjing Wang
  - College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; School of Data Science, Qingdao University of Science and Technology, Qingdao, 266061, China
- Yifei Wang
  - College of Mathematics and Physics, Qingdao University of Science and Technology, Qingdao, 266061, China; School of Data Science, Qingdao University of Science and Technology, Qingdao, 266061, China
- Pengju Ding
  - College of Information Science and Technology, Qingdao University of Science and Technology, Qingdao, 266061, China
- Shan Li
  - School of Mathematics and Statistics, Central South University, Changsha, 410083, China
- Xu Yu
  - Qingdao Institute of Software, College of Computer Science and Technology, China University of Petroleum, Qingdao, 266580, China
- Bin Yu
  - School of Data Science, Qingdao University of Science and Technology, Qingdao, 266061, China; School of Data Science, University of Science and Technology of China, Hefei, 230027, China
18
Filgueiras LA, de Andrade FDCP, Iwao Horita S, Shirsat SD, Achal V, Rai M, Henriques-Pons A, Mendes AN. Analysis of SIKVAV's receptor affinity, pharmacokinetics, and pharmacological characteristics: a matrikine with potent biological function. J Biomol Struct Dyn 2024:1-23. [PMID: 38345036 DOI: 10.1080/07391102.2024.2313709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 01/27/2024] [Indexed: 03/08/2025]
Abstract
Matrikines are biologically active peptides generated by fragmentation of extracellular matrix (ECM) components and are functionally distinct from the original full-length molecule. The active matricryptic sites can be unmasked by enzymatic degradation of ECM components, multimerization, heterotypic binding, adsorption to other molecules, cell-mediated mechanical forces, exposure to reactive oxygen species, ECM denaturation, and other processes. Laminin α1-derived peptide (SIKVAV) is a bioactive peptide derived from laminin-111 that participates in tumor development, cell proliferation, and angiogenesis in various cell types. SIKVAV also has potential pharmaceutical activity that may be used for tissue regeneration and bioengineering in Alzheimer's disease and muscular dystrophies. In this work, we performed computational analyses of SIKVAV using the ADMET panel, which stands for Absorption, Distribution, Metabolism, Excretion, and Toxicity. Docking analyses using the α3β1 and α6β1 integrin receptors were performed to fill in the gaps in SIKVAV's signaling pathway, and coupling tests showed that SIKVAV can interact with both receptors. Moreover, there is no indication of cytotoxicity, mutagenic or carcinogenic activity, or skin or oral sensitivity. Our analysis suggests that SIKVAV has a high probability of interacting with peroxisome proliferator-activated receptor-gamma (NR-PPAR-γ), which has anti-inflammatory activity. The bioinformatics results can help clarify the participation of SIKVAV in homeostasis and inform how this peptide can act as a biological asset in the control of dystrophies, neurodegenerative diseases, and tissue engineering.
Affiliation(s)
- Livia Alves Filgueiras
  - Laboratory of Innovation in Science and Technology - LACITEC, Department of Biophysics and Physiology, Federal University of Piauí, Teresina, Brazil
- Samuel Iwao Horita
  - Laboratory of Innovation in Therapies, Education, and Bioproducts - LITEB, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
- Shubhangi D Shirsat
  - Laboratory of Innovation in Therapies, Education, and Bioproducts - LITEB, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
- Varenyam Achal
  - Environmental Engineering Program, Guangdong Technion - Israel Institute of Technology, Shantou, China
  - Technion - Israel Institute of Technology, Haifa, Israel
- Mahendra Rai
  - Department of Biotechnology, SGB Amravati University, Amravati, India
- Andrea Henriques-Pons
  - Laboratory of Innovation in Therapies, Education, and Bioproducts - LITEB, Oswaldo Cruz Foundation, Rio de Janeiro, Brazil
- Anderson Nogueira Mendes
  - Laboratory of Innovation in Science and Technology - LACITEC, Department of Biophysics and Physiology, Federal University of Piauí, Teresina, Brazil
19
Bao W, Liu Y, Chen B. Oral_voting_transfer: classification of oral microorganisms' function proteins with voting transfer model. Front Microbiol 2024; 14:1277121. [PMID: 38384719 PMCID: PMC10879614 DOI: 10.3389/fmicb.2023.1277121] [Citation(s) in RCA: 17] [Impact Index Per Article: 17.0] [Received: 08/14/2023] [Accepted: 12/19/2023] [Indexed: 02/23/2024] Open
Abstract
Introduction: The oral microbial community is a typical example of the human body's highly complex microbial ecosystems. Oral microorganisms take part in human diseases, including oral cavity inflammation, mucosal disease, periodontal disease, tooth decay, and oral cancer. Oral microbes can also cause endocrine, digestive, and nerve function disorders, such as diabetes, digestive system diseases, and Alzheimer's disease. The proteins of oral microbes play significant roles in these serious diseases, and a good knowledge of oral microbes can help in analyzing the progression of related diseases. Moreover, high-dimensional features and imbalanced data add to the complexity of oral microbial problems, which can hardly be solved with traditional experimental methods. Methods: To deal with these challenges, we proposed a novel method, oral_voting_transfer, for classification problems in the field of oral microorganisms. The method employs three features to classify five oral microorganisms: Streptococcus mutans, Staphylococcus aureus, abiotrophy adjacent, bifidobacterial, and Capnocytophaga. First, we took a highly effective model that successfully classifies organelle proteins and transferred it to the oral microorganisms. Next, several classification methods were used as local classifiers. Finally, the results were obtained by voting over the transfer classifiers and the local ones. Results and discussion: The proposed method performed well on the five oral microorganisms. The oral_voting_transfer is a standalone tool, and all its source code is publicly available at https://github.com/baowz12345/voting_transfer.
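The final voting step described in the Methods can be illustrated with plain majority voting. This is a minimal sketch under that assumption; the abstract does not state the exact voting rule, and the classifier outputs and class labels below are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    Ties go to the label that appears first in `predictions`, because
    Counter.most_common keeps first-seen order for equal counts.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    return Counter(predictions).most_common(1)[0][0]


def vote_per_sample(classifier_outputs):
    """Combine outputs given as one list of predicted labels per classifier."""
    return [majority_vote(list(votes)) for votes in zip(*classifier_outputs)]


# Hypothetical outputs of one transferred model and two local classifiers
# for three protein samples.
transfer_model = ["A", "B", "A"]
local_one = ["A", "B", "B"]
local_two = ["B", "B", "A"]

print(vote_per_sample([transfer_model, local_one, local_two]))
```

A weighted vote, in which the transferred model counts more than each local classifier, would be an equally plausible reading of the abstract.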
Affiliation(s)
- Wenzheng Bao
  - School of Information Engineering, Xuzhou University of Technology, Xuzhou, China
- Yujun Liu
  - School of Information Engineering, Xuzhou University of Technology, Xuzhou, China
- Baitong Chen
  - The Affiliated Xuzhou Municipal Hospital of Xuzhou Medical University, Xuzhou, China
  - Department of Stomatology, Xuzhou First People’s Hospital, Xuzhou, China
20
Chen S, Semenov I, Zhang F, Yang Y, Geng J, Feng X, Meng Q, Lei K. An effective framework for predicting drug-drug interactions based on molecular substructures and knowledge graph neural network. Comput Biol Med 2024; 169:107900. [PMID: 38199213 DOI: 10.1016/j.compbiomed.2023.107900] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 09/11/2023] [Revised: 11/27/2023] [Accepted: 12/23/2023] [Indexed: 01/12/2024]
Abstract
Drug-drug interactions (DDIs) play a central role in drug research, as the simultaneous administration of multiple drugs can have harmful or beneficial effects. Harmful interactions lead to adverse reactions, some of which can be life-threatening, while beneficial interactions can promote efficacy. Therefore, it is crucial for physicians, patients, and the research community to identify potential DDIs. Although many AI-based techniques have been proposed for predicting DDIs, most existing computational models primarily focus on integrating multiple data sources or combining popular embedding methods. Researchers often overlook the valuable information within the molecular structure of drugs or only consider the structural information of drugs, neglecting the relationship or topological information between drugs and other biological objects. In this study, we propose MSKG-DDI, a two-component framework that incorporates the Drug Chemical Structure Graph-based component and the Drug Knowledge Graph-based component to capture multimodal characteristics of drugs. Subsequently, a multimodal fusion neural layer is utilized to explore the complementarity between multimodal representations of drugs. Extensive experiments were conducted using two real-world datasets, and the results demonstrate that MSKG-DDI outperforms other state-of-the-art models in binary-class, multi-class, and multi-label prediction tasks under both transductive and inductive settings. Furthermore, the ablation analysis confirms the practical usefulness of MSKG-DDI.
Affiliation(s)
- Siqi Chen
  - School of Information Science and Engineering, Chongqing Jiaotong University, Chongqing, 400074, China
- Ivan Semenov
  - College of Intelligence and Computing, Tianjin University, Tianjin, 300072, China
- Fengyun Zhang
  - College of Intelligence and Computing, Tianjin University, Tianjin, 300072, China
- Yang Yang
  - College of Intelligence and Computing, Tianjin University, Tianjin, 300072, China
- Jie Geng
  - TianJin Chest Hospital, Tianjin University, Tianjin, 300222, China
- Xuequan Feng
  - Tianjin First Central Hospital, Tianjin, 300192, China
- Qinghua Meng
  - Tianjin Key Laboratory of Sports Physiology and Sports Medicine, Tianjin University of Sport, Tianjin, 301617, China
- Kaiyou Lei
  - College of Computer and Information Science, Southwest University, Chongqing, 400715, China
21
Guan J, Yao L, Chung CR, Xie P, Zhang Y, Deng J, Chiang YC, Lee TY. Predicting Anti-inflammatory Peptides by Ensemble Machine Learning and Deep Learning. J Chem Inf Model 2023; 63:7886-7898. [PMID: 38054927 DOI: 10.1021/acs.jcim.3c01602] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/07/2023]
Abstract
Inflammation is a biological response to harmful stimuli, aiding in the maintenance of tissue homeostasis. However, excessive or persistent inflammation can precipitate a myriad of pathological conditions. Although current treatments such as NSAIDs, corticosteroids, and immunosuppressants are effective, they can have side effects and resistance issues. Against this backdrop, anti-inflammatory peptides (AIPs) have emerged as a promising therapeutic approach against inflammation. Machine learning methods offer the opportunity to accelerate the discovery and investigation of these AIPs. In this study, we proposed an advanced framework that ensembles machine learning and deep learning for AIP prediction. Initially, we constructed three individual models with extremely randomized trees (ET), gated recurrent unit (GRU), and convolutional neural networks (CNNs) with attention mechanism and then used a stacking architecture to build the final predictor. By utilizing various sequence encodings and combining the strengths of different algorithms, our predictor demonstrated exemplary performance. On our independent test set, our model achieved an accuracy, MCC, and F1-score of 0.757, 0.500, and 0.707, respectively, clearly outperforming other contemporary AIP prediction methods. Additionally, our model offers profound insights into the feature interpretation of AIPs, establishing a valuable knowledge foundation for the design and development of future anti-inflammatory strategies.
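The three metrics reported above (accuracy, MCC, and F1-score) are all derived from a single binary confusion matrix. The following sketch shows the standard formulas; the counts are made up for illustration and are not the paper's test-set results.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, F1-score, and Matthews correlation coefficient (MCC)
    computed from the four cells of a binary confusion matrix."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical counts, for illustration only.
print(binary_metrics(tp=40, fp=10, tn=40, fn=10))
```

MCC uses all four cells, so it remains informative when classes are imbalanced, which is one reason it is commonly reported alongside accuracy in peptide prediction studies.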
Affiliation(s)
- Jiahui Guan
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Lantian Yao
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
  - School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen 518172, China
- Chia-Ru Chung
  - Department of Computer Science and Information Engineering, National Central University, Taoyuan 320317, Taiwan
- Peilin Xie
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Yilun Zhang
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Junyang Deng
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Ying-Chih Chiang
  - School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
  - Kobilka Institute of Innovative Drug Discovery, School of Medicine, The Chinese University of Hong Kong, Shenzhen 518172, China
- Tzong-Yi Lee
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu 300093, Taiwan
  - Center for Intelligent Drug Systems and Smart Bio-devices (IDS2B), National Yang Ming Chiao Tung University, Hsinchu 300093, Taiwan
22
Chung CR, Liou JT, Wu LC, Horng JT, Lee TY. Multi-label classification and features investigation of antimicrobial peptides with various functional classes. iScience 2023; 26:108250. [PMID: 38025779 PMCID: PMC10679894 DOI: 10.1016/j.isci.2023.108250] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/04/2023] [Revised: 07/15/2023] [Accepted: 10/16/2023] [Indexed: 12/01/2023] Open
Abstract
The challenge of drug-resistant bacteria to global public health has led to increased attention on antimicrobial peptides (AMPs) as a targeted therapeutic alternative with a lower risk of resistance. However, high production costs and limitations in functional class prediction have hindered progress in this field. In this study, we used multi-label classifiers with binary relevance and algorithm adaptation techniques to predict different functions of AMPs across a wide range of pathogen categories, including bacteria, mammalian cells, fungi, viruses, and cancer cells. Our classifiers attained promising AUC scores varying from 0.8492 to 0.9126 on independent testing data. Forward feature selection identified sequence order and charge as critical, with specific amino acids (C and E) as discriminative. These findings provide valuable insights for the design of antimicrobial peptides (AMPs) with multiple functionalities, thus contributing to the broader effort to combat drug-resistant pathogens.
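Binary relevance, one of the two multi-label strategies named above, splits the task into one independent yes/no problem per functional class. The following is a minimal sketch of the target decomposition and recombination only (no classifier is trained); the function names and class labels are hypothetical.

```python
def binary_relevance_targets(label_sets, classes):
    """Build one 0/1 target vector per class from multi-label annotations."""
    return {
        cls: [1 if cls in labels else 0 for labels in label_sets]
        for cls in classes
    }


def combine_binary_predictions(per_class_predictions):
    """Merge per-class 0/1 predictions back into one label set per sample."""
    if not per_class_predictions:
        return []
    n_samples = len(next(iter(per_class_predictions.values())))
    return [
        {cls for cls, preds in per_class_predictions.items() if preds[i]}
        for i in range(n_samples)
    ]


# Hypothetical annotations for three peptides.
classes = ["antibacterial", "antifungal", "antiviral"]
y = [{"antibacterial"}, {"antibacterial", "antifungal"}, set()]

targets = binary_relevance_targets(y, classes)
print(targets)
print(combine_binary_predictions(targets))
```

Each target vector can be fed to any binary classifier. Because the classes are modeled independently, correlations between functions are ignored, which is the gap that algorithm adaptation methods aim to close.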
Affiliation(s)
- Chia-Ru Chung
  - Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
- Jhen-Ting Liou
  - Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
- Li-Ching Wu
  - Department of Biomedical Sciences and Engineering, National Central University, Taoyuan, Taiwan
- Jorng-Tzong Horng
  - Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
  - Department of Bioinformatics and Medical Engineering, Asia University, Taoyuan City, Taiwan
- Tzong-Yi Lee
  - Institute of Bioinformatics and Systems Biology, National Yang Ming Chiao Tung University, Hsinchu City, Taiwan
  - Center for Intelligent Drug Systems and Smart Biodevices (IDS2B), National Yang Ming Chiao Tung University, Hsinchu City, Taiwan
23
Lv H, Yan K, Liu B. TPpred-LE: therapeutic peptide function prediction based on label embedding. BMC Biol 2023; 21:238. [PMID: 37904157 PMCID: PMC10617231 DOI: 10.1186/s12915-023-01740-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2023] [Accepted: 10/17/2023] [Indexed: 11/01/2023] Open
Abstract
BACKGROUND: Therapeutic peptides play an essential role in human physiology, treatment paradigms and bio-pharmacy. Several computational methods have been developed to identify the functions of therapeutic peptides based on binary classification and multi-label classification. However, these methods fail to explicitly exploit the relationship information among different functions, which prevents further improvement of prediction performance. In addition, as peptide detection technology develops, peptide functions will be discovered more comprehensively. Therefore, it is necessary to explore computational methods for detecting therapeutic peptide functions with limited labeled data. RESULTS: In this study, a novel method called TPpred-LE, based on the Transformer framework, was proposed for predicting multiple functions of therapeutic peptides. It explicitly extracts function correlation information by using a label embedding methodology and exploits specificity information through function-specific classifiers. In addition, we incorporated the multi-label classifier retraining approach (MCRT) into TPpred-LE to detect new therapeutic functions with limited labeled data. Experimental results demonstrate that TPpred-LE outperforms the other state-of-the-art methods, and that TPpred-LE with MCRT is robust to limited labeled data. CONCLUSIONS: In summary, TPpred-LE is a function-specific classifier for accurate therapeutic peptide function prediction, demonstrating the importance of relationship information for therapeutic peptide function prediction. MCRT is a simple but effective strategy to detect functions with limited labeled data.
Affiliation(s)
- Hongwu Lv
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Ke Yan
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, No. 5, South Zhongguancun Street, Haidian District, Beijing, 100081, China
|
24
|
Yang Y, Wu H, Gao Y, Tong W, Li K. MFPPDB: a comprehensive multi-functional plant peptide database. FRONTIERS IN PLANT SCIENCE 2023; 14:1224394. [PMID: 37908832 PMCID: PMC10613858 DOI: 10.3389/fpls.2023.1224394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 05/17/2023] [Accepted: 09/29/2023] [Indexed: 11/02/2023]
Abstract
Plants produce a wide range of bioactive peptides as part of their innate defense mechanisms. With the explosive growth of plant-derived peptides, verifying their therapeutic function using traditional experimental methods is resource- and time-consuming. Therefore, it is necessary to predict the therapeutic function of plant-derived peptides more effectively and accurately with reduced waste of resources and thus expedite the development of plant peptides. We herein developed a repository of plant peptides predicted to have multiple therapeutic functions, named MFPPDB (multi-functional plant peptide database). MFPPDB includes 1,482,409 single- or multi-functional plant-origin therapeutic peptides derived from 121 fundamental plant species. The functional categories of these therapeutic peptides include 41 different features such as anti-bacterial, anti-fungal, anti-HIV, anti-viral, and anti-cancer. The detailed physicochemical information of these peptides is presented in the functional search and physicochemical property search modules, which help users easily access the peptide information by plant species, ID, and functions, or by peptide ID, isoelectric point, peptide sequence, and molecular weight through a web-friendly interface. We further matched the predicted peptides to nine state-of-the-art curated functional peptide databases and found that at least 293,408 of the peptides possess functional potential. Overall, MFPPDB integrates a massive number of plant peptides that have single or multiple therapeutic functions, which will facilitate comprehensive research in plant peptidomics. MFPPDB can be freely accessed through http://124.223.195.214:9188/mfppdb/index.
Affiliation(s)
- Yaozu Yang
  - School of Information and Computer, Anhui Agricultural University, Hefei, China
  - Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, Hefei, Anhui, China
  - State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, Anhui, China
- Hongwei Wu
  - School of Information and Computer, Anhui Agricultural University, Hefei, China
- Yu Gao
  - School of Information and Computer, Anhui Agricultural University, Hefei, China
- Wei Tong
  - State Key Laboratory of Tea Plant Biology and Utilization, Anhui Agricultural University, Hefei, Anhui, China
- Ke Li
  - School of Information and Computer, Anhui Agricultural University, Hefei, China
  - Information Materials and Intelligent Sensing Laboratory of Anhui Province, Anhui University, Hefei, Anhui, China
  - Anhui Provincial Engineering Laboratory for Beidou Precision Agriculture Information, Anhui Agricultural University, Hefei, Anhui, China
|
25
|
Cui Z, Wang SG, He Y, Chen ZH, Zhang QH. DeepTPpred: A Deep Learning Approach With Matrix Factorization for Predicting Therapeutic Peptides by Integrating Length Information. IEEE J Biomed Health Inform 2023; 27:4611-4622. [PMID: 37368803 DOI: 10.1109/jbhi.2023.3290014] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/29/2023]
Abstract
The abuse of traditional antibiotics has led to increased resistance of bacteria and viruses. Efficient therapeutic peptide prediction is critical for peptide drug discovery. However, most of the existing methods only make effective predictions for one class of therapeutic peptides. It is worth noting that currently no predictive method considers sequence length information as a distinct feature of therapeutic peptides. In this article, a novel deep learning approach with matrix factorization for predicting therapeutic peptides (DeepTPpred) by integrating length information is proposed. The matrix factorization layer can learn the latent features of the encoded sequence through a mechanism of first compression and then restoration. The length features of the therapeutic peptide sequences are embedded with the encoded amino acid sequences. To automatically learn therapeutic peptide predictions, these latent features are input into neural networks with a self-attention mechanism. On eight therapeutic peptide datasets, DeepTPpred achieved excellent prediction results. Based on these datasets, we first integrated the eight datasets to obtain a full therapeutic peptide integration dataset. Then, we obtained two functional integration datasets based on the functional similarity of the peptides. Finally, we also conducted experiments on the latest versions of the ACP and CPP datasets. Overall, the experimental results show that our work is effective for the identification of therapeutic peptides.
|
26
|
He S, Ye X, Sakurai T, Zou Q. MRMD3.0: A Python Tool and Webserver for Dimensionality Reduction and Data Visualization via an Ensemble Strategy. J Mol Biol 2023; 435:168116. [PMID: 37356901 DOI: 10.1016/j.jmb.2023.168116] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2022] [Revised: 04/06/2023] [Accepted: 04/14/2023] [Indexed: 06/27/2023]
Abstract
Dimensionality reduction is a hot topic in machine learning that can help researchers find key features from complex medical or biological data, which is crucial for biological sequence research, drug development, etc. However, when applied to specific datasets, different dimensionality reduction methods generate different results, which produces instability and makes tuning the parameters a time-consuming task. Exploring high quality features, genes, or attributes from complex data is an important task and challenge. To ensure the efficiency, robustness, and accuracy of experiments, in this work, we developed a dimensionality reduction tool MRMD3.0 based on the ensemble strategy of link analysis. It is mainly divided into two steps: first, the ensemble method is used to integrate different feature ranking algorithms to calculate feature importance, and then the forward feature search strategy combined with cross-validation is used to explore the proper feature combination. Compared with the previously developed version, MRMD3.0 has added more link-based ensemble algorithms, including PageRank, HITS, LeaderRank, and TrustRank. At the same time, more feature ranking algorithms have been added, and their effect and calculation speed have been greatly improved. In addition, the newest version provides an interface used by each feature ranking method and five kinds of charts to help users analyze features. Finally, we also provide an online webserver to help researchers analyze the data. Availability and implementation Webserver: http://lab.malab.cn/soft/MRMDv3/home.html. GitHub: https://github.com/heshida01/MRMD3.0.
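As a rough illustration of the link-analysis idea described in this abstract, the following Python sketch merges several feature rankings with PageRank. The function name `pagerank_ensemble`, the edge construction (each lower-ranked feature links to every feature ranked above it), and the damping factor of 0.85 are assumptions made for illustration; they are not taken from MRMD3.0 itself.

```python
from collections import defaultdict


def pagerank_ensemble(rankings, damping=0.85, iterations=100):
    """Merge several feature rankings (best feature first) with PageRank.

    Assumption for this sketch: within each ranking, every lower-ranked
    feature "votes" for every feature ranked above it.
    """
    features = sorted({f for ranking in rankings for f in ranking})

    # Edge weights: (voter, voted_for) -> number of rankings that agree.
    weights = defaultdict(float)
    for ranking in rankings:
        for i, better in enumerate(ranking):
            for worse in ranking[i + 1:]:
                weights[(worse, better)] += 1.0

    # Total outgoing weight per feature, used to normalise its votes.
    out_total = {f: 0.0 for f in features}
    for (src, _dst), w in weights.items():
        out_total[src] += w

    n = len(features)
    scores = {f: 1.0 / n for f in features}
    for _ in range(iterations):
        # Features with no outgoing votes spread their score uniformly.
        dangling = sum(scores[f] for f in features if out_total[f] == 0.0)
        new_scores = {
            f: (1.0 - damping) / n + damping * dangling / n for f in features
        }
        for (src, dst), w in weights.items():
            new_scores[dst] += damping * scores[src] * w / out_total[src]
        scores = new_scores

    order = sorted(features, key=lambda f: scores[f], reverse=True)
    return order, scores


# Hypothetical rankings from three feature-ranking algorithms.
rankings = [
    ["f1", "f2", "f3"],
    ["f1", "f3", "f2"],
    ["f1", "f2", "f3"],
]
order, scores = pagerank_ensemble(rankings)
print(order)
```

The merged order could then feed a forward feature search with cross-validation, as the abstract describes. Other ways of turning rankings into a graph are possible and may behave differently when the rankers disagree strongly.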
Affiliation(s)
- Shida He
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China; Department of Computer Science, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan
- Xiucai Ye
  - Department of Computer Science, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan
- Tetsuya Sakurai
  - Department of Computer Science, University of Tsukuba, Tsukuba, Ibaraki 305-8577, Japan
- Quan Zou
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
|
27
|
Charoenkwan P, Chumnanpuen P, Schaduangrat N, Oh C, Manavalan B, Shoombuatong W. PSRQSP: An effective approach for the interpretable prediction of quorum sensing peptide using propensity score representation learning. Comput Biol Med 2023; 158:106784. [PMID: 36989748 DOI: 10.1016/j.compbiomed.2023.106784] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/26/2022] [Revised: 02/07/2023] [Accepted: 03/10/2023] [Indexed: 03/14/2023]
Abstract
Quorum sensing peptides (QSPs) are microbial signaling molecules involved in several cellular processes, such as cellular communication, virulence expression, bioluminescence, and swarming, in various bacterial species. Understanding QSPs is essential for identifying novel drug targets for controlling bacterial populations and pathogenicity. In this study, we present a novel computational approach (PSRQSP) for improving the prediction and analysis of QSPs. In PSRQSP, we develop a novel propensity score representation learning (PSR) scheme. Specifically, we utilized the PSR approach to extract and learn a comprehensive set of estimated propensities of 20 amino acids, 400 dipeptides, and 400 g-gap dipeptides from a pool of scoring card method-based models. Finally, to maximize the utility of the propensity scores, we explored a set of optimal propensity scores and combined them to construct a final meta-predictor. Our experimental results showed that combining multiview propensity scores was more beneficial for identifying QSPs than the conventional feature descriptors. Moreover, extensive benchmarking experiments based on the independent test were sufficient to demonstrate the predictive capability and effectiveness of PSRQSP by outperforming the conventional ML-based and existing methods, with an accuracy of 94.44% and AUC of 0.967. PSR-derived propensity scores were employed to determine the crucial physicochemical properties for a better understanding of the functional mechanisms of QSPs. Finally, we constructed an easy-to-use web server for the PSRQSP (http://pmlabstack.pythonanywhere.com/PSRQSP). PSRQSP is anticipated to be an efficient computational tool for accelerating the data-driven discovery of potential QSPs for drug discovery and development.
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Pramote Chumnanpuen
  - Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, 10900, Thailand; Omics Center for Agriculture, Bioresources, Food, and Health, Kasetsart University (OmiKU), Bangkok, 10900, Thailand
- Nalini Schaduangrat
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Changmin Oh
  - Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea
- Balachandran Manavalan
  - Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea
- Watshara Shoombuatong
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
|
28
|
Wang C, Zou Q, Ju Y, Shi H. Enhancer-FRL: Improved and Robust Identification of Enhancers and Their Activities Using Feature Representation Learning. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:967-975. [PMID: 36063523 DOI: 10.1109/tcbb.2022.3204365] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
Enhancers are crucial for precise regulation of gene expression, while enhancer identification and strength prediction are challenging because of their free distribution and the tremendous number of similar fractions in the genome. Although several bioinformatics tools have been developed, shortfalls in these models remain, and their performance needs further improvement. In the present study, a two-layer predictor called Enhancer-FRL was proposed for identifying enhancers (enhancers or nonenhancers) and their activities (strong or weak). More specifically, to build an efficient model, the feature representation learning scheme was applied to generate a 50D probabilistic vector based on 10 feature encodings and five machine learning algorithms. Subsequently, the multiview probabilistic features were integrated to construct the final prediction model. Compared with the single feature-based model, Enhancer-FRL showed significant performance improvement and model robustness. Performance assessment on the independent test dataset indicated that the proposed model outperformed state-of-the-art available toolkits. The webserver Enhancer-FRL is freely accessible at http://lab.malab.cn/~wangchao/softwares/Enhancer-FRL/. The code and datasets can be downloaded from the webserver page or from GitHub at https://github.com/wangchao-malab/Enhancer-FRL/.
|
29
|
Yan K, Lv H, Wen J, Guo Y, Xu Y, Liu B. PreTP-Stack: Prediction of Therapeutic Peptides Based on the Stacked Ensemble Learing. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:1337-1344. [PMID: 35700248 DOI: 10.1109/tcbb.2022.3183018] [Citation(s) in RCA: 19] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
Therapeutic peptide prediction is critical for drug development and therapy. Researchers have developed several computational methods to identify different therapeutic peptide types. However, most computational methods focus on identifying one specific type of therapeutic peptide and fail to accurately predict all types of therapeutic peptides. Moreover, it is still challenging to utilize features based on different properties to predict therapeutic peptides. In this study, a novel stacking framework, PreTP-Stack, is proposed for predicting different types of therapeutic peptides. PreTP-Stack is constructed based on ten different features and four predictors (Random Forest, Linear Discriminant Analysis, XGBoost and Support Vector Machine). The proposed method then constructs an auto-weighted multi-view learning model as a final meta-classifier to enhance the performance of the basic models. Experimental results showed that the proposed method achieved better or highly comparable performance with the state-of-the-art methods for predicting eight types of therapeutic peptides. A user-friendly web-server predictor is available at http://bliulab.net/PreTP-Stack.
|
30
|
Zhang Y, Li Z. RF_phage virion: Classification of phage virion proteins with a random forest model. Front Genet 2023; 13:1103783. [PMID: 36846294 PMCID: PMC9945117 DOI: 10.3389/fgene.2022.1103783] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/21/2022] [Accepted: 12/30/2022] [Indexed: 02/10/2023] Open
Abstract
Introduction: Phages play essential roles in biological processes, and the virion proteins encoded by the phage genome constitute critical elements of the assembled phage particle. Methods: This study uses machine learning methods to classify phage virion proteins. We proposed a novel approach, RF_phage virion, for the effective classification of virion and non-virion proteins. The model uses four protein sequence coding methods as features, and the random forest algorithm was employed to solve the classification problem. Results: The performance of the RF_phage virion model was analyzed by comparing it with that of classical machine learning methods. The proposed method achieved a specificity (Sp) of 93.37%, sensitivity (Sn) of 90.30%, accuracy (Acc) of 91.84%, Matthews correlation coefficient (MCC) of 0.8371, and an F1 score of 0.9196.
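For readers less familiar with the metrics reported in this abstract, the sketch below shows how Sn, Sp, Acc, MCC and F1 follow from the four confusion-matrix counts. It uses the standard textbook definitions; the function name `binary_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute Sn, Sp, Acc, MCC and F1 from confusion-matrix counts."""
    sn = tp / (tp + fn) if (tp + fn) else 0.0          # sensitivity (recall)
    sp = tn / (tn + fp) if (tn + fp) else 0.0          # specificity
    acc = (tp + tn) / (tp + fn + tn + fp)              # accuracy
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (
        2 * precision * sn / (precision + sn) if (precision + sn) else 0.0
    )
    # Matthews correlation coefficient; defined as 0 when a margin is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc, "F1": f1}


# Hypothetical counts for illustration only (not the paper's data).
m = binary_metrics(tp=90, fn=10, tn=95, fp=5)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

On a balanced test set, Acc is the mean of Sn and Sp, which is one quick consistency check when reading reported results.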
Affiliation(s)
- Yanqing Zhang
  - School of Finance, Xuzhou University of Technology, Xuzhou, China
- Zhiyuan Li
  - School of Artificial Intelligence and Software College, Jiangsu Normal University Kewen College, Xuzhou, China (*Correspondence: Zhiyuan Li)
|
31
|
Guo X, Tiwari P, Zou Q, Ding Y. Subspace projection-based weighted echo state networks for predicting therapeutic peptides. Knowl Based Syst 2023. [DOI: 10.1016/j.knosys.2023.110307] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/15/2023]
|
32
|
Accurate Prediction of Anti-hypertensive Peptides Based on Convolutional Neural Network and Gated Recurrent unit. Interdiscip Sci 2022; 14:879-894. [PMID: 35474167 DOI: 10.1007/s12539-022-00521-3] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2021] [Revised: 03/30/2022] [Accepted: 04/06/2022] [Indexed: 12/30/2022]
Abstract
Hypertension (HT) is a common disease, and also one of the most common and major causes of cardiovascular disease. Some diseases are caused by high blood pressure, including impairment of heart and kidney function, cerebral hemorrhage and myocardial infarction. Due to the limitations of laboratory methods, bioactive peptides for the treatment of HT need a long time to be identified. Therefore, the identification of anti-hypertensive peptides (AHTPs) is of great immediate significance. With the prevalence of machine learning, it is suggested to use it as a supplementary method for AHTP classification. Therefore, we develop a new model to identify AHTPs based on multiple features and deep learning. The deep model is constructed by combining a convolutional neural network (CNN) and a gated recurrent unit (GRU). The unique convolution structure is used to reduce the feature dimension and running time. The data processed by the CNN is input into the recurrent GRU structure, and important information is filtered out through the reset gate and update gate. Finally, the output layer adopts the Sigmoid activation function. Firstly, we use Kmer, the deviation between the dipeptide frequency and the expected mean (DDE), encoding based on grouped weight (EBGW), enhanced grouped amino acid composition (EGAAC) and dipeptide binary profile and frequency (DBPF) to extract features. Kmer, DDE, EBGW and EGAAC are widely used in the field of protein research. DBPF is a new feature representation method designed by us. It maps dipeptides to binary numbers, and finally obtains a binary coding file and a frequency file. Then these features are spliced together and input into our proposed model for prediction and analysis. After a tenfold cross-validation test, this model has a better competitive advantage than the previous methods, and the accuracy is 96.23% and 99.10%, respectively. From the results, compared with the previous methods, it has been greatly improved. It shows that the combination of convolution calculation and recurrent structure has a positive impact on the classification of AHTPs. The results show that this method is a feasible, efficient and competitive sequence analysis tool for AHTPs. Meanwhile, we design a friendly online prediction tool and it is freely accessible at http://ahtps.zhanglab.site/.
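Of the five descriptors named in this abstract, Kmer is the simplest to illustrate. The sketch below assumes that Kmer means the normalized frequency of every length-k substring over the 20 standard amino acids; the function name `kmer_frequencies` and the example sequence are hypothetical, and the paper's exact encoding may differ.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def kmer_frequencies(sequence, k=2):
    """Return a fixed-length dict of normalized k-mer frequencies.

    Windows containing non-standard residues are ignored, so the
    frequencies sum to 1 whenever at least one valid window exists.
    """
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    valid = set(kmers)
    windows = [
        sequence[i:i + k]
        for i in range(len(sequence) - k + 1)
        if sequence[i:i + k] in valid
    ]
    counts = Counter(windows)
    total = len(windows)
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}


# Arbitrary example string; with k=2 the windows are IP, PP, PV, VY, YK.
freqs = kmer_frequencies("IPPVYK", k=2)
print(len(freqs), freqs["PP"])
```

With k=2 this yields a 400-dimensional vector, which could then be concatenated with the other descriptors before being passed to a model.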
|
33
|
Wang H, Li H, Gao W, Xie J. PrUb-EL: A hybrid framework based on deep learning for identifying ubiquitination sites in Arabidopsis thaliana using ensemble learning strategy. Anal Biochem 2022; 658:114935. [PMID: 36206844 DOI: 10.1016/j.ab.2022.114935] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2022] [Revised: 09/25/2022] [Accepted: 09/26/2022] [Indexed: 12/30/2022]
Abstract
Identification of ubiquitination sites is central to many biological experiments. Ubiquitination is a kind of post-translational protein modification (PTM). It is a key mechanism for increasing protein diversity and plays a vital role in regulating cell function. In recent years, many models have been developed to predict ubiquitination sites in humans, mice and yeast. However, few studies have predicted ubiquitination sites in Arabidopsis thaliana. In view of this, a deep network model named PrUb-EL is proposed to predict ubiquitination sites in Arabidopsis thaliana. Firstly, six features based on the protein sequence are extracted using the amino acid index database (AAindex), dipeptide deviation from the expected mean (DDE), dipeptide composition (DPC), the blocks substitution matrix (BLOSUM62), enhanced amino acid composition (EAAC) and binary encoding. Secondly, the synthetic minority over-sampling technique (SMOTE) is utilized to process the imbalanced data set. Then a new classifier named DG is presented, which includes a Dense block, a Residual block and a Gated recurrent unit (GRU) block. Finally, each of the six feature extraction methods is integrated into the DG model, and an ensemble learning strategy is used to obtain the final prediction result. Experimental results show that PrUb-EL has good predictive ability, with accuracy (ACC) and area under the ROC curve (auROC) values of 91.00% and 97.70%, respectively, using 5-fold cross-validation. In the independent test, the ACC and auROC values are 88.58% and 96.09%, respectively. Compared with previous studies, our model shows significantly improved performance and is therefore an excellent method for identifying ubiquitination sites in Arabidopsis thaliana. The datasets and code used for the article are available at https://github.com/Tom-Wangy/PreUb-EL.git.
Affiliation(s)
- Houqiang Wang
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
- Hong Li
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
- Weifeng Gao
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
- Jin Xie
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, PR China
|
34
|
The dynamic landscape of peptide activity prediction. Comput Struct Biotechnol J 2022; 20:6526-6533. [DOI: 10.1016/j.csbj.2022.11.043] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2022] [Revised: 11/21/2022] [Accepted: 11/21/2022] [Indexed: 11/27/2022] Open
|
35
|
Improved prediction and characterization of blood-brain barrier penetrating peptides using estimated propensity scores of dipeptides. J Comput Aided Mol Des 2022; 36:781-796. [DOI: 10.1007/s10822-022-00476-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2022] [Accepted: 09/15/2022] [Indexed: 11/27/2022]
|
36
|
ACP-ADA: A Boosting Method with Data Augmentation for Improved Prediction of Anticancer Peptides. Int J Mol Sci 2022; 23:ijms232012194. [PMID: 36293050 PMCID: PMC9603247 DOI: 10.3390/ijms232012194] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/29/2022] [Revised: 10/08/2022] [Accepted: 10/11/2022] [Indexed: 11/30/2022] Open
Abstract
Cancer is the second-leading cause of death worldwide, and therapeutic peptides that target and destroy cancer cells have received a great deal of interest in recent years. Traditional wet experiments are expensive and inefficient for identifying novel anticancer peptides; therefore, the development of an effective computational approach is essential to recognize ACP candidates before experimental methods are used. In this study, we proposed an Ada-boosting algorithm with random forest as the base learner, called ACP-ADA, which integrates binary profile feature, amino acid index, and amino acid composition into a 210-dimensional feature space vector to represent the peptides. Training samples in the feature space were augmented to increase the sample size and further improve the performance of the model in the case of insufficient samples. Furthermore, we used five-fold cross-validation to find model parameters, and the cross-validation results showed that ACP-ADA outperforms existing methods for this feature combination with data augmentation in terms of performance metrics. Specifically, ACP-ADA recorded an average accuracy of 86.4% and a Matthews correlation coefficient of 74.01% for dataset ACP740 and 90.83% and 81.65% for dataset ACP240; consequently, it can be a very useful tool in drug development and biomedical research.
|
37
|
Rodrigues CHM, Garg A, Keizer D, Pires DEV, Ascher DB. CSM-peptides: A computational approach to rapid identification of therapeutic peptides. Protein Sci 2022; 31:e4442. [PMID: 36173168 PMCID: PMC9518225 DOI: 10.1002/pro.4442] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2022] [Revised: 08/29/2022] [Accepted: 08/30/2022] [Indexed: 11/25/2022]
Abstract
Peptides are attractive alternatives for the development of new therapeutic strategies due to their versatility and low complexity of synthesis. Increasing interest in these molecules has led to the creation of large collections of experimentally characterized therapeutic peptides, which greatly contribute to the development of data-driven computational approaches. Here we propose CSM-peptides, a novel machine learning method for rapid identification of eight different types of therapeutic peptides: anti-angiogenic, anti-bacterial, anti-cancer, anti-inflammatory, anti-viral, cell-penetrating, quorum sensing, and surface binding. Our method has been shown to outperform existing approaches, achieving an AUC of up to 0.92 on independent blind tests, and consistent performance on cross-validation. We anticipate CSM-peptides to be of great value in helping screen large libraries to identify novel peptides with therapeutic potential and have made it freely available as a user-friendly web server and Application Programming Interface at https://biosig.lab.uq.edu.au/csm_peptides.
Affiliation(s)
- Carlos H. M. Rodrigues
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, Australia
- Anjali Garg
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- David Keizer
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
- Douglas E. V. Pires
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - School of Computing and Information Systems, University of Melbourne, Melbourne, Victoria, Australia
- David B. Ascher
  - Structural Biology and Bioinformatics, Department of Biochemistry, University of Melbourne, Melbourne, Victoria, Australia
  - Systems and Computational Biology, Bio21 Institute, University of Melbourne, Melbourne, Victoria, Australia
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, St Lucia, Queensland, Australia
|
38
|
Yan W, Tang W, Wang L, Bin Y, Xia J. PrMFTP: Multi-functional therapeutic peptides prediction based on multi-head self-attention mechanism and class weight optimization. PLoS Comput Biol 2022; 18:e1010511. [PMID: 36094961 PMCID: PMC9499272 DOI: 10.1371/journal.pcbi.1010511] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/03/2022] [Revised: 09/22/2022] [Accepted: 08/24/2022] [Indexed: 11/18/2022] Open
Abstract
Prediction of therapeutic peptides is a significant step in the discovery of promising therapeutic drugs. Most of the existing studies have focused on mono-functional therapeutic peptide prediction. However, the number of multi-functional therapeutic peptides (MFTP) is growing rapidly, which requires new computational schemes to facilitate MFTP discovery. In this study, based on a multi-head self-attention mechanism and a class weight optimization algorithm, we propose a novel model called PrMFTP for MFTP prediction. PrMFTP exploits a multi-scale convolutional neural network, bi-directional long short-term memory, and multi-head self-attention mechanisms to fully extract and learn informative features of the peptide sequence to predict MFTP. In addition, we design a class weight optimization scheme to address the problem of label-imbalanced data. Comprehensive evaluations demonstrate that PrMFTP is superior to other state-of-the-art computational methods for predicting MFTP. We provide a user-friendly web server of PrMFTP, which is available at http://bioinfo.ahu.edu.cn/PrMFTP.
Collapse
Affiliation(s)
- Wenhui Yan
- Information Materials and Intelligent Sensing Laboratory of Anhui Province and Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China
| | - Wending Tang
- Information Materials and Intelligent Sensing Laboratory of Anhui Province and Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China
| | - Lihua Wang
- Information Materials and Intelligent Sensing Laboratory of Anhui Province and Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China
| | - Yannan Bin
- Information Materials and Intelligent Sensing Laboratory of Anhui Province and Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China
- * E-mail: (YB); (JX)
| | - Junfeng Xia
- Information Materials and Intelligent Sensing Laboratory of Anhui Province and Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, Anhui, China
- * E-mail: (YB); (JX)
| |
|
39
|
Niu M, Zou Q. SgRNA-RF: Identification of SgRNA On-Target Activity With Imbalanced Datasets. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2442-2453. [PMID: 33979289 DOI: 10.1109/tcbb.2021.3079116] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
Single-guide RNA is a guide RNA (gRNA), which guides the insertion or deletion of uridine residues into kinetoplastid during RNA editing. It is a small non-coding RNA that can pair with pre-mRNA. SgRNA is a critical component of the CRISPR/Cas9 gene knockout system and plays an important role in gene editing and gene regulation. It is important to accurately and quickly identify sgRNAs with high on-target activity. Due to its importance, several computational predictors have been proposed to predict sgRNA on-target activity. All these methods have clearly contributed to the development of this very important field. However, they also have certain limitations. In this paper, we developed a new classifier, SgRNA-RF, which extracts nucleic acid composition and structure features from sgRNA sequences and classifies on-target activity with a random forest algorithm. In addition, to address dataset imbalance, this paper proposes a new method called CS-Smote. We compared SgRNA-RF with state-of-the-art predictors on the five datasets, and found SgRNA-RF significantly improved the identification accuracy, with accuracies of 0.8636, 0.9161, 0.894, 0.938, 0.965, 0.77, 0.979, 0.973, respectively. The user-friendly web server that implements SgRNA-RF is freely available at http://server.malab.cn/sgRNA-RF/.
|
40
|
Kurata H, Tsukiyama S, Manavalan B. iACVP: markedly enhanced identification of anti-coronavirus peptides using a dataset-specific word2vec model. Brief Bioinform 2022; 23:6623727. [PMID: 35772910 DOI: 10.1093/bib/bbac265] [Citation(s) in RCA: 23] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2022] [Revised: 05/23/2022] [Accepted: 06/06/2022] [Indexed: 01/22/2023] Open
Abstract
The COVID-19 pandemic caused several million deaths worldwide. Development of anti-coronavirus drugs is thus urgent. Unlike conventional non-peptide drugs, antiviral peptide drugs are highly specific, easy to synthesize and modify, and not highly susceptible to drug resistance. To reduce the time and expense involved in screening thousands of peptides and assaying their antiviral activity, computational predictors for identifying anti-coronavirus peptides (ACVPs) are needed. However, few experimentally verified ACVP samples are available, even though a relatively large number of antiviral peptides (AVPs) have been discovered. In this study, we attempted to predict ACVPs using an AVP dataset and a small collection of ACVPs. Using conventional features, a binary profile and a word-embedding word2vec (W2V), we systematically explored five different machine learning methods: Transformer, Convolutional Neural Network, bidirectional Long Short-Term Memory, Random Forest (RF) and Support Vector Machine. Via exhaustive searches, we found that the RF classifier with W2V consistently achieved better performance on different datasets. The two main controlling factors were: (i) the dataset-specific W2V dictionary was generated from the training and independent test datasets instead of the widely used general UniProt proteome and (ii) a systematic search was conducted and determined the optimal k-mer value in W2V, which provides greater discrimination between positive and negative samples. Therefore, our proposed method, named iACVP, consistently provides better prediction performance compared with existing state-of-the-art methods. To assist experimentalists in identifying putative ACVPs, we implemented our model as a web server accessible via the following link: http://kurata35.bio.kyutech.ac.jp/iACVP.
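The k-mer search described above starts from splitting each peptide into k-mer "words" before an embedding such as word2vec is trained. The sketch below shows only that tokenization step and a dataset-specific vocabulary count. Overlapping k-mers with stride 1, the helper names `kmer_tokens` and `build_vocabulary`, and the example sequences are assumptions for illustration; the abstract does not state how iACVP tokenizes sequences.

```python
from collections import Counter


def kmer_tokens(sequence: str, k: int) -> list[str]:
    """Split a sequence into overlapping k-mers (stride 1, an assumption)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocabulary(sequences: list[str], k: int) -> Counter:
    """Count k-mers over a dataset, giving a dataset-specific dictionary."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(kmer_tokens(seq, k))
    return counts


# Hypothetical peptide sequences, used only to show how vocabulary size changes with k.
peptides = ["GIGKFLHSAK", "KFLHSAKKFG"]
for k in (1, 2, 3):
    vocab = build_vocabulary(peptides, k)
    print(k, len(vocab))
```

The token lists produced for each candidate k could then be passed to an embedding trainer, and the k that best separates positive from negative samples retained.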
Affiliation(s)
- Hiroyuki Kurata
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
| | - Sho Tsukiyama
- Department of Bioscience and Bioinformatics, Kyushu Institute of Technology, 680-4 Kawazu, Iizuka, Fukuoka 820-8502, Japan
| | - Balachandran Manavalan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
| |
|
41
|
Li Y, Li X, Liu Y, Yao Y, Huang G. MPMABP: A CNN and Bi-LSTM-Based Method for Predicting Multi-Activities of Bioactive Peptides. Pharmaceuticals (Basel) 2022; 15:707. [PMID: 35745625 PMCID: PMC9231127 DOI: 10.3390/ph15060707] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/22/2022] [Revised: 05/23/2022] [Accepted: 05/30/2022] [Indexed: 12/30/2022] Open
Abstract
Bioactive peptides are typically small functional peptides with 2-20 amino acid residues and play versatile roles in metabolic and biological processes. Bioactive peptides are multi-functional, so it is highly challenging to accurately detect all their functions simultaneously. We proposed a convolutional neural network (CNN) and bi-directional long short-term memory (Bi-LSTM)-based deep learning method (called MPMABP) for recognizing multi-activities of bioactive peptides. MPMABP stacked five CNNs at different scales and used a residual network to prevent information loss. The empirical results showed that MPMABP is superior to the state-of-the-art methods. Analysis of the distribution of amino acids indicated that lysine preferentially appears in anti-cancer peptides, leucine in anti-diabetic peptides, and proline in anti-hypertensive peptides. The method and analysis are beneficial for recognizing multi-activities of bioactive peptides.
Affiliation(s)
- You Li
- School of Electrical Engineering, Shaoyang University, Shaoyang 422000, China; (Y.L.); (X.L.)
| | - Xueyong Li
- School of Electrical Engineering, Shaoyang University, Shaoyang 422000, China; (Y.L.); (X.L.)
| | - Yuewu Liu
- College of Information and Intelligence, Hunan Agricultural University, Changsha 410128, China;
| | - Yuhua Yao
- School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China;
| | - Guohua Huang
- School of Electrical Engineering, Shaoyang University, Shaoyang 422000, China; (Y.L.); (X.L.)
| |
|
42
|
Charoenkwan P, Schaduangrat N, Hasan MM, Moni MA, Lió P, Shoombuatong W. Empirical comparison and analysis of machine learning-based predictors for predicting and analyzing of thermophilic proteins. EXCLI JOURNAL 2022; 21:554-570. [PMID: 35651661 PMCID: PMC9150013 DOI: 10.17179/excli2022-4723] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/28/2022] [Accepted: 02/21/2022] [Indexed: 12/15/2022]
Abstract
Thermophilic proteins (TPPs) are critical for basic research and in the food industry due to their ability to maintain a thermodynamically stable fold at extremely high temperatures. Thus, the expeditious identification of novel TPPs through computational models from protein sequences is very desirable. Over the last few decades, a number of computational methods, especially machine learning (ML)-based methods, for in silico prediction of TPPs have been developed. Therefore, it is desirable to revisit these methods and summarize their advantages and disadvantages in order to further develop new computational approaches to achieve more accurate and improved prediction of TPPs. With this goal in mind, we comprehensively investigate a large collection of fourteen state-of-the-art TPP predictors in terms of their dataset size, feature encoding schemes, feature selection strategies, ML algorithms, evaluation strategies and web server/software usability. To the best of our knowledge, this article represents the first comprehensive review on the development of ML-based methods for in silico prediction of TPPs. These TPP predictors can be classified into two groups according to the interpretability of the ML algorithms employed (i.e., computational black-box methods and computational white-box methods). To perform the comparative analysis, we evaluated several currently available TPP predictors on two benchmark datasets. Finally, we provide future perspectives for the design and development of new computational models for TPP prediction. We hope that this comprehensive review will help researchers select the TPP predictor best suited to their purposes and provide useful perspectives for the development of more effective and accurate TPP predictors.
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand, 50200
| | - Nalini Schaduangrat
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700
| | - Md Mehedi Hasan
- Tulane Center for Biomedical Informatics and Genomics, Division of Biomedical Informatics and Genomics, John W. Deming Department of Medicine, School of Medicine, Tulane University, New Orleans, LA 70112, USA
| | - Mohammad Ali Moni
- School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, the University of Queensland, St Lucia, QLD 4072, Australia
| | - Pietro Lió
- Department of Computer Science and Technology, University of Cambridge, Cambridge, CB3 0FD, UK
| | - Watshara Shoombuatong
- Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700
| |
|
43
|
Prediction of Cell-Penetrating Peptides Using a Novel HSIC-Based Multiview TSK Fuzzy System. APPLIED SCIENCES-BASEL 2022. [DOI: 10.3390/app12115383] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
Cell-penetrating peptides (CPPs) are short peptides that can carry cargo into cells. CPPs are widely utilized due to their powerful loading capacity and transduction efficiency. Identifying CPPs is the basis for studying their functions and mechanisms; however, experimental methods to identify CPPs are expensive and time-consuming. Recently, CPP predictors based on machine learning methods have become a research hotspot. Although considerable progress has been made, some challenges remain unresolved. First, most predictors employ a variety of feature descriptors to transform an original sequence into multiview data; however, extant methods ignore the relationships between different views, limiting further performance improvement. Second, most machine learning models are actually black boxes and cannot offer insightful advice. In this paper, a novel Hilbert–Schmidt independence criterion (HSIC)-based multiview TSK fuzzy system is proposed. Compared with other machine learning methods, TSK fuzzy systems have better interpretability, and the introduction of multiview mechanisms provides comprehensive insight into the intrinsic laws of the data. HSIC is utilized here to measure the independence and enhance the complementarity between different views. Notably, the proposed method attained prediction accuracy results of 92.2% and 96.2% for the training and independent test sets, respectively. The empirical results show that our promising approach features greater recognition performance than the state-of-the-art method.
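To make the role of HSIC concrete, here is a minimal sketch of the standard biased empirical estimator, HSIC = tr(KHLH) / (n - 1)^2, where K and L are kernel matrices of two views and H is the centering matrix. The Gaussian kernel, the bandwidth `sigma`, the function names, and the synthetic views are assumptions for illustration; the abstract does not say which kernel or estimator the proposed TSK fuzzy system uses.

```python
import math
import random


def gaussian_kernel(samples, sigma=1.0):
    """Gram matrix with k(a, b) = exp(-||a - b||^2 / (2 * sigma^2))."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    n = len(samples)
    return [[math.exp(-sqdist(samples[i], samples[j]) / (2 * sigma ** 2))
             for j in range(n)] for i in range(n)]


def center(gram):
    """Double-center a Gram matrix, equivalent to H @ K @ H with H = I - 11^T / n."""
    n = len(gram)
    row_means = [sum(row) / n for row in gram]
    col_means = [sum(gram[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[gram[i][j] - row_means[i] - col_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def hsic(view_a, view_b, sigma=1.0):
    """Biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = len(view_a)
    kc = center(gaussian_kernel(view_a, sigma))
    lc = center(gaussian_kernel(view_b, sigma))
    # For symmetric centered matrices, tr(Kc @ Lc) is the sum of elementwise products.
    trace = sum(kc[i][j] * lc[i][j] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


# Synthetic views (hypothetical): one copies the first with small noise, one is unrelated.
random.seed(0)
view_x = [[random.gauss(0, 1)] for _ in range(40)]
view_dependent = [[v[0] + 0.1 * random.gauss(0, 1)] for v in view_x]
view_unrelated = [[random.gauss(0, 1)] for _ in range(40)]
print("dependent views:", hsic(view_x, view_dependent))
print("unrelated views:", hsic(view_x, view_unrelated))
```

Values near zero suggest two views carry independent information, while larger values indicate shared structure; a multiview learner can use this quantity as a regularizer.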
|
44
|
Yan K, Lv H, Guo Y, Chen Y, Wu H, Liu B. TPpred-ATMV: therapeutic peptide prediction by adaptive multi-view tensor learning model. Bioinformatics 2022; 38:2712-2718. [PMID: 35561206 DOI: 10.1093/bioinformatics/btac200] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2022] [Revised: 03/17/2022] [Accepted: 04/06/2022] [Indexed: 11/12/2022] Open
Abstract
Motivation:
Therapeutic peptide prediction is important for the discovery of efficient therapeutic peptides and drug development. Researchers have developed several computational methods to identify different therapeutic peptide types. However, these computational methods focus on identifying specific types of therapeutic peptides and fail to cover the full range of therapeutic peptide types. Moreover, it is still challenging to utilize different properties to predict therapeutic peptides.
Results:
In this study, TPpred-ATMV, an adaptive multi-view framework based on tensor learning, is proposed for predicting different types of therapeutic peptides. TPpred-ATMV constructs the class and probability information based on various sequence features. We constructed the latent subspace among the multi-view features and constructed an auto-weighted multi-view tensor learning model to utilize the high correlation based on the multi-view features. Experimental results showed that TPpred-ATMV is better than or highly comparable with the other state-of-the-art methods for predicting eight types of therapeutic peptides.
Availability and implementation:
The code of TPpred-ATMV is available at: https://github.com/cokeyk/TPpred-ATMV.
Supplementary information:
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Hongwu Lv
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Yichen Guo
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Yongyong Chen
- Bio-Computing Research Center, Harbin Institute of Technology, Shenzhen 518055, China
| | - Hao Wu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
| |
|
45
|
Chen Q, Yang C, Xie Y, Wang Y, Li X, Wang K, Huang J, Yan W. GM-Pep: A High Efficiency Strategy to De Novo Design Functional Peptide Sequences. J Chem Inf Model 2022; 62:2617-2629. [PMID: 35533298 DOI: 10.1021/acs.jcim.2c00089] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
Although peptides are regarded as ideal therapeutic agents, only a small proportion of the marketed drugs are peptides. In the past decade, pharmacists have paid great attention to the development of peptide therapeutics. Except for a few approved chemically/rationally designed peptides, most attempts failed due to unsatisfactory efficacy or safety. Luckily, computational methods, such as artificial intelligence, have been utilized to accelerate the discovery of therapeutic peptides by predicting the activity, toxicity, and absorption, distribution, metabolism, and excretion of polypeptides. Usually, a specific biological activity of a peptide could be accurately determined by an interest-oriented binary classification constructed of a positive set and another un-experimentally validated negative set regardless of other characteristics, which suggests that it could be challenging to realize the comprehensive evaluation of the research object in the early stage of drug research and development. Herein, we proposed an integrated method (GM-Pep) that contained a conditional variational autoencoder model (CVAE) and a positive sample training multiclassifier (Deep-Multiclassifier) to effectively generate a single bioactive peptide sequence without toxicity and referential side effects. The results showed that our Deep-Multiclassifier model gave a sequence accuracy of up to 96.41% [toxicity (94.48%), antifungal (96.58%), antihypertensive (97.18%), and antibacterial (96.91%), respectively]. The properties of Deep-Multiclassifier and CVAE were validated through 12 first synthesized antibacterial peptides or compared to random peptides. The source code and data sets are available at https://github.com/TimothyChen225/GM-Pep.
Affiliation(s)
- Qushuo Chen
- The Institute of Pharmacology, Key Laboratory of Preclinical Study for New Drugs of Gansu Province, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, China
| | - Changyan Yang
- The Institute of Pharmacology, Key Laboratory of Preclinical Study for New Drugs of Gansu Province, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, China
| | - Yihao Xie
- The Institute of Pharmacology, Key Laboratory of Preclinical Study for New Drugs of Gansu Province, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, China
| | - Yuqiang Wang
- School of Stomatology, Lanzhou University, Lanzhou, Gansu 730000, China
| | - Xiaoxu Li
- School of Computer and Communication, Lanzhou University of Technology, Lanzhou, Gansu 730050, China
| | - Kairong Wang
- The Institute of Pharmacology, Key Laboratory of Preclinical Study for New Drugs of Gansu Province, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, China
| | - Jinqi Huang
- Department of Hematology, Affiliated Hospital of Guangdong Medical University, Zhanjiang, Guangdong 524000, China
| | - Wenjin Yan
- The Institute of Pharmacology, Key Laboratory of Preclinical Study for New Drugs of Gansu Province, School of Basic Medical Sciences, Lanzhou University, Lanzhou, Gansu 730000, China
| |
|
46
|
Zhang S, Yao Y, Wang J, Liang Y. Identification of DNA N4-methylcytosine sites based on multi-source features and gradient boosting decision tree. Anal Biochem 2022; 652:114746. [DOI: 10.1016/j.ab.2022.114746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/25/2022] [Revised: 05/13/2022] [Accepted: 05/18/2022] [Indexed: 11/16/2022]
|
47
|
Guo X, Jiang Y, Zou Q. Structured Sparse Regularized TSK Fuzzy System for predicting therapeutic peptides. Brief Bioinform 2022; 23:6570018. [PMID: 35438149 DOI: 10.1093/bib/bbac135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/24/2022] [Revised: 03/19/2022] [Accepted: 03/22/2022] [Indexed: 11/13/2022] Open
Abstract
Therapeutic peptides act on the skeletal system, digestive system and blood system, have antibacterial properties and help relieve inflammation. To reduce the resource consumption of wet-lab experiments, many computational methods have been developed for identifying therapeutic peptides. Because traditional machine learning methods handle feature noise poorly, we propose a novel therapeutic peptide identification method called Structured Sparse Regularized Takagi-Sugeno-Kang Fuzzy System on Within-Class Scatter (SSR-TSK-FS-WCS). Our method achieves good performance on multiple therapeutic peptide datasets and UCI datasets.
Affiliation(s)
- Xiaoyi Guo
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, P.R.China
| | - Yizhang Jiang
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, P.R.China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, P.R.China
| |
|
48
|
Manavalan B, Basith S, Lee G. Comparative analysis of machine learning-based approaches for identifying therapeutic peptides targeting SARS-CoV-2. Brief Bioinform 2022; 23:bbab412. [PMID: 34595489 PMCID: PMC8500067 DOI: 10.1093/bib/bbab412] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/01/2021] [Revised: 08/27/2021] [Accepted: 09/07/2021] [Indexed: 01/08/2023] Open
Abstract
Coronavirus disease 2019 (COVID-19) has impacted public health as well as societal and economic well-being. In the last two decades, various prediction algorithms and tools have been developed for predicting antiviral peptides (AVPs). The current COVID-19 pandemic has underscored the need to develop more efficient and accurate machine learning (ML)-based prediction algorithms for the rapid identification of therapeutic peptides against severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2). Several peptide-based ML approaches, including anti-coronavirus peptides (ACVPs), IL-6 inducing epitopes and other epitopes targeting SARS-CoV-2, have been implemented in COVID-19 therapeutics. Owing to the growing interest in the COVID-19 field, it is crucial to systematically compare the existing ML algorithms based on their performances. Accordingly, we comprehensively evaluated the state-of-the-art IL-6 and AVP predictors against coronaviruses in terms of core algorithms, feature encoding schemes, performance evaluation metrics and software usability. A comprehensive performance assessment was then conducted to evaluate the robustness and scalability of the existing predictors using well-constructed independent validation datasets. Additionally, we discussed the advantages and disadvantages of the existing methods, providing useful insights into the development of novel computational tools for characterizing and identifying epitopes or ACVPs. The insights gained from this review are anticipated to provide critical guidance to the scientific community in the rapid design and development of accurate and efficient next-generation in silico tools against SARS-CoV-2.
Affiliation(s)
| | - Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Korea
| | - Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon 16499, Korea
| |
|
49
|
Yu H, Dong W, Shi J. RANEDDI: Relation-aware network embedding for drug-drug interaction prediction. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2021.09.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]
|
50
|
Abstract
Background:
Therapeutic peptide prediction is critical for drug development and therapy. Researchers have been studying this essential task, developing several computational methods to identify different therapeutic peptide types.
Objective:
Most predictors are designed for specific peptide types. Developing methods that predict multiple types of therapeutic peptides remains a challenging problem. Moreover, it is still challenging to combine different features for therapeutic peptide prediction.
Method:
In this paper, we propose a new ensemble method, TP-MV, for general therapeutic peptide recognition. TP-MV is developed using the stacking framework in conjunction with KNN, SVM, ET, RF, and XGB. TP-MV then constructs a multi-view learning model as the meta-classifier to extract discriminative features for different peptides.
Results:
In the experiment, the proposed method outperforms the other existing methods on the benchmark datasets, indicating that the proposed method has the ability to predict multiple therapeutic peptides simultaneously.
Conclusion:
TP-MV is a useful tool for predicting therapeutic peptides.
Affiliation(s)
- Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| | - Hongwu Lv
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| | - Yichen Guo
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| | - Jie Wen
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
| |
|