1
Su L, Ma Z, Ji H, Kong J, Yan W, Zhang Q, Li J, Zuo M. From prediction to design: Revealing the mechanisms of umami peptides using interpretable deep learning, quantum chemical simulations, and module substitution. Food Chem 2025; 483:144301. [PMID: 40233511 DOI: 10.1016/j.foodchem.2025.144301] [Received: 01/07/2025] [Revised: 03/24/2025] [Accepted: 04/08/2025] [Indexed: 04/17/2025]
Abstract
This study screened and designed umami peptides using a deep learning model and a module substitution strategy. The predictive model, which integrates pre-training, enhanced-feature, and contrastive learning modules, achieved an accuracy of 0.94, outperforming other models by 2-9 %. Umami peptides were identified through virtual hydrolysis, model predictions, and sensory evaluation. Peptides EN, ETR, GK4, RK5, ER6, EF7, IL8, VR9, DL10, and PK14 demonstrated umami taste and exhibited umami-enhancing effects with MSG. The module substitution strategy, in which highly contributive modules from umami peptides replace the corresponding modules in bitter peptides, facilitates peptide design and modification. The mechanisms underlying module substitution and taste presentation were elucidated via molecular docking and active site analysis, revealing that substituted peptides form more hydrogen bonds and hydrophobic interactions with T1R1/T1R3. Amino acids D, E, Q, K, and R were critical for umami taste. This study provides an efficient tool for rapid umami peptide screening and expands the umami peptide repository.
Affiliation(s)
- Lijun Su
  - National Engineering Research Center for Agri-Product Quality Traceability, Beijing Technology and Business University, Beijing 100048, China; School of Food and Health, Beijing Technology and Business University, Beijing 100048, China
- Zhenren Ma
  - National Engineering Research Center for Agri-Product Quality Traceability, Beijing Technology and Business University, Beijing 100048, China
- Huizhuo Ji
  - National Engineering Research Center for Agri-Product Quality Traceability, Beijing Technology and Business University, Beijing 100048, China; School of Food and Health, Beijing Technology and Business University, Beijing 100048, China
- Jianlei Kong
  - National Engineering Research Center for Agri-Product Quality Traceability, Beijing Technology and Business University, Beijing 100048, China
- Wenjing Yan
  - National Engineering Research Center for Agri-Product Quality Traceability, Beijing Technology and Business University, Beijing 100048, China
- Qingchuan Zhang
  - National Engineering Research Center for Agri-Product Quality Traceability, Beijing Technology and Business University, Beijing 100048, China
- Jian Li
  - School of Food and Health, Beijing Technology and Business University, Beijing 100048, China
- Min Zuo
  - School of Information, Beijing Wuzi University, Beijing 101126, China
2
Li Y, Xiao M, Li Y, Lv L, Zhang S, Liu Y, Zhang J. Machine Learning for the Prediction of Acute Kidney Injury in Critically Ill Patients with Coronary Heart Disease: Algorithm Development and Validation. JMIR Med Inform 2025. [PMID: 40383933 DOI: 10.2196/72349] [Indexed: 05/20/2025] Open
Abstract
BACKGROUND Acute kidney injury (AKI) frequently occurs in critically ill patients with coronary heart disease (CHD), and its development markedly elevates mortality rates and prolongs hospitalization duration. Early AKI prediction is crucial for timely intervention and improvement of patient outcomes. OBJECTIVE This study aims to develop and verify a clinical prediction model for the occurrence of AKI upon admission in the critically ill CHD population through machine learning (ML). METHODS Data from the MIMIC-IV (version 2.2) database were gathered and included information on critically ill CHD individuals in the intensive care unit (ICU). The dataset was randomized into a training set (70%) and a test set (30%). LASSO regression was employed for feature variable selection. ML models, including logistic regression (LR), decision tree (DT), naive Bayes (NB), random forest (RF), extreme gradient boosting (XGBoost), and support vector machine (SVM), were constructed using the training set. The six models were compared in the test set to identify the best-performing model. Subsequently, the model was assessed by calibration curve and decision curve analysis (DCA). External validation was conducted using data from the Second Affiliated Hospital of Zhengzhou University. Ultimately, the predictive model was interpreted via SHapley Additive Explanations (SHAP) values. RESULTS 2,711 ICU-admitted CHD patients were selected, with 1,809 (66.7%) having AKI. Thirteen variables were selected to construct the six ML models. XGBoost exhibited the best performance regarding discrimination (AUC = 0.765, 95% CI 0.731-0.800), accuracy (0.725), and sensitivity (0.759). External validation using a cohort of 226 patients confirmed the strong generalizability of the XGBoost model (AUC = 0.835, 95% CI 0.782-0.887). Feature importance analyses derived from SHAP values, DT, RF, and XGBoost consistently identified five key predictors associated with the development of AKI: mechanical ventilation, use of antiplatelet agents, age, N-terminal pro B-type natriuretic peptide (NT-proBNP) levels, and acute physiology score III (APSIII). CONCLUSIONS ML models can serve as reliable tools for forecasting AKI in the critically ill CHD cohort. The XGBoost model is highly accurate and may aid doctors in identifying high-risk individuals for early intervention to lower mortality.
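The discrimination figures quoted in this abstract are AUC values. As a reader's aid, the following minimal sketch shows what that statistic measures: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive case is scored higher; tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = event occurred, 0 = no event.
labels = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.7, 0.2, 0.4]
auc = roc_auc(labels, scores)
print(auc)  # 0.90625
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a sort-based rank computation would be the usual choice for large cohorts.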
Affiliation(s)
- Yike Li
  - The Second Clinical Medical School, Zhengzhou University, No. 2 Jingba Road, Jinshui District, Zhengzhou, CN
- Mingyang Xiao
  - The Second Clinical Medical School, Zhengzhou University, No. 2 Jingba Road, Jinshui District, Zhengzhou, CN
- Yaqian Li
  - The Second Clinical Medical School, Zhengzhou University, No. 2 Jingba Road, Jinshui District, Zhengzhou, CN
- Lulu Lv
  - The Second Clinical Medical School, Zhengzhou University, No. 2 Jingba Road, Jinshui District, Zhengzhou, CN
- Shanshan Zhang
  - The Second Clinical Medical School, Zhengzhou University, No. 2 Jingba Road, Jinshui District, Zhengzhou, CN
- Yuhui Liu
  - The Second Clinical Medical School, Zhengzhou University, No. 2 Jingba Road, Jinshui District, Zhengzhou, CN
- Juan Zhang
  - The Second Clinical Medical School, Zhengzhou University, No. 2 Jingba Road, Jinshui District, Zhengzhou, CN
3
Tan JZE, Wee J, Gong X, Xia K. Topology-Enhanced Machine Learning Model (Top-ML) for Anticancer Peptide Prediction. J Chem Inf Model 2025; 65:4232-4242. [PMID: 40229641 DOI: 10.1021/acs.jcim.5c00476] [Indexed: 04/16/2025]
Abstract
Recently, therapeutic peptides have demonstrated great promise for cancer treatment. To explore powerful anticancer peptides, artificial intelligence (AI)-based approaches have been developed to systematically screen potential candidates. However, the lack of efficient featurization of peptides has become a bottleneck for these machine-learning models. In this paper, we propose a topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction. Our Top-ML employs peptide topological features derived from its sequence "connection" information characterized by spectral descriptors. Our Top-ML model, employing an Extra-Trees classifier, has been validated on the AntiCP 2.0 and mACPpred 2.0 benchmark data sets, achieving state-of-the-art performance or results comparable to existing deep learning models, while providing greater interpretability. Our results highlight the potential of leveraging novel topology-based featurization to accelerate the identification of anticancer peptides.
Affiliation(s)
- Joshua Zhi En Tan
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
- JunJie Wee
  - Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Xue Gong
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
- Kelin Xia
  - Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
4
Abbas Z, Kim S, Lee N, Kazmi SAW, Lee SW. A robust ensemble framework for anticancer peptide classification using multi-model voting approach. Comput Biol Med 2025; 188:109750. [PMID: 40032410 DOI: 10.1016/j.compbiomed.2025.109750] [Received: 10/30/2024] [Revised: 01/14/2025] [Accepted: 01/22/2025] [Indexed: 03/05/2025]
Abstract
Anticancer peptides (ACPs) hold great potential for cancer therapeutics, yet accurately identifying them remains a challenging task due to the complexity of peptide sequences and their interactions with biological systems. In this study, we propose a novel machine learning-based framework for ACP classification, integrating multiple feature sets, including sequence composition, physicochemical properties, and embedding features derived from pre-trained language models. We evaluate the performance of various classifiers on benchmark datasets and compare our model against state-of-the-art methods. The results demonstrate that our model outperforms existing methods such as UniDL4BioPep, ACPred-Fuse, and iACP with an accuracy of 75.58%, an AUC of 0.8272, and an MCC of 0.5119. Our approach provides a more balanced sensitivity of 0.7384 and specificity of 0.773, ensuring robust identification of both ACPs and non-ACPs. These findings suggest that incorporating diverse feature sets can significantly enhance ACP classification, potentially facilitating the discovery of novel anticancer peptides for therapeutic applications.
Affiliation(s)
- Zeeshan Abbas
  - Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon, Republic of Korea; Department of Artificial Intelligence, Sungkyunkwan University, Suwon 16419, Republic of Korea
- Sunyeup Kim
  - Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon, Republic of Korea
- Nangkyeong Lee
  - Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon, Republic of Korea
- Seung Won Lee
  - Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon, Republic of Korea; Department of Artificial Intelligence, Sungkyunkwan University, Suwon 16419, Republic of Korea; Department of Metabiohealth, Sungkyunkwan University, Suwon 16419, Republic of Korea; Personalized Cancer Immunotherapy Research Center, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea
5
Asim MN, Asif T, Mehmood F, Dengel A. Peptide classification landscape: An in-depth systematic literature review on peptide types, databases, datasets, predictors architectures and performance. Comput Biol Med 2025; 188:109821. [PMID: 39987697 DOI: 10.1016/j.compbiomed.2025.109821] [Received: 09/28/2024] [Revised: 02/03/2025] [Accepted: 02/05/2025] [Indexed: 02/25/2025]
Abstract
Peptides are gaining significant attention in diverse fields; for example, the pharmaceutical market has seen a steady rise in peptide-based therapeutics over the past six decades. Peptides have been utilized in the development of distinct applications, including inhibitors of SARS-CoV-2 and treatments for conditions like cancer and diabetes. Distinct types of peptides possess unique characteristics, and the development of peptide-specific applications requires the discrimination of one peptide type from others. To the best of our knowledge, approximately 230 Artificial Intelligence (AI) driven applications have been developed for 22 distinct types of peptides, yet there remains significant room for development of new predictors. This comprehensive review addresses the critical gap by providing a consolidated platform for the development of AI-driven peptide classification applications. This paper offers several key contributions. It presents the biological foundations of 22 unique peptide types and categorizes them into four main classes: Regulatory, Therapeutic, Nutritional, and Delivery Peptides. It offers an in-depth overview of 47 databases that have been used to develop peptide classification benchmark datasets. It summarizes details of 288 benchmark datasets that are used in the development of diverse types of AI-driven peptide classification applications. It provides a detailed summary of 197 sequence representation learning methods and 94 classifiers that have been used to develop 230 distinct AI-driven peptide classification applications. Across 22 distinct types of peptide classification tasks related to 288 benchmark datasets, it reports performance values of 230 AI-driven peptide classification applications. It summarizes experimental settings and various evaluation measures that have been employed to assess the performance of AI-driven peptide classification applications. The primary focus of this manuscript is to consolidate scattered information into a single comprehensive platform. This resource will greatly assist researchers who are interested in developing new AI-driven peptide classification applications.
Affiliation(s)
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany; Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany
- Tayyaba Asif
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Faiza Mehmood
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; Institute of Data Sciences, University of Engineering and Technology, Lahore, Pakistan
- Andreas Dengel
  - German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany; Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany
6
Geng A, Luo Z, Li A, Zhang Z, Zou Q, Wei L, Cui F. ACP-CLB: An Anticancer Peptide Prediction Model Based on Multichannel Discriminative Processing and Integration of Large Pretrained Protein Language Models. J Chem Inf Model 2025; 65:2336-2349. [PMID: 39969847 DOI: 10.1021/acs.jcim.4c02072] [Indexed: 02/20/2025]
Abstract
MOTIVATION Cancer affects millions globally, and as research advances, our understanding and treatment of cancer evolve. Compared to conventional treatments with significant side effects, anticancer peptides (ACPs) have gained considerable attention. Validating ACPs through wet-lab experiments is time-consuming and costly. However, numerous artificial intelligence methods are now used for ACP identification and classification. These methods typically apply a uniform strategy to all feature types, overlooking the potential benefits of more specialized processing for different feature types. INNOVATION In this paper, we propose a framework based on multichannel discriminative processing, where different neural networks are applied to process various feature types, optimizing their respective feature vectors. Additionally, we leverage Large Pretrained Protein Language Models to capture deeper sequence features, further enhancing the model's performance. Contributions: To better validate the overall performance and generalization ability of the model, we compared it with state-of-the-art models using four different data sets (AntiCp2Main, AntiCp2 Alternate, ACP740, cACP-DeepGram). The results show significant improvements across most metrics. Additionally, our proposed framework better assists researchers in distinguishing and identifying ACPs and further validates the need for distinct processing methods for different feature types.
Affiliation(s)
- Aoyun Geng
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Zhenjie Luo
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Aohan Li
  - Graduate School of Informatics and Engineering, The University of Electro-Communications, Tokyo 182-8585, Japan
- Zilong Zhang
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
- Leyi Wei
  - Centre for Artificial Intelligence driven Drug Discovery, Faculty of Applied Science, Macao Polytechnic University, Macao SAR 999078, China
  - School of Informatics, Xiamen University, Xiamen 361000, China
- Feifei Cui
  - School of Computer Science and Technology, Hainan University, Haikou 570228, China
7
Wang S, Ma B. Anti-Cancer Peptides Identification and Activity Type Classification With Protein Sequence Pre-Training. IEEE J Biomed Health Inform 2025; 29:1692-1701. [PMID: 40048353 DOI: 10.1109/jbhi.2024.3358632] [Indexed: 03/09/2025]
Abstract
Cancer remains a significant global health challenge, responsible for millions of deaths annually. Addressing this issue necessitates the discovery of novel anti-cancer drugs. Anti-cancer peptides (ACPs), with their unique ability to selectively target cancer cells, offer new hope in discovering low side-effect anti-cancer drugs. However, the process of discovering novel ACPs is both time-consuming and costly. Therefore, there is an urgent need for a computational method that can predict whether a given peptide is an ACP and classify its specific functional types. In this paper, we introduce DUO-ACP, a model serving dual roles in ACP prediction: identification and functional type classification. DUO-ACP employs two embedding modules to acquire knowledge about global protein features and local ACP characteristics, complemented by a prediction module. When assessed on two publicly available datasets for each task, DUO-ACP surpasses all existing methods, achieving outstanding results: an ACP identification accuracy of 89.5% and a Macro-averaged AUC of 88.6% in ACP functional type classification. We further interpret the contribution of each part of our model, including the two types of embeddings as well as ensemble learning. On a new curated dataset, the prediction results of DUO-ACP closely match existing literature, highlighting DUO-ACP's generalization capabilities on previously unseen data and displaying the potential capability of discovering novel ACP.
8
Cao J, Zhou W, Yu Q, Ji J, Zhang J, He S, Zhu Z. MDTL-ACP: Anticancer Peptides Prediction Based on Multi-Domain Transfer Learning. IEEE J Biomed Health Inform 2025; 29:1714-1725. [PMID: 38147420 DOI: 10.1109/jbhi.2023.3347138] [Indexed: 12/28/2023]
Abstract
Anticancer peptides (ACPs) have emerged as one of the most promising therapeutic agents for cancer treatment. They are bioactive peptides featuring broad-spectrum activity and low drug-resistance. The discovery of ACPs via traditional biochemical methods is laborious and costly. Accordingly, various computational methods have been developed to facilitate the discovery of ACPs. However, the data resources and knowledge of ACPs are still very scarce, and only a few of them are clinically verified, which limits the competence of computational methods. To address this issue, in this article, we propose an ACP prediction model based on multi-domain transfer learning, namely MDTL-ACP, to discriminate novel ACPs from plentiful inactive peptides. In particular, we collect abundant antimicrobial peptides (AMPs) from four well-studied peptide domains and extract their inherent features as the input of MDTL-ACP. The features learned from multiple source domains of AMPs are then transferred into the target prediction task of ACPs via artificial neural network-based shared-extractor and task-specific classifiers in MDTL-ACP. The knowledge captured in the transferred features enhances the prediction of ACPs in the target domain. Experimental results demonstrate that MDTL-ACP can outperform the traditional and state-of-the-art ACP prediction methods.
9
Ge F, Zhou J, Zhang M, Yu DJ. MFP-MFL: Leveraging Graph Attention and Multi-Feature Integration for Superior Multifunctional Bioactive Peptide Prediction. Int J Mol Sci 2025; 26:1317. [PMID: 39941085 PMCID: PMC11818429 DOI: 10.3390/ijms26031317] [Received: 12/17/2024] [Revised: 02/01/2025] [Accepted: 02/02/2025] [Indexed: 02/16/2025] Open
Abstract
Bioactive peptides, composed of amino acid chains, are fundamental to a wide range of biological functions. Their inherent multifunctionality, however, complicates accurate classification and prediction. To address these challenges, we present MFP-MFL, an advanced multi-feature, multi-label learning framework that integrates Graph Attention Networks (GAT) with leading protein language models, including ESM-2, ProtT5, and RoBERTa. By employing an ensemble learning strategy, MFP-MFL effectively utilizes deep sequence features and complex functional dependencies, ensuring highly accurate and robust predictions of multifunctional peptides. Comparative experiments demonstrate that MFP-MFL achieves precision, coverage, and accuracy scores of 0.799, 0.821, and 0.786, respectively. Additionally, it attains an Absolute true of 0.737 while maintaining a low Absolute false of 0.086. A comprehensive case study involving 86,970 mutations further highlights the model's ability to predict functional changes resulting from sequence variations. These results establish MFP-MFL as a powerful tool for the discovery and application of multifunctional peptides, offering significant potential to advance research and biomedical applications.
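The five scores in this abstract (precision, coverage, accuracy, absolute true, absolute false) are example-based multi-label metrics. The abstract does not define them, so the sketch below assumes the set-based definitions commonly used in multifunctional peptide prediction work: per-peptide overlap between the true and predicted label sets, averaged over peptides. The function name, the label names "A", "B", "C", and the toy data are hypothetical.

```python
def multilabel_metrics(true_sets, pred_sets, n_labels):
    """Example-based multi-label metrics (assumed definitions).

    For each sample with true label set Y and predicted set Z:
      precision      = |Y & Z| / |Z|
      coverage       = |Y & Z| / |Y|
      accuracy       = |Y & Z| / |Y | Z|
      absolute true  = 1 if Y == Z else 0
      absolute false = (|Y | Z| - |Y & Z|) / n_labels
    Each score is then averaged over all samples.
    """
    totals = dict.fromkeys(
        ["precision", "coverage", "accuracy", "absolute_true", "absolute_false"], 0.0
    )
    for y, z in zip(true_sets, pred_sets):
        inter, union = len(y & z), len(y | z)
        totals["precision"] += inter / len(z) if z else 0.0
        totals["coverage"] += inter / len(y) if y else 0.0
        totals["accuracy"] += inter / union if union else 1.0
        totals["absolute_true"] += 1.0 if y == z else 0.0
        totals["absolute_false"] += (union - inter) / n_labels
    n = len(true_sets)
    return {name: value / n for name, value in totals.items()}


# Hypothetical toy data: four peptides, three possible function labels.
true_sets = [{"A"}, {"A", "B"}, {"C"}, {"B", "C"}]
pred_sets = [{"A"}, {"A"}, {"B"}, {"B", "C"}]
scores = multilabel_metrics(true_sets, pred_sets, n_labels=3)
print(scores)
```

Under these definitions, absolute true is the strictest score (every label of a peptide must match), while absolute false is the only one where lower is better.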
Affiliation(s)
- Fang Ge
  - State Key Laboratory of Flexible Electronics (LoFE), Institute of Advanced Materials (IAM), Nanjing University of Posts and Telecommunications, 9 Wenyuan Road, Nanjing 210023, China
- Jianren Zhou
  - School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang 212100, China
- Ming Zhang
  - School of Computer, Jiangsu University of Science and Technology, 666 Changhui Road, Zhenjiang 212100, China
- Dong-Jun Yu
  - School of Computer Science and Engineering, Nanjing University of Science and Technology, 200 Xiaolingwei, Nanjing 210094, China
10
Yue J, Li T, Xu J, Chen Z, Li Y, Liang S, Liu Z, Wang Y. Discovery of anticancer peptides from natural and generated sequences using deep learning. Int J Biol Macromol 2025; 290:138880. [PMID: 39706427 DOI: 10.1016/j.ijbiomac.2024.138880] [Received: 10/23/2024] [Revised: 12/10/2024] [Accepted: 12/16/2024] [Indexed: 12/23/2024]
Abstract
Anticancer peptides (ACPs) demonstrate significant potential in clinical cancer treatment due to their ability to selectively target and kill cancer cells. In recent years, numerous artificial intelligence (AI) algorithms have been developed. However, many predictive methods lack sufficient wet lab validation, thereby constraining the progress of models and impeding the discovery of novel ACPs. This study proposes a comprehensive research strategy by introducing CNBT-ACPred, an ACP prediction model based on a three-channel deep learning architecture, supported by extensive in vitro and in vivo experiments. CNBT-ACPred achieved an accuracy of 0.9554 and a Matthews Correlation Coefficient (MCC) of 0.8602. Compared to existing excellent models, CNBT-ACPred increased accuracy by at least 5 % and improved MCC by 15 %. Predictions were conducted on over 3.8 million sequences from Uniprot, along with 100,000 sequences generated by a deep generative model, ultimately identifying 37 out of 41 candidate peptides from >30 species that exhibited effective in vitro tumor inhibitory activity. Among these, tPep14 demonstrated significant anticancer effects in two mouse xenograft models without detectable toxicity. Finally, the study revealed correlations between the amino acid composition, structure, and function of the identified ACP candidates.
Affiliation(s)
- Jianda Yue
  - The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China
- Tingting Li
  - The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China
- Jiawei Xu
  - The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China
- Zihui Chen
  - The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China
- Yaqi Li
  - The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China
- Songping Liang
  - The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China
- Zhonghua Liu
  - The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China
- Ying Wang
  - The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China
11
Ji S, Wu J, An F, Lou M, Zhang T, Guo J, Wu P, Zhu Y, Wu R. Umami-gcForest: Construction of a predictive model for umami peptides based on deep forest. Food Chem 2025; 464:141826. [PMID: 39522377 DOI: 10.1016/j.foodchem.2024.141826] [Received: 06/20/2024] [Revised: 10/07/2024] [Accepted: 10/27/2024] [Indexed: 11/16/2024]
Abstract
Umami peptides have recently gained attention for their ability to enhance umami flavor, reduce salt content, and provide nutritional benefits. However, traditional wet laboratory methods to identify them are time-consuming, laborious, and costly. Therefore, we developed the Umami-gcForest model using the deep forest algorithm. It constructs amino acid feature matrices using ProtBERT, amino acid composition, composition-transition-distribution, and pseudo amino acid composition, applying mutual information for feature selection to optimize dimensions. Compared to other machine learning baselines, umami peptide prediction models, and composite models, Umami-gcForest demonstrated outstanding predictive accuracy in validation on different test sets. Using SHapley Additive exPlanations to calculate feature contributions, we found that the key features of Umami-gcForest were hydrophobicity, charge, and polarity. Based on this, an online platform was developed to make the model accessible to users. In conclusion, Umami-gcForest serves as a powerful tool, providing a solid foundation for the efficient and accurate screening of umami peptides.
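Of the feature types listed in this abstract, amino acid composition (AAC) is the simplest: a 20-dimensional vector giving the fraction of each standard amino acid in the sequence. The sketch below is a generic illustration of that descriptor, not the authors' implementation. The function name and the example sequence "EEDK" are hypothetical, and no claim is made about that sequence's taste.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code, alphabetical order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the 20-dimensional AAC vector (residue fractions) of a peptide."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    counts = Counter(sequence)
    unknown = set(counts) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return [counts[aa] / len(sequence) for aa in AMINO_ACIDS]


# Arbitrary example peptide (hypothetical).
features = amino_acid_composition("EEDK")
composition = dict(zip(AMINO_ACIDS, features))
print(composition["E"], composition["D"], composition["K"])  # 0.5 0.25 0.25
```

Because AAC discards residue order, it is typically combined with order-aware descriptors or learned embeddings, as the abstract describes.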
Collapse
Affiliation(s)
- Shuaiqi Ji
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang 110866, PR China
| | - Junrui Wu
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang 110866, PR China
| | - Feiyu An
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Liaoning Engineering Research Center of Food Fermentation Technology, Shenyang 110866, PR China
| | - Mengxue Lou
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang 110866, PR China
| | - Taowei Zhang
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang 110866, PR China
| | - Jiawei Guo
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang 110866, PR China
| | - Penggong Wu
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Liaoning Engineering Research Center of Food Fermentation Technology, Shenyang 110866, PR China
| | - Yi Zhu
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Shenyang Key Laboratory of Microbial Fermentation Technology Innovation, Shenyang 110866, PR China
| | - Rina Wu
- College of Food Science, Shenyang Agricultural University, Shenyang 110866, PR China; Liaoning Engineering Research Center of Food Fermentation Technology, Shenyang 110866, PR China.
|
12
|
Li W, Liu X, Liu Y, Zheng Z. High-Accuracy Identification and Structure-Activity Analysis of Antioxidant Peptides via Deep Learning and Quantum Chemistry. J Chem Inf Model 2025; 65:603-612. [PMID: 39772654 DOI: 10.1021/acs.jcim.4c01713] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 01/11/2025]
Abstract
Antioxidant peptides (AOPs) hold great promise for mitigating oxidative-stress-related diseases, but their discovery is hindered by inefficient and time-consuming traditional methods. To address this, we developed an innovative framework combining machine learning and quantum chemistry to accelerate AOP identification and analyze structure-activity relationships. A Bi-LSTM-based model, AOPP, achieved superior performance with accuracies of 0.9043 and 0.9267, precisions of 0.9767 and 0.9848, and Matthews correlation coefficients (MCCs) of 0.818 and 0.859 on two data sets, outperforming existing methods. Compared with XGBoost and LightGBM, AOPP demonstrated a 4.67% improvement in accuracy. Feature fusion significantly enhanced classification, as validated by UMAP visualization. Experimental validation of ten peptides confirmed the antioxidant activity, with LLA exhibiting the highest DPPH and ABTS scavenging rates (0.108 and 0.437 mmol/g, respectively). Quantum chemical calculations identified LLA's lowest HOMO-LUMO gap (ΔE = 0.26 eV) and C3-H26 as the key active site contributing to its superior antioxidant potential. This study highlights the synergy of machine learning and quantum chemistry, offering an efficient framework for AOP discovery with broad applications in therapeutics and functional foods.
Affiliation(s)
- Wanxing Li
- School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
| | - Xuejing Liu
- School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
| | - Yuanfa Liu
- School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
| | - Zhaojun Zheng
- School of Food Science and Technology, Jiangnan University, Wuxi 214122, China
|
13
|
Luo J, Zhao K, Chen J, Yang C, Qu F, Liu Y, Jin X, Yan K, Zhang Y, Liu B. iMFP-LG: Identify Novel Multi-functional Peptides Using Protein Language Models and Graph-based Deep Learning. Genomics Proteomics Bioinformatics 2025; 22:qzae084. [PMID: 39585308 PMCID: PMC12011362 DOI: 10.1093/gpbjnl/qzae084] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2023] [Revised: 10/25/2024] [Accepted: 11/21/2024] [Indexed: 11/26/2024]
Abstract
Functional peptides are short amino acid fragments that have a wide range of beneficial functions for living organisms. The majority of previous studies have focused on mono-functional peptides, but an increasing number of multi-functional peptides have been discovered. Although there have been enormous experimental efforts to assay multi-functional peptides, only a small portion of millions of known peptides has been explored. The development of effective and accurate techniques for identifying multi-functional peptides can facilitate their discovery and mechanistic understanding. In this study, we present iMFP-LG, a method for multi-functional peptide identification based on protein language models (pLMs) and graph attention networks (GATs). Our comparative analyses demonstrated that iMFP-LG outperformed the state-of-the-art methods in identifying both multi-functional bioactive peptides and multi-functional therapeutic peptides. The interpretability of iMFP-LG was also illustrated by visualizing attention patterns in pLMs and GATs. Given the outstanding performance of iMFP-LG in identifying multi-functional peptides, we employed it to screen novel peptides with both anti-microbial and anti-cancer functions from millions of known peptides in the UniRef90 database. As a result, eight candidate peptides were identified, among which one candidate was validated to possess both anti-bacterial and anti-cancer properties through molecular structure alignment and biological experiments. We anticipate that iMFP-LG can assist in the discovery of multi-functional peptides and contribute to the advancement of peptide drug design.
Affiliation(s)
- Jiawei Luo
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
| | - Kejuan Zhao
- School of Science, Harbin Institute of Technology, Shenzhen 518055, China
| | - Junjie Chen
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
| | - Caihua Yang
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
| | - Fuchuan Qu
- School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen 518055, China
| | - Yumeng Liu
- College of Big Data and Internet, Shenzhen Technology University, Shenzhen 518055, China
| | - Xiaopeng Jin
- College of Big Data and Internet, Shenzhen Technology University, Shenzhen 518055, China
| | - Ke Yan
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 10081, China
| | - Yang Zhang
- School of Science, Harbin Institute of Technology, Shenzhen 518055, China
| | - Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 10081, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 10081, China
|
14
|
Guan C, Fernandes FC, Franco OL, de la Fuente-Nunez C. Leveraging large language models for peptide antibiotic design. Cell Rep Phys Sci 2025; 6:102359. [PMID: 39949833 PMCID: PMC11823563 DOI: 10.1016/j.xcrp.2024.102359] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2025]
Abstract
Large language models (LLMs) have significantly impacted various domains of our society, including recent applications in complex fields such as biology and chemistry. These models, built on sophisticated neural network architectures and trained on extensive datasets, are powerful tools for designing, optimizing, and generating molecules. This review explores the role of LLMs in discovering and designing antibiotics, focusing on peptide molecules. We highlight advancements in drug design and outline the challenges of applying LLMs in these areas.
Affiliation(s)
- Changge Guan
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
- These authors contributed equally
| | - Fabiano C. Fernandes
- Centro de Análises Proteômicas e Bioquímicas, Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, Brazil
- Departamento de Ciência da Computação, Instituto Federal de Brasília, Campus Taguatinga, Brasília, Brazil
- These authors contributed equally
| | - Octavio L. Franco
- Centro de Análises Proteômicas e Bioquímicas, Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasília, Brazil
- S-Inova Biotech, Programa de Pós-Graduação em Biotecnologia, Universidade Católica Dom Bosco, Campo Grande, Brazil
| | - Cesar de la Fuente-Nunez
- Machine Biology Group, Departments of Psychiatry and Microbiology, Institute for Biomedical Informatics, Institute for Translational Medicine and Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA
- Departments of Bioengineering and Chemical and Biomolecular Engineering, School of Engineering and Applied Science, University of Pennsylvania, Philadelphia, PA, USA
- Department of Chemistry, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA, USA
- Penn Institute for Computational Science, University of Pennsylvania, Philadelphia, PA, USA
|
15
|
Huang G, Cao Y, Dai Q, Chen W. ACP-DPE: A Dual-Channel Deep Learning Model for Anticancer Peptide Prediction. IET Syst Biol 2025; 19:e70010. [PMID: 40119615 PMCID: PMC11928748 DOI: 10.1049/syb2.70010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2024] [Revised: 02/13/2025] [Accepted: 02/20/2025] [Indexed: 03/24/2025]
Abstract
Cancer is a serious and complex disease caused by uncontrolled cell growth and is becoming one of the leading causes of death worldwide. Anticancer peptides (ACPs), bioactive peptides with lower toxicity, have emerged as a promising means of treating cancer effectively. Identifying ACPs is challenging because of the limitations of experimental conditions. To address this, we propose a dual-channel deep learning method, termed ACP-DPE, for ACP prediction. ACP-DPE consists of two parallel channels: one is an embedding layer followed by a bi-directional gated recurrent unit (Bi-GRU) module, and the other is an adaptive embedding layer followed by a dilated convolution module. The Bi-GRU module captures the peptide sequence dependencies, whereas the dilated convolution module characterises the local relationships among amino acids. Experimental results show that ACP-DPE achieves an accuracy of 82.81% and a sensitivity of 86.63%, surpassing the state-of-the-art method by 3.86% and 5.1%, respectively. These findings demonstrate the effectiveness of ACP-DPE for ACP prediction and highlight its potential as a valuable tool in cancer treatment research.
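To make the dilated convolution channel mentioned in this abstract more concrete, the sketch below shows a generic one-dimensional dilated convolution without padding, written in plain Python. It is only an illustration of the operation, not the ACP-DPE layer: the function names `dilated_conv1d` and `receptive_field`, the kernel weights, and the toy input are assumptions.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1D convolution whose taps are `dilation` positions apart."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    out = []
    for start in range(len(signal) - span):
        out.append(sum(w * signal[start + i * dilation]
                       for i, w in enumerate(kernel)))
    return out

def receptive_field(kernel_size, dilation):
    """Number of input positions covered by one output position."""
    return 1 + (kernel_size - 1) * dilation

# Toy per-residue scores (hypothetical values, e.g. one embedding dimension).
toy_signal = [1, 2, 3, 4, 5]
dense = dilated_conv1d(toy_signal, [1, 1], dilation=1)   # adjacent residues
spaced = dilated_conv1d(toy_signal, [1, 1], dilation=2)  # residues two apart
```

With the same two-weight kernel, raising the dilation from 1 to 2 widens the receptive field from 2 to 3 positions without adding parameters, which is the usual motivation for dilated convolutions on sequences.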
Affiliation(s)
- Guohua Huang
- College of Information Science and Engineering, Shaoyang University, Shaoyang, China
- Hunan Provincial Key Laboratory of Finance & Economics Big Data Science and Technology, Hunan University of Finance and Economics, Changsha, China
| | - Yujie Cao
- College of Information Science and Engineering, Shaoyang University, Shaoyang, China
| | - Qi Dai
- College of Life Science and Medicine, Zhejiang Sci‐Tech University, Hangzhou, China
| | - Weihong Chen
- Hunan Provincial Key Laboratory of Finance & Economics Big Data Science and Technology, Hunan University of Finance and Economics, Changsha, China
|
16
|
Wang G, Zhang H, Shao M, Feng Y, Cao C, Hu X. DeepTGIN: a novel hybrid multimodal approach using transformers and graph isomorphism networks for protein-ligand binding affinity prediction. J Cheminform 2024; 16:147. [PMID: 39734235 DOI: 10.1186/s13321-024-00938-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2024] [Accepted: 11/25/2024] [Indexed: 12/31/2024]
Abstract
Predicting protein-ligand binding affinity is essential for understanding protein-ligand interactions and advancing drug discovery. Recent research has demonstrated the advantages of sequence-based models and graph-based models. In this study, we present a novel hybrid multimodal approach, DeepTGIN, which integrates transformers and graph isomorphism networks to predict protein-ligand binding affinity. DeepTGIN is designed to learn sequence and graph features efficiently. The DeepTGIN model comprises three modules: the data representation module, the encoder module, and the prediction module. The transformer encoder learns sequential features from proteins and protein pockets separately, while the graph isomorphism network extracts graph features from the ligands. To evaluate the performance of DeepTGIN, we compared it with state-of-the-art models using the PDBbind 2016 core set and PDBbind 2013 core set. DeepTGIN outperforms these models in terms of R, RMSE, MAE, SD, and CI metrics. Ablation studies further demonstrate the effectiveness of the ligand features and the encoder module. The code is available at: https://github.com/zhc-moushang/DeepTGIN. SCIENTIFIC CONTRIBUTION: DeepTGIN is a novel hybrid multimodal deep learning model for predicting protein-ligand binding affinity. The model uses a Transformer encoder to extract sequence features from the protein and the protein pocket, and graph isomorphism networks to capture features from the ligand. This model addresses the limitations of existing methods in exploring protein pocket and ligand features.
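Three of the regression metrics named in this abstract (R, RMSE, and MAE) have standard definitions, sketched below in plain Python under the assumption that R denotes the Pearson correlation coefficient. The function names and the toy affinity values are hypothetical and do not come from the DeepTGIN code base or from PDBbind.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def pearson_r(y_true, y_pred):
    """Pearson correlation coefficient between measured and predicted values."""
    mt = sum(y_true) / len(y_true)
    mp = sum(y_pred) / len(y_pred)
    cov = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mt) ** 2 for t in y_true)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)

# Made-up measured and predicted affinities, for illustration only.
toy_true = [1.0, 2.0, 3.0]
toy_pred = [1.0, 2.0, 4.0]
toy_scores = (pearson_r(toy_true, toy_pred),
              rmse(toy_true, toy_pred),
              mae(toy_true, toy_pred))
```

Lower RMSE and MAE and higher R indicate better agreement between predicted and measured affinities.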
Affiliation(s)
- Guishen Wang
- College of Computer Science and Engineering, Changchun University of Technology, North Yunda Street No. 3000, Changchun, 130012, Jilin, China
- School of Life Sciences, Jilin University, Qianjin Street No. 2055, Changchun, 130000, Jilin, China
| | - Hangchen Zhang
- College of Computer Science and Engineering, Changchun University of Technology, North Yunda Street No. 3000, Changchun, 130012, Jilin, China
| | - Mengting Shao
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Longmian Avenue No. 101, Nanjing, 211166, Jiangsu, China
| | - Yuncong Feng
- College of Computer Science and Engineering, Changchun University of Technology, North Yunda Street No. 3000, Changchun, 130012, Jilin, China
| | - Chen Cao
- Key Laboratory for Bio-Electromagnetic Environment and Advanced Medical Theranostics, School of Biomedical Engineering and Informatics, Nanjing Medical University, Longmian Avenue No. 101, Nanjing, 211166, Jiangsu, China.
| | - Xiaowen Hu
- School of Biomedical Engineering and Informatics, Nanjing Medical University, Longmian Avenue No. 101, Nanjing, 211166, Jiangsu, China.
|
17
|
Wang X, Zhang Z, Liu C. iACP-DFSRA: Identification of Anticancer Peptides Based on a Dual-channel Fusion Strategy of ResCNN and Attention. J Mol Biol 2024; 436:168810. [PMID: 39362624 DOI: 10.1016/j.jmb.2024.168810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2024] [Revised: 09/10/2024] [Accepted: 09/27/2024] [Indexed: 10/05/2024]
Abstract
Anticancer peptides (ACPs) have been widely applied in the treatment of cancer owing to good safety, tolerable side effects, and high selectivity. However, the number of ACPs that have been experimentally validated is limited because identifying ACPs is extremely expensive. Hence, accurate and cost-effective identification methods for ACPs are urgently needed. In this work, we propose a deep learning-based model, named iACP-DFSRA, for ACP identification. Specifically, we adopted two kinds of sequence embedding technologies, the ProtBert_BFD pre-trained language model and handcrafted features, to encode protein sequences. LightGBM was then used for feature selection, and the selected features were input into ResCNN and an Attention mechanism, respectively, to extract local and global features. Finally, the concatenated features were deeply fused with the Attention mechanism so that the model attends more to key features, and predictions were made by a fully connected layer. The results of 10-fold cross-validation demonstrated that the iACP-DFSRA model delivered improved results in most metrics, with Sp of 94.15%, Sn of 95.32%, Acc of 94.74% and MCC of 89.48%, compared to the latest AACFlow model. Indeed, the iACP-DFSRA model is the only model with Acc > 90% and MCC > 80% on this independent test dataset. Furthermore, we demonstrated the superiority of our model on additional datasets. In addition, t-SNE and SHAP interpretation analysis demonstrated that it is crucial to use two channels for feature extraction and to use the Attention mechanism for deep fusion, which helps iACP-DFSRA predict ACPs more effectively.
Affiliation(s)
- Xin Wang
- School of Science, Dalian Maritime University, Dalian 116026, China.
| | - Zimeng Zhang
- School of Science, Dalian Maritime University, Dalian 116026, China
| | - Chang Liu
- School of Science, Dalian Maritime University, Dalian 116026, China
|
18
|
Qin D, Liang X, Jiao L, Wang R, Zhao Y, Xue W, Wang J, Liang G. Sequence-Activity Relationship of Angiotensin-Converting Enzyme Inhibitory Peptides Derived from Food Proteins, Based on a New Deep Learning Model. Foods 2024; 13:3550. [PMID: 39593966 PMCID: PMC11592644 DOI: 10.3390/foods13223550] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2024] [Revised: 10/29/2024] [Accepted: 11/05/2024] [Indexed: 11/28/2024]
Abstract
Food-derived peptides are usually safe natural drug candidates that can potentially inhibit the angiotensin-converting enzyme (ACE). The wet-lab experiments used to identify ACE inhibitory peptides (ACEiPs) are time-consuming and costly, making it important and urgent to reduce the scope of experimental validation through bioinformatics methods. Here, we construct an ACE inhibitory peptide predictor (ACEiPP) using optimized amino acid descriptors (AADs) and long short-term memory (LSTM) neural networks. Our results show that combined-AAD models exhibit more efficient feature transformation ability than single-AAD models, especially the model trained with the optimal descriptors as feature inputs, which exhibits the highest predictive ability in the independent test (Acc = 0.9479 and AUC = 0.9876), a significant performance improvement over three existing predictors. The model can effectively characterize the structure-activity relationship of ACEiPs. By combining the model with database mining, we used ACEiPP to screen four ACEiPs with multiple reported functions. We also used ACEiPP to predict peptides from 21,249 food-derived proteins in the Database of Food-derived Bioactive Peptides (DFBP) and construct a library of potential ACEiPs to facilitate the discovery of new anti-ACE peptides.
Affiliation(s)
- Guizhao Liang
- Key Laboratory of Biorheological Science and Technology, Ministry of Education, College of Bioengineering, Chongqing University, Chongqing 400044, China; (D.Q.); (X.L.); (L.J.); (R.W.); (Y.Z.); (W.X.); (J.W.)
|
19
|
Sui J, Chen J, Chen Y, Iwamori N, Sun J. GASIDN: identification of sub-Golgi proteins with multi-scale feature fusion. BMC Genomics 2024; 25:1019. [PMID: 39478465 PMCID: PMC11526662 DOI: 10.1186/s12864-024-10954-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Accepted: 10/24/2024] [Indexed: 11/02/2024]
Abstract
The Golgi apparatus is a crucial component of the inner membrane system in eukaryotic cells, playing a central role in protein biosynthesis. Dysfunction of the Golgi apparatus has been linked to neurodegenerative diseases. Accurate identification of sub-Golgi protein types is therefore essential for developing effective treatments for such diseases. Because experimental methods for identifying sub-Golgi protein types are expensive and time-consuming, various computational methods have been developed as identification tools. However, the majority of these methods rely solely on neighboring features in the protein sequence and neglect the crucial spatial structure information of the protein. To discover alternative methods for accurately identifying sub-Golgi proteins, we have developed a model called GASIDN. The GASIDN model extracts multi-dimensional features by utilizing a 1D convolution module on protein sequences and a graph learning module on contact maps constructed from AlphaFold2. The model utilizes the deep representation learning model SeqVec to initialize protein sequences. GASIDN achieved accuracy values of 98.4% and 96.4% in independent testing and ten-fold cross-validation, respectively, outperforming the majority of previous predictors. To the best of our knowledge, this is the first method that utilizes multi-scale feature fusion to identify and locate sub-Golgi proteins. To assess the generalizability and scalability of our model, we applied it to the identification of proteins from other organelles, including plant vacuoles and peroxisomes. The results of these experiments were promising, indicating the effectiveness and versatility of our model. The source code and datasets can be accessed at https://github.com/SJNNNN/GASIDN.
Affiliation(s)
- Jianan Sui
- School of Information Science and Engineering, University of Jinan, Jinan, China
| | - Jiazi Chen
- Laboratory of Zoology, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, Fukuoka-shi, Fukuoka, Japan
| | - Yuehui Chen
- School of Artificial Intelligence Institute and Information Science and Engineering, University of Jinan, Jinan, China.
| | - Naoki Iwamori
- Laboratory of Zoology, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, Fukuoka-shi, Fukuoka, Japan
| | - Jin Sun
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
|
20
|
Kilimci ZH, Yalcin M. ACP-ESM: A novel framework for classification of anticancer peptides using protein-oriented transformer approach. Artif Intell Med 2024; 156:102951. [PMID: 39173421 DOI: 10.1016/j.artmed.2024.102951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 07/19/2024] [Accepted: 08/13/2024] [Indexed: 08/24/2024]
Abstract
Anticancer peptides (ACPs) are a class of molecules that have gained significant attention in the field of cancer research and therapy. ACPs are short chains of amino acids, the building blocks of proteins, and they possess the ability to selectively target and kill cancer cells. One of the key advantages of ACPs is their ability to selectively target cancer cells while sparing healthy cells to a greater extent. This selectivity is often attributed to differences in the surface properties of cancer cells compared to normal cells. That is why ACPs are being investigated as potential candidates for cancer therapy. ACPs may be used alone or in combination with other treatment modalities such as chemotherapy and radiation therapy. While ACPs hold promise as a novel approach to cancer treatment, there are challenges to overcome, including optimizing their stability, improving selectivity, enhancing their delivery to cancer cells, coping with the continuously increasing number of peptide sequences, and developing a reliable and precise prediction model. In this work, we propose an efficient transformer-based framework that identifies ACPs through a reliable and precise prediction model. For this purpose, four different transformer models, namely ESM, ProtBERT, BioBERT, and SciBERT, are employed to detect ACPs from amino acid sequences. To demonstrate the contribution of the proposed framework, extensive experiments are carried out on datasets widely used in the literature: two versions of AntiCp2, cACP-DeepGram, and ACP-740. Experimental results show that the proposed model improves classification accuracy compared with previous studies. The proposed framework, ESM, achieves 96.45% accuracy on the AntiCp2 dataset, 97.66% accuracy on the cACP-DeepGram dataset, and 88.51% accuracy on the ACP-740 dataset, thereby setting a new state of the art. The code of the proposed framework is publicly available on GitHub (https://github.com/mstf-yalcin/acp-esm).
Affiliation(s)
- Zeynep Hilal Kilimci
- Department of Information Systems Engineering, Kocaeli University, 41001, Kocaeli, Turkey.
| | - Mustafa Yalcin
- Department of Information Systems Engineering, Kocaeli University, 41001, Kocaeli, Turkey.
|
21
|
Wang X, Wang S. ACP-PDAFF: Pretrained model and dual-channel attentional feature fusion for anticancer peptides prediction. Comput Biol Chem 2024; 112:108141. [PMID: 38996756 DOI: 10.1016/j.compbiolchem.2024.108141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 05/26/2024] [Accepted: 06/28/2024] [Indexed: 07/14/2024]
Abstract
Anticancer peptides (ACPs) have attracted significant interest as a novel method of treating cancer due to their ability to selectively kill cancer cells without damaging normal cells. Many artificial intelligence-based methods have demonstrated impressive performance in predicting ACPs. Nevertheless, the limitations of existing methods in feature engineering include handcrafted features driven by prior knowledge, insufficient feature extraction, and inefficient feature fusion. In this study, we propose a model based on a pretrained model and dual-channel attentional feature fusion (DAFF), called ACP-PDAFF. Firstly, to reduce the heavy dependence on expert knowledge-based handcrafted features, binary profile features (BPF) and physicochemical properties features (PCPF) are used as inputs to the transformer model. Secondly, to learn more diverse feature information about ACPs, the pretrained model ProtBert is utilized. Thirdly, for better fusion of different feature channels, DAFF is employed. Finally, to evaluate the performance of the model, we compare it with other methods on five benchmark datasets: the ACP-Mixed-80 dataset, the Main and Alternate datasets of AntiCP 2.0, the LEE and Independent dataset, and the ACPred-Fuse dataset. The accuracies obtained by ACP-PDAFF are 0.86, 0.80, 0.94, 0.97 and 0.95 on the five datasets, respectively, higher than existing methods by 1% to 12%. Therefore, by learning rich feature information and effectively fusing different feature channels, ACP-PDAFF achieves outstanding performance. Our code and the datasets are available at https://github.com/wongsing/ACP-PDAFF.
Affiliation(s)
- Xinyi Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
| | - Shunfang Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China.
|
22
|
Zhang W, Ding Y, Wei L, Guo X, Ni F. Therapeutic peptides identification via kernel risk sensitive loss-based k-nearest neighbor model and multi-Laplacian regularization. Brief Bioinform 2024; 25:bbae534. [PMID: 39438076 PMCID: PMC11495874 DOI: 10.1093/bib/bbae534] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2024] [Revised: 08/30/2024] [Accepted: 10/08/2024] [Indexed: 10/25/2024]
Abstract
Therapeutic peptides are therapeutic agents synthesized from natural amino acids, which can be used as carriers for precisely transporting drugs and can activate the immune system for preventing and treating various diseases. However, screening therapeutic peptides using biochemical assays is expensive, time-consuming, and limited by experimental conditions and biological samples, and there may be ethical considerations in the clinical stage. In contrast, screening therapeutic peptides using machine learning and computational methods is efficient, automated, and can accurately predict potential therapeutic peptides. In this study, a k-nearest neighbor model based on multi-Laplacian and kernel risk sensitive loss was proposed, which introduces a kernel risk loss function derived from the K-local hyperplane distance nearest neighbor model as well as combining the Laplacian regularization method to predict therapeutic peptides. The findings indicated that the suggested approach achieved satisfactory results and could effectively predict therapeutic peptide sequences.
Affiliation(s)
- Wenyu Zhang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, No. 2006 Xiyuan Avenue, High tech Zone, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, No.1 Chengdian Road, Kecheng District, Quzhou 324000, China
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, No.1 Chengdian Road, Kecheng District, Quzhou 324000, China
| | - Leyi Wei
- Macao Polytechnic University, Gomes Street, Macau Peninsula, Macau 999078, China
| | - Xiaoyi Guo
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, No.1 Chengdian Road, Kecheng District, Quzhou 324000, China
| | - Fengming Ni
- Department of Gastroenterology, The First Hospital of Jilin University, No. 71 Xinmin Street, Chaoyang District, Changchun 130021, China
|
23
|
Li J, Ren J, Dai W, Stubenrauch C, Finn RD, Wang J. Fungtion: A Server for Predicting and Visualizing Fungal Effector Proteins. J Mol Biol 2024; 436:168613. [PMID: 39237206 DOI: 10.1016/j.jmb.2024.168613] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/21/2024] [Revised: 05/11/2024] [Accepted: 05/13/2024] [Indexed: 09/07/2024]
Abstract
Fungal pathogens pose significant threats to plant health by secreting effectors that manipulate plant-host defences. However, identifying effector proteins remains challenging, in part because they lack common sequence motifs. Here, we introduce Fungtion (Fungal effector prediction), a toolkit leveraging a hybrid framework to accurately predict and visualize fungal effectors. By combining global patterns learned from pretrained protein language models with refined information from known effectors, Fungtion achieves state-of-the-art prediction performance. Additionally, the interactive visualizations we have developed enable researchers to explore both sequence- and high-level relationships between the predicted and known effectors, facilitating effector function discovery, annotation, and hypothesis formulation regarding plant-pathogen interactions. We anticipate Fungtion to be a valuable resource for biologists seeking deeper insights into fungal effector functions and for computational biologists aiming to develop future methodologies for fungal effector prediction: https://step3.erc.monash.edu/Fungtion/.
Affiliation(s)
- Jiahui Li
- Biomedicine Discovery Institute, Monash University, VIC 3800, Australia; Centre to Impact AMR, Monash University, VIC 3800, Australia
| | - Jinzheng Ren
- Biomedicine Discovery Institute, Monash University, VIC 3800, Australia; Centre to Impact AMR, Monash University, VIC 3800, Australia; College of Engineering, Computing and Cybernetics, Australian National University, Canberra, ACT 2600, Australia
| | - Wei Dai
- Biomedicine Discovery Institute, Monash University, VIC 3800, Australia; Centre to Impact AMR, Monash University, VIC 3800, Australia
| | - Christopher Stubenrauch
- Biomedicine Discovery Institute, Monash University, VIC 3800, Australia; Centre to Impact AMR, Monash University, VIC 3800, Australia
| | - Robert D Finn
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
| | - Jiawei Wang
- Biomedicine Discovery Institute, Monash University, VIC 3800, Australia; Centre to Impact AMR, Monash University, VIC 3800, Australia; European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK.
| |
|
24
|
Sangaraju VK, Pham NT, Wei L, Yu X, Manavalan B. mACPpred 2.0: Stacked Deep Learning for Anticancer Peptide Prediction with Integrated Spatial and Probabilistic Feature Representations. J Mol Biol 2024; 436:168687. [PMID: 39237191 DOI: 10.1016/j.jmb.2024.168687] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/08/2024] [Revised: 05/28/2024] [Accepted: 06/20/2024] [Indexed: 09/07/2024]
Abstract
Anticancer peptides (ACPs) are naturally occurring molecules with remarkable potential to target and kill cancer cells. However, identifying ACPs solely from their primary amino acid sequences remains a major hurdle in immunoinformatics. In the past, several web-based machine learning (ML) tools have been proposed to assist researchers in identifying potential ACPs for further testing. Notably, our meta-approach method, mACPpred, introduced in 2019, has significantly advanced the field of ACP research. Given the exponential growth in the number of characterized ACPs, there is now a pressing need to create an updated version of mACPpred. To develop mACPpred 2.0, we constructed an up-to-date benchmarking dataset by integrating all publicly available ACP datasets. We employed a large set of feature descriptors, encompassing both conventional feature descriptors and advanced pre-trained natural language processing (NLP)-based embeddings. We evaluated their ability to discriminate between ACPs and non-ACPs using eleven different classifiers. Subsequently, we employed a stacked deep learning (SDL) approach, incorporating 1D convolutional neural network (1D CNN) blocks and hybrid features. These features included the top seven performing NLP-based features and 90 probabilistic features, allowing us to identify hidden patterns within these diverse features and improve the accuracy of our ACP prediction model. This is the first study to integrate spatial and probabilistic feature representations for predicting ACPs. Rigorous cross-validation and independent tests conclusively demonstrated that mACPpred 2.0 not only surpassed its predecessor (mACPpred) but also outperformed the existing state-of-the-art predictors, highlighting the importance of advanced feature representation capabilities attained through SDL. To facilitate widespread use and accessibility, we have developed a user-friendly web server for mACPpred 2.0, available at https://balalab-skku.org/mACPpred2/.
Affiliation(s)
- Vinoth Kumar Sangaraju
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea
| | - Nhat Truong Pham
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea
| | - Leyi Wei
- Faculty of Applied Sciences, Macao Polytechnic University, Macau
| | - Xue Yu
- Beidahuang Industry Group General Hospital, 150001 Harbin, China.
| | - Balachandran Manavalan
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea.
| |
|
25
|
Rukh G, Akbar S, Rehman G, Alarfaj FK, Zou Q. StackedEnC-AOP: prediction of antioxidant proteins using transform evolutionary and sequential features based multi-scale vector with stacked ensemble learning. BMC Bioinformatics 2024; 25:256. [PMID: 39098908 PMCID: PMC11298090 DOI: 10.1186/s12859-024-05884-6] [Citation(s) in RCA: 25] [Impact Index Per Article: 25.0] [Received: 04/19/2024] [Accepted: 07/29/2024] [Indexed: 08/06/2024] Open
Abstract
BACKGROUND Antioxidant proteins are involved in several biological processes and can protect DNA and cells from free-radical damage. These proteins regulate the body's oxidative stress and play a significant role in many antioxidant-based drugs. Current in vitro-based approaches are costly, time-consuming, and unable to efficiently screen and identify the targeted motifs of antioxidant proteins. METHODS In this study, we propose StackedEnC-AOP, an accurate prediction method for discriminating antioxidant proteins. The training sequences are encoded by applying a two-level discrete wavelet transform (DWT) to the PSSM-based images of the evolutionary matrix, forming a pseudo position-specific scoring matrix (PsePSSM-DWT) based embedded vector. Additionally, the evolutionary difference formula and composite physicochemical properties methods are employed to collect structural and sequential descriptors. The sequential features, evolutionary descriptors, and physicochemical properties are then combined into a single vector to compensate for the weaknesses of the individual encoding schemes. To reduce the computational cost of the combined feature vector, the optimal features are chosen using minimum redundancy and maximum relevance (mRMR). The optimal feature vector is used to train a stacking-based ensemble meta-model. RESULTS StackedEnC-AOP achieved a prediction accuracy of 98.40% and an AUC of 0.99 on the training sequences. For validation, the trained StackedEnC-AOP model achieved an accuracy of 96.92% and an AUC of 0.98 on an independent set. CONCLUSION StackedEnC-AOP performed significantly better than current computational models, improving accuracy by ~5% and ~3% on the training and independent sets, respectively.
The efficacy and consistency of StackedEnC-AOP make it a valuable tool for data scientists, and it can play a key role in academic research and drug design.
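The two-level wavelet decomposition described in the METHODS can be illustrated with a minimal sketch. The abstract does not name the wavelet family or say how the sub-bands are summarized, so the Haar wavelet and the mean/standard-deviation statistics below are assumptions chosen for simplicity; `haar_dwt`, `two_level_features`, and the toy PSSM are hypothetical and not taken from the paper.

```python
import math

def haar_dwt(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    signal = list(signal)
    if len(signal) % 2:  # pad odd-length input by repeating the last value
        signal.append(signal[-1])
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail

def two_level_features(column):
    """Decompose one PSSM column twice; summarize each sub-band by mean and std."""
    a1, d1 = haar_dwt(column)
    a2, d2 = haar_dwt(a1)  # second level acts on the first-level approximation
    feats = []
    for band in (a2, d2, d1):
        mean = sum(band) / len(band)
        var = sum((x - mean) ** 2 for x in band) / len(band)
        feats.extend([mean, math.sqrt(var)])
    return feats

# Toy PSSM (assumed values): rows are residue positions, columns are scores.
pssm = [[1, -2, 0], [3, 0, -1], [2, 1, -1], [0, -1, 4],
        [1, 2, 0], [-1, 0, 2], [2, 2, 1], [0, 1, -3]]
columns = list(zip(*pssm))
vector = [f for col in columns for f in two_level_features(col)]
```

Each column yields six numbers (three sub-bands, two statistics each), so the feature length is fixed regardless of sequence length, which is the practical reason for summarizing sub-bands rather than keeping raw coefficients.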
Affiliation(s)
- Gul Rukh
- Department of Zoology, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan
| | - Shahid Akbar
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan
| | - Gauhar Rehman
- Department of Zoology, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan
| | - Fawaz Khaled Alarfaj
- Department of Management Information Systems (MIS), School of Business, King Faisal University (KFU), 31982, Al-Ahsa, Saudi Arabia
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China.
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, People's Republic of China.
| |
|
26
|
Garai S, Thomas J, Dey P, Das D. LGBM-ACp: an ensemble model for anticancer peptide prediction and in silico screening with potential drug targets. Mol Divers 2024; 28:1965-1981. [PMID: 36637711 DOI: 10.1007/s11030-023-10602-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 11/07/2022] [Accepted: 01/06/2023] [Indexed: 01/14/2023]
Abstract
Conventional cancer therapies are highly expensive and have serious complications. An alternative approach now emphasizes the development of small, biologically active peptides without acute toxicity. Experimental screening to find curative anticancer peptides (ACPs) often runs into multiple obstacles and is time-consuming. Consequently, an effective computational technique to identify promising ACP candidates prior to preclinical research is in high demand. This study proposed a machine-learning framework that used the light gradient-boosting machine as a classifier and two compositional and two binary profile features as input. The ensemble model displayed an accuracy, MCC, and AUROC of 97.52%, 0.91, and 0.98, respectively, which outclassed most of the existing sequence-based computational tools. A distinct dataset of non-mutagenic, non-toxic peptides that do not inhibit cytochrome P-450 was used to validate the hybrid model. The most relevant ACP in the alternative dataset was compared with two standard ACPs, beta defensin 2 and cecropin-A. Molecular docking of the predicted peptide revealed that it has a strong binding affinity with twenty-five anticancer drug targets, most notably phosphoenolpyruvate carboxykinase (-7.2 kcal/mol). Additionally, molecular dynamics simulation and principal component analysis supported the stability of the peptide-receptor complex. Overall, the present findings take a step forward in rational drug design through rapid identification and screening of therapeutic peptides.
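As a rough illustration of the kinds of input the abstract mentions, the sketch below computes one compositional feature (amino acid composition) and one binary profile feature (one-hot encoding of the first few residues). The abstract does not specify which four feature sets were used, so these two are assumptions; the function names, the window of five residues, and the example peptide are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def amino_acid_composition(seq):
    """Fraction of each standard residue in the peptide (20 values)."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]

def binary_profile(seq, n_residues=5):
    """One-hot encode the first n_residues positions; short peptides are zero-padded."""
    profile = []
    for i in range(n_residues):
        row = [0] * len(AMINO_ACIDS)
        if i < len(seq):
            # Raises ValueError for non-standard residues, which this sketch does not handle.
            row[AMINO_ACIDS.index(seq[i])] = 1
        profile.extend(row)
    return profile

peptide = "GLFDIVKKVV"  # hypothetical example sequence
features = amino_acid_composition(peptide) + binary_profile(peptide)
```

The concatenated vector (20 composition values plus 5 × 20 binary values) is the sort of fixed-length input a gradient-boosting classifier would accept.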
Affiliation(s)
- Swarnava Garai
- Department of Bioengineering, NIT Agartala, Tripura, 799046, India
| | - Juanit Thomas
- Department of Bioengineering, NIT Agartala, Tripura, 799046, India
| | - Palash Dey
- Civil Engineering Department, The ICFAI University, Tripura, 799210, India
| | - Deeplina Das
- Department of Bioengineering, NIT Agartala, Tripura, 799046, India.
| |
|
27
|
Arif M, Musleh S, Fida H, Alam T. PLMACPred prediction of anticancer peptides based on protein language model and wavelet denoising transformation. Sci Rep 2024; 14:16992. [PMID: 39043738 PMCID: PMC11266708 DOI: 10.1038/s41598-024-67433-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2024] [Accepted: 07/11/2024] [Indexed: 07/25/2024] Open
Abstract
Anticancer peptides (ACPs) play a promising role in the discovery of anti-cancer drugs. Research on ACPs as therapeutic agents is growing because of their minimal side effects. However, identifying novel ACPs using wet-lab experiments is generally time-consuming, labor-intensive, and expensive. Leveraging computational methods for fast and accurate prediction of ACPs would benefit the drug discovery process. Herein, a machine learning-based predictor, called PLMACPred, is developed for identifying ACPs from peptide sequence only. PLMACPred adopts a set of encoding schemes representing evolutionary properties, composition properties, and protein language model (PLM) embeddings, i.e., evolutionary scale modeling (ESM-2)- and ProtT5-based embeddings, to encode peptides. Then, two-dimensional (2D) wavelet denoising (WD) is employed to remove noise from the extracted features. Finally, an ensemble-based cascade deep forest (CDF) model is developed to identify ACPs. The PLMACPred model attained superior performance on all three benchmark datasets, namely, ACPmain, ACPAlter, and ACP740, over tenfold cross-validation and independent datasets. PLMACPred outperformed the existing models and improved the prediction accuracy by 18.53%, 2.4%, and 7.59% on the ACPmain, ACPalter, and ACP740 datasets, respectively. We showed that embeddings from ProtT5 and ESM-2 captured better contextual information from the entire sequence than the other encoding schemes for ACP prediction. For the explainability of the proposed model, the SHAP (SHapley Additive exPlanations) method was used to analyze the effect of features on the ACP prediction. A list of novel sequence motifs was proposed from the ACP sequences using the MEME suite. We believe PLMACPred will help accelerate the discovery of novel ACPs as well as other activities of microbial peptides.
Affiliation(s)
- Muhammad Arif
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
| | - Saleh Musleh
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
| | - Huma Fida
- Department of Microbiology, Abdul Wali Khan University, Mardan, KPK, Pakistan
| | - Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar.
| |
|
28
|
Salam A, Ullah F, Amin F, Ahmad Khan I, Garcia Villena E, Kuc Castilla A, de la Torre I. Efficient prediction of anticancer peptides through deep learning. PeerJ Comput Sci 2024; 10:e2171. [PMID: 39145253 PMCID: PMC11323142 DOI: 10.7717/peerj-cs.2171] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/29/2024] [Accepted: 06/11/2024] [Indexed: 08/16/2024]
Abstract
Background Cancer remains one of the leading causes of mortality globally, with conventional chemotherapy often resulting in severe side effects and limited effectiveness. Recent advancements in bioinformatics and machine learning, particularly deep learning, offer promising new avenues for cancer treatment through the prediction and identification of anticancer peptides. Objective This study aimed to develop and evaluate a deep learning model utilizing a two-dimensional convolutional neural network (2D CNN) to enhance the prediction accuracy of anticancer peptides, addressing the complexities and limitations of current prediction methods. Methods A diverse dataset of peptide sequences with annotated anticancer activity labels was compiled from various public databases and experimental studies. The sequences were preprocessed and encoded using one-hot encoding and additional physicochemical properties. The 2D CNN model was trained and optimized using this dataset, with performance evaluated through metrics such as accuracy, precision, recall, F1-score, and area under the receiver operating characteristic curve (AUC-ROC). Results The proposed 2D CNN model achieved superior performance compared to existing methods, with an accuracy of 0.87, precision of 0.85, recall of 0.89, F1-score of 0.87, and an AUC-ROC value of 0.91. These results indicate the model's effectiveness in accurately predicting anticancer peptides and capturing intricate spatial patterns within peptide sequences. Conclusion The findings demonstrate the potential of deep learning, specifically 2D CNNs, in advancing the prediction of anticancer peptides. The proposed model significantly improves prediction accuracy, offering a valuable tool for identifying effective peptide candidates for cancer treatment. Future Work Further research should focus on expanding the dataset, exploring alternative deep learning architectures, and validating the model's predictions through experimental studies. 
Efforts should also aim at optimizing computational efficiency and translating these predictions into clinical applications.
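The reported metrics can be checked against one another, since the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes F1 from the precision (0.85) and recall (0.89) quoted in the Results; `f1_score` is an illustrative helper, not code from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

reported_precision, reported_recall = 0.85, 0.89  # values quoted in the abstract
f1 = f1_score(reported_precision, reported_recall)
print(round(f1, 2))  # prints 0.87
```

The recomputed value agrees with the F1-score of 0.87 stated in the abstract.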
Affiliation(s)
- Abdu Salam
- Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
| | - Faizan Ullah
- Department of Computer Science, Bacha Khan University, Charsadda, Pakistan
| | - Farhan Amin
- School of Computer Science and Engineering, Yeungnam University, Gyeongsan, Republic of Korea
| | - Izaz Ahmad Khan
- Department of Computer Science, Bacha Khan University, Charsadda, Pakistan
| |
|
29
|
Zhang L, Hu X, Xiao K, Kong L. Effective identification and differential analysis of anticancer peptides. Biosystems 2024; 241:105246. [PMID: 38848816 DOI: 10.1016/j.biosystems.2024.105246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2024] [Revised: 05/27/2024] [Accepted: 06/04/2024] [Indexed: 06/09/2024]
Abstract
Anticancer peptides (ACPs) have recently emerged as promising cancer therapeutics due to their selectivity and lower toxicity. However, the number of experimentally validated ACPs is limited, and identifying ACPs from large-scale sequence data is time-consuming and expensive. Therefore, it is critical to develop and improve upon existing computational models for identifying ACPs. In this study, a computational method named ACP_DA was proposed based on peptide residue composition and physicochemical property information. To curtail overfitting and reduce computational costs, a sequential forward selection method was used to construct the optimal feature groups. Subsequently, the feature vectors were fed into a light gradient boosting machine classifier for model construction. On an independent test set, ACP_DA achieved the highest Matthews correlation coefficient of 0.63 and an accuracy of 0.8129, at least a 2% improvement over state-of-the-art methods. The satisfactory results demonstrate the effectiveness of ACP_DA as a powerful tool for identifying ACPs, with the potential to significantly contribute to the development and optimization of promising therapies. The data and source code are available at https://github.com/Zlclab/ACP_DA.
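Sequential forward selection, the feature selection step named in the abstract, can be sketched generically as a greedy loop. The version below is a minimal illustration and not the ACP_DA implementation: `sequential_forward_selection` and `toy_score` are hypothetical names, and the toy scoring function stands in for whatever cross-validated model performance the authors actually optimized.

```python
def sequential_forward_selection(n_features, score, max_features=None):
    """Greedily add the feature that most improves score(subset); stop when nothing helps."""
    selected, best = [], float("-inf")
    remaining = set(range(n_features))
    limit = max_features or n_features
    while remaining and len(selected) < limit:
        candidate, candidate_score = None, best
        for f in sorted(remaining):
            s = score(selected + [f])
            if s > candidate_score:
                candidate, candidate_score = f, s
        if candidate is None:  # no remaining feature improves the score
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return selected, best

# Toy scoring function (assumed): features 0 and 3 are informative,
# and every selected feature costs a small penalty.
def toy_score(subset):
    gain = {0: 0.30, 3: 0.25}
    return sum(gain.get(f, 0.0) for f in subset) - 0.01 * len(subset)

subset, value = sequential_forward_selection(5, toy_score)
```

With the toy score, the loop picks feature 0, then feature 3, and stops because any further feature only adds the penalty; this early stop is what limits both overfitting and the cost of downstream training.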
Affiliation(s)
- Lichao Zhang
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China; Hebei Innovation Center for Smart Perception and Applied Technology of Agricultural Data, Qinhuangdao, PR China
| | - Xueli Hu
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China
| | - Kang Xiao
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China
| | - Liang Kong
- Hebei Innovation Center for Smart Perception and Applied Technology of Agricultural Data, Qinhuangdao, PR China; School of Mathematics and Information Science & Technology, Hebei Normal University of Science & Technology, Qinhuangdao, PR China.
| |
|
30
|
Li H, Jiang L, Yang K, Shang S, Li M, Lv Z. iNP_ESM: Neuropeptide Identification Based on Evolutionary Scale Modeling and Unified Representation Embedding Features. Int J Mol Sci 2024; 25:7049. [PMID: 39000158 PMCID: PMC11240975 DOI: 10.3390/ijms25137049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/25/2024] [Revised: 06/17/2024] [Accepted: 06/25/2024] [Indexed: 07/16/2024] Open
Abstract
Neuropeptides are biomolecules with crucial physiological functions. Accurate identification of neuropeptides is essential for understanding nervous system regulatory mechanisms. However, traditional analysis methods are expensive and laborious, and the development of effective machine learning models continues to be a subject of current research. Hence, in this research, we constructed an SVM-based machine learning neuropeptide predictor, iNP_ESM, by integrating protein language models Evolutionary Scale Modeling (ESM) and Unified Representation (UniRep) for the first time. Our model utilized feature fusion and feature selection strategies to improve prediction accuracy during optimization. In addition, we validated the effectiveness of the optimization strategy with UMAP (Uniform Manifold Approximation and Projection) visualization. iNP_ESM outperforms existing models on a variety of machine learning evaluation metrics, with an accuracy of up to 0.937 in cross-validation and 0.928 in independent testing, demonstrating optimal neuropeptide recognition capabilities. We anticipate improved neuropeptide data in the future, and we believe that the iNP_ESM model will have broader applications in the research and clinical treatment of neurological diseases.
Affiliation(s)
- Honghao Li
- College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
| | - Liangzhen Jiang
- College of Food and Biological Engineering, Chengdu University, Chengdu 610106, China
- Country Key Laboratory of Coarse Cereal Processing, Ministry of Agriculture and Rural Affairs, Chengdu 610106, China
| | - Kaixiang Yang
- College of Software Engineering, Sichuan University, Chengdu 610041, China
| | - Shulin Shang
- College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
| | - Mingxin Li
- College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
| | - Zhibin Lv
- College of Biomedical Engineering, Sichuan University, Chengdu 610041, China
| |
|
31
|
Ghafoor H, Asim MN, Ibrahim MA, Ahmed S, Dengel A. CAPTURE: Comprehensive anti-cancer peptide predictor with a unique amino acid sequence encoder. Comput Biol Med 2024; 176:108538. [PMID: 38759585 DOI: 10.1016/j.compbiomed.2024.108538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 04/26/2024] [Accepted: 04/28/2024] [Indexed: 05/19/2024]
Abstract
The key properties of anticancer peptides (ACPs), including bioactivity, high efficacy, low toxicity, and lack of drug resistance, make them ideal candidates for cancer therapies. To explore the potential of ACPs and accelerate the development of cancer therapies, 53 artificial intelligence-supported computational predictors have been developed for ACP and non-ACP classification, but only one predictor has been developed for annotating ACP functional types. Moreover, these predictors extract amino acid distribution patterns to transform peptide sequences into statistical vectors that are then fed to classifiers for discriminating peptide sequences and annotating peptide functional classes. Overall, these predictors fail to extract diverse types of amino acid distribution patterns from peptide sequences. This paper presents a unique CARE encoder that transforms peptide sequences into statistical vectors by extracting 4 different types of distribution patterns: correlation, distribution, composition, and transition. On a public benchmark dataset, the potential of the proposed encoder is explored under two evaluation settings, intrinsic and extrinsic. Extrinsic evaluation indicates that 12 different machine learning classifiers achieve superior performance with the proposed encoder as compared to 55 existing encoders. Furthermore, intrinsic evaluation reveals that, unlike existing encoders, the proposed encoder generates more discriminative clusters for the ACP and non-ACP classes. Across 8 public benchmark ACP and non-ACP classification datasets, the CAPTURE predictor, based on the proposed encoder and an AdaBoost classifier, outperforms existing predictors by an average of 1%, 4%, and 2% in accuracy, recall, and MCC, respectively. In a generalizability case study across 7 benchmark antimicrobial peptide classification datasets, CAPTURE surpasses existing predictors by an average AU-ROC of 2%.
The CAPTURE predictive pipeline, combined with the label powerset method, outperforms the state-of-the-art ACP functional type predictor by 5%, 5%, 5%, 6%, and 3% in terms of average accuracy, subset accuracy, precision, recall, and F1, respectively. The CAPTURE web application is available at https://sds_genetic_analysis.opendfki.de/CAPTURE.
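The label powerset method mentioned for functional type annotation turns a multi-label problem into a single multi-class one by treating every distinct combination of labels as its own class. The sketch below shows only that transformation; the function name and the example functional types are hypothetical and are not drawn from the CAPTURE datasets.

```python
def label_powerset_encode(label_sets):
    """Map each distinct combination of labels to a single class id."""
    classes = {}   # frozenset of labels -> class id
    encoded = []
    for labels in label_sets:
        key = frozenset(labels)
        if key not in classes:
            classes[key] = len(classes)
        encoded.append(classes[key])
    return encoded, classes

# Hypothetical functional-type annotations for four peptides.
annotations = [{"breast", "lung"}, {"colon"}, {"lung", "breast"}, set()]
class_ids, mapping = label_powerset_encode(annotations)
```

Any ordinary multi-class classifier can then be trained on `class_ids`, and a predicted class is decoded back to its label set by inverting `mapping`. The trade-off is that label combinations never seen in training cannot be predicted.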
Affiliation(s)
- Hina Ghafoor
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
| | - Muhammad Nabeel Asim
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany.
| | - Muhammad Ali Ibrahim
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
| | - Sheraz Ahmed
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
| | - Andreas Dengel
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
| |
|
32
|
Song H, Lin X, Zhang H, Yin H. ACP-ESM2: The prediction of anticancer peptides based on pre-trained classifier. Comput Biol Chem 2024; 110:108091. [PMID: 38735271 DOI: 10.1016/j.compbiolchem.2024.108091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Revised: 04/07/2024] [Accepted: 04/29/2024] [Indexed: 05/14/2024]
Abstract
Anticancer peptides (ACPs) are a type of protein molecule that has anti-cancer activity and can inhibit cancer cell growth and survival. Traditional classification approaches for ACPs are expensive and time-consuming. This paper proposes a pre-trained classifier model, ESM2-GRU, for ACP prediction to make it easier to predict ACPs, gain a better understanding of the structural and functional differences of anti-cancer peptides, and optimize the design for the development of more effective anti-cancer treatment strategies. The model is made up of the ESM2 pre-trained model, a bidirectional GRU recurrent neural network, and a fully connected layer. ACP sequences are first fed into the ESM2 model, whose output is expanded in dimension and then passed to the bidirectional GRU recurrent neural network. Finally, the fully connected layer generates the final output. Experimental validation demonstrates that the ESM2-GRU model greatly improves classification performance on the benchmark dataset ACP606, with AUC, ACC, and MCC values of 0.975, 0.852, and 0.738, respectively. This exceptional prediction potential helps to identify specific types of anti-cancer peptides, improving their targeting and selectivity and, therefore, furthering the development of tailored medicine and treatments.
Affiliation(s)
- Huijia Song
- School of Information Engineering, Beijing Institute of Petrochemical Technology, Beijing 102617, China
| | - Xiaozhu Lin
- School of Information Engineering, Beijing Institute of Petrochemical Technology, Beijing 102617, China.
| | - Huainian Zhang
- School of Information Engineering, Beijing Institute of Petrochemical Technology, Beijing 102617, China
| | - Huijuan Yin
- School of Information Engineering, Beijing Institute of Petrochemical Technology, Beijing 102617, China
| |
|
33
|
Chen Z, Wang R, Guo J, Wang X. The role and future prospects of artificial intelligence algorithms in peptide drug development. Biomed Pharmacother 2024; 175:116709. [PMID: 38713945 DOI: 10.1016/j.biopha.2024.116709] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 03/10/2024] [Revised: 05/01/2024] [Accepted: 05/02/2024] [Indexed: 05/09/2024] Open
Abstract
Peptide medications have become better known in recent years due to their many benefits, including low side effects, high biological activity, specificity, and effectiveness. Over 100 peptide medications have been introduced to the market to treat a variety of illnesses. Most of these peptide medications are developed on the basis of endogenous or natural peptides, which frequently require expensive, time-consuming, and extensive tests to confirm. As artificial intelligence advances quickly, it is now possible to build machine learning or deep learning models that screen a large number of candidate sequences for therapeutic peptides. Therapeutic peptides, such as those with antibacterial or anticancer properties, have been developed through the application of artificial intelligence algorithms. The process of finding and developing peptide drugs is outlined in this review, along with a few related cases that were helped by AI and conventional methods. These resources will open up new avenues for peptide drug development and discovery, helping to meet the pressing needs of clinical patients for disease treatment. Although peptide drugs are a new class of biopharmaceuticals distinct from chemical and small-molecule drugs, their clinical purpose and value cannot be ignored. However, traditional peptide drug research and development has a long development cycle and high investment, and the creation of peptide medications will be substantially hastened by the AI-assisted (AI+) mode, offering a new boost for combating diseases.
Affiliation(s)
- Zhiheng Chen
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100083, China.
| | - Ruoxi Wang
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100083, China.
| | - Junqi Guo
- School of Biological Science and Medical Engineering, Beihang University, Beijing 100083, China.
| | - Xiaogang Wang
- Guangdong Provincial Key Laboratory of Bone and Joint Degenerative Diseases, The Third Affiliated Hospital of Southern Medical University, Guangzhou, Guangdong 510630, China.
| |
|
34
|
Noda R, Ichikawa D, Shibagaki Y. Machine learning-based diagnostic prediction of IgA nephropathy: model development and validation study. Sci Rep 2024; 14:12426. [PMID: 38816457 PMCID: PMC11139869 DOI: 10.1038/s41598-024-63339-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2024] [Accepted: 05/28/2024] [Indexed: 06/01/2024] Open
Abstract
IgA nephropathy progresses to kidney failure, making early detection important. However, definitive diagnosis depends on invasive kidney biopsy. This study aimed to develop non-invasive prediction models for IgA nephropathy using machine learning. We collected retrospective data on demographic characteristics, blood tests, and urine tests of the patients who underwent kidney biopsy. The dataset was divided into derivation and validation cohorts, with temporal validation. We employed five machine learning models (eXtreme Gradient Boosting (XGBoost), LightGBM, Random Forest, Artificial Neural Networks, and 1-Dimensional Convolutional Neural Network (1D-CNN)) and logistic regression, evaluated performance via the area under the receiver operating characteristic curve (AUROC), and explored variable importance through the SHapley Additive exPlanations method. The study included 1268 participants, with 353 (28%) diagnosed with IgA nephropathy. In the derivation cohort, LightGBM achieved the highest AUROC of 0.913 (95% CI 0.906-0.919), significantly higher than logistic regression, Artificial Neural Network, and 1D-CNN, but not significantly different from XGBoost and Random Forest. In the validation cohort, XGBoost demonstrated the highest AUROC of 0.894 (95% CI 0.850-0.935), maintaining its robust performance. Key predictors identified were age, serum albumin, IgA/C3, and urine red blood cells, aligning with existing clinical insights. Machine learning can be a valuable non-invasive tool for IgA nephropathy.
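The AUROC used to compare the models has a simple rank interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition; the function name and the six example predictions are hypothetical and unrelated to the study data.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predictions for six biopsied patients (1 = IgA nephropathy).
y_true = [1, 0, 1, 0, 0, 1]
y_prob = [0.9, 0.2, 0.6, 0.6, 0.1, 0.8]
area = auroc(y_true, y_prob)
```

This pairwise form is quadratic in the number of samples, which is fine for illustration; production code would normally use a sorted-rank formulation or a library routine.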
Affiliation(s)
- Ryunosuke Noda
- Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan.
| | - Daisuke Ichikawa
- Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan
| | - Yugo Shibagaki
- Division of Nephrology and Hypertension, Department of Internal Medicine, St. Marianna University School of Medicine, 2-16-1 Sugao, Miyamae-Ku, Kawasaki, Kanagawa, 216-8511, Japan
| |
|
35
|
Lin L, Li C, Zhang T, Xia C, Bai Q, Jin L, Shen Y. An in silico scheme for optimizing the enzymatic acquisition of natural biologically active peptides based on machine learning and virtual digestion. Anal Chim Acta 2024; 1298:342419. [PMID: 38462343 DOI: 10.1016/j.aca.2024.342419] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2023] [Revised: 12/23/2023] [Accepted: 02/26/2024] [Indexed: 03/12/2024]
Abstract
BACKGROUND As potential natural active substances, natural biologically active peptides (NBAPs) have recently attracted increasing attention. Traditional proteolysis methods for obtaining effective NBAPs are considerably vexing, especially since multiple proteases can be used, which hinders the exploration of available NBAPs. Although the development of virtual digestion brings some degree of convenience, the activity of the obtained peptides remains unclear, which still does not allow efficient access to NBAPs. It is necessary to develop an efficient and accurate strategy for acquiring NBAPs. RESULTS A new in silico scheme named SSA-LSTM-VD, which combines a sparrow search algorithm-long short-term memory (SSA-LSTM) deep learning model with virtual digestion, was presented to optimize the proteolytic acquisition of NBAPs. SSA-LSTM reached the highest Efficiency value (98.00 %) compared with traditional machine learning algorithms and the basic LSTM algorithm. SSA-LSTM was trained to predict the activity of peptides in the virtual digestion results of proteins, obtain the percentage of target active peptides, and select the appropriate protease for the actual experiment. As an application, SSA-LSTM was employed to predict the percentage of neuroprotective peptides in the virtual digestion results of walnut protein, and trypsin was ultimately found to give the highest value (85.29 %). The walnut protein was digested by trypsin (WPTrH), and the peptide sequences obtained closely matched the theoretical neuroprotective peptides. More importantly, the neuroprotective effects of WPTrH were demonstrated in nerve damage mouse models. SIGNIFICANCE The proposed SSA-LSTM-VD makes the acquisition of NBAPs efficient and accurate. The approach skillfully combines deep learning and virtual digestion. The SSA-LSTM-VD based strategy holds promise for discovering and developing peptides with neuroprotective properties or other desired biological activities.
Affiliation(s)
- Like Lin
- Key Laboratory of Synthetic and Natural Functional Molecule of Ministry of Education, College of Chemistry and Materials Science, National Demonstration Center for Experimental Chemistry Education, Northwest University, Xi'an, Shaanxi, 710127, People's Republic of China
| | - Cong Li
- Key Laboratory of Synthetic and Natural Functional Molecule of Ministry of Education, College of Chemistry and Materials Science, National Demonstration Center for Experimental Chemistry Education, Northwest University, Xi'an, Shaanxi, 710127, People's Republic of China.
| | - Tianlong Zhang
- Key Laboratory of Synthetic and Natural Functional Molecule of Ministry of Education, College of Chemistry and Materials Science, National Demonstration Center for Experimental Chemistry Education, Northwest University, Xi'an, Shaanxi, 710127, People's Republic of China
| | - Chaoshuang Xia
- Center for Biomedical Mass Spectrometry, Boston University Chobanian and Avedisian School of Medicine, Boston, MA, 02118, United States
| | - Qiuhong Bai
- Key Laboratory of Synthetic and Natural Functional Molecule of Ministry of Education, College of Chemistry and Materials Science, National Demonstration Center for Experimental Chemistry Education, Northwest University, Xi'an, Shaanxi, 710127, People's Republic of China
| | - Lihua Jin
- Key Laboratory of Synthetic and Natural Functional Molecule of Ministry of Education, College of Chemistry and Materials Science, National Demonstration Center for Experimental Chemistry Education, Northwest University, Xi'an, Shaanxi, 710127, People's Republic of China
| | - Yehua Shen
- Key Laboratory of Synthetic and Natural Functional Molecule of Ministry of Education, College of Chemistry and Materials Science, National Demonstration Center for Experimental Chemistry Education, Northwest University, Xi'an, Shaanxi, 710127, People's Republic of China.
| |
|
36
|
Xu X, Li C, Yuan X, Zhang Q, Liu Y, Zhu Y, Chen T. ACP-DRL: an anticancer peptides recognition method based on deep representation learning. Front Genet 2024; 15:1376486. [PMID: 38655048 PMCID: PMC11035771 DOI: 10.3389/fgene.2024.1376486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Accepted: 03/25/2024] [Indexed: 04/26/2024] Open
Abstract
Cancer, a significant global public health issue, resulted in about 10 million deaths in 2022. Anticancer peptides (ACPs), as a category of bioactive peptides, have emerged as a focal point in clinical cancer research due to their potential to inhibit tumor cell proliferation with minimal side effects. However, the recognition of ACPs through wet-lab experiments still faces challenges of low efficiency and high cost. Our work proposes a recognition method for ACPs named ACP-DRL, based on deep representation learning, to address these challenges. ACP-DRL marks an initial exploration of integrating protein language models into ACP recognition, employing in-domain further pre-training to enhance the development of deep representation learning. Simultaneously, it employs bidirectional long short-term memory networks to extract amino acid features from sequences. Consequently, ACP-DRL eliminates constraints on sequence length and the dependence on manual features, showcasing remarkable competitiveness in comparison with existing methods.
Affiliation(s)
- Xiaofang Xu
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences(Beijing), Beijing Institute of Lifeomics, Beijing, China
| | - Chaoran Li
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences(Beijing), Beijing Institute of Lifeomics, Beijing, China
| | - Xinpu Yuan
- Department of General Surgery, First Medical Center, Chinese PLA General Hospital, Beijing, China
| | - Qiangjian Zhang
- Institute of Dataspace, Hefei Comprehensive National Science Center, Hefei, China
| | - Yi Liu
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences(Beijing), Beijing Institute of Lifeomics, Beijing, China
| | - Yunping Zhu
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences(Beijing), Beijing Institute of Lifeomics, Beijing, China
| | - Tao Chen
- State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences(Beijing), Beijing Institute of Lifeomics, Beijing, China
| |
|
37
|
Lee B, Shin D. Contrastive learning for enhancing feature extraction in anticancer peptides. Brief Bioinform 2024; 25:bbae220. [PMID: 38725157 PMCID: PMC11082072 DOI: 10.1093/bib/bbae220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 03/28/2024] [Accepted: 04/21/2024] [Indexed: 05/13/2024] Open
Abstract
Cancer, recognized as a primary cause of death worldwide, has profound health implications and incurs a substantial social burden. Numerous efforts have been made to develop cancer treatments, among which anticancer peptides (ACPs) are garnering recognition for their potential applications. While ACP screening is time-consuming and costly, in silico prediction tools provide a way to overcome these challenges. Herein, we present a deep learning model designed to screen ACPs using peptide sequences only. A contrastive learning technique was applied to enhance model performance, yielding better results than a model trained solely on binary classification loss. Furthermore, two independent encoders were employed as a replacement for data augmentation, a technique commonly used in contrastive learning. Our model achieved superior performance on five of six benchmark datasets against previous state-of-the-art models. As prediction tools advance, the potential in peptide-based cancer therapeutics increases, promising a brighter future for oncology research and patient care.
Affiliation(s)
- Byungjo Lee
- Research Institute, National Cancer Center, 323, Ilsan-ro, Ilsandong-gu, Goyang, 10408, Republic of Korea
| | - Dongkwan Shin
- Research Institute, National Cancer Center, 323, Ilsan-ro, Ilsandong-gu, Goyang, 10408, Republic of Korea
- Department of Cancer Biomedical Science, National Cancer Center Graduate School of Cancer Science and Policy, 323, Ilsan-ro, Ilsandong-gu, Goyang, 10408, Republic of Korea
| |
|
38
|
Wu X, Lin H, Bai R, Duan H. Deep learning for advancing peptide drug development: Tools and methods in structure prediction and design. Eur J Med Chem 2024; 268:116262. [PMID: 38387334 DOI: 10.1016/j.ejmech.2024.116262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Revised: 02/06/2024] [Accepted: 02/17/2024] [Indexed: 02/24/2024]
Abstract
Peptides can bind challenging disease targets with high affinity and specificity, offering enormous opportunities for addressing unmet medical needs. However, peptides' unique features, including smaller size, increased structural flexibility, and limited data availability, pose additional challenges to the design process compared to proteins. This review explores the dynamic field of peptide therapeutics, leveraging deep learning to enhance structure prediction and design. Our exploration encompasses various facets of peptide research, ranging from dataset curation handling to model development. As deep learning technologies become more refined, we channel our efforts into peptide structure prediction and design, aligning with the fundamental principles of structure-activity relationships in drug development. To guide researchers in harnessing the potential of deep learning to advance peptide drug development, our insights comprehensively explore current challenges and future directions of peptide therapeutics.
Affiliation(s)
- Xinyi Wu
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, PR China
| | - Huitian Lin
- College of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, PR China
| | - Renren Bai
- School of Pharmacy, Hangzhou Normal University, Hangzhou, 311121, PR China.
| | - Hongliang Duan
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, PR China.
| |
|
39
|
Azad H, Akbar MY, Sarfraz J, Haider W, Riaz MN, Ali GM, Ghazanfar S. G-ACP: a machine learning approach to the prediction of therapeutic peptides for gastric cancer. J Biomol Struct Dyn 2024:1-14. [PMID: 38450672 DOI: 10.1080/07391102.2024.2323141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Accepted: 02/15/2024] [Indexed: 03/08/2024]
Abstract
Conventional gastrointestinal (GI) cancer treatments are expensive and carry major hazards. A more recent strategy emphasizes small biologically active peptides that do not cause severe toxicity. Anticancer peptides (ACPs) are found through experimental screening, which is time-consuming and frequently fraught with difficulties. Gastric ACPs are emerging as a promising GI cancer treatment. Identifying novel gastric ACPs is crucial for a better understanding of their mechanisms of action and for the treatment of gastric cancer. Because the post-genomic era has produced peptide sequences on a massive scale, rapid and effective computational identification of ACPs is essential. Several statistical techniques for distinguishing ACPs from non-ACPs have recently been developed. Despite significant progress, there is no model dedicated to the prediction of gastric ACPs, even though a dedicated model would predict this particular type of peptide more accurately and quickly. To address this, a reliable framework for the accurate identification of gastric ACPs is proposed. The technique uses four feature encodings, namely Amino Acid Composition, Dipeptide Composition, Tripeptide Composition (TPC), and Pseudo Amino Acid Composition (PAAC), together with their Hybrid, to capture target-class motifs. Machine learning algorithms are used to build high-performance and accurate prediction tools, and features are pruned using an ANOVA-based F score. Because learning algorithms differ in nature, a range of them is evaluated to identify the best-performing strategy. In the empirical results, Naïve Bayes with the TPC and Hybrid feature spaces outperforms other methods, with an accuracy of 0.99 on the testing dataset. External validation is carried out to assess model generalization; on external datasets, Extra Trees with PAAC features performs best, with an accuracy of 0.94. The comparison study shows that the suggested model predicts gastric ACPs more accurately and will be useful in drug development and gastric cancer research. The predictive model can be freely accessed at https://github.com/humeraazad10/G-ACP.git.
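The composition-based encodings named in this abstract can be illustrated with a short sketch. The code below computes Amino Acid Composition (20 frequencies) and Dipeptide Composition (400 frequencies) for a peptide, using the standard definitions of those descriptors; it is not taken from the G-ACP repository, and the function names `aac` and `dpc` and the example peptide are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def aac(seq):
    """Amino Acid Composition: fraction of each residue type in the sequence."""
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide Composition: fraction of each ordered residue pair among adjacent pairs."""
    if len(seq) < 2:
        raise ValueError("sequence needs at least two residues")
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = len(seq) - 1
    return [pairs[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]


peptide = "AAKK"  # made-up example peptide
print(len(aac(peptide)), len(dpc(peptide)))  # feature vector lengths
```

Concatenating such vectors is one simple way to form a hybrid feature space; how the paper builds its Hybrid encoding is not specified in the abstract.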
Affiliation(s)
- Humera Azad
- Department of Biosciences (Bioinformatics) Islamabad, Comsats University Islamabad, Pakistan
| | - Muhammad Yasir Akbar
- National Institute for Genomics and Advanced Biotechnology (NIGAB), National Agricultural Research Center (NARC), Pakistan
| | | | - Waseem Haider
- Department of Biosciences (Bioinformatics) Islamabad, Comsats University Islamabad, Pakistan
| | - Muhammad Naeem Riaz
- National Institute for Genomics and Advanced Biotechnology (NIGAB), National Agricultural Research Center (NARC), Pakistan
| | - Ghulam Muhammad Ali
- Department of Biosciences (Bioinformatics) Islamabad, Comsats University Islamabad, Pakistan
| | - Shakira Ghazanfar
- National Institute for Genomics and Advanced Biotechnology (NIGAB), National Agricultural Research Center (NARC), Pakistan
| |
|
40
|
Zhang S, Zhao Y, Liang Y. AACFlow: an end-to-end model based on attention augmented convolutional neural network and flow-attention mechanism for identification of anticancer peptides. Bioinformatics 2024; 40:btae142. [PMID: 38452348 PMCID: PMC10973939 DOI: 10.1093/bioinformatics/btae142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2023] [Revised: 03/01/2024] [Accepted: 03/06/2024] [Indexed: 03/09/2024] Open
Abstract
MOTIVATION Anticancer peptides (ACPs) have natural cationic properties and can act on the anionic cell membrane of cancer cells to kill cancer cells. Therefore, ACPs have become a potential anticancer drug with good research value and prospect. RESULTS In this article, we propose AACFlow, an end-to-end model for identification of ACPs based on deep learning. End-to-end models have more room to automatically adjust according to the data, making the overall fit better and reducing error propagation. The combination of attention augmented convolutional neural network (AAConv) and multi-layer convolutional neural network (CNN) forms a deep representation learning module, which is used to obtain global and local information on the sequence. Based on the concept of flow network, multi-head flow-attention mechanism is introduced to mine the deep features of the sequence to improve the efficiency of the model. On the independent test dataset, the ACC, Sn, Sp, and AUC values of AACFlow are 83.9%, 83.0%, 84.8%, and 0.892, respectively, which are 4.9%, 1.5%, 8.0%, and 0.016 higher than those of the baseline model. The MCC value is 67.85%. In addition, we visualize the features extracted by each module to enhance the interpretability of the model. Various experiments show that our model is more competitive in predicting ACPs.
Affiliation(s)
- Shengli Zhang
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
| | - Ya Zhao
- School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
| | - Yunyun Liang
- School of Science, Xi’an Polytechnic University, Xi'an 710048, China
| |
|
41
|
Liu M, Wu T, Li X, Zhu Y, Chen S, Huang J, Zhou F, Liu H. ACPPfel: Explainable deep ensemble learning for anticancer peptides prediction based on feature optimization. Front Genet 2024; 15:1352504. [PMID: 38487252 PMCID: PMC10937565 DOI: 10.3389/fgene.2024.1352504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Accepted: 02/19/2024] [Indexed: 03/17/2024] Open
Abstract
Background: Cancer is a significant global health problem that continues to cause a high number of deaths worldwide. Traditional cancer treatments often come with risks that can compromise the functionality of vital organs. As a potential alternative to these conventional therapies, anticancer peptides (ACPs) have garnered attention for their small size, high specificity, and reduced toxicity, making them a promising option for cancer treatment. Methods: However, identifying effective ACPs through wet-lab screening experiments is time-consuming and labor-intensive. To overcome this challenge, a deep ensemble learning method is constructed to predict ACPs in this study. To evaluate the reliability of the framework, four different datasets are used for training and testing. During model training, feature selection methods, feature dimensionality reduction measures, and optimization of the deep ensemble model are integrated. Finally, we explored the interpretability of the features that affected the final prediction results and built a web server platform to facilitate anticancer peptide prediction, which can be used by all researchers for further studies. This web server can be accessed at http://lmylab.online:5001/. Results: The method achieves an accuracy of 98.53% and an AUC (Area under Curve) value of 0.9972 on the ACPfel dataset, and it shows improvements on the other datasets as well.
Affiliation(s)
- Mingyou Liu
- School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
- Engineering Research Center of Health Medicine Biotechnology of Guizhou Province, Guizhou Medical University, Guiyang, China
| | - Tao Wu
- School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
| | - Xue Li
- School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
- Engineering Research Center of Health Medicine Biotechnology of Guizhou Province, Guizhou Medical University, Guiyang, China
| | - Yingxue Zhu
- School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
- Engineering Research Center of Health Medicine Biotechnology of Guizhou Province, Guizhou Medical University, Guiyang, China
| | - Sen Chen
- School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
| | - Jian Huang
- School of Life Science and Technology, University of Electronic Science and Technology, Chengdu, China
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
| | - Fengfeng Zhou
- School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
| | - Hongmei Liu
- School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
- Engineering Research Center of Health Medicine Biotechnology of Guizhou Province, Guizhou Medical University, Guiyang, China
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
| |
|
42
|
Karim T, Shaon MSH, Sultan MF, Hasan MZ, Kafy AA. ANNprob-ACPs: A novel anticancer peptide identifier based on probabilistic feature fusion approach. Comput Biol Med 2024; 169:107915. [PMID: 38171261 DOI: 10.1016/j.compbiomed.2023.107915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2023] [Revised: 12/28/2023] [Accepted: 12/29/2023] [Indexed: 01/05/2024]
Abstract
Anticancer Peptides (ACPs) offer significant potential as cancer treatment drugs in this modern era. Quickly identifying active compounds from protein sequences is crucial for healthcare and cancer treatment. In this paper, ANNprob-ACPs, a novel and effective model for detecting ACPs, has been implemented based on nine feature encoding techniques, including AAC, CC, W2V, DPC, PAAC, QSO, CTDC, CTDT, and CKSAAGP. After analyzing the performance of several machine learning models, the six best models were selected based on their overall performance in every evaluation metric. The probability scores of each model were subsequently aggregated and used as input to our meta-model, called ANNprob-ACPs. Our model outperformed all others and shows potential for highly effective identification of ACPs. The results of this study showed notable improvement in 10-fold cross-validation and independent testing, with accuracies of 93.72% and 90.62%, respectively. Our proposed model, ANNprob-ACPs, outperformed existing approaches in terms of accuracy and effectiveness in discovering ACPs. Using SHAP, this study found that the physicochemical properties captured by QSO and the compositional properties captured by DPC, AAC, and PAAC are the most impactful for our model's performance, and these properties have a major impact on a drug's interactions and future discoveries. Consequently, this model is crucial for the future and has a high probability of detecting ACPs more frequently. We developed a web server of ANNprob-ACPs, which is accessible at ANNprob-ACPs webserver.
Affiliation(s)
- Tasmin Karim
- Department of Computer Science & Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh; Health Informatics Research Lab, Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh.
| | - Md Shazzad Hossain Shaon
- Department of Computer Science & Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh; Health Informatics Research Lab, Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh.
| | - Md Fahim Sultan
- Department of Computer Science & Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh; Health Informatics Research Lab, Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh.
| | - Md Zahid Hasan
- Department of Computer Science & Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh; Health Informatics Research Lab, Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh.
| | - Abdulla-Al Kafy
- Department of Urban & Regional Planning, Rajshahi University of Engineering & Technology (RUET), Rajshahi, 6204, Bangladesh.
| |
|
43
|
Wang J, Chen L, Qin S, Xie M, Luo SZ, Li W. Advances in biosynthesis of peptide drugs: Technology and industrialization. Biotechnol J 2024; 19:e2300256. [PMID: 37884278 DOI: 10.1002/biot.202300256] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 05/31/2023] [Revised: 07/24/2023] [Accepted: 10/09/2023] [Indexed: 10/28/2023]
Abstract
Peptide drugs are developed from endogenous or synthetic peptides with specific biological activities. They have the advantages of strong target specificity, high efficacy, and low toxicity, thus showing great promise in the treatment of many diseases such as cancer, infections, and diabetes. Although an increasing number of peptide drugs have entered the market in recent years, the preparation of peptide drug substances is still a bottleneck for their industrial production. Compared with chemical synthesis, peptide biosynthesis has the advantages of simple synthesis, low cost, and low contamination. Therefore, the biosynthesis technology of peptide drugs has been widely used for manufacturing. Herein, we review the development of peptide drugs and recent advances in peptide biosynthesis technology, in order to shed light on the prospects for industrial production of peptide drugs based on biosynthesis technology.
Affiliation(s)
- Jing Wang
- Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, Yantai, Shandong, China
- College of Pharmacy, Binzhou Medical University, Yantai, Shandong, China
| | - Long Chen
- Beijing Key Laboratory of Bioprocess, College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China
| | - Song Qin
- Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, Yantai, Shandong, China
| | - Mingyuan Xie
- State Key Laboratory of Optoelectronic Materials and Technologies, School of Physics, Sun Yat-Sen University, Guangzhou, China
| | - Shi-Zhong Luo
- Beijing Key Laboratory of Bioprocess, College of Life Science and Technology, Beijing University of Chemical Technology, Beijing, China
| | - Wenjun Li
- Yantai Institute of Coastal Zone Research, Chinese Academy of Sciences, Yantai, Shandong, China
| |
|
44
|
Yu H, Deng H, He J, Keasling JD, Luo X. UniKP: a unified framework for the prediction of enzyme kinetic parameters. Nat Commun 2023; 14:8211. [PMID: 38081905 PMCID: PMC10713628 DOI: 10.1038/s41467-023-44113-1] [Citation(s) in RCA: 47] [Impact Index Per Article: 23.5] [Received: 05/10/2023] [Accepted: 11/30/2023] [Indexed: 12/18/2023] Open
Abstract
Prediction of enzyme kinetic parameters is essential for designing and optimizing enzymes for various biotechnological and industrial applications, but the limited performance of current prediction tools on diverse tasks hinders their practical applications. Here, we introduce UniKP, a unified framework based on pretrained language models for the prediction of enzyme kinetic parameters, including enzyme turnover number (kcat), Michaelis constant (Km), and catalytic efficiency (kcat / Km), from protein sequences and substrate structures. A two-layer framework derived from UniKP (EF-UniKP) has also been proposed to allow robust kcat prediction in considering environmental factors, including pH and temperature. In addition, four representative re-weighting methods are systematically explored to successfully reduce the prediction error in high-value prediction tasks. We have demonstrated the application of UniKP and EF-UniKP in several enzyme discovery and directed evolution tasks, leading to the identification of new enzymes and enzyme mutants with higher activity. UniKP is a valuable tool for deciphering the mechanisms of enzyme kinetics and enables novel insights into enzyme engineering and their industrial applications.
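For readers unfamiliar with the three parameters named above, the following sketch shows how kcat, Km, and kcat / Km relate under the textbook Michaelis-Menten rate law. It illustrates the quantities UniKP predicts, not UniKP itself; the function names, units, and numeric values are assumptions chosen for the example.

```python
def michaelis_menten_rate(kcat, km, enzyme_conc, substrate_conc):
    """Initial reaction rate v = kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme_conc * substrate_conc / (km + substrate_conc)


def catalytic_efficiency(kcat, km):
    """Catalytic efficiency kcat / Km, which governs the rate when [S] << Km."""
    return kcat / km


# Hypothetical enzyme: kcat in 1/s, Km and concentrations in mM.
kcat, km, enzyme = 100.0, 0.5, 1e-6
v_half = michaelis_menten_rate(kcat, km, enzyme, substrate_conc=km)
print(v_half, catalytic_efficiency(kcat, km))
```

At a substrate concentration equal to Km the rate is half of the maximum kcat * [E], which is the usual operational definition of Km.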
Affiliation(s)
- Han Yu
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Huaxiang Deng
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Jiahui He
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| | - Jay D Keasling
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Joint BioEnergy Institute, Emeryville, CA, 94608, USA
- Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- Department of Chemical and Biomolecular Engineering & Department of Bioengineering, University of California, Berkeley, CA, 94720, USA
- Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, 2800, Kgs, Lyngby, Denmark
| | - Xiaozhou Luo
- Shenzhen Key Laboratory for the Intelligent Microbial Manufacturing of Medicines, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
- CAS Key Laboratory of Quantitative Engineering Biology, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
- Center for Synthetic Biochemistry, Shenzhen Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, 518055, China
| |
|
45
|
Yu S, Liao B, Zhu W, Peng D, Wu F. Accurate prediction and key protein sequence feature identification of cyclins. Brief Funct Genomics 2023; 22:411-419. [PMID: 37118891 DOI: 10.1093/bfgp/elad014] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 01/14/2023] [Revised: 03/03/2023] [Accepted: 03/17/2023] [Indexed: 04/30/2023] Open
Abstract
Cyclin proteins are a group of proteins that activate the cell cycle by forming complexes with cyclin-dependent kinases. Identifying cyclins correctly can provide key clues to understanding their function. However, because of the low similarity between cyclin protein sequences, a machine learning-based approach to identify cyclins is urgently needed. In this study, cyclin protein sequence features were extracted using the profile-based auto-cross covariance method. The features were then ranked and selected with the maximum relevance-maximum distance methods MRMD1.0 and MRMD2.0. Finally, the prediction model was assessed through 10-fold cross-validation. The computational experiments showed that the best protein sequence features generated by MRMD1.0 could correctly predict 98.2% of cyclins using the random forest (RF) classifier, whereas seven-dimensional key protein sequence features identified with MRMD2.0 could correctly predict 96.1% of cyclins, which was superior to previous studies on the same dataset in terms of both dimensionality and performance. Therefore, our work provides a valuable tool for identifying cyclins. The model data can be downloaded from https://github.com/YUshunL/cyclin.
Affiliation(s)
- Shaoyou Yu
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Bo Liao
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Wen Zhu
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Dejun Peng
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| | - Fangxiang Wu
- Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
- Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
| |
|
46
|
Li J, Ma S, Pei H, Jiang J, Zou Q, Lv Z. Review of T cell proliferation regulatory factors in treatment and prognostic prediction for solid tumors. Heliyon 2023; 9:e21329. [PMID: 37954355 PMCID: PMC10637962 DOI: 10.1016/j.heliyon.2023.e21329] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2023] [Revised: 10/15/2023] [Accepted: 10/19/2023] [Indexed: 11/14/2023] Open
Abstract
T cell proliferation regulators (Tcprs), which are positive regulators that promote T cell function, have made great contributions to the development of therapies to improve T cell function. Chimeric antigen receptor (CAR)-T cell therapy, a type of adoptive cell transfer therapy that targets tumor cells and enhances immune lethality, has led to significant progress in the treatment of hematologic tumors. However, the applications of CAR-T in solid tumor treatment remain limited. Therefore, in this review, we focus on the development of Tcprs for solid tumor therapy and prognostic prediction. We summarize potential strategies for targeting different Tcprs to enhance T cell proliferation and activation and to inhibit cancer progression, thereby improving the antitumor activity and persistence of CAR-T. In summary, we propose means of enhancing CAR-T cells by expressing different Tcprs, which may lead to the development of a new generation of cell therapies.
Affiliation(s)
- Jiayu Li
- Student Innovation Competition Team, College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- College of Life Science, Sichuan University, Chengdu 610065, China
| | - Shuhan Ma
- Student Innovation Competition Team, College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| | - Hongdi Pei
- Student Innovation Competition Team, College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| | - Jici Jiang
- Student Innovation Competition Team, College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
| | - Zhibin Lv
- Student Innovation Competition Team, College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| |
|
47
|
Wang Z, Meng J, Li H, Xia S, Wang Y, Luan Y. PAMPred: A hierarchical evolutionary ensemble framework for identifying plant antimicrobial peptides. Comput Biol Med 2023; 166:107545. [PMID: 37806057 DOI: 10.1016/j.compbiomed.2023.107545] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 08/02/2023] [Revised: 09/04/2023] [Accepted: 09/28/2023] [Indexed: 10/10/2023]
Abstract
Antimicrobial peptides (AMPs) play a crucial role in plant immune regulation, growth, and development, and have attracted significant attention in recent years. Because wet-lab experiments are laborious and cost-prohibitive, computational methods are needed to discover novel plant AMPs accurately. In this study, we presented a hierarchical evolutionary ensemble framework, named PAMPred, which consists of a multi-level heterogeneous architecture to identify plant AMPs. Specifically, to address the class imbalance problem, a cluster-based resampling method was adopted to build multiple balanced subsets. Then, several peptide features, including sequence information-based and physicochemical property-based features, were fed into different types of base learners to increase the ensemble diversity. To boost the predictive capability of PAMPred, an improved particle swarm optimization (PSO) algorithm and a dynamic ensemble pruning strategy were used to optimize the weights at different levels adaptively. Furthermore, extensive ten-fold cross-validation and independent testing results demonstrated that PAMPred achieved excellent prediction performance and generalization ability, and outperformed the state-of-the-art methods. The results also indicated that the proposed method could serve as an effective auxiliary tool to identify plant AMPs, which would help in exploring the immune regulatory mechanisms of plants.
Affiliation(s)
- Zhaowei Wang
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Jun Meng
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Haibin Li
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Shihao Xia
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Yu Wang
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Yushi Luan
- School of Bioengineering, Dalian University of Technology, Dalian, Liaoning 116024, China
| |
|
48
|
Sui J, Chen J, Chen Y, Iwamori N, Sun J. Identification of plant vacuole proteins by using graph neural network and contact maps. BMC Bioinformatics 2023; 24:357. [PMID: 37740195 PMCID: PMC10517492 DOI: 10.1186/s12859-023-05475-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/13/2023] [Accepted: 09/12/2023] [Indexed: 09/24/2023] Open
Abstract
Plant vacuoles are essential organelles in the growth and development of plants, and accurate identification of their proteins is crucial for understanding their biological properties. In this study, we developed a novel model called GraphIdn for the identification of plant vacuole proteins. The model uses SeqVec, a deep representation learning model, to initialize the amino acid sequence. We utilized the AlphaFold2 algorithm to obtain the structural information of corresponding plant vacuole proteins, and then fed the calculated contact maps into a graph convolutional neural network. GraphIdn achieved accuracy values of 88.51% and 89.93% in independent testing and fivefold cross-validation, respectively, outperforming previous state-of-the-art predictors. As far as we know, this is the first model to use predicted protein topology structure graphs to identify plant vacuole proteins. Furthermore, we assessed the effectiveness and generalization capability of our GraphIdn model by applying it to identify and locate peroxisomal proteins, which yielded promising outcomes. The source code and datasets can be accessed at https://github.com/SJNNNN/GraphIdn.
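As a rough illustration of the contact-map step described in this abstract, the following Python sketch turns residue coordinates into a binary contact map and an edge list of the kind a graph convolutional network could consume. It is not taken from the GraphIdn code: the function names, the use of one coordinate per residue (for example the C-alpha atom), and the 8 Å cutoff are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def contact_map(coords: Sequence[Coord], cutoff: float = 8.0) -> List[List[int]]:
    """Return a symmetric 0/1 matrix marking residue pairs within `cutoff` angstroms.

    `coords` holds one 3D point per residue. The 8.0 default is an assumed,
    commonly used threshold, not a value stated in the abstract.
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


def edge_list(cmap: Sequence[Sequence[int]]) -> List[Tuple[int, int]]:
    """List each contact once as an (i, j) pair with i < j."""
    return [
        (i, j)
        for i, row in enumerate(cmap)
        for j, value in enumerate(row)
        if value and i < j
    ]


# Hypothetical coordinates: three residues spaced 3.8 angstroms apart, one far away.
example_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
example_edges = edge_list(contact_map(example_coords))
```

In a full pipeline, each node would also carry a per-residue feature vector (the abstract names SeqVec embeddings), and the edge list would define which neighbors are aggregated during graph convolution.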
Affiliation(s)
- Jianan Sui
- School of Information Science and Engineering, University of Jinan, Jinan, China
| | - Jiazi Chen
- Laboratory of Zoology, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, Fukuoka-Shi, Fukuoka, Japan
| | - Yuehui Chen
- School of Artificial Intelligence Institute and Information Science and Engineering, University of Jinan, Jinan, China
| | - Naoki Iwamori
- Laboratory of Zoology, Graduate School of Bioresource and Bioenvironmental Sciences, Kyushu University, Fukuoka-Shi, Fukuoka, Japan
| | - Jin Sun
- School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, China
| |
|
49
|
Zhang X, Guo H, Zhang F, Wang X, Wu K, Qiu S, Liu B, Wang Y, Hu Y, Li J. HNetGO: protein function prediction via heterogeneous network transformer. Brief Bioinform 2023; 24:bbab556. [PMID: 37861172 PMCID: PMC10588005 DOI: 10.1093/bib/bbab556] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/05/2021] [Revised: 11/18/2021] [Accepted: 12/04/2021] [Indexed: 10/21/2023] Open
Abstract
Protein function annotation is one of the most important research topics for revealing the essence of life at the molecular level in the post-genome era. Current research shows that integrating multisource data can effectively improve the performance of protein function prediction models. However, heavy reliance on complex feature engineering and model integration limits the development of existing methods. In addition, deep learning models use only the labeled data in a given dataset to extract sequence features, thus ignoring a large amount of existing unlabeled sequence data. Here, we propose an end-to-end protein function annotation model named HNetGO, which uses a heterogeneous network to integrate protein sequence similarity and protein-protein interaction network information, and uses a pretrained model to extract the semantic features of the protein sequence. In addition, we design an attention-based graph neural network model, which can effectively extract node-level features from heterogeneous networks and predict protein function by measuring the similarity between protein nodes and gene ontology term nodes. Comparative experiments on the human dataset show that HNetGO achieves state-of-the-art performance on the cellular component and molecular function branches.
Affiliation(s)
- Xiaoshuai Zhang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
| | - Huannan Guo
- General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin 150086, China
| | - Fan Zhang
- Center NHC Key Laboratory of Cell Transplantation, The First Affiliated Hospital of Harbin Medical University, Harbin 150086, China
| | - Xuan Wang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
| | - Kaitao Wu
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
| | - Shizheng Qiu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Bo Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Yang Hu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Junyi Li
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
| |
|
50
|
Tao H, Shan S, Fu H, Zhu C, Liu B. An Augmented Sample Selection Framework for Prediction of Anticancer Peptides. Molecules 2023; 28:6680. [PMID: 37764455 PMCID: PMC10535447 DOI: 10.3390/molecules28186680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 09/14/2023] [Accepted: 09/15/2023] [Indexed: 09/29/2023] Open
Abstract
Anticancer peptides (ACPs) have promising prospects for cancer treatment. Traditional ACP identification experiments have the limitations of low efficiency and high cost. In recent years, data-driven deep learning techniques have shown significant potential for ACP prediction. However, data-driven prediction models rely heavily on extensive training data. Furthermore, the current publicly accessible ACP dataset is limited in size, leading to inadequate model generalization. While data augmentation effectively expands dataset size, existing techniques for augmenting ACP data often generate noisy samples, adversely affecting prediction performance. Therefore, this paper proposes a novel augmented sample selection framework for the prediction of anticancer peptides (ACPs-ASSF). First, the prediction model is trained using raw data. Then, the augmented samples generated using the data augmentation technique are fed into the trained model to compute pseudo-labels and estimate the uncertainty of the model prediction. Finally, samples with low uncertainty, high confidence, and pseudo-labels consistent with the original labels are selected and incorporated into the training set to retrain the model. The evaluation results for the ACP240 and ACP740 datasets show that ACPs-ASSF achieved accuracy improvements of up to 5.41% and 5.68%, respectively, compared to the traditional data augmentation method.
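The selection rule in this abstract (keep augmented samples with low uncertainty, high confidence, and a pseudo-label that matches the original label) can be sketched in Python as follows. This is an illustrative sketch, not the ACPs-ASSF implementation: the function names, the thresholds, and the use of binary entropy as the uncertainty estimate are all assumptions, since the abstract does not specify them.

```python
import math
from typing import List, Sequence


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) prediction (assumed uncertainty proxy)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def select_augmented(
    probs: Sequence[float],
    source_labels: Sequence[int],
    min_confidence: float = 0.9,  # hypothetical threshold
    max_entropy: float = 0.5,     # hypothetical threshold
) -> List[int]:
    """Return indices of augmented samples that pass all three checks.

    probs[i] is the trained model's predicted probability that augmented
    sample i is an ACP; source_labels[i] is the label (0 or 1) of the
    original peptide it was generated from.
    """
    kept = []
    for i, (p, label) in enumerate(zip(probs, source_labels)):
        pseudo_label = 1 if p >= 0.5 else 0
        confidence = max(p, 1.0 - p)
        if (
            pseudo_label == label
            and confidence >= min_confidence
            and binary_entropy(p) <= max_entropy
        ):
            kept.append(i)
    return kept


# Hypothetical model outputs for four augmented peptides.
selected = select_augmented([0.97, 0.55, 0.03, 0.95], [1, 1, 0, 0])
```

With a single deterministic prediction, entropy and confidence carry the same information, so the two thresholds are redundant here. They become distinct checks when uncertainty is estimated separately, for example from the spread of several stochastic forward passes.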
Affiliation(s)
- Huawei Tao
- Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China; (H.T.); (S.S.); (H.F.); (C.Z.)
- Henan Engineering Laboratory of Grain IOT Technology, Henan University of Technology, Zhengzhou 450001, China
| | - Shuai Shan
- Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China; (H.T.); (S.S.); (H.F.); (C.Z.)
- Henan Engineering Laboratory of Grain IOT Technology, Henan University of Technology, Zhengzhou 450001, China
| | - Hongliang Fu
- Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China; (H.T.); (S.S.); (H.F.); (C.Z.)
- Henan Engineering Laboratory of Grain IOT Technology, Henan University of Technology, Zhengzhou 450001, China
| | - Chunhua Zhu
- Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China; (H.T.); (S.S.); (H.F.); (C.Z.)
- Henan Engineering Laboratory of Grain IOT Technology, Henan University of Technology, Zhengzhou 450001, China
| | - Boye Liu
- College of Food Science and Engineering, Henan University of Technology, Zhengzhou 450001, China
| |
|