1
Feng X, Xin R, Wu J, Zheng J, Wang C, Yu C. AutoFE-Pointer: Auto-weighted feature extractor based on pointer network for DNA methylation prediction. Int J Biol Macromol 2025; 311:143668. [PMID: 40339839 DOI: 10.1016/j.ijbiomac.2025.143668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/28/2024] [Revised: 04/20/2025] [Accepted: 04/28/2025] [Indexed: 05/10/2025]
Abstract
DNA methylation is a critical epigenetic modification that plays a central role in gene regulation, cellular differentiation, and the development of various diseases, including cancers. Aberrant methylation patterns have emerged as both biomarkers and mechanistic drivers in pathogenesis, underscoring the urgent need for precise and efficient predictive tools. Although some deep learning techniques have advanced methylation prediction, most existing models are trained independently on single-species datasets. This species-specific approach limits efficiency because each model can only handle one dataset at a time, often at the expense of predictive performance. Additionally, the state-of-the-art deep learning models tend to have enormous parameter counts and computational overhead, making them impractical for integration into local offline software applications. To overcome these challenges, we propose AutoFE-Pointer, a lightweight and novel framework that harnesses an improved softened pointer network to dynamically extract and weight features from diverse DNA sequences. AutoFE-Pointer is designed to simultaneously process 17 different benchmark datasets spanning multiple species, achieving superior performance compared to models that are trained individually on single-species data. In doing so, it not only offers state-of-the-art predictive accuracy and robust cross-species generalization but also significantly reduces computational demands, facilitating its deployment in local offline environments. This breakthrough represents a significant advancement in the field of epigenetic modeling and computational biology.
Affiliation(s)
- Xin Feng
- School of Science, Jilin Institute of Chemical Technology, Jilin 130000, PR China; State Key Laboratory of Inorganic Synthesis and Preparative Chemistry, College of Chemistry, Jilin University, Changchun 130012, PR China
- Ruihao Xin
- College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin 130000, PR China; College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, PR China
- Jiezhang Wu
- College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin 130000, PR China
- Jiaxin Zheng
- College of Computer Science and Technology, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, PR China; Hunan Bojii Life Technology Co., Ltd., Changsha, Hunan Province 410000, PR China
- Chenyan Wang
- Civil Aviation General Hospital, Beijing 100123, PR China
- Cuinan Yu
- School of Engineering, Westlake University, Hangzhou, Zhejiang 310030, PR China; Hunan Bojii Life Technology Co., Ltd., Changsha, Hunan Province 410000, PR China.
2
Tan JZE, Wee J, Gong X, Xia K. Topology-Enhanced Machine Learning Model (Top-ML) for Anticancer Peptide Prediction. J Chem Inf Model 2025; 65:4232-4242. [PMID: 40229641 DOI: 10.1021/acs.jcim.5c00476] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/16/2025]
Abstract
Recently, therapeutic peptides have demonstrated great promise for cancer treatment. To explore powerful anticancer peptides, artificial intelligence (AI)-based approaches have been developed to systematically screen potential candidates. However, the lack of efficient featurization of peptides has become a bottleneck for these machine-learning models. In this paper, we propose a topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction. Our Top-ML employs peptide topological features derived from its sequence "connection" information characterized by spectral descriptors. Our Top-ML model, employing an Extra-Trees classifier, has been validated on the AntiCP 2.0 and mACPpred 2.0 benchmark data sets, achieving state-of-the-art performance or results comparable to existing deep learning models, while providing greater interpretability. Our results highlight the potential of leveraging novel topology-based featurization to accelerate the identification of anticancer peptides.
Affiliation(s)
- Joshua Zhi En Tan
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
- JunJie Wee
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
- Xue Gong
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
- Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
3
Abbas Z, Kim S, Lee N, Kazmi SAW, Lee SW. A robust ensemble framework for anticancer peptide classification using multi-model voting approach. Comput Biol Med 2025; 188:109750. [PMID: 40032410 DOI: 10.1016/j.compbiomed.2025.109750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/30/2024] [Revised: 01/14/2025] [Accepted: 01/22/2025] [Indexed: 03/05/2025]
Abstract
Anticancer peptides (ACPs) hold great potential for cancer therapeutics, yet accurately identifying them remains a challenging task due to the complexity of peptide sequences and their interactions with biological systems. In this study, we propose a novel machine learning-based framework for ACP classification, integrating multiple feature sets, including sequence composition, physicochemical properties, and embedding features derived from pre-trained language models. We evaluate the performance of various classifiers on benchmark datasets and compare our model against state-of-the-art methods. The results demonstrate that our model outperforms existing methods such as UniDL4BioPep, ACPred-Fuse, and iACP with an accuracy of 75.58%, an AUC of 0.8272, and an MCC of 0.5119. Our approach provides a more balanced sensitivity of 0.7384 and specificity of 0.773, ensuring robust identification of both ACPs and non-ACPs. These findings suggest that incorporating diverse feature sets can significantly enhance ACP classification, potentially facilitating the discovery of novel anticancer peptides for therapeutic applications.
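The accuracy, sensitivity, specificity, and MCC quoted in this abstract are standard functions of a binary confusion matrix. As a minimal sketch (not the authors' evaluation code), the function below computes them from raw counts; the function name and the example counts are hypothetical and are not taken from the paper.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity, and MCC from confusion counts.

    tp/fp/tn/fn are the true-positive, false-positive, true-negative, and
    false-negative counts of a binary classifier (hypothetical inputs here).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity: fraction of actual positives (e.g. ACPs) that are recovered.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of actual negatives (e.g. non-ACPs) that are rejected.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Matthews correlation coefficient; defined as 0 when a marginal is empty.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Illustrative counts only; these do not reproduce the reported results.
example = binary_metrics(tp=40, fp=10, tn=40, fn=10)
```

MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when the two classes need to be identified equally well.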
Affiliation(s)
- Zeeshan Abbas
- Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon, Republic of Korea; Department of Artificial Intelligence, Sungkyunkwan University, Suwon 16419, Republic of Korea
- Sunyeup Kim
- Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon, Republic of Korea
- Nangkyeong Lee
- Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon, Republic of Korea
- Seung Won Lee
- Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon, Republic of Korea; Department of Artificial Intelligence, Sungkyunkwan University, Suwon 16419, Republic of Korea; Department of Metabiohealth, Sungkyunkwan University, Suwon 16419, Republic of Korea; Personalized Cancer Immunotherapy Research Center, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea.
4
Asim MN, Asif T, Mehmood F, Dengel A. Peptide classification landscape: An in-depth systematic literature review on peptide types, databases, datasets, predictors architectures and performance. Comput Biol Med 2025; 188:109821. [PMID: 39987697 DOI: 10.1016/j.compbiomed.2025.109821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2024] [Revised: 02/03/2025] [Accepted: 02/05/2025] [Indexed: 02/25/2025]
Abstract
Peptides are gaining significant attention in diverse fields; for example, the pharmaceutical market has seen a steady rise in peptide-based therapeutics over the past six decades. Peptides have been utilized in the development of distinct applications, including inhibitors of SARS-CoV-2 and treatments for conditions like cancer and diabetes. Distinct types of peptides possess unique characteristics, and the development of peptide-specific applications requires the discrimination of one peptide type from others. To the best of our knowledge, approximately 230 artificial intelligence (AI)-driven applications have been developed for 22 distinct types of peptides, yet there remains significant room for the development of new predictors. This comprehensive review addresses that gap by providing a consolidated platform for the development of AI-driven peptide classification applications. This paper offers several key contributions. It presents the biological foundations of 22 unique peptide types and categorizes them into four main classes: Regulatory, Therapeutic, Nutritional, and Delivery Peptides. It offers an in-depth overview of 47 databases that have been used to develop peptide classification benchmark datasets. It summarizes details of 288 benchmark datasets that are used in the development of diverse types of AI-driven peptide classification applications. It provides a detailed summary of 197 sequence representation learning methods and 94 classifiers that have been used to develop 230 distinct AI-driven peptide classification applications. Across 22 distinct types of peptide classification tasks related to 288 benchmark datasets, it reports performance values of 230 AI-driven peptide classification applications. It summarizes experimental settings and various evaluation measures that have been employed to assess the performance of AI-driven peptide classification applications. The primary focus of this manuscript is to consolidate scattered information into a single comprehensive platform. This resource will greatly assist researchers who are interested in developing new AI-driven peptide classification applications.
Affiliation(s)
- Muhammad Nabeel Asim
- German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany; Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany.
- Tayyaba Asif
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
- Faiza Mehmood
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; Institute of Data Sciences, University of Engineering and Technology, Lahore, Pakistan
- Andreas Dengel
- German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany; Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany
5
Wang S, Ma B. Anti-Cancer Peptides Identification and Activity Type Classification With Protein Sequence Pre-Training. IEEE J Biomed Health Inform 2025; 29:1692-1701. [PMID: 40048353 DOI: 10.1109/jbhi.2024.3358632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/09/2025]
Abstract
Cancer remains a significant global health challenge, responsible for millions of deaths annually. Addressing this issue necessitates the discovery of novel anti-cancer drugs. Anti-cancer peptides (ACPs), with their unique ability to selectively target cancer cells, offer new hope in discovering low side-effect anti-cancer drugs. However, the process of discovering novel ACPs is both time-consuming and costly. Therefore, there is an urgent need for a computational method that can predict whether a given peptide is an ACP and classify its specific functional types. In this paper, we introduce DUO-ACP, a model serving dual roles in ACP prediction: identification and functional type classification. DUO-ACP employs two embedding modules to acquire knowledge about global protein features and local ACP characteristics, complemented by a prediction module. When assessed on two publicly available datasets for each task, DUO-ACP surpasses all existing methods, achieving outstanding results: an ACP identification accuracy of 89.5% and a Macro-averaged AUC of 88.6% in ACP functional type classification. We further interpret the contribution of each part of our model, including the two types of embeddings as well as ensemble learning. On a new curated dataset, the prediction results of DUO-ACP closely match existing literature, highlighting DUO-ACP's generalization capabilities on previously unseen data and displaying the potential capability of discovering novel ACP.
6
Hou D, Zhou H, Tang Y, Liu Z, Su L, Guo J, Pathak JL, Wu L. Dynamic Visualization of Computer-Aided Peptide Design for Cancer Therapeutics. Drug Des Devel Ther 2025; 19:1043-1065. [PMID: 39974609 PMCID: PMC11837852 DOI: 10.2147/dddt.s497126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2024] [Accepted: 01/20/2025] [Indexed: 02/21/2025] Open
Abstract
Purpose: Cancer stands as a significant global public health concern, with traditional therapies potentially yielding severe side effects. Peptide-based therapy is increasingly employed against cancer due to its advantages of excellent targeting, biocompatibility, and convenient synthesis. With advancements in computer technology and bioinformatics, rational design strategies based on computer technology have been employed to develop more cost-effective and potent anticancer peptides (ACPs). This study aims to explore the current status, hotspots, and future trends in the field of computer-aided design of peptides for cancer treatment through a bibliometric analysis. Methods: A total of 1547 relevant publications published from 2006 to 2024 were collected from the Web of Science Core Collection. Bibliometric analysis was conducted using tools like CiteSpace, VOSviewer, Bibliometrix, Origin, and an online bibliometric platform. Results: The research in this field has shown a steady growth trend, with the United States and China making the most significant contributions. Currently, ACP research mainly focuses on cell-penetrating peptides related to drug delivery, which are expected to become future research hotspots. Beyond that, peptide vaccines associated with immunotherapy are also worthy of attention. In addition, molecular dynamics simulation and molecular docking are currently popular research methods. At the same time, deep learning is an emerging keyword, indicating its potential for a more significant impact on future peptide design. Conclusion: Deep learning technology represents an emerging research hotspot with immense potential and promising prospects. As cutting-edge research directions, cell-penetrating peptides and polypeptide immunotherapy are expected to achieve breakthroughs in cancer treatment. This study provides valuable insights into the computer-aided design of peptides in cancer therapy, contributing significantly to advancing the in-depth research and applications in this area.
Affiliation(s)
- Dan Hou
- Department of Basic Oral Medicine, School and Hospital of Stomatology, Guangdong Engineering Research Center of Oral Restoration and Reconstruction, Guangzhou Medical University, Guangzhou, Guangdong, 510182, People’s Republic of China
- Guangzhou Key Laboratory of Basic and Applied Research of Oral Regenerative Medicine, Guangzhou, Guangdong, 510182, People’s Republic of China
- Department of Oral and Maxillofacial Surgery/Oral Pathology, Amsterdam UMC/VUmc and Academic Centre for Dentistry Amsterdam (ACTA), Vrije Universiteit Amsterdam, Amsterdam Movement Science, Amsterdam, 1081 HZ, the Netherlands
- Haobin Zhou
- Department of Basic Oral Medicine, School and Hospital of Stomatology, Guangdong Engineering Research Center of Oral Restoration and Reconstruction, Guangzhou Medical University, Guangzhou, Guangdong, 510182, People’s Republic of China
- Guangzhou Key Laboratory of Basic and Applied Research of Oral Regenerative Medicine, Guangzhou, Guangdong, 510182, People’s Republic of China
- Yuting Tang
- Department of Basic Oral Medicine, School and Hospital of Stomatology, Guangdong Engineering Research Center of Oral Restoration and Reconstruction, Guangzhou Medical University, Guangzhou, Guangdong, 510182, People’s Republic of China
- Guangzhou Key Laboratory of Basic and Applied Research of Oral Regenerative Medicine, Guangzhou, Guangdong, 510182, People’s Republic of China
- Ziyuan Liu
- Department of Basic Oral Medicine, School and Hospital of Stomatology, Guangdong Engineering Research Center of Oral Restoration and Reconstruction, Guangzhou Medical University, Guangzhou, Guangdong, 510182, People’s Republic of China
- Guangzhou Key Laboratory of Basic and Applied Research of Oral Regenerative Medicine, Guangzhou, Guangdong, 510182, People’s Republic of China
- Lin Su
- Department of Basic Oral Medicine, School and Hospital of Stomatology, Guangdong Engineering Research Center of Oral Restoration and Reconstruction, Guangzhou Medical University, Guangzhou, Guangdong, 510182, People’s Republic of China
- Guangzhou Key Laboratory of Basic and Applied Research of Oral Regenerative Medicine, Guangzhou, Guangdong, 510182, People’s Republic of China
- Junkai Guo
- Department of Basic Oral Medicine, School and Hospital of Stomatology, Guangdong Engineering Research Center of Oral Restoration and Reconstruction, Guangzhou Medical University, Guangzhou, Guangdong, 510182, People’s Republic of China
- Guangzhou Key Laboratory of Basic and Applied Research of Oral Regenerative Medicine, Guangzhou, Guangdong, 510182, People’s Republic of China
- Janak Lal Pathak
- Department of Basic Oral Medicine, School and Hospital of Stomatology, Guangdong Engineering Research Center of Oral Restoration and Reconstruction, Guangzhou Medical University, Guangzhou, Guangdong, 510182, People’s Republic of China
- Guangzhou Key Laboratory of Basic and Applied Research of Oral Regenerative Medicine, Guangzhou, Guangdong, 510182, People’s Republic of China
- Lihong Wu
- Department of Basic Oral Medicine, School and Hospital of Stomatology, Guangdong Engineering Research Center of Oral Restoration and Reconstruction, Guangzhou Medical University, Guangzhou, Guangdong, 510182, People’s Republic of China
- Guangzhou Key Laboratory of Basic and Applied Research of Oral Regenerative Medicine, Guangzhou, Guangdong, 510182, People’s Republic of China
7
Shoombuatong W, Schaduangrat N, Homdee N, Ahmed S, Chumnanpuen P. Advancing the accuracy of tyrosinase inhibitory peptides prediction via a multiview feature fusion strategy. Sci Rep 2025; 15:4762. [PMID: 39922825 PMCID: PMC11807091 DOI: 10.1038/s41598-024-81807-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Accepted: 11/29/2024] [Indexed: 02/10/2025] Open
Abstract
Tyrosinase plays a crucial role as an enzyme in the production of melanin, which is the pigment accountable for determining the color of the hair, eyes, and skin. Tyrosinase inhibitory peptides (TIPs), mainly designed to regulate the activity of the enzyme tyrosinase, are of interest in various domains, including cosmetics, dermatology, and pharmaceuticals, due to their potential applications in controlling skin pigmentation. To date, a few machine learning-based models have been proposed for predicting TIPs, but their predictive performance remains unsatisfactory. In this study, we propose an innovative computational approach, named TIPred-MVFF, to accurately predict TIPs using only sequence information. Firstly, we established an up-to-date and high-quality dataset by collecting samples from various sources. Secondly, we applied a multi-view feature fusion (MVFF) strategy to extract and explore probability and category information embedded in TIPs, employing several machine learning (ML) algorithms coupled with different commonly used sequence-based feature encodings. Then, we employed resampling approaches to address the class imbalance issue. Finally, to maximize the utility of each feature, we fused probability-based and sequence-based features, generating more informative features that were used to develop the final prediction model. Based on the independent test, experimental results showed that TIPred-MVFF outperformed several conventional ML classifiers and existing methods in terms of prediction accuracy and robustness, achieving an accuracy of 0.937 and a Matthews correlation coefficient of 0.847. This new computational approach is anticipated to aid community-wide efforts in rapidly and cost-effectively discovering novel peptides with strong tyrosinase inhibitory activities.
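The abstract mentions resampling to address class imbalance without naming a specific technique. As an illustration only, the sketch below shows random oversampling, one common choice, which duplicates minority-class samples until every class matches the largest one. The function name and the toy peptide strings are hypothetical, and this is not claimed to be the method used in TIPred-MVFF.

```python
import random


def random_oversample(samples, labels, seed=0):
    """Balance classes by resampling smaller classes with replacement.

    Assumption: random oversampling stands in for the unspecified
    "resampling approaches" in the abstract.
    """
    rng = random.Random(seed)
    # Group samples by their class label.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Hypothetical toy data: one positive peptide and four negatives.
peptides = ["AAK", "GLW", "KKL", "PPG", "RRW"]
classes = [1, 0, 0, 0, 0]
balanced_peptides, balanced_classes = random_oversample(peptides, classes)
```

Resampling of this kind is normally applied to the training split only, so that duplicated samples do not leak into the independent test set.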
Affiliation(s)
- Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
- Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Nutta Homdee
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Saeed Ahmed
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Department of Computer Science, University of Swabi, Swabi, 23561, Pakistan
- Pramote Chumnanpuen
- Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, 10900, Thailand.
- Kasetsart University International College (KUIC), Kasetsart University, Bangkok, 10900, Thailand.
8
Charoenkwan P, Chumnanpuen P, Schaduangrat N, Shoombuatong W. Deepstack-ACE: A deep stacking-based ensemble learning framework for the accelerated discovery of ACE inhibitory peptides. Methods 2025; 234:131-140. [PMID: 39709069 DOI: 10.1016/j.ymeth.2024.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2024] [Revised: 11/27/2024] [Accepted: 12/07/2024] [Indexed: 12/23/2024] Open
Abstract
Identifying angiotensin-I-converting enzyme (ACE) inhibitory peptides accurately is crucial for understanding the primary factor that regulates the renin-angiotensin system and for providing guidance in developing new potential drugs. Given the inherent experimental complexities, using computational methods for in silico peptide identification could be indispensable for facilitating the high-throughput characterization of ACE inhibitory peptides. In this paper, we propose a novel deep stacking-based ensemble learning framework, termed Deepstack-ACE, to precisely identify ACE inhibitory peptides. In Deepstack-ACE, the input peptide sequences are fed into the word2vec embedding technique to generate sequence representations. Then, these representations were employed to train five powerful deep learning methods, including long short-term memory, convolutional neural network, multi-layer perceptron, gated recurrent unit network, and recurrent neural network, for the construction of base-classifiers. Finally, the optimized stacked model was constructed based on the best combination of selected base-classifiers. Benchmarking experiments showed that Deepstack-ACE attained a more accurate and robust identification of ACE inhibitory peptides compared to its base-classifiers and several conventional machine learning classifiers. Remarkably, in the independent test, our proposed model significantly outperformed the current state-of-the-art methods, with a balanced accuracy of 0.916, sensitivity of 0.911, and Matthews correlation coefficient scores of 0.826. Moreover, we developed a user-friendly web server for Deepstack-ACE, which is freely available at https://pmlabqsar.pythonanywhere.com/Deepstack-ACE. We anticipate that our proposed Deepstack-ACE model can provide a faster and reasonably accurate identification of ACE inhibitory peptides.
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
- Pramote Chumnanpuen
- Department of Zoology, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand; Kasetsart University International College (KUIC), Kasetsart University, Bangkok 10900, Thailand
- Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
9
Yue J, Li T, Xu J, Chen Z, Li Y, Liang S, Liu Z, Wang Y. Discovery of anticancer peptides from natural and generated sequences using deep learning. Int J Biol Macromol 2025; 290:138880. [PMID: 39706427 DOI: 10.1016/j.ijbiomac.2024.138880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2024] [Revised: 12/10/2024] [Accepted: 12/16/2024] [Indexed: 12/23/2024]
Abstract
Anticancer peptides (ACPs) demonstrate significant potential in clinical cancer treatment due to their ability to selectively target and kill cancer cells. In recent years, numerous artificial intelligence (AI) algorithms have been developed. However, many predictive methods lack sufficient wet lab validation, thereby constraining the progress of models and impeding the discovery of novel ACPs. This study proposes a comprehensive research strategy by introducing CNBT-ACPred, an ACP prediction model based on a three-channel deep learning architecture, supported by extensive in vitro and in vivo experiments. CNBT-ACPred achieved an accuracy of 0.9554 and a Matthews Correlation Coefficient (MCC) of 0.8602. Compared to existing excellent models, CNBT-ACPred increased accuracy by at least 5% and improved MCC by 15%. Predictions were conducted on over 3.8 million sequences from UniProt, along with 100,000 sequences generated by a deep generative model, ultimately identifying 37 out of 41 candidate peptides from >30 species that exhibited effective in vitro tumor inhibitory activity. Among these, tPep14 demonstrated significant anticancer effects in two mouse xenograft models without detectable toxicity. Finally, the study revealed correlations between the amino acid composition, structure, and function of the identified ACP candidates.
Affiliation(s)
- Jianda Yue
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
- Tingting Li
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
- Jiawei Xu
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
- Zihui Chen
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China
- Yaqi Li
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
- Songping Liang
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
- Zhonghua Liu
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
- Ying Wang
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
10
Huang G, Cao Y, Dai Q, Chen W. ACP-DPE: A Dual-Channel Deep Learning Model for Anticancer Peptide Prediction. IET Syst Biol 2025; 19:e70010. [PMID: 40119615 PMCID: PMC11928748 DOI: 10.1049/syb2.70010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2024] [Revised: 02/13/2025] [Accepted: 02/20/2025] [Indexed: 03/24/2025] Open
Abstract
Cancer is a serious and complex disease caused by uncontrolled cell growth and is becoming one of the leading causes of death worldwide. Anticancer peptides (ACPs), bioactive peptides with lower toxicity, have emerged as a promising means of effectively treating cancer. Identifying ACPs is challenging due to the limitations of experimental conditions. To address this, we propose a dual-channel deep learning method, termed ACP-DPE, for ACP prediction. ACP-DPE consists of two parallel channels: one is an embedding layer followed by a bi-directional gated recurrent unit (Bi-GRU) module, and the other is an adaptive embedding layer followed by a dilated convolution module. The Bi-GRU module captures the peptide sequence dependencies, whereas the dilated convolution module characterises the local relationships among amino acids. Experimental results show that ACP-DPE achieves an accuracy of 82.81% and a sensitivity of 86.63%, surpassing the state-of-the-art method by 3.86% and 5.1%, respectively. These findings demonstrate the effectiveness of ACP-DPE for ACP prediction and highlight its potential as a valuable tool in cancer treatment research.
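To make the dilated convolution in the second channel concrete, the following is a minimal pure-Python sketch of a one-dimensional dilated convolution, computed as cross-correlation without padding, as deep learning frameworks conventionally do. It illustrates the general operation only; the function name, kernel values, and input are hypothetical and do not reflect the ACP-DPE configuration.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Apply a 1D dilated convolution (cross-correlation, no padding).

    With dilation d, consecutive kernel taps read inputs d positions apart,
    so a kernel of size k covers (k - 1) * d + 1 input positions.
    """
    span = (len(kernel) - 1) * dilation
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span)
    ]


# Hypothetical per-residue feature values along a short peptide.
features = [1, 2, 3, 4, 5, 6]
dense = dilated_conv1d(features, [1, 1, 1], dilation=1)   # adjacent positions
sparse = dilated_conv1d(features, [1, 1, 1], dilation=2)  # every other position
```

Increasing the dilation widens the receptive field without adding kernel weights, which lets a small kernel relate residues that are several positions apart.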
Collapse
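The dilated convolution channel mentioned in this abstract can be illustrated with a minimal sketch. This is not the ACP-DPE implementation; the function name `dilated_conv1d`, the toy scores, and the kernel are assumptions chosen only to show how a dilation factor widens the span of residues each output position covers without adding kernel weights.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (no-padding) 1D cross-correlation with `dilation` steps between kernel taps."""
    # Number of input positions covered by one output value.
    span = (len(kernel) - 1) * dilation + 1
    if len(signal) < span:
        return []
    return [
        sum(w * signal[start + k * dilation] for k, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


# Hypothetical per-residue scores and kernel, not taken from the paper.
scores = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

dense = dilated_conv1d(scores, kernel, dilation=1)  # each output spans 3 residues
wide = dilated_conv1d(scores, kernel, dilation=2)   # each output spans 5 residues
```

With the same three weights, the dilated version compares residues four positions apart instead of two, which is the sense in which dilation enlarges the local context.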
Affiliation(s)
- Guohua Huang
  - College of Information Science and Engineering, Shaoyang University, Shaoyang, China
  - Hunan Provincial Key Laboratory of Finance & Economics Big Data Science and Technology, Hunan University of Finance and Economics, Changsha, China
- Yujie Cao
  - College of Information Science and Engineering, Shaoyang University, Shaoyang, China
- Qi Dai
  - College of Life Science and Medicine, Zhejiang Sci‐Tech University, Hangzhou, China
- Weihong Chen
  - Hunan Provincial Key Laboratory of Finance & Economics Big Data Science and Technology, Hunan University of Finance and Economics, Changsha, China
|
11
|
Lin Z, Assaraf YG, Kwok HF. Peptides for microbe-induced cancers: latest therapeutic strategies and their advanced technologies. Cancer Metastasis Rev 2024; 43:1315-1336. [PMID: 39008152 DOI: 10.1007/s10555-024-10197-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2023] [Accepted: 06/14/2024] [Indexed: 07/16/2024]
Abstract
Cancer is a significant global health concern associated with multiple distinct factors, including microbial and viral infections. Numerous studies have elucidated the role of microorganisms, such as Helicobacter pylori (H. pylori), as well as viruses such as human papillomavirus (HPV), hepatitis B virus (HBV), and hepatitis C virus (HCV), in the development of human malignancies. Substantial attention has been focused on the treatment of these microorganism- and virus-associated cancers, with promising outcomes observed in studies employing peptide-based therapies. The current paper provides an overview of microbe- and virus-induced cancers and their underlying molecular mechanisms. We discuss an assortment of peptide-based therapies which are currently being developed, including tumor-targeting peptides and microbial/viral peptide-based vaccines. We describe the major technological advancements that have been made in the design, screening, and delivery of peptides as anticancer agents. The primary focus of the current review is to provide insight into the latest research and development in this field and to provide a realistic glimpse into the future of peptide-based therapies for microbe- and virus-induced neoplasms.
Affiliation(s)
- Ziqi Lin
  - Cancer Centre, Faculty of Health Sciences, University of Macau, Avenida da Universidade, Taipa, Macau SAR
  - Department of Biomedical Sciences, Faculty of Health Sciences, University of Macau, Avenida da Universidade, Taipa, Macau SAR
- Yehuda G Assaraf
  - The Fred Wyszkowski Cancer Research Lab, Faculty of Biology, Technion-Israel Institute of Technology, Haifa, 3200003, Israel
- Hang Fai Kwok
  - Cancer Centre, Faculty of Health Sciences, University of Macau, Avenida da Universidade, Taipa, Macau SAR.
  - Department of Biomedical Sciences, Faculty of Health Sciences, University of Macau, Avenida da Universidade, Taipa, Macau SAR.
  - MoE Frontiers Science Center for Precision Oncology, University of Macau, Avenida de Universidade, Taipa, Macau SAR.
|
12
|
Wang X, Zhang Z, Liu C. iACP-DFSRA: Identification of Anticancer Peptides Based on a Dual-channel Fusion Strategy of ResCNN and Attention. J Mol Biol 2024; 436:168810. [PMID: 39362624 DOI: 10.1016/j.jmb.2024.168810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/09/2024] [Revised: 09/10/2024] [Accepted: 09/27/2024] [Indexed: 10/05/2024]
Abstract
Anticancer peptides (ACPs) have been widely applied in the treatment of cancer owing to good safety, rational side effects, and high selectivity. However, the number of ACPs that have been experimentally validated is limited, as identification of ACPs is extremely expensive. Hence, accurate and cost-effective identification methods for ACPs are urgently needed. In this work, we proposed a deep learning-based model, named iACP-DFSRA, for ACP identification. Specifically, we adopted two kinds of sequence embedding technologies, the ProtBert_BFD pre-trained language model and handcrafted features, to encode protein sequences. Then, LightGBM was used for feature selection, and the selected features were input into ResCNN and an Attention mechanism, respectively, to extract local and global features. Finally, the concatenated features were deeply fused using the Attention mechanism, so that the model pays more attention to key features, and predictions were made by a fully connected layer. The results of 10-fold cross-validation demonstrated that the iACP-DFSRA model delivered improved results in most metrics, with Sp of 94.15%, Sn of 95.32%, Acc of 94.74% and MCC of 89.48%, compared to the latest AACFlow model. Indeed, the iACP-DFSRA model is the only model with Acc > 90% and MCC > 80% on this independent test dataset. We have further demonstrated the superiority of our model on additional datasets. In addition, t-SNE and SHAP interpretation analysis demonstrated that it is crucial to use two channels for feature extraction and to use the Attention mechanism for deep fusion, which helps iACP-DFSRA predict ACPs more effectively.
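The four metrics quoted in this abstract (Sp, Sn, Acc, MCC) follow from a binary confusion matrix. The sketch below uses the standard definitions; the function name `binary_metrics` and the counts are hypothetical and are not the paper's data.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy and Matthews correlation coefficient."""
    sn = tp / (tp + fn)                    # sensitivity (recall on ACPs)
    sp = tn / (tn + fp)                    # specificity (recall on non-ACPs)
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Hypothetical counts: 100 ACPs and 100 non-ACPs.
m = binary_metrics(tp=90, fn=10, tn=80, fp=20)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is commonly reported alongside Sn and Sp for class-imbalanced peptide datasets.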
Affiliation(s)
- Xin Wang
  - School of Science, Dalian Maritime University, Dalian 116026, China.
- Zimeng Zhang
  - School of Science, Dalian Maritime University, Dalian 116026, China
- Chang Liu
  - School of Science, Dalian Maritime University, Dalian 116026, China
|
13
|
Kilimci ZH, Yalcin M. ACP-ESM: A novel framework for classification of anticancer peptides using protein-oriented transformer approach. Artif Intell Med 2024; 156:102951. [PMID: 39173421 DOI: 10.1016/j.artmed.2024.102951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/23/2023] [Revised: 07/19/2024] [Accepted: 08/13/2024] [Indexed: 08/24/2024]
Abstract
Anticancer peptides (ACPs) are a class of molecules that have gained significant attention in the field of cancer research and therapy. ACPs are short chains of amino acids, the building blocks of proteins, and they possess the ability to selectively target and kill cancer cells. One of the key advantages of ACPs is their ability to selectively target cancer cells while sparing healthy cells to a greater extent. This selectivity is often attributed to differences in the surface properties of cancer cells compared to normal cells. That is why ACPs are being investigated as potential candidates for cancer therapy. ACPs may be used alone or in combination with other treatment modalities like chemotherapy and radiation therapy. While ACPs hold promise as a novel approach to cancer treatment, there are challenges to overcome, including optimizing their stability, improving selectivity, enhancing their delivery to cancer cells, keeping pace with the continuously increasing number of peptide sequences, and developing a reliable and precise prediction model. In this work, we propose an efficient transformer-based framework to identify ACPs with a reliable and precise prediction model. For this purpose, four different transformer models, namely ESM, ProtBERT, BioBERT, and SciBERT, are employed to detect ACPs from amino acid sequences. To demonstrate the contribution of the proposed framework, extensive experiments are carried out on datasets widely used in the literature: two versions of AntiCp2, cACP-DeepGram, and ACP-740. Experimental results show that the proposed model enhances classification accuracy compared with studies in the literature. The proposed framework, ESM, exhibits 96.45% accuracy on the AntiCp2 dataset, 97.66% accuracy on the cACP-DeepGram dataset, and 88.51% accuracy on the ACP-740 dataset, thereby setting a new state of the art. The code of the proposed framework is publicly available on GitHub (https://github.com/mstf-yalcin/acp-esm).
Affiliation(s)
- Zeynep Hilal Kilimci
  - Department of Information Systems Engineering, Kocaeli University, 41001, Kocaeli, Turkey.
- Mustafa Yalcin
  - Department of Information Systems Engineering, Kocaeli University, 41001, Kocaeli, Turkey.
|
14
|
Wang X, Wang S. ACP-PDAFF: Pretrained model and dual-channel attentional feature fusion for anticancer peptides prediction. Comput Biol Chem 2024; 112:108141. [PMID: 38996756 DOI: 10.1016/j.compbiolchem.2024.108141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 05/26/2024] [Accepted: 06/28/2024] [Indexed: 07/14/2024]
Abstract
Anticancer peptides (ACPs) have attracted significant interest as a novel method of treating cancer due to their ability to selectively kill cancer cells without damaging normal cells. Many artificial intelligence-based methods have demonstrated impressive performance in predicting ACPs. Nevertheless, the limitations of existing methods in feature engineering include handcrafted features driven by prior knowledge, insufficient feature extraction, and inefficient feature fusion. In this study, we propose a model based on a pretrained model and dual-channel attentional feature fusion (DAFF), called ACP-PDAFF. Firstly, to reduce the heavy dependence on expert knowledge-based handcrafted features, binary profile features (BPF) and physicochemical properties features (PCPF) are used as inputs to the transformer model. Secondly, to learn more diverse feature information about ACPs, the pretrained model ProtBert is utilized. Thirdly, for better fusion of different feature channels, DAFF is employed. Finally, to evaluate the performance of the model, we compare it with other methods on five benchmark datasets, including the ACP-Mixed-80 dataset, the Main and Alternate datasets of AntiCP 2.0, the LEE and Independent dataset, and the ACPred-Fuse dataset. The accuracies obtained by ACP-PDAFF are 0.86, 0.80, 0.94, 0.97 and 0.95 on the five datasets, respectively, higher than existing methods by 1% to 12%. Therefore, by learning rich feature information and effectively fusing different feature channels, ACP-PDAFF achieves outstanding performance. Our code and the datasets are available at https://github.com/wongsing/ACP-PDAFF.
Affiliation(s)
- Xinyi Wang
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
- Shunfang Wang
  - Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China.
|
15
|
Sangaraju VK, Pham NT, Wei L, Yu X, Manavalan B. mACPpred 2.0: Stacked Deep Learning for Anticancer Peptide Prediction with Integrated Spatial and Probabilistic Feature Representations. J Mol Biol 2024; 436:168687. [PMID: 39237191 DOI: 10.1016/j.jmb.2024.168687] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 02/08/2024] [Revised: 05/28/2024] [Accepted: 06/20/2024] [Indexed: 09/07/2024]
Abstract
Anticancer peptides (ACPs) are naturally occurring molecules with remarkable potential to target and kill cancer cells. However, identifying ACPs based solely on their primary amino acid sequences remains a major hurdle in immunoinformatics. In the past, several web-based machine learning (ML) tools have been proposed to assist researchers in identifying potential ACPs for further testing. Notably, our meta-approach method, mACPpred, introduced in 2019, has significantly advanced the field of ACP research. Given the exponential growth in the number of characterized ACPs, there is now a pressing need to create an updated version of mACPpred. To develop mACPpred 2.0, we constructed an up-to-date benchmarking dataset by integrating all publicly available ACP datasets. We employed a large set of feature descriptors, encompassing both conventional feature descriptors and advanced pre-trained natural language processing (NLP)-based embeddings. We evaluated their ability to discriminate between ACPs and non-ACPs using eleven different classifiers. Subsequently, we employed a stacked deep learning (SDL) approach, incorporating 1D convolutional neural network (1D CNN) blocks and hybrid features. These features included the top seven performing NLP-based features and 90 probabilistic features, allowing us to identify hidden patterns within these diverse features and improve the accuracy of our ACP prediction model. This is the first study to integrate spatial and probabilistic feature representations for predicting ACPs. Rigorous cross-validation and independent tests conclusively demonstrated that mACPpred 2.0 not only surpassed its predecessor (mACPpred) but also outperformed the existing state-of-the-art predictors, highlighting the importance of advanced feature representation capabilities attained through SDL. To facilitate widespread use and accessibility, we have developed a user-friendly web server for mACPpred 2.0, available at https://balalab-skku.org/mACPpred2/.
Affiliation(s)
- Vinoth Kumar Sangaraju
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea
- Nhat Truong Pham
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea
- Leyi Wei
  - Faculty of Applied Sciences, Macao Polytechnic University, Macau
- Xue Yu
  - Beidahuang Industry Group General Hospital, 150001 Harbin, China.
- Balachandran Manavalan
  - Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea.
|
16
|
Yue J, Xu J, Li T, Li Y, Chen Z, Liang S, Liu Z, Wang Y. Discovery of potential antidiabetic peptides using deep learning. Comput Biol Med 2024; 180:109013. [PMID: 39137670 DOI: 10.1016/j.compbiomed.2024.109013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/10/2024] [Revised: 07/01/2024] [Accepted: 08/08/2024] [Indexed: 08/15/2024]
Abstract
Antidiabetic peptides (ADPs), peptides with potential antidiabetic activity, hold significant importance in the treatment and control of diabetes. Despite their therapeutic potential, the discovery and prediction of ADPs remain challenging due to limited data, the complex nature of peptide functions, and the expensive and time-consuming nature of traditional wet lab experiments. This study aims to address these challenges by exploring methods for the discovery and prediction of ADPs using advanced deep learning techniques. Specifically, we developed two models: a single-channel CNN and a three-channel neural network (CNN + RNN + Bi-LSTM). ADPs were primarily gathered from the BioDADPep database, alongside thousands of non-ADPs sourced from anticancer, antibacterial, and antiviral peptide datasets. Subsequently, data preprocessing was performed with the evolutionary scale model (ESM-2), followed by model training and evaluation through 10-fold cross-validation. Furthermore, this work collected a series of newly published ADPs as an independent test set through literature review, and found that the CNN model achieved the highest accuracy (90.48%) in predicting the independent test set, surpassing existing ADP prediction tools. Finally, the application of the model was considered. SeqGAN was used to generate new candidate ADPs, followed by screening with the constructed CNN model. Selected peptides were then evaluated using physicochemical property prediction and structural forecasts for pharmaceutical potential. In summary, this study not only established robust ADP prediction models but also employed these models to screen a batch of potential ADPs, addressing a critical need in the field of peptide-based antidiabetic research.
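The 10-fold cross-validation protocol named in this abstract can be sketched as follows. This is a generic illustration, not the authors' pipeline; the function name `k_fold_indices`, the seed, and the sample count are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k near-equal, disjoint folds
    for i, test_idx in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical dataset of 25 peptides.
splits = list(k_fold_indices(25, k=10))
```

Each peptide is held out exactly once, so the averaged score over the ten folds uses every sample for testing while never training and testing on the same sample in one fold.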
Affiliation(s)
- Jianda Yue
  - The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, 410081, China; Peptide and Small Molecule Drug R&D Plateform, Furong Laboratory, Hunan Normal University, Changsha, 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, 410081, China
- Jiawei Xu
  - The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, 410081, China; Peptide and Small Molecule Drug R&D Plateform, Furong Laboratory, Hunan Normal University, Changsha, 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, 410081, China
- Tingting Li
  - The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, 410081, China; Peptide and Small Molecule Drug R&D Plateform, Furong Laboratory, Hunan Normal University, Changsha, 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, 410081, China
- Yaqi Li
  - The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, 410081, China; Peptide and Small Molecule Drug R&D Plateform, Furong Laboratory, Hunan Normal University, Changsha, 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, 410081, China
- Zihui Chen
  - The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, 410081, China; Peptide and Small Molecule Drug R&D Plateform, Furong Laboratory, Hunan Normal University, Changsha, 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, 410081, China
- Songping Liang
  - The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, 410081, China; Peptide and Small Molecule Drug R&D Plateform, Furong Laboratory, Hunan Normal University, Changsha, 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, 410081, China
- Zhonghua Liu
  - The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, 410081, China; Peptide and Small Molecule Drug R&D Plateform, Furong Laboratory, Hunan Normal University, Changsha, 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, 410081, China.
- Ying Wang
  - The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha, 410081, China; Peptide and Small Molecule Drug R&D Plateform, Furong Laboratory, Hunan Normal University, Changsha, 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha, 410081, China.
|
17
|
Cheong HH, Zuo W, Chen J, Un CW, Si YW, Wong KH, Kwok HF, Siu SWI. Identification of Anticancer Peptides from the Genome of Candida albicans: in Silico Screening, in Vitro and in Vivo Validations. J Chem Inf Model 2024; 64:6174-6189. [PMID: 39008832 DOI: 10.1021/acs.jcim.4c00501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/17/2024]
Abstract
Anticancer peptides (ACPs) are promising future therapeutics, but their experimental discovery remains time-consuming and costly. To accelerate the discovery process, we propose a computational screening workflow to identify, filter, and prioritize peptide sequences based on predicted class probability, antitumor activity, and toxicity. The workflow was applied to identify novel ACPs with potent activity against colorectal cancer from the genome sequences of Candida albicans. As a result, four candidates were identified and validated in the HCT116 colon cancer cell line. Among them, PCa1 and PCa2 emerged as the most potent, displaying IC50 values of 3.75 and 56.06 μM, respectively, and demonstrating a 4-fold selectivity for cancer cells over normal cells. In the colon xenograft nude mice model, the administration of both peptides resulted in substantial inhibition of tumor growth without causing significant adverse effects. In conclusion, this work not only contributes a proven computational workflow for ACP discovery but also introduces two peptides, PCa1 and PCa2, as promising candidates poised for further development as targeted therapies for colon cancer. The method as a web service is available at https://app.cbbio.online/acpep/home and the source code at https://github.com/cartercheong/AcPEP_classification.git.
Affiliation(s)
- Hong-Hin Cheong
  - Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- Weimin Zuo
  - Department of Biomedical Sciences, Faculty of Health Sciences, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
  - Cancer Centre, Faculty of Health Sciences, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- Jiarui Chen
  - Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- Chon-Wai Un
  - Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- Yain-Whar Si
  - Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- Koon Ho Wong
  - Department of Biomedical Sciences, Faculty of Health Sciences, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
  - MoE Frontiers Science Center for Precision Oncology, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
  - Cancer Centre, Faculty of Health Sciences, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- Hang Fai Kwok
  - Department of Biomedical Sciences, Faculty of Health Sciences, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
  - MoE Frontiers Science Center for Precision Oncology, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
  - Cancer Centre, Faculty of Health Sciences, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- Shirley W I Siu
  - Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macau SAR 999078, China
  - Institute of Science and Environment, University of Saint Joseph, Estrada Marginal da Ilha Verde 14-17, Macau SAR 999078, China
|
18
|
Arif M, Musleh S, Fida H, Alam T. PLMACPred prediction of anticancer peptides based on protein language model and wavelet denoising transformation. Sci Rep 2024; 14:16992. [PMID: 39043738 PMCID: PMC11266708 DOI: 10.1038/s41598-024-67433-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2024] [Accepted: 07/11/2024] [Indexed: 07/25/2024] Open
Abstract
Anticancer peptides (ACPs) play a promising role in the discovery of anticancer drugs. Research on ACPs as therapeutic agents is growing owing to their minimal side effects. However, identifying novel ACPs using wet-lab experiments is generally time-consuming, labor-intensive, and expensive. Leveraging computational methods for fast and accurate prediction of ACPs would benefit the drug discovery process. Herein, a machine learning-based predictor, called PLMACPred, is developed for identifying ACPs from the peptide sequence only. PLMACPred adopts a set of encoding schemes representing evolutionary properties, composition properties, and protein language model (PLM) embeddings, i.e., evolutionary scale modeling (ESM-2)- and ProtT5-based embeddings, to encode peptides. Then, two-dimensional (2D) wavelet denoising (WD) is employed to remove noise from the extracted features. Finally, an ensemble-based cascade deep forest (CDF) model is developed to identify ACPs. The PLMACPred model attained superior performance on all three benchmark datasets, namely ACPmain, ACPAlter, and ACP740, in tenfold cross-validation and on the independent dataset. PLMACPred outperformed the existing models and improved the prediction accuracy by 18.53%, 2.4%, and 7.59% on the ACPmain, ACPAlter, and ACP740 datasets, respectively. We showed that embeddings from ProtT5 and ESM-2 captured better contextual information from the entire sequence than the other encoding schemes for ACP prediction. For the explainability of the proposed model, the SHAP (SHapley Additive exPlanations) method was used to analyze the effect of features on ACP prediction. A list of novel sequence motifs was proposed from the ACP sequences using the MEME suite. We believe PLMACPred will help accelerate the discovery of novel ACPs as well as other activities of microbial peptides.
Affiliation(s)
- Muhammad Arif
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Saleh Musleh
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
- Huma Fida
  - Department of Microbiology, Abdul Wali Khan University, Mardan, KPK, Pakistan
- Tanvir Alam
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar.
|
19
|
Bhattarai S, Tayara H, Chong KT. Advancing Peptide-Based Cancer Therapy with AI: In-Depth Analysis of State-of-the-Art AI Models. J Chem Inf Model 2024; 64:4941-4957. [PMID: 38874445 DOI: 10.1021/acs.jcim.4c00295] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2024]
Abstract
Anticancer peptides (ACPs) play a vital role in selectively targeting and eliminating cancer cells. Evaluating and comparing predictions from various machine learning (ML) and deep learning (DL) techniques is challenging but crucial for anticancer drug research. We conducted a comprehensive analysis of 15 ML and 10 DL models, including the models released after 2022, and found that support vector machines (SVMs) with feature combination and selection significantly enhance overall performance. DL models, especially convolutional neural networks (CNNs) with light gradient boosting machine (LGBM) based feature selection approaches, demonstrate improved characterization. Assessment using a new test data set (ACP10) identifies ACPred, MLACP 2.0, AI4ACP, mACPred, and AntiCP2.0_AAC as successive optimal predictors, showcasing robust performance. Our review underscores current prediction tool limitations and advocates for an omnidirectional ACP prediction framework to propel ongoing research.
Affiliation(s)
- Sadik Bhattarai
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju-si, 54896 Jeollabuk-do, South Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju-si, 54896 Jeollabuk-do, South Korea
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju-si, 54896 Jeollabuk-do, South Korea
  - Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju-si, 54896 Jeollabuk-do, South Korea
|
20
|
Kang Y, Zhang H, Wang X, Yang Y, Jia Q. MMDB: Multimodal dual-branch model for multi-functional bioactive peptide prediction. Anal Biochem 2024; 690:115491. [PMID: 38460901 DOI: 10.1016/j.ab.2024.115491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/12/2023] [Revised: 01/21/2024] [Accepted: 02/19/2024] [Indexed: 03/11/2024]
Abstract
Bioactive peptides can hinder oxidative processes and microbial spoilage in foodstuffs and play important roles in treating diverse diseases and disorders. While most methods focus on single-functional bioactive peptides and have obtained promising prediction performance, accurately detecting complex and diverse functions simultaneously remains a significant challenge given the rapid increase in multi-functional bioactive peptides. In contrast to previous research on multi-functional bioactive peptide prediction based solely on sequence, we propose a novel multimodal dual-branch (MMDB) lightweight deep learning model with two different branches that effectively capture the complementary information of peptide sequence and structural properties. Specifically, a branch combining multi-scale dilated convolution with Bi-LSTM is presented to model the sequence properties of peptides at different scales, while a multi-layer convolution branch is proposed to capture structural information. To the best of our knowledge, this is the first effective extraction of peptide sequence features using multi-scale dilated convolution without parameter increase. Multimodal features from both branches are integrated via a fully connected layer for multi-label classification. Compared to state-of-the-art methods, our MMDB model exhibits competitive results across metrics, with a 9.1% Coverage increase and 5.3% and 3.5% improvements in Precision and Accuracy, respectively.
Affiliation(s)
- Yan Kang
  - National Pilot School of Software, Yunnan University, Kunming, 650091, Yunnan, China; Yunnan Key Laboratory of Software Engineering, China
- Huadong Zhang
  - National Pilot School of Software, Yunnan University, Kunming, 650091, Yunnan, China
- Xinchao Wang
  - National Pilot School of Software, Yunnan University, Kunming, 650091, Yunnan, China
- Yun Yang
  - National Pilot School of Software, Yunnan University, Kunming, 650091, Yunnan, China; Yunnan Key Laboratory of Software Engineering, China.
- Qi Jia
  - School of Information Science, Yunnan University, Kunming, 650091, Yunnan, China
|
21
|
Zhang L, Hu X, Xiao K, Kong L. Effective identification and differential analysis of anticancer peptides. Biosystems 2024; 241:105246. [PMID: 38848816 DOI: 10.1016/j.biosystems.2024.105246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/09/2024] [Revised: 05/27/2024] [Accepted: 06/04/2024] [Indexed: 06/09/2024]
Abstract
Anticancer peptides (ACPs) have recently emerged as promising cancer therapeutics due to their selectivity and lower toxicity. However, the number of experimentally validated ACPs is limited, and identifying ACPs from large-scale sequence data is time-consuming and expensive. Therefore, it is critical to develop and improve upon existing computational models for identifying ACPs. In this study, a computational method named ACP_DA was proposed based on peptide residue composition and physiochemical properties information. To curtail overfitting and reduce computational costs, a sequential forward selection method was utilized to construct the optimal feature groups. Subsequently, the feature vectors were fed into light gradient boosting machine classifier for model construction. It was observed by an independent set test that ACP_DA achieved the highest Matthew's correlation coefficient of 0.63 and accuracy of 0.8129, displaying at least a 2% enhancement compared to state-of-the-art methods. The satisfactory results demonstrate the effectiveness of ACP_DA as a powerful tool for identifying ACPs, with the potential to significantly contribute to the development and optimization of promising therapies. The data and resource codes are available at https://github.com/Zlclab/ACP_DA.
Affiliation(s)
- Lichao Zhang
  - School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China; Hebei Innovation Center for Smart Perception and Applied Technology of Agricultural Data, Qinhuangdao, PR China
- Xueli Hu
  - School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China
- Kang Xiao
  - School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China
- Liang Kong
  - Hebei Innovation Center for Smart Perception and Applied Technology of Agricultural Data, Qinhuangdao, PR China; School of Mathematics and Information Science & Technology, Hebei Normal University of Science & Technology, Qinhuangdao, PR China.
22
Ghafoor H, Asim MN, Ibrahim MA, Ahmed S, Dengel A. CAPTURE: Comprehensive anti-cancer peptide predictor with a unique amino acid sequence encoder. Comput Biol Med 2024; 176:108538. [PMID: 38759585 DOI: 10.1016/j.compbiomed.2024.108538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 04/26/2024] [Accepted: 04/28/2024] [Indexed: 05/19/2024]
Abstract
The key properties of anticancer peptides (ACPs), including bioactivity, high efficacy, low toxicity, and lack of drug resistance, make them ideal candidates for cancer therapies. To explore the potential of ACPs and accelerate the development of cancer therapies, 53 artificial-intelligence-based computational predictors have been developed for classifying ACPs and non-ACPs, but only one predictor has been developed for annotating ACP functional types. These predictors extract amino acid distribution patterns to transform peptide sequences into statistical vectors, which are then fed to classifiers that discriminate peptide sequences and annotate peptide functional classes. Overall, these predictors fail to extract diverse types of amino acid distribution patterns from peptide sequences. This paper presents a unique CARE encoder that transforms peptide sequences into statistical vectors by extracting four types of distribution patterns: correlation, distribution, composition, and transition. Using public benchmark data, the potential of the proposed encoder is explored under two evaluation settings, intrinsic and extrinsic. The extrinsic evaluation indicates that 12 different machine learning classifiers achieve superior performance with the proposed encoder compared with 55 existing encoders. The intrinsic evaluation reveals that, unlike existing encoders, the proposed encoder generates more discriminative clusters for the ACP and non-ACP classes. Across 8 public benchmark ACP and non-ACP classification datasets, the CAPTURE predictor, built on the proposed encoder and an AdaBoost classifier, outperforms existing predictors by an average of 1%, 4%, and 2% in accuracy, recall, and MCC, respectively. In a generalizability case study across 7 benchmark antimicrobial peptide classification datasets, CAPTURE surpasses existing predictors by an average AU-ROC of 2%. The CAPTURE predictive pipeline combined with the label powerset method outperforms the state-of-the-art ACP functional type predictor by 5%, 5%, 5%, 6%, and 3% in average accuracy, subset accuracy, precision, recall, and F1, respectively. The CAPTURE web application is available at https://sds_genetic_analysis.opendfki.de/CAPTURE.
Affiliation(s)
- Hina Ghafoor
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Muhammad Nabeel Asim
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany.
- Muhammad Ali Ibrahim
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Sheraz Ahmed
  - German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
- Andreas Dengel
  - Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
23
Song H, Lin X, Zhang H, Yin H. ACP-ESM2: The prediction of anticancer peptides based on pre-trained classifier. Comput Biol Chem 2024; 110:108091. [PMID: 38735271 DOI: 10.1016/j.compbiolchem.2024.108091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2024] [Revised: 04/07/2024] [Accepted: 04/29/2024] [Indexed: 05/14/2024]
Abstract
Anticancer peptides (ACPs) are a type of protein molecule that has anti-cancer activity and can inhibit cancer cell growth and survival. Traditional classification approaches for ACPs are expensive and time-consuming. This paper proposes a pre-trained classifier model, ESM2-GRU, for ACP prediction to make it easier to predict ACPs, gain a better understanding of the structural and functional differences of anti-cancer peptides, and optimize the design for the development of more effective anti-cancer treatment strategies. The model is made up of the ESM2 pre-trained model, a bidirectional GRU recurrent neural network, and a fully connected layer. ACP sequences are first fed into the ESM2 model, which then expands the dimensions before passing the results to the bidirectional GRU recurrent neural network. Finally, the fully connected layer generates the final output. Experimental validation demonstrates that the ESM2-GRU model greatly improves classification performance on the benchmark dataset ACP606, with AUC, ACC, and MCC values of 0.975, 0.852, and 0.738, respectively. This prediction capability helps to identify specific types of anti-cancer peptides, improving their targeting and selectivity and, therefore, furthering the development of tailored medicine and treatments.
Affiliation(s)
- Huijia Song
  - School of Information Engineering, Beijing Institute of Petrochemical Technology, Beijing 102617, China
- Xiaozhu Lin
  - School of Information Engineering, Beijing Institute of Petrochemical Technology, Beijing 102617, China.
- Huainian Zhang
  - School of Information Engineering, Beijing Institute of Petrochemical Technology, Beijing 102617, China
- Huijuan Yin
  - School of Information Engineering, Beijing Institute of Petrochemical Technology, Beijing 102617, China
24
Liao YH, Chen SZ, Bin YN, Zhao JP, Feng XL, Zheng CH. UsIL-6: An unbalanced learning strategy for identifying IL-6 inducing peptides by undersampling technique. Comput Methods Programs Biomed 2024; 250:108176. [PMID: 38677081 DOI: 10.1016/j.cmpb.2024.108176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2022] [Revised: 03/26/2024] [Accepted: 04/11/2024] [Indexed: 04/29/2024]
Abstract
BACKGROUND AND OBJECTIVE: Interleukin-6 (IL-6) is a critical factor for early warning, monitoring, and prognosis in the inflammatory storm of COVID-19 cases. IL-6 inducing peptides, which can induce production of the cytokine IL-6, are very important for the development of diagnosis and immunotherapy. Although existing methods have had some success in predicting IL-6 inducing peptides, there is still room to improve their performance in practical applications. METHODS: In this study, we proposed UsIL-6, a high-performance bioinformatics tool for identifying IL-6 inducing peptides. First, we extracted five groups of physicochemical properties and sequence structural information from IL-6 inducing peptide sequences and obtained a 636-dimensional feature vector. We also employed the NearMiss3 undersampling method and the StandardScaler normalization method to process the data. Then, a 40-dimensional optimal feature vector was obtained with the Boruta feature selection method. Finally, we combined this feature vector with an extremely randomized trees classifier to build the final model, UsIL-6. RESULTS: The AUC value of UsIL-6 on the independent test dataset was 0.87, and the BACC value was 0.808, indicating that UsIL-6 performed better than existing methods in IL-6 inducing peptide recognition. CONCLUSIONS: The performance comparison on the independent test dataset confirmed that UsIL-6 achieved the highest performance, best robustness, and strongest generalization ability. We hope that UsIL-6 will become a valuable method to identify, annotate, and characterize new IL-6 inducing peptides.
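The normalization step named in this abstract (StandardScaler) rescales each feature column to zero mean and unit variance. The plain-Python sketch below illustrates that idea only; it is not the UsIL-6 pipeline, and the function name `standardize_columns` and the toy matrix are assumptions made for illustration. It uses the population standard deviation and leaves constant columns at zero.

```python
from math import sqrt
from typing import List


def standardize_columns(rows: List[List[float]]) -> List[List[float]]:
    """Scale each column to zero mean and unit variance (population std)."""
    n = len(rows)
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        # A constant column has zero variance; divide by 1 so it stays at 0.
        stds.append(sqrt(var) if var > 0 else 1.0)
    return [[(r[j] - means[j]) / stds[j] for j in range(n_cols)] for r in rows]


# Hypothetical 3-sample, 2-feature matrix; the second feature is constant.
features = [[1.0, 10.0], [2.0, 10.0], [3.0, 10.0]]
scaled = standardize_columns(features)
print(scaled)
```

In practice the means and standard deviations would be estimated on the training split and reused unchanged on the test split.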
Affiliation(s)
- Yan-Hong Liao
  - School of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang 830017, China
- Shou-Zhi Chen
  - School of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang 830017, China
- Yan-Nan Bin
  - School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
- Jian-Ping Zhao
  - School of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang 830017, China.
- Xin-Long Feng
  - School of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang 830017, China.
- Chun-Hou Zheng
  - School of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang 830017, China; School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
25
Chen Z, Wang R, Guo J, Wang X. The role and future prospects of artificial intelligence algorithms in peptide drug development. Biomed Pharmacother 2024; 175:116709. [PMID: 38713945 DOI: 10.1016/j.biopha.2024.116709] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2024] [Revised: 05/01/2024] [Accepted: 05/02/2024] [Indexed: 05/09/2024] Open
Abstract
Peptide medications have become better known in recent years due to their many benefits, including low side effects, high biological activity, specificity, and effectiveness. Over 100 peptide medications have been introduced to the market to treat a variety of illnesses. Most of these peptide medications were developed on the basis of endogenous or natural peptides, which frequently required expensive, time-consuming, and extensive tests to confirm. As artificial intelligence advances quickly, it is now possible to build machine learning or deep learning models that screen a large number of candidate sequences for therapeutic peptides. Therapeutic peptides, such as those with antibacterial or anticancer properties, have been developed through the application of artificial intelligence algorithms. This review outlines the process of finding and developing peptide drugs, along with a few related cases that were helped by AI and conventional methods. These resources will open up new avenues for peptide drug development and discovery, helping to meet the pressing needs of clinical patients for disease treatment. Although peptide drugs are a new class of biopharmaceuticals distinct from chemical and small-molecule drugs, their clinical purpose and value cannot be ignored. However, traditional peptide drug research and development has a long development cycle and requires high investment; the AI-assisted (AI+) mode will substantially accelerate the creation of peptide medications, offering a new boost for combating diseases.
Affiliation(s)
- Zhiheng Chen
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100083, China.
- Ruoxi Wang
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100083, China.
- Junqi Guo
  - School of Biological Science and Medical Engineering, Beihang University, Beijing 100083, China.
- Xiaogang Wang
  - Guangdong Provincial Key Laboratory of Bone and Joint Degenerative Diseases, The Third Affiliated Hospital of Southern Medical University, Guangzhou, Guangdong 510630, China.
26
Xu X, Li C, Yuan X, Zhang Q, Liu Y, Zhu Y, Chen T. ACP-DRL: an anticancer peptides recognition method based on deep representation learning. Front Genet 2024; 15:1376486. [PMID: 38655048 PMCID: PMC11035771 DOI: 10.3389/fgene.2024.1376486] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2024] [Accepted: 03/25/2024] [Indexed: 04/26/2024] Open
Abstract
Cancer, a significant global public health issue, resulted in about 10 million deaths in 2022. Anticancer peptides (ACPs), as a category of bioactive peptides, have emerged as a focal point in clinical cancer research due to their potential to inhibit tumor cell proliferation with minimal side effects. However, the recognition of ACPs through wet-lab experiments still faces challenges of low efficiency and high cost. Our work proposes a recognition method for ACPs named ACP-DRL, based on deep representation learning, to address these challenges. ACP-DRL marks an initial exploration of integrating protein language models into ACP recognition, employing in-domain further pre-training to enhance the development of deep representation learning. It also employs bidirectional long short-term memory networks to extract amino acid features from sequences. Consequently, ACP-DRL eliminates constraints on sequence length and the dependence on manual features, showing remarkable competitiveness in comparison with existing methods.
Affiliation(s)
- Xiaofang Xu
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
- Chaoran Li
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
- Xinpu Yuan
  - Department of General Surgery, First Medical Center, Chinese PLA General Hospital, Beijing, China
- Qiangjian Zhang
  - Institute of Dataspace, Hefei Comprehensive National Science Center, Hefei, China
- Yi Liu
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
- Yunping Zhu
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
- Tao Chen
  - State Key Laboratory of Medical Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, China
27
Yang X, Jin J, Wang R, Li Z, Wang Y, Wei L. CACPP: A Contrastive Learning-Based Siamese Network to Identify Anticancer Peptides Based on Sequence Only. J Chem Inf Model 2024; 64:2807-2816. [PMID: 37252890 DOI: 10.1021/acs.jcim.3c00297] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Indexed: 06/01/2023]
Abstract
Anticancer peptides (ACPs) recently have been receiving increasing attention in cancer therapy due to their low consumption, few adverse side effects, and easy accessibility. However, it remains a great challenge to identify anticancer peptides via experimental approaches, requiring expensive and time-consuming experimental studies. In addition, traditional machine-learning-based methods are proposed for ACP prediction mainly depending on hand-crafted feature engineering, which normally achieves low prediction performance. In this study, we propose CACPP (Contrastive ACP Predictor), a deep learning framework based on the convolutional neural network (CNN) and contrastive learning for accurately predicting anticancer peptides. In particular, we introduce the TextCNN model to extract the high-latent features based on the peptide sequences only and exploit the contrastive learning module to learn more distinguishable feature representations to make better predictions. Comparative results on the benchmark data sets indicate that CACPP outperforms all the state-of-the-art methods in the prediction of anticancer peptides. Moreover, to intuitively show that our model has good classification ability, we visualize the dimension reduction of the features from our model and explore the relationship between ACP sequences and anticancer functions. Furthermore, we also discuss the influence of data set construction on model prediction and explore our model performance on the data sets with verified negative samples.
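The abstract does not state which contrastive objective CACPP uses. As a hedged illustration of the general idea only, the sketch below implements the classic margin-based pairwise contrastive loss: embeddings of same-class pairs are pulled together, and embeddings of different-class pairs are pushed at least `margin` apart. The function names, the toy 2-D embeddings, and the margin values are assumptions, not details from the paper.

```python
from math import sqrt
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(
    a: Sequence[float],
    b: Sequence[float],
    same_class: bool,
    margin: float = 1.0,
) -> float:
    """Margin-based pairwise contrastive loss for one pair of embeddings."""
    d = euclidean(a, b)
    if same_class:
        return d ** 2  # penalize any distance between same-class pairs
    # Different-class pairs are penalized only when closer than the margin.
    return max(0.0, margin - d) ** 2


# Hypothetical 2-D embeddings at distance 5 from each other.
emb_a, emb_b = [0.0, 0.0], [3.0, 4.0]
loss_same = contrastive_loss(emb_a, emb_b, same_class=True)
loss_diff_far = contrastive_loss(emb_a, emb_b, same_class=False, margin=1.0)
loss_diff_near = contrastive_loss(emb_a, emb_b, same_class=False, margin=6.0)
print(loss_same, loss_diff_far, loss_diff_near)
```

A Siamese setup would compute both embeddings with the same encoder and average this loss over sampled pairs, usually alongside a classification loss.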
Affiliation(s)
- Xuetong Yang
  - School of Software, Shandong University, Jinan 250101, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
- Junru Jin
  - School of Software, Shandong University, Jinan 250101, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
- Ruheng Wang
  - School of Software, Shandong University, Jinan 250101, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
- Zhongshen Li
  - School of Software, Shandong University, Jinan 250101, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
- Yu Wang
  - School of Software, Shandong University, Jinan 250101, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
- Leyi Wei
  - School of Software, Shandong University, Jinan 250101, China
  - Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
28
Xin R, Zhang F, Zheng J, Zhang Y, Yu C, Feng X. SDBA: Score Domain-Based Attention for DNA N4-Methylcytosine Site Prediction from Multiperspectives. J Chem Inf Model 2024; 64:2839-2853. [PMID: 37646411 DOI: 10.1021/acs.jcim.3c00688] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 09/01/2023]
Abstract
In tasks related to DNA sequence classification, choosing appropriate encoding methods is challenging. Some methods encode sequences based on prior knowledge, which limits the ability of the model to obtain multiperspective information from the sequences. We introduce SDBA (Score Domain-Based Attention), a new trainable ensemble method based on the attention mechanism. Unlike other methods, we feed task-independent encoding results into the models and dynamically ensemble features from different perspectives using the SDBA mechanism. This approach allows the model to acquire and weight sequence features on its own. SDBA is conceptually general and empirically powerful. It has achieved new state-of-the-art results on the benchmark data sets associated with DNA N4-methylcytosine site prediction.
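The idea of weighting features from several encodings with attention scores can be sketched generically as a softmax-weighted sum. This is not the SDBA mechanism itself, whose details the abstract does not give: the names `softmax` and `weighted_fusion`, the two toy "views", and the fixed scores are assumptions, and in a trained model the scores would be learned rather than supplied by hand.

```python
from math import exp
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(
    views: Sequence[Sequence[float]],
    scores: Sequence[float],
) -> List[float]:
    """Fuse equally sized feature vectors using softmax attention weights."""
    weights = softmax(scores)
    dim = len(views[0])
    return [sum(w * v[i] for w, v in zip(weights, views)) for i in range(dim)]


# Two hypothetical feature vectors for the same sequence, one per encoding.
views = [[1.0, 0.0], [0.0, 1.0]]
fused_equal = weighted_fusion(views, scores=[0.0, 0.0])
fused_skewed = weighted_fusion(views, scores=[10.0, 0.0])
print(fused_equal, fused_skewed)
```

Equal scores give each view the same weight, while a much larger score lets one view dominate the fused vector.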
Affiliation(s)
- Ruihao Xin
  - College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin 130000, P.R. China
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, P.R. China
- Fan Zhang
  - College of Information and Control Engineering, Jilin Institute of Chemical Technology, Jilin 130000, P.R. China
- Jiaxin Zheng
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, P.R. China
- Yangyi Zhang
  - University of Melbourne Centre for Cancer Research, Victorian Comprehensive Cancer Centre, University of Melbourne, Parkville, Victoria 3050, Australia
- Cuinan Yu
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, P.R. China
- Xin Feng
  - School of Science, Jilin Institute of Chemical Technology, Jilin 130000, P.R. China
  - State Key Laboratory of Inorganic Synthesis and Preparative Chemistry, College of Chemistry, Jilin University, Changchun 130012, P.R. China
29
Liang X, Zhao H, Wang J. MA-PEP: A novel anticancer peptide prediction framework with multimodal feature fusion based on attention mechanism. Protein Sci 2024; 33:e4966. [PMID: 38532681 PMCID: PMC10966354 DOI: 10.1002/pro.4966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/30/2023] [Revised: 01/30/2024] [Accepted: 03/06/2024] [Indexed: 03/28/2024]
Abstract
AntiCancer Peptides (ACPs) have emerged as promising therapeutic agents for cancer treatment. The time-consuming and costly nature of wet-lab discriminatory methods has spurred the development of various machine learning and deep learning-based ACP classification methods. Nonetheless, current methods encountered challenges in efficiently integrating features from various peptide modalities, thereby limiting a more comprehensive understanding of ACPs and further restricting the improvement of prediction model performance. In this study, we introduce a novel ACP prediction method, MA-PEP, which leverages multiple attention mechanisms for feature enhancement and fusion to improve ACP prediction. By integrating the enhanced molecular-level chemical features and sequence information of peptides, MA-PEP demonstrates superior prediction performance across several benchmark datasets, highlighting its efficacy in ACP prediction. Moreover, the visual analysis and case studies further demonstrate MA-PEP's reliable feature extraction capability and its promise in the realm of ACP exploration. The code and datasets for MA-PEP are available at https://github.com/liangxiaodata/MA-PEP.
Affiliation(s)
- Xiao Liang
  - School of Computer Science and Engineering, Central South University, Changsha, China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, China
- Haochen Zhao
  - School of Computer Science and Engineering, Central South University, Changsha, China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, China
- Jianxin Wang
  - School of Computer Science and Engineering, Central South University, Changsha, China
  - Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, China
30
Xu J, Ruan X, Yang J, Hu B, Li S, Hu J. SME-MFP: A novel spatiotemporal neural network with multiangle initialization embedding toward multifunctional peptides prediction. Comput Biol Chem 2024; 109:108033. [PMID: 38412804 DOI: 10.1016/j.compbiolchem.2024.108033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/11/2023] [Revised: 01/09/2024] [Accepted: 02/17/2024] [Indexed: 02/29/2024]
Abstract
As a promising alternative to conventional antibiotic drugs in the biomedical field, functional peptides have been widely used in disease treatment owing to their low toxicity, high absorption rate, and biological activity. Recently, several machine learning methods have been developed for functional peptide prediction. However, most of this research relies heavily on statistical features, and few methods consider multifunctional peptide identification. We therefore propose SME-MFP, a novel predictor for imbalanced multi-label functional peptide datasets. First, we employ physicochemical and evolutionary information to represent the initialization features of the peptide sequence from multiple perspectives. Second, the features are fused and passed into spatial feature extractors, where residual connections and a multiscale convolutional neural network extract more discriminative features from peptide sequences of different lengths. We also design AFT-based temporal feature extractors to fully capture the global interactions of the sequences. Finally, we devise a new loss to replace the traditional cross-entropy loss and address the class imbalance problem. The results show that our framework not only enhances the model's ability to capture sequence features effectively but also improves accuracy by 3.89% over existing methods on public peptide datasets.
Affiliation(s)
- Jing Xu
  - State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
- Xiaoli Ruan
  - State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China.
- Jing Yang
  - State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
- Bingqi Hu
  - State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
- Shaobo Li
  - State Key Laboratory of Public Big Data, Guizhou University, Guiyang 550025, China
- Jianjun Hu
  - Department of Computer Science and Engineering, University of South Carolina, Columbia 29208, USA
31
Lee B, Shin D. Contrastive learning for enhancing feature extraction in anticancer peptides. Brief Bioinform 2024; 25:bbae220. [PMID: 38725157 PMCID: PMC11082072 DOI: 10.1093/bib/bbae220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/08/2024] [Revised: 03/28/2024] [Accepted: 04/21/2024] [Indexed: 05/13/2024] Open
Abstract
Cancer, recognized as a primary cause of death worldwide, has profound health implications and incurs a substantial social burden. Numerous efforts have been made to develop cancer treatments, among which anticancer peptides (ACPs) are garnering recognition for their potential applications. While ACP screening is time-consuming and costly, in silico prediction tools provide a way to overcome these challenges. Herein, we present a deep learning model designed to screen ACPs using peptide sequences only. A contrastive learning technique was applied to enhance model performance, yielding better results than a model trained solely on binary classification loss. Furthermore, two independent encoders were employed as a replacement for data augmentation, a technique commonly used in contrastive learning. Our model achieved superior performance on five of six benchmark datasets against previous state-of-the-art models. As prediction tools advance, the potential in peptide-based cancer therapeutics increases, promising a brighter future for oncology research and patient care.
Affiliation(s)
- Byungjo Lee
  - Research Institute, National Cancer Center, 323, Ilsan-ro, Ilsandong-gu, Goyang, 10408, Republic of Korea
- Dongkwan Shin
  - Research Institute, National Cancer Center, 323, Ilsan-ro, Ilsandong-gu, Goyang, 10408, Republic of Korea
  - Department of Cancer Biomedical Science, National Cancer Center Graduate School of Cancer Science and Policy, 323, Ilsan-ro, Ilsandong-gu, Goyang, 10408, Republic of Korea
32
Zhang S, Zhao Y, Liang Y. AACFlow: an end-to-end model based on attention augmented convolutional neural network and flow-attention mechanism for identification of anticancer peptides. Bioinformatics 2024; 40:btae142. [PMID: 38452348 PMCID: PMC10973939 DOI: 10.1093/bioinformatics/btae142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/24/2023] [Revised: 03/01/2024] [Accepted: 03/06/2024] [Indexed: 03/09/2024] Open
Abstract
MOTIVATION: Anticancer peptides (ACPs) have natural cationic properties and can act on the anionic cell membrane of cancer cells to kill cancer cells. Therefore, ACPs have become a potential anticancer drug with good research value and prospect. RESULTS: In this article, we propose AACFlow, an end-to-end model for identification of ACPs based on deep learning. End-to-end models have more room to automatically adjust according to the data, making the overall fit better and reducing error propagation. The combination of attention augmented convolutional neural network (AAConv) and multi-layer convolutional neural network (CNN) forms a deep representation learning module, which is used to obtain global and local information on the sequence. Based on the concept of flow network, multi-head flow-attention mechanism is introduced to mine the deep features of the sequence to improve the efficiency of the model. On the independent test dataset, the ACC, Sn, Sp, and AUC values of AACFlow are 83.9%, 83.0%, 84.8%, and 0.892, respectively, which are 4.9%, 1.5%, 8.0%, and 0.016 higher than those of the baseline model. The MCC value is 67.85%. In addition, we visualize the features extracted by each module to enhance the interpretability of the model. Various experiments show that our model is more competitive in predicting ACPs.
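The ACC, Sn, Sp, and MCC values reported in this abstract are standard functions of a binary confusion matrix. The sketch below shows how they are computed; the counts passed in are hypothetical and are not AACFlow's actual confusion matrix, which the abstract does not give.

```python
from math import sqrt
from typing import Dict


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and MCC from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fn + tn + fp)
    sn = tp / (tp + fn)  # sensitivity: fraction of positives recovered
    sp = tn / (tn + fp)  # specificity: fraction of negatives recovered
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


# Hypothetical balanced test set: 100 ACPs and 100 non-ACPs.
m = binary_metrics(tp=83, fn=17, tn=85, fp=15)
print(m)
```

On a balanced test set ACC is the mean of Sn and Sp, which is consistent with the reported 83.9% lying midway between 83.0% and 84.8%.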
Affiliation(s)
- Shengli Zhang
  - School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
- Ya Zhao
  - School of Mathematics and Statistics, Xidian University, Xi'an 710071, China
- Yunyun Liang
  - School of Science, Xi'an Polytechnic University, Xi'an 710048, China
33
Liu M, Wu T, Li X, Zhu Y, Chen S, Huang J, Zhou F, Liu H. ACPPfel: Explainable deep ensemble learning for anticancer peptides prediction based on feature optimization. Front Genet 2024; 15:1352504. [PMID: 38487252 PMCID: PMC10937565 DOI: 10.3389/fgene.2024.1352504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Accepted: 02/19/2024] [Indexed: 03/17/2024] Open
Abstract
Background: Cancer is a significant global health problem that continues to cause a high number of deaths worldwide. Traditional cancer treatments often come with risks that can compromise the functionality of vital organs. As a potential alternative to these conventional therapies, anticancer peptides (ACPs) have garnered attention for their small size, high specificity, and reduced toxicity, making them a promising option for cancer treatment. Methods: However, identifying effective ACPs through wet-lab screening experiments is time-consuming and labor-intensive. To overcome this challenge, a deep ensemble learning method is constructed to predict ACPs in this study. To evaluate the reliability of the framework, four different datasets are used for training and testing. During model training, feature selection, feature dimensionality reduction, and optimization of the deep ensemble model are carried out. Finally, we explored the interpretability of the features that affected the final prediction results and built a web server platform to facilitate anticancer peptide prediction, which can be used by all researchers for further studies. This web server can be accessed at http://lmylab.online:5001/. Results: This study achieves an accuracy of 98.53% and an AUC (area under the curve) value of 0.9972 on the ACPfel dataset, and it also shows improvements on other datasets.
Affiliation(s)
- Mingyou Liu
  - School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
  - Engineering Research Center of Health Medicine Biotechnology of Guizhou Province, Guizhou Medical University, Guiyang, China
- Tao Wu
  - School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
- Xue Li
  - School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
  - Engineering Research Center of Health Medicine Biotechnology of Guizhou Province, Guizhou Medical University, Guiyang, China
- Yingxue Zhu
  - School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
  - Engineering Research Center of Health Medicine Biotechnology of Guizhou Province, Guizhou Medical University, Guiyang, China
- Sen Chen
  - School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
- Jian Huang
  - School of Life Science and Technology, University of Electronic Science and Technology, Chengdu, China
  - School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
- Fengfeng Zhou
  - School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
- Hongmei Liu
  - School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
  - Engineering Research Center of Health Medicine Biotechnology of Guizhou Province, Guizhou Medical University, Guiyang, China
  - College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
|
34
|
Shoombuatong W, Homdee N, Schaduangrat N, Chumnanpuen P. Leveraging a meta-learning approach to advance the accuracy of Nav blocking peptides prediction. Sci Rep 2024; 14:4463. [PMID: 38396246 PMCID: PMC10891130 DOI: 10.1038/s41598-024-55160-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Accepted: 02/21/2024] [Indexed: 02/25/2024] Open
Abstract
The voltage-gated sodium (Nav) channel is a crucial molecular component responsible for initiating and propagating action potentials. While the α subunit, forming the channel pore, plays a central role in this function, the complete physiological function of Nav channels relies on crucial interactions between the α subunit and auxiliary proteins, known as protein-protein interactions (PPI). Nav blocking peptides (NaBPs) have been recognized as promising alternative therapeutic agents for pain and itch. Although traditional experimental methods can precisely determine the effect and activity of NaBPs, they remain time-consuming and costly. Hence, machine learning (ML)-based methods capable of accurate in silico prediction of NaBPs are highly desirable. In this study, we develop an innovative meta-learning-based NaBP prediction method (MetaNaBP). MetaNaBP generates new feature representations by employing a wide range of sequence-based feature descriptors that cover multiple perspectives, in combination with powerful ML algorithms. Then, these feature representations were optimized to identify informative features using a two-step feature selection method. Finally, the selected informative features were applied to develop the final meta-predictor. To the best of our knowledge, MetaNaBP is the first meta-predictor for NaBP prediction. Experimental results demonstrated that MetaNaBP achieved an accuracy of 0.948 and a Matthews correlation coefficient of 0.898 on the independent test dataset, which were 5.79% and 11.76% higher, respectively, than those of the existing method. In addition, the discriminative power of our feature representations surpassed that of conventional feature descriptors on both the training and independent test datasets. We anticipate that MetaNaBP will be exploited for the large-scale prediction and analysis of NaBPs to narrow down the potential NaBPs.
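The two metrics reported in this abstract, accuracy and the Matthews correlation coefficient (MCC), can both be derived from the four cells of a binary confusion matrix. The following is a minimal sketch of the standard definitions; the function names are illustrative and the counts used in the usage note are hypothetical, not taken from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

For example, with hypothetical counts `tp=45, tn=50, fp=0, fn=5`, `accuracy` gives 0.95, while `mcc` is lower than 1 because the five missed positives reduce the correlation between predictions and labels.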
Affiliation(s)
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
- Nutta Homdee
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Nalini Schaduangrat
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Pramote Chumnanpuen
  - Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, 10900, Thailand
  - Omics Center for Agriculture, Bioresources, Food, and Health, Kasetsart University (OmiKU), Bangkok, 10900, Thailand
|
35
|
Charoenkwan P, Chumnanpuen P, Schaduangrat N, Shoombuatong W. Accelerating the identification of the allergenic potential of plant proteins using a stacked ensemble-learning framework. J Biomol Struct Dyn 2024:1-13. [PMID: 38385478 DOI: 10.1080/07391102.2024.2318482] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/07/2023] [Accepted: 02/08/2024] [Indexed: 02/23/2024]
Abstract
Plant-allergenic proteins (PAPs) have the potential to induce allergic reactions in certain individuals. While these proteins are generally innocuous for the majority of people, they can elicit an immune response in those with particular sensitivities. Thus, screening and prioritizing the allergenic potential of plant proteins is indispensable for the development of diagnostic tools, therapeutic interventions or medications to treat allergic reactions. However, investigating the allergenic potential of plant proteins based on experimental methods is costly and labour-intensive. Therefore, we develop StackPAP, a three-layer stacking ensemble framework for accurate large-scale identification of PAPs. In StackPAP, at the first layer, we conducted a comprehensive analysis of an extensive set of feature descriptors. Subsequently, we selected and fused five potential sequence-based feature descriptors, including amphiphilic pseudo-amino acid composition, dipeptide deviation from expected mean, amino acid composition, pseudo amino acid composition and dipeptide composition. Additionally, we applied an efficient genetic algorithm (GA-SAR) to determine informative feature sets. In the second layer, 12 powerful machine learning (ML) methods, in combination with all the informative feature sets, were employed to construct a pool of base classifiers. Finally, 13 potential base classifiers were selected using the GA-SAR method and combined to develop the final meta-classifier. Our experimental results revealed the promising prediction performance of StackPAP, with an accuracy, Matthews correlation coefficient and AUC of 0.984, 0.969 and 0.993, respectively, on the independent test dataset. In conclusion, both cross-validation and independent test results indicated the superior performance of StackPAP compared with several ML-based classifiers.
To accelerate the identification of the allergenicity of plant proteins, we developed a user-friendly web server for StackPAP (https://pmlabqsar.pythonanywhere.com/StackPAP). We anticipate that StackPAP will be an efficient and useful tool for rapidly screening PAPs from a vast number of plant proteins.
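Two of the sequence-based descriptors named in this abstract, amino acid composition (AAC) and dipeptide composition (DPC), are simple frequency vectors, and "fusing" descriptors is commonly done by concatenation. The sketch below shows the textbook definitions only; it is not StackPAP's code, the function names are hypothetical, and fusion by concatenation is an assumption.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def aac(seq: str) -> list[float]:
    """Amino acid composition: frequency of each standard residue (20 values)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dpc(seq: str) -> list[float]:
    """Dipeptide composition: frequency of each adjacent residue pair (400 values).

    Pairs containing a non-standard residue are not counted but still
    contribute to the denominator.
    """
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("sequence must have at least two residues")
    total = len(seq) - 1
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    return [counts[d] / total for d in DIPEPTIDES]


def fuse(seq: str) -> list[float]:
    """Concatenate AAC and DPC into one 420-dimensional feature vector."""
    return aac(seq) + dpc(seq)
```

Calling `fuse("ACDA")` yields a 420-element vector in which the AAC entry for `A` is 0.5 and the DPC entries for `AC`, `CD`, and `DA` are each one third.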
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Thailand
- Pramote Chumnanpuen
  - Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, Thailand
  - Omics Center for Agriculture, Bioresources, Food, and Health, Kasetsart University (OmiKU), Bangkok, Thailand
- Nalini Schaduangrat
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand
|
36
|
Zhong G, Deng L. ACPScanner: Prediction of Anticancer Peptides by Integrated Machine Learning Methodologies. J Chem Inf Model 2024; 64:1092-1104. [PMID: 38277774 DOI: 10.1021/acs.jcim.3c01860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2024]
Abstract
Novel therapeutic alternatives for cancer treatment are increasingly attracting global research attention. Although chemotherapy remains a primary clinical solution, it often results in significant side effects for patients. In recent years, anticancer peptides (ACPs) have emerged as promising candidates for highly specific anticancer drugs, and a number of computational approaches have been developed to identify ACPs. However, existing methods do not recognize specific types of anticancer function. In this article, we propose ACPScanner, an integrated approach that first distinguishes ACPs from non-ACPs and then predicts several specific activity types for potential ACPs. We build the feature space from sequence information, physicochemical properties, secondary structural information, and deep representation learning embeddings generated by artificial intelligence methods. Customized deep learning and statistical learning methods are combined to form an integral architecture for the comprehensive two-level prediction task. To the best of our knowledge, ACPScanner is the first approach for specific ACP activity prediction. The comparative evaluation illustrates that ACPScanner achieves competitive prediction performance in both prediction phases in independent tests. We establish a web server at http://acpscanner.denglab.org to provide convenient usage of ACPScanner and make the predictive framework, source code, and data sets publicly available.
Affiliation(s)
- Guolun Zhong
  - School of Computer Science and Engineering, Central South University, Changsha 410000, China
- Lei Deng
  - School of Computer Science and Engineering, Central South University, Changsha 410000, China
|
37
|
Karim T, Shaon MSH, Sultan MF, Hasan MZ, Kafy AA. ANNprob-ACPs: A novel anticancer peptide identifier based on probabilistic feature fusion approach. Comput Biol Med 2024; 169:107915. [PMID: 38171261 DOI: 10.1016/j.compbiomed.2023.107915] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/20/2023] [Revised: 12/28/2023] [Accepted: 12/29/2023] [Indexed: 01/05/2024]
Abstract
Anticancer peptides (ACPs) offer significant potential as cancer treatment drugs. Quickly identifying active compounds from protein sequences is crucial for healthcare and cancer treatment. In this paper, ANNprob-ACPs, a novel and effective model for detecting ACPs, is implemented based on nine feature encoding techniques: AAC, CC, W2V, DPC, PAAC, QSO, CTDC, CTDT, and CKSAAGP. After analyzing the performance of several machine learning models, the six best models were selected based on their overall performance across all evaluation metrics. The probability scores of each model were subsequently aggregated and used as input to our meta-model, called ANNprob-ACPs. Our model outperformed all others, indicating its potential for identifying ACPs. The results of this study showed notable improvement in 10-fold cross-validation and independent testing, with accuracies of 93.72% and 90.62%, respectively. Our proposed model, ANNprob-ACPs, outperformed existing approaches in terms of accuracy and effectiveness in discovering ACPs. Using SHAP, this study found that the physicochemical properties captured by QSO and the compositional properties captured by DPC, AAC, and PAAC are more impactful for the model's performance; these properties have a major impact on a drug's interactions and future discoveries. Consequently, this model is expected to be useful in future work and to detect ACPs with high probability. We developed a web server for ANNprob-ACPs, which is accessible at ANNprob-ACPs webserver.
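The probabilistic feature fusion described here, in which the probability scores of several base models become the input features of a meta-model, can be sketched as follows. Everything in this sketch is an assumption made for illustration: the two base "models" are toy scoring functions rather than trained classifiers, and a small logistic regression stands in for the paper's neural-network meta-model.

```python
import math


def cationic_fraction(seq: str) -> float:
    """Hypothetical base model 1: share of K/R residues, used as a score in [0, 1]."""
    return sum(seq.count(a) for a in "KR") / len(seq)


def leucine_fraction(seq: str) -> float:
    """Hypothetical base model 2: share of L residues, used as a score in [0, 1]."""
    return seq.count("L") / len(seq)


def probability_features(base_models, seq: str) -> list[float]:
    """Fuse base-model scores into one feature vector for the meta-model."""
    return [model(seq) for model in base_models]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_meta(X, y, lr: float = 0.5, epochs: int = 500):
    """Fit a logistic-regression meta-model with plain stochastic gradient descent."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict_meta(w, b, x) -> float:
    """Meta-model probability that the peptide belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

In practice the meta-features for the training set are usually produced out-of-fold so that the meta-model does not see scores from base models that were fitted on the same samples; the abstract does not say how this was handled.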
Affiliation(s)
- Tasmin Karim
  - Department of Computer Science & Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh; Health Informatics Research Lab, Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh.
- Md Shazzad Hossain Shaon
  - Department of Computer Science & Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh; Health Informatics Research Lab, Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh.
- Md Fahim Sultan
  - Department of Computer Science & Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh; Health Informatics Research Lab, Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh.
- Md Zahid Hasan
  - Department of Computer Science & Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh; Health Informatics Research Lab, Department of Computer Science and Engineering, Daffodil International University, Daffodil Smart City, Birulia, Dhaka, 1216, Bangladesh.
- Abdulla-Al Kafy
  - Department of Urban & Regional Planning, Rajshahi University of Engineering & Technology (RUET), Rajshahi, 6204, Bangladesh.
|
38
|
Li C, Jin K. Chemical Strategies towards the Development of Effective Anticancer Peptides. Curr Med Chem 2024; 31:1839-1873. [PMID: 37170992 DOI: 10.2174/0929867330666230426111157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2022] [Revised: 01/28/2023] [Accepted: 02/24/2023] [Indexed: 05/13/2023]
Abstract
Cancer is increasingly recognized as one of the primary causes of death and has become a multifaceted global health issue. Modern medical science has made significant advancements in the diagnosis and therapy of cancer over the past decade. The detrimental side effects, lack of efficacy, and multidrug resistance of conventional cancer therapies have created an urgent need for novel anticancer therapeutics or treatments with low cytotoxicity and drug resistance. Pharmaceutical groups have recognized the crucial role that peptide therapeutic agents can play in addressing unmet healthcare needs and how they can become valuable supplements or even preferable alternatives to biological therapies and small molecules. Anticancer peptides, as a vibrant therapeutic strategy against various cancer cells, have demonstrated considerable anticancer potential due to high specificity and selectivity, low toxicity, and the ability to target the surface of traditional "undruggable" proteins. This review describes research progress on anticancer peptides, focusing mainly on their discovery and modification along with their optimization and application in clinical practice.
Affiliation(s)
- Cuicui Li
  - Key Laboratory of Chemical Biology (Ministry of Education), Department of Medicinal Chemistry, School of Pharmacy, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, 250012, China
- Kang Jin
  - Key Laboratory of Chemical Biology (Ministry of Education), Department of Medicinal Chemistry, School of Pharmacy, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, 250012, China
|
39
|
La Paglia L, Vazzana M, Mauro M, Urso A, Arizza V, Vizzini A. Bioactive Molecules from the Innate Immunity of Ascidians and Innovative Methods of Drug Discovery: A Computational Approach Based on Artificial Intelligence. Mar Drugs 2023; 22:6. [PMID: 38276644 PMCID: PMC10817596 DOI: 10.3390/md22010006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/20/2023] [Revised: 12/12/2023] [Accepted: 12/17/2023] [Indexed: 01/27/2024] Open
Abstract
The study of bioactive molecules of marine origin has created an important bridge between biological knowledge and its applications in biotechnology and biomedicine. Current studies in different research fields, such as biomedicine, aim to discover marine molecules characterized by biological activities that can be used to produce potential drugs for human use. In recent decades, increasing attention has been paid to a particular group of marine invertebrates, the Ascidians, as they are a source of bioactive products. We describe omics data and computational methods relevant to identifying the mechanisms and processes of innate immunity underlying the biosynthesis of bioactive molecules, focusing on innovative computational approaches based on Artificial Intelligence. Since there is increasing attention on finding new solutions for a sustainable supply of bioactive compounds, we propose that a possible improvement in the biodiscovery pipeline might also come from the study and utilization of marine invertebrates' innate immunity.
Affiliation(s)
- Laura La Paglia
  - Istituto di Calcolo e Reti ad Alte Prestazioni–Consiglio Nazionale delle Ricerche, Via Ugo La Malfa 153, 90146 Palermo, Italy; (L.L.P.); (A.U.)
- Mirella Vazzana
  - Dipartimento di Scienze e Tecnologie Biologiche, Chimiche e Farmaceutiche–Università di Palermo, Via Archirafi 18, 90100 Palermo, Italy; (M.V.); (M.M.); (V.A.)
- Manuela Mauro
  - Dipartimento di Scienze e Tecnologie Biologiche, Chimiche e Farmaceutiche–Università di Palermo, Via Archirafi 18, 90100 Palermo, Italy; (M.V.); (M.M.); (V.A.)
- Alfonso Urso
  - Istituto di Calcolo e Reti ad Alte Prestazioni–Consiglio Nazionale delle Ricerche, Via Ugo La Malfa 153, 90146 Palermo, Italy; (L.L.P.); (A.U.)
- Vincenzo Arizza
  - Dipartimento di Scienze e Tecnologie Biologiche, Chimiche e Farmaceutiche–Università di Palermo, Via Archirafi 18, 90100 Palermo, Italy; (M.V.); (M.M.); (V.A.)
- Aiti Vizzini
  - Dipartimento di Scienze e Tecnologie Biologiche, Chimiche e Farmaceutiche–Università di Palermo, Via Archirafi 18, 90100 Palermo, Italy; (M.V.); (M.M.); (V.A.)
|
40
|
Menotti L, Vannini A. Oncolytic Viruses in the Era of Omics, Computational Technologies, and Modeling: Thesis, Antithesis, and Synthesis. Int J Mol Sci 2023; 24:17378. [PMID: 38139207 PMCID: PMC10743452 DOI: 10.3390/ijms242417378] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/09/2023] [Revised: 12/05/2023] [Accepted: 12/08/2023] [Indexed: 12/24/2023] Open
Abstract
Oncolytic viruses (OVs) are the frontier therapy for refractory cancers, especially in integration with immunomodulation strategies. In cancer immunovirotherapy, the many available "omics" and systems biology technologies rapidly generate a challengingly large amount of data, in which apparently clashing information mirrors the complexity of individual clinical situations and of the OV used. In this review, we present and discuss how big data analysis, on the one hand, and simulation, modeling, and computational technologies, on the other, currently provide invaluable support to interpret and integrate "omic" information and drive novel synthetic biology and personalized OV engineering approaches for effective immunovirotherapy. Altogether, these tools, possibly aided in the future by artificial intelligence as well, will allow for the blending of the information into OV recombinants able to achieve tumor clearance in a patient-tailored way. Various endeavors toward the envisioned "synthesis" of turning OVs into personalized theranostic agents are presented.
Affiliation(s)
- Laura Menotti
  - Department of Pharmacy and Biotechnology, University of Bologna, 40126 Bologna, Italy
|
41
|
Wang Z, Meng J, Li H, Xia S, Wang Y, Luan Y. PAMPred: A hierarchical evolutionary ensemble framework for identifying plant antimicrobial peptides. Comput Biol Med 2023; 166:107545. [PMID: 37806057 DOI: 10.1016/j.compbiomed.2023.107545] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/02/2023] [Revised: 09/04/2023] [Accepted: 09/28/2023] [Indexed: 10/10/2023]
Abstract
Antimicrobial peptides (AMPs) play a crucial role in plant immune regulation, growth, and development, and have attracted significant attention in recent years. As wet-lab experiments are laborious and cost-prohibitive, it is indispensable to develop computational methods to discover novel plant AMPs accurately. In this study, we presented a hierarchical evolutionary ensemble framework, named PAMPred, which consisted of a multi-level heterogeneous architecture to identify plant AMPs. Specifically, to address the existing class imbalance problem, a cluster-based resampling method was adopted to build multiple balanced subsets. Then, several peptide features, including sequence information-based and physicochemical properties-based features, were fed into different types of basic learners to increase the ensemble diversity. To boost the predictive capability of PAMPred, an improved particle swarm optimization (PSO) algorithm and a dynamic ensemble pruning strategy were used to optimize the weights at different levels adaptively. Furthermore, extensive ten-fold cross-validation and independent testing results demonstrated that PAMPred achieved excellent prediction performance and generalization ability and outperformed the state-of-the-art methods. The results also indicated that the proposed method could serve as an effective auxiliary tool to identify plant AMPs, which would be conducive to exploring the immune regulatory mechanism of plants.
Affiliation(s)
- Zhaowei Wang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
- Jun Meng
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China.
- Haibin Li
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
- Shihao Xia
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
- Yu Wang
  - School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
- Yushi Luan
  - School of Bioengineering, Dalian University of Technology, Dalian, Liaoning 116024, China
|
42
|
Sun M, Hu H, Pang W, Zhou Y. ACP-BC: A Model for Accurate Identification of Anticancer Peptides Based on Fusion Features of Bidirectional Long Short-Term Memory and Chemically Derived Information. Int J Mol Sci 2023; 24:15447. [PMID: 37895128 PMCID: PMC10607064 DOI: 10.3390/ijms242015447] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 08/12/2023] [Revised: 09/10/2023] [Accepted: 10/20/2023] [Indexed: 10/29/2023] Open
Abstract
Anticancer peptides (ACPs) have been proven to possess potent anticancer activities. Although computational methods have emerged for rapid ACP identification, their accuracy still needs improvement. In this study, we propose a model called ACP-BC, a three-channel end-to-end model that utilizes various combinations of data augmentation techniques. In the first channel, features are extracted from the raw sequence using a bidirectional long short-term memory network. In the second channel, the entire sequence is converted into a chemical molecular formula, which is further simplified using Simplified Molecular Input Line Entry System (SMILES) notation to obtain deep abstract features through bidirectional encoder representations from transformers (BERT). In the third channel, we manually selected four effective features: dipeptide composition, binary profile feature, k-mer sparse matrix, and pseudo amino acid composition. Notably, the application of chemical BERT in predicting ACPs is novel and successfully integrated into our model. To validate the performance of our model, we selected two benchmark datasets, ACPs740 and ACPs240. ACP-BC achieved prediction accuracies of 87% and 90% on these two datasets, respectively, representing improvements of 1.3% and 7% compared to existing state-of-the-art methods on these datasets. Systematic comparative experiments thus show that ACP-BC can effectively identify anticancer peptides.
Affiliation(s)
- Mingwei Sun
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China; (M.S.); (H.H.)
- Haoyuan Hu
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China; (M.S.); (H.H.)
- Wei Pang
  - School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh EH14 4AS, UK
- You Zhou
  - Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China; (M.S.); (H.H.)
  - College of Software, Jilin University, Changchun 130012, China
|
43
|
Ma C, Wolfinger R. A prediction model for blood-brain barrier penetrating peptides based on masked peptide transformers with dynamic routing. Brief Bioinform 2023; 24:bbad399. [PMID: 37985456 DOI: 10.1093/bib/bbad399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 09/26/2023] [Accepted: 10/17/2023] [Indexed: 11/22/2023] Open
Abstract
Blood-brain barrier penetrating peptides (BBBPs) are short peptide sequences that possess the ability to traverse the selective blood-brain interface, making them valuable drug candidates or carriers for various payloads. However, the in vivo or in vitro validation of BBBPs is resource-intensive and time-consuming, driving the need for accurate in silico prediction methods. Unfortunately, the scarcity of experimentally validated BBBPs hinders the efficacy of current machine-learning approaches in generating reliable predictions. In this paper, we present DeepB3P3, a novel framework for BBBPs prediction. Our contribution encompasses four key aspects. Firstly, we propose a novel deep learning model consisting of a transformer encoder layer, a convolutional network backbone, and a capsule network classification head. This integrated architecture effectively learns representative features from peptide sequences. Secondly, we introduce masked peptides as a powerful data augmentation technique to compensate for small training set sizes in BBBP prediction. Thirdly, we develop a novel threshold-tuning method to handle imbalanced data by approximating the optimal decision threshold using the training set. Lastly, DeepB3P3 provides an accurate estimation of the uncertainty level associated with each prediction. Through extensive experiments, we demonstrate that DeepB3P3 achieves state-of-the-art accuracy of up to 98.31% on a benchmarking dataset, solidifying its potential as a promising computational tool for the prediction and discovery of BBBPs.
Affiliation(s)
- Chunwei Ma
  - JMP Statistical Discovery, LLC, Cary, 27513, NC, USA
  - Department of Computer Science and Engineering, University at Buffalo, Buffalo, 14260, NY, USA
|
44
|
Charoenkwan P, Kongsompong S, Schaduangrat N, Chumnanpuen P, Shoombuatong W. TIPred: a novel stacked ensemble approach for the accelerated discovery of tyrosinase inhibitory peptides. BMC Bioinformatics 2023; 24:356. [PMID: 37735626 PMCID: PMC10512532 DOI: 10.1186/s12859-023-05463-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/30/2023] [Accepted: 09/01/2023] [Indexed: 09/23/2023] Open
Abstract
BACKGROUND Tyrosinase is an enzyme involved in melanin production in the skin. Several hyperpigmentation disorders involve the overproduction of melanin and instability of tyrosinase activity resulting in darker, discolored patches on the skin. Therefore, discovering tyrosinase inhibitory peptides (TIPs) is of great significance for basic research and clinical treatments. However, the identification of TIPs using experimental methods is generally cost-ineffective and time-consuming. RESULTS Herein, a stacked ensemble learning approach, called TIPred, is proposed for the accurate and quick identification of TIPs by using sequence information. TIPred explored a comprehensive set of various baseline models derived from well-known machine learning (ML) algorithms and heterogeneous feature encoding schemes from multiple perspectives, such as chemical structure properties, physicochemical properties, and composition information. Subsequently, 130 baseline models were trained and optimized to create new probabilistic features. Finally, the feature selection approach was utilized to determine the optimal feature vector for developing TIPred. Both tenfold cross-validation and independent test methods were employed to assess the predictive capability of TIPred by using the stacking strategy. Experimental results showed that TIPred significantly outperformed the state-of-the-art method in terms of the independent test, with an accuracy of 0.923, MCC of 0.757 and an AUC of 0.977. CONCLUSIONS The proposed TIPred approach could be a valuable tool for rapidly discovering novel TIPs and effectively identifying potential TIP candidates for follow-up experimental validation. Moreover, an online webserver of TIPred is publicly available at http://pmlabstack.pythonanywhere.com/TIPred .
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Sasikarn Kongsompong
  - Interdisciplinary Graduate Program in Bioscience, Faculty of Science, Kasetsart University, Bangkok, 10900, Thailand
- Nalini Schaduangrat
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Pramote Chumnanpuen
  - Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, 10900, Thailand.
  - Omics Center for Agriculture, Bioresources, Food, and Health, Kasetsart University (OmiKU), Bangkok, 10900, Thailand.
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
|
45
|
Tao H, Shan S, Fu H, Zhu C, Liu B. An Augmented Sample Selection Framework for Prediction of Anticancer Peptides. Molecules 2023; 28:6680. [PMID: 37764455 PMCID: PMC10535447 DOI: 10.3390/molecules28186680] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/09/2023] [Revised: 09/14/2023] [Accepted: 09/15/2023] [Indexed: 09/29/2023] Open
Abstract
Anticancer peptides (ACPs) have promising prospects for cancer treatment. Traditional ACP identification experiments have the limitations of low efficiency and high cost. In recent years, data-driven deep learning techniques have shown significant potential for ACP prediction. However, data-driven prediction models rely heavily on extensive training data. Furthermore, the current publicly accessible ACP dataset is limited in size, leading to inadequate model generalization. While data augmentation effectively expands dataset size, existing techniques for augmenting ACP data often generate noisy samples, adversely affecting prediction performance. Therefore, this paper proposes a novel augmented sample selection framework for the prediction of anticancer peptides (ACPs-ASSF). First, the prediction model is trained using raw data. Then, the augmented samples generated using the data augmentation technique are fed into the trained model to compute pseudo-labels and estimate the uncertainty of the model prediction. Finally, samples with low uncertainty, high confidence, and pseudo-labels consistent with the original labels are selected and incorporated into the training set to retrain the model. The evaluation results for the ACP240 and ACP740 datasets show that ACPs-ASSF achieved accuracy improvements of up to 5.41% and 5.68%, respectively, compared to the traditional data augmentation method.
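The three selection criteria named in this abstract (low uncertainty, high confidence, pseudo-label consistent with the original label) can be illustrated with a minimal sketch. The abstract does not say how uncertainty is estimated or which thresholds are used, so everything specific here is an assumption: uncertainty is taken as the variance of the positive-class probability over repeated stochastic predictions, and `select_augmented`, `conf_threshold`, and `var_threshold` are hypothetical names and values.

```python
from statistics import mean, pvariance


def select_augmented(samples, conf_threshold=0.9, var_threshold=0.01):
    """Return ids of augmented samples that pass all three checks.

    Each sample carries the label inherited from its source peptide and a
    list of positive-class probabilities from repeated stochastic predictions
    (an assumed stand-in for the paper's uncertainty estimate).
    """
    kept = []
    for sample in samples:
        probs = sample["probs"]
        p = mean(probs)
        pseudo_label = 1 if p >= 0.5 else 0
        confidence = max(p, 1.0 - p)      # confidence in the predicted class
        uncertainty = pvariance(probs)    # spread across repeated predictions
        if (pseudo_label == sample["label"]
                and confidence >= conf_threshold
                and uncertainty <= var_threshold):
            kept.append(sample["id"])
    return kept


# Toy augmented samples (made-up numbers, for illustration only).
augmented = [
    {"id": "a1", "label": 1, "probs": [0.95, 0.97, 0.96]},  # passes all checks
    {"id": "a2", "label": 1, "probs": [0.20, 0.15, 0.10]},  # pseudo-label disagrees
    {"id": "a3", "label": 0, "probs": [0.00, 0.00, 0.25]},  # variance too high
    {"id": "a4", "label": 0, "probs": [0.40, 0.42, 0.41]},  # confidence too low
]
selected = select_augmented(augmented)
print(selected)  # ['a1']
```

In the workflow the abstract describes, the selected samples would then be added to the training set and the model retrained; that step is omitted here.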
Affiliation(s)
- Huawei Tao
  - Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China
  - Henan Engineering Laboratory of Grain IOT Technology, Henan University of Technology, Zhengzhou 450001, China
- Shuai Shan
  - Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China
  - Henan Engineering Laboratory of Grain IOT Technology, Henan University of Technology, Zhengzhou 450001, China
- Hongliang Fu
  - Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China
  - Henan Engineering Laboratory of Grain IOT Technology, Henan University of Technology, Zhengzhou 450001, China
- Chunhua Zhu
  - Key Laboratory of Food Information Processing and Control, Ministry of Education, Henan University of Technology, Zhengzhou 450001, China
  - Henan Engineering Laboratory of Grain IOT Technology, Henan University of Technology, Zhengzhou 450001, China
- Boye Liu
  - College of Food Science and Engineering, Henan University of Technology, Zhengzhou 450001, China
|
46
|
Cui Z, Wang SG, He Y, Chen ZH, Zhang QH. DeepTPpred: A Deep Learning Approach With Matrix Factorization for Predicting Therapeutic Peptides by Integrating Length Information. IEEE J Biomed Health Inform 2023; 27:4611-4622. [PMID: 37368803 DOI: 10.1109/jbhi.2023.3290014] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 06/29/2023]
Abstract
The abuse of traditional antibiotics has led to increased resistance of bacteria and viruses. Efficient therapeutic peptide prediction is critical for peptide drug discovery. However, most of the existing methods only make effective predictions for one class of therapeutic peptides. It is worth noting that currently no predictive method considers sequence length information as a distinct feature of therapeutic peptides. In this article, a novel deep learning approach with matrix factorization for predicting therapeutic peptides (DeepTPpred) by integrating length information is proposed. The matrix factorization layer can learn the potential features of the encoded sequence through the mechanism of first compression and then restoration. The length features of the therapeutic peptide sequences are embedded with the encoded amino acid sequences. To automatically learn therapeutic peptide predictions, these latent features are input into the neural networks with self-attention mechanism. On eight therapeutic peptide datasets, DeepTPpred achieved excellent prediction results. Based on these datasets, we first integrated eight datasets to obtain a full therapeutic peptide integration dataset. Then, we obtained two functional integration datasets based on the functional similarity of the peptides. Finally, we also conducted experiments on the latest versions of the ACP and CPP datasets. Overall, the experimental results show that our work is effective for the identification of therapeutic peptides.
|
47
|
Chen S, Liao Y, Zhao J, Bin Y, Zheng C. PACVP: Prediction of Anti-Coronavirus Peptides Using a Stacking Learning Strategy With Effective Feature Representation. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:3106-3116. [PMID: 37022025 DOI: 10.1109/tcbb.2023.3238370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Due to the global outbreak of COVID-19 and its variants, antiviral peptides with anti-coronavirus activity (ACVPs) represent promising new drug candidates for the treatment of coronavirus infection. At present, several computational tools have been developed to identify ACVPs, but the overall prediction performance is still insufficient for practical therapeutic application. In this study, we constructed an efficient and reliable prediction model PACVP (Prediction of Anti-CoronaVirus Peptides) for identifying ACVPs based on effective feature representation and a two-layer stacking learning framework. In the first layer, we use nine feature encoding methods with different feature representation angles to characterize the rich sequence information and fuse them into a feature matrix. Secondly, data normalization and unbalanced data processing are carried out. Next, 12 baseline models are constructed by combining three feature selection methods and four machine learning classification algorithms. In the second layer, we input the optimal probability features into the logistic regression algorithm (LR) to train the final model PACVP. The experiments show that PACVP achieves favorable prediction performance on the independent test dataset, with ACC of 0.9208 and AUC of 0.9465. We hope that PACVP will become a useful method for identifying, annotating and characterizing novel ACVPs.
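The second layer described in this abstract, a logistic regression trained on probability features from first-layer baseline models, can be sketched in plain Python. This is an illustration under stated assumptions and not the PACVP implementation: the probability features and labels below are made up, only three baseline outputs are used instead of the selected subset of the 12 models, and `fit_logistic` and `predict_proba` are hypothetical helper names using simple batch gradient descent.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Second-layer probability for one row of first-layer outputs."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on the log loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(X, y):
            err = predict_proba(weights, bias, features) - label
            for j, x in enumerate(features):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Each row: probabilities emitted by three hypothetical baseline models.
X = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.2, 0.1, 0.3], [0.1, 0.3, 0.2]]
y = [1, 1, 0, 0]  # 1 = ACVP, 0 = non-ACVP (toy labels)

weights, bias = fit_logistic(X, y)
predictions = [predict_proba(weights, bias, row) > 0.5 for row in X]
print(predictions)
```

The design point is that the meta-learner sees only the baseline models' probabilities, not the raw sequence encodings, so it learns how much to trust each first-layer model.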
|
48
|
Hu W, Guan L, Li M. Prediction of DNA Methylation based on Multi-dimensional feature encoding and double convolutional fully connected convolutional neural network. PLoS Comput Biol 2023; 19:e1011370. [PMID: 37639434 PMCID: PMC10461834 DOI: 10.1371/journal.pcbi.1011370] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Accepted: 07/18/2023] [Indexed: 08/31/2023] Open
Abstract
DNA methylation is critically important to the regulation of gene expression by affecting the stability of DNA and changing the structure of chromosomes. DNA methylation modification sites should be identified, which lays a solid basis for gaining more insights into their biological functions. Existing machine learning-based methods of predicting DNA methylation have not fully exploited the hidden multidimensional information in DNA gene sequences, such that the prediction accuracy of models is significantly limited. Besides, most models have been built in terms of a single methylation type. To address the above-mentioned issues, a deep learning-based method was proposed in this study for DNA methylation site prediction, termed the MEDCNN model. The MEDCNN model is capable of extracting feature information from gene sequences in three dimensions (i.e., positional information, biological information, and chemical information). Moreover, the proposed method employs a convolutional neural network model with double convolutional layers and double fully connected layers while iteratively updating the gradient descent algorithm using the cross-entropy loss function to increase the prediction accuracy of the model. Besides, the MEDCNN model can predict different types of DNA methylation sites. As indicated by the experimental results, the deep learning method based on coding from multiple dimensions outperformed single coding methods, and the MEDCNN model was highly applicable and outperformed existing models in predicting DNA methylation between different species. As revealed by the above-described findings, the MEDCNN model can be effective in predicting DNA methylation sites.
Affiliation(s)
- Wenxing Hu
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou, Jiangxi, China
- Lixin Guan
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou, Jiangxi, China
- Mengshan Li
  - College of Physics and Electronic Information, Gannan Normal University, Ganzhou, Jiangxi, China
|
49
|
Charoenkwan P, Schaduangrat N, Shoombuatong W. StackTTCA: a stacking ensemble learning-based framework for accurate and high-throughput identification of tumor T cell antigens. BMC Bioinformatics 2023; 24:301. [PMID: 37507654 PMCID: PMC10386778 DOI: 10.1186/s12859-023-05421-x] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 04/11/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023] Open
Abstract
BACKGROUND: The identification of tumor T cell antigens (TTCAs) is crucial for providing insights into their functional mechanisms and utilizing their potential in anticancer vaccine development. In this context, TTCAs are highly promising. Meanwhile, experimental technologies for discovering and characterizing new TTCAs are expensive and time-consuming. Although many machine learning (ML)-based models have been proposed for identifying new TTCAs, there is still a need to develop a robust model that can achieve higher rates of accuracy and precision. RESULTS: In this study, we propose a new stacking ensemble learning-based framework, termed StackTTCA, for accurate and large-scale identification of TTCAs. Firstly, we constructed 156 different baseline models by using 12 different feature encoding schemes and 13 popular ML algorithms. Secondly, these baseline models were trained and employed to create a new probabilistic feature vector. Finally, the optimal probabilistic feature vector was determined based on the feature selection strategy and then used for the construction of our stacked model. Comparative benchmarking experiments indicated that StackTTCA clearly outperformed several ML classifiers and the existing methods in terms of the independent test, with an accuracy of 0.932 and Matthews correlation coefficient of 0.866. CONCLUSIONS: In summary, the proposed stacking ensemble learning-based framework of StackTTCA could help to precisely and rapidly identify true TTCAs for follow-up experimental verification. In addition, we developed an online web server (http://2pmlab.camt.cmu.ac.th/StackTTCA) to maximize user convenience for high-throughput screening of novel TTCAs.
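The first two steps in this abstract, crossing 12 feature encodings with 13 ML algorithms to obtain 156 baseline models and then collecting their outputs into a probabilistic feature vector, can be sketched as follows. This is only an illustration: the abstract gives the counts but not the names, so the encoding and algorithm names are placeholders, and `make_toy_model` is a hypothetical stand-in that returns an arbitrary value in [0, 1] rather than a trained classifier.

```python
from itertools import product

# Placeholder names: the abstract states only the counts (12 and 13).
encodings = [f"encoding_{i}" for i in range(1, 13)]
algorithms = [f"algorithm_{j}" for j in range(1, 14)]

# One baseline model per (encoding, algorithm) pair: 12 * 13 = 156.
baseline_grid = list(product(encodings, algorithms))


def make_toy_model(encoding_index, algorithm_index):
    """Stand-in for a trained baseline model (assumption, not a real classifier)."""
    residue = "ACDEFGHIKLMN"[encoding_index]   # arbitrary residue per encoding
    weight = (algorithm_index + 1) / 13        # arbitrary scale per algorithm

    def predict(sequence):
        # Pseudo-probability in [0, 1] based on residue frequency.
        return weight * sequence.count(residue) / len(sequence)

    return predict


toy_models = [make_toy_model(i, j) for i, j in product(range(12), range(13))]


def probabilistic_features(sequence, models):
    """Collect one predicted probability per baseline model."""
    return [model(sequence) for model in models]


features = probabilistic_features("ACDKLLKA", toy_models)
print(len(baseline_grid), len(features))  # 156 156
```

The remaining steps named in the abstract, feature selection over this 156-dimensional vector and training the final stacked model on the selected entries, are not shown.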
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
- Nalini Schaduangrat
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Watshara Shoombuatong
  - Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
|
50
|
Xu J, Li F, Li C, Guo X, Landersdorfer C, Shen HH, Peleg AY, Li J, Imoto S, Yao J, Akutsu T, Song J. iAMPCN: a deep-learning approach for identifying antimicrobial peptides and their functional activities. Brief Bioinform 2023; 24:bbad240. [PMID: 37369638 PMCID: PMC10359087 DOI: 10.1093/bib/bbad240] [Citation(s) in RCA: 37] [Impact Index Per Article: 18.5] [Received: 10/26/2022] [Revised: 05/30/2023] [Accepted: 06/08/2023] [Indexed: 06/29/2023] Open
Abstract
Antimicrobial peptides (AMPs) are short peptides that play crucial roles in diverse biological processes and have various functional activities against target organisms. Due to the abuse of chemical antibiotics and microbial pathogens' increasing resistance to antibiotics, AMPs have the potential to be alternatives to antibiotics. As such, the identification of AMPs has become a widely discussed topic. A variety of computational approaches have been developed to identify AMPs based on machine learning algorithms. However, most of them are not capable of predicting the functional activities of AMPs, and those predictors that can specify activities only focus on a few of them. In this study, we first surveyed 10 predictors that can identify AMPs and their functional activities in terms of the features they employed and the algorithms they utilized. Then, we constructed comprehensive AMP datasets and proposed a new deep learning-based framework, iAMPCN (identification of AMPs based on CNNs), to identify AMPs and their related 22 functional activities. Our experiments demonstrate that iAMPCN significantly improved the prediction performance of AMPs and their corresponding functional activities based on four types of sequence features. Benchmarking experiments on the independent test datasets showed that iAMPCN outperformed a number of state-of-the-art approaches for predicting AMPs and their functional activities. Furthermore, we analyzed the amino acid preferences of different AMP activities and evaluated the model on datasets of varying sequence redundancy thresholds. To facilitate the community-wide identification of AMPs and their corresponding functional types, we have made the source codes of iAMPCN publicly available at https://github.com/joy50706/iAMPCN/tree/master. We anticipate that iAMPCN can be explored as a valuable tool for identifying potential AMPs with specific functional activities for further experimental validation.
Affiliation(s)
- Jing Xu
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
- Fuyi Li
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
  - The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, Melbourne, VIC 3800, Australia
- Chen Li
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
- Xudong Guo
  - College of Information Engineering, Northwest A&F University, Shaanxi 712100, China
- Cornelia Landersdorfer
  - Monash Institute of Pharmaceutical Sciences, Monash University, Melbourne, VIC 3800, Australia
- Hsin-Hui Shen
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Department of Materials Science and Engineering, Faculty of Engineering, Monash University, Clayton, VIC, 3800, Australia
- Anton Y Peleg
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Department of Infectious Diseases, Alfred Hospital, Alfred Health, Melbourne, Victoria, Australia
- Jian Li
  - Monash Biomedicine Discovery Institute and Department of Microbiology, Monash University, Melbourne, VIC 3800, Australia
- Seiya Imoto
  - Division of Health Medical Intelligence, Human Genome Center, Institute of Medical Science, The University of Tokyo, Minato-ku, Tokyo, Japan
  - Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Bunkyo-ku, Tokyo, Japan
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji 611-0011, Japan
- Jiangning Song
  - Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia
  - Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji 611-0011, Japan
|