1
|
Tan JZE, Wee J, Gong X, Xia K. Topology-Enhanced Machine Learning Model (Top-ML) for Anticancer Peptide Prediction. J Chem Inf Model 2025; 65:4232-4242. [PMID: 40229641 DOI: 10.1021/acs.jcim.5c00476] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 04/16/2025]
Abstract
Recently, therapeutic peptides have demonstrated great promise for cancer treatment. To explore powerful anticancer peptides, artificial intelligence (AI)-based approaches have been developed to systematically screen potential candidates. However, the lack of efficient featurization of peptides has become a bottleneck for these machine-learning models. In this paper, we propose a topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction. Our Top-ML employs peptide topological features derived from its sequence "connection" information characterized by spectral descriptors. Our Top-ML model, employing an Extra-Trees classifier, has been validated on the AntiCP 2.0 and mACPpred 2.0 benchmark data sets, achieving state-of-the-art performance or results comparable to existing deep learning models, while providing greater interpretability. Our results highlight the potential of leveraging novel topology-based featurization to accelerate the identification of anticancer peptides.
Collapse
Affiliation(s)
- Joshua Zhi En Tan
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
| | - JunJie Wee
- Department of Mathematics, Michigan State University, East Lansing, Michigan 48824, United States
| | - Xue Gong
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
| | - Kelin Xia
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore 637371, Singapore
| |
Collapse
|
2
|
Asim MN, Asif T, Mehmood F, Dengel A. Peptide classification landscape: An in-depth systematic literature review on peptide types, databases, datasets, predictors architectures and performance. Comput Biol Med 2025; 188:109821. [PMID: 39987697 DOI: 10.1016/j.compbiomed.2025.109821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2024] [Revised: 02/03/2025] [Accepted: 02/05/2025] [Indexed: 02/25/2025]
Abstract
Peptides are gaining significant attention in diverse fields such as the pharmaceutical market has seen a steady rise in peptide-based therapeutics over the past six decades. Peptides have been utilized in the development of distinct applications including inhibitors of SARS-COV-2 and treatments for conditions like cancer and diabetes. Distinct types of peptides possess unique characteristics, and development of peptide-specific applications require the discrimination of one peptide type from others. To the best of our knowledge, approximately 230 Artificial Intelligence (AI) driven applications have been developed for 22 distinct types of peptides, yet there remains significant room for development of new predictors. A Comprehensive review addresses the critical gap by providing a consolidated platform for the development of AI-driven peptide classification applications. This paper offers several key contributions, including presenting the biological foundations of 22 unique peptide types and categorizes them into four main classes: Regulatory, Therapeutic, Nutritional, and Delivery Peptides. It offers an in-depth overview of 47 databases that have been used to develop peptide classification benchmark datasets. It summarizes details of 288 benchmark datasets that are used in development of diverse types AI-driven peptide classification applications. It provides a detailed summary of 197 sequence representation learning methods and 94 classifiers that have been used to develop 230 distinct AI-driven peptide classification applications. Across 22 distinct types peptide classification tasks related to 288 benchmark datasets, it demonstrates performance values of 230 AI-driven peptide classification applications. It summarizes experimental settings and various evaluation measures that have been employed to assess the performance of AI-driven peptide classification applications. The primary focus of this manuscript is to consolidate scattered information into a single comprehensive platform. This resource will greatly assist researchers who are interested in developing new AI-driven peptide classification applications.
Collapse
Affiliation(s)
- Muhammad Nabeel Asim
- German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany; Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany.
| | - Tayyaba Asif
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
| | - Faiza Mehmood
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; Institute of Data Sciences, University of Engineering and Technology, Lahore, Pakistan
| | - Andreas Dengel
- German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany; Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany
| |
Collapse
|
3
|
Cao J, Zhou W, Yu Q, Ji J, Zhang J, He S, Zhu Z. MDTL-ACP: Anticancer Peptides Prediction Based on Multi-Domain Transfer Learning. IEEE J Biomed Health Inform 2025; 29:1714-1725. [PMID: 38147420 DOI: 10.1109/jbhi.2023.3347138] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/28/2023]
Abstract
Anticancer peptides (ACPs) have emerged as one of the most promising therapeutic agents for cancer treatment. They are bioactive peptides featuring broad-spectrum activity and low drug-resistance. The discovery of ACPs via traditional biochemical methods is laborious and costly. Accordingly, various computational methods have been developed to facilitate the discovery of ACPs. However, the data resources and knowledge of ACPs are still very scarce, and only a few of them are clinically verified, which limits the competence of computational methods. To address this issue, in this article, we propose an ACP prediction model based on multi-domain transfer learning, namely MDTL-ACP, to discriminate novel ACPs from plentiful inactive peptides. In particular, we collect abundant antimicrobial peptides (AMPs) from four well-studied peptide domains and extract their inherent features as the input of MDTL-ACP. The features learned from multiple source domains of AMPs are then transferred into the target prediction task of ACPs via artificial neural network-based shared-extractor and task-specific classifiers in MDTL-ACP. The knowledge captured in the transferred features enhances the prediction of ACPs in the target domain. Experimental results demonstrate that MDTL-ACP can outperform the traditional and state-of-the-art ACP prediction methods.
Collapse
|
4
|
Yue J, Li T, Xu J, Chen Z, Li Y, Liang S, Liu Z, Wang Y. Discovery of anticancer peptides from natural and generated sequences using deep learning. Int J Biol Macromol 2025; 290:138880. [PMID: 39706427 DOI: 10.1016/j.ijbiomac.2024.138880] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/23/2024] [Revised: 12/10/2024] [Accepted: 12/16/2024] [Indexed: 12/23/2024]
Abstract
Anticancer peptides (ACPs) demonstrate significant potential in clinical cancer treatment due to their ability to selectively target and kill cancer cells. In recent years, numerous artificial intelligence (AI) algorithms have been developed. However, many predictive methods lack sufficient wet lab validation, thereby constraining the progress of models and impeding the discovery of novel ACPs. This study proposes a comprehensive research strategy by introducing CNBT-ACPred, an ACP prediction model based on a three-channel deep learning architecture, supported by extensive in vitro and in vivo experiments. CNBT-ACPred achieved an accuracy of 0.9554 and a Matthews Correlation Coefficient (MCC) of 0.8602. Compared to existing excellent models, CNBT-ACPred increased accuracy by at least 5 % and improved MCC by 15 %. Predictions were conducted on over 3.8 million sequences from Uniprot, along with 100,000 sequences generated by a deep generative model, ultimately identifying 37 out of 41 candidate peptides from >30 species that exhibited effective in vitro tumor inhibitory activity. Among these, tPep14 demonstrated significant anticancer effects in two mouse xenograft models without detectable toxicity. Finally, the study revealed correlations between the amino acid composition, structure, and function of the identified ACP candidates.
Collapse
Affiliation(s)
- Jianda Yue
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
| | - Tingting Li
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
| | - Jiawei Xu
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
| | - Zihui Chen
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China
| | - Yaqi Li
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
| | - Songping Liang
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
| | - Zhonghua Liu
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
| | - Ying Wang
- The National and Local Joint Engineering Laboratory of Animal Peptide Drug Development, College of Life Sciences, Hunan Normal University, Changsha 410081, Hunan, China; Peptide and small molecule drug R&D plateform, Furong Laboratory, Hunan Normal University, Changsha 410081, Hunan, China; Institute of Interdisciplinary Studies, Hunan Normal University, Changsha 410081, Hunan, China.
| |
Collapse
|
5
|
Shahid, Hayat M, Alghamdi W, Akbar S, Raza A, Kadir RA, Sarker MR. pACP-HybDeep: predicting anticancer peptides using binary tree growth based transformer and structural feature encoding with deep-hybrid learning. Sci Rep 2025; 15:565. [PMID: 39747941 PMCID: PMC11695694 DOI: 10.1038/s41598-024-84146-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2024] [Accepted: 12/20/2024] [Indexed: 01/04/2025] Open
Abstract
Worldwide, Cancer remains a significant health concern due to its high mortality rates. Despite numerous traditional therapies and wet-laboratory methods for treating cancer-affected cells, these approaches often face limitations, including high costs and substantial side effects. Recently the high selectivity of peptides has garnered significant attention from scientists due to their reliable targeted actions and minimal adverse effects. Furthermore, keeping the significant outcomes of the existing computational models, we propose a highly reliable and effective model namely, pACP-HybDeep for the accurate prediction of anticancer peptides. In this model, training peptides are numerically encoded using an attention-based ProtBERT-BFD encoder to extract semantic features along with CTDT-based structural information. Furthermore, a k-nearest neighbor-based binary tree growth (BTG) algorithm is employed to select an optimal feature set from the multi-perspective vector. The selected feature vector is subsequently trained using a CNN + RNN-based deep learning model. Our proposed pACP-HybDeep model demonstrated a high training accuracy of 95.33%, and an AUC of 0.97. To validate the generalization capabilities of the model, our pACP-HybDeep model achieved accuracies of 94.92%, 92.26%, and 91.16% on independent datasets Ind-S1, Ind-S2, and Ind-S3, respectively. The demonstrated efficacy, and reliability of the pACP-HybDeep model using test datasets establish it as a valuable tool for researchers in academia and pharmaceutical drug design.
Collapse
Affiliation(s)
- Shahid
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan
| | - Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan.
| | - Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Shahid Akbar
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan.
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China.
| | - Ali Raza
- Department of Computer Science, MY University, Islamabad, 45750, Pakistan
| | - Rabiah Abdul Kadir
- Institute of Visual Informatics, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia.
| | - Mahidur R Sarker
- Institute of Visual Informatics, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
- Universidad de Dise˜no, Innovaci´on y Tecnología, UDIT, Av. Alfonso XIII, 97, 28016, Madrid, Spain
| |
Collapse
|
6
|
Huang G, Cao Y, Dai Q, Chen W. ACP-DPE: A Dual-Channel Deep Learning Model for Anticancer Peptide Prediction. IET Syst Biol 2025; 19:e70010. [PMID: 40119615 PMCID: PMC11928748 DOI: 10.1049/syb2.70010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2024] [Revised: 02/13/2025] [Accepted: 02/20/2025] [Indexed: 03/24/2025] Open
Abstract
Cancer is a serious and complex disease caused by uncontrolled cell growth and is becoming one of the leading causes of death worldwide. Anticancer peptides (ACPs), as a bioactive peptide with lower toxicity, emerge as a promising means of effectively treating cancer. Identifying ACPs is challenging due to the limitation of experimental conditions. To address this, we proposed a dual-channel-based deep learning method, termed ACP-DPE, for ACP prediction. The ACP-DPE consisted of two parallel channels: one was an embedding layer followed by the bi-directional gated recurrent unit (Bi-GRU) module, and the other was an adaptive embedding layer followed by the dilated convolution module. The Bi-GRU module captured the peptide sequence dependencies, whereas the dilated convolution module characterised the local relationship of amino acids. Experimental results show that ACP-DPE achieves an accuracy of 82.81% and a sensitivity of 86.63%, surpassing the state-of-the-art method by 3.86% and 5.1%, respectively. These findings demonstrate the effectiveness of ACP-DPE for ACP prediction and highlight its potential as a valuable tool in cancer treatment research.
Collapse
Affiliation(s)
- Guohua Huang
- College of Information Science and EngineeringShaoyang UniversityShaoyangChina
- Hunan Provincial Key Laboratory of Finance & Economics Big Data Science and TechnologyHunan University of Finance and EconomicsChangshaChina
| | - Yujie Cao
- College of Information Science and EngineeringShaoyang UniversityShaoyangChina
| | - Qi Dai
- College of Life Science and MedicineZhejiang Sci‐Tech UniversityHangzhouChina
| | - Weihong Chen
- Hunan Provincial Key Laboratory of Finance & Economics Big Data Science and TechnologyHunan University of Finance and EconomicsChangshaChina
| |
Collapse
|
7
|
Rathore AS, Choudhury S, Arora A, Tijare P, Raghava GPS. ToxinPred 3.0: An improved method for predicting the toxicity of peptides. Comput Biol Med 2024; 179:108926. [PMID: 39038391 DOI: 10.1016/j.compbiomed.2024.108926] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2023] [Revised: 05/17/2024] [Accepted: 07/17/2024] [Indexed: 07/24/2024]
Abstract
Toxicity emerges as a prominent challenge in the design of therapeutic peptides, causing the failure of numerous peptides during clinical trials. In 2013, our group developed ToxinPred, a computational method that has been extensively adopted by the scientific community for predicting peptide toxicity. In this paper, we propose a refined variant of ToxinPred that showcases improved reliability and accuracy in predicting peptide toxicity. Initially, we utilized a similarity/alignment-based approach employing BLAST to predict toxic peptides, which yielded satisfactory accuracy; however, the method suffered from inadequate coverage. Subsequently, we employed a motif-based approach using MERCI software to uncover specific patterns or motifs that are exclusively observed in toxic peptides. The search for these motifs in peptides allowed us to predict toxic peptides with a high level of specificity with poor sensitivity. To overcome the coverage limitations, we developed alignment-free methods using machine/deep learning techniques to balance sensitivity and specificity of prediction. Deep learning model (ANN - LSTM with fixed sequence length) developed using one-hot encoding achieved a maximum AUROC of 0.93 with MCC of 0.71 on an independent dataset. Machine learning model (extra tree) developed using compositional features of peptides achieved a maximum AUROC of 0.95 with MCC of 0.78. We also developed large language models and achieved maximum AUC of 0.93 using ESM2-t33. Finally, we developed hybrid or ensemble methods combining two or more methods to enhance performance. Our specific hybrid method, which combines a motif-based approach with a machine learning-based model, achieved a maximum AUROC of 0.98 with MCC 0.81 on an independent dataset. In this study, all models were trained and tested on 80 % of data using five-fold cross-validation and evaluated on the remaining 20 % of data called independent dataset. The evaluation of all methods on an independent dataset revealed that the method proposed in this study exhibited better performance than existing methods. To cater to the needs of the scientific community, we have developed a standalone software, pip package and web-based server ToxinPred3 (https://github.com/raghavagps/toxinpred3 and https://webs.iiitd.edu.in/raghava/toxinpred3/).
Collapse
Affiliation(s)
- Anand Singh Rathore
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
| | - Shubham Choudhury
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
| | - Akanksha Arora
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
| | - Purva Tijare
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
| | - Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, Okhla Phase 3, New Delhi, 110020, India.
| |
Collapse
|
8
|
Cheong HH, Zuo W, Chen J, Un CW, Si YW, Wong KH, Kwok HF, Siu SWI. Identification of Anticancer Peptides from the Genome of Candida albicans: in Silico Screening, in Vitro and in Vivo Validations. J Chem Inf Model 2024; 64:6174-6189. [PMID: 39008832 DOI: 10.1021/acs.jcim.4c00501] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 07/17/2024]
Abstract
Anticancer peptides (ACPs) are promising future therapeutics, but their experimental discovery remains time-consuming and costly. To accelerate the discovery process, we propose a computational screening workflow to identify, filter, and prioritize peptide sequences based on predicted class probability, antitumor activity, and toxicity. The workflow was applied to identify novel ACPs with potent activity against colorectal cancer from the genome sequences of Candida albicans. As a result, four candidates were identified and validated in the HCT116 colon cancer cell line. Among them, PCa1 and PCa2 emerged as the most potent, displaying IC50 values of 3.75 and 56.06 μM, respectively, and demonstrating a 4-fold selectivity for cancer cells over normal cells. In the colon xenograft nude mice model, the administration of both peptides resulted in substantial inhibition of tumor growth without causing significant adverse effects. In conclusion, this work not only contributes a proven computational workflow for ACP discovery but also introduces two peptides, PCa1 and PCa2, as promising candidates poised for further development as targeted therapies for colon cancer. The method as a web service is available at https://app.cbbio.online/acpep/home and the source code at https://github.com/cartercheong/AcPEP_classification.git.
Collapse
Affiliation(s)
- Hong-Hin Cheong
- Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
| | - Weimin Zuo
- Department of Biomedical Sciences, Faculty of Health Sciences, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- Cancer Centre, Faculty of Health Sciences, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
| | - Jiarui Chen
- Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
| | - Chon-Wai Un
- Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
| | - Yain-Whar Si
- Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
| | - Koon Ho Wong
- Department of Biomedical Sciences, Faculty of Health Sciences, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- MoE Frontiers Science Center for Precision Oncology, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- Cancer Centre, Faculty of Health Sciences, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
| | - Hang Fai Kwok
- Department of Biomedical Sciences, Faculty of Health Sciences, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- MoE Frontiers Science Center for Precision Oncology, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
- Cancer Centre, Faculty of Health Sciences, University of Macau, Avenida de Universidade, Taipa, Macau SAR 999078, China
| | - Shirley W I Siu
- Centre for Artificial Intelligence Driven Drug Discovery, Faculty of Applied Sciences, Macao Polytechnic University, R. de Luís Gonzaga Gomes, Macau SAR 999078, China
- Institute of Science and Environment, University of Saint Joseph, Estrada Marginal da Ilha Verde 14-17, Macau SAR 999078, China
| |
Collapse
|
9
|
Balaji PD, Selvam S, Sohn H, Madhavan T. MLASM: Machine learning based prediction of anticancer small molecules. Mol Divers 2024; 28:2153-2161. [PMID: 38554168 DOI: 10.1007/s11030-024-10823-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2024] [Accepted: 02/10/2024] [Indexed: 04/01/2024]
Abstract
Cancer, being the second leading cause of death globally. So, the development of effective anticancer treatments is crucial in the field of medicine. Anticancer peptides (ACPs) have shown promising therapeutic potential in cancer treatment compared to traditional methods. However, the process of identifying ACPs through experimental means is often time-intensive and expensive. To overcome this issue, we employed a machine learning-based approach for the first time to develop an anticancer model using small molecules. Anticancer small molecules (ACSMs) are compounds that have been developed to target and inhibit cancer cells. In this study, we used 10,000 compounds to develop the machine learning models using five algorithms such as, Random Forest (RF), Light gradient boosting machine (LightGBM), K-nearest neighbors (KNN), Decision tree (DT) and Extreme Gradient Boosting (XGB). The developed models were evaluated using the test set and top three models were identified (RF, LightGBM and XGB). Furthermore, to validate the predictive performance of our models, we have performed external validation using an FDA approved anticancer compounds/drugs. Following this analysis, we found that our LightGBM model correctly predicted 9 compounds as active. However, RF and XGB exhibited some limitations by predicting 8 and 7 compounds as active out of 10, respectively. These results demonstrate that, when compared to RF and XGB, the LightGBM model showcase robust prediction capabilities, achieving a superior accuracy of 79% with an AUC of 0.88. These findings provide promising insights into the potential of our approach for predicting anticancer small molecules, highlighting the role of machine learning in advancing cancer treatment research.
Collapse
Affiliation(s)
- Priya Dharshini Balaji
- Computational Biology Laboratory, Department of Genetic Engineering, School of Bio-Engineering, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu, Tamil Nadu, 603203, India
| | - Subathra Selvam
- Computational Biology Laboratory, Department of Genetic Engineering, School of Bio-Engineering, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu, Tamil Nadu, 603203, India
| | - Honglae Sohn
- Department of Chemistry, Department of Carbon Materials, Chosun University, Gwangju, South Korea
| | - Thirumurthy Madhavan
- Computational Biology Laboratory, Department of Genetic Engineering, School of Bio-Engineering, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu, Tamil Nadu, 603203, India.
| |
Collapse
|
10
|
Ghafoor H, Asim MN, Ibrahim MA, Ahmed S, Dengel A. CAPTURE: Comprehensive anti-cancer peptide predictor with a unique amino acid sequence encoder. Comput Biol Med 2024; 176:108538. [PMID: 38759585 DOI: 10.1016/j.compbiomed.2024.108538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2024] [Revised: 04/26/2024] [Accepted: 04/28/2024] [Indexed: 05/19/2024]
Abstract
Anticancer peptides (ACPs) key properties including bioactivity, high efficacy, low toxicity, and lack of drug resistance make them ideal candidates for cancer therapies. To deeply explore the potential of ACPs and accelerate development of cancer therapies, although 53 Artificial Intelligence supported computational predictors have been developed for ACPs and non ACPs classification but only one predictor has been developed for ACPs functional types annotations. Moreover, these predictors extract amino acids distribution patterns to transform peptides sequences into statistical vectors that are further fed to classifiers for discriminating peptides sequences and annotating peptides functional classes. Overall, these predictors remain fail in extracting diverse types of amino acids distribution patterns from peptide sequences. The paper in hand presents a unique CARE encoder that transforms peptides sequences into statistical vectors by extracting 4 different types of distribution patterns including correlation, distribution, composition, and transition. Across public benchmark dataset, proposed encoder potential is explored under two different evaluation settings namely; intrinsic and extrinsic. Extrinsic evaluation indicates that 12 different machine learning classifiers achieve superior performance with the proposed encoder as compared to 55 existing encoders. Furthermore, an intrinsic evaluation reveals that, unlike existing encoders, the proposed encoder generates more discriminative clusters for ACPs and non-ACPs classes. Across 8 public benchmark ACPs and non-ACPs classification datasets, proposed encoder and Adaboost classifier based CAPTURE predictor outperforms existing predictors with an average accuracy, recall and MCC score of 1%, 4%, and 2% respectively. In generalizeability evaluation case study, across 7 benchmark anti-microbial peptides classification datasets, CAPTURE surpasses existing predictors by an average AU-ROC of 2%. CAPTURE predictive pipeline along with label powerset method outperforms state-of-the-art ACPs functional types predictor by 5%, 5%, 5%, 6%, and 3% in terms of average accuracy, subset accuracy, precision, recall, and F1 respectively. CAPTURE web application is available at https://sds_genetic_analysis.opendfki.de/CAPTURE.
Collapse
Affiliation(s)
- Hina Ghafoor
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
| | - Muhammad Nabeel Asim
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany.
| | - Muhammad Ali Ibrahim
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
| | - Sheraz Ahmed
- German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
| | - Andreas Dengel
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; German Research Center for Artificial Intelligence GmbH, Kaiserslautern, 67663, Germany
| |
Collapse
|
11
|
Chen T, Kabir MF. Explainable machine learning approach for cancer prediction through binarilization of RNA sequencing data. PLoS One 2024; 19:e0302947. [PMID: 38728288 PMCID: PMC11086842 DOI: 10.1371/journal.pone.0302947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2023] [Accepted: 04/15/2024] [Indexed: 05/12/2024] Open
Abstract
In recent years, researchers have proven the effectiveness and speediness of machine learning-based cancer diagnosis models. However, it is difficult to explain the results generated by machine learning models, especially ones that utilized complex high-dimensional data like RNA sequencing data. In this study, we propose the binarilization technique as a novel way to treat RNA sequencing data and used it to construct explainable cancer prediction models. We tested our proposed data processing technique on five different models, namely neural network, random forest, xgboost, support vector machine, and decision tree, using four cancer datasets collected from the National Cancer Institute Genomic Data Commons. Since our datasets are imbalanced, we evaluated the performance of all models using metrics designed for imbalance performance like geometric mean, Matthews correlation coefficient, F-Measure, and area under the receiver operating characteristic curve. Our approach showed comparative performance while relying on less features. Additionally, we demonstrated that data binarilization offers higher explainability by revealing how each feature affects the prediction. These results demonstrate the potential of data binarilization technique in improving the performance and explainability of RNA sequencing based cancer prediction models.
Collapse
Affiliation(s)
- Tianjie Chen
- Department of Computer Science, Pennsylvania State University Harrisburg, Middletown, Pennsylvania, United States of America
| | - Md Faisal Kabir
- Department of Computer Science, Pennsylvania State University Harrisburg, Middletown, Pennsylvania, United States of America
| |
Collapse
|
12
|
Bian J, Liu X, Dong G, Hou C, Huang S, Zhang D. ACP-ML: A sequence-based method for anticancer peptide prediction. Comput Biol Med 2024; 170:108063. [PMID: 38301519 DOI: 10.1016/j.compbiomed.2024.108063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2023] [Revised: 01/08/2024] [Accepted: 01/27/2024] [Indexed: 02/03/2024]
Abstract
Cancer is a serious malignant tumor and is difficult to cure. Chemotherapy, as a primary treatment for cancer, causes significant harm to normal cells in the body and is often accompanied by serious side effects. Recently, anti-cancer peptides (ACPs) as a type of protein for treating cancers dominated research into the development of new anti-tumor drugs because of their ability to specifically target and destroy cancer cells. The screening of proteins with cancer-inhibiting properties from a large pool of proteins is key to the development of anti-tumor drugs. However, it is expensive and inefficient to accurately identify protein functions only through biological experiments due to their complex structure. Therefore, we propose a new prediction model ACP-ML to effectively predict ACPs. In terms of feature extraction, DPC, PseAAC, CTDC, CTDT and CS-Pse-PSSM features were used and the most optimal feature set was selected by comparing combinations of these features. Then, a two-step feature selection process using MRMD and RFE algorithms was performed to determine the most crucial features from the most optimal feature set for identifying ACPs. Furthermore, we assessed the classification accuracy of single learning models and different strategies-based ensemble models through ten-fold cross-validation. Ultimately, a voting-based ensemble learning method is developed to predict ACPs. To validate its effectiveness, two independent test sets were used to perform tests, achieving accuracy of 90.891 % and 92.578 % respectively. Compared with existing anticancer peptide prediction algorithms, the proposed feature processing method is more effective, and the proposed ensemble model ACP-ML exhibits stronger generalization capability and higher accuracy.
Collapse
Affiliation(s)
- Jilong Bian
- Northeast Forestry University, College of Computer and Control Engineering, Harbin, Heilongjiang, China.
| | - Xuan Liu
- Northeast Forestry University, College of Computer and Control Engineering, Harbin, Heilongjiang, China
| | - Guanghui Dong
- Northeast Forestry University, College of Computer and Control Engineering, Harbin, Heilongjiang, China
| | - Chang Hou
- Northeast Forestry University, College of Computer and Control Engineering, Harbin, Heilongjiang, China
| | - Shan Huang
- Department of Neurology, The Second Affiliated Hospital, Harbin Medical University, Harbin, Heilongjiang, China.
| | - Dandan Zhang
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang, China.
| |
Collapse
|
13
|
Lee YJ. Examining the functional space of gut microbiome-derived peptides. Microbiologyopen 2023; 12:e1393. [PMID: 38129980 PMCID: PMC10714122 DOI: 10.1002/mbo3.1393] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/07/2023] [Revised: 12/04/2023] [Accepted: 12/05/2023] [Indexed: 12/23/2023] Open
Abstract
The human gut microbiome contains thousands of small, novel peptides that could play a role in microbe-microbe and host-microbe interactions, contributing to human health and disease. Although these peptides have not yet been systematically characterized, computational tools can be used to elucidate the bioactivities they may have. This article proposes probing the functional space of gut microbiome-derived peptides (MDPs) using in silico approaches for three bioactivities: antimicrobial, anticancer, and nucleomodulins. Machine learning programs that support peptide and protein queries are provided for each bioactivity. Considering the biases of an activity-centric approach, activity-agnostic tools using structural and chemical similarity and target prediction are also described. Gut MDPs represent a vast functional space that can not only contribute to our understanding of microbiome interactions but potentially even serve as a source of life-changing therapeutics.
Collapse
Affiliation(s)
- Ying‐Chiang J. Lee
- Department of Molecular BiologyPrinceton UniversityPrincetonNew JerseyUSA
| |
Collapse
|
14
|
Basith S, Pham NT, Song M, Lee G, Manavalan B. ADP-Fuse: A novel two-layer machine learning predictor to identify antidiabetic peptides and diabetes types using multiview information. Comput Biol Med 2023; 165:107386. [PMID: 37619323 DOI: 10.1016/j.compbiomed.2023.107386] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/22/2023] [Revised: 08/03/2023] [Accepted: 08/14/2023] [Indexed: 08/26/2023]
Abstract
Diabetes mellitus has become a major public health concern associated with high mortality and reduced life expectancy and can cause blindness, heart attacks, kidney failure, lower limb amputations, and strokes. A new generation of antidiabetic peptides (ADPs) that act on β-cells or T-cells to regulate insulin production is being developed to alleviate the effects of diabetes. However, the lack of effective peptide-mining tools has hampered the discovery of these promising drugs. Hence, novel computational tools need to be developed urgently. In this study, we present ADP-Fuse, a novel two-layer prediction framework capable of accurately identifying ADPs or non-ADPs and categorizing them into type 1 and type 2 ADPs. First, we comprehensively evaluated 22 peptide sequence-derived features coupled with eight notable machine learning algorithms. Subsequently, the most suitable feature descriptors and classifiers for both layers were identified. The output of these single-feature models, embedded with multiview information, was trained with an appropriate classifier to provide the final prediction. Comprehensive cross-validation and independent tests substantiate that ADP-Fuse surpasses single-feature models and the feature fusion approach for the prediction of ADPs and their types. In addition, the SHapley Additive exPlanation method was used to elucidate the contributions of individual features to the prediction of ADPs and their types. Finally, a user-friendly web server for ADP-Fuse was developed and made publicly accessible (https://balalab-skku.org/ADP-Fuse), enabling the swift screening and identification of novel ADPs and their types. This framework is expected to contribute significantly to antidiabetic peptide identification.
Collapse
Affiliation(s)
- Shaherin Basith
- Department of Physiology, Ajou University School of Medicine, Suwon, 16499, Republic of Korea
| | - Nhat Truong Pham
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Republic of Korea
| | - Minkyung Song
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Republic of Korea; Department of Biopharmaceutical Convergence, Sungkyunkwan University, Suwon, 16419, Republic of Korea.
| | - Gwang Lee
- Department of Physiology, Ajou University School of Medicine, Suwon, 16499, Republic of Korea; Department of Molecular Science and Technology, Ajou University, Suwon, 16499, Republic of Korea.
| | - Balachandran Manavalan
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Republic of Korea.
| |
Collapse
|