1
|
Zhang L, Wang X, Gao G, Bian Z, Kong L. SSE-Net: A novel network based on sequence spatial equation for Camellia sinensis lysine acetylation identification. Comput Biol Chem 2025; 117:108442. [PMID: 40174510 DOI: 10.1016/j.compbiolchem.2025.108442] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/30/2024] [Revised: 02/25/2025] [Accepted: 03/22/2025] [Indexed: 04/04/2025]
Abstract
Lysine acetylation (Kace) is one of the most important post-translational modifications. It is key to identify Kace sites for understanding regulation mechanisms in Camellia sinensis. In this study, we defined a mathematical formula, named sequence spatial equation (SSE), which could give each amino acid coordinate in 3-D space by rotating and translating. Based on SSE, an optional network SSE-Net was constructed for representing spatial structure information. Centrality metrics of SSE-Net were used to design structure feature vectors for reflecting the importance of sites. The optimal features were fed into classifier to construct model SSE-ET. The results showed that SSE-ET outperformed the other classifiers. Meanwhile, all MCC results were higher than 0.7 for different machine learning, which indicated that SSE-Net was effective for representing Kace sites in Camellia sinensis. Moreover, we implemented the other models on our dataset. The results of comparison showed that SSE-ET was much more powerful than the others. Specifically, the result of SN was nearly 20 % higher than the other models. These results showed that the proposed SSE was a valuable mathematics concept for reflecting 3-D space Kace site information in Camellia sinensis, and SSE-Net may be an essential complementary for biology and bioinformatics research.
Collapse
Affiliation(s)
- Lichao Zhang
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China; Hebei Innovation Center for Smart Perception and Applied Technology of Agricultural Data, Qinhuangdao, PR China.
| | - Xue Wang
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China
| | - Ge Gao
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China
| | - Zhengyan Bian
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China
| | - Liang Kong
- Hebei Innovation Center for Smart Perception and Applied Technology of Agricultural Data, Qinhuangdao, PR China; School of Mathematics and Information Science & Technology, Hebei Normal University of Science & Technology, Qinhuangdao, PR China
| |
Collapse
|
2
|
Abbas Z, Kim S, Lee N, Kazmi SAW, Lee SW. A robust ensemble framework for anticancer peptide classification using multi-model voting approach. Comput Biol Med 2025; 188:109750. [PMID: 40032410 DOI: 10.1016/j.compbiomed.2025.109750] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/30/2024] [Revised: 01/14/2025] [Accepted: 01/22/2025] [Indexed: 03/05/2025]
Abstract
Anticancer peptides (ACPs) hold great potential for cancer therapeutics, yet accurately identifying them remains a challenging task due to the complexity of peptide sequences and their interactions with biological systems. In this study, we propose a novel machine learning-based framework for ACP classification, integrating multiple feature sets, including sequence composition, physicochemical properties, and embedding features derived from pre-trained language models. We evaluate the performance of various classifiers on benchmark datasets and compare our model against state-of-the-art methods. The results demonstrate that our model outperforms existing methods such as UniDL4BioPep, ACPred-Fuse, and iACP with an accuracy of 75.58%, an AUC of 0.8272, and an MCC of 0.5119. Our approach provides a more balanced sensitivity of 0.7384 and specificity of 0.773, ensuring robust identification of both ACPs and non-ACPs. These findings suggest that incorporating diverse feature sets can significantly enhance ACP classification, potentially facilitating the discovery of novel anticancer peptides for therapeutic applications.
Collapse
Affiliation(s)
- Zeeshan Abbas
- Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon, Republic of Korea; Department of Artificial Intelligence, Sungkyunkwan University, Suwon 16419, Republic of Korea
| | - Sunyeup Kim
- Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon, Republic of Korea
| | - Nangkyeong Lee
- Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon, Republic of Korea
| | | | - Seung Won Lee
- Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon, Republic of Korea; Department of Artificial Intelligence, Sungkyunkwan University, Suwon 16419, Republic of Korea; Department of Metabiohealth, Sungkyunkwan University, Suwon 16419, Republic of Korea; Personalized Cancer Immunotherapy Research Center, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea.
| |
Collapse
|
3
|
Asim MN, Asif T, Mehmood F, Dengel A. Peptide classification landscape: An in-depth systematic literature review on peptide types, databases, datasets, predictors architectures and performance. Comput Biol Med 2025; 188:109821. [PMID: 39987697 DOI: 10.1016/j.compbiomed.2025.109821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/28/2024] [Revised: 02/03/2025] [Accepted: 02/05/2025] [Indexed: 02/25/2025]
Abstract
Peptides are gaining significant attention in diverse fields such as the pharmaceutical market has seen a steady rise in peptide-based therapeutics over the past six decades. Peptides have been utilized in the development of distinct applications including inhibitors of SARS-COV-2 and treatments for conditions like cancer and diabetes. Distinct types of peptides possess unique characteristics, and development of peptide-specific applications require the discrimination of one peptide type from others. To the best of our knowledge, approximately 230 Artificial Intelligence (AI) driven applications have been developed for 22 distinct types of peptides, yet there remains significant room for development of new predictors. A Comprehensive review addresses the critical gap by providing a consolidated platform for the development of AI-driven peptide classification applications. This paper offers several key contributions, including presenting the biological foundations of 22 unique peptide types and categorizes them into four main classes: Regulatory, Therapeutic, Nutritional, and Delivery Peptides. It offers an in-depth overview of 47 databases that have been used to develop peptide classification benchmark datasets. It summarizes details of 288 benchmark datasets that are used in development of diverse types AI-driven peptide classification applications. It provides a detailed summary of 197 sequence representation learning methods and 94 classifiers that have been used to develop 230 distinct AI-driven peptide classification applications. Across 22 distinct types peptide classification tasks related to 288 benchmark datasets, it demonstrates performance values of 230 AI-driven peptide classification applications. It summarizes experimental settings and various evaluation measures that have been employed to assess the performance of AI-driven peptide classification applications. The primary focus of this manuscript is to consolidate scattered information into a single comprehensive platform. This resource will greatly assist researchers who are interested in developing new AI-driven peptide classification applications.
Collapse
Affiliation(s)
- Muhammad Nabeel Asim
- German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany; Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany.
| | - Tayyaba Asif
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany
| | - Faiza Mehmood
- Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; Institute of Data Sciences, University of Engineering and Technology, Lahore, Pakistan
| | - Andreas Dengel
- German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany; Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau, Kaiserslautern, 67663, Germany; Intelligentx GmbH (intelligentx.com), Kaiserslautern, Germany
| |
Collapse
|
4
|
Wang S, Ma B. Anti-Cancer Peptides Identification and Activity Type Classification With Protein Sequence Pre-Training. IEEE J Biomed Health Inform 2025; 29:1692-1701. [PMID: 40048353 DOI: 10.1109/jbhi.2024.3358632] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 03/09/2025]
Abstract
Cancer remains a significant global health challenge, responsible for millions of deaths annually. Addressing this issue necessitates the discovery of novel anti-cancer drugs. Anti-cancer peptides (ACPs), with their unique ability to selectively target cancer cells, offer new hope in discovering low side-effect anti-cancer drugs. However, the process of discovering novel ACPs is both time-consuming and costly. Therefore, there is an urgent need for a computational method that can predict whether a given peptide is an ACP and classify its specific functional types. In this paper, we introduce DUO-ACP, a model serving dual roles in ACP prediction: identification and functional type classification. DUO-ACP employs two embedding modules to acquire knowledge about global protein features and local ACP characteristics, complemented by a prediction module. When assessed on two publicly available datasets for each task, DUO-ACP surpasses all existing methods, achieving outstanding results: an ACP identification accuracy of 89.5% and a Macro-averaged AUC of 88.6% in ACP functional type classification. We further interpret the contribution of each part of our model, including the two types of embeddings as well as ensemble learning. On a new curated dataset, the prediction results of DUO-ACP closely match existing literature, highlighting DUO-ACP's generalization capabilities on previously unseen data and displaying the potential capability of discovering novel ACP.
Collapse
|
5
|
Hou D, Zhou H, Tang Y, Liu Z, Su L, Guo J, Pathak JL, Wu L. Dynamic Visualization of Computer-Aided Peptide Design for Cancer Therapeutics. Drug Des Devel Ther 2025; 19:1043-1065. [PMID: 39974609 PMCID: PMC11837852 DOI: 10.2147/dddt.s497126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2024] [Accepted: 01/20/2025] [Indexed: 02/21/2025] Open
Abstract
Purpose Cancer stands as a significant global public health concern, with traditional therapies potentially yielding severe side effects. Peptide-based cancer therapy is increasingly employed for diseases like cancer due to its advantages of excellent targeting, biocompatibility, and convenient synthesis. With advancements in computer technology and bioinformatics, rational design strategies based on computer technology have been employed to develop more cost-effective and potent anticancer peptides (ACPs). This study aims to explore the current status, hotspots, and future trends in the field of computer-aided design of peptides for cancer treatment through a bibliometric analysis. Methods A total of 1547 relevant publications published from 2006 to 2024 were collected from the Web of Science Core Collection. Bibliometric analysis was conducted using tools like CiteSpace, VOSviewer, Bibliometrix, Origin, and an online bibliometric platform. Results The research in this field has shown a steady growth trend, with the United States and China making the most significant contributions. Currently, ACP research mainly focuses on cell-penetrating peptides related to drug delivery, which are expected to become future research hotspots. Beyond that, peptide vaccines associated with immunotherapy are also worthy of attention. In addition, molecular dynamics simulation and molecular docking are currently popular research methods. At the same time, deep learning is the emerging keyword, indicating its potential for a more significant impact on future peptide design. Conclusion Deep learning technology represents emerging research hotspots with immense potential and promising prospects. As cutting-edge research directions, cellularly penetrating peptides and polypeptide immunotherapy are expected to achieve breakthroughs in cancer treatment. This study provides valuable insights into the computer-aided design of peptides in cancer therapy, contributing significantly to advancing the in-depth research and applications in this area.
Collapse
Affiliation(s)
- Dan Hou
- Department of Basic Oral Medicine, School and Hospital of Stomatology, Guangdong Engineering Research Center of Oral Restoration and Reconstruction, Guangzhou Medical University, Guangzhou, Guangdong, 510182, People’s Republic of China
- Guangzhou Key Laboratory of Basic and Applied Research of Oral Regenerative Medicine, Guangzhou, Guangdong, 510182, People’s Republic of China
- Department of Oral and Maxillofacial Surgery/Oral Pathology, Amsterdam UMC/VUmc and Academic Centre for Dentistry Amsterdam (ACTA), Vrije Universiteit Amsterdam, Amsterdam Movement Science, Amsterdam, 1081 hZ, the Netherlands
| | - Haobin Zhou
- Department of Basic Oral Medicine, School and Hospital of Stomatology, Guangdong Engineering Research Center of Oral Restoration and Reconstruction, Guangzhou Medical University, Guangzhou, Guangdong, 510182, People’s Republic of China
- Guangzhou Key Laboratory of Basic and Applied Research of Oral Regenerative Medicine, Guangzhou, Guangdong, 510182, People’s Republic of China
| | - Yuting Tang
- Department of Basic Oral Medicine, School and Hospital of Stomatology, Guangdong Engineering Research Center of Oral Restoration and Reconstruction, Guangzhou Medical University, Guangzhou, Guangdong, 510182, People’s Republic of China
- Guangzhou Key Laboratory of Basic and Applied Research of Oral Regenerative Medicine, Guangzhou, Guangdong, 510182, People’s Republic of China
| | - Ziyuan Liu
- Department of Basic Oral Medicine, School and Hospital of Stomatology, Guangdong Engineering Research Center of Oral Restoration and Reconstruction, Guangzhou Medical University, Guangzhou, Guangdong, 510182, People’s Republic of China
- Guangzhou Key Laboratory of Basic and Applied Research of Oral Regenerative Medicine, Guangzhou, Guangdong, 510182, People’s Republic of China
| | - Lin Su
- Department of Basic Oral Medicine, School and Hospital of Stomatology, Guangdong Engineering Research Center of Oral Restoration and Reconstruction, Guangzhou Medical University, Guangzhou, Guangdong, 510182, People’s Republic of China
- Guangzhou Key Laboratory of Basic and Applied Research of Oral Regenerative Medicine, Guangzhou, Guangdong, 510182, People’s Republic of China
| | - Junkai Guo
- Department of Basic Oral Medicine, School and Hospital of Stomatology, Guangdong Engineering Research Center of Oral Restoration and Reconstruction, Guangzhou Medical University, Guangzhou, Guangdong, 510182, People’s Republic of China
- Guangzhou Key Laboratory of Basic and Applied Research of Oral Regenerative Medicine, Guangzhou, Guangdong, 510182, People’s Republic of China
| | - Janak Lal Pathak
- Department of Basic Oral Medicine, School and Hospital of Stomatology, Guangdong Engineering Research Center of Oral Restoration and Reconstruction, Guangzhou Medical University, Guangzhou, Guangdong, 510182, People’s Republic of China
- Guangzhou Key Laboratory of Basic and Applied Research of Oral Regenerative Medicine, Guangzhou, Guangdong, 510182, People’s Republic of China
| | - Lihong Wu
- Department of Basic Oral Medicine, School and Hospital of Stomatology, Guangdong Engineering Research Center of Oral Restoration and Reconstruction, Guangzhou Medical University, Guangzhou, Guangdong, 510182, People’s Republic of China
- Guangzhou Key Laboratory of Basic and Applied Research of Oral Regenerative Medicine, Guangzhou, Guangdong, 510182, People’s Republic of China
| |
Collapse
|
6
|
Shoombuatong W, Schaduangrat N, Homdee N, Ahmed S, Chumnanpuen P. Advancing the accuracy of tyrosinase inhibitory peptides prediction via a multiview feature fusion strategy. Sci Rep 2025; 15:4762. [PMID: 39922825 PMCID: PMC11807091 DOI: 10.1038/s41598-024-81807-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2024] [Accepted: 11/29/2024] [Indexed: 02/10/2025] Open
Abstract
Tyrosinase plays a crucial role as an enzyme in the production of melanin, which is the pigment accountable for determining the color of the hair, eyes, and skin. Tyrosinase inhibitory peptides (TIPs), mainly designed to regulate the activity of the enzyme tyrosinase, are of interest in various domains, including cosmetics, dermatology, and pharmaceuticals, due to their potential applications in controlling skin pigmentation. To date, a few machine learning-based models have been proposed for predicting TIPs, but their predictive performance remains unsatisfactory. In this study, we propose an innovative computational approach, named TIPred-MVFF, to accurately predict TIPs using only sequence information. Firstly, we established an up-to-date and high-quality dataset by collecting samples from various sources. Secondly, we applied a multi-view feature fusion (MVFF) strategy to extract and explore probability and category information embedded in TIPs, employing several machine learning (ML) algorithms coupled with different commonly used sequence-based feature encodings. Then, we employed resampling approaches to address the class imbalance issue. Finally, to maximize the utility of each feature, we fused probability-based and sequence-based features, generating more informative feature that were used to develop the final prediction model. Based on the independent test, experimental results showed that TIPred-MVFF outperformed several conventional ML classifiers and existing methods in terms of prediction accuracy and robustness, achieving an accuracy of 0.937 and a Matthew's correlation coefficient of 0.847. This new computational approach is anticipated to aid community-wide efforts in rapidly and cost-effectively discovering novel peptides with strong tyrosinase inhibitory activities.
Collapse
Affiliation(s)
- Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
| | - Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Nutta Homdee
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Saeed Ahmed
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Department of Computer Science, University of Swabi, Swabi, 23561, Pakistan
| | - Pramote Chumnanpuen
- Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, 10900, Thailand.
- Kasetsart University International College (KUIC), Kasetsart University, Bangkok, 10900, Thailand.
| |
Collapse
|
7
|
Charoenkwan P, Chumnanpuen P, Schaduangrat N, Shoombuatong W. Deepstack-ACE: A deep stacking-based ensemble learning framework for the accelerated discovery of ACE inhibitory peptides. Methods 2025; 234:131-140. [PMID: 39709069 DOI: 10.1016/j.ymeth.2024.12.005] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2024] [Revised: 11/27/2024] [Accepted: 12/07/2024] [Indexed: 12/23/2024] Open
Abstract
Identifying angiotensin-I-converting enzyme (ACE) inhibitory peptides accurately is crucial for understanding the primary factor that regulates the renin-angiotensin system and for providing guidance in developing new potential drugs. Given the inherent experimental complexities, using computational methods for in silico peptide identification could be indispensable for facilitating the high-throughput characterization of ACE inhibitory peptides. In this paper, we propose a novel deep stacking-based ensemble learning framework, termed Deepstack-ACE, to precisely identify ACE inhibitory peptides. In Deepstack-ACE, the input peptide sequences are fed into the word2vec embedding technique to generate sequence representations. Then, these representations were employed to train five powerful deep learning methods, including long short-term memory, convolutional neural network, multi-layer perceptron, gated recurrent unit network, and recurrent neural network, for the construction of base-classifiers. Finally, the optimized stacked model was constructed based on the best combination of selected base-classifiers. Benchmarking experiments showed that Deepstack-ACE attained a more accurate and robust identification of ACE inhibitory peptides compared to its base-classifiers and several conventional machine learning classifiers. Remarkably, in the independent test, our proposed model significantly outperformed the current state-of-the-art methods, with a balanced accuracy of 0.916, sensitivity of 0.911, and Matthews correlation coefficient scores of 0.826. Moreover, we developed a user-friendly web server for Deepstack-ACE, which is freely available at https://pmlabqsar.pythonanywhere.com/Deepstack-ACE. We anticipate that our proposed Deepstack-ACE model can provide a faster and reasonably accurate identification of ACE inhibitory peptides.
Collapse
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
| | - Pramote Chumnanpuen
- Department of Zoology, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand; Kasetsart University International College (KUIC), Kasetsart University, Bangkok 10900, Thailand
| | - Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
| |
Collapse
|
8
|
Li Z, Zeng Y, Jiang M, Wei B. Deep Drug-Target Binding Affinity Prediction Base on Multiple Feature Extraction and Fusion. ACS OMEGA 2025; 10:2020-2032. [PMID: 39866608 PMCID: PMC11755178 DOI: 10.1021/acsomega.4c08048] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/02/2024] [Revised: 12/25/2024] [Accepted: 01/03/2025] [Indexed: 01/28/2025]
Abstract
Accurate drug-target binding affinity (DTA) prediction is crucial in drug discovery. Recently, deep learning methods for DTA prediction have made significant progress. However, there are still two challenges: (1) recent models always ignore the correlations in drug and target data in the drug/target representation process and (2) the interaction learning of drug-target pairs always is by simple concatenation, which is insufficient to explore their fusion. To overcome these challenges, we propose an end-to-end sequence-based model called BTDHDTA. In the feature extraction process, the bidirectional gated recurrent unit (GRU), transformer encoder, and dilated convolution are employed to extract global, local, and their correlation patterns of drug and target input. Additionally, a module combining convolutional neural networks with a Highway connection is introduced to fuse drug and protein deep features. We evaluate the performance of BTDHDTA on three benchmark data sets (Davis, KIBA, and Metz), demonstrating its superiority over several current state-of-the-art methods in key metrics such as Mean Squared Error (MSE), Concordance Index (CI), and Regression toward the mean (R m 2). The results indicate that our method achieves a better performance in DTA prediction. In the case study, we use the BTDHDTA model to predict the binding affinities between 3137 FDA-approved drugs and severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) replication-related proteins, validating the model's effectiveness in practical scenarios.
Collapse
Affiliation(s)
- Zepeng Li
- School
of Computer Science and Technology, Zhejiang
Sci-Tech University, Hangzhou 310018, China
| | - Yuni Zeng
- School
of Computer Science and Technology, Zhejiang
Sci-Tech University, Hangzhou 310018, China
| | - Mingfeng Jiang
- School
of Computer Science and Technology, Zhejiang
Sci-Tech University, Hangzhou 310018, China
| | - Bo Wei
- School
of Computer Science and Technology, Zhejiang
Sci-Tech University, Hangzhou 310018, China
- Longgang
Research Institute, Zhejiang Sci-Tech University, Longgang 325000, Zhejiang, China
| |
Collapse
|
9
|
Shahid, Hayat M, Alghamdi W, Akbar S, Raza A, Kadir RA, Sarker MR. pACP-HybDeep: predicting anticancer peptides using binary tree growth based transformer and structural feature encoding with deep-hybrid learning. Sci Rep 2025; 15:565. [PMID: 39747941 PMCID: PMC11695694 DOI: 10.1038/s41598-024-84146-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2024] [Accepted: 12/20/2024] [Indexed: 01/04/2025] Open
Abstract
Worldwide, Cancer remains a significant health concern due to its high mortality rates. Despite numerous traditional therapies and wet-laboratory methods for treating cancer-affected cells, these approaches often face limitations, including high costs and substantial side effects. Recently the high selectivity of peptides has garnered significant attention from scientists due to their reliable targeted actions and minimal adverse effects. Furthermore, keeping the significant outcomes of the existing computational models, we propose a highly reliable and effective model namely, pACP-HybDeep for the accurate prediction of anticancer peptides. In this model, training peptides are numerically encoded using an attention-based ProtBERT-BFD encoder to extract semantic features along with CTDT-based structural information. Furthermore, a k-nearest neighbor-based binary tree growth (BTG) algorithm is employed to select an optimal feature set from the multi-perspective vector. The selected feature vector is subsequently trained using a CNN + RNN-based deep learning model. Our proposed pACP-HybDeep model demonstrated a high training accuracy of 95.33%, and an AUC of 0.97. To validate the generalization capabilities of the model, our pACP-HybDeep model achieved accuracies of 94.92%, 92.26%, and 91.16% on independent datasets Ind-S1, Ind-S2, and Ind-S3, respectively. The demonstrated efficacy, and reliability of the pACP-HybDeep model using test datasets establish it as a valuable tool for researchers in academia and pharmaceutical drug design.
Collapse
Affiliation(s)
- Shahid
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan
| | - Maqsood Hayat
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan.
| | - Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Shahid Akbar
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan.
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China.
| | - Ali Raza
- Department of Computer Science, MY University, Islamabad, 45750, Pakistan
| | - Rabiah Abdul Kadir
- Institute of Visual Informatics, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia.
| | - Mahidur R Sarker
- Institute of Visual Informatics, Universiti Kebangsaan Malaysia, 43600, Bangi, Selangor, Malaysia
- Universidad de Dise˜no, Innovaci´on y Tecnología, UDIT, Av. Alfonso XIII, 97, 28016, Madrid, Spain
| |
Collapse
|
10
|
Liang Y, Ma X, Li J, Zhang S. iACVP-MR: Accurate Identification of Anti-coronavirus Peptide based on Multiple Features Information and Recurrent Neural Network. Curr Med Chem 2025; 32:2055-2067. [PMID: 38549527 DOI: 10.2174/0109298673277663240101111507] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2023] [Revised: 11/26/2023] [Accepted: 11/30/2023] [Indexed: 05/14/2024]
Abstract
BACKGROUND Over the years, viruses have caused human illness and threatened human health. Therefore, it is pressing to develop anti-coronavirus infection drugs with clear function, low cost, and high safety. Anti-coronavirus peptide (ACVP) is a key therapeutic agent against coronavirus. Traditional methods for finding ACVP need a great deal of money and man power. Hence, it is a significant task to establish intelligent computational tools to able rapid, efficient and accurate identification of ACVP. METHODS In this paper, we construct an excellent model named iACVP-MR to identify ACVP based on multiple features and recurrent neural networks. Multiple features are extracted by using reduced amino acid component and dipeptide component, compositions of k-spaced amino acid pairs, BLOSUM62 encoder according to the N5C5 sequence, as well as second-order moving average approach based on 16 physicochemical properties. Then, two recurrent neural networks named long-short term memory (LSTM) and bidirectional gated recurrent unit (BiGRU) combined attention mechanism are used for feature fusion and classification, respectively. RESULTS The accuracies of ENNAVIA-C and ENNAVIA-D datasets under the 10-fold cross-validation are 99.15% and 98.92%, respectively, and other evaluation indexes have also obtained satisfactory results. The experimental results show that our model is superior to other existing models. CONCLUSION The iACVP-MR model can be viewed as a powerful and intelligent tool for the accurate identification of ACVP. The datasets and source codes for iACVP-MR are freely downloaded at https://github.com/yunyunliang88/iACVP-MR.
Collapse
Affiliation(s)
- Yunyun Liang
- School of Science, Xi'an Polytechnic University, Xi'an, 710048, P.R. China
| | - Xinyan Ma
- School of Science, Xi'an Polytechnic University, Xi'an, 710048, P.R. China
| | - Jin Li
- School of Science, Xi'an Polytechnic University, Xi'an, 710048, P.R. China
| | - Shengli Zhang
- School of Mathematics and Statistics, Xidian University, Xi'an, 710071, P.R. China
| |
Collapse
|
11
|
Huang G, Cao Y, Dai Q, Chen W. ACP-DPE: A Dual-Channel Deep Learning Model for Anticancer Peptide Prediction. IET Syst Biol 2025; 19:e70010. [PMID: 40119615 PMCID: PMC11928748 DOI: 10.1049/syb2.70010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2024] [Revised: 02/13/2025] [Accepted: 02/20/2025] [Indexed: 03/24/2025] Open
Abstract
Cancer is a serious and complex disease caused by uncontrolled cell growth and is becoming one of the leading causes of death worldwide. Anticancer peptides (ACPs), as a bioactive peptide with lower toxicity, emerge as a promising means of effectively treating cancer. Identifying ACPs is challenging due to the limitation of experimental conditions. To address this, we proposed a dual-channel-based deep learning method, termed ACP-DPE, for ACP prediction. The ACP-DPE consisted of two parallel channels: one was an embedding layer followed by the bi-directional gated recurrent unit (Bi-GRU) module, and the other was an adaptive embedding layer followed by the dilated convolution module. The Bi-GRU module captured the peptide sequence dependencies, whereas the dilated convolution module characterised the local relationship of amino acids. Experimental results show that ACP-DPE achieves an accuracy of 82.81% and a sensitivity of 86.63%, surpassing the state-of-the-art method by 3.86% and 5.1%, respectively. These findings demonstrate the effectiveness of ACP-DPE for ACP prediction and highlight its potential as a valuable tool in cancer treatment research.
Collapse
Affiliation(s)
- Guohua Huang
- College of Information Science and EngineeringShaoyang UniversityShaoyangChina
- Hunan Provincial Key Laboratory of Finance & Economics Big Data Science and TechnologyHunan University of Finance and EconomicsChangshaChina
| | - Yujie Cao
- College of Information Science and EngineeringShaoyang UniversityShaoyangChina
| | - Qi Dai
- College of Life Science and MedicineZhejiang Sci‐Tech UniversityHangzhouChina
| | - Weihong Chen
- Hunan Provincial Key Laboratory of Finance & Economics Big Data Science and TechnologyHunan University of Finance and EconomicsChangshaChina
| |
Collapse
|
12
|
Mohamed RA, Marie OM, Badran DI, Hammam OA, Ahmed HO. Development of a Novel Peptide with RGD Tumor Homing Motif: Evaluation of its Anticancer Potential in Hepatocellular Carcinoma and Colon Cancer Cells. Recent Pat Biotechnol 2025; 19:128-141. [PMID: 39916422 DOI: 10.2174/0118722083300452240315035722] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/24/2023] [Revised: 02/21/2024] [Accepted: 02/28/2024] [Indexed: 05/08/2025]
Abstract
BACKGROUND Peptide-based therapy has emerged as a promising avenue for treating various disorders, and recent research has highlighted the potential of anti-cancer peptides (ACPs) in cancer treatment. In this context, this study aimed to design a novel peptide incorporating a tumor-homing peptide (RGD) and C-amidation to enhance its anticancer activity, particularly against liver (HepG2) and colon (HCT-116) cancer cell lines. OBJECTIVES The primary objective was to design a peptide with improved anticancer properties by leveraging the tumor-homing capabilities of RGD and enhancing its activity through C-amidation. The study sought to evaluate the cytotoxicity of the designed peptide against red blood cells (RBCs) and normal Vero cells. Furthermore, the anticancer efficacy of the peptide was assessed in hepatocellular carcinoma (HepG2) and colon cancer (HCT-116) cell lines. The specific objectives included examining the apoptotic induction and morphological changes in treated cells compared to untreated cells. METHODS The peptide was designed using the ACPred-FL bioinformatics tool, and its cytotoxicity was assessed through hemolysis assays against RBCs and normal Vero cells. Anticancer activity was evaluated against HepG2 and HCT-116 cell lines. The analysis of apoptotic induction involved measuring the relative gene expression of oncogenic marker BCL2 and apoptotic markers (BAX, BID, CAS-8). Additionally, Cytopathological examination and Western Blot analysis were employed to study morphological changes and confirm the quantification of relevant markers. RESULTS The designed peptide, consisting of twelve amino acids with a molecular mass of 1230.6233 Da and an isoelectric point of 9.81, exhibited low erythrocyte lysis and minimal toxicity to normal cells. The IC50 values demonstrated significant anticancer activity against both HepG2 (36.49±2.6 μg/mL) and HCT-116 (11.03±2.5 μg/mL) cell lines. Treated cells exhibited a significant decrease in the oncogenic marker BCL2 and an upregulation of apoptotic markers (BAX, BID, CAS-8). Western Blot analysis confirmed these results in addition to cytopathological examination that scattered apoptotic and degenerative changes. CONCLUSION The designed peptide is considered a patent product that displayed remarkable anticancer activity against hepatocellular carcinoma and colon cancer cell lines, effectively modulating apoptotic and oncogenic markers. These findings highlight the potential of the peptide as a therapeutic agent for cancer treatment, emphasizing its clinical significance in combating liver and colon cancers. Nonetheless, further research and development are warranted to explore the translational potential of this peptide in clinical studies.
Collapse
Affiliation(s)
| | - Ohoud Mohammed Marie
- Department of Chemistry, Faculty of Science, Suez Canal University, Ismailia, Egypt
| | - Dahlia Ibrahim Badran
- Department of Medical Biochemistry, Faculty of Medicine, Suez Canal University, Ismailia, Egypt
| | - Olfat Ali Hammam
- Department of Pathology, Theodor Bilharz Research Institute, Giza, Egypt
| | - Hend Okasha Ahmed
- Department of Biochemistry and Molecular Biology, Theodor Bilharz Research Institute, Giza, Egypt
| |
Collapse
|
13
|
Saraswat A, Sharma U, Gandotra A, Wasan L, Artham S, Maitra A, Singh B. Pred-AHCP: Robust Feature Selection-Enabled Sequence-Specific Prediction of Anti-Hepatitis C Peptides via Machine Learning. J Chem Inf Model 2024; 64:9111-9124. [PMID: 39505690 DOI: 10.1021/acs.jcim.4c00900] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2024]
Abstract
Every year, an estimated 1.5 million people worldwide contract Hepatitis C, a significant contributor to liver problems. Although many studies have explored machine learning's potential to predict antiviral peptides, very few have addressed the problem of predicting peptides against specific viruses such as Hepatitis C. In this study, we demonstrate the application and fine-tuning of machine learning (ML) algorithms to predict peptides that are effective against Hepatitis C virus (HCV). We developed a fine-tuned and explainable ML model that harnesses the amino acid sequence of a peptide to predict its anti-hepatitis C potential. Specifically, features were computed based on sequence and physicochemical properties. The feature selection was performed using a combined strategy of mutual information and variance inflation factor. This facilitated the removal of redundant and multicollinear features, enhancing the model's generalizability in predicting anti-hepatitis C peptides (AHCPs). The model using the random forest algorithm produced the best performance with an accuracy of about 92%. The feature analysis highlights that the distributions of hydrophobicity, polarizability, coil-forming residues, frequency of glycine residues and the existence of dipeptide motifs VL, LV, and CC emerged as the key predictors for identifying AHCPs targeting different components of HCV. The developed model can be accessed through the Pred-AHCP web server, provided at http://tinyurl.com/web-Pred-AHCP. This resource facilitates the prediction and re-engineering of AHCPs for designing peptide-based therapeutics while also proposing an exploration of similar strategies for designing peptide inhibitors effective against other viruses. The developed ML model can also be used for validating peptide sequences generated using generative artificial intelligence methods for further optimization.
Collapse
Affiliation(s)
- Akash Saraswat
- Department of Applied Sciences, School of Engineering and Technology, BML Munjal University, Gurugram, Haryana 122413, India
| | - Utsav Sharma
- Department of Computer Science and Engineering, School of Engineering and Technology, BML Munjal University, Gurugram, Haryana 122413, India
| | - Aryan Gandotra
- Department of Computer Science and Engineering, School of Engineering and Technology, BML Munjal University, Gurugram, Haryana 122413, India
| | - Lakshit Wasan
- Department of Computer Science and Engineering, School of Engineering and Technology, BML Munjal University, Gurugram, Haryana 122413, India
| | - Sainithin Artham
- Department of Computer Science and Engineering, School of Engineering and Technology, BML Munjal University, Gurugram, Haryana 122413, India
| | - Arijit Maitra
- Department of Applied Sciences, School of Engineering and Technology, BML Munjal University, Gurugram, Haryana 122413, India
| | - Bipin Singh
- Centre for Life Sciences, Mahindra University, Hyderabad, Telangana 500043, India
| |
Collapse
|
14
|
Niu S, Fan H, Wang F, Yang X, Xia J. Identification of Multi-functional Therapeutic Peptides Based on Prototypical Supervised Contrastive Learning. Interdiscip Sci 2024:10.1007/s12539-024-00674-3. [PMID: 39714581 DOI: 10.1007/s12539-024-00674-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2024] [Revised: 10/31/2024] [Accepted: 11/04/2024] [Indexed: 12/24/2024]
Abstract
High-throughput sequencing has exponentially increased peptide sequences, necessitating a computational method to identify multi-functional therapeutic peptides (MFTP) from their sequences. However, existing computational methods are challenged by class imbalance, particularly in learning effective sequence representations. To address this, we propose PSCFA, a prototypical supervised contrastive learning with a feature augmentation method for MFTP prediction. We employ a two-stage training scheme to train the feature extractor and the classifier respectively, underpinned by the principle that better feature representation boosts classification accuracy. In the first stage, we utilize a prototypical supervised contrastive learning strategy to enhance the uniformity of feature space distribution, ensuring that the characteristics of samples within the same category are tightly clustered while those from different categories are more dispersed. In the second stage, a feature augmentation strategy that focuses on infrequent labels (tail labels) is used to refine the learning process of the classifier. We use a prototype-based variational autoencoder to capture semantic links among common labels (head labels) and their prototypes. This knowledge is then transferred to tail labels, generating enhanced features for classifier training. The experiments prove that the PSCFA method significantly outperforms existing methods for MFTP prediction, making a significant advancement in therapeutic peptide identification.
Collapse
Affiliation(s)
- Sitong Niu
- College of Mathematics and System sciences, Xinjiang University, Urumqi, 830046, Xinjiang, China
| | - Henghui Fan
- Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
- Information Materials and Intelligent Sensing Laboratory of Anhui Province, Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China
| | - Fei Wang
- School of Artificial Intelligence, Anhui University, Hefei, 230601, Anhui, China
| | - Xiaomei Yang
- College of Mathematics and System sciences, Xinjiang University, Urumqi, 830046, Xinjiang, China.
| | - Junfeng Xia
- Institutes of Physical Science and Information Technology, Anhui University, Hefei, 230601, Anhui, China.
| |
Collapse
|
15
|
Wang X, Zhang Z, Liu C. iACP-DFSRA: Identification of Anticancer Peptides Based on a Dual-channel Fusion Strategy of ResCNN and Attention. J Mol Biol 2024; 436:168810. [PMID: 39362624 DOI: 10.1016/j.jmb.2024.168810] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/09/2024] [Revised: 09/10/2024] [Accepted: 09/27/2024] [Indexed: 10/05/2024]
Abstract
Anticancer peptides (ACPs) have been widely applied in the treatment of cancer owing to good safety, rational side effects, and high selectivity. However, the number of ACPs that have been experimentally validated is limited as identification of ACPs is extremely expensive. Hence, accurate and cost-effective identification methods for ACPs are urgently needed. In this work, we proposed a deep learning-based model, named iACP-DFSRA, for ACPs identification. Specifically, we adopted two kinds of sequence embedding technologies, ProtBert_BFD pre-training language model and handcrafted features to encode protein sequences. Then, the LightGBM was used for feature selection, and the selected features were input into ResCNN and Attention mechanism, respectively, to extract local and global features. Finally, the concatenate features were deeply fused by using the Attention mechanism to allow key features to be paid more attention to by the model and make predictions by fully connected layer. The results of 10-fold cross-validation demonstrated that the iACP-DFSRA model delivered improved results in most metrics with Sp of 94.15%, Sn of 95.32%, Acc of 94.74% and MCC of 89.48% compared to the latest AACFlow model. Indeed, the iACP-DFSRA model is the only model with Acc > 90% and MCC > 80% on this independent test dataset. Furthermore, we have further demonstrated the superiority of our model on additional datasets. In addition, t-SNE and SHAP interpretation analysis demonstrated that it is crucial to use two channels for feature extraction and use the Attention mechanism for deep fusion, which helps the iACP-DFSRA to predict ACPs more effectively.
Collapse
Affiliation(s)
- Xin Wang
- School of Science, Dalian Maritime University, Dalian 116026, China.
| | - Zimeng Zhang
- School of Science, Dalian Maritime University, Dalian 116026, China
| | - Chang Liu
- School of Science, Dalian Maritime University, Dalian 116026, China
| |
Collapse
|
16
|
Li J, He S, Zhang J, Zhang F, Zou Q, Ni F. T4Seeker: a hybrid model for type IV secretion effectors identification. BMC Biol 2024; 22:259. [PMID: 39543674 PMCID: PMC11566746 DOI: 10.1186/s12915-024-02064-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2024] [Accepted: 11/06/2024] [Indexed: 11/17/2024] Open
Abstract
BACKGROUND The type IV secretion system is widely present in various bacteria, such as Salmonella, Escherichia coli, and Helicobacter pylori. These bacteria use the type IV secretion system to secrete type IV secretion effectors, infect host cells, and disrupt or modulate the communication pathways. In this study, type III and type VI secretion effectors were used as negative samples to train a robust model. RESULTS The area under the curve of T4Seeker on the validation and independent test sets were 0.947 and 0.970, respectively, demonstrating the strong predictive capacity and robustness of T4Seeker. After comparing with the classic and state-of-the-art T4SE identification models, we found that T4Seeker, which is based on traditional features and large language model features, had a higher predictive ability. CONCLUSION The T4Seeker proposed in this study demonstrates superior performance in the field of T4SEs prediction. By integrating features at multiple levels, it achieves higher predictive accuracy and strong generalization capability, providing an effective tool for future T4SE research.
Collapse
Affiliation(s)
- Jing Li
- Department of Microbiology, University of Hong Kong, Hong Kong, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, 1 Chengdian Road, Quzhou, Zhejiang, China
- School of Biomedical Sciences, University of Hong Kong, Hong Kong, China
| | - Shida He
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, 1 Chengdian Road, Quzhou, Zhejiang, China
- The Joint Innovation Center for Engineering in Medicine, Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People's Hospital, Quzhou, 324000, China
- Department of Respiratory and Critical Care, Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou, 324000, China
| | - Jian Zhang
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, 1 Chengdian Road, Quzhou, Zhejiang, China
| | - Feng Zhang
- The Joint Innovation Center for Engineering in Medicine, Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou People's Hospital, Quzhou, 324000, China
- Department of Respiratory and Critical Care, Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou, 324000, China
| | - Quan Zou
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, 1 Chengdian Road, Quzhou, Zhejiang, China
| | - Fengming Ni
- Department of Gastroenterology, The First Hospital of Jilin University, Changchun, 130021, China.
| |
Collapse
|
17
|
Kilimci ZH, Yalcin M. ACP-ESM: A novel framework for classification of anticancer peptides using protein-oriented transformer approach. Artif Intell Med 2024; 156:102951. [PMID: 39173421 DOI: 10.1016/j.artmed.2024.102951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/23/2023] [Revised: 07/19/2024] [Accepted: 08/13/2024] [Indexed: 08/24/2024]
Abstract
Anticancer peptides (ACPs) are a class of molecules that have gained significant attention in the field of cancer research and therapy. ACPs are short chains of amino acids, the building blocks of proteins, and they possess the ability to selectively target and kill cancer cells. One of the key advantages of ACPs is their ability to selectively target cancer cells while sparing healthy cells to a greater extent. This selectivity is often attributed to differences in the surface properties of cancer cells compared to normal cells. That is why ACPs are being investigated as potential candidates for cancer therapy. ACPs may be used alone or in combination with other treatment modalities like chemotherapy and radiation therapy. While ACPs hold promise as a novel approach to cancer treatment, there are challenges to overcome, including optimizing their stability, improving selectivity, and enhancing their delivery to cancer cells, continuous increasing in number of peptide sequences, developing a reliable and precise prediction model. In this work, we propose an efficient transformer-based framework to identify ACPs for by performing accurate a reliable and precise prediction model. For this purpose, four different transformer models, namely ESM, ProtBERT, BioBERT, and SciBERT are employed to detect ACPs from amino acid sequences. To demonstrate the contribution of the proposed framework, extensive experiments are carried on widely-used datasets in the literature, two versions of AntiCp2, cACP-DeepGram, ACP-740. Experiment results show the usage of proposed model enhances classification accuracy when compared to the literature studies. The proposed framework, ESM, exhibits 96.45% of accuracy for AntiCp2 dataset, 97.66% of accuracy for cACP-DeepGram dataset, and 88.51% of accuracy for ACP-740 dataset, thence determining new state-of-the-art. The code of proposed framework is publicly available at github (https://github.com/mstf-yalcin/acp-esm).
Collapse
Affiliation(s)
- Zeynep Hilal Kilimci
- Department of Information Systems Engineering, Kocaeli University, 41001, Kocaeli, Turkey.
| | - Mustafa Yalcin
- Department of Information Systems Engineering, Kocaeli University, 41001, Kocaeli, Turkey.
| |
Collapse
|
18
|
Zuo Y, Fang X, Wan J, He W, Liu X, Zeng X, Deng Z. PreMLS: The undersampling technique based on ClusterCentroids to predict multiple lysine sites. PLoS Comput Biol 2024; 20:e1012544. [PMID: 39436947 PMCID: PMC11530015 DOI: 10.1371/journal.pcbi.1012544] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2024] [Revised: 11/01/2024] [Accepted: 10/09/2024] [Indexed: 10/25/2024] Open
Abstract
The translated protein undergoes a specific modification process, which involves the formation of covalent bonds on lysine residues and the attachment of small chemical moieties. The protein's fundamental physicochemical properties undergo a significant alteration. The change significantly alters the proteins' 3D structure and activity, enabling them to modulate key physiological processes. The modulation encompasses inhibiting cancer cell growth, delaying ovarian aging, regulating metabolic diseases, and ameliorating depression. Consequently, the identification and comprehension of post-translational lysine modifications hold substantial value in the realms of biological research and drug development. Post-translational modifications (PTMs) at lysine (K) sites are among the most common protein modifications. However, research on K-PTMs has been largely centered on identifying individual modification types, with a relative scarcity of balanced data analysis techniques. In this study, a classification system is developed for the prediction of concurrent multiple modifications at a single lysine residue. Initially, a well-established multi-label position-specific triad amino acid propensity algorithm is utilized for feature encoding. Subsequently, PreMLS: a novel ClusterCentroids undersampling algorithm based on MiniBatchKmeans was introduced to eliminate redundant or similar major class samples, thereby mitigating the issue of class imbalance. A convolutional neural network architecture was specifically constructed for the analysis of biological sequences to predict multiple lysine modification sites. The model, evaluated through five-fold cross-validation and independent testing, was found to significantly outperform existing models such as iMul-kSite and predML-Site. The results presented here aid in prioritizing potential lysine modification sites, facilitating subsequent biological assays and advancing pharmaceutical research. To enhance accessibility, an open-access predictive script has been crafted for the multi-label predictive model developed in this study.
Collapse
Affiliation(s)
- Yun Zuo
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
| | - Xingze Fang
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
| | - Jiayong Wan
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
| | - Wenying He
- School of Artificial Intelligence, Hebei University of Technology, Tianjin, China
| | - Xiangrong Liu
- Department of Computer Science and Technology, National Institute for Data Science in Health and Medicine, Xiamen Key Laboratory of Intelligent Storage and Computing, Xiamen University, Xiamen, China
| | - Xiangxiang Zeng
- School of Information Science and Engineering, Hunan University, Changsha, China
| | - Zhaohong Deng
- School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
| |
Collapse
|
19
|
Guo X, Zheng Z, Cheong KH, Zou Q, Tiwari P, Ding Y. Sequence homology score-based deep fuzzy network for identifying therapeutic peptides. Neural Netw 2024; 178:106458. [PMID: 38901093 DOI: 10.1016/j.neunet.2024.106458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2023] [Revised: 05/29/2024] [Accepted: 06/09/2024] [Indexed: 06/22/2024]
Abstract
The detection of therapeutic peptides is a topic of immense interest in the biomedical field. Conventional biochemical experiment-based detection techniques are tedious and time-consuming. Computational biology has become a useful tool for improving the detection efficiency of therapeutic peptides. Most computational methods do not consider the deviation caused by noise. To improve the generalization performance of therapeutic peptide prediction methods, this work presents a sequence homology score-based deep fuzzy echo-state network with maximizing mixture correntropy (SHS-DFESN-MMC) model. Our method is compared with the existing methods on eight types of therapeutic peptide datasets. The model parameters are determined by 10 fold cross-validation on their training sets and verified by independent test sets. Across the 8 datasets, the average area under the receiver operating characteristic curve (AUC) values of SHS-DFESN-MMC are the highest on both the training (0.926) and independent sets (0.923).
Collapse
Affiliation(s)
- Xiaoyi Guo
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, PR China; Quzhou People's Hospital, Quzhou Affiliated Hospital of Wenzhou Medical University, Quzhou, 324000, PR China; Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, S637371, Singapore.
| | - Ziyu Zheng
- Department of Mathematical Sciences, University of Nottingham Ningbo, Ningbo, 315100, PR China.
| | - Kang Hao Cheong
- Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, S637371, Singapore; College of Computing and Data Science, Nanyang Technological University, S639798, Singapore.
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, PR China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, PR China.
| | - Prayag Tiwari
- School of Information Technology, Halmstad University, Sweden.
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, PR China.
| |
Collapse
|
20
|
Wang X, Wang S. ACP-PDAFF: Pretrained model and dual-channel attentional feature fusion for anticancer peptides prediction. Comput Biol Chem 2024; 112:108141. [PMID: 38996756 DOI: 10.1016/j.compbiolchem.2024.108141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/01/2024] [Revised: 05/26/2024] [Accepted: 06/28/2024] [Indexed: 07/14/2024]
Abstract
Anticancer peptides(ACPs) have attracted significant interest as a novel method of treating cancer due to their ability to selectively kill cancer cells without damaging normal cells. Many artificial intelligence-based methods have demonstrated impressive performance in predicting ACPs. Nevertheless, the limitations of existing methods in feature engineering include handcrafted features driven by prior knowledge, insufficient feature extraction, and inefficient feature fusion. In this study, we propose a model based on a pretrained model, and dual-channel attentional feature fusion(DAFF), called ACP-PDAFF. Firstly, to reduce the heavy dependence on expert knowledge-based handcrafted features, binary profile features (BPF) and physicochemical properties features(PCPF) are used as inputs to the transformer model. Secondly, aimed at learning more diverse feature informations of ACPs, a pretrained model ProtBert is utilized. Thirdly, for better fusion of different feature channels, DAFF is employed. Finally, to evaluate the performance of the model, we compare it with other methods on five benchmark datasets, including ACP-Mixed-80 dataset, Main and Alternate datasets of AntiCP 2.0, LEE and Independet dataset, and ACPred-Fuse dataset. And the accuracies obtained by ACP-PDAFF are 0.86, 0.80, 0.94, 0.97 and 0.95 on five datasets, respectively, higher than existing methods by 1% to 12%. Therefore, by learning rich feature informations and effectively fusing different feature channels, ACD-PDAFF achieves outstanding performance. Our code and the datasets are available at https://github.com/wongsing/ACP-PDAFF.
Collapse
Affiliation(s)
- Xinyi Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China
| | - Shunfang Wang
- Department of Computer Science and Engineering, School of Information Science and Engineering, Yunnan University, Kunming, 650504, Yunnan, China.
| |
Collapse
|
21
|
Meher PK, Pradhan UK, Sethi PL, Naha S, Gupta A, Parsad R. PredPSP: a novel computational tool to discover pathway-specific photosynthetic proteins in plants. PLANT MOLECULAR BIOLOGY 2024; 114:106. [PMID: 39316155 DOI: 10.1007/s11103-024-01500-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/16/2024] [Accepted: 09/04/2024] [Indexed: 09/25/2024]
Abstract
Photosynthetic proteins play a crucial role in agricultural productivity by harnessing light energy for plant growth. Understanding these proteins, especially within C3 and C4 pathways, holds promise for improving crops in challenging environments. Despite existing models, a comprehensive computational framework specifically targeting plant photosynthetic proteins is lacking. The underutilization of plant datasets in computational algorithms accentuates the gap this study aims to fill by introducing a novel sequence-based computational method for identifying these proteins. The scope of this study encompassed diverse plant species, ensuring comprehensive representation across C3 and C4 pathways. Utilizing six deep learning models and seven shallow learning algorithms, paired with six sequence-derived feature sets followed by feature selection strategy, this study developed a comprehensive model for prediction of plant-specific photosynthetic proteins. Following 5-fold cross-validation analysis, LightGBM with 65 and 90 LGBM-VIM selected features respectively emerged as the best models for C3 (auROC: 91.78%, auPRC: 92.55%) and C4 (auROC: 99.05%, auPRC: 99.18%) plants. Validation using an independent dataset confirmed the robustness of the proposed model for both C3 (auROC: 87.23%, auPRC: 88.40%) and C4 (auROC: 92.83%, auPRC: 92.29%) categories. Comparison with existing methods demonstrated the superiority of the proposed model in predicting plant-specific photosynthetic proteins. This study further established a free online prediction server PredPSP ( https://iasri-sg.icar.gov.in/predpsp/ ) to facilitate ongoing efforts for identifying photosynthetic proteins in C3 and C4 plants. Being first of its kind, this study offers valuable insights into predicting plant-specific photosynthetic proteins which holds significant implications for plant biology.
Collapse
Affiliation(s)
- Prabina Kumar Meher
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India.
| | - Upendra Kumar Pradhan
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India
| | - Padma Lochan Sethi
- Department of Bioinformatics, Odisha University of Agriculture & Technology, Bhubaneswar, 751003, Odisha, India
| | - Sanchita Naha
- Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India
| | - Ajit Gupta
- Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India
| | - Rajender Parsad
- ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi, 110012, India
| |
Collapse
|
22
|
Kamble P, Nagar PR, Bhakhar KA, Garg P, Sobhia ME, Naidu S, Bharatam PV. Cancer pharmacoinformatics: Databases and analytical tools. Funct Integr Genomics 2024; 24:166. [PMID: 39294509 DOI: 10.1007/s10142-024-01445-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/29/2024] [Revised: 08/26/2024] [Accepted: 09/03/2024] [Indexed: 09/20/2024]
Abstract
Cancer is a subject of extensive investigation, and the utilization of omics technology has resulted in the generation of substantial volumes of big data in cancer research. Numerous databases are being developed to manage and organize this data effectively. These databases encompass various domains such as genomics, transcriptomics, proteomics, metabolomics, immunology, and drug discovery. The application of computational tools into various core components of pharmaceutical sciences constitutes "Pharmacoinformatics", an emerging paradigm in rational drug discovery. The three major features of pharmacoinformatics include (i) Structure modelling of putative drugs and targets, (ii) Compilation of databases and analysis using statistical approaches, and (iii) Employing artificial intelligence/machine learning algorithms for the discovery of novel therapeutic molecules. The development, updating, and analysis of databases using statistical approaches play a pivotal role in pharmacoinformatics. Multiple software tools are associated with oncoinformatics research. This review catalogs the databases and computational tools related to cancer drug discovery and highlights their potential implications in the pharmacoinformatics of cancer.
Collapse
Affiliation(s)
- Pradnya Kamble
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Punjab, India
| | - Prinsa R Nagar
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Punjab, India
| | - Kaushikkumar A Bhakhar
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Punjab, India
| | - Prabha Garg
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Punjab, India
| | - M Elizabeth Sobhia
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Punjab, India
| | - Srivatsava Naidu
- Center of Biomedical Engineering, Indian Institute of Technology Ropar, Rupnagar, Punjab, India
| | - Prasad V Bharatam
- Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Punjab, India.
- Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Punjab, India.
| |
Collapse
|
23
|
Uddin I, Awan HH, Khalid M, Khan S, Akbar S, Sarker MR, Abdolrasol MGM, Alghamdi TAH. A hybrid residue based sequential encoding mechanism with XGBoost improved ensemble model for identifying 5-hydroxymethylcytosine modifications. Sci Rep 2024; 14:20819. [PMID: 39242695 PMCID: PMC11379919 DOI: 10.1038/s41598-024-71568-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2024] [Accepted: 08/29/2024] [Indexed: 09/09/2024] Open
Abstract
RNA modifications play an important role in actively controlling recently created formation in cellular regulation mechanisms, which link them to gene expression and protein. The RNA modifications have numerous alterations, presenting broad glimpses of RNA's operations and character. The modification process by the TET enzyme oxidation is the crucial change associated with cytosine hydroxymethylation. The effect of CR is an alteration in specific biochemical ways of the organism, such as gene expression and epigenetic alterations. Traditional laboratory systems that identify 5-hydroxymethylcytosine (5hmC) samples are expensive and time-consuming compared to other methods. To address this challenge, the paper proposed XGB5hmC, a machine learning algorithm based on a robust gradient boosting algorithm (XGBoost), with different residue based formulation methods to identify 5hmC samples. Their results were amalgamated, and six different frequency residue based encoding features were fused to form a hybrid vector in order to enhance model discrimination capabilities. In addition, the proposed model incorporates SHAP (Shapley Additive Explanations) based feature selection to demonstrate model interpretability by highlighting the high contributory features. Among the applied machine learning algorithms, the XGBoost ensemble model using the tenfold cross-validation test achieved improved results than existing state-of-the-art models. Our model reported an accuracy of 89.97%, sensitivity of 87.78%, specificity of 94.45%, F1-score of 0.8934%, and MCC of 0.8764%. This study highlights the potential to provide valuable insights for enhancing medical assessment and treatment protocols, representing a significant advancement in RNA modification analysis.
Collapse
Affiliation(s)
- Islam Uddin
- Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
| | - Hamid Hussain Awan
- Department of Computer Science, Muslim Youth University, Islamabad, Pakistan
| | - Majdi Khalid
- Department of Computer Science and Artificial Intelligence, College of Computing, Umm Al-Qura University, Makkah, 21955, Saudi Arabia
| | - Salman Khan
- Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
| | - Shahid Akbar
- Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan.
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China.
| | - Mahidur R Sarker
- Institute of Visual Informatics, Universiti Kebangsaan Malaysia, Bangi, 43600, Selangor, Malaysia
- Universidad de Diseño, Innovación y Tecnología, UDIT, Av. Alfonso XIII, 97, 28016, Madrid, Spain
| | - Maher G M Abdolrasol
- Institute of Sustainable Energy, Universiti Tenaga Nasional, Kajang, 43000, Malaysia
| | - Thamer A H Alghamdi
- Wolfson Centre for Magnetics, School of Engineering, Cardiff University, Cardiff, CF24 3AA, UK.
- Electrical Engineering Department, Faculty of Engineering, Al-Baha University, Al-Baha, 65779, Saudi Arabia.
| |
Collapse
|
24
|
Sangaraju VK, Pham NT, Wei L, Yu X, Manavalan B. mACPpred 2.0: Stacked Deep Learning for Anticancer Peptide Prediction with Integrated Spatial and Probabilistic Feature Representations. J Mol Biol 2024; 436:168687. [PMID: 39237191 DOI: 10.1016/j.jmb.2024.168687] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/08/2024] [Revised: 05/28/2024] [Accepted: 06/20/2024] [Indexed: 09/07/2024]
Abstract
Anticancer peptides (ACPs), naturally occurring molecules with remarkable potential to target and kill cancer cells. However, identifying ACPs based solely from their primary amino acid sequences remains a major hurdle in immunoinformatics. In the past, several web-based machine learning (ML) tools have been proposed to assist researchers in identifying potential ACPs for further testing. Notably, our meta-approach method, mACPpred, introduced in 2019, has significantly advanced the field of ACP research. Given the exponential growth in the number of characterized ACPs, there is now a pressing need to create an updated version of mACPpred. To develop mACPpred 2.0, we constructed an up-to-date benchmarking dataset by integrating all publicly available ACP datasets. We employed a large-scale of feature descriptors, encompassing both conventional feature descriptors and advanced pre-trained natural language processing (NLP)-based embeddings. We evaluated their ability to discriminate between ACPs and non-ACPs using eleven different classifiers. Subsequently, we employed a stacked deep learning (SDL) approach, incorporating 1D convolutional neural network (1D CNN) blocks and hybrid features. These features included the top seven performing NLP-based features and 90 probabilistic features, allowing us to identify hidden patterns within these diverse features and improve the accuracy of our ACP prediction model. This is the first study to integrate spatial and probabilistic feature representations for predicting ACPs. Rigorous cross-validation and independent tests conclusively demonstrated that mACPpred 2.0 not only surpassed its predecessor (mACPpred) but also outperformed the existing state-of-the-art predictors, highlighting the importance of advanced feature representation capabilities attained through SDL. To facilitate widespread use and accessibility, we have developed a user-friendly for mACPpred 2.0, available at https://balalab-skku.org/mACPpred2/.
Collapse
Affiliation(s)
- Vinoth Kumar Sangaraju
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea
| | - Nhat Truong Pham
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea
| | - Leyi Wei
- Faculty of Applied Sciences, Macao Polytechnic University, Macau
| | - Xue Yu
- Beidahuang Industry Group General Hospital, 150001 Harbin, China.
| | - Balachandran Manavalan
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon, 16419, Gyeonggi-do, Republic of Korea.
| |
Collapse
|
25
|
Sabir MJ, Kamli MR, Atef A, Alhibshi AM, Edris S, Hajarah NH, Bahieldin A, Manavalan B, Sabir JSM. Computational prediction of phosphorylation sites of SARS-CoV-2 infection using feature fusion and optimization strategies. Methods 2024; 229:1-8. [PMID: 38768932 DOI: 10.1016/j.ymeth.2024.04.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2023] [Revised: 03/15/2024] [Accepted: 04/30/2024] [Indexed: 05/22/2024] Open
Abstract
SARS-CoV-2's global spread has instigated a critical health and economic emergency, impacting countless individuals. Understanding the virus's phosphorylation sites is vital to unravel the molecular intricacies of the infection and subsequent changes in host cellular processes. Several computational methods have been proposed to identify phosphorylation sites, typically focusing on specific residue (S/T) or Y phosphorylation sites. Unfortunately, current predictive tools perform best on these specific residues and may not extend their efficacy to other residues, emphasizing the urgent need for enhanced methodologies. In this study, we developed a novel predictor that integrated all the residues (STY) phosphorylation sites information. We extracted ten different feature descriptors, primarily derived from composition, evolutionary, and position-specific information, and assessed their discriminative power through five classifiers. Our results indicated that Light Gradient Boosting (LGB) showed superior performance, and five descriptors displayed excellent discriminative capabilities. Subsequently, we identified the top two integrated features have high discriminative capability and trained with LGB to develop the final prediction model, LGB-IPs. The proposed approach shows an excellent performance on 10-fold cross-validation with an ACC, MCC, and AUC values of 0.831, 0.662, 0.907, respectively. Notably, these performances are replicated in the independent evaluation. Consequently, our approach may provide valuable insights into the phosphorylation mechanisms in SARS-CoV-2 infection for biomedical researchers.
Collapse
Affiliation(s)
- Mumdooh J Sabir
- Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| | - Majid Rasool Kamli
- Centre of Excellence in Bionanoscience Research, King Abdulaziz University, Jeddah, Saudi Arabia; Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Ahmed Atef
- Centre of Excellence in Bionanoscience Research, King Abdulaziz University, Jeddah, Saudi Arabia; Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Alawiah M Alhibshi
- Centre of Excellence in Bionanoscience Research, King Abdulaziz University, Jeddah, Saudi Arabia; Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Sherif Edris
- Centre of Excellence in Bionanoscience Research, King Abdulaziz University, Jeddah, Saudi Arabia; Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Nahid H Hajarah
- Centre of Excellence in Bionanoscience Research, King Abdulaziz University, Jeddah, Saudi Arabia; Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Ahmed Bahieldin
- Centre of Excellence in Bionanoscience Research, King Abdulaziz University, Jeddah, Saudi Arabia; Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Balachandran Manavalan
- Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea.
| | - Jamal S M Sabir
- Centre of Excellence in Bionanoscience Research, King Abdulaziz University, Jeddah, Saudi Arabia; Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah, Saudi Arabia.
| |
Collapse
|
26
|
López-Cortés A, Cabrera-Andrade A, Echeverría-Garcés G, Echeverría-Espinoza P, Pineda-Albán M, Elsitdie N, Bueno-Miño J, Cruz-Segundo CM, Dorado J, Pazos A, Gonzáles-Díaz H, Pérez-Castillo Y, Tejera E, Munteanu CR. Unraveling druggable cancer-driving proteins and targeted drugs using artificial intelligence and multi-omics analyses. Sci Rep 2024; 14:19359. [PMID: 39169044 PMCID: PMC11339426 DOI: 10.1038/s41598-024-68565-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/04/2024] [Accepted: 07/25/2024] [Indexed: 08/23/2024] Open
Abstract
The druggable proteome refers to proteins that can bind to small molecules with appropriate chemical affinity, inducing a favorable clinical response. Predicting druggable proteins through screening and in silico modeling is imperative for drug design. To contribute to this field, we developed an accurate predictive classifier for druggable cancer-driving proteins using amino acid composition descriptors of protein sequences and 13 machine learning linear and non-linear classifiers. The optimal classifier was achieved with the support vector machine method, utilizing 200 tri-amino acid composition descriptors. The high performance of the model is evident from an area under the receiver operating characteristics (AUROC) of 0.975 ± 0.003 and an accuracy of 0.929 ± 0.006 (threefold cross-validation). The machine learning prediction model was enhanced with multi-omics approaches, including the target-disease evidence score, the shortest pathways to cancer hallmarks, structure-based ligandability assessment, unfavorable prognostic protein analysis, and the oncogenic variome. Additionally, we performed a drug repurposing analysis to identify drugs with the highest affinity capable of targeting the best predicted proteins. As a result, we identified 79 key druggable cancer-driving proteins with the highest ligandability, and 23 of them demonstrated unfavorable prognostic significance across 16 TCGA PanCancer types: CDKN2A, BCL10, ACVR1, CASP8, JAG1, TSC1, NBN, PREX2, PPP2R1A, DNM2, VAV1, ASXL1, TPR, HRAS, BUB1B, ATG7, MARK3, SETD2, CCNE1, MUTYH, CDKN2C, RB1, and SMARCA4. Moreover, we prioritized 11 clinically relevant drugs targeting these proteins. This strategy effectively predicts and prioritizes biomarkers, therapeutic targets, and drugs for in-depth studies in clinical trials. Scripts are available at https://github.com/muntisa/machine-learning-for-druggable-proteins .
Collapse
Affiliation(s)
- Andrés López-Cortés
- Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas, Quito, Ecuador.
| | - Alejandro Cabrera-Andrade
- Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito, Ecuador
- Escuela de Enfermería, Facultad de Ciencias de la Salud, Universidad de Las Américas, Quito, Ecuador
| | - Gabriela Echeverría-Garcés
- Centro de Referencia Nacional de Genómica, Secuenciación y Bioinformática, Instituto Nacional de Investigación en Salud Pública "Leopoldo Izquieta Pérez", Quito, Ecuador
- Latin American Network for the Implementation and Validation of Clinical Pharmacogenomics Guidelines (RELIVAF-CYTED), Santiago, Chile
| | | | - Micaela Pineda-Albán
- Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas, Quito, Ecuador
| | - Nicole Elsitdie
- Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas, Quito, Ecuador
| | - José Bueno-Miño
- Cancer Research Group (CRG), Faculty of Medicine, Universidad de Las Américas, Quito, Ecuador
| | - Carlos M Cruz-Segundo
- RNASA-IMEDIR, Computer Science Faculty, University of A Coruna, A Coruña, Spain
- Tecnológico de Estudios Superiores de Jocotitlán, Jocotitlán, Mexico
| | - Julian Dorado
- RNASA-IMEDIR, Computer Science Faculty, University of A Coruna, A Coruña, Spain
- Centro de Investigación en Tecnologías de la Información y las Comunicaciones (CITIC), University of A Coruna, A Coruña, Spain
| | - Alejandro Pazos
- RNASA-IMEDIR, Computer Science Faculty, University of A Coruna, A Coruña, Spain
- Centro de Investigación en Tecnologías de la Información y las Comunicaciones (CITIC), University of A Coruna, A Coruña, Spain
- Biomedical Research Institute of A Coruna (INIBIC), University Hospital Complex of A Coruna (CHUAC), A Coruña, Spain
| | - Humberto Gonzáles-Díaz
- Department of Organic Chemistry II, University of the Basque Country UPV/EHU, Biscay, Spain
- IKERBASQUE, Basque Foundation for Science, Biscay, Spain
| | | | - Eduardo Tejera
- Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito, Ecuador
| | - Cristian R Munteanu
- RNASA-IMEDIR, Computer Science Faculty, University of A Coruna, A Coruña, Spain
- Centro de Investigación en Tecnologías de la Información y las Comunicaciones (CITIC), University of A Coruna, A Coruña, Spain
- Biomedical Research Institute of A Coruna (INIBIC), University Hospital Complex of A Coruna (CHUAC), A Coruña, Spain
| |
Collapse
|
27
|
Zhu H, Hao H, Yu L. Identification of microbe-disease signed associations via multi-scale variational graph autoencoder based on signed message propagation. BMC Biol 2024; 22:172. [PMID: 39148051 PMCID: PMC11328394 DOI: 10.1186/s12915-024-01968-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/23/2024] [Accepted: 08/01/2024] [Indexed: 08/17/2024] Open
Abstract
BACKGROUND Plenty of clinical and biomedical research has unequivocally highlighted the tremendous significance of the human microbiome in relation to human health. Identifying microbes associated with diseases is crucial for early disease diagnosis and advancing precision medicine. RESULTS Considering that the information about changes in microbial quantities under fine-grained disease states helps to enhance a comprehensive understanding of the overall data distribution, this study introduces MSignVGAE, a framework for predicting microbe-disease sign associations using signed message propagation. MSignVGAE employs a graph variational autoencoder to model noisy signed association data and extends the multi-scale concept to enhance representation capabilities. A novel strategy for propagating signed message in signed networks addresses heterogeneity and consistency among nodes connected by signed edges. Additionally, we utilize the idea of denoising autoencoder to handle the noise in similarity feature information, which helps overcome biases in the fused similarity data. MSignVGAE represents microbe-disease associations as a heterogeneous graph using similarity information as node features. The multi-class classifier XGBoost is utilized to predict sign associations between diseases and microbes. CONCLUSIONS MSignVGAE achieves AUROC and AUPR values of 0.9742 and 0.9601, respectively. Case studies on three diseases demonstrate that MSignVGAE can effectively capture a comprehensive distribution of associations by leveraging signed information.
Collapse
Affiliation(s)
- Huan Zhu
- School of Computer Science and Technology, Xidian University, Xi'an, China
| | - Hongxia Hao
- School of Computer Science and Technology, Xidian University, Xi'an, China.
| | - Liang Yu
- School of Computer Science and Technology, Xidian University, Xi'an, China.
| |
Collapse
|
28
|
Garai S, Thomas J, Dey P, Das D. LGBM-ACp: an ensemble model for anticancer peptide prediction and in silico screening with potential drug targets. Mol Divers 2024; 28:1965-1981. [PMID: 36637711 DOI: 10.1007/s11030-023-10602-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2022] [Accepted: 01/06/2023] [Indexed: 01/14/2023]
Abstract
Conventional cancer therapies are highly expensive and have serious complications. An alternative approach now emphasizes on the development of small, biologically active peptides without acute toxicity. Experimental screening to find curative anticancer peptides (ACP) often gives rise to multiple obstacles and is time dependent. Consequently, developing an effective computational technique to identify promising ACP candidates prior to preclinical research is in high demand. This study proposed a machine-learning framework that used the light gradient-boosting machine as a classifier and two compositional and two binary profile features as input. The ensemble model displayed an accuracy, MCC, and AUROC of 97.52%, 0.91, and 0.98, respectively, which outclassed most of the existing sequence-based computational tools. A distinct dataset of non-mutagenic, non-toxic, and non-inhibitory Cytochrome P-450 peptides was used to validate the hybrid model. The most relevant ACP in the alternative dataset was compared with two standard ACPs, beta defensin 2, and cecropin-A. Molecular docking of the predicted peptide revealed that it has a strong binding affinity with twenty-five anticancer drug targets, most notably phosphoenolpyruvate carboxykinase (- 7.2 kcal/mol). Additionally, molecular dynamics simulation and principal component analysis supported the stability of the peptide-receptor complex. Overall, the present findings will take a step forward in rational drug design through rapid identification and screening of therapeutic peptides.
Collapse
Affiliation(s)
- Swarnava Garai
- Department of Bioengineering, NIT Agartala, Tripura, 799046, India
| | - Juanit Thomas
- Department of Bioengineering, NIT Agartala, Tripura, 799046, India
| | - Palash Dey
- Civil Engineering Department, The ICFAI University, Tripura, 799210, India
| | - Deeplina Das
- Department of Bioengineering, NIT Agartala, Tripura, 799046, India.
| |
Collapse
|
29
|
Yadav AK, Gupta PK, Singh TR. PMTPred: machine-learning-based prediction of protein methyltransferases using the composition of k-spaced amino acid pairs. Mol Divers 2024; 28:2301-2315. [PMID: 39033257 DOI: 10.1007/s11030-024-10937-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2024] [Accepted: 07/10/2024] [Indexed: 07/23/2024]
Abstract
Protein methyltransferases (PMTs) are a group of enzymes that help catalyze the transfer of a methyl group to its substrates. These enzymes play an important role in epigenetic regulation and can methylate various substrates with DNA, RNA, protein, and small-molecule secondary metabolites. Dysregulation of methyltransferases is implicated in various human cancers. However, in light of the well-recognized significance of PMTs, reliable and efficient identification methods are essential. In the present work, we propose a machine-learning-based method for the identification of PMTs. Various sequence-based features were calculated, and prediction models were trained using various machine-learning algorithms using a tenfold cross-validation technique. After evaluating each model on the dataset, the SVM-based CKSAAP model achieved the highest prediction accuracy with balanced sensitivity and specificity. Also, this SVM model outperformed deep-learning algorithms for the prediction of PMTs. In addition, cross-database validation was performed to ensure the robustness of the model. Feature importance was assessed using shapley additive explanations (SHAP) values, providing insights into the contributions of different features to the model's predictions. Finally, the SVM-based CKSAAP model was implemented in a standalone tool, PMTPred, due to its consistent performance during independent testing and cross-database evaluation. We believe that PMTPred will be a useful and efficient tool for the identification of PMTs. The PMTPred is freely available for download at https://github.com/ArvindYadav7/PMTPred and http://www.bioinfoindia.org/PMTPred/home.html for research and academic use.
Collapse
Affiliation(s)
- Arvind Kumar Yadav
- Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology, Solan- 173234, Himachal Pradesh, India
| | - Pradeep Kumar Gupta
- Department of Computer Science and Engineering, Jaypee University of Information Technology, Solan- 173234, Himachal Pradesh, India
- School of Computing, Department of Data Science and Engineering, Mohan Babu University, Tirupati- 517102, Andhra Pradesh, India
| | - Tiratha Raj Singh
- Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology, Solan- 173234, Himachal Pradesh, India.
- Centre of Excellence in Healthcare Technologies and Informatics (CHETI), Department of Biotechnology and Bioinformatics, Jaypee University of Information Technology, Solan- 173234, Himachal Pradesh, India.
| |
Collapse
|
30
|
Balaji PD, Selvam S, Sohn H, Madhavan T. MLASM: Machine learning based prediction of anticancer small molecules. Mol Divers 2024; 28:2153-2161. [PMID: 38554168 DOI: 10.1007/s11030-024-10823-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/03/2024] [Accepted: 02/10/2024] [Indexed: 04/01/2024]
Abstract
Cancer, being the second leading cause of death globally. So, the development of effective anticancer treatments is crucial in the field of medicine. Anticancer peptides (ACPs) have shown promising therapeutic potential in cancer treatment compared to traditional methods. However, the process of identifying ACPs through experimental means is often time-intensive and expensive. To overcome this issue, we employed a machine learning-based approach for the first time to develop an anticancer model using small molecules. Anticancer small molecules (ACSMs) are compounds that have been developed to target and inhibit cancer cells. In this study, we used 10,000 compounds to develop the machine learning models using five algorithms such as, Random Forest (RF), Light gradient boosting machine (LightGBM), K-nearest neighbors (KNN), Decision tree (DT) and Extreme Gradient Boosting (XGB). The developed models were evaluated using the test set and top three models were identified (RF, LightGBM and XGB). Furthermore, to validate the predictive performance of our models, we have performed external validation using an FDA approved anticancer compounds/drugs. Following this analysis, we found that our LightGBM model correctly predicted 9 compounds as active. However, RF and XGB exhibited some limitations by predicting 8 and 7 compounds as active out of 10, respectively. These results demonstrate that, when compared to RF and XGB, the LightGBM model showcase robust prediction capabilities, achieving a superior accuracy of 79% with an AUC of 0.88. These findings provide promising insights into the potential of our approach for predicting anticancer small molecules, highlighting the role of machine learning in advancing cancer treatment research.
Collapse
Affiliation(s)
- Priya Dharshini Balaji
- Computational Biology Laboratory, Department of Genetic Engineering, School of Bio-Engineering, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu, Tamil Nadu, 603203, India
| | - Subathra Selvam
- Computational Biology Laboratory, Department of Genetic Engineering, School of Bio-Engineering, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu, Tamil Nadu, 603203, India
| | - Honglae Sohn
- Department of Chemistry, Department of Carbon Materials, Chosun University, Gwangju, South Korea
| | - Thirumurthy Madhavan
- Computational Biology Laboratory, Department of Genetic Engineering, School of Bio-Engineering, SRM Institute of Science and Technology, Kattankulathur, Chengalpattu, Tamil Nadu, 603203, India.
| |
Collapse
|
31
|
Ullah F, Salam A, Nadeem M, Amin F, AlSalman H, Abrar M, Alfakih T. Extended dipeptide composition framework for accurate identification of anticancer peptides. Sci Rep 2024; 14:17381. [PMID: 39075193 PMCID: PMC11286958 DOI: 10.1038/s41598-024-68475-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2024] [Accepted: 07/24/2024] [Indexed: 07/31/2024] Open
Abstract
The identification of anticancer peptides (ACPs) is crucial, especially in the development of peptide-based cancer therapy. The classical models such as Split Amino Acid Composition (SAAC) and Pseudo Amino Acid Composition (PseAAC) lack the incorporation of feature representation. These advancements improve the predictive accuracy and efficiency of ACP identification. Thus, the effort of this research is to propose and develop an advanced framework based on feature extraction. Thus, to achieve this objective herein we propose an Extended Dipeptide Composition (EDPC) framework. The proposed EDPC framework extends the dipeptide composition by considering the local sequence environment information and reforming the CD-HIT framework to remove noise and redundancy. To measure the accuracy, we have performed several experiments. These experiments were employed using four famous machine learning (ML) algorithms named; Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and K Nearest Neighbor (KNN). For comparisons, we have used accuracy, specificity, sensitivity, precision, recall, and F1-Score as evaluation criteria. The reliability of the proposed framework is further evaluated using statistical significance tests. As a result, the proposed EDPC framework exhibited enhanced performance than SAAC and PseAAC, where the SVM model delivered the highest accuracy of 96. 6% and significant enhancements in specificity, sensitivity, precision, and F1-score over multiple datasets. Due to the incorporation of enhanced feature representation and the incorporation of local and global sequence profiles proposed EDPC achieves higher classification performance. The proposed frameworks can deal with noise and also duplicating features. These are accompanied by a wide range of feature representations. Finally, our proposed framework can be used for clinical applications where ACP identification is essential. Future works will include extending to a larger variety of datasets, incorporating tertiary structural information, and using deep learning techniques to improve the proposed EDPC.
Collapse
Affiliation(s)
- Faizan Ullah
- Department of Computer Science, Bacha Khan University, Charsadda, 24420, Pakistan
| | - Abdu Salam
- Department of Computer Science, Abdul Wali Khan University, Mardan, 23200, Pakistan
| | - Muhammad Nadeem
- Department of Computer Science and Software Engineering, International Islamic University, Islamabad, 44000, Pakistan
| | - Farhan Amin
- School of Computer Science and Engineering, Yeungnam University, Gyeongsan, 38541, Korea.
| | - Hussain AlSalman
- Department of Computer Science, College of Computer and Information Sciences, King Saud University, 11543, Riyadh, Saudi Arabia.
| | - Mohammad Abrar
- Faculty of Computer Studies, Arab Open University, Muscat, Oman
| | - Taha Alfakih
- Department of Information Systems, College of Computer and Information Sciences, King Saud University, 11543, Riyadh, Saudi Arabia
| |
Collapse
|
32
|
Arif M, Musleh S, Fida H, Alam T. PLMACPred prediction of anticancer peptides based on protein language model and wavelet denoising transformation. Sci Rep 2024; 14:16992. [PMID: 39043738 PMCID: PMC11266708 DOI: 10.1038/s41598-024-67433-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/31/2024] [Accepted: 07/11/2024] [Indexed: 07/25/2024] Open
Abstract
Anticancer peptides (ACPs) perform a promising role in discovering anti-cancer drugs. The growing research on ACPs as therapeutic agent is increasing due to its minimal side effects. However, identifying novel ACPs using wet-lab experiments are generally time-consuming, labor-intensive, and expensive. Leveraging computational methods for fast and accurate prediction of ACPs would harness the drug discovery process. Herein, a machine learning-based predictor, called PLMACPred, is developed for identifying ACPs from peptide sequence only. PLMACPred adopted a set of encoding schemes representing evolutionary-property, composition-property, and protein language model (PLM), i.e., evolutionary scale modeling (ESM-2)- and ProtT5-based embedding to encode peptides. Then, two-dimensional (2D) wavelet denoising (WD) was employed to remove the noise from extracted features. Finally, ensemble-based cascade deep forest (CDF) model was developed to identify ACP. PLMACPred model attained superior performance on all three benchmark datasets, namely, ACPmain, ACPAlter, and ACP740 over tenfold cross validation and independent dataset. PLMACPred outperformed the existing models and improved the prediction accuracy by 18.53%, 2.4%, 7.59% on ACPmain, ACPalter, ACP740 dataset, respectively. We showed that embedding from ProtT5 and ESM-2 was capable of capturing better contextual information from the entire sequence than the other encoding schemes for ACP prediction. For the explainability of proposed model, SHAP (SHapley Additive exPlanations) method was used to analyze the feature effect on the ACP prediction. A list of novel sequence motifs was proposed from the ACP sequence using MEME suites. We believe, PLMACPred will support in accelerating the discovery of novel ACPs as well as other activities of microbial peptides.
Collapse
Affiliation(s)
- Muhammad Arif
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
| | - Saleh Musleh
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
| | - Huma Fida
- Department of Microbiology, Abdul Wali Khan University, Mardan, KPK, Pakistan
| | - Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar.
| |
Collapse
|
33
|
Zhang L, Hu X, Xiao K, Kong L. Effective identification and differential analysis of anticancer peptides. Biosystems 2024; 241:105246. [PMID: 38848816 DOI: 10.1016/j.biosystems.2024.105246] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/09/2024] [Revised: 05/27/2024] [Accepted: 06/04/2024] [Indexed: 06/09/2024]
Abstract
Anticancer peptides (ACPs) have recently emerged as promising cancer therapeutics due to their selectivity and lower toxicity. However, the number of experimentally validated ACPs is limited, and identifying ACPs from large-scale sequence data is time-consuming and expensive. Therefore, it is critical to develop and improve upon existing computational models for identifying ACPs. In this study, a computational method named ACP_DA was proposed based on peptide residue composition and physiochemical properties information. To curtail overfitting and reduce computational costs, a sequential forward selection method was utilized to construct the optimal feature groups. Subsequently, the feature vectors were fed into light gradient boosting machine classifier for model construction. It was observed by an independent set test that ACP_DA achieved the highest Matthew's correlation coefficient of 0.63 and accuracy of 0.8129, displaying at least a 2% enhancement compared to state-of-the-art methods. The satisfactory results demonstrate the effectiveness of ACP_DA as a powerful tool for identifying ACPs, with the potential to significantly contribute to the development and optimization of promising therapies. The data and resource codes are available at https://github.com/Zlclab/ACP_DA.
Collapse
Affiliation(s)
- Lichao Zhang
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China; Hebei Innovation Center for Smart Perception and Applied Technology of Agricultural Data, Qinhuangdao, PR China
| | - Xueli Hu
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China
| | - Kang Xiao
- School of Mathematics and Statistics, Northeastern University at Qinhuangdao, Qinhuangdao, PR China
| | - Liang Kong
- Hebei Innovation Center for Smart Perception and Applied Technology of Agricultural Data, Qinhuangdao, PR China; School of Mathematics and Information Science & Technology, Hebei Normal University of Science & Technology, Qinhuangdao, PR China.
| |
Collapse
|
34
|
Kao HJ, Weng TH, Chen CH, Chen YC, Chi YH, Huang KY, Weng SL. Integrating In Silico and In Vitro Approaches to Identify Natural Peptides with Selective Cytotoxicity against Cancer Cells. Int J Mol Sci 2024; 25:6848. [PMID: 38999958 PMCID: PMC11240926 DOI: 10.3390/ijms25136848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/15/2024] [Revised: 06/14/2024] [Accepted: 06/18/2024] [Indexed: 07/14/2024] Open
Abstract
Anticancer peptides (ACPs) are bioactive compounds known for their selective cytotoxicity against tumor cells via various mechanisms. Recent studies have demonstrated that in silico machine learning methods are effective in predicting peptides with anticancer activity. In this study, we collected and analyzed over a thousand experimentally verified ACPs, specifically targeting peptides derived from natural sources. We developed a precise prediction model based on their sequence and structural features, and the model's evaluation results suggest its strong predictive ability for anticancer activity. To enhance reliability, we integrated the results of this model with those from other available methods. In total, we identified 176 potential ACPs, some of which were synthesized and further evaluated using the MTT colorimetric assay. All of these putative ACPs exhibited significant anticancer effects and selective cytotoxicity against specific tumor cells. In summary, we present a strategy for identifying and characterizing natural peptides with selective cytotoxicity against cancer cells, which could serve as novel therapeutic agents. Our prediction model can effectively screen new molecules for potential anticancer activity, and the results from in vitro experiments provide compelling evidence of the candidates' anticancer effects and selective cytotoxicity.
Collapse
Affiliation(s)
- Hui-Ju Kao
- Department of Medical Research, Hsinchu MacKay Memorial Hospital, Hsinchu City 300, Taiwan
- Department of Medical Research, Hsinchu Municipal MacKay Children's Hospital, Hsinchu City 300, Taiwan
| | - Tzu-Han Weng
- Department of Dermatology, MacKay Memorial Hospital, Taipei City 104, Taiwan
| | - Chia-Hung Chen
- Department of Medical Research, Hsinchu MacKay Memorial Hospital, Hsinchu City 300, Taiwan
- Department of Medical Research, Hsinchu Municipal MacKay Children's Hospital, Hsinchu City 300, Taiwan
| | - Yu-Chi Chen
- Department of Medical Research, Hsinchu MacKay Memorial Hospital, Hsinchu City 300, Taiwan
- Department of Medical Research, Hsinchu Municipal MacKay Children's Hospital, Hsinchu City 300, Taiwan
| | - Yu-Hsiang Chi
- National Center for High-Performance Computing, Hsinchu City 300, Taiwan
| | - Kai-Yao Huang
- Department of Medical Research, Hsinchu MacKay Memorial Hospital, Hsinchu City 300, Taiwan
- Department of Medical Research, Hsinchu Municipal MacKay Children's Hospital, Hsinchu City 300, Taiwan
- Department of Medicine, MacKay Medical College, New Taipei City 252, Taiwan
- Institute of Biomedical Sciences, MacKay Medical College, New Taipei City 252, Taiwan
| | - Shun-Long Weng
- Department of Medicine, MacKay Medical College, New Taipei City 252, Taiwan
- Department of Obstetrics and Gynecology, Hsinchu MacKay Memorial Hospital, Hsinchu City 300, Taiwan
- Department of Obstetrics and Gynecology, Hsinchu Municipal MacKay Children's Hospital, Hsinchu City 300, Taiwan
| |
Collapse
|
35
|
Song H, Lin X, Zhang H, Yin H. ACP-ESM2: The prediction of anticancer peptides based on pre-trained classifier. Comput Biol Chem 2024; 110:108091. [PMID: 38735271 DOI: 10.1016/j.compbiolchem.2024.108091] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/03/2024] [Revised: 04/07/2024] [Accepted: 04/29/2024] [Indexed: 05/14/2024]
Abstract
Anticancer peptides (ACPs) are a type of protein molecule that has anti-cancer activity and can inhibit cancer cell growth and survival. Traditional classification approaches for ACPs are expensive and time-consuming. This paper proposes a pre-trained classifier model, ESM2-GRU, for ACP prediction to make it easier to predict ACPs, gain a better understanding of the structural and functional differences of anti-cancer peptides, and optimize the design for the development of more effective anti-cancer treatment strategies. The model is made up of the ESM2 pre-trained model, a bidirectional GRU recurrent neural network, and a fully connected layer. ACP sequences are first fed into the ESM2 model, which then expands the dimensions before feeding the findings back into the bidirectional GRU recurrent neural network. Finally, the fully connected layer generates the ultimate output. Experimental validation demonstrates that the ESM2-GRU model greatly improves classification performance on the benchmark dataset ACP606, with AUC, ACC, and MCC values of 0.975, 0.852, and 0.738, respectively. This exceptional prediction potential helps to identify specific types of anti-cancer peptides, improving their targeting and selectivity and, therefore, furthering the development of tailored medicine and treatments.
Collapse
Affiliation(s)
- Huijia Song
- School of Information Engineering, Beijing Institute of Petrochemical Technology, Beijing 102617, China
| | - Xiaozhu Lin
- School of Information Engineering, Beijing Institute of Petrochemical Technology, Beijing 102617, China.
| | - Huainian Zhang
- School of Information Engineering, Beijing Institute of Petrochemical Technology, Beijing 102617, China
| | - Huijuan Yin
- School of Information Engineering, Beijing Institute of Petrochemical Technology, Beijing 102617, China
| |
Collapse
|
36
|
Liao YH, Chen SZ, Bin YN, Zhao JP, Feng XL, Zheng CH. UsIL-6: An unbalanced learning strategy for identifying IL-6 inducing peptides by undersampling technique. COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE 2024; 250:108176. [PMID: 38677081 DOI: 10.1016/j.cmpb.2024.108176] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Received: 08/18/2022] [Revised: 03/26/2024] [Accepted: 04/11/2024] [Indexed: 04/29/2024]
Abstract
BACKGROUND AND OBJECTIVE Interleukin-6 (IL-6) is the critical factor of early warning, monitoring, and prognosis in the inflammatory storm of COVID-19 cases. IL-6 inducing peptides, which can induce cytokine IL-6 production, are very important for the development of diagnosis and immunotherapy. Although the existing methods have some success in predicting IL-6 inducing peptides, there is still room for improvement in the performance of these models in practical application. METHODS In this study, we proposed UsIL-6, a high-performance bioinformatics tool for identifying IL-6 inducing peptides. First, we extracted five groups of physicochemical properties and sequence structural information from IL-6 inducing peptide sequences, and obtained a 636-dimensional feature vector, we also employed NearMiss3 undersampling method and normalization method StandardScaler to process the data. Then, a 40-dimensional optimal feature vector was obtained by Boruta feature selection method. Finally, we combined this feature vector with extreme randomization tree classifier to build the final model UsIL-6. RESULTS The AUC value of UsIL-6 on the independent test dataset was 0.87, and the BACC value was 0.808, which indicated that UsIL-6 had better performance than the existing methods in IL-6 inducing peptide recognition. CONCLUSIONS The performance comparison on independent test dataset confirmed that UsIL-6 could achieve the highest performance, best robustness, and most excellent generalization ability. We hope that UsIL-6 will become a valuable method to identify, annotate and characterize new IL-6 inducing peptides.
Collapse
Affiliation(s)
- Yan-Hong Liao
- School of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang 830017, China
| | - Shou-Zhi Chen
- School of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang 830017, China
| | - Yan-Nan Bin
- School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
| | - Jian-Ping Zhao
- School of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang 830017, China.
| | - Xin-Long Feng
- School of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang 830017, China.
| | - Chun-Hou Zheng
- School of Mathematics and System Science, Xinjiang University, Urumqi, Xinjiang 830017, China; School of Computer Science and Technology, Anhui University, Hefei, Anhui 230601, China
| |
Collapse
|
37
|
Chen T, Kabir MF. Explainable machine learning approach for cancer prediction through binarilization of RNA sequencing data. PLoS One 2024; 19:e0302947. [PMID: 38728288 PMCID: PMC11086842 DOI: 10.1371/journal.pone.0302947] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2023] [Accepted: 04/15/2024] [Indexed: 05/12/2024] Open
Abstract
In recent years, researchers have proven the effectiveness and speediness of machine learning-based cancer diagnosis models. However, it is difficult to explain the results generated by machine learning models, especially ones that utilized complex high-dimensional data like RNA sequencing data. In this study, we propose the binarilization technique as a novel way to treat RNA sequencing data and used it to construct explainable cancer prediction models. We tested our proposed data processing technique on five different models, namely neural network, random forest, xgboost, support vector machine, and decision tree, using four cancer datasets collected from the National Cancer Institute Genomic Data Commons. Since our datasets are imbalanced, we evaluated the performance of all models using metrics designed for imbalance performance like geometric mean, Matthews correlation coefficient, F-Measure, and area under the receiver operating characteristic curve. Our approach showed comparative performance while relying on less features. Additionally, we demonstrated that data binarilization offers higher explainability by revealing how each feature affects the prediction. These results demonstrate the potential of data binarilization technique in improving the performance and explainability of RNA sequencing based cancer prediction models.
Collapse
Affiliation(s)
- Tianjie Chen
- Department of Computer Science, Pennsylvania State University Harrisburg, Middletown, Pennsylvania, United States of America
| | - Md Faisal Kabir
- Department of Computer Science, Pennsylvania State University Harrisburg, Middletown, Pennsylvania, United States of America
| |
Collapse
|
38
|
Tran TO, Le NQK. Sa-TTCA: An SVM-based approach for tumor T-cell antigen classification using features extracted from biological sequencing and natural language processing. Comput Biol Med 2024; 174:108408. [PMID: 38636332 DOI: 10.1016/j.compbiomed.2024.108408] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/11/2023] [Revised: 01/13/2024] [Accepted: 04/01/2024] [Indexed: 04/20/2024]
Abstract
Accurately predicting tumor T-cell antigen (TTCA) sequences is a crucial task in the development of cancer vaccines and immunotherapies. TTCAs derived from tumor cells, are presented to immune cells (T cells) through major histocompatibility complex (MHC), via the recognition of specific portions of their structure known as epitopes. More specifically, MHC class I introduces TTCAs to T-cell receptors (TCR) which are located on the surface of CD8+ T cells. However, TTCA sequences are varied and lead to struggles in vaccine design. Recently, Machine learning (ML) models have been developed to predict TTCA sequences which could aid in fast and correct TTCA identification. During the construction of the TTCA predictor, the peptide encoding strategy is an important step. Previous studies have used biological descriptors for encoding TTCA sequences. However, there have been no studies that use natural language processing (NLP), a potential approach for this purpose. As sentences have their own words with diverse properties, biological sequences also hold unique characteristics that reflect evolutionary information, physicochemical values, and structural information. We hypothesized that NLP methods would benefit the prediction of TTCA. To develop a new identifying TTCA model, we first constructed a based model with widely used ML algorithms and extracted features from biological descriptors. Then, to improve our model performance, we added extracted features from biological language models (BLMs) based on NLP methods. Besides, we conducted feature selection by using Chi-square and Pearson Correlation Coefficient techniques. Then, SMOTE, Up-sampling, and Near-Miss were used to treat unbalanced data. Finally, we optimized Sa-TTCA by the SVM algorithm to the four most effective feature groups. The best performance of Sa-TTCA showed a competitive balanced accuracy of 87.5% on a training set, and 72.0% on an independent testing set. Our results suggest that integrating biological descriptors with natural language processing has the potential to improve the precision of predicting protein/peptide functionality, which could be beneficial for developing cancer vaccines.
Collapse
Affiliation(s)
- Thi-Oanh Tran
- International Ph.D. Program in Cell Therapy and Regenerative Medicine, College of Medicine, Taipei Medical University, Taipei, 110, Taiwan; AIBioMed Research Group, Taipei Medical University, Taipei, 110, Taiwan; Hematology and Blood Transfusion Center, Bach Mai Hospital, No. 78, Giai Phong Street, Hanoi, Viet Nam
| | - Nguyen Quoc Khanh Le
- AIBioMed Research Group, Taipei Medical University, Taipei, 110, Taiwan; Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, 110, Taiwan; Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei, 110, Taiwan; Translational Imaging Research Center, Taipei Medical University Hospital, Taipei, 110, Taiwan.
| |
Collapse
|
39
|
Xu M, Pang J, Ye Y, Zhang Z. Integrating Traditional Machine Learning and Deep Learning for Precision Screening of Anticancer Peptides: A Novel Approach for Efficient Drug Discovery. ACS OMEGA 2024; 9:16820-16831. [PMID: 38617603 PMCID: PMC11007766 DOI: 10.1021/acsomega.4c01374] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/12/2024] [Revised: 03/03/2024] [Accepted: 03/22/2024] [Indexed: 04/16/2024]
Abstract
The rapid and effective identification of anticancer peptides (ACPs) by computer technology provides a new perspective for cancer treatment. In the identification process of ACPs, accurate sequence encoding and effective classification models are crucial for predicting their biological activity. Traditional machine learning methods have been widely applied in sequence analysis, but deep learning provides a new approach to capture sequence complexity. In this study, a two-stage ACPs classification model was innovatively proposed. Three novel coding strategies were explored; two mainstream Natural Language Processing (NLP) models and 11 machine learning models were fused to identify ACPs, which significantly improved the prediction accuracy of ACPs. We analyzed the correlation between peptide chain amino acids and evaluated the relevant performance of the model by the ROC curve and t-SNE dimensionality reduction technique. The results indicated that the deep learning and machine learning fusion models of M3E-base and KNeighborsDist models, especially when considering the semantic information on amino acid sequences, achieved the highest average accuracy (AvgAcc) of 0.939, with an AUC value as high as 0.97. Then, in vitro cell experiments were used to verify that the two ACPs predicted by the model had antitumor efficacy. This study provides a convenient and effective method for screening ACPs. With further optimization and testing, these strategies have the potential to play an important role in drug discovery and design.
Collapse
Affiliation(s)
- Meiqi Xu
- Key
Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang
Province, School of Medicine, Hangzhou City
University, Hangzhou 310015, Zhejiang, China
| | - Jiefu Pang
- School
of Computer Science, Hangzhou Dianzi University, Hangzhou 310018, Zhejiang, China
| | - Yangyang Ye
- Key
Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang
Province, School of Medicine, Hangzhou City
University, Hangzhou 310015, Zhejiang, China
| | - Ziyi Zhang
- Key
Laboratory of Novel Targets and Drug Study for Neural Repair of Zhejiang
Province, School of Medicine, Hangzhou City
University, Hangzhou 310015, Zhejiang, China
| |
Collapse
|
40
|
Yang X, Jin J, Wang R, Li Z, Wang Y, Wei L. CACPP: A Contrastive Learning-Based Siamese Network to Identify Anticancer Peptides Based on Sequence Only. J Chem Inf Model 2024; 64:2807-2816. [PMID: 37252890 DOI: 10.1021/acs.jcim.3c00297] [Citation(s) in RCA: 11] [Impact Index Per Article: 11.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 06/01/2023]
Abstract
Anticancer peptides (ACPs) recently have been receiving increasing attention in cancer therapy due to their low consumption, few adverse side effects, and easy accessibility. However, it remains a great challenge to identify anticancer peptides via experimental approaches, requiring expensive and time-consuming experimental studies. In addition, traditional machine-learning-based methods are proposed for ACP prediction mainly depending on hand-crafted feature engineering, which normally achieves low prediction performance. In this study, we propose CACPP (Contrastive ACP Predictor), a deep learning framework based on the convolutional neural network (CNN) and contrastive learning for accurately predicting anticancer peptides. In particular, we introduce the TextCNN model to extract the high-latent features based on the peptide sequences only and exploit the contrastive learning module to learn more distinguishable feature representations to make better predictions. Comparative results on the benchmark data sets indicate that CACPP outperforms all the state-of-the-art methods in the prediction of anticancer peptides. Moreover, to intuitively show that our model has good classification ability, we visualize the dimension reduction of the features from our model and explore the relationship between ACP sequences and anticancer functions. Furthermore, we also discuss the influence of data set construction on model prediction and explore our model performance on the data sets with verified negative samples.
Collapse
Affiliation(s)
- Xuetong Yang
- School of Software, Shandong University, Jinan 250101, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
| | - Junru Jin
- School of Software, Shandong University, Jinan 250101, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
| | - Ruheng Wang
- School of Software, Shandong University, Jinan 250101, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
| | - Zhongshen Li
- School of Software, Shandong University, Jinan 250101, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
| | - Yu Wang
- School of Software, Shandong University, Jinan 250101, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
| | - Leyi Wei
- School of Software, Shandong University, Jinan 250101, China
- Joint SDU-NTU Centre for Artificial Intelligence Research (C-FAIR), Shandong University, Jinan 250101, China
| |
Collapse
|
41
|
Li H, Meng J, Wang Z, Tang Y, Xia S, Wang Y, Qin Z, Luan Y. miPEPPred-FRL: A Novel Method for Predicting Plant MiRNA-Encoded Peptides Using Adaptive Feature Representation Learning. J Chem Inf Model 2024; 64:2889-2900. [PMID: 37733290 DOI: 10.1021/acs.jcim.3c01020] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 09/22/2023]
Abstract
MicroRNAs (miRNAs) are an essential type of small molecule RNAs that play significant regulatory roles in organisms. Recent studies have demonstrated that small open reading frames (sORFs) harbored in primary miRNAs (pri-miRNAs) can encode small peptides, known as miPEPs. Plant miPEPs can increase the abundance and activity of cognate miRNAs by promoting the transcription of their corresponding pri-miRNAs, thereby modulating plant traits. Biological experiments are the most effective way to accurately identify miPEPs; however, they are time-consuming and expensive. Hence, an efficient computational method for the identification of miPEPs on a large scale is highly desirable. Up to now, there have been no specialized computational tools for identifying miPEPs. In this work, a novel predictor named miPEPPred-FRL based on an adaptive feature representation learning framework that consists of the feature transformation module and the cascade architecture has been proposed. The feature transformation module integrating a newly designed feature selection method and classifier selection rule is developed to convert sequence-based features into primary class and probabilistic features, which are then fed into the improved cascade architecture to obtain more stable and discriminative augmented features. Finally, the augmented features are utilized to construct the final predictor. Cross-validation experiments illustrate that the novel feature selection method and classifier selection rule contribute to boosting the feature representation ability of the framework. Furthermore, the high accuracy of miPEPPred-FRL on independent testing data suggests that it is a trustworthy and valuable tool for the identification of miPEPs.
Collapse
Affiliation(s)
- Haibin Li
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Jun Meng
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Zhaowei Wang
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Youwei Tang
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Shihao Xia
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Yu Wang
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Zhaojing Qin
- School of Computer Science and Technology, Dalian University of Technology, Dalian, Liaoning 116024, China
| | - Yushi Luan
- School of Bioengineering, Dalian University of Technology, Dalian, Liaoning 116024, China
| |
Collapse
|
42
|
Liang X, Zhao H, Wang J. MA-PEP: A novel anticancer peptide prediction framework with multimodal feature fusion based on attention mechanism. Protein Sci 2024; 33:e4966. [PMID: 38532681 PMCID: PMC10966354 DOI: 10.1002/pro.4966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/30/2023] [Revised: 01/30/2024] [Accepted: 03/06/2024] [Indexed: 03/28/2024]
Abstract
AntiCancer Peptides (ACPs) have emerged as promising therapeutic agents for cancer treatment. The time-consuming and costly nature of wet-lab discriminatory methods has spurred the development of various machine learning and deep learning-based ACP classification methods. Nonetheless, current methods encountered challenges in efficiently integrating features from various peptide modalities, thereby limiting a more comprehensive understanding of ACPs and further restricting the improvement of prediction model performance. In this study, we introduce a novel ACP prediction method, MA-PEP, which leverages multiple attention mechanisms for feature enhancement and fusion to improve ACP prediction. By integrating the enhanced molecular-level chemical features and sequence information of peptides, MA-PEP demonstrates superior prediction performance across several benchmark datasets, highlighting its efficacy in ACP prediction. Moreover, the visual analysis and case studies further demonstrate MA-PEP's reliable feature extraction capability and its promise in the realm of ACP exploration. The code and datasets for MA-PEP are available at https://github.com/liangxiaodata/MA-PEP.
Collapse
Affiliation(s)
- Xiao Liang
- School of Computer Science and Engineering, Central South UniversityChangshaChina
- Hunan Provincial Key Lab on Bioinformatics, Central South UniversityChangshaChina
| | - Haochen Zhao
- School of Computer Science and Engineering, Central South UniversityChangshaChina
- Hunan Provincial Key Lab on Bioinformatics, Central South UniversityChangshaChina
| | - Jianxin Wang
- School of Computer Science and Engineering, Central South UniversityChangshaChina
- Hunan Provincial Key Lab on Bioinformatics, Central South UniversityChangshaChina
| |
Collapse
|
43
|
Gao J, Zhao Y, Chen C, Ning Q. MVNN-HNHC:A multi-view neural network for identification of human non-histone crotonylation sites. Anal Biochem 2024; 687:115426. [PMID: 38141798 DOI: 10.1016/j.ab.2023.115426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/28/2023] [Revised: 11/21/2023] [Accepted: 12/06/2023] [Indexed: 12/25/2023]
Abstract
Crotonylation on lysine sites in human non-histone proteins plays a crucial role in biology activities. However, because traditional experimental methods for crotonylation site identification are time-consuming and labor-intensive, computational prediction methods have become increasingly popular in recent years. Despite its significance, crotonylation site prediction has received less attention in non-histone proteins than in histones. In this study, we proposed a Multi-View Neural Network for identification of Human Non-Histone Crotonylation sites, named MVNN-HNHC. MVNN-HNHC integrated multi-view encoding features and adaptive encoding features through multi-channel neural network to deeply learn about attribute differences between crotonylation sites and non-crotonylation sites from various aspects. In MVNN-HNHC, convolutional neural networks can obtain local information from these features, and bidirectional long short term memory networks were utilized to extract sequence information. Then, we employ the attention mechanism to fuse the outputs of various feature extraction modules. Finally, the fully connection network acted as the classifier to predict whether a lysine site was crotonylation site or non-crotonylation site. Performance metrics on independent test set, including sensitivity, specificity, accuracy, Matthews correlation coefficient, and area under the curve (AUC) values reach 80.06 %, 75.77 %, 77.06 %, 0.5203, and 0.7792, respectively. To verify the effectiveness of this method, we carry out a series of experiments and the results show that MVNN-HNHC is an effective tool for predicting crotonylation sites in non-histone proteins. The data and code are available on https://github.com/xbbxhbc/junjun0612.git.
Collapse
Affiliation(s)
- Jun Gao
- Department of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China
| | - Yaomiao Zhao
- Department of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China
| | - Chen Chen
- Naval Architecture and Ocean Engineering College, Dalian Maritime University, Dalian, 116026, China.
| | - Qiao Ning
- Department of Information Science and Technology, Dalian Maritime University, Dalian, 116026, China.
| |
Collapse
|
44
|
Lee B, Shin D. Contrastive learning for enhancing feature extraction in anticancer peptides. Brief Bioinform 2024; 25:bbae220. [PMID: 38725157 PMCID: PMC11082072 DOI: 10.1093/bib/bbae220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/08/2024] [Revised: 03/28/2024] [Accepted: 04/21/2024] [Indexed: 05/13/2024] Open
Abstract
Cancer, recognized as a primary cause of death worldwide, has profound health implications and incurs a substantial social burden. Numerous efforts have been made to develop cancer treatments, among which anticancer peptides (ACPs) are garnering recognition for their potential applications. While ACP screening is time-consuming and costly, in silico prediction tools provide a way to overcome these challenges. Herein, we present a deep learning model designed to screen ACPs using peptide sequences only. A contrastive learning technique was applied to enhance model performance, yielding better results than a model trained solely on binary classification loss. Furthermore, two independent encoders were employed as a replacement for data augmentation, a technique commonly used in contrastive learning. Our model achieved superior performance on five of six benchmark datasets against previous state-of-the-art models. As prediction tools advance, the potential in peptide-based cancer therapeutics increases, promising a brighter future for oncology research and patient care.
Collapse
Affiliation(s)
- Byungjo Lee
- Research Institute, National Cancer Center, 323, Ilsan-ro, Ilsandong-gu, Goyang, 10408, Republic of Korea
| | - Dongkwan Shin
- Research Institute, National Cancer Center, 323, Ilsan-ro, Ilsandong-gu, Goyang, 10408, Republic of Korea
- Department of Cancer Biomedical Science, National Cancer Center Graduate School of Cancer Science and Policy, 323, Ilsan-ro, Ilsandong-gu, Goyang, 10408, Republic of Korea
| |
Collapse
|
45
|
Datta S, Nabeel Asim M, Dengel A, Ahmed S. NTpred: a robust and precise machine learning framework for in silico identification of Tyrosine nitration sites in protein sequences. Brief Funct Genomics 2024; 23:163-179. [PMID: 37248673 DOI: 10.1093/bfgp/elad018] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2023] [Revised: 04/12/2023] [Accepted: 05/02/2023] [Indexed: 05/31/2023] Open
Abstract
Post-translational modifications (PTMs) either enhance a protein's activity in various sub-cellular processes, or degrade their activity which leads toward failure of intracellular processes. Tyrosine nitration (NT) modification degrades protein's activity that initiates and propagates various diseases including neurodegenerative, cardiovascular, autoimmune diseases and carcinogenesis. Identification of NT modification supports development of novel therapies and drug discoveries for associated diseases. Identification of NT modification in biochemical labs is expensive, time consuming and error-prone. To supplement this process, several computational approaches have been proposed. However these approaches fail to precisely identify NT modification, due to the extraction of irrelevant, redundant and less discriminative features from protein sequences. This paper presents the NTpred framework that is competent in extracting comprehensive features from raw protein sequences using four different sequence encoders. To reap the benefits of different encoders, it generates four additional feature spaces by fusing different combinations of individual encodings. Furthermore, it eradicates irrelevant and redundant features from eight different feature spaces through a Recursive Feature Elimination process. Selected features of four individual encodings and four feature fusion vectors are used to train eight different Gradient Boosted Tree classifiers. The probability scores from the trained classifiers are utilized to generate a new probabilistic feature space, which is used to train a Logistic Regression classifier. On the BD1 benchmark dataset, the proposed framework outperforms the existing best-performing predictor in 5-fold cross validation and independent test evaluation with combined improvement of 13.7% in MCC and 20.1% in AUC. Similarly, on the BD2 benchmark dataset, the proposed framework outperforms the existing best-performing predictor with combined improvement of 5.3% in MCC and 1.0% in AUC. NTpred is publicly available for further experimentation and predictive use at: https://sds_genetic_analysis.opendfki.de/PredNTS/.
Collapse
Affiliation(s)
- Sourajyoti Datta
- Department of Computer Science, Rheinland Pfälzische Technische Universität, Kaiserslautern, 67663, Germany
| | - Muhammad Nabeel Asim
- German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany
| | - Andreas Dengel
- Department of Computer Science, Rheinland Pfälzische Technische Universität, Kaiserslautern, 67663, Germany
- German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany
| | - Sheraz Ahmed
- German Research Center for Artificial Intelligence, Kaiserslautern, 67663, Germany
| |
Collapse
|
46
|
Kanwal S, Arif R, Ahmed S, Kabir M. A novel stacking-based predictor for accurate prediction of antimicrobial peptides. J Biomol Struct Dyn 2024:1-12. [PMID: 38500243 DOI: 10.1080/07391102.2024.2329298] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/21/2023] [Accepted: 03/06/2024] [Indexed: 03/20/2024]
Abstract
Antimicrobial peptides (AMPs) are gaining acceptance and support as a chief antibiotic substitute since they boost human immunity. They retain a wide range of actions and have a low risk of developing resistance, which are critical properties to the pharmaceutical industry for drug discovery. Antibiotic sensitivity, however, is an issue that affects people all around the world and has the potential to one day lead to an epidemic. As cutting-edge therapeutic agents, AMPs are also expected to cure microbial infections. In order to produce tolerable drugs, it is crucial to understand the significance of the basic architecture of AMPs. Traditional laboratory methods are expensive and time-consuming for AMPs testing and detection. Currently, bioinformatics techniques are being successfully applied to the detection of AMPs. In this study, we have developed a novel STacking-based ensemble learning framework for AntiMicrobial Peptide (STAMP) prediction. First, we constructed 84 different baseline models by using 12 different feature encoding schemes and 7 popular machine learning algorithms. Second, these baseline models were trained and employed to create a new probabilistic feature vector. Finally, based on the feature selection strategy, we determined the optimal probabilistic feature vector, which was further utilized for the construction of our stacked model. Resultantly, the STAMP predictor achieved excellent performance during cross-validation with an accuracy and Matthew's correlation coefficient of 0.930 and 0.860, respectively. The corresponding metrics during the independent test were 0.710 and 0.464, respectively. Overall, STAMP achieved a more accurate and stable performance than the baseline models and significantly outperformed the existing predictors, demonstrating the effectiveness of our proposed hybrid framework. Furthermore, STAMP is expected to assist community-wide efforts in identifying AMPs and will contribute to the development of novel therapeutic methods and drug-design for immunity.
Collapse
Affiliation(s)
- Sameera Kanwal
- School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
| | - Roha Arif
- School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
| | - Saeed Ahmed
- School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
| | - Muhammad Kabir
- School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
| |
Collapse
|
47
|
Azad H, Akbar MY, Sarfraz J, Haider W, Riaz MN, Ali GM, Ghazanfar S. G-ACP: a machine learning approach to the prediction of therapeutic peptides for gastric cancer. J Biomol Struct Dyn 2024:1-14. [PMID: 38450672 DOI: 10.1080/07391102.2024.2323141] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2023] [Accepted: 02/15/2024] [Indexed: 03/08/2024]
Abstract
Conventional Gastrointestinal (GI) cancer treatments are quite expensive and have major hazards. Nowadays, a different strategy places more emphasis on creating tiny biologically active peptides that do not cause severe poisoning. Anticancer peptides (ACPs) are found through experimental screening, which is time-dependent and frequently fraught with difficulties. Gastric ACPs are emerging as a promising GI cancer treatment in the current day. It is crucial to identify novel gastric ACPs to have an improved knowledge of their functioning processes and treatment of gastric cancer. As a result of the post-genomic era's massive production of peptide sequences, rapid and effective ACPs using a computational method are essential. Several adaptive statistical techniques for distinguishing ACPs and non-ACPs have recently been developed. A variety of adapted statistically significant methods have been developed to differentiate between ACPs and non-ACPs. Despite significant progress, there is no specific model for the prediction of gastric ACPs because the specific model will predict a particular type of peptide more accurately and quickly. To overcome this, an initiative is taken for the creation of a reliable framework for the accurate identification of gastric ACPs. The current technique in particular contains four possible features along with one hybrid feature encoding mechanisms which are the target-class motif previously indicated by Amino Acid Composition, Dipeptide Composition, Tripeptide Composition (TPC), Pseudo Amino Acid Composition (PAAC), and their Hybrid. Machine Learning algorithms make high-performance and accurate prediction tools. Moreover, highly variable and ideal deep feature selection is done using an ANOVA-based F score for feature pruning. Experiments on a range of algorithms are carried out to identify the optimal operating strategy due to the diverse nature of learning. Following analysis of the empirical results, Naïve Bayes with TPC and Hybrid feature space outperforms other methods with 0.99 accuracy score on the testing dataset. To find the model generalization an external validation is carried out. In external datasets, the Extra Trees with PAAC features outperforms with the accuracy of 0.94. The comparison study shows that our suggested model will predict gastric ACPs more accurately and will be useful in drug development and gastric cancer. The predictive model can be freely accessed at https://github.com/humeraazad10/G-ACP.git.
Collapse
Affiliation(s)
- Humera Azad
- Department of Biosciences (Bioinformatics) Islamabad, Comsats University Islamabad, Pakistan
| | - Muhammad Yasir Akbar
- National Institute for Genomics and Advanced Biotechnology (NIGAB), National Agricultural Research Center (NARC), Pakistan
| | | | - Waseem Haider
- Department of Biosciences (Bioinformatics) Islamabad, Comsats University Islamabad, Pakistan
| | - Muhammad Naeem Riaz
- National Institute for Genomics and Advanced Biotechnology (NIGAB), National Agricultural Research Center (NARC), Pakistan
| | - Ghulam Muhammad Ali
- Department of Biosciences (Bioinformatics) Islamabad, Comsats University Islamabad, Pakistan
| | - Shakira Ghazanfar
- National Institute for Genomics and Advanced Biotechnology (NIGAB), National Agricultural Research Center (NARC), Pakistan
| |
Collapse
|
48
|
Bian J, Liu X, Dong G, Hou C, Huang S, Zhang D. ACP-ML: A sequence-based method for anticancer peptide prediction. Comput Biol Med 2024; 170:108063. [PMID: 38301519 DOI: 10.1016/j.compbiomed.2024.108063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/14/2023] [Revised: 01/08/2024] [Accepted: 01/27/2024] [Indexed: 02/03/2024]
Abstract
Cancer is a serious malignant tumor and is difficult to cure. Chemotherapy, as a primary treatment for cancer, causes significant harm to normal cells in the body and is often accompanied by serious side effects. Recently, anti-cancer peptides (ACPs) as a type of protein for treating cancers dominated research into the development of new anti-tumor drugs because of their ability to specifically target and destroy cancer cells. The screening of proteins with cancer-inhibiting properties from a large pool of proteins is key to the development of anti-tumor drugs. However, it is expensive and inefficient to accurately identify protein functions only through biological experiments due to their complex structure. Therefore, we propose a new prediction model ACP-ML to effectively predict ACPs. In terms of feature extraction, DPC, PseAAC, CTDC, CTDT and CS-Pse-PSSM features were used and the most optimal feature set was selected by comparing combinations of these features. Then, a two-step feature selection process using MRMD and RFE algorithms was performed to determine the most crucial features from the most optimal feature set for identifying ACPs. Furthermore, we assessed the classification accuracy of single learning models and different strategies-based ensemble models through ten-fold cross-validation. Ultimately, a voting-based ensemble learning method is developed to predict ACPs. To validate its effectiveness, two independent test sets were used to perform tests, achieving accuracy of 90.891 % and 92.578 % respectively. Compared with existing anticancer peptide prediction algorithms, the proposed feature processing method is more effective, and the proposed ensemble model ACP-ML exhibits stronger generalization capability and higher accuracy.
Collapse
Affiliation(s)
- Jilong Bian
- Northeast Forestry University, College of Computer and Control Engineering, Harbin, Heilongjiang, China.
| | - Xuan Liu
- Northeast Forestry University, College of Computer and Control Engineering, Harbin, Heilongjiang, China
| | - Guanghui Dong
- Northeast Forestry University, College of Computer and Control Engineering, Harbin, Heilongjiang, China
| | - Chang Hou
- Northeast Forestry University, College of Computer and Control Engineering, Harbin, Heilongjiang, China
| | - Shan Huang
- Department of Neurology, The Second Affiliated Hospital, Harbin Medical University, Harbin, Heilongjiang, China.
| | - Dandan Zhang
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of Harbin Medical University, Harbin, Heilongjiang, China.
| |
Collapse
|
49
|
Gu X, Liu J, Yu Y, Xiao P, Ding Y. MFD-GDrug: multimodal feature fusion-based deep learning for GPCR-drug interaction prediction. Methods 2024; 223:75-82. [PMID: 38286333 DOI: 10.1016/j.ymeth.2024.01.017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/03/2023] [Revised: 01/14/2024] [Accepted: 01/26/2024] [Indexed: 01/31/2024] Open
Abstract
The accurate identification of drug-protein interactions (DPIs) is crucial in drug development, especially concerning G protein-coupled receptors (GPCRs), which are vital targets in drug discovery. However, experimental validation of GPCR-drug pairings is costly, prompting the need for accurate predictive methods. To address this, we propose MFD-GDrug, a multimodal deep learning model. Leveraging the ESM pretrained model, we extract protein features and employ a CNN for protein feature representation. For drugs, we integrated multimodal features of drug molecular structures, including three-dimensional features derived from Mol2vec and the topological information of drug graph structures extracted through Graph Convolutional Neural Networks (GCN). By combining structural characterizations and pretrained embeddings, our model effectively captures GPCR-drug interactions. Our tests on leading GPCR-drug interaction datasets show that MFD-GDrug outperforms other methods, demonstrating superior predictive accuracy.
Collapse
Affiliation(s)
- Xingyue Gu
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China
| | - Junkai Liu
- School of Electronic and Information Engineering, Suzhou University of Science and Technology, Suzhou 215009, China
| | - Yue Yu
- School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
| | - Pengfeng Xiao
- State Key Laboratory of Digital Medical Engineering, School of Biological Science and Medical Engineering, Southeast University, Nanjing 210096, China.
| | - Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang 324003, China; Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611730, China.
| |
Collapse
|
50
|
Liu M, Wu T, Li X, Zhu Y, Chen S, Huang J, Zhou F, Liu H. ACPPfel: Explainable deep ensemble learning for anticancer peptides prediction based on feature optimization. Front Genet 2024; 15:1352504. [PMID: 38487252 PMCID: PMC10937565 DOI: 10.3389/fgene.2024.1352504] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/13/2023] [Accepted: 02/19/2024] [Indexed: 03/17/2024] Open
Abstract
Background: Cancer is a significant global health problem that continues to cause a high number of deaths worldwide. Traditional cancer treatments often come with risks that can compromise the functionality of vital organs. As a potential alternative to these conventional therapies, Anticancer peptides (ACPs) have garnered attention for their small size, high specificity, and reduced toxicity, making them as a promising option for cancer treatments. Methods: However, the process of identifying effective ACPs through wet-lab screening experiments is time-consuming and requires a lot of labor. To overcome this challenge, a deep ensemble learning method is constructed to predict anticancer peptides (ACPs) in this study. To evaluate the reliability of the framework, four different datasets are used in this study for training and testing. During the training process of the model, integration of feature selection methods, feature dimensionality reduction measures, and optimization of the deep ensemble model are carried out. Finally, we explored the interpretability of features that affected the final prediction results and built a web server platform to facilitate anticancer peptides prediction, which can be used by all researchers for further studies. This web server can be accessed at http://lmylab.online:5001/. Results: The result of this study achieves an accuracy rate of 98.53% and an AUC (Area under Curve) value of 0.9972 on the ACPfel dataset, it has improvements on other datasets as well.
Collapse
Affiliation(s)
- Mingyou Liu
- School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
- Engineering Research Center of Health Medicine Biotechnology of Guizhou Province, Guizhou Medical University, Guiyang, China
| | - Tao Wu
- School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
| | - Xue Li
- School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
- Engineering Research Center of Health Medicine Biotechnology of Guizhou Province, Guizhou Medical University, Guiyang, China
| | - Yingxue Zhu
- School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
- Engineering Research Center of Health Medicine Biotechnology of Guizhou Province, Guizhou Medical University, Guiyang, China
| | - Sen Chen
- School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
| | - Jian Huang
- School of Life Science and Technology, University of Electronic Science and Technology, Chengdu, China
- School of Healthcare Technology, Chengdu Neusoft University, Chengdu, China
| | - Fengfeng Zhou
- School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
| | - Hongmei Liu
- School of Biology and Engineering (School of Health Medicine Modern Industry), Guizhou Medical University, Guiyang, China
- Engineering Research Center of Health Medicine Biotechnology of Guizhou Province, Guizhou Medical University, Guiyang, China
- College of Computer Science and Technology, and Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China
| |
Collapse
|