1
Ali F, Almuhaimeed A, Alghamdi W, Aldossary H, Asiry O, Masmoudi A. Leveraging deep learning for epigenetic protein prediction: a novel approach for early lung cancer diagnosis and drug discovery. Health Inf Sci Syst 2025; 13:28. [PMID: 40083337 PMCID: PMC11896910 DOI: 10.1007/s13755-025-00347-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2024] [Accepted: 01/21/2025] [Indexed: 03/16/2025] Open
Abstract
Epigenetic proteins (EPs) play a crucial role in influencing disease development, controlling gene expression, and shaping cell identity. They hold potential as targets for future therapies, and studying their mechanisms can lead to improved diagnosis and treatment strategies for various diseases. Predicting EPs is imperative, yet conventional experimental approaches are time-intensive and expensive. This work constructed CNN-BiLSTM, a computational method for EP prediction. Using primary sequences, two datasets were constructed, and amphiphilic pseudo amino acid composition (AmpPseAAC), group dipeptide composition, and group amino acid composition were used to extract numerical features. Model training incorporated a suite of deep learning architectures, including BiLSTM, GRU, and CNN. Notably, an ensemble model combining CNN and BiLSTM, trained using AmpPseAAC features, demonstrated superior performance across both training and testing datasets compared to other predictors. This research contributes to the ongoing efforts to revolutionize therapeutic approaches by facilitating the identification of novel drug targets and improving disease treatment outcomes.
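As an illustration of the group amino acid composition descriptor named in this abstract, the following is a minimal sketch. The abstract does not state which residue grouping the authors used; the five physicochemical classes below are an assumption borrowed from common practice, and the function name `gaac` is hypothetical.

```python
from collections import Counter

# ASSUMPTION: the abstract does not give the grouping; these five
# physicochemical classes are a commonly used convention.
GROUPS = {
    "aliphatic": "GAVLMI",
    "aromatic": "FYW",
    "positive": "KRH",
    "negative": "DE",
    "uncharged": "STCPNQ",
}


def gaac(sequence: str) -> dict[str, float]:
    """Return the fraction of standard residues that fall in each group."""
    counts = Counter(sequence.upper())
    # Only the 20 standard amino acids contribute to the denominator.
    total = sum(counts[aa] for letters in GROUPS.values() for aa in letters)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {
        name: sum(counts[aa] for aa in letters) / total
        for name, letters in GROUPS.items()
    }
```

Each protein is reduced to a fixed-length vector of five fractions regardless of its length, which is what lets such descriptors feed a fixed-input network.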
Affiliation(s)
- Farman Ali
- Department of Computer Science, Bahria University Islamabad Campus, Islamabad, Pakistan
- Abdullah Almuhaimeed
- King Abdulaziz City for Science and Technology, Digital Health Institute, 11442 Riyadh, Saudi Arabia
- Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, 21589 Jeddah, Saudi Arabia
- Haya Aldossary
- Computer Science Department, College of Science and Humanities, Imam Abdulrahman Bin Faisal University, 31961 Jubail, Saudi Arabia
- Othman Asiry
- Department of Information Technology, College of Computing and Information Technology at Khulais, University of Jeddah, Jeddah, Saudi Arabia
- Atef Masmoudi
- Department of Computer Science, College of Computer Science, King Khalid University, 61421 Abha, Saudi Arabia
2
Hemmati S. Expanding the cryoprotectant toolbox in biomedicine by multifunctional antifreeze peptides. Biotechnol Adv 2025; 81:108545. [PMID: 40023203 DOI: 10.1016/j.biotechadv.2025.108545] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2024] [Revised: 01/07/2025] [Accepted: 02/23/2025] [Indexed: 03/04/2025]
Abstract
The global cryopreservation market size rises exponentially due to increased demand for cell therapy-based products, assisted reproductive technology, and organ transplantation. Cryoprotectants (CPAs) are required to reduce ice-related damage, osmotic cell injury, and protein denaturation. Antioxidants are needed to hamper membrane lipid peroxidation under freezing stress, and antibiotics are added to the cryo-solutions to prevent contamination. The vitrification process for sized organs requires a high concentration of CPA, which is hardly achievable using conventional penetrating toxic CPAs like DMSO. Antifreeze peptides (AFpeps) are biocompatible CPAs leveraging inspiration from nature, such as freeze-tolerant and freeze-avoidant organisms, to circumvent logistic limitations in cryogenic conditions. This study aims to introduce the advances of AFpeps with cell-penetrating, antioxidant, and antimicrobial characteristics. We herein revisit the placement of AFpeps in the biobanking of cancer cells, immune cells, stem cells, blood cells, germ cells (sperm and oocytes), and probiotics. Implementing low-immunogenic AFpeps for allograft cryopreservation minimizes HLA mismatching risk after organ transplantation. Applying AFpeps to formulate bioinks with optimal rheology in extrusion-based 3D cryobioprinters expedites the bench-to-bedside transition of bioprinted scaffolds. This study advocates that the fine-tuned synthetic or insect-derived AFpeps, forming round blunt-shaped crystals, are biomedically broad-spectrum, and cell-permeable AFpeps from marine and plant sources, which result in sharp ice crystals, are appropriate for cryosurgery. Perspectives of the available room for developing peptide mimetics in favor of higher activity and stability and peptide-functionalized nanoparticles for enhanced delivery are delineated. Finally, antitumor immune activation by cryoimmunotherapy as an autologous in-vivo tumor lysate vaccine has been illustrated.
Affiliation(s)
- Shiva Hemmati
- Department of Pharmaceutical Biotechnology, School of Pharmacy, Shiraz University of Medical Sciences, Shiraz, Iran; Biotechnology Research Center, Shiraz University of Medical Sciences, Shiraz, Iran; Department of Pharmaceutical Biology, Faculty of Pharmaceutical Sciences, UCSI University, 56000 Cheras, Kuala Lumpur, Malaysia.
3
Li J, Zhang F, Wen Z, Fang C. AFP-MCDF: Multi and cross-dimensional feature fusion methods for antifreeze protein prediction. Anal Biochem 2025; 704:115881. [PMID: 40348048 DOI: 10.1016/j.ab.2025.115881] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2025] [Revised: 04/22/2025] [Accepted: 04/23/2025] [Indexed: 05/14/2025]
Abstract
Antifreeze proteins can effectively inhibit the formation of ice crystals and enhance cell survival in low-temperature environments. They protect the texture and prolong the shelf life of food, and they maintain cell and tissue integrity in medical treatments, thereby improving the success rate of surgery and transplantation. Accurate prediction of antifreeze proteins is important to advance these fields. Traditional wet-lab methods, while providing reliable validation results, are usually time-consuming and costly. Moreover, existing computational methods still have room for improvement in predictive performance. In this study, a novel antifreeze protein prediction method, AFP-MCDF, is proposed. The AFP-MCDF method first extracts one- and two-dimensional feature representations of antifreeze protein sequences using the pre-trained protein language models ProtBERT and ESM-2. Subsequently, these features are fused multidimensionally via BiLSTM and TextCNN to capture long-term dependencies and local features. Finally, the method predicts the frost resistance of antifreeze protein sequences by cross-dimensional fusion and linear mapping from N to 2 dimensions. Experimental results show that AFP-MCDF performs well in the antifreeze protein prediction task, outperforming traditional computational methods and reaching the current state-of-the-art.
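The final step this abstract describes, a linear mapping from N to 2 dimensions, can be sketched as a single affine layer. Applying a softmax to the two outputs is an assumption made here to obtain class probabilities; the weights, bias, and function names are hypothetical placeholders, not values from the paper.

```python
import math


def linear(x: list[float], weights: list[list[float]], bias: list[float]) -> list[float]:
    """Affine map from len(x) inputs to len(weights) outputs (here N -> 2)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits: list[float]) -> list[float]:
    """ASSUMPTION: softmax turns the two logits into class probabilities."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


# Hypothetical fused feature vector (N = 3) and untrained placeholder weights.
fused = [1.0, 2.0, 3.0]
weights = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0]]  # one row per output class
bias = [0.0, 0.0]
probabilities = softmax(linear(fused, weights, bias))
```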
Affiliation(s)
- Jinfeng Li
- Beijing Institute of Petrochemical Technology, Beijing, 102617, China
- Fan Zhang
- Beijing Institute of Petrochemical Technology, Beijing, 102617, China
- Zhenguo Wen
- Beijing Institute of Petrochemical Technology, Beijing, 102617, China
- Chun Fang
- Beijing Institute of Petrochemical Technology, Beijing, 102617, China.
4
Ali F, Masmoudi A, Alkhalifah T, Alturise F, Alghamdi W, Khalid M. IR-MBiTCN: Computational prediction of insulin receptor using deep learning: A multi-information fusion approach with multiscale bidirectional temporal convolutional network. Int J Biol Macromol 2025; 311:143844. [PMID: 40319974 DOI: 10.1016/j.ijbiomac.2025.143844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/05/2025] [Revised: 04/28/2025] [Accepted: 04/30/2025] [Indexed: 05/07/2025]
Abstract
The insulin receptor (IR) is a transmembrane protein that controls glucose homeostasis and is highly associated with chronic diseases, including cancer and neurological disorders. Traditional experimental methods have provided essential insights into IR structure and function, but they are constrained by time, cost, and scalability. To address these limitations, we present a computational technique for IR prediction based on deep learning and multi-information fusion. First, we built sequence-based training and testing datasets. Second, the compositional, word embedding, and evolutionary features were retrieved using the Weighted-Group Dipeptide Composition (W-GDPC), FastText, and Bi-Block-Position Specific Scoring Matrix (BB-PSSM), respectively. Third, we use compositional, word embedding, and evolutionary features to generate multi-perspective fused features (MPFF). Fourth, the Multiscale Bidirectional Temporal Convolutional Network (MBiTCN) is used to train the model to process features at multiple scales and analyze sequences in both forward and backward directions. The proposed approach (IR-MBiTCN) outperforms competing deep learning (DL) and machine learning (ML)-based models on training and testing datasets, achieving 83.50% and 79.43% accuracy, respectively. This study represents a pioneering use of computational methodology in IR prediction, providing a scalable, efficient alternative to experimental procedures and paving the way for advances in chronic disease therapy and drug discovery.
Affiliation(s)
- Farman Ali
- Department of Computer Science, Bahria University Islamabad, Pakistan.
- Atef Masmoudi
- Department of Computer Science, College of Computer Science, King Khalid University, Abha 61421, Saudi Arabia
- Tamim Alkhalifah
- Department of Computer Engineering, College of Computer, Qassim University, Buraydah, Saudi Arabia
- Fahad Alturise
- Department of Cybersecurity, College of Computer, Qassim University, Buraydah, Saudi Arabia
- Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
- Majdi Khalid
- Department of Computer Science and Artificial Intelligence, College of Computing, Umm Al-Qura University, Makkah 21955, Saudi Arabia
5
Chen S, Zheng P, Zheng L, Yao Q, Meng Z, Lin L, Chen X, Liu R. BERT-DomainAFP: Antifreeze protein recognition and classification model based on BERT and structural domain annotation. iScience 2025; 28:112077. [PMID: 40241758 PMCID: PMC12002629 DOI: 10.1016/j.isci.2025.112077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/23/2024] [Revised: 01/03/2025] [Accepted: 02/17/2025] [Indexed: 04/18/2025] Open
Abstract
Antifreeze proteins (AFPs) are crucial for organisms to adapt to low temperatures, with applications in medicine, food storage, aquaculture, and agriculture. Accurate AFP identification is challenging due to structural and sequence diversity. To improve prediction and classification, we propose BERT-DomainAFP, a deep learning model trained on the AntiFreezeDomains dataset created with a novel annotation strategy. The model uses pre-trained ProteinBERT and incorporates oversampling and undersampling techniques to handle unbalanced data, ensuring high predictive ability. BERT-DomainAFP achieves 98.48% accuracy, the highest among existing models, and can classify different AFP types based on structural domain features. This model outperforms current tools, offering a promising solution for AFP recognition and classification in research and applications.
Affiliation(s)
- Shengzhen Chen
- State Key Laboratory of Mariculture Breeding, Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Ping Zheng
- State Key Laboratory of Mariculture Breeding, Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Lele Zheng
- State Key Laboratory of Mariculture Breeding, Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Qinglong Yao
- State Key Laboratory of Mariculture Breeding, Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Ziyu Meng
- State Key Laboratory of Mariculture Breeding, Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Longshan Lin
- Laboratory of Marine Biodiversity Research, Third Institute of Oceanography, Ministry of Natural Resources, Xiamen 361005, China
- Xinhua Chen
- State Key Laboratory of Mariculture Breeding, Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
- Ruoyu Liu
- State Key Laboratory of Mariculture Breeding, Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou 350002, China
6
Shoombuatong W, Schaduangrat N, Homdee N, Ahmed S, Chumnanpuen P. Advancing the accuracy of tyrosinase inhibitory peptides prediction via a multiview feature fusion strategy. Sci Rep 2025; 15:4762. [PMID: 39922825 PMCID: PMC11807091 DOI: 10.1038/s41598-024-81807-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/02/2024] [Accepted: 11/29/2024] [Indexed: 02/10/2025] Open
Abstract
Tyrosinase plays a crucial role as an enzyme in the production of melanin, the pigment that determines the color of the hair, eyes, and skin. Tyrosinase inhibitory peptides (TIPs), mainly designed to regulate the activity of the enzyme tyrosinase, are of interest in various domains, including cosmetics, dermatology, and pharmaceuticals, due to their potential applications in controlling skin pigmentation. To date, a few machine learning-based models have been proposed for predicting TIPs, but their predictive performance remains unsatisfactory. In this study, we propose an innovative computational approach, named TIPred-MVFF, to accurately predict TIPs using only sequence information. Firstly, we established an up-to-date and high-quality dataset by collecting samples from various sources. Secondly, we applied a multi-view feature fusion (MVFF) strategy to extract and explore probability and category information embedded in TIPs, employing several machine learning (ML) algorithms coupled with different commonly used sequence-based feature encodings. Then, we employed resampling approaches to address the class imbalance issue. Finally, to maximize the utility of each feature, we fused probability-based and sequence-based features, generating more informative features that were used to develop the final prediction model. Based on the independent test, experimental results showed that TIPred-MVFF outperformed several conventional ML classifiers and existing methods in terms of prediction accuracy and robustness, achieving an accuracy of 0.937 and a Matthews correlation coefficient of 0.847. This new computational approach is anticipated to aid community-wide efforts in rapidly and cost-effectively discovering novel peptides with strong tyrosinase inhibitory activities.
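Accuracy and the Matthews correlation coefficient (MCC), the two metrics reported in this abstract, are both computed from the four confusion-matrix counts. The sketch below uses the standard definitions; the counts in the usage lines are hypothetical and are not the paper's results.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts: 45 true positives, 45 true negatives, 5 of each error.
acc = accuracy(45, 45, 5, 5)  # 0.9
corr = mcc(45, 45, 5, 5)      # (2025 - 25) / 2500 = 0.8
```

Unlike accuracy, MCC stays informative on imbalanced data because it uses all four cells of the confusion matrix, which is one reason peptide-prediction papers tend to report both.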
Affiliation(s)
- Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
- Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Nutta Homdee
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Saeed Ahmed
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
- Department of Computer Science, University of Swabi, Swabi, 23561, Pakistan
- Pramote Chumnanpuen
- Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, 10900, Thailand.
- Kasetsart University International College (KUIC), Kasetsart University, Bangkok, 10900, Thailand.
7
Kumar N, Choudhury S, Bajiya N, Patiyal S, Raghava GPS. Prediction of Anti-Freezing Proteins From Their Evolutionary Profile. Proteomics 2025; 25:e202400157. [PMID: 39305039 DOI: 10.1002/pmic.202400157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2024] [Revised: 08/29/2024] [Accepted: 08/29/2024] [Indexed: 02/06/2025]
Abstract
Prediction of antifreeze proteins (AFPs) holds significant importance due to their diverse applications in healthcare. An inherent limitation of current AFP prediction methods is their reliance on unreviewed proteins for evaluation. This study evaluates proposed and existing methods on an independent dataset containing 80 AFPs and 73 non-AFPs obtained from UniProt, which have already been reviewed by experts. Initially, we constructed machine learning models for AFP prediction using selected composition-based protein features and achieved a peak AUROC of 0.90 with an MCC of 0.69 on the independent dataset. Subsequently, we observed a notable enhancement in model performance, with the AUROC increasing from 0.90 to 0.93 upon incorporating evolutionary information instead of relying solely on the primary sequence of proteins. Furthermore, we explored hybrid models integrating our machine learning approaches with BLAST-based similarity and motif-based methods. However, the performance of these hybrid models either matched or was inferior to that of our best machine-learning model. Our best model based on evolutionary information outperforms all existing methods on the independent/validation dataset. To facilitate users, a user-friendly web server with a standalone package named "AFPropred" was developed (https://webs.iiitd.edu.in/raghava/afpropred).
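The AUROC values quoted in this abstract can be read as the probability that a randomly chosen AFP receives a higher score than a randomly chosen non-AFP. A minimal sketch of that pairwise definition follows; the function name `auroc` and the example labels and scores are hypothetical and unrelated to the paper's data.

```python
def auroc(labels: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: three of the four positive/negative pairs are ranked correctly.
example = auroc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])  # 0.75
```

The pairwise form is quadratic in the number of samples, which is fine for a test set of 153 proteins; a rank-based formulation would be preferable for large datasets.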
Affiliation(s)
- Nishant Kumar
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Shubham Choudhury
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Nisha Bajiya
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Sumeet Patiyal
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
- Cancer Data Science Laboratory, National Cancer Institute, National Institutes of Health, Bethesda, Maryland, USA
- Gajendra P S Raghava
- Department of Computational Biology, Indraprastha Institute of Information Technology, New Delhi, India
8
Zouari S, Ali F, Masmoudi A, Ghazalah SA, Alghamdi W, Kateb FA, Ibrahim N. Deep-GB: A novel deep learning model for globular protein prediction using CNN-BiLSTM architecture and enhanced PSSM with trisection strategy. IET Syst Biol 2024; 18:208-217. [PMID: 39514139 DOI: 10.1049/syb2.12108] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/03/2024] [Revised: 09/30/2024] [Accepted: 10/27/2024] [Indexed: 11/16/2024] Open
Abstract
Globular proteins (GPs) play vital roles in a wide range of biological processes, encompassing enzymatic catalysis and immune responses. Enzymes, among these globular proteins, facilitate biochemical reactions, while others, such as haemoglobin, contribute to essential physiological functions such as oxygen transport. Given the importance of these considerations, accurately identifying globular proteins is essential. To address the need for precise GP identification, this research introduces an innovative approach that employs a hybrid-based deep learning model called Deep-GP. We generated two datasets based on primary sequences and developed a novel feature descriptor called Consensus Sequence-based Trisection-Position Specific Scoring Matrix (CST-PSSM). The model training phase involved the application of deep learning techniques, including the bidirectional long short-term memory network (BiLSTM), gated recurrent unit (GRU), and convolutional neural network (CNN). The BiLSTM and CNN were hybridised for ensemble learning. The CST-PSSM-based ensemble model achieved the most accurate predictive outcomes, outperforming other competitive predictors across both training and testing datasets. This demonstrates the potential of harnessing deep learning for precise GP prediction as a robust tool to expedite research, streamline drug discovery, and unveil novel therapeutic targets.
Affiliation(s)
- Sonia Zouari
- National Engineering School of Sfax, University of Sfax, Sfax, Tunisia
- Farman Ali
- Department of Computer Science, Bahria University Islamabad Campus, Islamabad, Pakistan
- Atef Masmoudi
- Department of Computer Science, College of Computer Science, King Khalid University, Abha, Saudi Arabia
- Sarah Abu Ghazalah
- Department of Informatics and Computer System, College of Computer Science, King Khalid University, Abha, Saudi Arabia
- Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Faris A Kateb
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Nouf Ibrahim
- Family Medicine Clinic, Makkah Armed Force Medical Center, Makkah, Saudi Arabia
9
Beltrán JF, Herrera-Belén L, Yáñez AJ, Jimenez L. Prediction of viral oncoproteins through the combination of generative adversarial networks and machine learning techniques. Sci Rep 2024; 14:27108. [PMID: 39511292 PMCID: PMC11543823 DOI: 10.1038/s41598-024-77028-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2024] [Accepted: 10/18/2024] [Indexed: 11/15/2024] Open
Abstract
Viral oncoproteins play crucial roles in transforming normal cells into cancer cells, representing a significant factor in the etiology of various cancers. Traditionally, identifying these oncoproteins is both time-consuming and costly. With advancements in computational biology, bioinformatics tools based on machine learning have emerged as effective methods for predicting biological activities. Here, for the first time, we propose an innovative approach that combines Generative Adversarial Networks (GANs) with supervised learning methods to enhance the accuracy and generalizability of viral oncoprotein prediction. Our methodology evaluated multiple machine learning models, including Random Forest, Multilayer Perceptron, Light Gradient Boosting Machine, eXtreme Gradient Boosting, and Support Vector Machine. In ten-fold cross-validation on our training dataset, the GAN-enhanced Random Forest model demonstrated superior performance metrics: 0.976 accuracy, 0.976 F1 score, 0.977 precision, 0.976 sensitivity, and 1.0 AUC. During independent testing, this model achieved 0.982 accuracy, 0.982 F1 score, 0.982 precision, 0.982 sensitivity, and 1.0 AUC. These results establish our new tool, VirOncoTarget, accessible via a web application. We anticipate that VirOncoTarget will be a valuable resource for researchers, enabling rapid and reliable viral oncoprotein prediction and advancing our understanding of their role in cancer biology.
Affiliation(s)
- Jorge F Beltrán
- Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile.
- Lisandra Herrera-Belén
- Departamento de Ciencias Básicas, Facultad de Ciencias, Universidad Santo Tomas, Temuco, Chile
- Alejandro J Yáñez
- Departamento de Investigación y Desarrollo, Greenvolution SpA, Puerto Varas, Chile
- Interdisciplinary Center for Aquaculture Research (INCAR), Concepcion, Chile
- Luis Jimenez
- Department of Chemical Engineering, Faculty of Engineering and Science, Universidad de La Frontera, Ave. Francisco Salazar 01145, Temuco, Chile
10
Wu J, Liu Y, Zhu Y, Yu DJ. Improving Antifreeze Proteins Prediction With Protein Language Models and Hybrid Feature Extraction Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2024; 21:2349-2358. [PMID: 39316498 DOI: 10.1109/tcbb.2024.3467261] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2024]
Abstract
Accurate identification of antifreeze proteins (AFPs) is crucial in developing biomimetic synthetic anti-icing materials and low-temperature organ preservation materials. Although numerous machine learning-based methods have been proposed for AFPs prediction, the complex and diverse nature of AFPs limits the prediction performance of existing methods. In this study, we propose AFP-Deep, a new deep learning method to predict antifreeze proteins by integrating embedding from protein sequences with pre-trained protein language models and evolutionary contexts with hybrid feature extraction networks. The experimental results demonstrated that the main advantage of AFP-Deep is its utilization of pre-trained protein language models, which can extract discriminative global contextual features from protein sequences. Additionally, the hybrid deep neural networks designed for protein language models and evolutionary context feature extraction enhance the correlation between embeddings and antifreeze pattern. The performance evaluation results show that AFP-Deep achieves superior performance compared to state-of-the-art models on benchmark datasets, achieving an AUPRC of 0.724 and 0.924, respectively.
11
Adnan A, Hongya W, Ali F, Khalid M, Alghushairy O, Alsini R. A bi-layer model for identification of piwiRNA using deep neural learning. J Biomol Struct Dyn 2024; 42:5725-5733. [PMID: 37608578 DOI: 10.1080/07391102.2023.2243523] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 05/08/2023] [Accepted: 06/15/2023] [Indexed: 08/24/2023]
Abstract
piwiRNA is a kind of non-coding RNA (ncRNA) that cannot be translated into proteins. It aids the study of gamete generation and of gene expression regulation at both the transcriptional and post-transcriptional levels. piwiRNA has the function of instructing deadenylation, animal fertility, silencing transposons, fighting viruses, and regulating endogenous genes. Given the great significance of piwiRNA for crucial cellular functions, its prediction is essential. Several predictors have been established for the prediction of piwiRNA; however, improving prediction performance is highly desirable. In the current study, we developed a more promising predictor named BLP-piwiRNA. The features are explored by reverse complement k-mer, gapped k-mer composition, and k-mer composition. The feature sets of all descriptors are fused, and the best features are selected by cascade and relief feature selection strategies. The best feature sets are provided to random forest (RF), deep neural network (DNN), and support vector machine (SVM) classifiers. The models are validated by a 10-fold test. The DNN with the optimal features from the cascade feature selection approach secured the highest prediction results. The results illustrate that BLP-piwiRNA effectively outperforms the existing studies. The proposed approach would be beneficial for both the research community and the drug development industry. BLP-piwiRNA would serve as novel biomarkers and therapeutic targets for tumor diagnostics and treatment. Communicated by Ramaswamy H. Sarma.
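Two of the descriptors named in this abstract, k-mer composition and reverse complement k-mer composition, can be sketched as follows. The abstract does not give implementation details, so the conventions here are assumptions: U is mapped to T, windows containing ambiguous bases are skipped, and each k-mer is merged with its reverse complement under the lexicographically smaller of the two. Function names are hypothetical.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_composition(sequence: str, k: int) -> dict[str, float]:
    """Normalized frequency of every possible k-mer over the ACGT alphabet."""
    seq = sequence.upper().replace("U", "T")  # ASSUMPTION: treat RNA as DNA letters
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter(w for w in windows if set(w) <= set("ACGT"))
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in all_kmers}


def rc_kmer_composition(sequence: str, k: int) -> dict[str, float]:
    """Merge each k-mer with its reverse complement under one canonical key."""
    merged: Counter = Counter()
    for kmer, freq in kmer_composition(sequence, k).items():
        reverse_complement = kmer.translate(COMPLEMENT)[::-1]
        merged[min(kmer, reverse_complement)] += freq
    return dict(merged)
```

For k = 2 the plain descriptor has 16 features and the reverse-complement variant has 10, since 'AC' and 'GT' (for example) share one entry.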
Affiliation(s)
- Adnan Adnan
- School of Computer Science and Technology, Donghua University, Shanghai, China
- Wang Hongya
- School of Computer Science and Technology, Donghua University, Shanghai, China
- Farman Ali
- Department of Software Engineering, Sarhad University of Science and Information Technology, Peshawar, Pakistan
- Majdi Khalid
- Department of Computer Science, College of Computers and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia
- Omar Alghushairy
- Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
- Raed Alsini
- Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
12
Ali F, Almuhaimeed A, Khalid M, Alshanbari H, Masmoudi A, Alsini R. DEEP-EP: Identification of epigenetic protein by ensemble residual convolutional neural network for drug discovery. Methods 2024; 226:49-53. [PMID: 38621436 DOI: 10.1016/j.ymeth.2024.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 04/06/2024] [Accepted: 04/08/2024] [Indexed: 04/17/2024] Open
Abstract
Epigenetic proteins (EPs) play a role in the progression of a wide range of diseases, including autoimmune disorders, neurological disorders, and cancer. Recognizing their different functions has prompted researchers to investigate them as potential therapeutic and pharmacological targets. This study introduces a novel deep learning-based model that accurately predicts EPs. Our approach entails generating two distinct datasets for training and evaluating the model. We then use three distinct strategies to transform protein sequences into numerical representations: Dipeptide Deviation from Expected Mean (DDE), Dipeptide Composition (DPC), and Group Amino Acid Composition (GAAC). Following that, we train and compare the performance of four advanced deep learning algorithms: Ensemble Residual Convolutional Neural Network (ERCNN), Generative Adversarial Network (GAN), Convolutional Neural Network (CNN), and Gated Recurrent Unit (GRU). The DDE encoding combined with the ERCNN model demonstrates the best performance on both datasets. This study demonstrates deep learning's potential for precisely predicting EPs, which can considerably accelerate research and streamline drug discovery efforts. This analytical method has the potential to find new therapeutic targets and advance our understanding of EP activities in disease.
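Of the three encodings listed in this abstract, Dipeptide Composition (DPC) is the simplest to illustrate: it maps a sequence to the normalized frequencies of all 400 ordered residue pairs. The sketch below follows that standard definition; skipping pairs that contain non-standard residues is an assumption, and the function name is hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def dipeptide_composition(sequence: str) -> dict[str, float]:
    """Normalized frequency of each of the 400 ordered amino acid pairs."""
    seq = sequence.upper()
    pairs = (seq[i:i + 2] for i in range(len(seq) - 1))
    # ASSUMPTION: pairs containing non-standard residues (e.g. X) are skipped.
    counts = Counter(p for p in pairs if p[0] in AMINO_ACIDS and p[1] in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no valid dipeptides")
    return {
        a + b: counts[a + b] / total
        for a, b in product(AMINO_ACIDS, repeat=2)
    }
```

DDE builds on the same dipeptide frequencies by standardizing each one against a theoretical mean and variance, so a DPC routine like this is typically the first step of a DDE implementation.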
Affiliation(s)
- Farman Ali
- Department of Computer Science, Bahria University Islamabad Campus, Pakistan.
| | - Abdullah Almuhaimeed
- Digital Health Institute, King Abdulaziz City for Science and Technology, Riyadh 11442, Saudi Arabia
| | - Majdi Khalid
- Department of Computer Science and Artificial Intelligence, College of Computing, Umm Al-Qura University, Makkah 21955, Saudi Arabia
| | - Hanan Alshanbari
- Department of Computer Science and Artificial Intelligence, College of Computing, Umm Al-Qura University, Makkah 21955, Saudi Arabia
| | - Atef Masmoudi
- College of Computer Science, King Khalid University, Abha, Saudi Arabia
| | - Raed Alsini
- Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
| |
|
13
|
Arif M, Fang G, Ghulam A, Musleh S, Alam T. DPI_CDF: druggable protein identifier using cascade deep forest. BMC Bioinformatics 2024; 25:145. [PMID: 38580921 PMCID: PMC11334562 DOI: 10.1186/s12859-024-05744-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/01/2023] [Accepted: 03/13/2024] [Indexed: 04/07/2024] Open
Abstract
BACKGROUND Drug targets in living beings perform pivotal roles in the discovery of potential drugs. Conventional wet-lab characterization of drug targets, although accurate, is generally expensive, slow, and resource intensive. Therefore, computational methods are highly desirable as an alternative to expedite the large-scale identification of druggable proteins (DPs); however, the performance of existing in silico predictors is still not satisfactory. METHODS In this study, we developed a novel deep learning-based model DPI_CDF for predicting DPs based on protein sequence only. DPI_CDF utilizes evolutionary-based (i.e., histograms of oriented gradients for position-specific scoring matrix), physicochemical-based (i.e., component protein sequence representation), and compositional-based (i.e., normalized qualitative characteristic) properties of protein sequence to generate features. Then a hierarchical deep forest model fuses these three encoding schemes to build the proposed model DPI_CDF. RESULTS The empirical outcomes on 10-fold cross-validation demonstrate that the proposed model achieved 99.13% accuracy and a Matthews correlation coefficient (MCC) of 0.982 on the training dataset. The generalization power of the trained model is further examined on an independent dataset, where it achieved a maximum accuracy of 95.01% and an MCC of 0.900. When compared to current state-of-the-art methods, DPI_CDF improves accuracy by 4.27% and 4.31% on the training and testing datasets, respectively. We believe DPI_CDF will support the research community in identifying druggable proteins and accelerating the drug discovery process. AVAILABILITY The benchmark datasets and source code are available on GitHub (http://github.com/Muhammad-Arif-NUST/DPI_CDF).
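The two metrics reported in this abstract, accuracy and MCC, follow directly from a binary confusion matrix. The sketch below uses the standard definitions; the counts in the usage lines are made-up values for illustration and are not the paper's results.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, chosen only to exercise the functions.
print(accuracy(90, 85, 15, 10))
print(matthews_corrcoef(90, 85, 15, 10))
```

Unlike accuracy, MCC stays informative when the positive and negative classes are imbalanced, which is one reason it is usually reported alongside accuracy for protein classifiers.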
Affiliation(s)
- Muhammad Arif
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
| | - Ge Fang
- State Key Laboratory for Organic Electronics and Information Displays, Institute of Advanced Materials (IAM), Nanjing 210023, P. R. China
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Ali Ghulam
- Information Technology Centre, Sindh Agriculture University, Sindh, Pakistan
| | - Saleh Musleh
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar
| | - Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar.
| |
|
14
|
Khalid M, Ali F, Alghamdi W, Alzahrani A, Alsini R, Alzahrani A. An ensemble computational model for prediction of clathrin protein by coupling machine learning with discrete cosine transform. J Biomol Struct Dyn 2024:1-9. [PMID: 38498362 DOI: 10.1080/07391102.2024.2329777] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/09/2023] [Accepted: 02/19/2024] [Indexed: 03/20/2024]
Abstract
Clathrin protein (CP) plays a pivotal role in numerous cellular processes, including endocytosis, signal transduction, and neuronal function. Dysregulation of CP has been associated with a spectrum of diseases. Given its involvement in various cellular functions, CP has garnered significant attention for its potential applications in drug design and medicine, ranging from targeted drug delivery to addressing viral infections, neurological disorders, and cancer. The accurate identification of CP is crucial for unraveling its function and devising novel therapeutic strategies. Computational methods offer a rapid, cost-effective, and less labor-intensive alternative to traditional identification methods, making them especially appealing for high-throughput screening. This paper introduces CL-Pred, a novel computational method for CP identification. CL-Pred leverages three feature descriptors: Dipeptide Deviation from Expected Mean (DDE), Bigram Position Specific Scoring Matrix (BiPSSM), and Position Specific Scoring Matrix-Tetra Slice-Discrete Cosine Transform (PSSM-TS-DCT). The model is trained using three classifiers: Support Vector Machine (SVM), Extremely Randomized Tree (ERT), and Light eXtreme Gradient Boosting (LiXGB). Notably, the LiXGB-based model achieves outstanding performance, demonstrating accuracies of 94.63% and 93.65% on the training and testing datasets, respectively. The proposed CL-Pred method is poised to significantly advance our comprehension of clathrin-mediated endocytosis, cellular physiology, and disease pathogenesis. Furthermore, it holds promise for identifying potential drug targets across a spectrum of diseases.
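The abstract does not spell out how PSSM-TS-DCT slices the matrix, so the sketch below covers only the generic compression step such descriptors rely on: applying an unnormalized DCT-II to each column of a profile matrix and keeping the leading low-frequency coefficients. The function names, the `keep` parameter, and the toy matrix are assumptions for illustration.

```python
import math


def dct2(values: list[float]) -> list[float]:
    """Unnormalized DCT-II: X_k = sum_n x_n * cos(pi / N * (n + 0.5) * k)."""
    n = len(values)
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(values))
        for k in range(n)
    ]


def compress_columns(matrix: list[list[float]], keep: int) -> list[float]:
    """Apply the DCT to every column and keep the first `keep` coefficients of each."""
    features: list[float] = []
    for column in zip(*matrix):  # iterate over columns of the L x C matrix
        features.extend(dct2(list(column))[:keep])
    return features


# Toy 4 x 2 matrix standing in for a PSSM (a real PSSM has 20 columns).
toy = [[1.0, 0.0], [1.0, 2.0], [1.0, 0.0], [1.0, 2.0]]
print(compress_columns(toy, keep=2))
```

Keeping only the first few coefficients gives a fixed-length feature vector for proteins of different lengths, since most of the signal energy concentrates in the low-frequency terms.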
Affiliation(s)
- Majdi Khalid
- Department of Computer Science and Artificial Intelligence, College of Computing, Umm Al-Qura University, Makkah, Saudi Arabia
| | - Farman Ali
- Sarhad University of Science and Information Technology Peshawar, Mardan Campus, Mardan, Pakistan
| | - Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Abdulrahman Alzahrani
- Department of Information System and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
| | - Raed Alsini
- Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Ahmed Alzahrani
- College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
| |
|
15
|
Alsini R, Almuhaimeed A, Ali F, Khalid M, Farrash M, Masmoudi A. Deep-VEGF: deep stacked ensemble model for prediction of vascular endothelial growth factor by concatenating gated recurrent unit with two-dimensional convolutional neural network. J Biomol Struct Dyn 2024:1-11. [PMID: 38450715 DOI: 10.1080/07391102.2024.2323144] [Citation(s) in RCA: 7] [Impact Index Per Article: 7.0] [Received: 12/14/2023] [Accepted: 02/16/2024] [Indexed: 03/08/2024]
Abstract
Vascular endothelial growth factor (VEGF) is involved in the development and progression of various diseases, including cancer, diabetic retinopathy, macular degeneration and arthritis. Understanding the role of VEGF in various disorders has led to the development of effective treatments, including anti-VEGF drugs, which have significantly improved therapeutic methods. Accurate VEGF identification is critical, yet experimental identification is expensive and time-consuming. This study presents Deep-VEGF, a novel computational model for VEGF prediction based on deep-stacked ensemble learning. We formulated two datasets using primary sequences. A novel feature descriptor named K-Space Tri Slicing-Bigram position-specific scoring matrix (KSTS-BPSSM) is constructed to extract numerical features from primary sequences. The model training is performed by deep learning techniques, including gated recurrent unit (GRU), generative adversarial network (GAN) and convolutional neural network (CNN). The GRU and CNN are ensembled using a stacking learning approach. The KSTS-BPSSM-based ensemble model secured the most accurate predictive outcomes, surpassing other competitive predictors across both training and testing datasets. This demonstrates the potential of leveraging deep learning for accurate VEGF prediction as a powerful tool to accelerate research, streamline drug discovery and uncover novel therapeutic targets. This insightful approach holds promise for expanding our knowledge of VEGF's role in health and disease.
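The K-space tri-slicing step of KSTS-BPSSM is not described in the abstract and is not reproduced here. The sketch below shows only the plain bigram-PSSM transform that such descriptors build on, in its usual form B[m][n] = sum over i of P[i][m] * P[i+1][n]. The function name and the toy matrix are assumptions.

```python
def bigram_pssm(pssm: list[list[float]]) -> list[list[float]]:
    """Map an L x C profile to a C x C matrix of products over consecutive rows."""
    if not pssm:
        return []
    width = len(pssm[0])
    bigram = [[0.0] * width for _ in range(width)]
    for row, next_row in zip(pssm, pssm[1:]):  # consecutive sequence positions
        for m in range(width):
            for n in range(width):
                bigram[m][n] += row[m] * next_row[n]
    return bigram


# Toy 3 x 2 profile; a real PSSM has 20 columns, giving a 20 x 20 (400-feature) result.
print(bigram_pssm([[1, 0], [0, 1], [1, 1]]))
```

The output size depends only on the number of columns, so proteins of any length map to the same number of features.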
Affiliation(s)
- Raed Alsini
- Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Abdullah Almuhaimeed
- Digital Health Institute, King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia
| | - Farman Ali
- Sarhad University of Science and Information Technology Peshawar, Mardan Campus, Pakistan
| | - Majdi Khalid
- Department of Computer Science and Artificial Intelligence, College of Computing, Umm Al-Qura University, Makkah, Saudi Arabia
| | - Majed Farrash
- Department of Computer Science and Artificial Intelligence, College of Computing, Umm Al-Qura University, Makkah, Saudi Arabia
| | - Atef Masmoudi
- College of Computer Science, King Khalid University, Abha, Saudi Arabia
| |
|
16
|
Akbar S, Raza A, Zou Q. Deepstacked-AVPs: predicting antiviral peptides using tri-segment evolutionary profile and word embedding based multi-perspective features with deep stacking model. BMC Bioinformatics 2024; 25:102. [PMID: 38454333 PMCID: PMC10921744 DOI: 10.1186/s12859-024-05726-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 03/01/2024] [Indexed: 03/09/2024] Open
Abstract
BACKGROUND Viral infections have been the main health issue in the last decade. Antiviral peptides (AVPs) are a subclass of antimicrobial peptides (AMPs) with substantial potential to protect the human body against various viral diseases. However, there has been significant production of antiviral vaccines and medications. Recently, the development of AVPs as an antiviral agent suggests an effective way to treat virus-affected cells. The use of intelligent machine learning techniques for developing peptide-based therapeutic agents has also attracted increasing interest due to its significant outcomes. The existing wet-laboratory-based drugs are expensive, time-consuming, and cannot effectively perform in screening and predicting the targeted motif of antiviral peptides. METHODS In this paper, we proposed a novel computational model called Deepstacked-AVPs to discriminate AVPs accurately. The training sequences are numerically encoded using a novel Tri-segmentation-based position-specific scoring matrix (PSSM-TS) and word2vec-based semantic features. Composition/Transition/Distribution-Transition (CTDT) is also employed to represent the physicochemical properties based on structural features. Apart from these, the fused vector is formed using PSSM-TS features, semantic information, and CTDT descriptors to compensate for the limitations of single encoding methods. Information gain (IG) is applied to choose the optimal feature set. The selected features are trained using a stacked-ensemble classifier. RESULTS The proposed Deepstacked-AVPs model achieved a predictive accuracy of 96.60%, an area under the curve (AUC) of 0.98, and a precision-recall (PR) value of 0.97 using training samples. In the case of the independent samples, our model obtained an accuracy of 95.15%, an AUC of 0.97, and a PR value of 0.97.
CONCLUSION Our Deepstacked-AVPs model outperformed existing models with a ~4% and ~2% higher accuracy using training and independent samples, respectively. The reliability and efficacy of the proposed Deepstacked-AVPs model make it a valuable tool for scientists and may perform a beneficial role in pharmaceutical design and research academia.
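The abstract names information gain (IG) as the feature selection criterion but does not say how continuous features are discretized. The sketch below assumes a single threshold split per feature, which is one common choice and not necessarily the authors' procedure. The function names and toy values are hypothetical.

```python
import math
from collections import Counter


def entropy(labels: list[int]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values: list[float], labels: list[int], threshold: float) -> float:
    """Entropy reduction from splitting the samples at `threshold` (an assumed discretization)."""
    if not labels:
        return 0.0
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    conditional = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - conditional


# Toy data: the first feature separates the classes, the second does not.
labels = [0, 0, 1, 1]
print(information_gain([0.1, 0.2, 0.8, 0.9], labels, threshold=0.5))
print(information_gain([0.1, 0.9, 0.1, 0.9], labels, threshold=0.5))
```

Ranking features by this score and keeping the top ones is the usual way IG is turned into a selection step.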
Affiliation(s)
- Shahid Akbar
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China
- Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan
| | - Ali Raza
- Department of Physical and Numerical Sciences, Qurtuba University of Science and Information Technology, Peshawar, 25124, KP, Pakistan
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, People's Republic of China.
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, People's Republic of China.
| |
|
17
|
Lin L, Long Y, Liu J, Deng D, Yuan Y, Liu L, Tan B, Qi H. FRP-XGBoost: Identification of ferroptosis-related proteins based on multi-view features. Int J Biol Macromol 2024; 262:130180. [PMID: 38360239 DOI: 10.1016/j.ijbiomac.2024.130180] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2023] [Revised: 02/11/2024] [Accepted: 02/12/2024] [Indexed: 02/17/2024]
Abstract
Ferroptosis represents a novel form of programmed cell death. Pan-cancer bioinformatics analysis indicates that identifying and modulating ferroptosis offer innovative approaches for preventing and treating diverse tumor pathologies. However, the precise detection of ferroptosis-related proteins via conventional wet-laboratory techniques remains a formidable challenge, largely due to the constraints of existing methodologies. These traditional approaches are not only labor-intensive but also financially burdensome. Consequently, there is an imperative need for the development of more sophisticated and efficient computational tools to facilitate the detection of these proteins. In this paper, we present FRP-XGBoost, a machine learning method based on XGBoost and multi-view features for predicting ferroptosis-related proteins. We explored four types of protein feature extraction methods and evaluated their effectiveness in predicting ferroptosis-related proteins using six of the most commonly used traditional classifiers. To enhance the representational power of the hybrid features, we employed a two-step feature selection technique to identify the optimal subset of features. Subsequently, we constructed a prediction model using the XGBoost algorithm. FRP-XGBoost achieved an accuracy of 96.74% in 10-fold cross-validation and an accuracy of 91.52% in an independent test. The implementation source code of FRP-XGBoost is available at https://github.com/linli5417/FRP-XGBoost.
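The 10-fold cross-validation protocol mentioned in this abstract can be sketched as an index splitter. This is a generic, unstratified version written for illustration; the function name, the shuffling seed, and the sample count are assumptions and are unrelated to the FRP-XGBoost repository.

```python
import random
from typing import Iterator


def k_fold_indices(n_samples: int, k: int = 10, seed: int = 0) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_indices, test_indices) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed so the split is reproducible
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train, test in k_fold_indices(25, k=10):
    print(len(train), len(test))
```

The reported cross-validation accuracy is then the average of the per-fold accuracies, while the independent test set is held out of this loop entirely.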
Affiliation(s)
- Li Lin
- Department of Obstetrics and Gynecology, Women and Children's Hospital of Chongqing Medical University, Chongqing 401147, China; Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children, Chongqing 401147, China
| | - Yao Long
- Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing 400016, China; Joint International Research Laboratory of Reproduction and Development, Chinese Ministry of Education, Chongqing Medical University, 400016, China; Department of Obstetrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China
| | - Jinkai Liu
- Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing 400016, China; Joint International Research Laboratory of Reproduction and Development, Chinese Ministry of Education, Chongqing Medical University, 400016, China; Department of Obstetrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China
| | - Dongliang Deng
- Department of Oncology, Chongqing Traditional Chinese Medicine Hospital, Chongqing 400021, China
| | - Yu Yuan
- Department of Obstetrics and Gynecology, Women and Children's Hospital of Chongqing Medical University, Chongqing 401147, China; Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children, Chongqing 401147, China
| | - Lubin Liu
- Department of Obstetrics and Gynecology, Women and Children's Hospital of Chongqing Medical University, Chongqing 401147, China; Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children, Chongqing 401147, China
| | - Bin Tan
- Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing 400016, China; Joint International Research Laboratory of Reproduction and Development, Chinese Ministry of Education, Chongqing Medical University, 400016, China; Department of Obstetrics, The First Affiliated Hospital of Chongqing Medical University, Chongqing 400016, China.
| | - Hongbo Qi
- Department of Obstetrics and Gynecology, Women and Children's Hospital of Chongqing Medical University, Chongqing 401147, China; Department of Obstetrics and Gynecology, Chongqing Health Center for Women and Children, Chongqing 401147, China; Chongqing Key Laboratory of Maternal and Fetal Medicine, Chongqing Medical University, Chongqing 400016, China; Joint International Research Laboratory of Reproduction and Development, Chinese Ministry of Education, Chongqing Medical University, 400016, China.
| |
|
18
|
Shoombuatong W, Homdee N, Schaduangrat N, Chumnanpuen P. Leveraging a meta-learning approach to advance the accuracy of Nav blocking peptides prediction. Sci Rep 2024; 14:4463. [PMID: 38396246 PMCID: PMC10891130 DOI: 10.1038/s41598-024-55160-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Accepted: 02/21/2024] [Indexed: 02/25/2024] Open
Abstract
The voltage-gated sodium (Nav) channel is a crucial molecular component responsible for initiating and propagating action potentials. While the α subunit, forming the channel pore, plays a central role in this function, the complete physiological function of Nav channels relies on crucial interactions between the α subunit and auxiliary proteins, known as protein-protein interactions (PPI). Nav blocking peptides (NaBPs) have been recognized as promising alternative therapeutic agents for pain and itch. Although traditional experimental methods can precisely determine the effect and activity of NaBPs, they remain time-consuming and costly. Hence, machine learning (ML)-based methods capable of accurate in silico prediction of NaBPs are highly desirable. In this study, we develop an innovative meta-learning-based NaBP prediction method (MetaNaBP). MetaNaBP generates new feature representations by employing a wide range of sequence-based feature descriptors that cover multiple perspectives, in combination with powerful ML algorithms. Then, these feature representations were optimized to identify informative features using a two-step feature selection method. Finally, the selected informative features were applied to develop the final meta-predictor. To the best of our knowledge, MetaNaBP is the first meta-predictor for NaBP prediction. Experimental results demonstrated that MetaNaBP achieved an accuracy of 0.948 and a Matthews correlation coefficient of 0.898 over the independent test dataset, which were 5.79% and 11.76% higher than the existing method. In addition, the discriminative power of our feature representations surpassed that of conventional feature descriptors over both the training and independent test datasets. We anticipate that MetaNaBP will be exploited for the large-scale prediction and analysis of NaBPs to narrow down the potential NaBPs.
Affiliation(s)
- Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
| | - Nutta Homdee
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Pramote Chumnanpuen
- Department of Zoology, Faculty of Science, Kasetsart University, Bangkok, 10900, Thailand
- Omics Center for Agriculture, Bioresources, Food, and Health, Kasetsart University (OmiKU), Bangkok, 10900, Thailand
| |
|
19
|
Dhibar S, Jana B. Accurate Prediction of Antifreeze Protein from Sequences through Natural Language Text Processing and Interpretable Machine Learning Approaches. J Phys Chem Lett 2023; 14:10727-10735. [PMID: 38009833 DOI: 10.1021/acs.jpclett.3c02817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Antifreeze proteins (AFPs) bind to growing ice planes owing to their structural complementarity, thereby inhibiting ice-crystal growth by thermal hysteresis. Classification of AFPs from sequence is a difficult task due to their low sequence similarity, and therefore, the usual sequence similarity algorithms, like BLAST and PSI-BLAST, are not efficient. Here, a method combining n-gram feature vectors and machine learning models to accelerate the identification of potential AFPs from sequences is proposed. All these n-gram features are extracted using the k-mer counting method. The comparative analysis reveals that, among different machine learning models, XGBoost outperforms others in predicting AFPs from sequence when penta-mers are used as a feature vector. When tested on an independent dataset, our method performed better compared to other existing ones with sensitivity of 97.50%, recall of 98.30%, and f1 score of 99.10%. Further, we used the SHAP method, which provides important insight into the functional activity of AFPs.
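The k-mer counting step described in this abstract can be sketched with a sliding window. The default `k=5` mirrors the penta-mers mentioned above; the function names, the vocabulary-based vectorization, and the toy sequence are assumptions for illustration, not the authors' code.

```python
from collections import Counter


def kmer_counts(sequence: str, k: int = 5) -> Counter:
    """Count every overlapping substring of length k in the sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_vector(sequence: str, vocabulary: list[str], k: int = 5) -> list[int]:
    """Project the counts onto a fixed vocabulary so all sequences share one feature layout."""
    counts = kmer_counts(sequence, k)
    return [counts[kmer] for kmer in vocabulary]


# Toy sequence (not a real AFP) with the vocabulary built from the sequence itself.
toy = "MKTAYIAKQRMKTAY"
vocab = sorted(kmer_counts(toy))
print(kmer_vector(toy, vocab))
```

In practice the vocabulary would be built from the training set only, so that k-mers seen only in test sequences do not leak into the feature space.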
Affiliation(s)
- Saikat Dhibar
- School of Chemical Sciences, Indian Association for the Cultivation of Science, Jadavpur, Kolkata 700032, India
| | - Biman Jana
- School of Chemical Sciences, Indian Association for the Cultivation of Science, Jadavpur, Kolkata 700032, India
| |
|
20
|
Alghushairy O, Ali F, Alghamdi W, Khalid M, Alsini R, Asiry O. Machine learning-based model for accurate identification of druggable proteins using light extreme gradient boosting. J Biomol Struct Dyn 2023; 42:12330-12341. [PMID: 37850427 DOI: 10.1080/07391102.2023.2269280] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/09/2023] [Accepted: 10/04/2023] [Indexed: 10/19/2023]
Abstract
The identification of druggable proteins (DPs) is significant for the development of new drugs, personalized medicine, understanding of disease mechanisms, drug repurposing, and economic benefits. By identifying new druggable targets, researchers can develop new therapies for a range of diseases, leading to better patient outcomes. Identification of DPs by machine learning strategies is more efficient and cost-effective than conventional methods. In this study, a computational predictor, namely Drug-LXGB, is introduced to enhance the identification of DPs. Features are extracted by composition, transition, and distribution (CTD), composition of K-spaced amino acid pair (CKSAAP), pseudo-position-specific scoring matrix (PsePSSM), and a novel descriptor, called multi-block pseudo amino acid composition (MB-PseAAC). The CTD, CKSAAP, PsePSSM, and MB-PseAAC features are integrated, and sequential forward selection is used as the feature selection algorithm. The selected features are provided to random forest, extreme gradient boosting, and light eXtreme gradient boosting (LXGB). The predictive performance of these learning methods is measured via 10-fold cross-validation. The LXGB-based model secures higher results than other existing predictors. Our novel protocol will play an active role in designing novel drugs and would be fruitful for exploring potential targets. This study will help to better capture a more universal view of a potential target. Communicated by Ramaswamy H. Sarma.
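Of the descriptors listed in this abstract, CKSAAP has a simple standard definition: for each gap k, the frequency of residue pairs separated by k positions. The sketch below follows that definition; the function name, the `max_gap` default, and the handling of non-standard residues are assumptions rather than details from the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (ordering is an assumption)
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 ordered pairs


def cksaap(sequence: str, max_gap: int = 2) -> list[float]:
    """Return 400 * (max_gap + 1) features: pair frequencies for gaps 0..max_gap."""
    seq = sequence.upper()
    features: list[float] = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(seq) - gap - 1):
            pair = seq[i] + seq[i + gap + 1]  # residues separated by `gap` positions
            if pair in counts:  # assumption: non-standard residues are skipped
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


vector = cksaap("ACDAC", max_gap=1)  # toy sequence, not real data
print(len(vector), vector[PAIRS.index("AC")])
```

With gap 0 this reduces to ordinary dipeptide composition; larger gaps capture short-range residue correlations that adjacent pairs miss.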
Affiliation(s)
- Omar Alghushairy
- Department of Information Systems and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
| | - Farman Ali
- Department of Software Engineering, Sarhad University of Science and Information Technology Peshawar Mardan Campus, Peshawar, Pakistan
| | - Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Majdi Khalid
- Department of Computer Science, College of Computers and Information Systems, Umm Al-Qura University, Makkah, Saudi Arabia
| | - Raed Alsini
- Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Othman Asiry
- Department of Information Technology, College of Computing and Information Technology at Khulais, University of Jeddah, Jeddah, Saudi Arabia
| |
|
21
|
Ali F, Kumar H, Alghamdi W, Kateb FA, Alarfaj FK. Recent Advances in Machine Learning-Based Models for Prediction of Antiviral Peptides. ARCHIVES OF COMPUTATIONAL METHODS IN ENGINEERING : STATE OF THE ART REVIEWS 2023; 30:1-12. [PMID: 37359746 PMCID: PMC10148704 DOI: 10.1007/s11831-023-09933-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Accepted: 04/19/2023] [Indexed: 06/28/2023]
Abstract
Viruses have killed and infected millions of people across the world. They cause several chronic diseases like COVID-19, HIV, and hepatitis. To cope with such diseases and virus infections, antiviral peptides (AVPs) have been applied in the design of drugs. Given their significant role in the pharmaceutical industry and other research fields, identification of AVPs is indispensable. In this connection, experimental and computational methods were proposed to identify AVPs. However, more accurate predictors for boosting AVP identification are highly desirable. This work presents a thorough study and reports the available predictors of AVPs. We explain the applied datasets, feature representation approaches, classification algorithms, and performance evaluation parameters. The limitations of the existing studies and the best methods are emphasized, and the pros and cons of the applied classifiers are provided. The future insights cover efficient feature encoding approaches, feature optimization schemes, and effective classification techniques that can improve the performance of novel methods for accurate prediction of AVPs.
Affiliation(s)
- Farman Ali
- Sarhad University of Science and Information Technology Peshawar, Mardan Campus, Khyber Pakhtunkhwa, Pakistan
| | - Harish Kumar
- Department of Computer Science, College of Computer Science, King Khalid University, Abha, Saudi Arabia
| | - Wajdi Alghamdi
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, 21589 Saudi Arabia
| | - Faris A. Kateb
- Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, 21589 Saudi Arabia
| | - Fawaz Khaled Alarfaj
- Department of Management Information Systems, King Faisal University, Hufof, Saudi Arabia
| |
|