1. Karim A, Alkhalifah T, Alturise F, Khan YD. PADG-Pred: Exploring Ensemble Approaches for Identifying Parkinson's Disease Associated Biomarkers Using Genomic Sequences Analysis. IET Syst Biol 2025; 19:e70006. [PMID: 40088455; PMCID: PMC11910177; DOI: 10.1049/syb2.70006] [Received: 11/18/2024] [Revised: 01/04/2025] [Accepted: 02/12/2025] [Indexed: 03/17/2025] [Open access]
Abstract
Parkinson's disease (PD), a degenerative disorder of the nervous system, manifests as unbalanced movements, stiffness, tremors, and coordination difficulties. Its cause is believed to involve genetic and environmental factors, which makes prompt diagnosis and intervention critical to treatment effectiveness. Despite the array of available diagnostics, their reliability remains a challenge. This study proposes a novel predictor, PADG-Pred, which identifies Parkinson's-associated biomarkers through genomic profiling and integrates multiple statistical feature extraction techniques with ensemble-based classification frameworks, with the aim of providing a more robust and interpretable decision-making process than existing tools. Features were extracted from the processed dataset using multiple statistical moments and then used to train diverse ensemble classifiers: XGBoost, Random Forest, Light Gradient Boosting Machine, Bagging, ExtraTrees, and Stacking. Validation assessed key metrics such as specificity, accuracy, sensitivity (recall), and Matthews correlation coefficient. The outcomes demonstrate the strong performance of PADG-RF, with accuracy of ∼91% on the independent set, ∼94% in 5-fold cross-validation, and ∼96% in 10-fold cross-validation.
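The four metrics named in this abstract can all be derived from a binary confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' evaluation code; the function name `binary_metrics` and the example counts are hypothetical.

```python
import math


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, tn, fn):
    """Return accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    accuracy = _ratio(tp + tn, tp + fp + tn + fn)
    sensitivity = _ratio(tp, tp + fn)  # recall, true positive rate
    specificity = _ratio(tn, tn + fp)  # true negative rate
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when any marginal total is zero.
    mcc = _ratio(tp * tn - fp * fn, denominator)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts for illustration only; they are not taken from the paper.
metrics = binary_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy is a weighted average of sensitivity and specificity, so on any single test set it lies between the two.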
Affiliation(s)
- Ayesha Karim
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Tamim Alkhalifah
  - Department of Computer Engineering, College of Computer, Qassim University, Buraydah, Saudi Arabia
- Fahad Alturise
  - Department of Cybersecurity, College of Computer, Qassim University, Buraydah, Saudi Arabia
- Yaser Daanial Khan
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan

2. Binzagr F, Naseem A, Farooq MU, Alromema N. TNFR-LSTM: A Deep Intelligent Model for Identification of Tumour Necroses Factor Receptor (TNFR) Activity. IET Syst Biol 2025; 19:e70007. [PMID: 40156875; PMCID: PMC11954562; DOI: 10.1049/syb2.70007] [Received: 10/11/2024] [Revised: 02/11/2025] [Accepted: 02/12/2025] [Indexed: 04/01/2025] [Open access]
Abstract
Tumour necrosis factors (TNFs) are key players in processes such as inflammation, cancer development, and autoimmune diseases. However, accurately identifying TNFs remains challenging because of their complex interactions with other cytokines. Although existing machine learning models offer some potential, they often fall short in reliably distinguishing TNFs. To address this issue, the authors developed DEEP-TNFR, a more advanced model designed specifically to predict TNFR activity. The approach incorporates features such as relative and reverse positions, along with statistical moments, and is tested on a recognised benchmark dataset. The authors explored six deep learning classifiers: fully connected networks (FCN), convolutional neural networks (CNN), simple recurrent neural networks (RNN), long short-term memory (LSTM), bidirectional LSTM (Bi-LSTM), and gated recurrent units (GRU). The model's effectiveness was evaluated through self-consistency testing, independent set testing, and 5- and 10-fold cross-validation, using metrics such as accuracy, specificity, sensitivity, and Matthews correlation coefficient. Among these classifiers, LSTM proved the most effective, outperforming the others and improving on previous studies. DEEP-TNFR is expected to support ongoing research by enhancing the accuracy of TNFR identification.
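The 5- and 10-fold cross-validation mentioned above partitions the samples into k folds and holds each fold out once for testing. The sketch below shows a plain, unstratified split; the abstract does not say whether the authors stratified or shuffled, and the helper name `k_fold_indices`, the seed, and the sample count of 103 are assumptions for illustration.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for plain, unstratified k-fold cross-validation."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical dataset size; each sample lands in exactly one test fold.
for k in (5, 10):
    fold_sizes = [len(test) for _, test in k_fold_indices(103, k)]
    print(f"{k}-fold test sizes: {fold_sizes}")
```

A model would be trained on each `train` list and scored on the matching `test` list, with the per-fold metrics averaged at the end.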
Affiliation(s)
- Faisal Binzagr
  - Department of Computer Science, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, Rabigh, Saudi Arabia
- Ansar Naseem
  - Department of Software Engineering, Superior University, Lahore, Pakistan
- Muhammad Umer Farooq
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Nashwan Alromema
  - Department of Computer Science, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, Rabigh, Saudi Arabia

3. Sankar D, Oviya IR. Multidisciplinary approaches to study anaemia with special mention on aplastic anaemia (Review). Int J Mol Med 2024; 54:95. [PMID: 39219286; PMCID: PMC11410310; DOI: 10.3892/ijmm.2024.5419] [Received: 03/30/2024] [Accepted: 07/02/2024] [Indexed: 09/04/2024] [Open access]
Abstract
Anaemia is a common health problem worldwide that disproportionately affects vulnerable groups, such as children and expectant mothers. It has a variety of underlying causes, some of which are genetic. Diagnosis requires a comprehensive strategy combining physical examination, laboratory testing (for example, a complete blood count), and molecular tools for accurate identification. With nearly 400 varieties of anaemia, accurate diagnosis remains a challenging task. Red blood cell abnormalities are largely caused by genetic factors, which means that a thorough understanding requires interpretation at the molecular level. As a result, precision medicine has become a key paradigm, utilising artificial intelligence (AI) techniques, such as deep learning and machine learning, to improve prognostic evaluation, treatment prediction, and diagnostic accuracy. Furthermore, exploring the immunomodulatory role of vitamin D along with biomarker-based molecular techniques offers promising avenues for insight into anaemia's pathophysiology. The intricacy of aplastic anaemia makes it particularly deserving of concentrated molecular research. Given the complexity of anaemia, an integrated strategy combining clinical, laboratory, molecular, and AI techniques shows considerable promise, both for enhancing global anaemia management options and for advancing our understanding of the illness.
Affiliation(s)
- Divya Sankar
  - Department of Sciences, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Chennai, Tamil Nadu 601103, India
- Iyyappan Ramalakshmi Oviya
  - Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai, Tamil Nadu 601103, India

4. Arshad F, Ahmed S, Amjad A, Kabir M. An explainable stacking-based approach for accelerating the prediction of antidiabetic peptides. Anal Biochem 2024; 691:115546. [PMID: 38670418; DOI: 10.1016/j.ab.2024.115546] [Received: 01/13/2024] [Revised: 04/20/2024] [Accepted: 04/24/2024] [Indexed: 04/28/2024]
Abstract
Diabetes is a chronic disease characterized by high blood sugar levels and can have several harmful outcomes. Hyperglycemia, defined by persistently elevated blood sugar, is one of the primary concerns. People can improve their overall well-being and achieve optimal health outcomes by prioritizing diabetes control. Although the use of experimental approaches in diabetes treatment is cost-effective, it necessitates the development of many strategies for evaluating the efficacy of therapies. By enabling virtual screening, computational tools and procedures let researchers quickly create new strategies for managing diabetes and gain vital insights. In this study, we propose STADIP (STacking-based predictor for AntiDiabetic Peptides), a new method for predicting antidiabetic peptides (ADPs) using a stacking-based ensemble approach. It uses 12 different feature encodings and seven machine-learning techniques to construct 84 baseline models. The impacts of the various baseline models on ADP prediction were then thoroughly examined. A two-step feature selection method, eXtreme Gradient Boosting with Sequential Forward Selection (XGB-SFS), was employed to determine the optimal subset of the 84 PFs to enhance predictive performance. Subsequently, following the meta-predictor approach, the 45 selected PFs were integrated into an XGB classifier to form the final hybrid model. The proposed method demonstrated superior predictive capabilities compared to its constituent baseline models, as evidenced by evaluations on both cross-validation and independent tests. During extensive independent testing, STADIP achieved an accuracy of 0.954 and a Matthews correlation coefficient of 0.877. It is anticipated to be a useful tool for helping the scientific community identify new antidiabetic proteins.
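Two steps in this abstract lend themselves to a short illustration: pairing 12 encodings with 7 learners to get 84 baseline models, and greedily selecting a subset of their outputs. The sketch below is an assumption-laden illustration, not the STADIP implementation. The encoding and learner names are placeholders because the abstract does not list them, and the toy scoring function stands in for whatever cross-validated criterion XGB-SFS actually uses.

```python
from itertools import product

# Placeholder names: the abstract gives the counts (12 and 7) but not the members.
encodings = [f"encoding_{i}" for i in range(1, 13)]
learners = [f"learner_{j}" for j in range(1, 8)]
baseline_models = list(product(encodings, learners))
print(len(baseline_models))  # 12 * 7 pairs


def sequential_forward_selection(candidates, score, max_features):
    """Greedily add the candidate that most improves score(subset); stop when none helps."""
    selected, best_score = [], float("-inf")
    remaining = list(candidates)
    while remaining and len(selected) < max_features:
        trial_score, trial = max(
            ((score(selected + [c]), c) for c in remaining), key=lambda pair: pair[0]
        )
        if trial_score <= best_score:
            break  # no remaining candidate improves the score
        selected.append(trial)
        remaining.remove(trial)
        best_score = trial_score
    return selected, best_score


# Toy scoring function (hypothetical): summed usefulness minus a size penalty.
usefulness = {"a": 1.0, "b": 0.2, "c": 0.8, "d": 0.4}


def toy_score(subset):
    return sum(usefulness[name] for name in subset) - 0.5 * len(subset)


chosen, final_score = sequential_forward_selection(usefulness, toy_score, max_features=4)
print(chosen, round(final_score, 3))
```

In a real pipeline, `score` would typically train the meta-classifier on the candidate subset and return a cross-validated metric, which makes each step far more expensive than this toy version.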
Affiliation(s)
- Farwa Arshad
  - School of Systems and Technology, University of Management and Technology, Lahore, 54770, Pakistan
- Saeed Ahmed
  - School of Systems and Technology, University of Management and Technology, Lahore, 54770, Pakistan
- Aqsa Amjad
  - School of Systems and Technology, University of Management and Technology, Lahore, 54770, Pakistan
- Muhammad Kabir
  - School of Systems and Technology, University of Management and Technology, Lahore, 54770, Pakistan

5. Yang S, Xu P. HemoDL: Hemolytic peptides prediction by double ensemble engines from Rich sequence-derived and transformer-enhanced information. Anal Biochem 2024; 690:115523. [PMID: 38552762; DOI: 10.1016/j.ab.2024.115523] [Received: 11/02/2023] [Revised: 03/20/2024] [Accepted: 03/22/2024] [Indexed: 04/02/2024]
Abstract
Hemolytic peptides can trigger hemolysis by rupturing red blood cell membranes and causing cell disruption. Because in-lab identification is labor-intensive and time-consuming, accurate, high-throughput hemolytic peptide prediction is crucial given the growth of peptide sequence data in proteomics and peptidomics. In this study, we present HemoDL, an ensemble learning model that learns the distinct distribution of sequence characteristics to predict the hemolytic activity of peptides using a double LightGBM framework. To determine the most informative encoding features, we compare 17 widely used features across four benchmark datasets. Our investigation reveals that CTD, BPF, Charge, AAC, GDPC, ATC, QSO, and transformer-based features contribute most positively to detecting the hemolytic activity of peptides. Comparison with eight state-of-the-art methods demonstrates that HemoDL outperforms other models, with Matthews correlation coefficient gains on the four test datasets of 6.30%-16.04%, 6.63%-11.26%, 4.76%-9.92%, and 7.41%-15.03%, respectively. Additionally, we provide HemoDL with a user-friendly graphical interface, available at https://github.com/abcair/HemoDL. In summary, the HemoDL model, leveraging CTD, BPF, Charge, AAC, GDPC, ATC, QSO, and transformer-based encoding features within a double LightGBM learning framework, achieves high accuracy in predicting the hemolytic activity of peptides.
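Of the encodings listed above, amino acid composition (AAC) is the simplest: the fraction of each of the 20 standard residues in a peptide. The sketch below shows that standard definition only; it is not taken from the HemoDL repository, and the function name and example peptide are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def amino_acid_composition(peptide):
    """Return a 20-element vector of residue frequencies in AMINO_ACIDS order."""
    peptide = peptide.upper()
    if not peptide:
        raise ValueError("peptide must not be empty")
    counts = Counter(peptide)
    # Non-standard letters count toward the length but get no slot of their own.
    return [counts[aa] / len(peptide) for aa in AMINO_ACIDS]


# Hypothetical peptide for illustration.
vector = amino_acid_composition("AAKK")
print(dict(zip(AMINO_ACIDS, vector)))
```

The resulting fixed-length vector can be concatenated with other encodings before being passed to a classifier, which is how sequence-derived features are commonly combined.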
Affiliation(s)
- Sen Yang
  - School of Computer Science and Artificial Intelligence Aliyun School of Big Data School of Software, Changzhou University, Changzhou, 213164, China; The Affiliated Changzhou No.2 People's Hospital of Nanjing Medical University, Changzhou, 213164, China
- Piao Xu
  - College of Economics and Management, Nanjing Forestry University, China

6. Naseem A, Alturise F, Alkhalifah T, Khan YD. BBB-PEP-prediction: improved computational model for identification of blood-brain barrier peptides using blending position relative composition specific features and ensemble modeling. J Cheminform 2023; 15:110. [PMID: 37980534; PMCID: PMC10656963; DOI: 10.1186/s13321-023-00773-1] [Received: 07/07/2023] [Accepted: 10/21/2023] [Indexed: 11/20/2023] [Open access]
Abstract
Blood-brain barrier peptides (BBPs) have the potential to facilitate the delivery of drugs to the brain, opening up new avenues for the development of treatments targeting diseases of the central nervous system (CNS). The main obstacle in treating CNS disorders is getting pharmaceutical agents across the blood-brain barrier (BBB). Nearly 98% of small molecule-based drugs and nearly 100% of large molecule-based drugs encounter difficulties in penetrating the BBB. Identifying these peptides is therefore important and can benefit healthcare systems. In this study, we propose an improved intelligent computational model, BBB-PEP-Prediction, for the identification of BBB peptides. Position-based and statistical-moment-based features were computed for the acquired benchmark dataset. Four types of ensembles were used: bagging, boosting, stacking, and blending. Bagging employed Random Forest (RF) and Extra Trees (ET); boosting used XGBoost (XGB) and Light Gradient Boosting Machine (LGBM). Stacking used ET and XGB as base learners and blending used LGBM and RF as base learners, while Logistic Regression (LR) was applied as the meta learner for both stacking and blending. Three classifiers (LGBM, XGB, and ET) were optimized using randomized search CV. Four types of testing were employed: self-consistency, independent set, cross-validation with 5 and 10 folds, and the jackknife test. The evaluation metrics were accuracy (ACC), specificity (SPE), sensitivity (SEN), and Matthews correlation coefficient (MCC). Stacking showed the best results in almost every test. For independent set testing, stacking achieved accuracy, specificity, sensitivity, and MCC scores of 0.824, 0.911, 0.831, and 0.663, respectively. The proposed model, BBB-PEP-Prediction, showed superior performance compared to previous benchmark studies. The proposed system should help the research community with future in-silico identification of BBB peptides.
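The statistical-moment features mentioned in this abstract can be illustrated with raw and central moments of a numerically encoded sequence. This is a simplified sketch under stated assumptions, not the authors' feature pipeline: the encoding (1-based index of each residue in an alphabetical list of one-letter codes), the moment orders, and all function names are hypothetical choices.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def encode(peptide):
    """Map each residue to its 1-based index in AMINO_ACIDS (an assumed encoding)."""
    return [AMINO_ACIDS.index(residue) + 1 for residue in peptide.upper()]


def raw_moment(values, order):
    """Raw moment of the given order: the mean of v ** order."""
    return sum(v ** order for v in values) / len(values)


def central_moment(values, order):
    """Central moment of the given order: the mean of (v - mean) ** order."""
    mean = raw_moment(values, 1)
    return sum((v - mean) ** order for v in values) / len(values)


def moment_features(peptide, max_order=3):
    """Concatenate raw moments (orders 1..max_order) and central moments (orders 2..max_order)."""
    values = encode(peptide)
    raw = [raw_moment(values, k) for k in range(1, max_order + 1)]
    central = [central_moment(values, k) for k in range(2, max_order + 1)]
    return raw + central


# Hypothetical peptide for illustration.
print(moment_features("ACD"))
```

The first central moment is omitted because it is always zero; the second is the population variance of the encoded sequence.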
Affiliation(s)
- Ansar Naseem
  - Department of Artificial Intelligence, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Fahad Alturise
  - Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Saudi Arabia
- Tamim Alkhalifah
  - Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Saudi Arabia
- Yaser Daanial Khan
  - Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan