1
Karim A, Alkhalifah T, Alturise F, Khan YD. PADG-Pred: Exploring Ensemble Approaches for Identifying Parkinson's Disease Associated Biomarkers Using Genomic Sequences Analysis. IET Syst Biol 2025; 19:e70006. [PMID: 40088455 PMCID: PMC11910177 DOI: 10.1049/syb2.70006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/18/2024] [Revised: 01/04/2025] [Accepted: 02/12/2025] [Indexed: 03/17/2025]
Abstract
Parkinson's disease (PD), a degenerative disorder affecting the nervous system, manifests as unbalanced movements, stiffness, tremors, and coordination difficulties. Its cause is believed to involve genetic and environmental factors, which underscores the critical need for prompt diagnosis and intervention to enhance treatment effectiveness. Despite the array of available diagnostics, their reliability remains a challenge. In this study, a novel predictor, PADG-Pred, is proposed. It not only identifies Parkinson's associated biomarkers through genomic profiling but also integrates multiple statistical feature extraction techniques with ensemble-based classification frameworks, thereby providing a more robust and interpretable decision-making process than existing tools. Features were extracted from the processed dataset through multiple statistical moments and then used to train the model extensively with diverse ensemble classification techniques: XGBoost, Random Forest, Light Gradient Boosting Machine, Bagging, ExtraTrees, and Stacking. State-of-the-art validation procedures are applied, assessing key metrics such as specificity, accuracy, sensitivity/recall, and the Matthews correlation coefficient. The outcomes demonstrate the outstanding performance of PADG-RF, with accuracy consistently reaching ∼91% on the independent set, ∼94% in 5-fold cross-validation, and ∼96% in 10-fold cross-validation.
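The metrics named in this abstract (accuracy, sensitivity/recall, specificity, and the Matthews correlation coefficient) all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions; the function name `binary_metrics` and the toy labels are illustrative assumptions and are not taken from the paper.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute accuracy, sensitivity, specificity, and MCC for 0/1 labels.

    Hypothetical helper for illustration; label 1 is treated as positive.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives

    # MCC is conventionally set to 0 when any marginal count is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Toy labels (made up): 3 true positives, 3 true negatives, 1 FP, 1 FN.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy on imbalanced datasets.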
Affiliation(s)
- Ayesha Karim
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Tamim Alkhalifah
- Department of Computer Engineering, College of Computer, Qassim University, Buraydah, Saudi Arabia
- Fahad Alturise
- Department of Cybersecurity, College of Computer, Qassim University, Buraydah, Saudi Arabia
- Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
2
Karim A, Alromema N, Malebary SJ, Binzagr F, Ahmed A, Khan YD. eNSMBL-PASD: Spearheading early autism spectrum disorder detection through advanced genomic computational frameworks utilizing ensemble learning models. Digit Health 2025; 11:20552076241313407. [PMID: 39872002 PMCID: PMC11770729 DOI: 10.1177/20552076241313407] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Accepted: 12/18/2024] [Indexed: 01/29/2025]
Abstract
Objective Autism spectrum disorder (ASD) is a complex neurodevelopmental condition influenced by various genetic and environmental factors. Currently, there is no definitive clinical test, such as a blood analysis or brain scan, for early diagnosis. The objective of this study is to develop a computational model that predicts ASD driver genes in the early stages using genomic data, aiming to enhance early diagnosis and intervention. Methods This study utilized a benchmark genomic dataset, which was processed using feature extraction techniques to identify relevant genetic patterns. Several ensemble classification methods, including Extreme Gradient Boosting, Random Forest, Light Gradient Boosting Machine, ExtraTrees, and a stacked ensemble of classifiers, were applied to assess the predictive power of the genomic features. The Ensemble Model Predictor for Autism Spectrum Disorder (eNSMBL-PASD) model was rigorously validated using multiple performance metrics such as accuracy, sensitivity, specificity, and the Matthews correlation coefficient. Results The proposed model demonstrated superior performance across various validation techniques. The self-consistency test achieved 100% accuracy, while the independent set and cross-validation tests yielded 91% and 87% accuracy, respectively. These results highlight the model's robustness and reliability in predicting ASD-related genes. Conclusion The eNSMBL-PASD model provides a promising tool for the early detection of ASD by identifying genetic markers associated with the disorder. In the future, this model has the potential to assist healthcare professionals, particularly doctors and psychologists, in diagnosing and formulating treatment plans for ASD at its earliest stages.
Affiliation(s)
- Ayesha Karim
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Nashwan Alromema
- Department of Computer Science, Faculty of Computing and Information Technology-Rabigh, King AbdulAziz University, Jeddah, Saudi Arabia
- Sharaf J Malebary
- Department of Information Technology, Faculty of Computing and Information Technology, King AbdulAziz University, Rabigh, Saudi Arabia
- Faisal Binzagr
- Department of Computer Science, Faculty of Computing and Information Technology-Rabigh, King AbdulAziz University, Jeddah, Saudi Arabia
- Amir Ahmed
- College of Information Technology, Information Systems and Security, United Arab Emirates University, Alain, United Arab Emirates
- Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
3
Malebary SJ, Alromema N, Suleman MT, Saleem M. m5c-iDeep: 5-Methylcytosine sites identification through deep learning. Methods 2024; 230:80-90. [PMID: 39089345 DOI: 10.1016/j.ymeth.2024.07.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/09/2024] [Revised: 07/16/2024] [Accepted: 07/23/2024] [Indexed: 08/03/2024]
Abstract
5-Methylcytosine (m5c) is a modified cytosine base formed by the addition of a methyl group at the carbon-5 position. This modification is one of the most common PTMs and occurs in almost all types of RNA. Conventional laboratory methods do not provide quick, reliable identification of m5c sites. However, the availability of sequence data has made it feasible to develop computationally intelligent models that optimize the identification process for accuracy and robustness. The present research focused on the development of in-silico methods built using deep learning models. The encoded data were fed into deep learning models, which included gated recurrent unit (GRU), long short-term memory (LSTM), and bi-directional LSTM (Bi-LSTM). The models were then subjected to a rigorous evaluation process that included both independent set testing and 10-fold cross-validation. The results revealed that the LSTM-based model, m5c-iDeep, outperformed existing m5c predictors, achieving 99.9% accuracy. To facilitate researchers, m5c-iDeep was also deployed on a web-based server, which is accessible at https://taseersuleman-m5c-ideep-m5c-ideep.streamlit.app/.
Affiliation(s)
- Sharaf J Malebary
- Department of Information Technology, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, P.O. Box 344, Rabigh 21911, Saudi Arabia
- Nashwan Alromema
- Department of Computer Science, Faculty of Computing and Information Technology-Rabigh, King Abdulaziz University, P.O. Box 344, Rabigh 21911, Saudi Arabia
- Muhammad Taseer Suleman
- Department of Criminology and Forensic Sciences, Lahore Garrison University, Lahore, Pakistan; Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore 54770, Pakistan
- Maham Saleem
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore 54770, Pakistan
4
Naseem A, Khan YD. An intelligent model for prediction of abiotic stress-responsive microRNAs in plants using statistical moments based features and ensemble approaches. Methods 2024; 228:65-79. [PMID: 38768931 DOI: 10.1016/j.ymeth.2024.05.008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Revised: 04/30/2024] [Accepted: 05/10/2024] [Indexed: 05/22/2024]
Abstract
This study proposed an intelligent model for predicting abiotic stress-responsive microRNAs in plants. MicroRNAs (miRNAs) are short RNA molecules that regulate stress responses in genes. Experimental methods are costly and time-consuming compared with in-silico prediction. Addressing this gap, the study seeks to develop an efficient computational model for plant stress response prediction. Two benchmark datasets, one for miRNA and one for pre-miRNA, were acquired for this study. Four ensemble approaches were employed: bagging, boosting, stacking, and blending. The classifiers used were Random Forest (RF), Extra Trees (ET), Ada Boost (ADB), Light Gradient Boosting Machine (LGBM), and Support Vector Machine (SVM). Stacking and blending employed all of the stated classifiers as base learners and Logistic Regression (LR) as the meta-classifier. Four types of testing were used: independent set, self-consistency, cross-validation with 5 and 10 folds, and jackknife. The evaluation metrics were accuracy score, specificity, sensitivity, the Matthews correlation coefficient (MCC), and AUC. The proposed methodology outperformed the existing state-of-the-art study on both datasets in independent set testing. The SVM-based approach achieved an accuracy score of 0.659 on the miRNA dataset, which is better than the previous study. On the pre-miRNA dataset, the ET classifier surpassed the accuracy of the existing benchmark study, achieving a score of 0.67. The proposed method can be used in future research to predict abiotic stresses in plants.
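The stacking arrangement described in this abstract, with several base learners feeding a Logistic Regression meta-classifier, can be sketched with scikit-learn's `StackingClassifier`. This is an illustrative sketch only and not the authors' code: the data are synthetic (`make_classification`), the hyperparameters are arbitrary, and LGBM is omitted so the example depends on scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a feature matrix; NOT the paper's miRNA data.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Hold out an independent set, mirroring "independent set testing".
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Base learners named in the abstract (LGBM left out; see lead-in).
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ("adb", AdaBoostClassifier(n_estimators=50, random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC(random_state=0))),
]

# The meta-classifier is trained on out-of-fold predictions of the base
# learners (cv=5), which limits leakage from the training labels.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)

pred = stack.predict(X_test)
independent_accuracy = stack.score(X_test, y_test)
```

Blending differs from stacking mainly in that the meta-classifier is fitted on predictions for a single held-out validation split rather than on out-of-fold predictions.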
Affiliation(s)
- Ansar Naseem
- Department of Artificial Intelligence, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
5
Suleman MT, Alturise F, Alkhalifah T, Khan YD. m1A-Ensem: accurate identification of 1-methyladenosine sites through ensemble models. BioData Min 2024; 17:4. [PMID: 38360720 PMCID: PMC10868122 DOI: 10.1186/s13040-023-00353-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Accepted: 12/31/2023] [Indexed: 02/17/2024]
Abstract
BACKGROUND 1-methyladenosine (m1A) is a variant of methyladenosine that holds a methyl substituent at the 1st position and has a prominent role in RNA stability and human metabolites. OBJECTIVE Traditional approaches, such as mass spectrometry and site-directed mutagenesis, proved to be time-consuming and complicated. METHODOLOGY The present research focused on the identification of m1A sites within RNA sequences using novel feature development mechanisms. The obtained features were used to train ensemble models, including blending, boosting, and bagging. Independent testing and k-fold cross-validation were then performed on the trained ensemble models. RESULTS The proposed model outperformed the preexisting predictors and revealed optimized scores based on major accuracy metrics. CONCLUSION For research purposes, a user-friendly webserver of the proposed model can be accessed at https://taseersuleman-m1a-ensem1.streamlit.app/.
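The k-fold cross-validation mentioned in this abstract partitions the samples into k disjoint folds, and each fold serves once as the test set while the remaining folds form the training set. Below is a minimal pure-Python sketch of the index bookkeeping; the function name `k_fold_indices` is hypothetical, and the folds are contiguous with no shuffling, which is a simplifying assumption rather than anything stated in the paper.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, test_indices) pairs for k folds.

    Hypothetical helper: folds are contiguous and unshuffled. Fold sizes
    differ by at most one when n_samples is not divisible by k.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")

    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample each.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, test))
        start = stop
    return folds


# 10 samples split into 5 folds of 2 test samples each.
five_fold = k_fold_indices(10, 5)

# Setting k equal to the sample count gives leave-one-out splits.
leave_one_out = k_fold_indices(10, 10)
```

In practice the sample order is usually shuffled (and often stratified by class) before the folds are cut, so that each fold reflects the overall class balance.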
Affiliation(s)
- Muhammad Taseer Suleman
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, 54770, Pakistan
- Fahad Alturise
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Tamim Alkhalifah
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, 54770, Pakistan
6
Deep-Learning-Based Automated Identification and Visualization of Oral Cancer in Optical Coherence Tomography Images. Biomedicines 2023; 11:biomedicines11030802. [PMID: 36979780 PMCID: PMC10044902 DOI: 10.3390/biomedicines11030802] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/18/2023] [Revised: 02/15/2023] [Accepted: 03/04/2023] [Indexed: 03/09/2023]
Abstract
Early detection and diagnosis of oral cancer are critical for a better prognosis, but accurate and automatic identification is difficult using the available technologies. Optical coherence tomography (OCT) can be used as a diagnostic aid because of its high resolution and non-invasiveness. We aim to evaluate deep-learning-based algorithms for OCT images to assist clinicians in oral cancer screening and diagnosis. An OCT data set was first established, including normal mucosa, precancerous lesion, and oral squamous cell carcinoma. Then, three kinds of convolutional neural networks (CNNs) were trained and evaluated using four metrics (accuracy, precision, sensitivity, and specificity). Moreover, the CNN-based methods were compared against machine learning approaches on the same dataset. The results show that the CNNs, with a classification accuracy of up to 96.76%, perform better than the machine-learning-based method, which reached an accuracy of 92.52%. In addition, lesions in OCT images were visualized, and the rationality and interpretability of the model for distinguishing different oral tissues were evaluated. The results indicate that the automatic identification algorithm for OCT images based on deep learning has the potential to provide decision support for the effective screening and diagnosis of oral cancer.
7
Suleman MT, Alturise F, Alkhalifah T, Khan YD. iDHU-Ensem: Identification of dihydrouridine sites through ensemble learning models. Digit Health 2023; 9:20552076231165963. [PMID: 37009307 PMCID: PMC10064468 DOI: 10.1177/20552076231165963] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/29/2022] [Accepted: 03/09/2023] [Indexed: 04/04/2023]
Abstract
Background Dihydrouridine (D) is one of the most significant uridine modifications that have a prominent occurrence in eukaryotes. The folding and conformational flexibility of transfer RNA (tRNA) can be attained through this modification. Objective The modification also triggers lung cancer in humans. The identification of D sites was carried out through conventional laboratory methods; however, those were costly and time-consuming. The readiness of RNA sequences helps in the identification of D sites through computationally intelligent models. However, the most challenging part is turning these biological sequences into distinct vectors. Methods The current research proposed novel feature extraction mechanisms and the identification of D sites in tRNA sequences using ensemble models. The ensemble models were then subjected to evaluation using k-fold cross-validation and independent testing. Results The results revealed that the stacking ensemble model outperformed all the ensemble models by revealing 0.98 accuracy, 0.98 specificity, 0.97 sensitivity, and 0.92 Matthews Correlation Coefficient. The proposed model, iDHU-Ensem, was also compared with pre-existing predictors using an independent test. The accuracy scores have shown that the proposed model in this research study performed better than the available predictors. Conclusion The current research contributed towards the enhancement of D site identification capabilities through computationally intelligent methods. A web-based server, iDHU-Ensem, was also made available for the researchers at https://taseersuleman-idhu-ensem-idhu-ensem.streamlit.app/.
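The abstract notes that the hardest step is turning biological sequences into fixed-length vectors. The paper's own feature extraction is not described here, so the sketch below substitutes a generic and widely used alternative, normalized k-mer frequencies, purely to illustrate the idea. The function name `kmer_frequency_vector` and the example sequence are hypothetical.

```python
from itertools import product


def kmer_frequency_vector(sequence, k=2, alphabet="ACGU"):
    """Map an RNA sequence to a fixed-length vector of k-mer frequencies.

    Generic illustration, NOT the feature scheme used by iDHU-Ensem.
    The vector has len(alphabet) ** k entries in lexicographic k-mer
    order; windows containing characters outside the alphabet are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)

    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1

    # Normalize so sequences of different lengths are comparable.
    return [counts[kmer] / total if total else 0.0 for kmer in kmers]


# Made-up 8-nucleotide sequence; 2-mers give a 16-dimensional vector.
vector = kmer_frequency_vector("GGAUUCGA", k=2)
```

Whatever the sequence length, the output dimension depends only on the alphabet size and k, which is what lets sequences of different lengths be fed to the same classifier.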
Affiliation(s)
- Muhammad Taseer Suleman
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan
- Fahad Alturise
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Tamim Alkhalifah
- Department of Computer, College of Science and Arts in Ar Rass, Qassim University, Ar Rass, Qassim, Saudi Arabia
- Yaser Daanial Khan
- Department of Computer Science, School of Systems and Technology, University of Management and Technology, Lahore, Pakistan