1
Khan S, Uddin I, Noor S, AlQahtani SA, Ahmad N. N6-methyladenine identification using deep learning and discriminative feature integration. BMC Med Genomics 2025; 18:58. [PMID: 40158097 PMCID: PMC11955129 DOI: 10.1186/s12920-025-02131-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2024] [Accepted: 03/20/2025] [Indexed: 04/01/2025] Open
Abstract
N6-methyladenine (6mA) is a pivotal DNA modification that plays a crucial role in epigenetic regulation, gene expression, and various biological processes. With advancements in sequencing technologies and computational biology, there is an increasing focus on developing accurate methods for 6mA site identification to enhance early detection and understand its biological significance. Despite the rapid progress of machine learning in bioinformatics, accurately detecting 6mA sites remains a challenge due to the limited generalizability and efficiency of existing approaches. In this study, we present Deep-N6mA, a novel Deep Neural Network (DNN) model incorporating optimal hybrid features for precise 6mA site identification. The proposed framework captures complex patterns from DNA sequences through a comprehensive feature extraction process, leveraging k-mer, Dinucleotide-based Cross Covariance (DCC), Trinucleotide-based Auto Covariance (TAC), Pseudo Single Nucleotide Composition (PseSNC), Pseudo Dinucleotide Composition (PseDNC), and Pseudo Trinucleotide Composition (PseTNC). To optimize computational efficiency and eliminate irrelevant or noisy features, an unsupervised Principal Component Analysis (PCA) algorithm is employed, ensuring the selection of the most informative features. A multilayer DNN serves as the classification algorithm to identify N6-methyladenine sites accurately. The robustness and generalizability of Deep-N6mA were rigorously validated using fivefold cross-validation on two benchmark datasets. Experimental results reveal that Deep-N6mA achieves an average accuracy of 97.70% on the F. vesca dataset and 95.75% on the R. chinensis dataset, outperforming existing methods by 4.12% and 4.55%, respectively. These findings underscore the effectiveness of Deep-N6mA as a reliable tool for early 6mA site detection, contributing to epigenetic research and advancing the field of computational biology.
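The k-mer descriptor named in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `kmer_frequencies`, the choice of k = 2, and the toy sequence are assumptions made only for illustration, and the other descriptors (DCC, TAC, PseSNC, PseDNC, PseTNC) are not covered.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=2, alphabet="ACGT"):
    """Return normalized k-mer frequencies, ordered lexicographically over the alphabet.

    Hypothetical helper for illustration; not taken from the Deep-N6mA paper.
    """
    seq = seq.upper()
    # Slide a window of width k across the sequence.
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = len(windows)
    # Enumerate every possible k-mer so the vector has a fixed length of 4**k.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    return [counts[m] / total if total else 0.0 for m in kmers]


# Toy sequence (an assumption, not data from the paper).
features = kmer_frequencies("ACGTAC", k=2)
print(len(features))
```

A fixed-length vector like this is what makes sequences of different lengths comparable before any dimensionality reduction or classification step.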
Affiliation(s)
- Salman Khan
  - Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
- Islam Uddin
  - Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
- Sumaiya Noor
  - Business and Management Sciences Department, Purdue University, West Lafayette, IN, USA
- Salman A AlQahtani
  - Department of Computer Engineering, New Emerging Technologies and 5G Network and Beyond Research Chair, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Nijad Ahmad
  - Department of Computer Science, Khurasan University, Jalalabad, Afghanistan
2
Khan S, Noor S, Awan HH, Iqbal S, AlQahtani SA, Dilshad N, Ahmad N. Deep-ProBind: binding protein prediction with transformer-based deep learning model. BMC Bioinformatics 2025; 26:88. [PMID: 40121399 PMCID: PMC11929993 DOI: 10.1186/s12859-025-06101-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2024] [Accepted: 03/04/2025] [Indexed: 03/25/2025] Open
Abstract
Binding proteins play a crucial role in biological systems by selectively interacting with specific molecules, such as DNA, RNA, or peptides, to regulate various cellular processes. Their ability to recognize and bind target molecules with high specificity makes them essential for signal transduction, transport, and enzymatic activity. Traditional experimental methods for identifying protein-binding peptides are costly and time-consuming. Current sequence-based approaches often struggle with accuracy, focusing too narrowly on proximal sequence features and ignoring structural data. This study presents Deep-ProBind, a powerful prediction model designed to classify protein binding sites by integrating sequence and structural information. The proposed model employs a transformer and evolutionary-based attention mechanism, i.e., Bidirectional Encoder Representations from Transformers (BERT) and the Pseudo Position-Specific Scoring Matrix-Discrete Wavelet Transform (PsePSSM-DWT) approach, to encode peptides. The SHapley Additive exPlanations (SHAP) algorithm selects the optimal hybrid features, and a Deep Neural Network (DNN) is then used as the classification algorithm to predict protein-binding peptides. The performance of the proposed model was evaluated in comparison with traditional Machine Learning (ML) algorithms and existing models. Experimental results demonstrate that Deep-ProBind achieved 92.67% accuracy with tenfold cross-validation on benchmark datasets and 93.62% accuracy on independent samples. Deep-ProBind outperforms existing models by 3.57% on training data and 1.52% on independent tests. These results demonstrate Deep-ProBind's reliability and effectiveness, making it a valuable tool for researchers and a potential resource in pharmacological studies, where peptide binding plays a critical role in therapeutic development.
Affiliation(s)
- Salman Khan
  - Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, KPK, Pakistan
- Sumaiya Noor
  - Business and Management Sciences Department, Purdue University, West Lafayette, IN, USA
- Hamid Hussain Awan
  - Department of Computer Science, Rawalpindi Women University, Rawalpindi, 46300, Punjab, Pakistan
- Shehryar Iqbal
  - School of Physics, Engineering and Computer Science, University of Hertfordshire, Hatfield, UK
- Salman A AlQahtani
  - New Emerging Technologies and 5G Network and Beyond Research Chair, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Naqqash Dilshad
  - Department of Computer Science & Engineering, Sejong University, Seoul, 05006, South Korea
- Nijad Ahmad
  - Department of Computer Science, Khurasan University, Jalalabad, Afghanistan
3
Khan S, Noor S, Javed T, Naseem A, Aslam F, AlQahtani SA, Ahmad N. XGBoost-enhanced ensemble model using discriminative hybrid features for the prediction of sumoylation sites. BioData Min 2025; 18:12. [PMID: 39901279 PMCID: PMC11792219 DOI: 10.1186/s13040-024-00415-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/13/2024] [Accepted: 12/10/2024] [Indexed: 02/05/2025] Open
Abstract
Posttranslational modifications (PTMs) are essential for regulating protein localization and stability, significantly affecting gene expression, biological functions, and genome replication. Among these, sumoylation, a PTM that attaches a chemical group to protein sequences, plays a critical role in protein function. Identifying sumoylation sites is particularly important due to their links to Parkinson's and Alzheimer's. This study introduces XGBoost-Sumo, a robust model to predict sumoylation sites by integrating protein structure and sequence data. The model utilizes a transformer-based attention mechanism to encode peptides and extracts evolutionary features through the PsePSSM-DWT approach. After fusing word embeddings with evolutionary descriptors, it applies the SHapley Additive exPlanations (SHAP) algorithm for optimal feature selection and uses eXtreme Gradient Boosting (XGBoost) for classification. XGBoost-Sumo achieved an accuracy of 99.68% on benchmark datasets using 10-fold cross-validation and 96.08% on independent samples, outperforming existing models by 10.31% on training data and 2.74% on independent tests. The model's reliability and high performance make it a valuable resource for researchers, with strong potential for applications in pharmaceutical development.
Affiliation(s)
- Salman Khan
  - New Emerging Technologies and 5G Network and Beyond Research Chair, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Sumaiya Noor
  - Business and Management Sciences Department, Purdue University, West Lafayette, IN, USA
- Tahir Javed
  - Department of Computer Science, Allama Iqbal Open University, Islamabad, Pakistan
- Afshan Naseem
  - Institute of Oceanography and Environment (INOS), Universiti Malaysia Terengganu, Kuala Nerus, Terengganu, 21030, Malaysia
- Fahad Aslam
  - Institute of Oceanography and Environment (INOS), Universiti Malaysia Terengganu, Kuala Nerus, Terengganu, 21030, Malaysia
- Salman A AlQahtani
  - New Emerging Technologies and 5G Network and Beyond Research Chair, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Nijad Ahmad
  - Department of Computer Science, Khurasan University Jalalabad, Jalalabad, Afghanistan
4
Noor S, Naseem A, Awan HH, Aslam W, Khan S, AlQahtani SA, Ahmad N. Deep-m5U: a deep learning-based approach for RNA 5-methyluridine modification prediction using optimized feature integration. BMC Bioinformatics 2024; 25:360. [PMID: 39563239 DOI: 10.1186/s12859-024-05978-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2024] [Accepted: 11/06/2024] [Indexed: 11/21/2024] Open
Abstract
BACKGROUND: RNA 5-methyluridine (m5U) modifications play a crucial role in biological processes, making their accurate identification a key focus in computational biology. This paper introduces Deep-m5U, a robust predictor designed to enhance the prediction of m5U modifications. Deep-m5U uses a hybrid pseudo-K-tuple nucleotide composition (PseKNC) for sequence formulation, a SHapley Additive exPlanations (SHAP) algorithm for discriminant feature selection, and a deep neural network (DNN) as the classifier. RESULTS: The model was evaluated using two benchmark datasets, i.e., Full Transcript and Mature mRNA. Deep-m5U achieved overall accuracies of 91.47% and 95.86% for the Full Transcript and Mature mRNA datasets with 10-fold cross-validation, and for independent samples, the model attained 92.94% and 95.17% accuracy. CONCLUSION: Compared to existing models, Deep-m5U showed approximately 5.23% and 3.73% higher accuracy on the training data and 3.95% and 3.26% higher accuracy on independent samples for the Full Transcript and Mature mRNA datasets, respectively. The reliability and effectiveness of Deep-m5U make it a valuable tool for scientists and a potential asset in pharmaceutical design and research.
Affiliation(s)
- Sumaiya Noor
  - Business and Management Sciences Department, Purdue University, West Lafayette, IN, USA
- Afshan Naseem
  - Institute of Oceanography and Environment (INOS), Universiti Malaysia Terengganu, 21030, Kuala Nerus, Terengganu, Malaysia
- Hamid Hussain Awan
  - Department of Computer Science, Muslim Youth University, Islamabad, Pakistan
- Wasiq Aslam
  - Department of Computer Science, Muslim Youth University, Islamabad, Pakistan
- Salman Khan
  - New Emerging Technologies and 5G Network and Beyond Research Chair, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Salman A AlQahtani
  - New Emerging Technologies and 5G Network and Beyond Research Chair, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
- Nijad Ahmad
  - Department of Computer Science, Khurasan University, Jalalabad, Afghanistan
5
Uddin I, Awan HH, Khalid M, Khan S, Akbar S, Sarker MR, Abdolrasol MGM, Alghamdi TAH. A hybrid residue based sequential encoding mechanism with XGBoost improved ensemble model for identifying 5-hydroxymethylcytosine modifications. Sci Rep 2024; 14:20819. [PMID: 39242695 PMCID: PMC11379919 DOI: 10.1038/s41598-024-71568-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/06/2024] [Accepted: 08/29/2024] [Indexed: 09/09/2024] Open
Abstract
RNA modifications play an important role in actively controlling recently created formation in cellular regulation mechanisms, which link them to gene expression and protein. The RNA modifications have numerous alterations, presenting broad glimpses of RNA's operations and character. The modification process by the TET enzyme oxidation is the crucial change associated with cytosine hydroxymethylation. The effect of CR is an alteration in specific biochemical ways of the organism, such as gene expression and epigenetic alterations. Traditional laboratory systems that identify 5-hydroxymethylcytosine (5hmC) samples are expensive and time-consuming compared to other methods. To address this challenge, the paper proposes XGB5hmC, a machine learning model based on a robust gradient boosting algorithm (XGBoost), with different residue-based formulation methods to identify 5hmC samples. Their results were amalgamated, and six different frequency residue-based encoding features were fused to form a hybrid vector in order to enhance model discrimination capabilities. In addition, the proposed model incorporates SHAP (SHapley Additive exPlanations) based feature selection to demonstrate model interpretability by highlighting the high contributory features. Among the applied machine learning algorithms, the XGBoost ensemble model using the tenfold cross-validation test achieved better results than existing state-of-the-art models. The model reported an accuracy of 89.97%, sensitivity of 87.78%, specificity of 94.45%, F1-score of 0.8934, and MCC of 0.8764. This study highlights the potential to provide valuable insights for enhancing medical assessment and treatment protocols, representing a significant advancement in RNA modification analysis.
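For readers comparing the metrics listed in this abstract, the sketch below shows how accuracy, sensitivity, specificity, F1-score, and the Matthews correlation coefficient (MCC) are conventionally derived from a binary confusion matrix. The function name and the counts are hypothetical and are not taken from the paper. Note that F1 lies in [0, 1] and MCC in [-1, 1], so both are normally reported as plain fractions rather than percentages.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    Assumes tp + fn > 0, tn + fp > 0 and tp + fp > 0 (no empty class or prediction set).
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)          # recall on the positive class
    specificity = tn / (tn + fp)          # recall on the negative class
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to exercise the formulas.
m = binary_metrics(tp=90, fp=10, tn=85, fn=15)
print(m)
```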
Affiliation(s)
- Islam Uddin
  - Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
- Hamid Hussain Awan
  - Department of Computer Science, Muslim Youth University, Islamabad, Pakistan
- Majdi Khalid
  - Department of Computer Science and Artificial Intelligence, College of Computing, Umm Al-Qura University, Makkah, 21955, Saudi Arabia
- Salman Khan
  - Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
- Shahid Akbar
  - Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Mahidur R Sarker
  - Institute of Visual Informatics, Universiti Kebangsaan Malaysia, Bangi, 43600, Selangor, Malaysia
  - Universidad de Diseño, Innovación y Tecnología, UDIT, Av. Alfonso XIII, 97, 28016, Madrid, Spain
- Maher G M Abdolrasol
  - Institute of Sustainable Energy, Universiti Tenaga Nasional, Kajang, 43000, Malaysia
- Thamer A H Alghamdi
  - Wolfson Centre for Magnetics, School of Engineering, Cardiff University, Cardiff, CF24 3AA, UK
  - Electrical Engineering Department, Faculty of Engineering, Al-Baha University, Al-Baha, 65779, Saudi Arabia
6
Jia Y, Yu Z, Hong Z. Semantic aware-based instruction embedding for binary code similarity detection. PLoS One 2024; 19:e0305299. [PMID: 38861533 PMCID: PMC11166306 DOI: 10.1371/journal.pone.0305299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2024] [Accepted: 05/27/2024] [Indexed: 06/13/2024] Open
Abstract
Binary code similarity detection plays a crucial role in various applications within binary security, including vulnerability detection and malicious software analysis. However, existing methods suffer from limited differentiation in binary embedding representations across different compilation environments and lack dynamic high-level semantics. Moreover, current approaches often neglect multi-level semantic feature extraction, thereby failing to acquire precise semantic information about the binary code. To address these limitations, this paper introduces a novel detection solution called BinBcla. This method employs an enhanced pre-training model to generate instruction embeddings with dynamic semantics for binary functions. Subsequently, a multi-feature fusion technique is used to extract local semantic information and long-distance global features from the code, employing self-attention to comprehend the structural information of the code. Finally, an improved cosine similarity method is employed to learn relationships among all elements of the distance vectors, thereby enhancing the model's robustness to new sample functions. Experiments are conducted across different architectures, compilers, and optimization levels. The results indicate that BinBcla achieves higher accuracy, precision, and F1 score compared to existing methods.
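As background for the similarity step mentioned in this abstract, the following is a minimal sketch of plain cosine similarity between two embedding vectors. The abstract does not specify how the "improved" variant differs, so this shows only the standard baseline; the function name and the toy vectors are assumptions.

```python
import math


def cosine_similarity(u, v):
    """Standard cosine similarity between two equal-length vectors.

    Returns 0.0 when either vector has zero norm (a convention chosen here).
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy embeddings (hypothetical): parallel vectors score 1, orthogonal vectors score 0.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```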
Affiliation(s)
- Yuhao Jia
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, Zhejiang, China
- Zhicheng Yu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, Zhejiang, China
- Zhen Hong
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, Zhejiang, China
7
Vijayvargiya A, Sinha A, Gehlot N, Jena A, Kumar R, Moran K. S-WD-EEMD: A hybrid framework for imbalanced sEMG signal analysis in diagnosis of human knee abnormality. PLoS One 2024; 19:e0301263. [PMID: 38820390 PMCID: PMC11142505 DOI: 10.1371/journal.pone.0301263] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/04/2024] [Accepted: 03/13/2024] [Indexed: 06/02/2024] Open
Abstract
The diagnosis of human knee abnormalities using the surface electromyography (sEMG) signal obtained from lower limb muscles with machine learning is a major problem due to the noisy nature of the sEMG signal and the imbalance in data corresponding to healthy and knee abnormal subjects. To address this challenge, a combination of wavelet decomposition (WD) with ensemble empirical mode decomposition (EEMD) and the Synthetic Minority Oversampling Technique (S-WD-EEMD) is proposed. In this study, a hybrid WD-EEMD is considered for the minimization of noises produced in the sEMG signal during the collection, while the Synthetic Minority Oversampling Technique (SMOTE) is considered to balance the data by increasing the minority class samples during the training of machine learning techniques. The findings indicate that the hybrid WD-EEMD with SMOTE oversampling technique enhances the efficacy of the examined classifiers when employed on the imbalanced sEMG data. The F-Score of the Extra Tree Classifier, when utilizing WD-EEMD signal processing with SMOTE oversampling, is 98.4%, whereas, without the SMOTE oversampling technique, it is 95.1%.
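The oversampling idea behind SMOTE, as referred to in this abstract, can be sketched as follows: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours. This is a simplified illustration using only the standard library, not the authors' pipeline; the function name `smote_sketch`, its parameters, and the toy points are assumptions.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic minority samples by neighbour interpolation.

    Simplified sketch: requires at least two minority samples and uses
    squared Euclidean distance to pick the k nearest neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        # k nearest minority neighbours of the chosen base sample.
        neighbours = sorted(
            others, key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p))
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class (hypothetical): the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority, n_new=5)
print(new_points)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by that class, which is why oversampling of this kind is normally applied to the training split only.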
Affiliation(s)
- Ankit Vijayvargiya
  - Insight Science Foundation Ireland Research Centre for Data Analytics, School of Human and Health Performance, Dublin City University, Dublin, Ireland
  - Department of Electrical Engineering, Swami Keshvanand Institute of Technology, Management & Gramothan, Jaipur, Rajasthan, India
- Aparna Sinha
  - Department of Information Technology, Bansthali Vidyapeeth, Radha Kishnpura, Rajasthan, India
- Naveen Gehlot
  - Department of Electrical Engineering, Malaviya National Institute of Technology, Jaipur, Rajasthan, India
- Ashutosh Jena
  - Department of Electrical Engineering, Malaviya National Institute of Technology, Jaipur, Rajasthan, India
- Rajesh Kumar
  - Department of Electrical Engineering, Malaviya National Institute of Technology, Jaipur, Rajasthan, India
- Kieran Moran
  - Insight Science Foundation Ireland Research Centre for Data Analytics, School of Human and Health Performance, Dublin City University, Dublin, Ireland
8
Akbar S, Zou Q, Raza A, Alarfaj FK. iAFPs-Mv-BiTCN: Predicting antifungal peptides using self-attention transformer embedding and transform evolutionary based multi-view features with bidirectional temporal convolutional networks. Artif Intell Med 2024; 151:102860. [PMID: 38552379 DOI: 10.1016/j.artmed.2024.102860] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Revised: 02/21/2024] [Accepted: 03/25/2024] [Indexed: 04/26/2024]
Abstract
Globally, fungal infections have become a major health concern in humans. Fungal diseases generally occur due to the invading fungus appearing on a specific portion of the body and becoming hard for the human immune system to resist. The recent emergence of COVID-19 has intensely increased different nosocomial fungal infections. The existing wet-laboratory-based medications are expensive, time-consuming, and may have adverse side effects on normal cells. In the last decade, peptide therapeutics have gained significant attention due to their high specificity in targeting affected cells without affecting healthy cells. Motivated by the significance of peptide-based therapies, we developed a highly discriminative prediction scheme called iAFPs-Mv-BiTCN to predict antifungal peptides correctly. The training peptides are encoded using word embedding methods such as skip-gram and attention mechanism-based bidirectional encoder representation using transformer. Additionally, transform-based evolutionary features are generated using the Pseudo position-specific scoring matrix using discrete wavelet transform (PsePSSM-DWT). The fused vector of word embedding and evolutionary descriptors is formed to compensate for the limitations of single encoding methods. A Shapley Additive exPlanations (SHAP) based global interpolation approach is applied to reduce training costs by choosing the optimal feature set. The selected feature set is trained using a bi-directional temporal convolutional network (BiTCN). The proposed iAFPs-Mv-BiTCN model achieved a predictive accuracy of 98.15% and an AUC of 0.99 using training samples. In the case of the independent samples, our model obtained an accuracy of 94.11% and an AUC of 0.98. Our iAFPs-Mv-BiTCN model outperformed existing models with a ~4% and ~5% higher accuracy using training and independent samples, respectively. The reliability and efficacy of the proposed iAFPs-Mv-BiTCN model make it a valuable tool for scientists and may perform a beneficial role in pharmaceutical design and research academia.
Affiliation(s)
- Shahid Akbar
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Department of Computer Science, Abdul Wali Khan University Mardan, KP 23200, Pakistan
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, PR China
- Ali Raza
  - Department of Physical and Numerical Sciences, Qurtuba University of Science and Information Technology, Peshawar, KP 25124, Pakistan
- Fawaz Khaled Alarfaj
  - Department of Management Information Systems (MIS), School of Business, King Faisal University (KFU), Al-Ahsa 31982, Saudi Arabia
9
Khan S, Uddin I, Khan M, Iqbal N, Alshanbari HM, Ahmad B, Khan DM. Sequence based model using deep neural network and hybrid features for identification of 5-hydroxymethylcytosine modification. Sci Rep 2024; 14:9116. [PMID: 38643305 PMCID: PMC11551160 DOI: 10.1038/s41598-024-59777-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/30/2023] [Accepted: 04/15/2024] [Indexed: 04/22/2024] Open
Abstract
RNA modifications are pivotal in the development of newly synthesized structures, showcasing a vast array of alterations across various RNA classes. Among these, 5-hydroxymethylcytosine (5HMC) stands out, playing a crucial role in gene regulation and epigenetic changes, yet its detection through conventional methods proves cumbersome and costly. To address this, we propose Deep5HMC, a robust learning model leveraging machine learning algorithms and discriminative feature extraction techniques for accurate 5HMC sample identification. Our approach integrates seven feature extraction methods and various machine learning algorithms, including Random Forest, Naive Bayes, Decision Tree, and Support Vector Machine. Through K-fold cross-validation, our model achieved a notable 84.07% accuracy rate, surpassing previous models by 7.59%, signifying its potential in early cancer and cardiovascular disease diagnosis. This study underscores the promise of Deep5HMC in offering insights for improved medical assessment and treatment protocols, marking a significant advancement in RNA modification analysis.
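The K-fold cross-validation protocol mentioned in this abstract can be sketched as a simple index split: the data is cut into K contiguous folds, and each fold serves once as the test set while the rest form the training set. The abstract does not state K or whether the data was shuffled or stratified, so the values below (10 samples, 5 folds, no shuffling) and the function name are assumptions for illustration only.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k (train, test) index pairs without shuffling."""
    indices = list(range(n_samples))
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


# Hypothetical sizes chosen only to keep the printed output small.
folds = k_fold_indices(10, 5)
for train, test in folds:
    print(test, train)
```

The reported score is then typically the average of the per-fold metric, which is what "average accuracy" refers to in cross-validated results.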
Affiliation(s)
- Salman Khan
  - Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Islam Uddin
  - Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Mukhtaj Khan
  - Department of Information Technology, The University of Haripur, Haripur, Pakistan
- Nadeem Iqbal
  - Department of Computer Science, Abdul Wali Khan University Mardan, Mardan, Pakistan
- Huda M Alshanbari
  - Department of Mathematical Sciences, College of Science, Princess Nourah bint Abdulrahman University, P.O. Box 84428, 11671, Riyadh, Saudi Arabia
- Bakhtiyar Ahmad
  - Higher Education Department Afghanistan, Kabul, Afghanistan
- Dost Muhammad Khan
  - Department of Statistics, Abdul Wali Khan University Mardan, Mardan, 23200, KP, Pakistan
10
Ma Y, Zhang B, Liu Z, Liu Y, Wang J, Li X, Feng F, Ni Y, Li S. IAS-FET: An intelligent assistant system and an online platform for enhancing successful rate of in-vitro fertilization embryo transfer technology based on clinical features. Computer Methods and Programs in Biomedicine 2024; 245:108050. [PMID: 38301430 DOI: 10.1016/j.cmpb.2024.108050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/14/2023] [Revised: 01/20/2024] [Accepted: 01/23/2024] [Indexed: 02/03/2024]
Abstract
BACKGROUND: Among all of the assisted reproductive technology (ART) methods, in vitro fertilization-embryo transfer (IVF-ET) holds a prominent position as a key solution for overcoming infertility. However, its success rate hovers at a modest 30% to 70%. Adding to the challenge is the absence of effective models and clinical tools capable of predicting the outcome of IVF-ET before embryo formation. Our study is dedicated to filling this critical gap by aiming to predict IVF-ET outcomes and ultimately enhance the success rate of this procedure. METHODS: In this retrospective study, infertile patients who received assisted pregnancy treatment at Gansu Provincial Maternity and Child-care Hospital in China were enrolled from 2016 to 2020. Individual clinical information was analyzed with a cascade XGBoost method to build an intelligent assistant system for predicting the outcome of IVF-ET, called IAS-FET. The cascade XGBoost model was trained using clinical information from 2292 couples and externally tested using clinical information from 573 couples. In addition, IAS-FET suggested several schemes to help patients adjust their physical condition and improve their success rate with ART. RESULTS: The outcome of IVF-ET can be predicted by IAS-FET with an area under the curve (AUC) value of 0.8759 on the external test set. In addition, IAS-FET can provide several schemes to improve the success rate of IVF-ET. The IAS-FET tool is available as a free online platform at http://www.cppdd.cn/ART. CONCLUSIONS: The results suggest a significant influence of personal clinical features on the success of ART. The proposed system IAS-FET, based on the top 27 factors, could be a promising tool to predict the outcome of ART and propose a plan for the patient's physical adjustment. With the help of IAS-FET, patients can take informed steps towards increasing their chances of a successful outcome on their journey to parenthood.
Affiliation(s)
- Ying Ma
  - Gansu Provincial Maternity and Child-care Hospital, Lanzhou, Gansu 730030, China
- Bowen Zhang
  - School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
  - School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, Hubei 430073, China
- Zhaoqing Liu
  - School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Yujie Liu
  - School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Jiarui Wang
  - School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
- Xingxuan Li
  - School of Chemistry and Chemical Engineering, Lanzhou University, Lanzhou, Gansu 730030, China
- Fan Feng
  - Gansu Provincial Maternity and Child-care Hospital, Lanzhou, Gansu 730030, China
- Yali Ni
  - Gansu Provincial Maternity and Child-care Hospital, Lanzhou, Gansu 730030, China
- Shuyan Li
  - School of Medical Information and Engineering, Xuzhou Medical University, Xuzhou, Jiangsu 221004, China
11
Khan S, Khan M, Iqbal N, Dilshad N, Almufareh MF, Alsubaie N. Enhancing Sumoylation Site Prediction: A Deep Neural Network with Discriminative Features. Life (Basel) 2023; 13:2153. [PMID: 38004293 PMCID: PMC10672286 DOI: 10.3390/life13112153] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/07/2023] [Revised: 10/18/2023] [Accepted: 10/25/2023] [Indexed: 11/26/2023] Open
Abstract
Sumoylation is a post-translation modification (PTM) mechanism that involves many critical biological processes, such as gene expression, localizing and stabilizing proteins, and replicating the genome. Moreover, sumoylation sites are associated with different diseases, including Parkinson's and Alzheimer's. Due to its vital role in the biological process, identifying sumoylation sites in proteins is significant for monitoring protein functions and discovering multiple diseases. Therefore, in the literature, several computational models utilizing conventional ML methods have been introduced to classify sumoylation sites. However, these models cannot accurately classify the sumoylation sites due to intrinsic limitations associated with the conventional learning methods. This paper proposes a robust computational model (called Deep-Sumo) for predicting sumoylation sites based on a deep-learning algorithm with efficient feature representation methods. The proposed model employs a half-sphere exposure method to represent protein sequences in a feature vector. Principal Component Analysis is applied to extract discriminative features by eliminating noisy and redundant features. The discriminant features are given to a multilayer Deep Neural Network (DNN) model to predict sumoylation sites accurately. The performance of the proposed model is extensively evaluated using a 10-fold cross-validation test by considering various statistical-based performance measurement metrics. Initially, the proposed DNN is compared with the traditional learning algorithm, and subsequently, the performance of the Deep-Sumo is compared with the existing models. The validation results show that the proposed model reports an average accuracy of 96.47%, with improvement compared with the existing models. It is anticipated that the proposed model can be used as an effective tool for drug discovery and the diagnosis of multiple diseases.
Affiliation(s)
- Salman Khan
  - Department of Computer Science, Abdul Wali Khan University, Mardan 23200, Pakistan
- Mukhtaj Khan
  - Department of Information Technology, The University of Haripur, Haripur 22620, Pakistan
- Nadeem Iqbal
  - Department of Computer Science, Abdul Wali Khan University, Mardan 23200, Pakistan
- Naqqash Dilshad
  - Department of Convergence Engineering for Intelligent Drone, Sejong University, Seoul 05006, Republic of Korea
- Maram Fahaad Almufareh
  - Department of Information Systems, College of Computer and Information Sciences, Jouf University, Sakaka 72388, Saudi Arabia
- Najah Alsubaie
  - Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University (PNU), P.O. Box 84428, Riyadh 11671, Saudi Arabia
|
12
|
Li Q, Zhang Z, Ma Z. Raman spectral pattern recognition of breast cancer: A machine learning strategy based on feature fusion and adaptive hyperparameter optimization. Heliyon 2023; 9:e18148. [PMID: 37501962 PMCID: PMC10368853 DOI: 10.1016/j.heliyon.2023.e18148] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/22/2023] [Revised: 07/08/2023] [Accepted: 07/10/2023] [Indexed: 07/29/2023] Open
Abstract
Raman spectroscopy, a form of molecular vibration spectroscopy, provides abundant information on components and molecular structure for the early detection and diagnosis of breast cancer. Portable Raman spectrometers have made the equipment simpler and more affordable to apply, albeit at the cost of a lower signal-to-noise ratio (SNR). Consequently, pattern recognition algorithms must achieve a higher recognition rate. Our study employs a feature fusion strategy to reduce the dimensionality of high-dimensional Raman spectra and enhance the discriminative information between normal tissues and tumors. In the random experiment conducted, the classifier achieved over 96% on all three average metrics: accuracy, sensitivity, and specificity. Additionally, we propose a multi-parameter serial encoding evolutionary algorithm (MSEA) and integrate it into the Adaptive Local Hyperplane K-nearest Neighbor classification algorithm (ALHK) for adaptive hyperparameter optimization. Serial encoding addresses the problem of parallel optimization in multi-hyperparameter vector problems. To improve the convergence of the optimization algorithm towards a global optimum, an exponential viability function is devised for nonlinear processing. Moreover, an improved elitist strategy is employed for individual selection, eliminating the influence of probability factors on the robustness of the optimization algorithm. This study further optimizes the hyperparameter space through sensitivity analysis of hyperparameters and cross-validation experiments, leading to better performance than the ALHK algorithm with manual hyperparameter configuration.
Affiliation(s)
- Qingbo Li
  - School of Instrumentation and Optoelectronic Engineering, Precision Opto-Mechatronics Technology Key Laboratory of Education Ministry, Beihang University, Xueyuan Road No. 37, Haidian District, Beijing, 100191, China
- Zhixiang Zhang
  - School of Instrumentation and Optoelectronic Engineering, Precision Opto-Mechatronics Technology Key Laboratory of Education Ministry, Beihang University, Xueyuan Road No. 37, Haidian District, Beijing, 100191, China
- Zhenhe Ma
  - Hebei Key Laboratory of Micro-Nano Precision Optical Sensing and Detection Technology, Northeastern University, Qinhuangdao Campus, Qinhuangdao, 066004, China
|
13
|
An Efficient AP-ANN-Based Multimethod Fusion Model to Detect Stress through EEG Signal Analysis. COMPUTATIONAL INTELLIGENCE AND NEUROSCIENCE 2022; 2022:7672297. [PMID: 36544857 PMCID: PMC9763020 DOI: 10.1155/2022/7672297] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/25/2022] [Revised: 09/30/2022] [Accepted: 10/31/2022] [Indexed: 12/14/2022]
Abstract
Stress is a universal emotion that every human experiences daily. Psychologists say stress may lead to heart attack, depression, hypertension, stroke, or even sudden death. Many approaches to stress detection have been explored, including facial expression, speech, text, and physical behavior, but no consensus has been reached on the best method. Advances in biomedical engineering have driven rapid development of electroencephalogram (EEG) signal analysis, which inspired, for the first time, a multimethod fusion approach that combines the discrete wavelet transform (DWT) for de-noising, adaptive synthetic sampling (ADASYN) for class balancing, and affinity propagation (AP) as a stratified sampling model, with an artificial neural network (ANN) as the classifier for human emotion classification. From the EEG recordings of the DEAP dataset, artifacts are removed, the signal is decomposed using a DWT, and features are extracted and fused to form the feature vector. Because the dataset is high-dimensional, feature selection is performed, and ADASYN is used to address class imbalance, which results in large-scale data. The key idea of the proposed system is to perform sampling using affinity propagation as a stratified sampling-based clustering algorithm: it determines the number of representative samples automatically, an advantage over K-Means and K-Medoid, which require the K value to be specified. Those samples are used as inputs to several classification models, and AP-ANN, AP-SVM, and AP-RF are compared on five performance metrics: accuracy, precision, recall, F1-score, and specificity. In our experiment, the AP-ANN model achieves an accuracy of 86.8%, a precision of 85.7%, an F1-score of 84.9%, a recall of 84.1%, and a specificity of 89.2%, which together are better than the results of the other algorithms compared.
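The five performance metrics named in this abstract follow directly from the confusion-matrix counts of a binary classifier. The sketch below assumes a two-class setting with a designated positive label; the function name `binary_metrics` and the choice to return 0.0 when a denominator is zero are assumptions made for illustration, not details taken from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, F1-score, and specificity.

    Illustrative helper; undefined ratios (zero denominator) are reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
    }
```

Reporting specificity alongside recall matters when classes are imbalanced, because accuracy alone can look high for a classifier that rarely predicts the minority class.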
|
14
|
Bonidia RP, Avila Santos AP, de Almeida BLS, Stadler PF, Nunes da Rocha U, Sanches DS, de Carvalho ACPLF. Information Theory for Biological Sequence Classification: A Novel Feature Extraction Technique Based on Tsallis Entropy. ENTROPY (BASEL, SWITZERLAND) 2022; 24:1398. [PMID: 37420418 DOI: 10.3390/e24101398] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 09/16/2022] [Accepted: 09/24/2022] [Indexed: 07/09/2023]
Abstract
In recent years, there has been an exponential growth in sequencing projects due to accelerated technological advances, leading to a significant increase in the amount of data and resulting in new challenges for biological sequence analysis. Consequently, the use of techniques capable of analyzing large amounts of data has been explored, such as machine learning (ML) algorithms. ML algorithms are being used to analyze and classify biological sequences, despite the intrinsic difficulty in extracting and finding representative biological sequence methods suitable for them. Thereby, extracting numerical features to represent sequences makes it statistically feasible to use universal concepts from Information Theory, such as Tsallis and Shannon entropy. In this study, we propose a novel Tsallis entropy-based feature extractor to provide useful information to classify biological sequences. To assess its relevance, we prepared five case studies: (1) an analysis of the entropic index q; (2) performance testing of the best entropic indices on new datasets; (3) a comparison made with Shannon entropy and (4) generalized entropies; (5) an investigation of the Tsallis entropy in the context of dimensionality reduction. As a result, our proposal proved to be effective, being superior to Shannon entropy and robust in terms of generalization, and also potentially representative for collecting information in fewer dimensions compared with methods such as Singular Value Decomposition and Uniform Manifold Approximation and Projection.
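As a rough illustration of the quantity this abstract builds on, the sketch below computes the Tsallis entropy S_q = (1 - Σ p_i^q) / (q - 1) of a sequence's k-mer frequency distribution, falling back to Shannon entropy as q approaches 1. Using k-mer frequencies as the probability distribution, the natural logarithm, and the function names `kmer_probabilities` and `tsallis_entropy` are assumptions for this example; the paper's actual feature extractor may differ.

```python
import math
from collections import Counter


def kmer_probabilities(sequence, k=2):
    """Relative frequencies of the overlapping k-mers in a sequence."""
    kmers = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(kmers)
    total = len(kmers)
    return [count / total for count in counts.values()]


def tsallis_entropy(probs, q):
    """Tsallis entropy with entropic index q; Shannon entropy in the limit q -> 1."""
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs if p > 0)) / (q - 1.0)


# One feature per entropic index q, for a hypothetical DNA sequence.
features = [tsallis_entropy(kmer_probabilities("ACGTACGGTA", k=2), q)
            for q in (0.5, 2.0, 3.0)]
```

Varying q changes how strongly rare versus frequent k-mers contribute, which is why the entropic index is treated as a tunable parameter.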
Affiliation(s)
- Robson P Bonidia
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos 13566-590, Brazil
- Anderson P Avila Santos
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos 13566-590, Brazil
  - Department of Environmental Microbiology, Helmholtz Centre for Environmental Research-UFZ GmbH, 04318 Leipzig, Germany
- Breno L S de Almeida
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos 13566-590, Brazil
- Peter F Stadler
  - Department of Computer Science and Interdisciplinary Center of Bioinformatics, University of Leipzig, 04107 Leipzig, Germany
- Ulisses Nunes da Rocha
  - Department of Environmental Microbiology, Helmholtz Centre for Environmental Research-UFZ GmbH, 04318 Leipzig, Germany
- Danilo S Sanches
  - Department of Computer Science, Federal University of Technology-Paraná-UTFPR, Cornélio Procópio 86300-000, Brazil
- André C P L F de Carvalho
  - Institute of Mathematics and Computer Sciences, University of São Paulo, São Carlos 13566-590, Brazil
|
15
|
Zhang B, Fan T. Knowledge structure and emerging trends in the application of deep learning in genetics research: A bibliometric analysis [2000–2021]. Front Genet 2022; 13:951939. [PMID: 36081985 PMCID: PMC9445221 DOI: 10.3389/fgene.2022.951939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2022] [Accepted: 07/13/2022] [Indexed: 11/13/2022] Open
Abstract
Introduction: Deep learning technology has been widely used in genetic research because of its characteristics of computability, statistical analysis, and predictability. Herein, we aimed to summarize standardized knowledge and potentially innovative approaches for deep learning applications in genetics by evaluating publications, to encourage more research. Methods: The Science Citation Index Expanded™ (SCIE) database was searched for publications on deep learning applications in genomics. Original articles and reviews were considered. We derived a clustered network from 69,806 references cited by the 1,754 related manuscripts identified. We used CiteSpace and VOSviewer to identify countries, institutions, journals, co-cited references, keywords, subject evolution, path, current characteristics, and emerging topics. Results: We assessed the rapidly increasing number of publications on deep learning applications in genomics and identified 1,754 articles focusing on this subject. A total of 101 countries and 2,487 institutes contributed publications. The United States of America had the most publications (728/1,754) and the highest h-index, and has collaborated closely with China and Germany. The reference clusters of SCI articles fell into seven categories: deep learning, logic regression, variant prioritization, random forests, scRNA-seq (single-cell RNA-seq), genomic regulation, and recombination. The keywords representing the research frontiers by year were prediction (2016–2021), sequence (2017–2021), mutation (2017–2021), and cancer (2019–2021). Conclusion: Here, we summarized the current literature on the status of deep learning for genetics applications and analyzed the current research characteristics and future trajectories in this field. This work aims to provide resources for further intensive exploration and to encourage more researchers to pursue deep learning applications in genetics.
Affiliation(s)
- Bijun Zhang
  - Department of Clinical Genetics, Shengjing Hospital of China Medical University, Shenyang, China
- Ting Fan
  - Department of Computer, School of Intelligent Medicine, China Medical University, Shenyang, China
  - *Correspondence: Ting Fan,
|
16
|
Khan S, Khan M, Iqbal N, Amiruddin Abd Rahman M, Khalis Abdul Karim M. Deep-piRNA: Bi-Layered Prediction Model for PIWI-Interacting RNA Using Discriminative Features. COMPUTERS, MATERIALS & CONTINUA 2022; 72:2243-2258. [DOI: 10.32604/cmc.2022.022901] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/22/2021] [Accepted: 11/11/2021] [Indexed: 09/02/2023]
|
17
|
Liu G, Song S, Zhang Q, Dong B, Sun Y, Liu G, Zhao X. Epigenetic Marks and Variation of Sequence-Based Information Along Genomic Regions Are Predictive of Recombination Hot/Cold Spots in Saccharomyces cerevisiae. Front Genet 2021; 12:705038. [PMID: 34267784 PMCID: PMC8276760 DOI: 10.3389/fgene.2021.705038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/04/2021] [Accepted: 06/07/2021] [Indexed: 11/16/2022] Open
Abstract
Characterization and identification of recombination hotspots provide important insights into the mechanism of recombination and genome evolution. In contrast with existing sequence-based models for predicting recombination hotspots, which were defined in an ORF-based manner, we first defined recombination hot/cold spots based on public high-resolution Spo11-oligo-seq data, then characterized them in terms of DNA sequence and epigenetic marks, and finally presented classifiers to identify hotspots. We found that, in addition to some previously discovered DNA-based features like GC-skew, recombination hotspots in yeast can also be characterized by some remarkable features associated with DNA physical properties and shape. More importantly, using DNA-based features and several epigenetic marks, we built several classifiers to discriminate hotspots from coldspots and found that the SVM classifier performs best, with an accuracy of ∼92%, which is also the highest among the models compared. Feature importance analysis combined with prediction results shows that epigenetic marks and variation of sequence-based features along the hotspots contribute dominantly to hotspot identification. Using the incremental feature selection method, we obtained an optimal feature subset consisting of far fewer features without sacrificing prediction accuracy.
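The incremental feature selection step mentioned at the end of this abstract can be sketched in general terms: features are ranked in advance, added one at a time from the top of the ranking, and the subset with the best evaluation score is kept. The function name `incremental_feature_selection`, the caller-supplied `evaluate` callable, and the tie-breaking rule are assumptions for this example; the ranking criterion and classifier used in the paper are not specified here.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Return the best-scoring prefix of a ranked feature list.

    ranked_features: feature names ordered from most to least important.
    evaluate: hypothetical callable mapping a feature list to a score
              (for example cross-validated accuracy); higher is better.
    """
    best_subset, best_score = [], float("-inf")
    current = []
    for feature in ranked_features:
        current.append(feature)
        score = evaluate(current)
        # Strict '>' keeps the smallest subset when several subsets tie.
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score
```

In practice `evaluate` would train and cross-validate a classifier on the given columns, so the cost grows with the number of ranked features.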
Affiliation(s)
- Guoqing Liu
  - School of Life Sciences and Technology, Inner Mongolia University of Science and Technology, Baotou, China; Inner Mongolia Key Laboratory of Functional Genomics and Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
- Shuangjian Song
  - School of Life Sciences and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Qiguo Zhang
  - School of Life Sciences and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Biyu Dong
  - School of Life Sciences and Technology, Inner Mongolia University of Science and Technology, Baotou, China
- Yu Sun
  - School of Life Sciences, Inner Mongolia University, Hohhot, China
- Guojun Liu
  - School of Life Sciences and Technology, Inner Mongolia University of Science and Technology, Baotou, China; Inner Mongolia Key Laboratory of Functional Genomics and Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
- Xiujuan Zhao
  - School of Life Sciences and Technology, Inner Mongolia University of Science and Technology, Baotou, China; Inner Mongolia Key Laboratory of Functional Genomics and Bioinformatics, Inner Mongolia University of Science and Technology, Baotou, China
|