1
Ahmed H, Mumtaz Z, Saqib S, Zubair Yousaf M. ViroNia: LSTM based proteomics model for precise prediction of HCV. Comput Biol Med 2025; 186:109573. [PMID: 39733555 DOI: 10.1016/j.compbiomed.2024.109573] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/18/2024] [Revised: 12/06/2024] [Accepted: 12/11/2024] [Indexed: 12/31/2024]
Abstract
Classifying viruses has important implications for understanding their evolution and designing interventions. This study introduces ViroNia, a novel LSTM-based system for high-accuracy classification of viral proteins. Although originally developed for generative tasks, LSTM architectures have also proven highly effective for classification, and the model demonstrates this capability. It outperforms other deep architectures, such as Simple RNN, GRU, 1D CNN, and Bidirectional LSTM, with the advantage of using pairwise sequence similarity and efficient data handling. Using a dataset of 2250 protein sequences from the NCBI and BVBRC databases, ViroNia achieves accuracies of 99.7 % and 99.6 % for broad-level and detail-level classification, respectively. Five-fold cross-validation yielded average accuracies of 92.29 % (±1.55 %) and 90.31 % (±5.41 %) at the broad and detail levels, respectively. The architecture allows for real-time data processing and automatic feature extraction, addressing the scalability limitations faced by BLAST (Basic Local Alignment Search Tool). The comparative analysis revealed that, although the existing deep learning models share similar training parameters, ViroNia significantly improved classification outcomes. It is particularly suited to applications that demand real-time analysis and learning on additional viral protein datasets, and hence contributes broadly to ongoing viral research.
Affiliation(s)
- Hania Ahmed
  - KAM School of Life Sciences, Forman Christian College University, Lahore, Pakistan.
- Zilwa Mumtaz
  - KAM School of Life Sciences, Forman Christian College University, Lahore, Pakistan.
- Sharmeen Saqib
  - KAM School of Life Sciences, Forman Christian College University, Lahore, Pakistan.
2
Pradhan UK, Behera P, Das R, Naha S, Gupta A, Parsad R, Pradhan SK, Meher PK. AScirRNA: A novel computational approach to discover abiotic stress-responsive circular RNAs in plant genome. Comput Biol Chem 2024; 113:108205. [PMID: 39265460 DOI: 10.1016/j.compbiolchem.2024.108205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 07/12/2024] [Accepted: 09/04/2024] [Indexed: 09/14/2024]
Abstract
In plant biology, understanding the regulatory mechanisms that govern stress responses is a pivotal pursuit. Circular RNAs (circRNAs), emerging as critical players in gene regulation, have recently garnered attention for their potential roles in abiotic stress adaptation. A comprehensive grasp of circRNA functions in stress response offers breeders avenues to develop abiotic stress-resistant crop cultivars that thrive in challenging climates. This study pioneers a machine learning-based model for predicting abiotic stress-responsive circRNAs. The K-tuple nucleotide composition (KNC) and Pseudo KNC (PKNC) features were used to represent circRNAs numerically. Three different feature selection strategies were employed to select relevant and non-redundant features. Eight shallow and four deep learning algorithms were evaluated to build the final predictive model. Following five-fold cross-validation, the XGBoost learning algorithm demonstrated superior performance with 260 LightGBM-selected KNC features (Accuracy: 74.55 %, auROC: 81.23 %, auPRC: 76.52 %) and 160 PKNC features (Accuracy: 74.32 %, auROC: 81.04 %, auPRC: 76.43 %) over other combinations of learning algorithms and feature selection techniques. Further, the robustness of the developed models was evaluated using an independent test dataset, where the overall accuracy, auROC, and auPRC were 73.13 %, 72.34 %, and 72.68 % for the KNC feature set and 73.52 %, 79.53 %, and 73.09 % for the PKNC feature set, respectively. This computational approach was also integrated into an online prediction tool, AScirRNA (https://iasri-sg.icar.gov.in/ascirna/), for easy prediction by users. Both the proposed model and the developed tool are poised to augment ongoing efforts to identify stress-responsive circRNAs in plants.
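As a rough illustration of the K-tuple nucleotide composition (KNC) encoding mentioned in this abstract, the sketch below computes normalized k-mer frequencies for a single sequence. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `knc_features`, the default `k = 2`, the conversion of T to U, and the skipping of ambiguous bases are all illustrative choices, and the pseudo (PKNC) variant is not covered.

```python
from itertools import product


def knc_features(seq: str, k: int = 2) -> dict[str, float]:
    """Return the relative frequency of every possible k-mer over ACGU.

    Hypothetical helper: the abstract does not specify k or any
    preprocessing, so these are assumptions made for illustration.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k over the sequence and count each k-mer.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    # Normalize counts so the feature vector sums to 1 (or stays all zeros).
    return {kmer: (counts[kmer] / total if total else 0.0) for kmer in kmers}
```

Each sequence maps to a fixed-length vector of 4^k values (16 for k = 2, 64 for k = 3), which is what lets sequences of different lengths be fed to the same classifier.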
Affiliation(s)
- Upendra Kumar Pradhan
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
- Prasanjit Behera
  - Department of Bioinformatics, Odisha University of Agriculture & Technology, Bhubaneswar, Odisha 751003, India.
- Ritwika Das
  - Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
- Sanchita Naha
  - Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
- Ajit Gupta
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
- Rajender Parsad
  - ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
- Sukanta Kumar Pradhan
  - Department of Bioinformatics, Odisha University of Agriculture & Technology, Bhubaneswar, Odisha 751003, India.
- Prabina Kumar Meher
  - Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
3
Murmu S, Sinha D, Chaurasia H, Sharma S, Das R, Jha GK, Archak S. A review of artificial intelligence-assisted omics techniques in plant defense: current trends and future directions. Frontiers in Plant Science 2024; 15:1292054. [PMID: 38504888 PMCID: PMC10948452 DOI: 10.3389/fpls.2024.1292054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/10/2023] [Accepted: 01/24/2024] [Indexed: 03/21/2024]
Abstract
Plants intricately deploy defense systems to counter diverse biotic and abiotic stresses. Omics technologies, spanning genomics, transcriptomics, proteomics, and metabolomics, have revolutionized the exploration of plant defense mechanisms, unraveling molecular intricacies in response to various stressors. However, the complexity and scale of omics data necessitate sophisticated analytical tools for meaningful insights. This review delves into the application of artificial intelligence algorithms, particularly machine learning and deep learning, as promising approaches for deciphering complex omics data in plant defense research. The overview encompasses key omics techniques and addresses the challenges and limitations inherent in current AI-assisted omics approaches. Moreover, it contemplates potential future directions in this dynamic field. In summary, AI-assisted omics techniques present a robust toolkit, enabling a profound understanding of the molecular foundations of plant defense and paving the way for more effective crop protection strategies amidst climate change and emerging diseases.
Affiliation(s)
- Sneha Murmu
  - Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Dipro Sinha
  - Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Himanshushekhar Chaurasia
  - Central Institute for Research on Cotton Technology, Indian Council of Agricultural Research (ICAR), Mumbai, India
- Soumya Sharma
  - Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Ritwika Das
  - Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Girish Kumar Jha
  - Indian Agricultural Statistics Research Institute, Indian Council of Agricultural Research (ICAR), New Delhi, India
- Sunil Archak
  - National Bureau of Plant Genetic Resources, Indian Council of Agricultural Research (ICAR), New Delhi, India
4
Cao X, Fang Y, Yang C, Liu Z, Xu G, Jiang Y, Wu P, Song W, Xing H, Wu X. Prediction of Prostate Cancer Risk Stratification Based on A Nonlinear Transformation Stacking Learning Strategy. Int Neurourol J 2024; 28:33-43. [PMID: 38569618 PMCID: PMC10990759 DOI: 10.5213/inj.2346332.166] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/04/2023] [Accepted: 01/04/2024] [Indexed: 04/05/2024]
Abstract
PURPOSE: Prostate cancer (PCa) is an epithelial malignancy that originates in the prostate gland and is generally categorized into low, intermediate, and high-risk groups. The primary diagnostic indicator for PCa is the measurement of serum prostate-specific antigen (PSA) values. However, reliance on PSA levels can result in false positives, leading to unnecessary biopsies and an increased risk of invasive injuries. Therefore, it is imperative to develop an efficient and accurate method for PCa risk stratification. Many recent studies on PCa risk stratification based on clinical data have employed a binary classification, distinguishing between low to intermediate and high risk. In this paper, we propose a novel machine learning (ML) approach utilizing a stacking learning strategy for predicting the tripartite risk stratification of PCa.
METHODS: Clinical records, featuring attributes selected using the lasso method, were utilized with 5 ML classifiers. The outputs of these classifiers underwent transformation by various nonlinear transformers and were then concatenated with the lasso-selected features, resulting in a set of new features. A stacking learning strategy, integrating different ML classifiers, was developed based on these new features.
RESULTS: Our proposed approach demonstrated superior performance, achieving an accuracy of 0.83 and an area under the receiver operating characteristic curve value of 0.88 in a dataset comprising 197 PCa patients with 42 clinical characteristics.
CONCLUSION: This study aimed to improve clinicians' ability to rapidly assess PCa risk stratification while reducing the burden on patients. This was achieved by using artificial intelligence-related technologies as an auxiliary method for diagnosing PCa.
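The feature-construction step described under METHODS (base classifier outputs passed through nonlinear transformers, then concatenated with the lasso-selected features) might look roughly like the sketch below for a single patient record. This is only an illustration under assumptions: the abstract does not name the transformers, so the square, square-root, and tanh functions here are hypothetical stand-ins, as are the names `DEFAULT_TRANSFORMS` and `build_stacking_features`.

```python
import math
from typing import Callable, Sequence

Transform = Callable[[float], float]

# Hypothetical nonlinear transformers; the abstract does not say which were used.
DEFAULT_TRANSFORMS: tuple[Transform, ...] = (
    lambda p: p * p,  # square
    math.sqrt,        # square root (valid for probabilities in [0, 1])
    math.tanh,        # hyperbolic tangent
)


def build_stacking_features(
    selected_features: Sequence[float],
    base_outputs: Sequence[float],
    transforms: Sequence[Transform] = DEFAULT_TRANSFORMS,
) -> list[float]:
    """Build the meta-learner input for one sample.

    selected_features: the lasso-selected clinical attributes.
    base_outputs: outputs (for example, class probabilities) of the
        first-level classifiers for this sample.
    """
    # Apply every transform to every base classifier output.
    transformed = [t(p) for p in base_outputs for t in transforms]
    # Concatenate the original selected features with the transformed outputs.
    return list(selected_features) + transformed
```

In a full pipeline the resulting vectors would be used to train a second-level classifier, with the base outputs produced on held-out folds so that the meta-learner does not see predictions made on the base models' own training data.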
Affiliation(s)
- Xinyu Cao
  - School of Computer Science & Engineering, Wuhan Institute of Technology, Wuhan, China
- Yin Fang
  - School of Computer Science & Engineering, Wuhan Institute of Technology, Wuhan, China
- Chunguang Yang
  - Department of Urology, Tongji Hospital Affiliated Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Zhenghao Liu
  - Department of Urology, Tongji Hospital Affiliated Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Guoping Xu
  - School of Computer Science & Engineering, Wuhan Institute of Technology, Wuhan, China
- Yan Jiang
  - School of Computer Science & Engineering, Wuhan Institute of Technology, Wuhan, China
- Peiyan Wu
  - School of Computer Science & Engineering, Wuhan Institute of Technology, Wuhan, China
- Wenbo Song
  - School of Computer Science & Engineering, Wuhan Institute of Technology, Wuhan, China
- Hanshuo Xing
  - School of Computer Science & Engineering, Wuhan Institute of Technology, Wuhan, China
- Xinglong Wu
  - School of Computer Science & Engineering, Wuhan Institute of Technology, Wuhan, China