1
Liu F, Chen L, Wu Q, Li L, Li J, Su T, Li J, Liang S, Qing L. Radiomics of Dynamic Contrast-Enhanced MRI for Predicting Radiation-Induced Hepatic Toxicity After Intensity Modulated Radiotherapy for Hepatocellular Carcinoma: A Machine Learning Predictive Model Based on the SHAP Methodology. J Hepatocell Carcinoma 2025; 12:999-1015. [PMID: 40406666 PMCID: PMC12095435 DOI: 10.2147/jhc.s523448] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/18/2025] [Accepted: 05/03/2025] [Indexed: 05/26/2025]
Abstract
Objective To develop an interpretable machine learning (ML) model using dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) radiomic data, dosimetric parameters, and clinical data for predicting radiation-induced hepatic toxicity (RIHT) in patients with hepatocellular carcinoma (HCC) following intensity-modulated radiation therapy (IMRT). Methods A retrospective analysis of 150 HCC patients was performed, with the data divided into training and validation cohorts at a 7:3 ratio. Radiomic features from the original MRI sequences and Delta-radiomic features were extracted. Seven radiomics-based ML models were developed: logistic regression (LR), random forest (RF), support vector machine (SVM), eXtreme Gradient Boosting (XGBoost), adaptive boosting (AdaBoost), decision tree (DT), and artificial neural network (ANN). Predictive performance was evaluated using receiver operating characteristic (ROC) curve analysis and calibration curves. Shapley additive explanations (SHAP) were used to interpret the contribution of each variable and its risk threshold. Results Original radiomic features and Delta-radiomic features were extracted from DCE-MRI images and filtered to generate Radiomics-scores and Delta-Radiomics-scores. These scores were then combined with independent risk factors (body mass index (BMI), V5, and pre-Child-Pugh score (pre-CP)), identified through univariate and multivariate logistic regression and Spearman correlation analysis, to construct the ML models. In the training cohort, the AUC values were 0.8651 for LR, 0.7004 for RF, 0.6349 for SVM, 0.6706 for XGBoost, 0.7341 for AdaBoost, 0.6806 for DT, and 0.6786 for ANN, with corresponding accuracies of 84.4%, 65.6%, 75.0%, 65.6%, 71.9%, 68.8%, and 71.9%. The validation cohort further confirmed the superiority of the LR model, which was selected as the optimal model.
SHAP analysis revealed that Delta-radiomics made a substantial positive contribution to the model. Conclusion The interpretable ML model based on radiomics provides a non-invasive tool for predicting RIHT in patients with HCC, demonstrating satisfactory discriminative performance.
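SHAP attributes a prediction to the input variables using Shapley values. As a minimal illustration (not the authors' implementation), the sketch below computes exact Shapley values by enumerating every coalition of features and filling absent features from a single baseline vector. The toy log-odds model, its weights, the interaction term, and the all-zero baseline are hypothetical and are not taken from the study. Exact enumeration is only practical for a handful of features, which is why SHAP libraries rely on model-specific or sampling-based algorithms instead.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def _masked(x: Sequence[float], baseline: Sequence[float], present: set) -> List[float]:
    """Keep the features in `present` from x; take all others from the baseline."""
    return [x[j] if j in present else baseline[j] for j in range(len(x))]


def exact_shapley(model: Model, x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model(x) relative to model(baseline).

    Each feature needs 2**(n-1) coalitions, so this is only a toy for small n.
    """
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):  # coalition sizes 0..n-1 drawn from the other features
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                with_i = model(_masked(x, baseline, coalition | {i}))
                without_i = model(_masked(x, baseline, coalition))
                phi[i] += weight * (with_i - without_i)
    return phi


# Hypothetical toy log-odds model with one interaction term (weights are made up).
def toy_log_odds(v: Sequence[float]) -> float:
    return -1.0 + 0.8 * v[0] + 0.5 * v[1] + 0.3 * v[0] * v[2]


if __name__ == "__main__":
    x = [1.2, 0.4, 2.0]
    baseline = [0.0, 0.0, 0.0]
    phi = exact_shapley(toy_log_odds, x, baseline)
    print([round(p, 4) for p in phi])
    # Efficiency property: contributions sum to model(x) - model(baseline).
    print(round(sum(phi), 4), round(toy_log_odds(x) - toy_log_odds(baseline), 4))
```

For the toy model, the interaction term `0.3 * v[0] * v[2]` is split equally between features 0 and 2, and the contributions add up to `model(x) - model(baseline)`, the additivity that lets SHAP plots show each variable pushing the risk up or down.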
Affiliation(s)
- Fushuang Liu
  - Department of Radiation Oncology, Guangxi Medical University Cancer Hospital, Nanning, 530001, People’s Republic of China
- Lijun Chen
  - Department of Radiation Oncology, Guangxi Medical University Cancer Hospital, Nanning, 530001, People’s Republic of China
- Qiaoyuan Wu
  - Department of Radiation Oncology, Guangxi Medical University Cancer Hospital, Nanning, 530001, People’s Republic of China
- Liqing Li
  - Department of Radiation Oncology, Guangxi Medical University Cancer Hospital, Nanning, 530001, People’s Republic of China
- Jizhou Li
  - Department of Radiation Oncology, Guangxi Medical University Cancer Hospital, Nanning, 530001, People’s Republic of China
- Tingshi Su
  - Department of Radiation Oncology, Guangxi Medical University Cancer Hospital, Nanning, 530001, People’s Republic of China
- Jianxu Li
  - Department of Radiation Oncology, Guangxi Medical University Cancer Hospital, Nanning, 530001, People’s Republic of China
- Shixiong Liang
  - Department of Radiation Oncology, Guangxi Medical University Cancer Hospital, Nanning, 530001, People’s Republic of China
- Liping Qing
  - Department of Radiation Oncology, Guangxi Medical University Cancer Hospital, Nanning, 530001, People’s Republic of China
2
Zhang S, Jing Y, Liang Y. EACVP: An ESM-2 LM Framework Combined CNN and CBAM Attention to Predict Anti-coronavirus Peptides. Curr Med Chem 2025; 32:2040-2054. [PMID: 38494930 DOI: 10.2174/0109298673287899240303164403] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2023] [Revised: 01/13/2024] [Accepted: 02/19/2024] [Indexed: 03/19/2024]
Abstract
BACKGROUND The novel coronavirus pneumonia (COVID-19) outbreak in late 2019 killed millions of people worldwide. Coronaviruses, including severe acute respiratory syndrome coronavirus (SARS-CoV) and SARS-CoV-2, cause serious disease. Many peptides of the host defense system have antiviral activity, so building efficient models to identify anti-coronavirus peptides is a worthwhile research goal. METHODS We therefore propose a new prediction model, EACVP, which uses the Evolutionary Scale Modeling language model (ESM-2 LM) to characterize peptide sequence information. ESM-2 is a protein language model that applies natural language processing techniques to amino acid sequences. It is pre-trained on a highly diverse and dense dataset (UR50/D 2021_04), and EACVP uses the pre-trained model to obtain 320-dimensional peptide sequence features. Compared with traditional feature extraction methods, the information represented by ESM-2 LM is more comprehensive and stable. The features are then fed into a convolutional neural network (CNN), and the lightweight convolutional block attention module (CBAM) applies attention to the CNN along both the channel and spatial dimensions. To verify the rationality of the model structure, we performed ablation experiments on the benchmark and independent test datasets, and we compared EACVP with existing methods on the independent test dataset. RESULTS Experimental results show that ACC, F1-score, and MCC are 3.95%, 35.65%, and 0.0725 higher, respectively, than those of the most advanced methods. We also tested EACVP on the ENNAVIA-C and ENNAVIA-D datasets, and the results showed that EACVP transfers well to other datasets and is a powerful tool for predicting anti-coronavirus peptides. CONCLUSION The results show that EACVP can fully characterize peptide information, achieve high prediction accuracy, and generalize to different datasets. The data and code of the article have been uploaded to https://github.com/JYY625/EACVP.git.
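The RESULTS reports gains in ACC, F1-score, and MCC. These standard binary-classification metrics can be computed from confusion-matrix counts, as in the minimal sketch below; the counts in the usage line are hypothetical and are not EACVP results.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, F1-score and Matthews correlation coefficient from confusion-matrix counts."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    acc = (tp + tn) / total
    # F1 is the harmonic mean of precision and recall, i.e. 2TP / (2TP + FP + FN).
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    # MCC uses all four cells; a common convention reports 0 when a marginal total is zero.
    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return {"ACC": acc, "F1": f1, "MCC": mcc}


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    print(binary_metrics(tp=40, fp=10, tn=45, fn=5))
```

Unlike ACC and F1, MCC ranges from -1 to 1 and takes true negatives into account, so it stays informative when the positive and negative classes are imbalanced.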
Affiliation(s)
- Shengli Zhang
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, P.R. China
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, 571158, P.R. China
- Yuanyuan Jing
  - School of Mathematics and Statistics, Xidian University, Xi'an, 710071, P.R. China
- Yunyun Liang
  - School of Science, Xi'an Polytechnic University, Xi'an, 710048, P.R. China
3
Arif R, Kanwal S, Ahmed S, Kabir M. A Computational Predictor for Accurate Identification of Tumor Homing Peptides by Integrating Sequential and Deep BiLSTM Features. Interdiscip Sci 2024; 16:503-518. [PMID: 38733473 DOI: 10.1007/s12539-024-00628-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/11/2023] [Revised: 03/16/2024] [Accepted: 03/27/2024] [Indexed: 05/13/2024]
Abstract
Cancer remains a severe illness, and current research indicates that tumor homing peptides (THPs) play an important part in cancer therapy. Identifying THPs can provide crucial insights for drug discovery and the pharmaceutical industry because they allow medication to be delivered specifically to cancer cells. These peptides bind with high affinity to particular receptors on tumor surfaces, enabling precision medications that reduce off-target effects and improve treatment outcomes for cancer patients. Wet-lab techniques are essential tools for studying THPs, but they are labor-intensive and time-consuming, which makes identifying THPs a challenging task for researchers. Computational techniques, on the other hand, can identify THPs from sequence data. Although many strategies have been proposed to predict new THPs, there is still a need for a robust method with a higher success rate. In this paper, we developed a novel framework, THP-DF, for accurately identifying THPs on a large scale. First, the peptide sequences are encoded with various sequential features. Second, each feature is passed to BiLSTM and attention layers to extract simplified deep features. Finally, an ensemble framework is formed by integrating the sequential and deep features, which are fed to a support vector machine whose efficiency is validated with 10-fold cross-validation. The experimental results showed that THP-DF performed better on both the [Formula: see text] and [Formula: see text] datasets, achieving accuracies above 95%, higher than those of existing predictors on both datasets. This indicates that the proposed predictor could be a beneficial tool for precisely and rapidly identifying THPs and could contribute to cutting-edge cancer treatment strategies and pharmaceuticals.
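THP-DF validates its support vector machine with 10-fold cross-validation. A minimal standard-library sketch of generating the fold indices is shown below. Stratifying by class label and the fixed random seed are assumptions, since the abstract does not say how the folds were built, and the SVM training step itself is omitted.

```python
import random
from collections import defaultdict
from typing import Hashable, Iterator, List, Sequence, Tuple


def stratified_kfold(
    labels: Sequence[Hashable], k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k folds that preserve class proportions."""
    if k < 2:
        raise ValueError("k must be at least 2")
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class, key=str):  # deterministic class order
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets about len(indices) / k of this class;
        # the running offset spreads leftover samples across different folds.
        for pos, idx in enumerate(indices):
            folds[(pos + offset) % k].append(idx)
        offset += len(indices)
    for test in folds:
        test_set = set(test)
        train = [i for i in range(len(labels)) if i not in test_set]
        yield train, sorted(test)


if __name__ == "__main__":
    # Hypothetical labels: 1 = THP, 0 = non-THP.
    labels = [0] * 50 + [1] * 30
    for train, test in stratified_kfold(labels, k=10):
        print(len(train), len(test), sum(labels[i] for i in test))
```

Each (train, test) pair would then be used to fit and score the classifier, and averaging the ten fold scores gives the cross-validated estimate.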
Affiliation(s)
- Roha Arif
  - School of Systems and Technology, University of Management and Technology, Lahore, 54782, Pakistan
- Sameera Kanwal
  - School of Systems and Technology, University of Management and Technology, Lahore, 54782, Pakistan
- Saeed Ahmed
  - School of Systems and Technology, University of Management and Technology, Lahore, 54782, Pakistan
- Muhammad Kabir
  - School of Systems and Technology, University of Management and Technology, Lahore, 54782, Pakistan
4
Khalid M, Ali F, Alghamdi W, Alzahrani A, Alsini R, Alzahrani A. An ensemble computational model for prediction of clathrin protein by coupling machine learning with discrete cosine transform. J Biomol Struct Dyn 2024:1-9. [PMID: 38498362 DOI: 10.1080/07391102.2024.2329777] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 10/09/2023] [Accepted: 02/19/2024] [Indexed: 03/20/2024]
Abstract
Clathrin protein (CP) plays a pivotal role in numerous cellular processes, including endocytosis, signal transduction, and neuronal function. Dysregulation of CP has been associated with a spectrum of diseases. Given its involvement in various cellular functions, CP has garnered significant attention for its potential applications in drug design and medicine, ranging from targeted drug delivery to addressing viral infections, neurological disorders, and cancer. The accurate identification of CP is crucial for unraveling its function and devising novel therapeutic strategies. Computational methods offer a rapid, cost-effective, and less labor-intensive alternative to traditional identification methods, making them especially appealing for high-throughput screening. This paper introduces CL-Pred, a novel computational method for CP identification. CL-Pred leverages three feature descriptors: Dipeptide Deviation from Expected Mean (DDE), Bigram Position Specific Scoring Matrix (BiPSSM), and Position Specific Scoring Matrix-Tetra Slice-Discrete Cosine Transform (PSSM-TS-DCT). The model is trained using three classifiers: Support Vector Machine (SVM), Extremely Randomized Tree (ERT), and Light eXtreme Gradient Boosting (LiXGB). Notably, the LiXGB-based model achieves outstanding performance, demonstrating accuracies of 94.63% and 93.65% on the training and testing datasets, respectively. The proposed CL-Pred method is poised to significantly advance our comprehension of clathrin-mediated endocytosis, cellular physiology, and disease pathogenesis. Furthermore, it holds promise for identifying potential drug targets across a spectrum of diseases.
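Of the three CL-Pred descriptors, DDE can be computed from the sequence alone, whereas the PSSM-based descriptors require a position-specific scoring matrix (typically generated with PSI-BLAST). The sketch below follows the usual DDE definition, which we assume CL-Pred uses: for each of the 400 dipeptides rs, the observed composition Dc(r,s) = N_rs / (N - 1) is compared with a theoretical mean Tm(r,s) = (C_r / C_N)(C_s / C_N), where C_r is the number of codons encoding amino acid r and C_N = 61 sense codons, and the difference is scaled by the theoretical variance Tv(r,s) = Tm(r,s)(1 - Tm(r,s)) / (N - 1). The helper name `dde_features` is hypothetical.

```python
import math
from itertools import product
from typing import Dict, List

# Number of codons encoding each standard amino acid (standard genetic code, 61 sense codons).
CODON_COUNTS = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3, "K": 2, "L": 6,
    "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6, "T": 4, "V": 4, "W": 1, "Y": 2,
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
SENSE_CODONS = sum(CODON_COUNTS.values())  # 61


def dde_features(sequence: str) -> List[float]:
    """400-dimensional Dipeptide Deviation from Expected Mean vector, in AA, AC, ..., YY order."""
    seq = sequence.upper()
    if len(seq) < 2:
        raise ValueError("need at least one dipeptide")
    unknown = set(seq) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    n_pairs = len(seq) - 1
    counts: Dict[str, int] = {}
    for a, b in zip(seq, seq[1:]):
        counts[a + b] = counts.get(a + b, 0) + 1
    features = []
    for r, s in product(AMINO_ACIDS, repeat=2):
        dc = counts.get(r + s, 0) / n_pairs  # observed dipeptide composition
        tm = (CODON_COUNTS[r] / SENSE_CODONS) * (CODON_COUNTS[s] / SENSE_CODONS)  # theoretical mean
        tv = tm * (1 - tm) / n_pairs  # theoretical variance
        features.append((dc - tm) / math.sqrt(tv))
    return features


if __name__ == "__main__":
    vec = dde_features("MKVLAAGIVG")  # hypothetical peptide
    print(len(vec), [round(v, 3) for v in vec[:5]])
```

The resulting 400 values could be concatenated with other descriptors before training a classifier such as the SVM, ERT, or LiXGB models named above.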
Affiliation(s)
- Majdi Khalid
  - Department of Computer Science and Artificial Intelligence, College of Computing, Umm Al-Qura University, Makkah, Saudi Arabia
- Farman Ali
  - Sarhad University of Science and Information Technology Peshawar, Mardan Campus, Mardan, Pakistan
- Wajdi Alghamdi
  - Department of Information Technology, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Abdulrahman Alzahrani
  - Department of Information System and Technology, College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
- Raed Alsini
  - Department of Information Systems, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Ahmed Alzahrani
  - College of Computer Science and Engineering, University of Jeddah, Jeddah, Saudi Arabia
5
Ramazi S, Tabatabaei SAH, Khalili E, Nia AG, Motarjem K. Analysis and review of techniques and tools based on machine learning and deep learning for prediction of lysine malonylation sites in protein sequences. Database (Oxford) 2024; 2024:baad094. [PMID: 38245002 PMCID: PMC10799748 DOI: 10.1093/database/baad094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/17/2023] [Revised: 11/30/2023] [Accepted: 12/20/2023] [Indexed: 01/22/2024]
Abstract
Post-translational modifications are crucial molecular mechanisms for regulating diverse cellular processes. Malonylation of proteins, a reversible post-translational modification of lysine (K) residues, is linked to a variety of biological functions, such as cellular regulation and pathogenesis. This modification plays a crucial role in metabolic pathways, mitochondrial functions, fatty acid oxidation, and other life processes. Accurately identifying malonylation sites is crucial for understanding the molecular mechanism of malonylation, but experimental identification can be challenging and costly. Recently, approaches based on machine learning (ML) have been proposed to address this issue, and they have been shown to improve accuracy while reducing cost and time. However, these approaches also have specific shortcomings, including inappropriate feature extraction from protein sequences, high-dimensional features, and inefficient underlying classifiers. As a result, there is an urgent need for effective predictors and computational methods. In this study, we provide a comprehensive analysis and review of existing prediction models, tools, and benchmark datasets for predicting malonylation sites in protein sequences, followed by a comparison study. The review covers the specifications of benchmark datasets, explanations of features and encoding methods, descriptions of the prediction approaches and their underlying ML or deep learning models, and a description and comparison of the existing tools in this domain. To evaluate and compare the prediction capability of the tools, a new dataset was extracted from the most up-to-date database, and the tools were assessed on the extracted data. Finally, a hybrid architecture consisting of several classifiers, including classical ML models and a deep learning model, is proposed to ensemble the prediction results. This approach performs better than all the prediction tools included in this study (the source codes of the models presented in this manuscript are available at https://github.com/Malonylation). Database URL: https://github.com/A-Golshan/Malonylation.
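The abstract does not specify how the hybrid architecture combines its classifiers' outputs. One common option, shown here purely as an assumption, is weighted soft voting: average each model's predicted probability that a candidate lysine site is malonylated, then threshold the mean. The function name `soft_vote`, the probabilities, and the weights below are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def soft_vote(
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    threshold: float = 0.5,
) -> Tuple[List[float], List[int]]:
    """Combine per-model positive-class probabilities by weighted averaging.

    model_probs[m][i] is model m's probability that site i is malonylated.
    Returns the averaged probabilities and the 0/1 labels at `threshold`.
    """
    if not model_probs:
        raise ValueError("need at least one model")
    n_sites = len(model_probs[0])
    if any(len(p) != n_sites for p in model_probs):
        raise ValueError("all models must score the same sites")
    if weights is None:
        weights = [1.0] * len(model_probs)
    if len(weights) != len(model_probs) or sum(weights) <= 0:
        raise ValueError("weights must match the models and have a positive sum")
    total_w = sum(weights)
    averaged = [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total_w
        for i in range(n_sites)
    ]
    labels = [1 if p >= threshold else 0 for p in averaged]
    return averaged, labels


if __name__ == "__main__":
    # Hypothetical outputs of three classifiers (e.g. two classical ML models and one deep model).
    probs = [[0.9, 0.2, 0.6], [0.7, 0.4, 0.3], [0.2, 0.3, 0.8]]
    print(soft_vote(probs))
```

Soft voting keeps each model's confidence rather than only its hard vote; alternatives such as majority voting or a stacked meta-classifier would slot into the same place.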
Affiliation(s)
- Seyed Amir Hossein Tabatabaei
  - Department of Computer Science, Faculty of Mathematical Sciences, University of Guilan, Namjoo St. Postal, Rasht 41938-33697, Iran
  - Department of Biophysics, Faculty of Biological Sciences, Tarbiat Modares University, Jalal AleAhmad, Tehran 14117-13116, Iran
- Elham Khalili
  - Department of Plant Sciences, Faculty of Science, Tarbiat Modares University, Jalal AleAhmad, Tehran 14117-13116, Iran
- Amirhossein Golshan Nia
  - Department of Mathematics and Computer Science, Amirkabir University of Technology, No. 350, Hafez Ave, Tehran 15916-34311, Iran
- Kiomars Motarjem
  - Department of Statistics, Faculty of Mathematical Sciences, Tarbiat Modares University, Jalal AleAhmad, Tehran 14117-13116, Iran