1
Abbas M, Sahibzada KI, Shahid S, Yousaf N, Hu Y, Wei DQ. ABP-Xplorer: A Machine Learning Approach for Prediction of Antibacterial Peptides Targeting Mycobacterium abscessus-tRNA-Methyltransferase (TrmD). J Chem Inf Model 2025. [PMID: 40377983 DOI: 10.1021/acs.jcim.5c00663] [Indexed: 05/18/2025]
Abstract
Mycobacterium abscessus (MAB) infections pose a significant treatment challenge because of their intrinsic resistance to antibiotics, requiring prolonged multidrug regimens that have limited success and frequent relapses. tRNA (m1G37) methyltransferase (TrmD), an enzyme essential for maintaining the reading frame during protein synthesis in MAB and other mycobacteria, is a potential therapeutic target for identifying new inhibitors. This study introduces ABP-Xplorer, a machine learning (ML)-based model designed to predict the antibacterial potential of peptides targeting MAB-TrmD ribosomal sites. A systematic evaluation of 26 machine learning models identified the Random Forest (RF) classifier as the most effective, achieving 96% accuracy. To address data set imbalance and enhance predictive reliability, the Synthetic Minority Oversampling Technique (SMOTE) was applied, improving model generalization and reducing bias. An ABP-Xplorer Streamlit application was then developed to classify peptides as positive or negative antibacterial peptides (ABPs), enabling easy sequence input and classification based on predictive scoring. For validation, 12 positive peptides with high predictive scores were selected for molecular docking with HADDOCK. Docking analysis confirmed strong binding of the selected peptides to TrmD, with P1, P7, P8, and P9 as top candidates. Notably, P1 exhibited the best interaction with a HADDOCK score of -102.2, followed by P7 (-93.6) and P8 (-91.4), indicating their potential for further development as TrmD inhibitors. Moreover, Ramachandran plot analysis validated the structural reliability. Future research should focus on the experimental validation of these peptides and on optimizing their stability and bioavailability for therapeutic applications.
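The abstract names SMOTE as the oversampling step but does not describe its configuration. As a rough illustration only, the core idea of SMOTE (placing a synthetic minority sample on the line segment between a minority point and one of its nearest minority neighbours) might be sketched as follows. The function name `smote_sketch`, the parameter `k`, and the toy feature vectors are assumptions made for this example and are not taken from the paper.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours.

    A simplified illustration of the SMOTE idea, not the authors' pipeline.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Made-up 2-D vectors standing in for peptide descriptors (hypothetical data).
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_sketch(minority, n_new=6)
```

Because every synthetic point is an interpolation between two real minority points, it always lies inside the bounding box of the minority class. In practice a maintained implementation would normally be preferred over a hand-written one.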
Affiliation(s)
- Munawar Abbas
  - College of Food Science and Technology, Henan University of Technology, Zhengzhou 450001, Henan, China
- Kashif Iqbal Sahibzada
  - College of Biological Engineering, Henan University of Technology, Zhengzhou 454001, Henan, P. R. China
  - Department of Health Professional Technologies, Faculty of Allied Health Sciences, The University of Lahore, Lahore 54570, Pakistan
- Shumaila Shahid
  - School of Biochemistry and Biotechnology, University of the Punjab, Lahore 54570, Pakistan
- Numan Yousaf
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P. R. China
- Yuansen Hu
  - College of Biological Engineering, Henan University of Technology, Zhengzhou 454001, Henan, P. R. China
- Dong-Qing Wei
  - College of Food Science and Technology, Henan University of Technology, Zhengzhou 450001, Henan, China
  - State Key Laboratory of Microbial Metabolism, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200030, P. R. China
  - Zhongjing Research and Industrialization Institute of Chinese Medicine, Zhongguancun Scientific Park, Meixi, Nanyang, Henan 473006, P. R. China
2
Ma Y, Lv H, Ma Y, Wang X, Lv L, Liang X, Wang L. Advancing preeclampsia prediction: a tailored machine learning pipeline integrating resampling and ensemble models for handling imbalanced medical data. BioData Min 2025; 18:25. [PMID: 40128863 PMCID: PMC11934807 DOI: 10.1186/s13040-025-00440-1] [Received: 12/25/2024] [Accepted: 03/12/2025] [Indexed: 03/26/2025] Open
Abstract
BACKGROUND: Constructing a predictive model is challenging for imbalanced medical datasets (such as those for preeclampsia), particularly when employing ensemble machine learning algorithms.
OBJECTIVE: This study aims to develop a robust pipeline that enhances the predictive performance of ensemble machine learning models for the early prediction of preeclampsia in an imbalanced dataset.
METHODS: Our research establishes a comprehensive pipeline optimized for early preeclampsia prediction in imbalanced medical datasets. We gathered electronic health records from pregnant women at the People's Hospital of Guangxi from 2015 to 2020, with additional external validation using three public datasets. This extensive data collection facilitated the systematic assessment of various resampling techniques, varied minority-to-majority ratios, and ensemble machine learning algorithms through a structured evaluation process. We analyzed 4,608 combinations of model settings against performance metrics such as G-mean, MCC, AP, and AUC to determine the most effective configurations. Statistical analyses including OLS regression, ANOVA, and Kruskal-Wallis tests were used to fine-tune these settings, enhancing model performance and robustness for clinical application.
RESULTS: Our analysis confirmed the significant impact of systematic sequential optimization of variables on the predictive performance of our models. The most effective configuration used the Inverse Weighted Gaussian Mixture Model for resampling, combined with the Gradient Boosting Decision Trees algorithm and an optimized minority-to-majority ratio of 0.09, achieving a geometric mean (G-mean) of 0.6694 (95% confidence interval: 0.5855-0.7557). This configuration significantly outperformed the baseline across all evaluated metrics.
CONCLUSIONS: This study establishes a robust pipeline that significantly enhances the predictive performance of models for preeclampsia within imbalanced datasets. Our findings underscore the importance of a strategic approach to variable optimization in medical diagnostics, offering potential for broad application in various medical contexts where class imbalance is a concern.
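Two of the metrics named above, G-mean and MCC, are standard functions of the confusion matrix, and the minority-to-majority ratio fixes how many minority samples a resampler should aim for. The sketch below shows those calculations under stated assumptions: the helper names and the confusion-matrix counts are invented for illustration and do not come from the study.

```python
import math

def gmean_and_mcc(tp, fp, tn, fn):
    """Return (G-mean, MCC) for a binary confusion matrix."""
    sensitivity = tp / (tp + fn)   # recall on the minority (positive) class
    specificity = tn / (tn + fp)   # recall on the majority (negative) class
    g_mean = math.sqrt(sensitivity * specificity)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return g_mean, mcc

def minority_target(n_majority, ratio):
    """Minority sample count implied by a minority-to-majority ratio."""
    return round(ratio * n_majority)

# Made-up counts for an imbalanced test set (hypothetical, not study data).
g_mean, mcc = gmean_and_mcc(tp=30, fp=100, tn=900, fn=20)
target = minority_target(n_majority=1000, ratio=0.09)
```

G-mean rewards a model only when it performs well on both classes, which is why it is often preferred over plain accuracy when positives are rare.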
Affiliation(s)
- Yinyao Ma
  - Department of Obstetrics, People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, 530016, China
- Yanhua Ma
  - Department of Obstetrics, People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, 530016, China
- Xuxia Liang
  - Department of Obstetrics, People's Hospital of Guangxi Zhuang Autonomous Region, Nanning, 530016, China
- Lei Wang
  - BGI Research, Wuhan, 430074, China
  - Guangdong Bigdata Engineering Technology Research Center for Life Sciences, BGI Research, Shenzhen, 518083, China
3
Huang MW, Tsai CF, Lin WC, Lin JY. Interaction effect between data discretization and data resampling for class-imbalanced medical datasets. Technol Health Care 2025; 33:1000-1013. [PMID: 40105161 DOI: 10.1177/09287329241295874] [Indexed: 03/20/2025]
Abstract
Background: Data discretization is an important preprocessing step in data mining that converts continuous feature values to discrete ones, which allows some data mining algorithms to construct more effective models and facilitates the data mining process. Because many medical domain datasets are class imbalanced, data resampling methods, including oversampling, undersampling, and hybrid sampling methods, have been widely applied to rebalance the training set, facilitating effective differentiation between majority and minority classes.
Objective: Herein, we examine the effect of incorporating both data discretization and data resampling as steps in the analytical process on classifier performance for class-imbalanced medical datasets. The order in which these two steps are carried out is compared in the experiments.
Methods: Two experimental studies were conducted, one based on 11 two-class imbalanced medical datasets and the other using 3 multiclass imbalanced medical datasets. The two discretization algorithms employed are ChiMerge and the minimum description length principle (MDLP). The data resampling algorithms chosen for performance comparison are Tomek links undersampling, synthetic minority oversampling technique (SMOTE) oversampling, and SMOTE-Tomek hybrid sampling. The support vector machine (SVM), C4.5 decision tree, and random forest (RF) techniques were used to examine the classification performance of the different approaches.
Results: On average, the combination approaches allow the classifiers to provide higher area under the ROC curve (AUC) rates than the best baseline approach, by approximately 0.8%-3.5% and 0.9%-2.5% for two-class and multiclass imbalanced medical datasets, respectively. In particular, the optimal results for two-class imbalanced datasets are obtained by performing the MDLP method first for data discretization and SMOTE second for oversampling, which provides the highest AUC rate and requires the least computational cost. For multiclass imbalanced datasets, performing SMOTE or SMOTE-Tomek first for data resampling and ChiMerge second for data discretization offers the best performance.
Conclusions: Classifiers with oversampling can provide better performance than the baseline method without oversampling. In contrast, performing data discretization does not necessarily make the classifiers outperform the baselines. On average, the combination approaches have the potential to allow the classifiers to provide higher AUC rates than the best baseline approach.
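Of the resampling methods listed, Tomek links undersampling has the simplest definition: two samples of different classes form a Tomek link when each is the other's nearest neighbour, and the majority-class member of each link is removed. The following is a minimal sketch of that definition using brute-force distances. The function names and the one-dimensional toy data are assumptions for illustration and say nothing about how the authors implemented it.

```python
def nearest(i, points):
    """Index of the nearest other point to points[i] (squared Euclidean distance)."""
    return min(
        (j for j in range(len(points)) if j != i),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(points[i], points[j])),
    )

def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbours
    with different class labels."""
    links = []
    for i in range(len(points)):
        j = nearest(i, points)
        if i < j and labels[i] != labels[j] and nearest(j, points) == i:
            links.append((i, j))
    return links

# Made-up 1-D data: label 0 is the majority class, label 1 the minority class.
points = [[0.0], [0.1], [1.0], [1.05], [2.0]]
labels = [0, 0, 0, 1, 1]

links = tomek_links(points, labels)
# Undersampling drops the majority-class member of each link.
majority_to_drop = [i if labels[i] == 0 else j for i, j in links]
```

Here the points at 1.0 (majority) and 1.05 (minority) are mutual nearest neighbours, so the majority point would be removed to clean the class boundary.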
Affiliation(s)
- Min-Wei Huang
  - Kaohsiung Municipal Kai-Syuan Psychiatric Hospital, Kaohsiung
  - Department of Physical Therapy and Graduate Institute of Rehabilitation Science, China Medical University, Taichung
  - School of Medicine, College of Medicine, National Sun Yat-sen University, Kaohsiung
- Chih-Fong Tsai
  - Department of Information Management, National Central University, Taoyuan
- Wei-Chao Lin
  - Department of Information Management, Chang Gung University, Taoyuan
  - Department of Digital Financial Technology, Chang Gung University, Taoyuan
  - Division of Thoracic Surgery, Chang Gung Memorial Hospital at Linkou, Taoyuan
- Jia-Yang Lin
  - Department of Information Management, National Central University, Taoyuan
4
Cruz EO, Sakowitz S, Mallick S, Le N, Chervu N, Bakhtiyar SS, Benharash P. Application of machine learning to predict in-hospital mortality after transcatheter mitral valve repair. Surgery 2024; 176:1442-1449. [PMID: 39122592 DOI: 10.1016/j.surg.2024.07.011] [Received: 02/19/2024] [Revised: 06/14/2024] [Accepted: 07/03/2024] [Indexed: 08/12/2024]
Abstract
INTRODUCTION: Transcatheter mitral valve repair offers a minimally invasive treatment option for patients at high risk for traditional open repair. We sought to develop dynamic machine-learning risk prediction models for in-hospital mortality after transcatheter mitral valve repair using a national cohort.
METHODS: All adult hospitalization records involving transcatheter mitral valve repair were identified in the 2016-2020 Nationwide Readmissions Database. Because of the initial class imbalance, undersampling of the majority class and subsequent oversampling of the minority class using the Synthetic Minority Oversampling Technique (SMOTE) were applied within each cross-validation training fold. Machine-learning models were trained to predict patient mortality after transcatheter mitral valve repair and compared with traditional logistic regression. Shapley additive explanations plots were also developed to understand the relative impact of each feature used for training.
RESULTS: Among 2,450 patients included for analysis, the in-hospital mortality rate was 1.8%. Naïve Bayes and random forest models were the best at predicting mortality after transcatheter mitral valve repair, with an area under the receiver operating characteristic curve of 0.83 ± 0.05 and 0.82 ± 0.04, respectively. Both models demonstrated superior ability to predict mortality relative to logistic regression (P < .001 for both). Medicare insurance coverage, comorbid liver disease, congestive heart failure, renal failure, and previous coronary artery bypass grafting were associated with greater predicted likelihood of in-hospital mortality, whereas elective surgery and private insurance coverage were linked with lower odds of mortality.
CONCLUSION: Machine-learning models significantly outperformed traditional regression methods in predicting in-hospital mortality after transcatheter mitral valve repair. Furthermore, we identified key patient factors and comorbidities linked with greater postoperative mortality. Future work and clinical validation are warranted to continue improving risk assessment in transcatheter mitral valve repair.
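The methods state that resampling was applied within each cross-validation training fold, which keeps the held-out fold at its natural class balance and free of resampled rows. A minimal sketch of that arrangement follows. It substitutes random duplication for SMOTE at the oversampling step to stay dependency-free, and the helper name, fold scheme, `majority_keep` value, and toy labels are all assumptions for illustration.

```python
import random

def resample_training_fold(train_idx, labels, majority_keep, seed=0):
    """Undersample the majority class, then oversample the minority class."""
    rng = random.Random(seed)
    minority = [i for i in train_idx if labels[i] == 1]
    majority = [i for i in train_idx if labels[i] == 0]
    if not minority:
        raise ValueError("training fold contains no minority cases")
    # Step 1: keep a random subset of the majority class.
    majority = rng.sample(majority, min(majority_keep, len(majority)))
    # Step 2: top up the minority class to the same size.
    # Random duplication is a stand-in here; the study used SMOTE.
    n_extra = max(0, len(majority) - len(minority))
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return majority + minority + extra

# Made-up labels with 10% positives (hypothetical, not study data).
labels = [1 if i % 10 == 0 else 0 for i in range(100)]
k = 3
folds = [list(range(start, len(labels), k)) for start in range(k)]

fold_summaries = []
for test_idx in folds:
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    balanced = resample_training_fold(train_idx, labels, majority_keep=30)
    n_pos = sum(labels[i] for i in balanced)
    fold_summaries.append({
        "train_pos": n_pos,
        "train_neg": len(balanced) - n_pos,
        "leaks_test_rows": bool(held_out & set(balanced)),
    })
```

Resampling before splitting would let copies or interpolations of test rows enter the training data, which tends to inflate the reported performance.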
Affiliation(s)
- Emma O Cruz
  - Division of Cardiac Surgery, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA; Department of Computer Science, Stanford University, Palo Alto, CA
- Sara Sakowitz
  - Division of Cardiac Surgery, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA. https://www.twitter.com/sarasakowitz
- Saad Mallick
  - Division of Cardiac Surgery, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
- Nguyen Le
  - Division of Cardiac Surgery, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
- Nikhil Chervu
  - Division of Cardiac Surgery, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
- Syed Shahyan Bakhtiyar
  - Division of Cardiac Surgery, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA; Department of Surgery, University of Colorado Denver, Aurora, CO. https://www.twitter.com/Aortologist
- Peyman Benharash
  - Division of Cardiac Surgery, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA
5
Adeoye J, Su YX. Leveraging artificial intelligence for perioperative cancer risk assessment of oral potentially malignant disorders. Int J Surg 2024; 110:1677-1686. [PMID: 38051932 PMCID: PMC10942172 DOI: 10.1097/js9.0000000000000979] [Received: 08/12/2023] [Accepted: 11/21/2023] [Indexed: 12/07/2023]
Abstract
Oral potentially malignant disorders (OPMDs) are mucosal conditions with an inherent disposition to develop oral squamous cell carcinoma. Surgical management is the preferred strategy to prevent malignant transformation in OPMDs, and surgical approaches to treatment include conventional scalpel excision, laser surgery, cryotherapy, and photodynamic therapy. However, since not all patients with OPMDs will develop oral squamous cell carcinoma in their lifetime, there is a need to stratify patients according to their risk of malignant transformation so that surgical intervention can be directed to those at highest risk. Artificial intelligence (AI) has the potential to integrate the disparate factors influencing malignant transformation and to provide more robust, precise, and personalized cancer risk stratification of OPMD patients than current methods when determining the need for surgical resection, excision, or re-excision. Therefore, this article overviews existing AI models and tools, presents a clinical implementation pathway, and discusses the refinements needed to aid the clinical application of AI-based platforms for cancer risk stratification of OPMDs in surgical practice.
Affiliation(s)
- Yu-Xiong Su
  - Division of Oral and Maxillofacial Surgery, Faculty of Dentistry, University of Hong Kong, Hong Kong SAR, People’s Republic of China
6
Ikuta S, Fujikawa M, Nakajima T, Kasai M, Aihara T, Yamanaka N. Machine learning approach to predict postpancreatectomy hemorrhage following pancreaticoduodenectomy: a retrospective study. Langenbecks Arch Surg 2024; 409:29. [PMID: 38183456 DOI: 10.1007/s00423-023-03223-6] [Received: 08/18/2023] [Accepted: 12/29/2023] [Indexed: 01/08/2024]
Abstract
BACKGROUND: Postpancreatectomy hemorrhage (PPH) is a rare yet dreaded complication following pancreaticoduodenectomy (PD). This retrospective study aimed to explore a machine learning (ML) model for predicting PPH in PD patients.
METHODS: A total of 284 patients who underwent open PD at our institute were included in the analysis. To address the issue of imbalanced data, the adaptive synthetic sampling (ADASYN) technique was employed. The best-performing ML model was selected using the PyCaret library in Python and evaluated based on recall, precision, and F1 score metrics. In addition to assessing the model's performance on the test data, bootstrap validation (n = 1000) with the original dataset was conducted.
RESULTS: PPH occurred in 11 patients (3.9%), with a median onset time of 22 days postoperatively. These minority cases were oversampled to 85 using ADASYN. The extra trees classifier demonstrated superior performance with recall, precision, and F1 score of 0.967, 0.914, and 0.937, respectively. Both validation using the test data and bootstrap resampling consistently demonstrated recall, precision, and F1 score exceeding 0.9. The model identified the peak value of C-reactive protein during the first 7 postoperative days as the most significant feature, followed by the preoperative neutrophil-to-lymphocyte ratio.
CONCLUSIONS: This study highlights the potential of the ML approach to predict PPH occurrence following PD. Vigilance and early interventions guided by such model predictions could positively impact outcomes for high-risk patients.
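The abstract reports recall, precision, and F1 score together with a bootstrap validation of 1000 resamples, without detailing the procedure. One common reading, shown here purely as an assumption, is to resample the labelled cases with replacement and recompute the metric each time to obtain a percentile interval. The function names, toy labels, and predictions below are invented for this sketch.

```python
import random

def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

def bootstrap_f1_interval(y_true, y_pred, n_boot=1000, seed=0):
    """Percentile interval (roughly 95%) for F1 from case resampling."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample cases with replacement
        sample_true = [y_true[i] for i in idx]
        sample_pred = [y_pred[i] for i in idx]
        scores.append(precision_recall_f1(sample_true, sample_pred)[2])
    scores.sort()
    return scores[int(0.025 * n_boot)], scores[int(0.975 * n_boot) - 1]

# Made-up labels and predictions (hypothetical, not study data).
y_true = [1] * 10 + [0] * 90
y_pred = [1] * 9 + [0] * 1 + [1] * 2 + [0] * 88

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
ci_low, ci_high = bootstrap_f1_interval(y_true, y_pred)
```

With very few positive cases, as in a cohort with 11 events, the bootstrap interval is typically wide, which is one reason to report it alongside the point estimate.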
Affiliation(s)
- Shinichi Ikuta
  - Department of Surgery, Meiwa Hospital, 4-31 Agenaruo, Nishinomiya, Hyogo, 663-8186, Japan
- Masataka Fujikawa
  - Department of Surgery, Meiwa Hospital, 4-31 Agenaruo, Nishinomiya, Hyogo, 663-8186, Japan
- Takayoshi Nakajima
  - Department of Surgery, Meiwa Hospital, 4-31 Agenaruo, Nishinomiya, Hyogo, 663-8186, Japan
- Meidai Kasai
  - Department of Surgery, Meiwa Hospital, 4-31 Agenaruo, Nishinomiya, Hyogo, 663-8186, Japan
- Tsukasa Aihara
  - Department of Surgery, Meiwa Hospital, 4-31 Agenaruo, Nishinomiya, Hyogo, 663-8186, Japan
- Naoki Yamanaka
  - Department of Surgery, Meiwa Hospital, 4-31 Agenaruo, Nishinomiya, Hyogo, 663-8186, Japan
7
Liu P, Sun Y, Zhao X, Yan Y. Deep learning algorithm performance in contouring head and neck organs at risk: a systematic review and single-arm meta-analysis. Biomed Eng Online 2023; 22:104. [PMID: 37915046 PMCID: PMC10621161 DOI: 10.1186/s12938-023-01159-y] [Received: 07/14/2023] [Accepted: 09/21/2023] [Indexed: 11/03/2023] Open
Abstract
PURPOSE: The contouring of organs at risk (OARs) in head and neck cancer radiation treatment planning is a crucial yet repetitive and time-consuming process. Recent studies have applied deep learning (DL) algorithms to automatically contour head and neck OARs. This study conducts a systematic review and meta-analysis to summarize and analyze the performance of DL algorithms in contouring head and neck OARs, with the objective of assessing their advantages and limitations.
METHODS: A literature search of the PubMed, Embase, and Cochrane Library databases identified studies on DL contouring of head and neck OARs. The Dice similarity coefficient (DSC) values for four categories of OARs reported in each study were selected as effect sizes for the meta-analysis. Furthermore, a subgroup analysis of OARs by image modality and image type was conducted.
RESULTS: A total of 149 articles were retrieved, and 22 studies were included in the meta-analysis after excluding duplicate literature, primary screening, and re-screening. The combined effect sizes of DSC for brainstem, spinal cord, mandible, left eye, right eye, left optic nerve, right optic nerve, optic chiasm, left parotid, right parotid, left submandibular, and right submandibular are 0.87, 0.83, 0.92, 0.90, 0.90, 0.71, 0.74, 0.62, 0.85, 0.85, 0.82, and 0.82, respectively. For subgroup analysis, the combined effect sizes for segmentation of the brainstem, mandible, left optic nerve, and left parotid gland using CT and MRI images are 0.86/0.92, 0.92/0.90, 0.71/0.73, and 0.84/0.87, respectively. Pooled effect sizes for contouring the brainstem, mandible, left optic nerve, and left parotid gland using 2D and 3D images are 0.88/0.87, 0.92/0.92, 0.75/0.71, and 0.87/0.85, respectively.
CONCLUSIONS: Automated contouring based on DL algorithms is an essential tool for contouring head and neck OARs, achieving high accuracy, reducing the workload of clinical radiation oncologists, and providing individualized, standardized, and refined treatment plans for implementing "precision radiotherapy". Improving DL performance requires the construction of high-quality data sets and further algorithm optimization and innovation.
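The effect size pooled in this meta-analysis, the Dice similarity coefficient, measures the overlap between an automatic contour and a reference contour as twice the shared area divided by the sum of the two areas. A minimal sketch for binary masks follows. The function name and the tiny example masks are illustrative assumptions, not data from the reviewed studies.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks (nested lists)."""
    a = {(r, c) for r, row in enumerate(mask_a) for c, v in enumerate(row) if v}
    b = {(r, c) for r, row in enumerate(mask_b) for c, v in enumerate(row) if v}
    if not a and not b:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))

# Made-up 3x4 masks: a reference contour and an automatic contour.
reference = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
automatic = [
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]

score = dice_coefficient(reference, automatic)  # 3 shared pixels, 4 + 4 total
```

Small structures such as the optic chiasm have few pixels, so a one-pixel disagreement costs proportionally more overlap, which is consistent with the lower pooled DSC values reported for them.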
Affiliation(s)
- Peiru Liu
  - General Hospital of Northern Theater Command, Department of Radiation Oncology, Shenyang, China
  - Beifang Hospital of China Medical University, Shenyang, China
- Ying Sun
  - General Hospital of Northern Theater Command, Department of Radiation Oncology, Shenyang, China
- Xinzhuo Zhao
  - Shenyang University of Technology, School of Electrical Engineering, Shenyang, China
- Ying Yan
  - General Hospital of Northern Theater Command, Department of Radiation Oncology, Shenyang, China