Reference Citation Analysis

For: Welvaars K, Oosterhoff JHF, van den Bekerom MPJ, Doornberg JN, van Haarst EP, OLVG Urology Consortium, and the Machine Learning Consortium (van der Zee JA, van Andel GA, Lagerveld BW, Hovius MC, Kauer PC, Boevé LMS, van der Kuit A, Mallee W, Poolman R). Implications of resampling data to address the class imbalance problem (IRCIP): an evaluation of impact on performance between classification algorithms in medical data. JAMIA Open 2023;6:ooad033. [PMID: 37266187 PMCID: PMC10232287 DOI: 10.1093/jamiaopen/ooad033] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 02/03/2023] [Revised: 04/04/2023] [Accepted: 05/11/2023] [Indexed: 06/03/2023] Open

Cited by Other Article(s)

Abbas M, Sahibzada KI, Shahid S, Yousaf N, Hu Y, Wei DQ. ABP-Xplorer: A Machine Learning Approach for Prediction of Antibacterial Peptides Targeting Mycobacterium abscessus-tRNA-Methyltransferase (TrmD). J Chem Inf Model 2025. [PMID: 40377983 DOI: 10.1021/acs.jcim.5c00663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/18/2025]

Ma Y, Lv H, Ma Y, Wang X, Lv L, Liang X, Wang L. Advancing preeclampsia prediction: a tailored machine learning pipeline integrating resampling and ensemble models for handling imbalanced medical data. BioData Min 2025;18:25. [PMID: 40128863 PMCID: PMC11934807 DOI: 10.1186/s13040-025-00440-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2024] [Accepted: 03/12/2025] [Indexed: 03/26/2025] Open

Abstract

BACKGROUND

Constructing a predictive model is challenging with imbalanced medical datasets (such as preeclampsia data), particularly when employing ensemble machine learning algorithms.

OBJECTIVE

This study aims to develop a robust pipeline that enhances the predictive performance of ensemble machine learning models for the early prediction of preeclampsia in an imbalanced dataset.

METHODS

Our research establishes a comprehensive pipeline optimized for early preeclampsia prediction in imbalanced medical datasets. We gathered electronic health records from pregnant women at the People's Hospital of Guangxi from 2015 to 2020, with additional external validation using three public datasets. This extensive data collection facilitated the systematic assessment of various resampling techniques, varied minority-to-majority ratios, and ensemble machine learning algorithms through a structured evaluation process. We analyzed 4,608 combinations of model settings against performance metrics such as G-mean, MCC, AP, and AUC to determine the most effective configurations. Advanced statistical analyses including OLS regression, ANOVA, and Kruskal-Wallis tests were utilized to fine-tune these settings, enhancing model performance and robustness for clinical application.

RESULTS

Our analysis confirmed the significant impact of systematic sequential optimization of variables on the predictive performance of our models. The most effective configuration utilized the Inverse Weighted Gaussian Mixture Model for resampling, combined with Gradient Boosting Decision Trees algorithm, and an optimized minority-to-majority ratio of 0.09, achieving a Geometric Mean of 0.6694 (95% confidence interval: 0.5855-0.7557). This configuration significantly outperformed the baseline across all evaluated metrics, demonstrating substantial improvements in model performance.

CONCLUSIONS

This study establishes a robust pipeline that significantly enhances the predictive performance of models for preeclampsia within imbalanced datasets. Our findings underscore the importance of a strategic approach to variable optimization in medical diagnostics, offering potential for broad application in various medical contexts where class imbalance is a concern.
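
The abstract above reports G-mean and MCC, two metrics that stay informative when one class is rare. The following is a minimal sketch of how they are computed from binary labels, using the standard definitions. The function names and the toy labels are illustrative assumptions and are not taken from the cited study.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count true/false positives/negatives for binary labels (1 = minority)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

def mcc(y_true, y_pred):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy data (hypothetical): 2 positives among 10 samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_model = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]   # finds one positive, one false alarm
y_all_negative = [0] * 10                   # always predicts the majority class

score_g = g_mean(y_true, y_model)
score_mcc = mcc(y_true, y_model)
trivial_g = g_mean(y_true, y_all_negative)
trivial_mcc = mcc(y_true, y_all_negative)
```

Both toy predictors are 80% accurate, yet the all-negative predictor scores zero on G-mean and MCC, which is why such metrics are preferred over accuracy for imbalanced data.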

Huang MW, Tsai CF, Lin WC, Lin JY. Interaction effect between data discretization and data resampling for class-imbalanced medical datasets. Technol Health Care 2025;33:1000-1013. [PMID: 40105161 DOI: 10.1177/09287329241295874] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/20/2025]

Abstract

BACKGROUND

Data discretization is an important preprocessing step in data mining for the transfer of continuous feature values to discrete ones, which allows some specific data mining algorithms to construct more effective models and facilitates the data mining process. Because many medical domain datasets are class imbalanced, data resampling methods, including oversampling, undersampling, and hybrid sampling methods, have been widely applied to rebalance the training set, facilitating effective differentiation between majority and minority classes.

OBJECTIVE

Herein, we examine the effect of incorporating both data discretization and data resampling as steps in the analytical process on the classifier performance for class-imbalanced medical datasets. The order in which these two steps are carried out is compared in the experiments.

METHODS

Two experimental studies were conducted, one based on 11 two-class imbalanced medical datasets and the other using 3 multiclass imbalanced medical datasets. The two discretization algorithms employed are ChiMerge and the minimum description length principle (MDLP). The data resampling algorithms chosen for performance comparison are Tomek links undersampling, synthetic minority oversampling technique (SMOTE) oversampling, and SMOTE-Tomek hybrid sampling. The support vector machine (SVM), C4.5 decision tree, and random forest (RF) techniques were used to examine the classification performances of the different approaches.

RESULTS

The results show that, on average, the combination approaches allow the classifiers to provide higher area under the ROC curve (AUC) rates than the best baseline approach, by approximately 0.8%-3.5% and 0.9%-2.5% for two-class and multiclass imbalanced medical datasets, respectively. In particular, the optimal results for two-class imbalanced datasets are obtained by performing the MDLP method first for data discretization and SMOTE second for oversampling, providing the highest AUC rate and requiring the least computational cost. For multiclass imbalanced datasets, performing SMOTE or SMOTE-Tomek first for data resampling and ChiMerge second for data discretization offers the best performances.

CONCLUSIONS

Classifiers with oversampling can provide better performances than the baseline method without oversampling. In contrast, performing data discretization does not necessarily make the classifiers outperform the baselines. On average, the combination approaches have the potential to allow the classifiers to provide higher AUC rates than the best baseline approach.
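
SMOTE, named in the abstract above, creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The following is a minimal sketch of that core interpolation step only, assuming numeric features and Euclidean distance. The function name `smote_like`, its parameters, and the toy points are hypothetical, and this is not the implementation used in the cited study.

```python
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points from a list of minority feature tuples.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for p in minority if p is not x]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        neighbour = rng.choice(others[:k])
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(tuple(a + u * (b - a) for a, b in zip(x, neighbour)))
    return synthetic

# Toy minority class (hypothetical values) with two continuous features.
minority_points = [(1.0, 2.0), (1.5, 2.5), (2.0, 2.0), (1.2, 1.8)]
new_points = smote_like(minority_points, n_new=5)
```

Because the synthetic values are continuous, applying a discretizer before or after this step can yield different bins, which is one reason the ordering of the two steps is worth comparing.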

Cruz EO, Sakowitz S, Mallick S, Le N, Chervu N, Bakhtiyar SS, Benharash P. Application of machine learning to predict in-hospital mortality after transcatheter mitral valve repair. Surgery 2024;176:1442-1449. [PMID: 39122592 DOI: 10.1016/j.surg.2024.07.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/19/2024] [Revised: 06/14/2024] [Accepted: 07/03/2024] [Indexed: 08/12/2024]

Abstract

INTRODUCTION

Transcatheter mitral valve repair offers a minimally invasive treatment option for patients at high risk for traditional open repair. We sought to develop dynamic machine-learning risk prediction models for in-hospital mortality after transcatheter mitral valve repair using a national cohort.

METHODS

All adult hospitalization records involving transcatheter mitral valve repair were identified in the 2016-2020 Nationwide Readmissions Database. As a result of initial class imbalance, undersampling of the majority class and subsequent oversampling of the minority class using Synthetic Minority Oversampling TEchnique were employed in each cross-validation training fold. Machine-learning models were trained to predict patient mortality after transcatheter mitral valve repair and compared with traditional logistic regression. Shapley additive explanations plots were also developed to understand the relative impact of each feature used for training.

RESULTS

Among 2,450 patients included for analysis, the in-hospital mortality rate was 1.8%. Naïve Bayes and random forest models were the best at predicting transcatheter mitral valve repair postoperative mortality, with an area under the receiver operating characteristic curve of 0.83 ± 0.05 and 0.82 ± 0.04, respectively. Both models demonstrated superior ability to predict mortality relative to logistic regression (P < .001 for both). Medicare insurance coverage, comorbid liver disease, congestive heart failure, renal failure, and previous coronary artery bypass grafting were associated with greater predicted likelihood of in-hospital mortality, whereas elective surgery and private insurance coverage were linked with lower odds of mortality.

CONCLUSION

Machine-learning models significantly outperformed traditional regression methods in predicting in-hospital mortality after transcatheter mitral valve repair. Furthermore, we identified key patient factors and comorbidities linked with greater postoperative mortality. Future work and clinical validation are warranted to continue improving risk assessment in transcatheter mitral valve repair.
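
The methods of the abstract above apply resampling inside each cross-validation training fold, so that held-out folds keep the original class balance. The following is a minimal sketch of that pattern under stated assumptions: the helper names, the 10% toy prevalence, and `majority_ratio=2` are hypothetical, and random duplication stands in for SMOTE to keep the example dependency-free. It is not the cited study's pipeline.

```python
import random

def stratified_folds(labels, k, rng):
    """Split sample indices into k folds, preserving class proportions."""
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

def resample_training_fold(train_idx, labels, rng, majority_ratio=2):
    """Undersample the majority class, then oversample the minority class.

    Oversampling here is plain random duplication, a stand-in for SMOTE.
    """
    minority = [i for i in train_idx if labels[i] == 1]
    majority = [i for i in train_idx if labels[i] == 0]
    keep = min(len(majority), majority_ratio * len(minority))
    majority = rng.sample(majority, keep)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

rng = random.Random(0)
labels = [1] * 10 + [0] * 90          # toy data: 10% minority class
folds = stratified_folds(labels, 5, rng)

report = []
for f, test_idx in enumerate(folds):
    train_idx = [i for g, fold in enumerate(folds) if g != f for i in fold]
    resampled = resample_training_fold(train_idx, labels, rng)
    train_pos = sum(labels[i] for i in resampled)
    report.append({
        "train_pos": train_pos,
        "train_neg": len(resampled) - train_pos,
        "test_size": len(test_idx),
        "test_pos": sum(labels[i] for i in test_idx),
        "leak": len(set(resampled) & set(test_idx)),  # must stay 0
    })
```

Only the training indices are resampled. Each test fold is left untouched, so performance is estimated at the original prevalence and no duplicated or synthetic sample can leak into evaluation.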

Adeoye J, Su YX. Leveraging artificial intelligence for perioperative cancer risk assessment of oral potentially malignant disorders. Int J Surg 2024;110:1677-1686. [PMID: 38051932 PMCID: PMC10942172 DOI: 10.1097/js9.0000000000000979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/12/2023] [Accepted: 11/21/2023] [Indexed: 12/07/2023]

Ikuta S, Fujikawa M, Nakajima T, Kasai M, Aihara T, Yamanaka N. Machine learning approach to predict postpancreatectomy hemorrhage following pancreaticoduodenectomy: a retrospective study. Langenbecks Arch Surg 2024;409:29. [PMID: 38183456 DOI: 10.1007/s00423-023-03223-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2023] [Accepted: 12/29/2023] [Indexed: 01/08/2024]

Liu P, Sun Y, Zhao X, Yan Y. Deep learning algorithm performance in contouring head and neck organs at risk: a systematic review and single-arm meta-analysis. Biomed Eng Online 2023;22:104. [PMID: 37915046 PMCID: PMC10621161 DOI: 10.1186/s12938-023-01159-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2023] [Accepted: 09/21/2023] [Indexed: 11/03/2023] Open

Abstract

PURPOSE

The contouring of organs at risk (OARs) in head and neck cancer radiation treatment planning is a crucial, yet repetitive and time-consuming process. Recent studies have applied deep learning (DL) algorithms to automatically contour head and neck OARs. This study aims to conduct a systematic review and meta-analysis to summarize and analyze the performance of DL algorithms in contouring head and neck OARs. The objective is to assess the advantages and limitations of DL algorithms in contour planning of head and neck OARs.

METHODS

This study searched the PubMed, Embase, and Cochrane Library databases for studies on DL contouring of head and neck OARs. The Dice similarity coefficient (DSC) values reported for four categories of OARs in each study were selected as effect sizes for the meta-analysis. Furthermore, a subgroup analysis of OARs was conducted by image modality and image type.

RESULTS

A total of 149 articles were retrieved, and 22 studies were included in the meta-analysis after duplicate removal, primary screening, and re-screening. The combined effect sizes of DSC for brainstem, spinal cord, mandible, left eye, right eye, left optic nerve, right optic nerve, optic chiasm, left parotid, right parotid, left submandibular, and right submandibular are 0.87, 0.83, 0.92, 0.90, 0.90, 0.71, 0.74, 0.62, 0.85, 0.85, 0.82, and 0.82, respectively. For subgroup analysis, the combined effect sizes for segmentation of the brainstem, mandible, left optic nerve, and left parotid gland using CT and MRI images are 0.86/0.92, 0.92/0.90, 0.71/0.73, and 0.84/0.87, respectively. Pooled effect sizes using 2D and 3D images of the brainstem, mandible, left optic nerve, and left parotid gland for contouring are 0.88/0.87, 0.92/0.92, 0.75/0.71, and 0.87/0.85, respectively.

CONCLUSIONS

The use of automated contouring technology based on DL algorithms is an essential tool for contouring head and neck OARs, achieving high accuracy, reducing the workload of clinical radiation oncologists, and providing individualized, standardized, and refined treatment plans for implementing "precision radiotherapy". Improving DL performance requires the construction of high-quality data sets and enhancing algorithm optimization and innovation.
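
The Dice similarity coefficient used as the effect size in the abstract above measures the overlap between a predicted contour and a reference contour: DSC = 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch, assuming each contour is given as a set of voxel coordinates. The function name and the toy masks are illustrative assumptions and are not drawn from the reviewed studies.

```python
def dice_coefficient(reference, predicted):
    """Dice similarity coefficient between two collections of voxel coordinates.

    Returns 1.0 when both masks are empty, treating them as identical.
    """
    a, b = set(reference), set(predicted)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))

# Toy 2D masks (hypothetical): the reference has 4 pixels, the prediction 3,
# and the two masks share 2 pixels.
reference_mask = {(0, 0), (0, 1), (1, 0), (1, 1)}
predicted_mask = {(0, 1), (1, 1), (1, 2)}

dsc = dice_coefficient(reference_mask, predicted_mask)   # 2*2 / (4+3) = 4/7
```

Small structures such as the optic chiasm contain few voxels, so a boundary error of a voxel or two removes a large share of the overlap. This is one plausible reason their pooled DSC values are lower than those for large organs.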
