1
Luo Z, Li W, Li J, Zhang Y. A new Tec family-based clinical model predicts survival in differentiated thyroid cancer patients via machine learning. Thyroid Res 2025; 18:18. [PMID: 40307932 PMCID: PMC12044924 DOI: 10.1186/s13044-025-00234-x] [Received: 11/21/2024] [Accepted: 02/21/2025] [Indexed: 05/02/2025] Open
Abstract
BACKGROUND The Tec family of proteins has been identified as a key player in numerous diseases. However, no studies on the associations of Tec family proteins with overall survival (OS) in differentiated thyroid cancer (DTC) patients have been conducted. METHODS RNA sequencing (RNA-Seq) and clinical data were downloaded from The Cancer Genome Atlas (TCGA) database. LASSO-Cox, random forest, and eXtreme Gradient Boosting (XGBoost) analysis methods were used to screen for the genes encoding Tec family proteins that were most closely associated with DTC. A predictive model was developed to estimate the OS of DTC patients. The validity of the prediction model was evaluated via receiver operating characteristic (ROC) curves, decision curve analysis (DCA), and fivefold and 200-fold cross-validation. In addition, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses were performed to investigate the biological functions of the most significant genes. RESULTS The AC007494.3 and AC019226.2 genes were most strongly associated with the OS of DTC patients. Therefore, the model can be used to predict the OS of DTC patients. Functional annotation analysis revealed characteristics similar to those of other Tec kinases. CONCLUSIONS We found that the TEC gene has significant predictive value for the prognosis of DTC patients. The TEC gene has potential value as a target for future drug development. In addition, we recommend more comprehensive treatment and closer monitoring of high-risk populations.
Affiliation(s)
- Ziyu Luo
  - Department of Surgical Oncology, Shaanxi Provincial People's Hospital, Xi'an, Shaanxi, 710068, China
- Wenhan Li
  - Department of Surgical Oncology, Shaanxi Provincial People's Hospital, Xi'an, Shaanxi, 710068, China
- Jianhui Li
  - Department of Surgical Oncology, Shaanxi Provincial People's Hospital, Xi'an, Shaanxi, 710068, China
- Ying Zhang
  - Department of Surgical Oncology, Shaanxi Provincial People's Hospital, Xi'an, Shaanxi, 710068, China
2
Zeng S, Li L, Li J, He X. Two-stage DRG grouping of cerebral infarction based on comorbidity and complications classification. Front Public Health 2025; 13:1513744. [PMID: 40356838 PMCID: PMC12066792 DOI: 10.3389/fpubh.2025.1513744] [Received: 10/19/2024] [Accepted: 03/21/2025] [Indexed: 05/15/2025] Open
Abstract
Background Since 2017, cerebral infarction (CI) has become a leading cause of mortality in China, with rising treatment costs posing significant challenges to the healthcare system. The Diagnosis-Related Groups (DRG) payment system has been recognized as a potential solution to curb rising healthcare expenditures. However, in its implementation, China faces considerable hurdles due to its vast geographical size, regional economic disparities, and heterogeneous disease spectrum. Objective This study proposes a novel two-stage grouping strategy tailored to the local context of western China. The method adaptively accommodates regional variations in disease burden and healthcare resource distribution. Methods Using hospitalization data from 111,025 CI patients collected by the Healthcare Security Administration of a western Chinese city between 2016 and 2018 (during the pre-DRG implementation period), we developed a two-stage DRG method. In the first stage, regression analysis identified and prioritized comorbidities and complications that influence medical costs. In the second stage, a decision tree algorithm established standardized classification protocols for DRG grouping, ensuring regional adaptability. Results The average hospitalization cost for CI patients was USD 1,565, with total expenditures reaching USD 1.71 million in the target city. With this localized two-stage grouping model, 100% of groups had a coefficient of variation (CV) below 1, satisfying the technical criteria for DRG categorization. The optimization reduced the number of DRGs from 18 to 4 and increased the proportion of groups with CV < 0.8 from 67% to 100%, signifying a substantial improvement in within-group homogeneity compared to the existing grouping method, China Healthcare Security Diagnosis-Related Groups (CHS-DRG). Conclusion This study demonstrates the effectiveness of our proposed two-stage method using real data. Implementation of this localized method in the target city could result in potential savings of USD 8.59 million, surpassing the existing CHS-DRG method. These findings suggest that this adaptive method may be a scalable strategy for resource-limited regions undergoing healthcare system reforms.
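The CV criterion mentioned in the abstract above can be illustrated with a minimal sketch. It assumes CV is computed as the sample standard deviation of per-patient cost divided by the mean cost within each group; the abstract does not state which estimator the authors used. The function names and the cost figures are hypothetical and are not taken from the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(costs):
    """Sample standard deviation divided by the mean (assumed CV definition)."""
    return stdev(costs) / mean(costs)


def share_below(groups, threshold):
    """Fraction of groups whose cost CV is strictly below the threshold."""
    cvs = [coefficient_of_variation(costs) for costs in groups.values()]
    return sum(cv < threshold for cv in cvs) / len(cvs)


# Hypothetical per-patient costs (USD) for three made-up groups.
example_groups = {
    "group_a": [1200, 1500, 1800],
    "group_b": [900, 1000, 1100],
    "group_c": [500, 1500, 4000],
}

share_cv_below_08 = share_below(example_groups, 0.8)
share_cv_below_1 = share_below(example_groups, 1.0)
```

In this made-up data, two of the three groups have a CV below 0.8 and all three have a CV below 1, which mirrors the kind of proportion the abstract reports.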
Affiliation(s)
- Siyu Zeng
  - School of Logistics, Chengdu University of Information Technology, Chengdu, Sichuan, China
- Lele Li
  - School of Labor and Human Resources, Renmin University of China, Beijing, China
  - Institute for Hospital Management of Henan Province, Zhengzhou, China
- Jialing Li
  - School of Management, Hunan University of Technology and Business, Changsha, Hunan, China
- Xiaozhou He
  - Business School, Sichuan University, Chengdu, Sichuan, China
3
Cichosz S, Bender C. Early Detection of Elevated Ketone Bodies in Type 1 Diabetes Using Insulin and Glucose Dynamics Across Age Groups: Model Development Study. JMIR Diabetes 2025; 10:e67867. [PMID: 40209022 PMCID: PMC12005466 DOI: 10.2196/67867] [Received: 10/23/2024] [Revised: 03/09/2025] [Accepted: 03/09/2025] [Indexed: 04/12/2025] Open
Abstract
Background Diabetic ketoacidosis represents a significant and potentially life-threatening complication of diabetes, predominantly observed in individuals with type 1 diabetes (T1D). Studies have documented suboptimal adherence to diabetes management among children and adolescents, as evidenced by deficient ketone monitoring practices. Objective The aim of the study was to explore the potential for prediction of elevated ketone bodies from continuous glucose monitoring (CGM) and insulin data in pediatric and adult patients with T1D using a closed-loop system. Methods Participants used the Dexcom G6 CGM system and the iLet Bionic Pancreas system for insulin administration for up to 13 weeks. We used supervised binary classification machine learning, incorporating feature engineering to identify elevated ketone bodies (>0.6 mmol/L). Features were derived from CGM, insulin delivery data, and self-monitoring of blood glucose to develop an extreme gradient boosting-based prediction model. A total of 259 participants aged 6-79 years with over 49,000 days of full-time monitoring were included in the study. Results Among the participants, 1768 ketone samples were eligible for modeling, including 383 event samples with elevated ketone bodies (≥0.6 mmol/L). Insulin, self-monitoring of blood glucose, and current glucose measurements provided discriminative information on elevated ketone bodies (receiver operating characteristic area under the curve [ROC-AUC] 0.64-0.69). The CGM-derived features exhibited stronger discrimination (ROC-AUC 0.75-0.76). Integration of all feature types resulted in an ROC-AUC estimate of 0.82 (SD 0.01) and a precision recall-AUC of 0.53 (SD 0.03). Conclusions CGM and insulin data present a valuable avenue for early prediction of patients at risk of elevated ketone bodies. Furthermore, our findings indicate the potential application of such predictive models in both pediatric and adult populations with T1D.
Affiliation(s)
- Simon Cichosz
  - Department of Health Science and Technology, Aalborg University, Selma Lagerløfs Vej 249, Aalborg, 9260, Denmark, 45 99403809
- Clara Bender
  - Department of Health Science and Technology, Aalborg University, Selma Lagerløfs Vej 249, Aalborg, 9260, Denmark, 45 99403809
4
Luo X, Cui X, Wang R, Cheng Y, Zhu R, Tai Y, Wu C, He J. An interpretable machine learning scoring tool for estimating time to recurrence readmissions in stroke patients. Int J Med Inform 2025; 194:105704. [PMID: 39561668 DOI: 10.1016/j.ijmedinf.2024.105704] [Received: 03/24/2024] [Revised: 11/08/2024] [Accepted: 11/12/2024] [Indexed: 11/21/2024]
Abstract
BACKGROUND Stroke recurrence readmission poses an additional burden on both patients and healthcare systems. Risk stratification aims to accurately divide patients into groups so that targeted interventions aimed at reducing readmission can be provided. To accurately predict short- and intermediate-term risks of readmission and provide information for further temporal risk stratification, we developed and validated an interpretable machine learning risk scoring tool. METHODS In this retrospective study, all stroke admission episodes from January 1, 2015 to December 31, 2019 were obtained from the Shanghai Health and Health Development Research Centre database, which covers medical records of all patients hospitalized in 436 medical institutes in Shanghai. The outcome was time to stroke recurrence readmission within 90 days post discharge. The Score for Stroke Recurrence Readmission Prediction (SSRRP) tool was derived via an interpretable machine learning-based system for time-to-event outcomes. The SSRRP, a six-variable survival score, includes sequelae, length of stay, type of stroke, random plasma glucose, medical expense payment, and number of hospitalizations. RESULTS A total of 339,212 stroke admission episodes were finally included in the whole cohort. Among them, 217,393 episodes were included in the training dataset, 54,347 episodes in the internal validation dataset, and 67,472 in the temporal validation dataset. Readmission within 90 days was documented in 33,922 (9.97%) episodes, with a median time to emergency readmission of 19 days (interquartile range: 8-43). In the temporal validation dataset, the SSRRP achieved an integrated area under the curve of 0.730 (95% CI, 0.724-0.737). In addition, the SSRRP demonstrated good calibration and clinical benefit rate. CONCLUSIONS In this retrospective cohort study, the SSRRP, a parsimonious and point-based scoring tool, was developed to predict the risk of recurrent readmission for stroke. It also provided accurate information on the time to stroke readmission, enabling further temporal risk stratification and informed clinical decision-making.
Affiliation(s)
- Xiao Luo
  - Department of Military Health Statistics, Naval Medical University, Shanghai 200433, China
- Xin Cui
  - Shanghai Health Statistics Center, Shanghai 200040, China
- Rui Wang
  - Department of Military Health Statistics, Naval Medical University, Shanghai 200433, China
- Yi Cheng
  - Department of Military Health Statistics, Naval Medical University, Shanghai 200433, China
- Ronghui Zhu
  - Department of Military Health Statistics, Naval Medical University, Shanghai 200433, China
- Yaoyong Tai
  - Department of Military Health Statistics, Naval Medical University, Shanghai 200433, China
- Cheng Wu
  - Department of Military Health Statistics, Naval Medical University, Shanghai 200433, China
- Jia He
  - Department of Military Health Statistics, Naval Medical University, Shanghai 200433, China
5
Mao Y, Liu Q, Fan H, Li E, He W, Ouyang X, Wang X, Qiu L, Dong H. Prediction Models for Post-Stroke Hospital Readmission: A Systematic Review. Public Health Nurs 2025; 42:535-546. [PMID: 39402856 DOI: 10.1111/phn.13441] [Received: 06/14/2024] [Revised: 08/29/2024] [Accepted: 09/28/2024] [Indexed: 01/07/2025]
Abstract
OBJECTIVE This study aims to evaluate the predictive performance and methodological quality of post-stroke readmission prediction models, identify key predictors associated with readmission, and provide guidance for selecting appropriate risk assessment tools. METHODS A comprehensive literature search was conducted from inception to February 1, 2024. Two independent researchers screened the literature and extracted relevant data using the CHARMS checklist. RESULTS Eleven studies and 16 prediction models were included, with sample sizes ranging from 108 to 803,124 cases and outcome event incidences between 5.2% and 50.0%. The four most frequently included predictors in the models were length of stay, hypertension, age, and functional disability. Twelve models reported an area under the curve (AUC) ranging from 0.520 to 0.940, and five models provided calibration metrics. Only one model included both internal and external validation, while six models had internal validation. Eleven studies were found to have a high risk of bias (ROB), predominantly in the area of data analysis. CONCLUSION This systematic review included 16 readmission prediction models for stroke, which generally exhibited good predictive performance and can effectively identify high-risk patients likely to be readmitted. However, the generalizability of these models remains uncertain due to methodological limitations. Rather than developing new readmission prediction models for stroke, the focus should shift toward external validation and the iterative adaptation of existing models. These models should be tailored to local settings, extended with new predictors if necessary, and presented in an interactive graphical user interface. TRIAL REGISTRATION PROSPERO registration number CRD42023466801.
Affiliation(s)
- Yijun Mao
  - Catheterization Laboratory, Xianyang Central Hospital, Xianyang City, Shaanxi Province, China
- Qiang Liu
  - Orthopedic Surgery, Xianyang Central Hospital, Xianyang City, Shaanxi Province, China
- Hui Fan
  - Nursing Department, Xianyang Central Hospital, Xianyang City, Shaanxi Province, China
- Erqing Li
  - Catheterization Laboratory, Xianyang Central Hospital, Xianyang City, Shaanxi Province, China
- Wenjing He
  - Nursing Department, Xianyang Central Hospital, Xianyang City, Shaanxi Province, China
- Xueqian Ouyang
  - Nursing Department, Xianyang Central Hospital, Xianyang City, Shaanxi Province, China
- Xiaojuan Wang
  - Nursing Department, Xianyang Central Hospital, Xianyang City, Shaanxi Province, China
- Li Qiu
  - Nursing Department, Xianyang Central Hospital, Xianyang City, Shaanxi Province, China
- Huanni Dong
  - Nursing Department, Xianyang Central Hospital, Xianyang City, Shaanxi Province, China
6
Cichosz SL, Kronborg T, Laugesen E, Hangaard S, Fleischer J, Hansen TK, Jensen MH, Poulsen PL, Vestergaard P. From Stability to Variability: Classification of Healthy Individuals, Prediabetes, and Type 2 Diabetes Using Glycemic Variability Indices from Continuous Glucose Monitoring Data. Diabetes Technol Ther 2025; 27:34-44. [PMID: 39115921 DOI: 10.1089/dia.2024.0226] [Indexed: 08/28/2024]
Abstract
Objective: This study aims to investigate the continuum of glucose control from normoglycemia to dysglycemia (HbA1c ≥ 5.7%/39 mmol/mol) using metrics derived from continuous glucose monitoring (CGM). In addition, we aim to develop a machine learning-based classification model to classify dysglycemia based on observed patterns. Methods: Data from five distinct studies, each featuring at least two days of CGM, were pooled. Participants included individuals classified as healthy, with prediabetes, or with type 2 diabetes mellitus (T2DM). Various CGM indices were extracted and compared across groups. The data set was split 70/30 for training and testing two classification models (XGBoost/Logistic Regression) to differentiate between prediabetes or dysglycemia and the healthy group. Results: The analysis included 836 participants (healthy: n = 282; prediabetes: n = 133; T2DM: n = 432). Across all CGM indices, a progressive shift was observed from the healthy group to those with diabetes (P < 0.001). Statistically significant differences (P < 0.01) were noted in mean glucose, time below range, time above 140 mg/dl, mobility, multiscale complexity index, and glycemic risk index when transitioning from health to prediabetes. The XGBoost models achieved the highest receiver operating characteristic area under the curve values on the test data set ranging from 0.91 [confidence interval (CI): 0.87-0.95] (prediabetes identification) to 0.97 [CI: 0.95-0.98] (dysglycemia identification). Conclusion: Our findings demonstrate a gradual deterioration of glucose homeostasis and increased glycemic variability across the spectrum from normo- to dysglycemia, as evidenced by CGM metrics. The performance of CGM-based indices in classifying healthy individuals and those with prediabetes and diabetes is promising.
Affiliation(s)
- Simon Lebech Cichosz
  - Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
- Thomas Kronborg
  - Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
  - Steno Diabetes Center North Denmark, Aalborg University Hospital, Aalborg, Denmark
- Esben Laugesen
  - Steno Diabetes Center Aarhus, Aarhus University Hospital, Aarhus, Denmark
  - Diagnostic Center, Regional Hospital Silkeborg, Silkeborg, Denmark
- Stine Hangaard
  - Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
  - Steno Diabetes Center North Denmark, Aalborg University Hospital, Aalborg, Denmark
- Jesper Fleischer
  - Steno Diabetes Center Aarhus, Aarhus University Hospital, Aarhus, Denmark
  - Steno Diabetes Center Zealand, Zealand, Denmark
- Morten Hasselstrøm Jensen
  - Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
  - Department of Data Orchestration, Novo Nordisk, Søborg, Denmark
- Peter Vestergaard
  - Steno Diabetes Center North Denmark, Aalborg University Hospital, Aalborg, Denmark
  - Department of Endocrinology, Aalborg University Hospital, Aalborg, Denmark
7
Mbarek L, Chen S, Jin A, Pan Y, Meng X, Yang X, Xu Z, Jiang Y, Wang Y. Predicting 3-month poor functional outcomes of acute ischemic stroke in young patients using machine learning. Eur J Med Res 2024; 29:494. [PMID: 39385211 PMCID: PMC11466038 DOI: 10.1186/s40001-024-02056-3] [Received: 05/28/2024] [Accepted: 09/09/2024] [Indexed: 10/12/2024] Open
Abstract
BACKGROUND Prediction of short-term outcomes in young patients with acute ischemic stroke (AIS) may assist in making therapy decisions. Machine learning (ML) is increasingly used in healthcare due to its high accuracy. This study aims to use an ML-based predictive model for poor 3-month functional outcomes in young AIS patients and to compare the predictive performance of ML models with the logistic regression model. METHODS We enrolled AIS patients aged between 18 and 50 years from the Third Chinese National Stroke Registry (CNSR-III), collected between 2015 and 2018. A modified Rankin Scale (mRS) score ≥ 3 at 3 months was defined as a poor functional outcome. Four tree-based ML models were developed and compared with logistic regression: extreme Gradient Boosting (XGBoost), Light Gradient Boosted Machine (lightGBM), Random Forest (RF), and Gradient Boosting Decision Trees (GBDT). We assessed model performance based on both discrimination and calibration. RESULTS A total of 2268 young patients with a mean age of 44.3 ± 5.5 years were included. Among them, 9% had poor functional outcomes. The mRS at admission, living alone, and a high National Institutes of Health Stroke Scale (NIHSS) score at discharge remained independent predictors of poor 3-month outcomes. The best AUC in the test group was achieved by XGBoost (AUC = 0.801), followed by GBDT, RF, and lightGBM (AUCs of 0.795, 0.794, and 0.792, respectively). The XGBoost, RF, and lightGBM models were significantly better than logistic regression (P < 0.05). CONCLUSIONS ML outperformed logistic regression, with XGBoost being the best model for predicting poor functional outcomes in young AIS patients. It is important to consider living alone together with high severity scores to improve stroke prognosis.
Affiliation(s)
- Lamia Mbarek
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  - China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China
- Siding Chen
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  - China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China
  - Changping Laboratory, Beijing, China
- Aoming Jin
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  - China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China
- Yuesong Pan
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  - China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China
- Xia Meng
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  - China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China
- Xiaomeng Yang
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
- Zhe Xu
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  - China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China
- Yong Jiang
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  - China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China
  - Changping Laboratory, Beijing, China
  - Beijing Advanced Innovation Center for Big Data-Based Precision Medicine, Beihang University and Capital Medical University, Beijing, 100091, China
- Yongjun Wang
  - Department of Neurology, Beijing Tiantan Hospital, Capital Medical University, Beijing, China
  - China National Clinical Research Center for Neurological Diseases, Beijing Tiantan Hospital, Capital Medical University, No.119 South 4th Ring West Road, Fengtai District, Beijing, 100070, China
  - Changping Laboratory, Beijing, China
  - Research Unit of Artificial Intelligence in Cerebrovascular Disease, Chinese Academy of Medical Sciences, Beijing, 2019RU018, China
  - Beijing Advanced Innovation Centre for Big Data-Based Precision Medicine, Beihang University, Capital Medical University, Beijing, China
  - Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing, China
  - Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing, China
8
Cichosz SL, Olesen SS, Jensen MH. Explainable Machine-Learning Models to Predict Weekly Risk of Hyperglycemia, Hypoglycemia, and Glycemic Variability in Patients With Type 1 Diabetes Based on Continuous Glucose Monitoring. J Diabetes Sci Technol 2024:19322968241286907. [PMID: 39377175 PMCID: PMC11571614 DOI: 10.1177/19322968241286907] [Indexed: 10/09/2024]
Abstract
BACKGROUND AND OBJECTIVE The aim of this study was to develop and validate explainable prediction models based on continuous glucose monitoring (CGM) and baseline data to identify a week-to-week risk of CGM key metrics (hyperglycemia, hypoglycemia, glycemic variability). A weekly prediction of CGM key metrics makes it possible for the patient or health care personnel to take immediate preemptive action. METHODS We analyzed, trained, and internally tested three prediction models (logistic regression, XGBoost, and TabNet) using CGM data from 187 type 1 diabetes patients with long-term CGM monitoring. A binary classification approach combined with feature engineering deployed on the CGM signals was used to predict hyperglycemia, hypoglycemia, and glycemic variability based on consensus targets (time above range ≥5%, time below range ≥4%, coefficient of variation ≥36%). The models were validated in two independent cohorts with a total of 223 additional patients of varying ages. RESULTS A total of 46,593 weeks of CGM data were included in the analysis. For the best model (XGBoost), the area under the receiver operating characteristic curve (ROC-AUC) was 0.9 [95% confidence interval (CI) = 0.89-0.91], 0.89 [95% CI = 0.88-0.9], and 0.8 [95% CI = 0.79-0.81] for predicting hyperglycemia, hypoglycemia, and glycemic variability in the internal validation, respectively. The validation test showed good generalizability of the models with ROC-AUC of 0.88 to 0.95, 0.84 to 0.89, and 0.80 to 0.82 for predicting the glycemic outcomes. CONCLUSION Prediction models based on real-world CGM data can be used to predict the risk of unstable glycemic control in the forthcoming week. The models showed good performance in both internal and external validation cohorts.
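As a rough illustration of how the three weekly binary targets named in the abstract above (time above range ≥5%, time below range ≥4%, coefficient of variation ≥36%) might be derived from one week of CGM readings, consider the sketch below. The glucose cut-offs that define "above range" and "below range" are not given in the abstract; the defaults of 250 mg/dL and 70 mg/dL are assumptions, as are the function name, the use of the population standard deviation, and the example readings.

```python
from statistics import mean, pstdev


def weekly_labels(glucose_mg_dl, high=250.0, low=70.0):
    """Derive three binary labels from one week of CGM readings.

    The `high` and `low` cut-offs are assumed values, not taken from the abstract.
    """
    n = len(glucose_mg_dl)
    time_above = sum(g > high for g in glucose_mg_dl) / n
    time_below = sum(g < low for g in glucose_mg_dl) / n
    cv = pstdev(glucose_mg_dl) / mean(glucose_mg_dl)
    return {
        "hyperglycemia": time_above >= 0.05,
        "hypoglycemia": time_below >= 0.04,
        "glycemic_variability": cv >= 0.36,
    }


# Hypothetical weeks: one with excursions in both directions, one perfectly flat.
unstable_week = [100.0] * 89 + [300.0] * 6 + [60.0] * 5
stable_week = [110.0] * 100

unstable_labels = weekly_labels(unstable_week)
stable_labels = weekly_labels(stable_week)
```

Labels of this kind would serve only as the prediction targets; the feature engineering and models described in the abstract are not reproduced here.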
Affiliation(s)
- Simon Lebech Cichosz
  - Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
- Søren Schou Olesen
  - Department of Clinical Medicine, Faculty of Medicine, Aalborg University Hospital, Aalborg, Denmark
  - Mech-Sense, Centre for Pancreatic Diseases, Department of Gastroenterology and Hepatology, Aalborg University Hospital, Aalborg, Denmark
- Morten Hasselstrøm Jensen
  - Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
  - Data Science, Novo Nordisk A/S, Søborg, Denmark
9
Fares MY, Liu HH, da Silva Etges APB, Zhang B, Warner JJP, Olson JJ, Fedorka CJ, Khan AZ, Best MJ, Kirsch JM, Simon JE, Sanders B, Costouros JG, Zhang X, Jones P, Haas DA, Abboud JA. Utility of Machine Learning, Natural Language Processing, and Artificial Intelligence in Predicting Hospital Readmissions After Orthopaedic Surgery: A Systematic Review and Meta-Analysis. JBJS Rev 2024; 12:01874474-202408000-00011. [PMID: 39172864 DOI: 10.2106/jbjs.rvw.24.00075] [Indexed: 08/24/2024]
Abstract
BACKGROUND Numerous applications and strategies have been utilized to help assess the trends and patterns of readmissions after orthopaedic surgery in an attempt to extrapolate possible risk factors and causative agents. The aim of this work is to systematically summarize the available literature on the extent to which natural language processing, machine learning, and artificial intelligence (AI) can help improve the predictability of hospital readmissions after orthopaedic and spine surgeries. METHODS This is a systematic review and meta-analysis. PubMed, Embase, and Google Scholar were searched, up until August 30, 2023, for studies that explore the use of AI, natural language processing, and machine learning tools for the prediction of readmission rates after orthopaedic procedures. Data regarding surgery type, patient population, readmission outcomes, advanced models utilized, comparison methods, predictor sets, the inclusion of perioperative predictors, validation method, size of training and testing sample, accuracy, and receiver operating characteristics (C-statistic), among other factors, were extracted and assessed. RESULTS A total of 26 studies were included in our final dataset. The overall summary C-statistic showed a mean of 0.71 across all models, indicating a reasonable level of predictiveness. A total of 15 articles (57%) were attributed to the spine, making it the most commonly explored orthopaedic field in our study. When comparing accuracy of prediction models between different fields, models predicting readmissions after hip/knee arthroplasty procedures had a higher prediction accuracy (mean C-statistic = 0.79) than spine (mean C-statistic = 0.7) and shoulder (mean C-statistic = 0.67). In addition, models that used single-institution data, and those that included intraoperative and/or postoperative outcomes, had a higher mean C-statistic than those utilizing other data sources and those that included only preoperative predictors. According to the Prediction model Risk of Bias Assessment Tool, the majority of the articles in our study had a high risk of bias. CONCLUSION AI tools perform reasonably well in predicting readmissions after orthopaedic procedures. Future work should focus on standardizing study methodologies and designs, and improving the data analysis process, in an attempt to produce more reliable and tangible results. LEVEL OF EVIDENCE Level III. See Instructions for Authors for a complete description of levels of evidence.
Collapse
Affiliation(s)
- Mohamad Y Fares
- Rothman Institute, Thomas Jefferson University Hospital, Philadelphia, Pennsylvania
| | | | | | | | - Jon J P Warner
- Department of Orthopaedic Surgery, Harvard Medical School, Boston Shoulder Institute, Massachusetts General Hospital, Boston, Massachusetts
| | | | - Catherine J Fedorka
- Cooper Bone and Joint Institute, Cooper University Hospital, Camden, New Jersey
| | - Adam Z Khan
- Department of Orthopaedic Surgery, Southern California Permanente Medical Group, Panorama City, California
| | - Matthew J Best
- Department of Orthopaedic Surgery, Johns Hopkins Hospital, Johns Hopkins University School of Medicine, Baltimore, Maryland
| | - Jacob M Kirsch
- Department of Orthopaedic Surgery, New England Baptist Hospital, Tufts University School of Medicine, Boston, Massachusetts
| | - Jason E Simon
- Department of Orthopaedic Surgery, Massachusetts General Hospital/Newton-Wellesley Hospital, Boston, Massachusetts
| | - Brett Sanders
- Center for Sports Medicine and Orthopaedics, Chattanooga, Tennessee
| | - John G Costouros
- Institute for Joint Restoration and Research, California Shoulder Center, Menlo Park, California
| | | | | | | | - Joseph A Abboud
- Rothman Institute, Thomas Jefferson University Hospital, Philadelphia, Pennsylvania
10
Lin CH, Chen YA, Jeng JS, Sun Y, Wei CY, Yeh PY, Chang WL, Fann YC, Hsu KC, Lee JT. Predicting ischemic stroke patients' prognosis changes using machine learning in a nationwide stroke registry. Med Biol Eng Comput 2024; 62:2343-2354. [PMID: 38575823 PMCID: PMC11289005 DOI: 10.1007/s11517-024-03073-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2023] [Accepted: 03/13/2024] [Indexed: 04/06/2024]
Abstract
Accurately predicting the prognosis of ischemic stroke patients after discharge is crucial for physicians to plan long-term health care. Although previous studies have demonstrated that machine learning (ML) yields reasonably accurate stroke outcome predictions with limited datasets, identifying specific clinical features associated with prognosis changes after stroke that could aid physicians and patients in devising improved recovery care plans has been challenging. This study aimed to close this gap by using a large national stroke registry database to assess various prediction models that estimate how patients' prognosis changes over time with associated clinical factors. To properly evaluate the best predictive approaches currently available and avoid bias, this study employed three different prognosis prediction models: a statistical logistic regression model, commonly used clinical scores, and a recent high-performance ML-based XGBoost model. The XGBoost model outperformed the other two traditional models, achieving an AUROC of 0.929 in predicting the prognosis changes of stroke patients followed for 3 months. In addition, the XGBoost model maintained remarkably high precision even when using only the 20 most relevant clinical features rather than the full clinical dataset used in the study. These selected features correlated closely with significant changes in clinical outcomes for stroke patients and proved effective for predicting prognosis changes after discharge, allowing physicians to make optimal decisions regarding their patients' recovery.
Affiliation(s)
- Ching-Heng Lin
- Division of Intramural Research, National Institute of Neurological Disorders and Stroke, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD, 20892, USA
- Center for Artificial Intelligence in Medicine, Chang Gung Memorial Hospital, Taoyuan, Taiwan
- Bachelor Program in Artificial Intelligence, Chang Gung University, Taoyuan, Taiwan
| | - Yi-An Chen
- Center for Artificial Intelligence in Medicine, Chang Gung Memorial Hospital, Taoyuan, Taiwan
| | - Jiann-Shing Jeng
- Stroke Center and Department of Neurology, National Taiwan University Hospital, Taipei, Taiwan
| | - Yu Sun
- Department of Neurology, En Chu Kong Hospital, New Taipei City, Taiwan
| | - Cheng-Yu Wei
- Department of Exercise and Health Promotion, College of Kinesiology and Health, Chinese Culture University, Taipei, Taiwan
| | - Po-Yen Yeh
- Department of Neurology, St. Martin de Porres Hospital, Chiayi, Taiwan
| | - Wei-Lun Chang
- Department of Neurology, Show Chwan Memorial Hospital, Changhua County, Taiwan
| | - Yang C Fann
- Division of Intramural Research, National Institute of Neurological Disorders and Stroke, National Institutes of Health, 9000 Rockville Pike, Bethesda, MD, 20892, USA.
| | - Kai-Cheng Hsu
- Department of Medicine, China Medical University, Taichung, Taiwan.
- Artificial Intelligence Center for Medical Diagnosis, China Medical University Hospital, No. 2, Yude Rd., North Dist., Taichung, 404332, Taiwan.
- Department of Neurology, China Medical University Hospital, Taichung, Taiwan.
| | - Jiunn-Tay Lee
- Department of Neurology, Tri-Service General Hospital, National Defense Medical Center, Taipei, Taiwan, Republic of China
11
Lebech Cichosz S, Hasselstrøm Jensen M, Schou Olesen S. Development and Validation of a Machine Learning Model to Predict Weekly Risk of Hypoglycemia in Patients with Type 1 Diabetes Based on Continuous Glucose Monitoring. Diabetes Technol Ther 2024; 26:457-466. [PMID: 38215207 DOI: 10.1089/dia.2023.0532] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Indexed: 01/14/2024]
Abstract
Aim: The aim of this study was to develop and validate a prediction model based on continuous glucose monitoring (CGM) data to identify a week-to-week risk profile of excessive hypoglycemia. Methods: We analyzed, trained, and internally tested two prediction models using CGM data from 205 type 1 diabetes patients with long-term CGM monitoring. A binary classification approach (XGBoost) combined with feature engineering deployed on the CGM signals was utilized to predict excessive hypoglycemia risk defined by two targets (time below range [TBR] >4% and the upper TBR 90th percentile limit) of TBR the following week. The models were validated in two independent cohorts with a total of 253 additional patients. Results: A total of 61,470 weeks of CGM data were included in the analysis. The XGBoost models had an area under the receiver operating characteristic curve (ROC-AUC) of 0.83-0.87 (95% confidence interval; 0.83-0.88) in the test dataset. The external validation showed ROC-AUCs of 0.81-0.90. The most discriminative features included the low blood glucose index, the glycemic risk assessment diabetes equation (GRADE), hypoglycemia, the TBR, waveform length, the coefficient of variation and mean glucose during the previous week. This highlights that the pattern of hypoglycemia combined with glucose variability during the past week contains information on the risk of future hypoglycemia. Conclusion: Prediction models based on real-world CGM data can be used to predict the risk of hypoglycemia in the forthcoming week. The models showed good performance in both the internal and external validation cohorts.
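The prediction target described above, time below range (TBR) greater than 4% in the following week, can be illustrated with a short sketch. The 70 mg/dL cutoff for "below range" is an assumption (a commonly used hypoglycemia threshold that the abstract does not state), and the function names and readings are hypothetical.

```python
def time_below_range(glucose_mg_dl, low_threshold=70.0):
    """Percentage of CGM readings below low_threshold (assumed 70 mg/dL)."""
    if not glucose_mg_dl:
        raise ValueError("no CGM readings supplied")
    below = sum(1 for g in glucose_mg_dl if g < low_threshold)
    return 100.0 * below / len(glucose_mg_dl)


def excessive_hypoglycemia(next_week_readings, tbr_limit_pct=4.0):
    """Binary label: True when next week's TBR exceeds the 4% limit."""
    return time_below_range(next_week_readings) > tbr_limit_pct


# Hypothetical readings: 20 values, 4 of them below 70 mg/dL.
week = [110, 65, 90, 130, 68, 150, 95, 100, 120, 105] * 2
tbr = time_below_range(week)
label = excessive_hypoglycemia(week)
```

In a weekly setup such as the one the abstract describes, features would be computed from one week of readings and this label from the week that follows.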
Affiliation(s)
- Simon Lebech Cichosz
- Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
| | | | - Søren Schou Olesen
- Department of Clinical Medicine, Aalborg University Hospital, Aalborg, Denmark
- Department of Gastroenterology and Hepatology, Centre for Pancreatic Diseases and Mech-Sense, Aalborg University Hospital, Aalborg, Denmark
12
Darabi P, Gharibzadeh S, Khalili D, Bagherpour-Kalo M, Janani L. Optimizing cardiovascular disease mortality prediction: a super learner approach in the Tehran Lipid and Glucose Study. BMC Med Inform Decis Mak 2024; 24:97. [PMID: 38627734 PMCID: PMC11020797 DOI: 10.1186/s12911-024-02489-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Accepted: 03/22/2024] [Indexed: 04/19/2024] Open
Abstract
BACKGROUND & AIM Cardiovascular disease (CVD) is the most important cause of death in the world and has a potential impact on health care costs. This study aimed to evaluate the performance of machine learning survival models and determine the optimum model for predicting CVD-related mortality. METHOD In this study, the research population comprised all participants in the Tehran Lipid and Glucose Study (TLGS) aged over 30 years. We used the Gradient Boosting model (GBM), Support Vector Machine (SVM), Super Learner (SL), and Cox proportional hazard (Cox-PH) models to predict CVD-related mortality using 26 features. The dataset was randomly divided into training (80%) and testing (20%) sets. To evaluate the performance of the methods, we used the Brier Score (BS), Prediction Error (PE), Concordance Index (C-index), and time-dependent Area Under the Curve (TD-AUC) criteria. Four different clinical models were also fitted to improve the performance of the methods. RESULTS Out of 9258 participants with a mean (SD; range) age of 43.74 (15.51; 20-91) years, 56.60% were female. The CVD death proportion was 2.5% (228 participants). The death proportion was significantly higher in men (67.98% M, 32.02% F). Based on predefined selection criteria, the SL method had the best performance in predicting CVD-related mortality (TD-AUC > 93.50%). Among the machine learning (ML) methods, the SVM had the worst performance (TD-AUC = 90.13%). According to the relative effect, age, fasting blood sugar, systolic blood pressure, smoking, taking aspirin, diastolic blood pressure, type 2 diabetes mellitus, hip circumference, body mass index (BMI), and triglyceride were identified as the most influential variables in predicting CVD-related mortality. CONCLUSION According to the results of our study, compared to the Cox-PH model, machine learning models showed promising and sometimes better performance in predicting CVD-related mortality. 
This finding is based on the analysis of a large and diverse urban population from Tehran, Iran.
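Two of the steps named in this abstract, the random 80%/20% split and the Brier Score, can be sketched in a few lines. This is an illustration under simplifying assumptions: it uses the plain binary Brier score rather than the censoring-aware survival version a study of this kind would need, and the function names, seed, and data are hypothetical.

```python
import random


def train_test_split(records, test_fraction=0.2, seed=0):
    """Randomly split records into (train, test) with a fixed seed."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome.

    Lower is better; 0 means every prediction was certain and correct.
    """
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical participant identifiers and predictions.
train, test = train_test_split(list(range(100)))
score = brier_score([1, 0, 1, 0], [0.9, 0.1, 0.8, 0.3])
```

With an event proportion as low as the 2.5% reported here, a model that always predicts a small constant risk already achieves a low Brier score, so the score is most informative when compared across models on the same test set.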
Affiliation(s)
- Parvaneh Darabi
- Department of Biostatistics, School of Public Health, Iran University of Medical Sciences, Tehran, Iran
| | - Safoora Gharibzadeh
- Department of Epidemiology and Biostatistics, Pasteur Institute of Iran, Tehran, Iran.
| | - Davood Khalili
- Prevention of Metabolic Disorders Research Center, Research Institute for Endocrine Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
| | - Mehrdad Bagherpour-Kalo
- Department of Epidemiology and Biostatistics, School of Public health, Tehran University of Medical Sciences, Tehran, Iran
| | - Leila Janani
- Department of Biostatistics, School of Public Health, Iran University of Medical Sciences, Tehran, Iran.
- Imperial Clinical Trials Unit, School of Public Health, Imperial College London, London, UK.
13
Barnado A, Moore RP, Domenico HJ, Green S, Camai A, Suh A, Han B, Walker K, Anderson A, Caruth L, Katta A, McCoy AB, Byrne DW. Identifying antinuclear antibody positive individuals at risk for developing systemic autoimmune disease: development and validation of a real-time risk model. Front Immunol 2024; 15:1384229. [PMID: 38571954 PMCID: PMC10987951 DOI: 10.3389/fimmu.2024.1384229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2024] [Accepted: 03/08/2024] [Indexed: 04/05/2024] Open
Abstract
Objective Positive antinuclear antibodies (ANAs) cause diagnostic dilemmas for clinicians. Currently, no tools exist to help clinicians interpret the significance of a positive ANA in individuals without diagnosed autoimmune diseases. We developed and validated a risk model to predict risk of developing autoimmune disease in positive ANA individuals. Methods Using a de-identified electronic health record (EHR), we randomly chart reviewed 2,000 positive ANA individuals to determine if a systemic autoimmune disease was diagnosed by a rheumatologist. A priori, we considered demographics, billing codes for autoimmune disease-related symptoms, and laboratory values as variables for the risk model. We performed logistic regression and machine learning models using training and validation samples. Results We assembled training (n = 1030) and validation (n = 449) sets. Positive ANA individuals who were younger, female, had a higher titer ANA, higher platelet count, disease-specific autoantibodies, and more billing codes related to symptoms of autoimmune diseases were all more likely to develop autoimmune diseases. The most important variables included having a disease-specific autoantibody, number of billing codes for autoimmune disease-related symptoms, and platelet count. In the logistic regression model, AUC was 0.83 (95% CI 0.79-0.86) in the training set and 0.75 (95% CI 0.68-0.81) in the validation set. Conclusion We developed and validated a risk model that predicts risk for developing systemic autoimmune diseases and can be deployed easily within the EHR. The model can risk stratify positive ANA individuals to ensure high-risk individuals receive urgent rheumatology referrals while reassuring low-risk individuals and reducing unnecessary referrals.
Affiliation(s)
- April Barnado
- Division of Rheumatology & Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Ryan P. Moore
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Henry J. Domenico
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Sarah Green
- Division of Rheumatology & Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Alex Camai
- Division of Rheumatology & Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Ashley Suh
- Division of Rheumatology & Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Bryan Han
- Division of Rheumatology & Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Katherine Walker
- Division of Rheumatology & Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Audrey Anderson
- Division of Rheumatology & Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Lannawill Caruth
- Division of Rheumatology & Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Anish Katta
- Division of Rheumatology & Immunology, Department of Medicine, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Allison B. McCoy
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
| | - Daniel W. Byrne
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN, United States
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, TN, United States
14
Chen M, Qian D, Wang Y, An J, Meng K, Xu S, Liu S, Sun M, Li M, Pang C. Systematic Review of Machine Learning Applied to the Secondary Prevention of Ischemic Stroke. J Med Syst 2024; 48:8. [PMID: 38165495 DOI: 10.1007/s10916-023-02020-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2023] [Accepted: 11/13/2023] [Indexed: 01/03/2024]
Abstract
Ischemic stroke is a serious disease posing significant threats to human health and life, with the highest absolute and relative risks of a poor prognosis following the first occurrence, and more than 90% of strokes are attributable to modifiable risk factors. Currently, machine learning (ML) is widely used for the prediction of ischemic stroke outcomes. By identifying risk factors, predicting the risk of poor prognosis and thus developing personalized treatment plans, it effectively reduces the probability of poor prognosis, leading to more effective secondary prevention. This review includes 41 studies since 2018 that used ML algorithms to build prognostic prediction models for ischemic stroke, transient ischemic attack (TIA), and acute ischemic stroke (AIS). We analyzed in detail the risk factors used in these studies, the sources and processing methods of the required data, the model building and validation, and their application in different prediction time windows. The results indicate that among the included studies, the top five risk factors in terms of frequency were cardiovascular diseases, age, sex, national institutes of health stroke scale (NIHSS) score, and diabetes. Furthermore, 64% of the studies used single-center data, 65% of studies using imbalanced data did not perform data balancing, 88% of the studies did not utilize external validation datasets for model validation, and 72% of the studies did not provide explanations for their models. Addressing these issues is crucial for enhancing the credibility and effectiveness of the research, consequently improving the development and implementation of secondary prevention measures.
Affiliation(s)
- Meng Chen
- School of Life Science and Technology, Changchun University of Science and Technology, Jilin Province, Changchun, 130022, People's Republic of China
| | - Dongbao Qian
- School of Life Science and Technology, Changchun University of Science and Technology, Jilin Province, Changchun, 130022, People's Republic of China
| | - Yixuan Wang
- Union Hospital of Jilin University, Jilin Province, Neurosurgery, Changchun, 130033, People's Republic of China
| | - Junyan An
- Union Hospital of Jilin University, Jilin Province, Neurosurgery, Changchun, 130033, People's Republic of China
| | - Ke Meng
- Union Hospital of Jilin University, Jilin Province, Neurosurgery, Changchun, 130033, People's Republic of China
| | - Shuai Xu
- School of Life Science and Technology, Changchun University of Science and Technology, Jilin Province, Changchun, 130022, People's Republic of China
| | - Sheng Liu
- School of Life Science and Technology, Changchun University of Science and Technology, Jilin Province, Changchun, 130022, People's Republic of China
| | - Meiyan Sun
- Union Hospital of Jilin University, Jilin Province, Neurosurgery, Changchun, 130033, People's Republic of China
| | - Miao Li
- School of Life Science and Technology, Changchun University of Science and Technology, Jilin Province, Changchun, 130022, People's Republic of China.
- Union Hospital of Jilin University, Jilin Province, Neurosurgery, Changchun, 130033, People's Republic of China.
| | - Chunying Pang
- School of Life Science and Technology, Changchun University of Science and Technology, Jilin Province, Changchun, 130022, People's Republic of China.
15
Tian J, Cui R, Song H, Zhao Y, Zhou T. Prediction of acute kidney injury in patients with liver cirrhosis using machine learning models: evidence from the MIMIC-III and MIMIC-IV. Int Urol Nephrol 2024; 56:237-247. [PMID: 37256426 DOI: 10.1007/s11255-023-03646-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 03/02/2023] [Accepted: 05/23/2023] [Indexed: 06/01/2023]
Abstract
PURPOSE To develop and validate a machine learning (ML)-based prediction model for acute kidney injury (AKI) in patients with liver cirrhosis. METHODS Data on liver cirrhosis patients were extracted from the Medical Information Mart for Intensive Care III (MIMIC-III) and MIMIC-IV databases in this retrospective cohort study. ML algorithms, including random forest (RF), extreme gradient boosting (XGB), light gradient boosting machine (LGBM), and gradient boosting decision tree (GBDT) were applied to construct prediction models. Predictors were screened via univariate logistic regression, and then the models were developed with all data of the included patients. A bootstrap resampling method was adopted to validate the models. The predictive abilities of our final model were compared with those of the sequential organ failure assessment score (SOFA), simplified acute physiology score II (SAPS II), Model for End-stage Liver Disease (MELD), and MELD Na. RESULTS This study included 950 patients, of which 429 (45.16%) had AKI. Mechanical ventilation, vasopressor, international normalized ratio (INR), bilirubin, Charlson comorbidity index (CCI), prothrombin time (PT), estimated glomerular filtration rate (EGFR), partial thromboplastin time (PTT), and heart rate served as predictors. In the derivation set, the developed RF [area under curve (AUC) = 0.747], XGB (AUC = 0.832), LGBM (AUC = 0.785), and GBDT (AUC = 0.811) models exhibited significantly greater predictive performance than the logistic regression model (AUC = 0.699) (all P < 0.05). Among the ML-based models, the XGB model had the greatest AUC. In internal validation, the predictive capacity of the XGB model (AUC = 0.833) was significantly superior to that of the logistic regression model (AUC = 0.701) (P = 0.045). Hence, the XGB model was selected as the final model for AKI prediction. 
Compared with the XGB model (AUC = 0.832), the SOFA (AUC = 0.609), MELD (AUC = 0.690), MELD Na (AUC = 0.690), and SAPS II (AUC = 0.641) had significantly lower predictive abilities in the derivation set (all P < 0.001). In internal validation, the XGB model had an AUC of 0.833, which was significantly higher than that of the SOFA (AUC = 0.609), MELD (AUC = 0.690), MELD Na (AUC = 0.688), and SAPS II (AUC = 0.641) (all P < 0.05). CONCLUSION The XGB model performed better than the logistic regression model, SOFA, MELD, MELD Na, and SAPS II in AKI prediction for cirrhosis patients, which may help identify patients at risk of AKI and enable timely interventions.
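The abstract states that a bootstrap resampling method was used for validation but does not describe the exact scheme. One simple variant is sketched here as an assumption: resample patients with replacement, recompute the AUC on each resample, and summarize the spread with percentiles. The function names, seed, number of resamples, and data are all hypothetical.

```python
import random


def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison (ties count half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc(y_true, y_score, n_boot=200, seed=0):
    """Return (mean AUC, 2.5th percentile, 97.5th percentile) over resamples."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lost one of the classes
            aucs.append(auc(ys, [y_score[i] for i in idx]))
    aucs.sort()
    lower = aucs[int(0.025 * n_boot)]
    upper = aucs[int(0.975 * n_boot) - 1]
    return sum(aucs) / n_boot, lower, upper


# Hypothetical outcomes (1 = AKI) and predicted risks.
outcomes = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
risks = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1, 0.8, 0.3, 0.6, 0.35]
mean_auc, lower, upper = bootstrap_auc(outcomes, risks)
```

Because the models were developed on all included patients, a resampling scheme of this kind estimates sampling variability only; it does not replace testing on patients the model has never seen.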
Affiliation(s)
- Jia Tian
- Department of Nephrology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, 150001, Heilongjiang, People's Republic of China
| | - Rui Cui
- Department of Nephrology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, 150001, Heilongjiang, People's Republic of China
| | - Huinan Song
- Department of Nephrology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, 150001, Heilongjiang, People's Republic of China
| | - Yingzi Zhao
- Department of Nephrology, The Fourth Affiliated Hospital of Harbin Medical University, Harbin, 150001, Heilongjiang, People's Republic of China
| | - Ting Zhou
- The Ward No. 2, Department of Gastroenterology, The Fourth Affiliated Hospital of Harbin Medical University, No. 37 Yiyuan Street, Nangang District, Harbin, 150001, Heilongjiang, People's Republic of China.
16
Mohammadi T, Hooshanginezhad Z, Mohammadi B, Dolatshahi S. The association of stroke risk factors with the future thickness of carotid atherosclerotic plaques. Neurol Res 2023; 45:818-826. [PMID: 37125820 DOI: 10.1080/01616412.2023.2208484] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 01/06/2023] [Accepted: 04/23/2023] [Indexed: 05/02/2023]
Abstract
OBJECTIVES An advancing atherosclerotic plaque is a risk factor for stroke. We conducted this study to assess the relationship between stroke risk factors and changes in the thickness of carotid plaques evident on sonography. METHODS We carried out a secondary analysis of data from a study on carotid bifurcation plaques. Data were collected in the sonography laboratories of two university hospitals. In total, 564 patients (240 men; 42.6%) with atherosclerotic plaques in the carotid bifurcation and internal carotid artery with stenosis ≥ 30% evident on duplex sonography were included. We developed machine learning models using an extreme gradient boosting algorithm with the Shapley additive explanations method to find important risk factors and their interactions. The outcome was a change in the carotid plaque thickness after 36 months, and the predictors were initial plaque thickness and the risk factors of stroke. RESULTS Two regression models were developed for left and right carotid arteries. The R-squared values were 0.964 for the left and 0.993 for the right model. Overall, the three top features were BMI, age, and initial plaque thickness for both left and right plaques. However, the risk factors of stroke showed stronger interactions in predicting plaque thickening of the left carotid artery than of the right. DISCUSSION The effect of each predictor on plaque thickness is complicated by interactions with other risk factors, particularly for the left carotid artery. The side of carotid artery involvement should be considered for stroke prevention.
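The R-squared values quoted for the two regression models measure how much of the variance in plaque thickness change the predictions account for. A minimal sketch of the standard formula, 1 minus the ratio of residual to total sum of squares, follows; the function name and the numbers are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("outcome has zero variance")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted changes in plaque thickness (mm).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(observed, predicted)
```

Whether a value such as 0.993 reflects genuine predictive ability depends on whether it was computed on held-out patients, which the abstract does not specify.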
Affiliation(s)
- Tanya Mohammadi
- College of Science, School of Mathematics, Statistics, and Computer Science, University of Tehran, Tehran, Iran
| | - Zahra Hooshanginezhad
- School of Medicine, Department of Cardiology, Jahrom University of Medical Sciences, Jahrom, Iran
| | | | - Sina Dolatshahi
- Shahid Rajaiee Heart Center, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
17
Rahman MS, Rahman HR, Prithula J, Chowdhury MEH, Ahmed MU, Kumar J, Murugappan M, Khan MS. Heart Failure Emergency Readmission Prediction Using Stacking Machine Learning Model. Diagnostics (Basel) 2023; 13:1948. [PMID: 37296800 DOI: 10.3390/diagnostics13111948] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 04/05/2023] [Revised: 05/16/2023] [Accepted: 05/26/2023] [Indexed: 06/12/2023] Open
Abstract
Heart failure is a devastating disease that has high mortality rates and a negative impact on quality of life. Heart failure patients often experience emergency readmission after an initial episode, frequently because of inadequate management. Timely diagnosis and treatment of underlying issues can significantly reduce the risk of emergency readmissions. The purpose of this project was to predict emergency readmissions of discharged heart failure patients using classical machine learning (ML) models based on Electronic Health Record (EHR) data. The dataset used for this study consisted of 166 clinical biomarkers from 2008 patient records. Three feature selection techniques were studied along with 13 classical ML models using five-fold cross-validation. A stacking ML model was trained using the predictions of the three best-performing models for final classification. The stacking ML model provided an accuracy, precision, recall, specificity, F1-score, and area under the curve (AUC) of 89.41%, 90.10%, 89.41%, 87.83%, 89.28%, and 0.881, respectively. This indicates the effectiveness of the proposed model in predicting emergency readmissions. Using the proposed model, healthcare providers can intervene proactively to reduce the risk of emergency hospital readmission, improve patient outcomes, and decrease healthcare costs.
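The accuracy, precision, recall, specificity, and F1-score listed above all derive from the four cells of a confusion matrix. The sketch below shows the unweighted binary definitions on hypothetical labels; the abstract does not say whether its figures were averaged across classes, so this is not necessarily how the reported values were obtained.

```python
def classification_metrics(y_true, y_pred):
    """Binary metrics from confusion-matrix counts (1 = readmitted)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Report 0.0 instead of failing when a denominator is empty.
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical labels: 3 true positives, 1 false negative,
# 1 false positive, 5 true negatives.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted_label = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = classification_metrics(actual, predicted_label)
```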
Affiliation(s)
- Md Sohanur Rahman
- Department of Electrical and Electronics Engineering, University of Dhaka, Dhaka 1000, Bangladesh
| | - Hasib Ryan Rahman
- Department of Electrical and Electronics Engineering, University of Dhaka, Dhaka 1000, Bangladesh
| | - Johayra Prithula
- Department of Electrical and Electronics Engineering, University of Dhaka, Dhaka 1000, Bangladesh
| | | | - Mosabber Uddin Ahmed
- Department of Electrical and Electronics Engineering, University of Dhaka, Dhaka 1000, Bangladesh
| | - Jaya Kumar
- Department of Physiology, Faculty of Medicine, Universiti Kebangsaan Malaysia, Kuala Lumpur 56000, Malaysia
| | - M Murugappan
- Intelligent Signal Processing (ISP) Research Lab, Department of Electronics and Communication Engineering, Kuwait College of Science and Technology, Block 4, Doha 13133, Kuwait
18
Quinino RM, Agena F, Modelli de Andrade LG, Furtado M, Chiavegatto Filho ADP, David-Neto E. A Machine Learning Prediction Model for Immediate Graft Function After Deceased Donor Kidney Transplantation. Transplantation 2023; 107:1380-1389. [PMID: 36872507 DOI: 10.1097/tp.0000000000004510] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Indexed: 03/07/2023]
Abstract
BACKGROUND After kidney transplantation (KTx), the graft can evolve from excellent immediate graft function (IGF) to total absence of function requiring dialysis. In the long term, recipients with IGF do not seem to benefit from machine perfusion, an expensive procedure, when compared with cold storage. This study proposes to develop a prediction model for IGF in deceased-donor KTx patients using machine learning algorithms. METHODS Unsensitized recipients who received their first deceased-donor KTx between January 1, 2010, and December 31, 2019, were classified according to the course of renal function after transplantation. Variables related to the donor, recipient, kidney preservation, and immunology were used. The patients were randomly divided into 2 groups: 70% were assigned to the training group and 30% to the test group. Popular machine learning algorithms were used: eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine, Gradient Boosting classifier, Logistic Regression, CatBoost classifier, AdaBoost classifier, and Random Forest classifier. Comparative performance analysis on the test dataset was performed using the AUC values, sensitivity, specificity, positive predictive value, negative predictive value, and F1 score. RESULTS Of the 859 patients, 21.7% (n = 186) had IGF. The best predictive performance resulted from the eXtreme Gradient Boosting model (AUC, 0.78; 95% CI, 0.71-0.84; sensitivity, 0.64; specificity, 0.78). Five variables with the highest predictive value were identified. CONCLUSIONS Our results indicated the possibility of creating a model for the prediction of IGF, enhancing the selection of patients who would benefit from an expensive treatment, as in the case of machine perfusion preservation.
Affiliation(s)
- Raquel M Quinino
- Renal Transplant Service, Hospital das Clinicas, University of São Paulo School of Medicine, São Paulo, Brazil
| | - Fabiana Agena
- Renal Transplant Service, Hospital das Clinicas, University of São Paulo School of Medicine, São Paulo, Brazil
| | | | - Mariane Furtado
- Department of Epidemiology, School of Public Health, University of São Paulo, São Paulo, Brazil
| | | | - Elias David-Neto
- Renal Transplant Service, Hospital das Clinicas, University of São Paulo School of Medicine, São Paulo, Brazil
19
Miceli G, Basso MG, Rizzo G, Pintus C, Cocciola E, Pennacchio AR, Tuttolomondo A. Artificial Intelligence in Acute Ischemic Stroke Subtypes According to TOAST Classification: A Comprehensive Narrative Review. Biomedicines 2023; 11:1138. [PMID: 37189756 PMCID: PMC10135701 DOI: 10.3390/biomedicines11041138] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 02/27/2023] [Revised: 03/29/2023] [Accepted: 04/06/2023] [Indexed: 05/17/2023] Open
Abstract
Correctly recognizing the etiology of ischemic stroke (IS) allows timely therapeutic interventions aimed at treating the cause and preventing a new cerebral ischemic event. Nevertheless, identifying the cause is often challenging and relies on clinical features and data obtained by imaging techniques and other diagnostic exams. The TOAST classification system describes the different etiologies of ischemic stroke and includes five subtypes: LAAS (large-artery atherosclerosis), CEI (cardioembolism), SVD (small vessel disease), ODE (stroke of other determined etiology), and UDE (stroke of undetermined etiology). AI models, which provide computational methodologies for quantitative and objective evaluations, seem to increase the sensitivity of detecting the main IS causes, for example through tomographic diagnosis of carotid stenosis, electrocardiographic recognition of atrial fibrillation, and identification of small vessel disease in magnetic resonance images. The aim of this review is to provide overall knowledge about the most effective AI models used in the differential diagnosis of ischemic stroke etiology according to the TOAST classification. According to our results, AI has proven to be a useful tool for identifying predictive factors capable of subtyping acute stroke patients in large heterogeneous populations and, in particular, for clarifying the etiology of UDE IS, especially by detecting cardioembolic sources.
Affiliation(s)
- Giuseppe Miceli
- Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties (ProMISE), Università Degli Studi di Palermo, Piazza Delle Cliniche 2, 90127 Palermo, Italy
- Internal Medicine and Stroke Care Ward, University Hospital, Policlinico “P. Giaccone”, 90141 Palermo, Italy
| | - Maria Grazia Basso
- Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties (ProMISE), Università Degli Studi di Palermo, Piazza Delle Cliniche 2, 90127 Palermo, Italy
- Internal Medicine and Stroke Care Ward, University Hospital, Policlinico “P. Giaccone”, 90141 Palermo, Italy
| | - Giuliana Rizzo
- Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties (ProMISE), Università Degli Studi di Palermo, Piazza Delle Cliniche 2, 90127 Palermo, Italy
- Internal Medicine and Stroke Care Ward, University Hospital, Policlinico “P. Giaccone”, 90141 Palermo, Italy
| | - Chiara Pintus
- Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties (ProMISE), Università Degli Studi di Palermo, Piazza Delle Cliniche 2, 90127 Palermo, Italy
- Internal Medicine and Stroke Care Ward, University Hospital, Policlinico “P. Giaccone”, 90141 Palermo, Italy
| | - Elena Cocciola
- Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties (ProMISE), Università Degli Studi di Palermo, Piazza Delle Cliniche 2, 90127 Palermo, Italy
- Internal Medicine and Stroke Care Ward, University Hospital, Policlinico “P. Giaccone”, 90141 Palermo, Italy
| | - Andrea Roberta Pennacchio
- Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties (ProMISE), Università Degli Studi di Palermo, Piazza Delle Cliniche 2, 90127 Palermo, Italy
- Internal Medicine and Stroke Care Ward, University Hospital, Policlinico “P. Giaccone”, 90141 Palermo, Italy
| | - Antonino Tuttolomondo
- Department of Health Promotion, Mother and Child Care, Internal Medicine and Medical Specialties (ProMISE), Università Degli Studi di Palermo, Piazza Delle Cliniche 2, 90127 Palermo, Italy
- Internal Medicine and Stroke Care Ward, University Hospital, Policlinico “P. Giaccone”, 90141 Palermo, Italy
| |
|
20
|
Chung CC, Su ECY, Chen JH, Chen YT, Kuo CY. XGBoost-Based Simple Three-Item Model Accurately Predicts Outcomes of Acute Ischemic Stroke. Diagnostics (Basel) 2023; 13:842. [PMID: 36899986 PMCID: PMC10000880 DOI: 10.3390/diagnostics13050842] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/05/2023] [Revised: 02/19/2023] [Accepted: 02/21/2023] [Indexed: 02/25/2023] Open
Abstract
A comprehensive and accurate prediction of outcomes for patients with acute ischemic stroke (AIS) is crucial for clinical decision-making. This study developed extreme gradient boosting (XGBoost)-based models using three simple factors, namely age, fasting glucose, and National Institutes of Health Stroke Scale (NIHSS) score, to predict the three-month functional outcomes after AIS. We retrieved the medical records of 1848 patients diagnosed with AIS and managed at a single medical center between 2016 and 2020. We developed and validated the predictions and ranked the importance of each variable. The XGBoost model achieved notable performance, with an area under the curve of 0.8595. According to the model, an initial NIHSS score > 5, age over 64 years, and fasting blood glucose > 86 mg/dL were associated with unfavorable prognoses. For patients receiving endovascular therapy, fasting glucose was the most important predictor. The NIHSS score at admission was the most significant predictor for those who received other treatments. Our proposed XGBoost model showed reliable predictive power for AIS outcomes using simple, readily available predictors and remained valid in patients receiving different AIS treatments, providing clinical evidence for future optimization of AIS treatment strategies.
Affiliation(s)
- Chen-Chih Chung
- Department of Neurology, Taipei Medical University—Shuang Ho Hospital, New Taipei City 235, Taiwan
- Department of Neurology, School of Medicine, College of Medicine, Taipei Medical University, Taipei City 110, Taiwan
- Taipei Neuroscience Institute, Taipei Medical University—Shuang Ho Hospital, New Taipei City 235, Taiwan
| | - Emily Chia-Yu Su
- Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, Taipei City 110, Taiwan
- Clinical Big Data Research Center, Taipei Medical University Hospital, Taipei City 110, Taiwan
| | - Jia-Hung Chen
- Department of Neurology, Taipei Medical University—Shuang Ho Hospital, New Taipei City 235, Taiwan
- Department of Neurology, School of Medicine, College of Medicine, Taipei Medical University, Taipei City 110, Taiwan
- Taipei Neuroscience Institute, Taipei Medical University—Shuang Ho Hospital, New Taipei City 235, Taiwan
| | - Yi-Tui Chen
- Smart Healthcare Interdisciplinary College, National Taipei University of Nursing and Health Sciences, Taipei City 112, Taiwan
- Department of Health Care Management, College of Health Technology, National Taipei University of Nursing and Health Sciences, Taipei City 112, Taiwan
- Department of Education and Research, Taipei City Hospital, Taipei City 103, Taiwan
| | - Chao-Yang Kuo
- Smart Healthcare Interdisciplinary College, National Taipei University of Nursing and Health Sciences, Taipei City 112, Taiwan
- Correspondence: ; Tel.: +886-2-28227101 (ext. 1385)
| |
|
21
|
Song W, Liu Y, Qiu L, Qing J, Li A, Zhao Y, Li Y, Li R, Zhou X. Machine learning-based warning model for chronic kidney disease in individuals over 40 years old in underprivileged areas, Shanxi Province. Front Med (Lausanne) 2023; 9:930541. [PMID: 36698845 PMCID: PMC9868668 DOI: 10.3389/fmed.2022.930541] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2022] [Accepted: 12/19/2022] [Indexed: 01/11/2023] Open
Abstract
Introduction Chronic kidney disease (CKD) is a progressive disease with high incidence but imperceptible early symptoms. Because China's rural areas have inadequate medical check-ups and only single-disease screening programmes, CKD can easily progress to end-stage renal failure there. This study aimed to construct an early warning model for CKD tailored to impoverished areas by employing machine learning (ML) algorithms with easily accessible parameters from ten rural areas in Shanxi Province, thereby promoting a forward shift of treatment time and improving patients' quality of life. Methods From April to November 2019, CKD opportunistic screening was carried out in 10 rural areas in Shanxi Province. First, general information, physical examination data, and blood and urine specimens were collected from 13,550 subjects. Afterward, feature selection of explanatory variables was performed using LASSO regression, and the target datasets, i.e., albuminuria-to-creatinine ratio (ACR) and α1-microglobulin-to-creatinine ratio (MCR), were balanced using the SMOTE (synthetic minority over-sampling technique) algorithm. Next, Bagging, Random Forest (RF) and eXtreme Gradient Boosting (XGBoost) were employed for classification of ACR outcomes and MCR outcomes, respectively. Results 12,330 rural residents were included in this study, with 20 explanatory variables. The cases with increased ACR and increased MCR numbered 1,587 (12.8%) and 1,456 (11.8%), respectively. After conducting LASSO, 14 and 15 explanatory variables remained in these two datasets, respectively. Bagging, RF, and XGBoost performed well in classification, with the AUC reaching 0.74, 0.87, 0.87, 0.89 for ACR outcomes and 0.75, 0.88, 0.89, 0.90 for MCR outcomes. The five variables contributing most to the classification were SBP, TG, TC, Hcy, and DBP for ACR outcomes and age, TG, SBP, Hcy, and FPG for MCR outcomes. Overall, the machine learning algorithms could serve as a warning model for CKD.
Conclusion ML algorithms in conjunction with rural accessible indexes boast good performance in classification, which allows for an early warning model for CKD. This model could help achieve large-scale population screening for CKD in poverty-stricken areas and should be promoted to improve the quality of life and reduce the mortality rate.
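The class-balancing step named in the Methods above (SMOTE) can be illustrated with a minimal sketch. This is not the authors' code: the function name `smote`, the toy two-feature points, and the choice of `k` are assumptions made purely for illustration, and the sketch uses only the Python standard library.

```python
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority points (illustrative sketch).

    Each synthetic point is placed at a random position on the segment
    between a minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Hypothetical minority-class samples with two features each
minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(minority, n_new=6)
```

Because every synthetic point is an interpolation between two real minority samples, the new points stay inside the region already occupied by the minority class.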
Affiliation(s)
- Wenzhu Song
- School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
| | - Yanfeng Liu
- Department of Nephrology, Shanxi Provincial People’s Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China
| | - Lixia Qiu
- School of Public Health, Shanxi Medical University, Taiyuan, Shanxi, China
| | - Jianbo Qing
- Department of Nephrology, Shanxi Provincial People’s Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China
| | - Aizhong Li
- Shanxi Provincial Key Laboratory of Kidney Disease, Taiyuan, China
| | - Yan Zhao
- Department of Nephrology, Shanxi Provincial People’s Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China
| | - Yafeng Li
- Department of Nephrology, Shanxi Provincial People’s Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China
- Shanxi Provincial Key Laboratory of Kidney Disease, Taiyuan, China
- Core Laboratory, Shanxi Provincial People’s Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China
- Academy of Microbial Ecology, Shanxi Medical University, Taiyuan, China
| | - Rongshan Li
- Department of Nephrology, Shanxi Provincial People’s Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China
- Shanxi Provincial Key Laboratory of Kidney Disease, Taiyuan, China
- *Correspondence: Rongshan Li
| | - Xiaoshuang Zhou
- Department of Nephrology, Shanxi Provincial People’s Hospital (Fifth Hospital) of Shanxi Medical University, Taiyuan, China
- *Correspondence: Xiaoshuang Zhou
| |
|
22
|
Mao Y, Lan H, Lin W, Liang J, Huang H, Li L, Wen J, Chen G. Machine learning algorithms are comparable to conventional regression models in predicting distant metastasis of follicular thyroid carcinoma. Clin Endocrinol (Oxf) 2023; 98:98-109. [PMID: 35171531 DOI: 10.1111/cen.14693] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/22/2021] [Revised: 01/28/2022] [Accepted: 02/03/2022] [Indexed: 12/16/2022]
Abstract
OBJECTIVE Distant metastasis often indicates a poor prognosis, so early screening and diagnosis play a significant role. Our study aims to construct and verify a predictive model based on machine learning (ML) algorithms that can estimate the risk of distant metastasis of newly diagnosed follicular thyroid carcinoma (FTC). DESIGN This was a retrospective study based on the Surveillance, Epidemiology, and End Results (SEER) database from 2004 to 2015. PATIENTS A total of 5809 FTC patients were included in the data analysis. Among them, there were 214 (3.68%) cases with distant metastasis. METHOD Univariate and multivariate logistic regression (LR) analyses were used to determine independent risk factors. Seven commonly used ML algorithms were applied for predictive model construction. We used the area under the receiver-operating characteristic (AUROC) curve to select the best ML algorithm. The optimal model was trained through 10-fold cross-validation and visualized by SHapley Additive exPlanations (SHAP). Finally, we compared it with the traditional LR method. RESULTS In terms of predicting distant metastasis, the AUROCs of the seven ML algorithms were 0.746-0.836 in the test set. Among them, the Extreme Gradient Boosting (XGBoost) had the best prediction performance, with an AUROC of 0.836 (95% confidence interval [CI]: 0.775-0.897). After 10-fold cross-validation, its predictive power could reach the best [AUROC: 0.855 (95% CI: 0.803-0.906)], which was slightly higher than the classic binary LR model [AUROC: 0.845 (95% CI: 0.818-0.873)]. CONCLUSIONS The XGBoost approach was comparable to the conventional LR method for predicting the risk of distant metastasis for FTC.
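The AUROC used above to rank the algorithms has a simple interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that definition; the function name `auroc` and the four toy labels and scores are illustrative assumptions, not data from the study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical example: 1 = distant metastasis, 0 = none
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(labels, scores))  # 0.75 (3 of 4 positive/negative pairs ordered correctly)
```

This pairwise form is quadratic in the sample size, which is fine for a demonstration; library implementations typically use a sort-based equivalent.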
Affiliation(s)
- Yaqian Mao
- Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, Fujian, China
| | - Huiyu Lan
- Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, Fujian, China
| | - Wei Lin
- Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, Fujian, China
| | - Jixing Liang
- Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, Fujian, China
| | - Huibin Huang
- Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, Fujian, China
| | - Liantao Li
- Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, Fujian, China
| | - Junping Wen
- Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, Fujian, China
| | - Gang Chen
- Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, Fujian, China
- Fujian Provincial Key Laboratory of Medical Analysis, Fujian Academy of Medical, Fuzhou, Fujian, China
| |
|
23
|
Application of an Interpretable Machine Learning Model to Predict Lymph Node Metastasis in Patients with Laryngeal Carcinoma. J Oncol 2022; 2022:6356399. [PMID: 36411795 PMCID: PMC9675609 DOI: 10.1155/2022/6356399] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/14/2022] [Revised: 10/01/2022] [Accepted: 10/03/2022] [Indexed: 11/13/2022]
Abstract
Objectives A more accurate preoperative prediction of lymph node metastasis (LNM) plays a decisive role in the selection of treatment in patients with laryngeal carcinoma (LC). This study aimed to develop a machine learning (ML) model for predicting LNM in patients with LC. Methods We collected and retrospectively analysed 4887 LC patients with detailed demographic characteristics including age at diagnosis, race, sex, primary site, histology, number of tumours, T-stage, grade, and tumour size in the National Institutes of Health (NIH) Surveillance, Epidemiology, and End Results (SEER) database from 2005 to 2015. Correlations among all variables were evaluated using the Pearson correlation. Independent risk factors for LC patients with LNM were identified by univariate and multivariate logistic regression analyses. Afterward, patients were randomly divided into training and test sets in a ratio of 8 to 2. On this basis, we established logistic regression (LR), k-nearest neighbor (KNN), support vector machine (SVM), extreme gradient boosting (XGBoost), random forest (RF), and light gradient boosting machine (LightGBM) algorithm models based on ML. The area under the receiver operating characteristic curve (AUC) value, accuracy, precision, recall rate, F1-score, specificity, and Brier score were adopted to evaluate and compare the prediction performance of the models. Finally, the Shapley additive explanation (SHAP) method was used to interpret the association between each feature variable and target variables based on the best model. Results Of the 4887 total LC patients, 3409 were without LNM (69.76%), and 1478 had LNM (30.24%). The result of the Pearson correlation showed that variables were weakly correlated with each other. The independent risk factors for LC patients with LNM were age at diagnosis, race, primary site, number of tumours, tumour size, grade, and T-stage.
Among six models, XGBoost displayed a better performance for predicting LNM, with five performance metrics outperforming other models in the training set (AUC: 0.791 (95% CI: 0.776–0.806), accuracy: 0.739, recall rate: 0.638, F1-score: 0.663, and Brier score: 0.165), and similar results were observed in the test set. Moreover, the SHAP value of XGBoost was calculated, and the result showed that the three features, T-stage, primary site, and grade, had the greatest impact on predicting the outcomes. Conclusions The XGBoost model performed better and can be applied to forecast the LNM of LC, offering a valuable and significant reference for clinicians in advanced decision-making.
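Several of the performance metrics listed in this abstract (accuracy, precision, recall, F1-score, specificity, and Brier score) follow directly from the confusion matrix and the predicted probabilities. The sketch below shows one way to compute them; the function name `classification_metrics`, the 0.5 threshold, and the five toy cases are assumptions for illustration only and do not reproduce the study's results.

```python
def classification_metrics(y_true, y_prob, threshold=0.5):
    """Compute common binary-classification metrics from predicted probabilities."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # Brier score: mean squared difference between probability and outcome
    brier = sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "brier": brier,
    }

# Hypothetical example: 1 = lymph node metastasis, 0 = none
y_true = [1, 0, 1, 0, 1]
y_prob = [0.9, 0.2, 0.4, 0.6, 0.7]
metrics = classification_metrics(y_true, y_prob)
```

Unlike the threshold-based metrics, the Brier score uses the probabilities themselves, so it also reflects calibration; lower values are better.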
|
24
|
Mao Y, Zhu Z, Pan S, Lin W, Liang J, Huang H, Li L, Wen J, Chen G. Value of machine learning algorithms for predicting diabetes risk: A subset analysis from a real-world retrospective cohort study. J Diabetes Investig 2022; 14:309-320. [PMID: 36345236 PMCID: PMC9889616 DOI: 10.1111/jdi.13937] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/01/2022] [Revised: 10/04/2022] [Accepted: 10/16/2022] [Indexed: 11/11/2022] Open
Abstract
AIMS/INTRODUCTION To compare the application value of different machine learning (ML) algorithms for diabetes risk prediction. MATERIALS AND METHODS This is a 3-year retrospective cohort study with a total of 3,687 participants included in the data analysis. Modeling variable screening and predictive model building were carried out using logistic regression (LR) analysis and 10-fold cross-validation, respectively. In total, six different ML algorithms, including random forests, light gradient boosting machine, extreme gradient boosting, adaptive boosting (AdaBoost), multi-layer perceptrons and Gaussian naive Bayes, were used for model construction. Model performance was mainly evaluated by the area under the receiver operating characteristic curve. The best performing ML model was selected for comparison with the traditional LR model and visualized using Shapley additive explanations. RESULTS A total of eight risk factors most associated with the development of diabetes were identified by univariate and multivariate LR analysis, and they were visualized in the form of a nomogram. Among the six different ML models, the random forests model had the best predictive performance. After 10-fold cross-validation, its optimal model had an area under the receiver operating characteristic curve of 0.855 (95% confidence interval [CI] 0.823-0.886) in the training set and 0.835 (95% CI 0.779-0.892) in the test set. For the traditional LR model, the area under the receiver operating characteristic curve was 0.840 (95% CI 0.814-0.866) in the training set and 0.834 (95% CI 0.785-0.884) in the test set. CONCLUSIONS In real-world epidemiological research, the combination of traditional variable screening and ML algorithms to construct a diabetes risk prediction model has satisfactory clinical application value.
Affiliation(s)
- Yaqian Mao
- Department of Internal Medicine, Fujian Provincial Hospital South Branch, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
| | - Zheng Zhu
- Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
| | - Shuyao Pan
- Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
| | - Wei Lin
- Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
| | - Jixing Liang
- Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
| | - Huibin Huang
- Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
| | - Liantao Li
- Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
| | - Junping Wen
- Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
| | - Gang Chen
- Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Fujian Provincial Key Laboratory of Medical Analysis, Fujian Academy of Medical, Fuzhou, China
| |
|
25
|
Park HY, Park D, Kang HS, Kim H, Lee S, Im S. Post-stroke respiratory complications using machine learning with voice features from mobile devices. Sci Rep 2022; 12:16682. [PMID: 36202829 PMCID: PMC9537337 DOI: 10.1038/s41598-022-20348-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 04/06/2022] [Accepted: 09/12/2022] [Indexed: 11/11/2022] Open
Abstract
Abnormal voice may identify those at risk of post-stroke aspiration. This study aimed to determine whether machine learning algorithms with voice recorded via a mobile device can accurately classify those with dysphagia at risk of tube feeding and post-stroke aspiration pneumonia and be used as digital biomarkers. Voice samples from patients referred for swallowing disturbance in a university-affiliated hospital were collected prospectively using a mobile device. Subjects who required tube feeding were further classified as being at high risk of respiratory complications, based on voluntary cough strength and abnormal chest x-ray images. A total of 449 samples were obtained, with 234 requiring tube feeding and 113 showing high risk of respiratory complications. The eXtreme gradient boosting multimodal models that included abnormal acoustic features and clinical variables showed high sensitivity levels of 88.7% (95% CI 82.6–94.7) and 84.5% (95% CI 76.9–92.1) in the classification of those at risk of tube feeding and at high risk of respiratory complications, respectively. In both cases, voice features proved to be the strongest contributing factors in these models. Voice features may be considered as viable digital biomarkers in those at risk of respiratory complications related to post-stroke dysphagia.
Affiliation(s)
- Hae-Yeon Park
- Department of Rehabilitation Medicine, Seoul St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
| | - DoGyeom Park
- Graduate School of Artificial Intelligence, Pohang University of Science and Technology (POSTECH), Pohang, Republic of Korea
| | - Hye Seon Kang
- Department of Pulmonary, Allergy and Critical Care Medicine, Bucheon St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
- Department of Internal Medicine, Bucheon St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
| | - HyunBum Kim
- Department of Otolaryngology-Head and Neck Surgery, Yeouido St. Mary's Hospital, College of Medicine, The Catholic University of Korea, Seoul, Republic of Korea
| | - Seungchul Lee
- Graduate School of Artificial Intelligence, Pohang University of Science and Technology (POSTECH), Pohang, Republic of Korea
- Department of Mechanical Engineering, Pohang University of Science and Technology (POSTECH), 223, 5th Engineering Building, 77 Cheongam-Ro, Nam-Gu, Pohang, 37673, Gyeongbuk, Republic of Korea
| | - Sun Im
- Department of Rehabilitation Medicine, Bucheon St. Mary's Hospital, College of Medicine, Catholic University of Korea, 327 Sosa-ro, Seoul, Bucheon-si, 14647, Gyeonggi-do, Republic of Korea.
| |
|
26
|
Chen YC, Chung JH, Yeh YJ, Lou SJ, Lin HF, Lin CH, Hsien HH, Hung KW, Yeh SCJ, Shi HY. Predicting 30-Day Readmission for Stroke Using Machine Learning Algorithms: A Prospective Cohort Study. Front Neurol 2022; 13:875491. [PMID: 35860493 PMCID: PMC9289395 DOI: 10.3389/fneur.2022.875491] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2022] [Accepted: 06/13/2022] [Indexed: 11/13/2022] Open
Abstract
Background Machine learning algorithms for predicting 30-day stroke readmission are rarely discussed. The aims of this study were to identify significant predictors of 30-day readmission after stroke and to compare prediction accuracy and area under the receiver operating characteristic (AUROC) curve in five models: artificial neural network (ANN), K nearest neighbor (KNN), random forest (RF), support vector machine (SVM), naive Bayes classifier (NBC), and Cox regression (COX) models. Methods The subjects of this prospective cohort study were 1,476 patients with a history of admission for stroke to one of six hospitals between March 2014 and September 2019. A training dataset (n = 1,033) was used for model development, and a testing dataset (n = 443) was used for internal validation. Another 167 patients with stroke recruited from October to December 2019 were enrolled in the dataset for external validation. A feature importance analysis was also performed to identify the significance of the selected input variables. Results For predicting 30-day readmission after stroke, the ANN model had significantly (P < 0.001) higher performance indices compared to the other models. According to the ANN model results, the best predictor of 30-day readmission was PAC followed by nasogastric tube insertion and stroke type (P < 0.05). Using a machine learning ANN model to obtain an accurate estimate of 30-day readmission for stroke and to identify risk factors may improve the precision and efficacy of management for these patients. Conclusion Using a machine-learning ANN model to obtain an accurate estimate of 30-day readmission for stroke and to identify risk factors may improve the precision and efficacy of management for these patients. For stroke patients who are candidates for PAC rehabilitation, these predictors have practical applications in educating patients in the expected course of recovery and health outcomes.
Affiliation(s)
- Yu-Ching Chen
- Department of Healthcare Administration and Medical Informatics, Kaohsiung Medical University, Kaohsiung, Taiwan
- Department of Public Health, College of Medicine, National Cheng-Kung University, Tainan, Taiwan
| | - Jo-Hsuan Chung
- Department of Healthcare Administration and Medical Informatics, Kaohsiung Medical University, Kaohsiung, Taiwan
| | - Yu-Jo Yeh
- Department of Healthcare Administration and Medical Informatics, Kaohsiung Medical University, Kaohsiung, Taiwan
| | - Shi-Jer Lou
- Department of Healthcare Administration and Medical Informatics, Kaohsiung Medical University, Kaohsiung, Taiwan
- Graduate Institute of Technological and Vocational Education, National Pingtung University of Science and Technology, Pingtung, Taiwan
| | - Hsiu-Fen Lin
- Department of Neurology, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
- Department of Neurology, College of Medicine, Kaohsiung Medical University, Kaohsiung, Taiwan
| | - Ching-Huang Lin
- Division of Neurology, Kaohsiung Veterans General Hospital, Kaohsiung, Taiwan
| | - Hong-Hsi Hsien
- Department of Internal Medicine, St. Joseph Hospital, Kaohsiung, Taiwan
| | - Kuo-Wei Hung
- Division of Neurology, Department of Internal Medicine, Yuan's General Hospital, Kaohsiung, Taiwan
| | - Shu-Chuan Jennifer Yeh
- Department of Healthcare Administration and Medical Informatics, Kaohsiung Medical University, Kaohsiung, Taiwan
- Department of Business Management, National Sun Yat-Sen University, Kaohsiung, Taiwan
| | - Hon-Yi Shi
- Department of Healthcare Administration and Medical Informatics, Kaohsiung Medical University, Kaohsiung, Taiwan
- Graduate Institute of Technological and Vocational Education, National Pingtung University of Science and Technology, Pingtung, Taiwan
- Department of Business Management, National Sun Yat-Sen University, Kaohsiung, Taiwan
- Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung, Taiwan
- Department of Medical Research, China Medical University Hospital, China Medical University, Taichung, Taiwan
- *Correspondence: Hon-Yi Shi
| |
|
27
|
Feng X, Hua Y, Zou J, Jia S, Ji J, Xing Y, Zhou J, Liao J. Intelligible Models for HealthCare: Predicting the Probability of 6-Month Unfavorable Outcome in Patients with Ischemic Stroke. Neuroinformatics 2022; 20:575-585. [PMID: 34435319 DOI: 10.1007/s12021-021-09535-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Accepted: 07/04/2021] [Indexed: 12/31/2022]
Abstract
Early prediction of unfavorable outcome after ischemic stroke is important for clinical management. Machine learning, as a novel computational modeling technique, could help clinicians address this challenge. We aim to investigate the applicability of machine learning models for individualized prediction in ischemic stroke patients and demonstrate the utility of various model-agnostic explanation techniques for machine learning predictions. A total of 499 consecutive patients with unfavorable [modified Rankin Scale (mRS) score 3-6, n = 140] and favorable (mRS score 0-2, n = 359) outcomes 6 months after ischemic stroke were enrolled in this study. Four machine learning models, Random Forest (RF), eXtreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), and Support Vector Machine (SVM), achieved areas under the curve (AUC) of (90.20 ± 0.22)%, (86.91 ± 1.05)%, (86.49 ± 2.35)%, and (81.89 ± 2.40)%, respectively. Three global interpretability techniques (Feature Importance shows the contribution of selected features, Partial Dependence Plot aims to visualize the average effect of a feature on the predicted probability of unfavorable outcome, Feature Interaction detects the change in the prediction that occurs by varying the features after considering the individual feature effects) and one local interpretability technique (Shapley Value indicates the probability of unfavorable outcome of different instances) were applied and presented via visualization. The current study is therefore important for better understanding intelligible healthcare analytics via explanations of predictions at the local and global levels, and potentially for reducing the mortality of patients with ischemic stroke by assisting clinicians in the decision-making process.
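The local interpretability technique mentioned above, the Shapley value, attributes the difference between one patient's prediction and a baseline prediction to the individual features. A minimal sketch of the exact computation follows. It is not the authors' implementation: the function `shapley_values`, the model `toy_risk`, its coefficients, and the feature values are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution by averaging over all feature orderings.

    Starting from the baseline, features are switched to the instance's
    values one at a time; each feature is credited with the change in the
    prediction it causes, averaged over every possible ordering.
    """
    n = len(instance)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = instance[i]
            value = predict(current)
            totals[i] += value - previous
            previous = value
    return [t / len(orderings) for t in totals]

def toy_risk(x):
    """Hypothetical risk score with an age x NIHSS interaction term."""
    age, nihss, glucose = x
    return 0.02 * age + 0.05 * nihss + 0.001 * age * nihss + 0.01 * glucose

instance = (70, 10, 6)   # hypothetical patient
baseline = (60, 4, 5)    # hypothetical reference patient
phi = shapley_values(toy_risk, instance, baseline)
```

The attributions sum to `toy_risk(instance) - toy_risk(baseline)`, which is the efficiency property of Shapley values. Enumerating all orderings is only feasible for a handful of features; practical tools approximate this average.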
Affiliation(s)
- Xiaobing Feng
  - School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing, China
- Yingrong Hua
  - School of Pharmacy, China Pharmaceutical University, Nanjing, China
- Jianjun Zou
  - Department of Clinical Pharmacology, Nanjing First Hospital, Nanjing Medical University, Nanjing, China
- Shuopeng Jia
  - School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing, China
- Jiatong Ji
  - School of Basic Medicine and Clinical Pharmacy, China Pharmaceutical University, Nanjing, China
- Yan Xing
  - School of Science, China Pharmaceutical University, #639 Longmian Avenue, Jiangning District, Nanjing, 211198, China
- Junshan Zhou
  - Department of Neurology, Nanjing First Hospital, Nanjing, China
- Jun Liao
  - School of Science, China Pharmaceutical University, #639 Longmian Avenue, Jiangning District, Nanjing, 211198, China
28
Mao Y, Huang Y, Xu L, Liang J, Lin W, Huang H, Li L, Wen J, Chen G. Surgical Methods and Social Factors Are Associated With Long-Term Survival in Follicular Thyroid Carcinoma: Construction and Validation of a Prognostic Model Based on Machine Learning Algorithms. Front Oncol 2022; 12:816427. [PMID: 35800057 PMCID: PMC9253987 DOI: 10.3389/fonc.2022.816427] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 11/16/2021] [Accepted: 05/19/2022] [Indexed: 11/13/2022] Open
Abstract
Background: This study aimed to establish and verify an effective machine learning (ML) model to predict the prognosis of follicular thyroid cancer (FTC), and to compare it with the eighth edition of the American Joint Committee on Cancer (AJCC) model. Methods: The Kaplan-Meier method and Cox regression model were used to analyze the risk factors for cancer-specific survival (CSS). Propensity-score matching (PSM) was used to adjust for the confounding factors of different surgeries. Nine different ML algorithms, including eXtreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), Random Forests (RF), Logistic Regression (LR), Adaptive Boosting (AdaBoost), Gaussian Naive Bayes (GaussianNB), K-Nearest Neighbor (KNN), Support Vector Machine (SVM) and Multi-Layer Perceptron (MLP), were used to build prognostic models of FTC. Ten-fold cross-validation and SHapley Additive exPlanations were used to train and visualize the optimal ML model. The AJCC model was built by multivariate Cox regression and visualized through a nomogram. The performance of the XGBoost model and the AJCC model was mainly assessed using the area under the receiver operating characteristic curve (AUROC). Results: Multivariate Cox regression showed that age, surgical methods, marital status, T classification, N classification and M classification were independent risk factors for CSS. Among different surgeries, the prognosis of one-sided thyroid lobectomy plus isthmectomy (LO plus IO) was the best, followed by total thyroidectomy (TT) (hazard ratios: one-sided thyroid LO plus IO, 0.086 [95% confidence interval (CI), 0.025-0.290], P<0.001; TT, 0.490 [95% CI, 0.295-0.814], P=0.006). PSM analysis showed that one-sided thyroid LO plus IO, TT, and partial thyroidectomy had no significant differences in long-term prognosis. 
Our study also revealed that married patients had a better prognosis than single, widowed and separated patients (hazard ratios: single, 1.686 [95% CI, 1.146-2.479], P=0.008; widowed, 1.671 [95% CI, 1.163-2.402], P=0.006; separated, 4.306 [95% CI, 2.039-9.093], P<0.001). Among the different ML algorithms, the XGBoost model had the best performance, followed by GaussianNB, RF, LR, MLP, LightGBM, AdaBoost, KNN and SVM. In predicting FTC prognosis, the predictive performance of the XGBoost model was relatively better than that of the AJCC model (AUROC: 0.886 vs. 0.814). Conclusion: For high-risk groups, effective surgical methods and a good marital status can improve the prognosis of FTC. Compared with the traditional AJCC model, the XGBoost model has relatively better prediction accuracy and clinical utility.
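The 10-fold cross-validation mentioned in the Methods partitions the cohort into ten folds and holds each one out once for evaluation while training on the other nine. The sketch below shows only the index bookkeeping, under the assumption of a simple shuffled (non-stratified) split; the function name `k_fold_indices` and the seed are hypothetical, and the abstract does not state which splitting scheme the authors used.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for a shuffled k-fold split.

    Each sample index lands in exactly one test fold; fold sizes differ
    by at most one when n_samples is not divisible by k.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding over the shuffled list gives k near-equal folds.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, test_idx


# Hypothetical cohort of 25 patients split into 10 folds.
splits = list(k_fold_indices(25, k=10))
```

A model would be refit on each `train_idx` and scored on the matching `test_idx`, with the ten scores averaged. With few events, a stratified split that preserves the outcome ratio in every fold is a common alternative.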
Affiliation(s)
- Yaqian Mao
  - Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Yanling Huang
  - Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Lizhen Xu
  - Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Jixing Liang
  - Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Wei Lin
  - Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Huibin Huang
  - Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Liantao Li
  - Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Junping Wen
  - Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
- Gang Chen
  - Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
  - Department of Endocrinology, Fujian Provincial Hospital, Shengli Clinical Medical College of Fujian Medical University, Fuzhou, China
  - Fujian Provincial Key Laboratory of Medical Analysis, Fujian Academy of Medical, Fuzhou, China
  - *Correspondence: Gang Chen
29
Wang X, Zhao X, Song G, Niu J, Xu T. Machine Learning-Based Evaluation on Craniodentofacial Morphological Harmony of Patients After Orthodontic Treatment. Front Physiol 2022; 13:862847. [PMID: 35615666 PMCID: PMC9124867 DOI: 10.3389/fphys.2022.862847] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2022] [Accepted: 04/01/2022] [Indexed: 11/13/2022] Open
Abstract
Objectives: Machine learning is increasingly being used in the medical field. Based on machine learning models, the present study aims to improve the prediction performance of craniodentofacial morphological harmony judgment after orthodontic treatment and to determine the most significant factors. Methods: A dataset of 180 subjects was randomly selected from a large sample of 3,706 finished orthodontic cases from six top orthodontic treatment centers around China. Thirteen algorithms were used to predict the value of the cephalometric morphological harmony score of each subject and to search for the optimal model. Based on the feature importance ranking and by removing features, machine learning regression models (including the AdaBoost, ExtraTree, XGBoost, and linear regression models) were used to predict and compare the harmony score for each subject from the dataset with cross-validation. By analyzing the prediction values, the optimal model and the most significant cephalometric characteristics were determined. Results: When nine features were included, the performance of the XGBoost regression model was MAE = 0.267, RMSE = 0.341, and Pearson correlation coefficient = 0.683, which indicated that the XGBoost regression model exhibited the best fitting and predicting performance for craniodentofacial morphological harmony judgment. 
Nine cephalometric features including L1/NB (inclination of the lower central incisors), ANB (sagittal position between the maxilla and mandible), LL-EP (distance from the point of the prominence of the lower lip to the aesthetic plane), SN/OP (inclination of the occlusal plane), SNB (sagittal position of the mandible in relation to the cranial base), U1/SN (inclination of the upper incisors to the cranial base), L1-NB (protrusion of the lower central incisors), Ns-Prn-Pos (nasal protrusion), and U1/L1 (relationship between the protrusions of the upper and lower central incisors) were revealed to significantly influence the judgment. Conclusion: The application of the XGBoost regression model enhanced the predictive ability regarding the craniodentofacial morphological harmony evaluation by experts after orthodontic treatment. Teeth position, teeth alignment, jaw position, and soft tissue morphology would be the most significant factors influencing the judgment. The methodology also provided guidance for the application of machine learning models to resolve medical problems characterized by limited sample size.
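The three regression metrics quoted in the Results (MAE, RMSE, and the Pearson correlation coefficient) follow directly from paired true and predicted scores. The following is a small illustrative sketch using only the standard library; the function name `regression_metrics` and the toy scores are assumptions, not data from the study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, RMSE, Pearson r) for paired true and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    # Pearson r is undefined when either sequence is constant.
    if var_t == 0 or var_p == 0:
        raise ValueError("Pearson r is undefined for a constant sequence")
    r = cov / math.sqrt(var_t * var_p)
    return mae, rmse, r


# Toy harmony scores (hypothetical): every prediction is off by exactly +0.5.
mae, rmse, r = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
```

The toy example shows why the metrics are reported together: a constant offset leaves the correlation perfect while MAE and RMSE still register the error.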
Affiliation(s)
- Xin Wang
  - Department of Orthodontics, Peking University School and Hospital of Stomatology, Beijing, China
- Xiaoke Zhao
  - State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing, China
  - Beijing Advanced Innovation Center for Big Data and Brain Computing (BDBC), Beihang University, Beijing, China
  - Hangzhou Innovation Research Institute, Beihang University, Beijing, China
- Guangying Song
  - Department of Orthodontics, Peking University School and Hospital of Stomatology, Beijing, China
  - NHC Research Center of Engineering and Technology for Computerized Dentistry, Beijing, China
  - *Correspondence: Guangying Song; Tianmin Xu
- Jianwei Niu
  - State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing, China
  - Beijing Advanced Innovation Center for Big Data and Brain Computing (BDBC), Beihang University, Beijing, China
  - Hangzhou Innovation Research Institute, Beihang University, Beijing, China
- Tianmin Xu
  - Department of Orthodontics, Peking University School and Hospital of Stomatology, Beijing, China
  - NHC Research Center of Engineering and Technology for Computerized Dentistry, Beijing, China
  - *Correspondence: Guangying Song; Tianmin Xu
30
Huang YC, Cheng YC, Jhou MJ, Chen M, Lu CJ. Important Risk Factors in Patients with Nonvalvular Atrial Fibrillation Taking Dabigatran Using Integrated Machine Learning Scheme-A Post Hoc Analysis. J Pers Med 2022; 12:756. [PMID: 35629177 PMCID: PMC9146635 DOI: 10.3390/jpm12050756] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 04/07/2022] [Revised: 04/29/2022] [Accepted: 05/03/2022] [Indexed: 02/06/2023] Open
Abstract
Our study aims to develop an effective integrated machine learning (ML) scheme to predict vascular events and bleeding in patients with nonvalvular atrial fibrillation taking dabigatran and to identify important risk factors. This study is a post hoc analysis of the Randomized Evaluation of Long-Term Anticoagulant Therapy trial database. One traditional prediction method, logistic regression (LGR), and four ML techniques, namely naive Bayes, random forest (RF), classification and regression tree, and extreme gradient boosting (XGBoost), were combined to construct our scheme. The area under the receiver operating characteristic curve (AUC) of RF (0.780) and XGBoost (0.717) was higher than that of LGR (0.674) in predicting vascular events. In predicting bleeding, the AUCs of RF (0.684) and XGBoost (0.618) were also higher than that of LGR (0.605). Our integrated ML feature selection scheme, based on the two convincing prediction techniques, identified age, history of congestive heart failure and myocardial infarction, smoking, kidney function, and body mass index as major variables for vascular events, and age, kidney function, smoking, bleeding history, concomitant use of specific drugs, and dabigatran dosage as major variables for bleeding. ML is an effective data analysis approach for complex medical data. Our results may provide preliminary direction for precision medicine.
Affiliation(s)
- Yung-Chuan Huang
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
  - Department of Neurology, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Yu-Chen Cheng
  - Department of Neurology, Fu Jen Catholic University Hospital, Fu Jen Catholic University, New Taipei City 24352, Taiwan
- Mao-Jhen Jhou
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Mingchih Chen
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
  - Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City 242062, Taiwan
- Chi-Jie Lu
  - Graduate Institute of Business Administration, Fu Jen Catholic University, New Taipei City 242062, Taiwan
  - Artificial Intelligence Development Center, Fu Jen Catholic University, New Taipei City 242062, Taiwan
  - Department of Information Management, Fu Jen Catholic University, New Taipei City 242062, Taiwan
31
Huang X, Cao T, Chen L, Li J, Tan Z, Xu B, Xu R, Song Y, Zhou Z, Wang Z, Wei Y, Zhang Y, Li J, Huo Y, Qin X, Wu Y, Wang X, Wang H, Cheng X, Xu X, Liu L. Novel Insights on Establishing Machine Learning-Based Stroke Prediction Models Among Hypertensive Adults. Front Cardiovasc Med 2022; 9:901240. [PMID: 35600480 PMCID: PMC9120532 DOI: 10.3389/fcvm.2022.901240] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/21/2022] [Accepted: 04/05/2022] [Indexed: 11/13/2022] Open
Abstract
Background Stroke is a major global health burden, and risk prediction is essential for the primary prevention of stroke. However, uncertainty remains about the optimal prediction model for analyzing stroke risk. In this study, we aim to determine the most effective stroke prediction method in a Chinese hypertensive population using machine learning and to establish a general methodological pipeline for future analysis. Methods The training set included 70% of the data (n = 14,491) from the China Stroke Primary Prevention Trial (CSPPT). Internal validation was performed with the remaining 30% of the CSPPT data (n = 6,211), and external validation was conducted using a nested case–control (NCC) dataset (n = 2,568). The primary outcome was the first stroke. Four commonly used analysis methods were applied and compared: logistic regression (LR), stepwise logistic regression (SLR), extreme gradient boosting (XGBoost), and random forest (RF). Population characteristic data with and without laboratory variables were analyzed separately. Accuracy, sensitivity, specificity, kappa, and area under the receiver operating characteristic curve (AUC) were used to assess the models, with AUC as the primary measure. Data balancing techniques, including random under-sampling (RUS) and synthetic minority over-sampling technique (SMOTE), were applied to this unbalanced training set. Results The best model performance was observed in the RUS-applied RF model with laboratory variables. Compared with null models (sensitivity = 0, specificity = 100, and mean AUCs = 0.643), data balancing techniques improved overall performance, with RUS demonstrating a more satisfactory effect in the current study (RUS: sensitivity = 63.9; specificity = 53.7; and mean AUCs = 0.624). Adding laboratory variables improved the performance of the analysis methods. All results were reconfirmed in the validation sets. The top 10 important variables were determined by the analysis method with the best performance. 
Conclusion Among the tested methods, the most effective stroke prediction model in the targeted population is RUS-applied RF. Drawing on the insights from the current study, we provide a general framework for building machine learning-based prediction models.
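Random under-sampling (RUS), the balancing technique that performed best in this abstract, discards majority-class records at random until every class is as frequent as the rarest one. The sketch below is a minimal standard-library illustration of that idea, not the study's pipeline; the function name `random_under_sample`, the seed, and the toy data are assumptions.

```python
import random


def random_under_sample(rows, labels, seed=0):
    """Down-sample every class to the size of the smallest class.

    rows:   list of feature records (any objects).
    labels: list of class labels aligned with rows.
    Returns (sampled_rows, sampled_labels), grouped by class.
    """
    if len(rows) != len(labels) or not rows:
        raise ValueError("rows and labels must be non-empty and equal in length")
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    n_min = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        # Sample without replacement so no record is duplicated.
        for row in rng.sample(members, n_min):
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


# Toy training set (hypothetical): 2 stroke cases among 10 participants.
toy_rows = list(range(10))
toy_labels = [1, 1] + [0] * 8
balanced_rows, balanced_labels = random_under_sample(toy_rows, toy_labels)
```

Balancing is applied to the training set only; validation sets keep their original class ratio so that the reported metrics reflect the real event rate.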
Affiliation(s)
- Xiao Huang
  - Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
  - *Correspondence: Xiao Huang
- Tianyu Cao
  - Biological Anthropology, University of California, Santa Barbara, Santa Barbara, CA, United States
- Liangziqian Chen
  - Department of Data Management, Shenzhen Evergreen Medical Institute, Shenzhen, China
- Junpei Li
  - Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Ziheng Tan
  - Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Benjamin Xu
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, United States
- Richard Xu
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, United States
- Yun Song
  - Department of Data Management, Shenzhen Evergreen Medical Institute, Shenzhen, China
  - Institute of Biomedicine, Anhui Medical University, Hefei, China
- Ziyi Zhou
  - Department of Biomedical Engineering, Graduate School at Shenzhen, Tsinghua University, Shenzhen, China
- Zhuo Wang
  - Key Laboratory of Precision Nutrition and Food Quality, Ministry of Education, Department of Nutrition and Health, College of Food Sciences and Nutritional Engineering, China Agricultural University, Beijing, China
- Yaping Wei
  - Key Laboratory of Precision Nutrition and Food Quality, Ministry of Education, Department of Nutrition and Health, College of Food Sciences and Nutritional Engineering, China Agricultural University, Beijing, China
- Yan Zhang
  - Department of Cardiology, Peking University First Hospital, Beijing, China
- Jianping Li
  - Department of Cardiology, Peking University First Hospital, Beijing, China
- Yong Huo
  - Department of Cardiology, Peking University First Hospital, Beijing, China
- Xianhui Qin
  - National Clinical Research Study Center for Kidney Disease, The State Key Laboratory for Organ Failure Research, Renal Division, Nanfang Hospital, Southern Medical University, Guangzhou, China
- Yanqing Wu
  - Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Xiaobin Wang
  - Department of Population, Family and Reproductive Health, Johns Hopkins University Bloomberg School of Public Health, Baltimore, MD, United States
- Hong Wang
  - Department of Cardiovascular Science, Temple University Lewis Katz School of Medicine, Philadelphia, PA, United States
- Xiaoshu Cheng
  - Department of Cardiology, The Second Affiliated Hospital of Nanchang University, Nanchang, China
- Xiping Xu
  - Key Laboratory of Precision Nutrition and Food Quality, Ministry of Education, Department of Nutrition and Health, College of Food Sciences and Nutritional Engineering, China Agricultural University, Beijing, China
- Lishun Liu
  - Department of Data Management, Shenzhen Evergreen Medical Institute, Shenzhen, China
  - Department of Biomedical Engineering, Graduate School at Shenzhen, Tsinghua University, Shenzhen, China
32
Heo J, Yoo J, Lee H, Lee IH, Kim JS, Park E, Kim YD, Nam HS. Prediction of Hidden Coronary Artery Disease Using Machine Learning in Patients With Acute Ischemic Stroke. Neurology 2022; 99:e55-e65. [PMID: 35470135 DOI: 10.1212/wnl.0000000000200576] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 10/11/2021] [Accepted: 03/02/2022] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND AND OBJECTIVES A machine learning technique for identifying hidden coronary artery disease (CAD) might be useful. We developed and validated machine learning models to predict hidden CAD and to assess long-term outcomes in patients with acute ischemic stroke. METHODS Multidetector coronary computed tomography was performed for patients without a known history of CAD. Primary outcomes were defined as having any degree of CAD and having obstructive CAD (≥50% stenosis). Demographic variables, risk factors, laboratory results, Trial of ORG 10172 in Acute Stroke Treatment (TOAST) classification, NIH Stroke Scale score, blood pressure, and carotid artery stenosis were used to develop and validate machine learning models to predict CAD. The area under the receiver operating characteristic curve (AUC) was calculated for performance analysis, and Kaplan-Meier and Cox survival analyses of long-term outcomes were performed. Major adverse cardiovascular events (MACE) were defined as ischemic stroke, myocardial infarction, unstable angina, urgent coronary revascularization, and cardiovascular mortality. RESULTS Overall, 1,710 patients were included in the training dataset and 348 patients in the validation dataset. An Extreme Gradient Boosting model was developed to predict any degree of CAD, which showed an AUC of 0.763 (95% CI 0.711-0.814) on validation. A logistic regression model was used to predict obstructive CAD and had an AUC of 0.714 (95% CI 0.692-0.799). During the first 5 years of follow-up, MACE occurred more frequently when any CAD (P = 0.022) or obstructive CAD (P < 0.001) was predicted. Cox proportional hazards analysis showed that the hazard ratio of MACE was 1.5 (95% CI 1.1-2.2; P = 0.016) when any CAD was predicted, whereas it was 1.9 (95% CI 1.3-2.6; P < 0.001) when obstructive CAD was predicted. DISCUSSION We demonstrated that machine learning may help identify hidden CAD in patients with acute ischemic stroke. 
Long-term outcomes were also associated with prediction results. CLASSIFICATION OF EVIDENCE This study provides Class II evidence that in patients with acute ischemic stroke with CAD risk factors but no known history of CAD, a machine learning model predicts CAD on multidetector coronary computed tomography with an AUC of 0.763 (95% CI 0.711-0.814).
Affiliation(s)
- JoonNyung Heo
  - Department of Neurology, Yonsei University College of Medicine, Seoul, Korea
- Joonsang Yoo
  - Department of Neurology, Yonsei University College of Medicine, Yongin Severance Hospital, Yongin, Korea
- Hyungwoo Lee
  - Department of Neurology, Yonsei University College of Medicine, Seoul, Korea
- Il Hyung Lee
  - Department of Neurology, Yonsei University College of Medicine, Seoul, Korea
- Jung-Sun Kim
  - Division of Cardiology, Department of Internal Medicine, Yonsei University College of Medicine, Seoul, Korea
- Eunjeong Park
  - Integrative Research Center for Cerebrovascular and Cardiovascular Diseases, Seoul, Korea
- Young Dae Kim
  - Department of Neurology, Yonsei University College of Medicine, Seoul, Korea
- Hyo Suk Nam
  - Department of Neurology, Yonsei University College of Medicine, Seoul, Korea
33
Sung SF, Hsieh CY, Hu YH. Early Prediction of Functional Outcomes After Acute Ischemic Stroke Using Unstructured Clinical Text: Retrospective Cohort Study. JMIR Med Inform 2022; 10:e29806. [PMID: 35175201 PMCID: PMC8895286 DOI: 10.2196/29806] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/28/2021] [Revised: 07/17/2021] [Accepted: 01/02/2022] [Indexed: 02/06/2023] Open
Abstract
Background Several prognostic scores have been proposed to predict functional outcomes after an acute ischemic stroke (AIS). Most of these scores are based on structured information and have been used to develop prediction models via the logistic regression method. With the increased use of electronic health records and the progress in computational power, data-driven predictive modeling by using machine learning techniques is gaining popularity in clinical decision-making. Objective We aimed to investigate whether machine learning models created by using unstructured text could improve the prediction of functional outcomes at an early stage after AIS. Methods We identified all consecutive patients who were hospitalized for the first time for AIS from October 2007 to December 2019 by using a hospital stroke registry. The study population was randomly split into a training (n=2885) and test set (n=962). Free text in histories of present illness and computed tomography reports was transformed into input variables via natural language processing. Models were trained by using the extreme gradient boosting technique to predict a poor functional outcome at 90 days poststroke. Model performance on the test set was evaluated by using the area under the receiver operating characteristic curve (AUC). Results The AUCs of text-only models ranged from 0.768 to 0.807 and were comparable to that of the model using National Institutes of Health Stroke Scale (NIHSS) scores (0.811). Models using both patient age and text achieved AUCs of 0.823 and 0.825, which were similar to those of the model containing age and NIHSS scores (0.841); the model containing preadmission comorbidities, level of consciousness, age, and neurological deficit (PLAN) scores (0.837); and the model containing Acute Stroke Registry and Analysis of Lausanne (ASTRAL) scores (0.840). 
Adding variables from clinical text improved the predictive performance of the model containing age and NIHSS scores, the model containing PLAN scores, and the model containing ASTRAL scores (the AUC increased from 0.841 to 0.861, from 0.837 to 0.856, and from 0.840 to 0.860, respectively). Conclusions Unstructured clinical text can be used to improve the performance of existing models for predicting poststroke functional outcomes. However, considering the different terminologies that are used across health systems, each individual health system may consider using the proposed methods to develop and validate its own models.
Affiliation(s)
- Sheng-Feng Sung
  - Division of Neurology, Department of Internal Medicine, Ditmanson Medical Foundation Chia-Yi Christian Hospital, Chiayi City, Taiwan
  - Department of Nursing, Min-Hwei Junior College of Health Care Management, Tainan, Taiwan
- Cheng-Yang Hsieh
  - Department of Neurology, Tainan Sin Lau Hospital, Tainan, Taiwan
- Ya-Han Hu
  - Department of Information Management, National Central University, Taoyuan City, Taiwan
34
Artificial Intelligence: A Shifting Paradigm in Cardio-Cerebrovascular Medicine. J Clin Med 2021; 10:jcm10235710. [PMID: 34884412 PMCID: PMC8658222 DOI: 10.3390/jcm10235710] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/15/2021] [Accepted: 12/02/2021] [Indexed: 12/21/2022] Open
Abstract
The future of healthcare is an organic blend of technology, innovation, and human connection. As artificial intelligence (AI) is gradually becoming a go-to technology in healthcare to improve efficiency and outcomes, we must understand our limitations. We should realize that our goal is not only to provide faster and more efficient care, but also to deliver an integrated solution that ensures care is fair and not biased against any subpopulation. In this context, the field of cardio-cerebrovascular diseases, which encompasses a wide range of conditions from heart failure to stroke, has made some advances in providing assistive tools to care providers. This article aimed to provide an overall thematic review of recent developments, focusing on various AI applications in cardio-cerebrovascular diseases, to identify gaps and potential areas of improvement. If well designed, technological engines have the potential to improve healthcare access and equitability while reducing overall costs, diagnostic errors, and disparity in a system that affects patients and providers and strives for efficiency.
35
Park D, Jeong E, Kim H, Pyun HW, Kim H, Choi YJ, Kim Y, Jin S, Hong D, Lee DW, Lee SY, Kim MC. Machine Learning-Based Three-Month Outcome Prediction in Acute Ischemic Stroke: A Single Cerebrovascular-Specialty Hospital Study in South Korea. Diagnostics (Basel) 2021; 11:diagnostics11101909. [PMID: 34679606 PMCID: PMC8534707 DOI: 10.3390/diagnostics11101909] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 09/10/2021] [Revised: 10/07/2021] [Accepted: 10/13/2021] [Indexed: 01/02/2023] Open
Abstract
Background: Functional outcomes after acute ischemic stroke are of great concern to patients and their families, as well as physicians and surgeons who make the clinical decisions. We developed machine learning (ML)-based functional outcome prediction models in acute ischemic stroke. Methods: This retrospective study used a prospective cohort database. A total of 1066 patients with acute ischemic stroke between January 2019 and March 2021 were included. Variables such as demographic factors, stroke-related factors, laboratory findings, and comorbidities were utilized at the time of admission. Five ML algorithms were applied to predict a favorable functional outcome (modified Rankin Scale 0 or 1) at 3 months after stroke onset. Results: Regularized logistic regression showed the best performance with an area under the receiver operating characteristic curve (AUC) of 0.86. Support vector machines represented the second-highest AUC of 0.85 with the highest F1-score of 0.86, and finally, all ML models applied achieved an AUC > 0.8. The National Institute of Health Stroke Scale at admission and age were consistently the top two important variables for generalized logistic regression, random forest, and extreme gradient boosting models. Conclusions: ML-based functional outcome prediction models for acute ischemic stroke were validated and proven to be readily applicable and useful.
Affiliation(s)
- Dougho Park
  - Department of Rehabilitation Medicine, Pohang Stroke and Spine Hospital, Pohang 37659, Korea
- Eunhwan Jeong
  - Department of Neurology, Pohang Stroke and Spine Hospital, Pohang 37659, Korea
- Haejong Kim
  - Department of Neurology, Pohang Stroke and Spine Hospital, Pohang 37659, Korea
- Hae Wook Pyun
  - Department of Radiology, Pohang Stroke and Spine Hospital, Pohang 37659, Korea
- Haemin Kim
  - Department of Neurosurgery, Pohang Stroke and Spine Hospital, Pohang 37659, Korea
- Yeon-Ju Choi
  - Department of Neurosurgery, Pohang Stroke and Spine Hospital, Pohang 37659, Korea
- Youngsoo Kim
  - Department of Neurosurgery, Pohang Stroke and Spine Hospital, Pohang 37659, Korea
- Suntak Jin
  - Department of Neurosurgery, Pohang Stroke and Spine Hospital, Pohang 37659, Korea
- Daeyoung Hong
  - Department of Neurosurgery, Pohang Stroke and Spine Hospital, Pohang 37659, Korea
- Dong Woo Lee
  - Department of Neurosurgery, Pohang Stroke and Spine Hospital, Pohang 37659, Korea
- Su Yun Lee
  - Department of Neurology, Pohang Stroke and Spine Hospital, Pohang 37659, Korea
  - Correspondence: (S.Y.L.); (M.-C.K.)
- Mun-Chul Kim
  - Department of Neurosurgery, Pohang Stroke and Spine Hospital, Pohang 37659, Korea
  - Correspondence: (S.Y.L.); (M.-C.K.)
|
36
|
Agarwal S, Sharma S, Kumar M, Venishetty S, Bhardwaj A, Kaushal K, Gopi S, Mohta S, Gunjan D, Saraya A, Sarin SK. Development of a machine learning model to predict bleed in esophageal varices in compensated advanced chronic liver disease: A proof of concept. J Gastroenterol Hepatol 2021; 36:2935-2942. [PMID: 34050561 DOI: 10.1111/jgh.15560] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 03/23/2021] [Revised: 05/19/2021] [Accepted: 05/26/2021] [Indexed: 12/12/2022]
Abstract
BACKGROUND AND AIM Risk stratification beyond the endoscopic classification of esophageal varices (EVs) to predict the first episode of variceal bleeding (VB) is currently limited in patients with compensated advanced chronic liver disease (cACLD). We aimed to assess whether machine learning (ML) could be used to predict future VB more accurately. METHODS In this retrospective analysis, data from patients with cACLD and EVs, including laboratory parameters and liver stiffness measurement (LSM), were used to generate an extreme gradient boosting (XGBoost) algorithm to predict the risk of VB. The performance characteristics of ML and endoscopic classification were compared in internal and external validation cohorts. Bleeding rates were estimated in subgroups identified by risk stratification with a combination of the model and endoscopic classification. RESULTS Eight hundred twenty-eight patients with cACLD and EVs, predominantly related to non-alcoholic fatty liver disease (28.6%), alcohol (23.7%), and hepatitis B (23.1%), were included, with 455 (55%) having high-risk varices. Over a median follow-up of 24 (12-43) months, 163 patients developed VB. The accuracy of the ML-based model in predicting future VB was 98.7 (97.4-99.5)%, 93.7 (88.8-97.2)%, and 85.7 (82.1-90.5)% in the derivation (n = 497), internal validation (n = 149), and external validation (n = 182) cohorts, respectively, which was better than endoscopic classification alone [58.9 (55.5-62.3)%]. Patients stratified as high risk on both endoscopy and the model had 1-year and 3-year bleeding rates of 31-43% and 64-85%, respectively, whereas those stratified as low risk on both had 1-year and 3-year bleeding rates of 0-1.6% and 0-3.4%, respectively. Endoscopic classification and LSM were the major determinants of the model's performance. CONCLUSION Application of the ML model improved the performance of endoscopic stratification in predicting VB in patients with cACLD and EVs.
Affiliation(s)
- Samagra Agarwal
  - Department of Gastroenterology and Human Nutrition Unit, All India Institute of Medical Sciences, New Delhi, India
- Sanchit Sharma
  - Department of Gastroenterology and Human Nutrition Unit, All India Institute of Medical Sciences, New Delhi, India
- Manoj Kumar
  - Department of Hepatology and Liver Transplantation, Institute of Liver and Biliary Sciences, New Delhi, India
- Shantan Venishetty
  - Department of Hepatology and Liver Transplantation, Institute of Liver and Biliary Sciences, New Delhi, India
- Ankit Bhardwaj
  - Department of Hepatology and Liver Transplantation, Institute of Liver and Biliary Sciences, New Delhi, India
- Kanav Kaushal
  - Department of Gastroenterology and Human Nutrition Unit, All India Institute of Medical Sciences, New Delhi, India
- Srikanth Gopi
  - Department of Gastroenterology and Human Nutrition Unit, All India Institute of Medical Sciences, New Delhi, India
- Srikant Mohta
  - Department of Gastroenterology and Human Nutrition Unit, All India Institute of Medical Sciences, New Delhi, India
- Deepak Gunjan
  - Department of Gastroenterology and Human Nutrition Unit, All India Institute of Medical Sciences, New Delhi, India
- Anoop Saraya
  - Department of Gastroenterology and Human Nutrition Unit, All India Institute of Medical Sciences, New Delhi, India
- Shiv Kumar Sarin
  - Department of Hepatology and Liver Transplantation, Institute of Liver and Biliary Sciences, New Delhi, India
|
37
|
Shi R, Xu X, Li J, Li Y. Prediction and analysis of train arrival delay based on XGBoost and Bayesian optimization. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2021.107538] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Indexed: 11/16/2022]
|
38
|
Machine learning-based approach for disease severity classification of carpal tunnel syndrome. Sci Rep 2021; 11:17464. [PMID: 34465860 PMCID: PMC8408248 DOI: 10.1038/s41598-021-97043-7] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Received: 04/15/2021] [Accepted: 08/12/2021] [Indexed: 12/23/2022] Open
Abstract
Identifying the severity of carpal tunnel syndrome (CTS) is essential to providing appropriate therapeutic interventions. We developed and validated machine-learning (ML) models for classifying CTS severity. Here, 1037 CTS hands with 11 variables each were retrospectively analyzed. CTS was confirmed using electrodiagnosis, and its severity was classified into three grades: mild, moderate, and severe. The dataset was randomly split into a training (70%) and test (30%) set. A total of 507 mild, 276 moderate, and 254 severe CTS hands were included. Extreme gradient boosting (XGB) showed the highest external validation accuracy in the multi-class classification at 76.6% (95% confidence interval [CI] 71.2–81.5). XGB also had an optimal model training accuracy of 76.1%. Random forest (RF) and k-nearest neighbors had the second-highest external validation accuracy of 75.6% (95% CI 70.0–80.5). For the RF and XGB models, the numeric rating scale of pain was the most important variable, and body mass index was the second most important. The one-versus-rest classification yielded improved external validation accuracies for each severity grade compared with the multi-class classification (mild, 83.6%; moderate, 78.8%; severe, 90.9%). The CTS severity classification based on the ML model was validated and is readily applicable to aiding clinical evaluations.
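The abstract above reports higher accuracy for one-versus-rest classification than for three-class classification. A minimal sketch of one reason this can happen is shown below: it scores the same set of predictions both ways. This is an illustration by relabeling only; the abstract does not say how the one-versus-rest models were built (they may have been trained separately per grade), and the function names and toy labels are hypothetical.

```python
def multiclass_accuracy(y_true, y_pred):
    """Fraction of hands whose predicted grade exactly matches the true grade."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def one_vs_rest_accuracy(y_true, y_pred, grade):
    """Accuracy of the binary question 'is this hand of the given grade or not?'."""
    hits = sum((t == grade) == (p == grade) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical toy labels, not data from the study.
y_true = ["mild", "mild", "mild", "moderate", "moderate", "severe", "severe", "mild"]
y_pred = ["mild", "moderate", "mild", "moderate", "severe", "severe", "severe", "mild"]

print(multiclass_accuracy(y_true, y_pred))               # 0.75
print(one_vs_rest_accuracy(y_true, y_pred, "mild"))      # 0.875
print(one_vs_rest_accuracy(y_true, y_pred, "moderate"))  # 0.75
print(one_vs_rest_accuracy(y_true, y_pred, "severe"))    # 0.875
```

Every hand that is correct in the three-class sense is also correct for each binary question, and a confusion between two grades still counts as correct for the third, so under this relabeling the one-versus-rest accuracy can never fall below the multi-class accuracy.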
|
39
|
Wei L, Cao Y, Zhang K, Xu Y, Zhou X, Meng J, Shen A, Ni J, Yao J, Shi L, Zhang Q, Wang P. Prediction of Progression to Severe Stroke in Initially Diagnosed Anterior Circulation Ischemic Cerebral Infarction. Front Neurol 2021; 12:652757. [PMID: 34220671 PMCID: PMC8249916 DOI: 10.3389/fneur.2021.652757] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 01/13/2021] [Accepted: 05/10/2021] [Indexed: 11/24/2022] Open
Abstract
Purpose: Accurate prediction of progression to severe stroke in initially diagnosed nonsevere patients with acute-subacute anterior circulation nonlacuna ischemic infarction (ASACNLII) is important for clinical decision-making. This study aimed to apply a machine learning method to predict whether initially diagnosed nonsevere patients with ASACNLII would progress to severe stroke, using diffusion-weighted images and clinical information on admission. Methods: This retrospective study enrolled 344 patients with ASACNLII admitted from June 2017 to August 2020, of whom 108 progressed to severe stroke within 3-21 days of hospitalization. The data were randomized into a training set (n = 271) and an independent test set (n = 73). A U-Net neural network was employed for automatic segmentation and volume measurement of the ischemic lesions. Predictive models were developed to evaluate progression to severe stroke using different feature sets (the volume data, the clinical data, and their combination) and machine learning methods (random forest, support vector machine, and logistic regression). Results: The U-Net showed high correlation with manual segmentation, with a Dice coefficient of 0.806 and an R² value for the volume measurements of 0.960 in the test set. The random forest classifier using the volume + clinical combination achieved the best area under the receiver operating characteristic curve of 0.8358 (95% CI 0.7321-0.9269), and the accuracy, sensitivity, and specificity were 0.7780 (0.7397-0.7945), 0.7695 (0.6102-0.9074), and 0.8686 (0.6923-1.0), respectively. The Shapley additive explanation diagram showed the volume variable to be the most important predictor. Conclusion: The U-Net was fully automatic and showed a high correlation with manual segmentation. An integrated approach combining clinical variables and stroke lesion volumes derived from advanced machine learning algorithms had high accuracy in predicting progression to severe stroke in ASACNLII patients.
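The abstract above evaluates automatic segmentation with the Dice coefficient and derives a lesion volume from each mask. A minimal sketch of both calculations on small binary masks follows. The function names, the 2D toy masks, and the voxel size are assumptions for illustration; the study's masks are volumetric and its exact implementation is not described here.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks of equal shape."""
    a = [v for row in mask_a for v in row]
    b = [v for row in mask_b for v in row]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for x, y in zip(a, b) if x and y)
    total = sum(1 for x in a if x) + sum(1 for y in b if y)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def lesion_volume_ml(mask, voxel_volume_mm3):
    """Lesion volume in millilitres: voxel count times per-voxel volume."""
    voxels = sum(1 for row in mask for v in row if v)
    return voxels * voxel_volume_mm3 / 1000.0


# Hypothetical toy masks: 1 marks a lesion voxel.
manual = [[0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
auto = [[0, 1, 1, 0],
        [0, 0, 1, 1],
        [0, 0, 0, 0]]

print(dice_coefficient(manual, auto))  # 0.75
print(lesion_volume_ml(auto, voxel_volume_mm3=2.0))
```

Dice measures spatial overlap, while agreement between volumes (the R² value in the abstract) only compares sizes, so two masks of equal volume can still have a low Dice coefficient if they cover different voxels.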
Affiliation(s)
- Lai Wei
  - Department of Radiology, Tongji Hospital, Tongji University, Shanghai, China
- Yidi Cao
  - Shanghai Key Laboratory of Artificial Intelligence for Medical Image and Knowledge Graph, Shanghai, China
  - Institute of Healthcare Research, Shanghai, China
- Kangwei Zhang
  - Department of Radiology, Tongji Hospital, Tongji University, Shanghai, China
- Yun Xu
  - Department of Radiology, Tongji Hospital, Tongji University, Shanghai, China
- Xiang Zhou
  - Department of Radiology, Tongji Hospital, Tongji University, Shanghai, China
- Jinxi Meng
  - Department of Radiology, Tongji Hospital, Tongji University, Shanghai, China
- Aijun Shen
  - Department of Radiology, Tongji Hospital, Tongji University, Shanghai, China
- Jiong Ni
  - Department of Radiology, Tongji Hospital, Tongji University, Shanghai, China
- Jing Yao
  - Department of Radiology, Tongji Hospital, Tongji University, Shanghai, China
- Lei Shi
  - Shanghai Key Laboratory of Artificial Intelligence for Medical Image and Knowledge Graph, Shanghai, China
  - Institute of Healthcare Research, Shanghai, China
- Qi Zhang
  - Shanghai Key Laboratory of Artificial Intelligence for Medical Image and Knowledge Graph, Shanghai, China
  - Institute of Healthcare Research, Shanghai, China
  - Shanghai Institute for Advanced Communication and Data Science/School of Communication and Information Engineering, Shanghai University, Shanghai, China
- Peijun Wang
  - Department of Radiology, Tongji Hospital, Tongji University, Shanghai, China
|
40
|
Zhang Z, Qiu H, Li W, Chen Y. A stacking-based model for predicting 30-day all-cause hospital readmissions of patients with acute myocardial infarction. BMC Med Inform Decis Mak 2020; 20:335. [PMID: 33317534 PMCID: PMC7734833 DOI: 10.1186/s12911-020-01358-w] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 06/12/2020] [Accepted: 11/30/2020] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Acute myocardial infarction (AMI) is a serious cardiovascular disease with a high readmission rate within 30 days of discharge. Accurate prediction of AMI readmission is crucial for identifying high-risk patients and optimizing the distribution of medical resources. METHODS In this study, we propose a stacking-based model to predict the risk of 30-day unplanned all-cause hospital readmission for AMI patients based on clinical data. First, we applied the neighborhood cleaning rule (NCR), an under-sampling method, to alleviate class imbalance and then used the SelectFromModel (SFM) feature selection method to select effective features. Second, we adopted a self-adaptive approach to select base classifiers from eight candidate models according to their performance on the datasets. Finally, we constructed a three-layer stacking model in which layers 1 and 2 were base layers and layer 3 was the meta-layer. The predictions of the base layers were used to train the meta-layer, which makes the final forecast. RESULTS The proposed model exhibited the highest AUC (0.720), exceeding that of decision tree (0.681), support vector machine (0.707), random forest (0.701), extra trees (0.709), AdaBoost (0.702), bootstrap aggregating (0.704), gradient boosting decision tree (0.710), and extreme gradient boosting (0.713). CONCLUSION Our model can effectively predict the risk of 30-day all-cause hospital readmission for AMI patients and provide decision support for hospital administration.
Affiliation(s)
- Zhen Zhang
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, No.2006, Xiyuan Ave, West Hi-Tech Zone, 611731, Chengdu, Sichuan, PR China
  - Big Data Research Center, University of Electronic Science and Technology of China, Chengdu, China
- Hang Qiu
  - School of Computer Science and Engineering, University of Electronic Science and Technology of China, No.2006, Xiyuan Ave, West Hi-Tech Zone, 611731, Chengdu, Sichuan, PR China
  - Big Data Research Center, University of Electronic Science and Technology of China, Chengdu, China
- Weihao Li
  - Cardiology Division, West China Hospital, Sichuan University, No.17 People's South Road, Chengdu, 610041, Chengdu, Sichuan, PR China
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
- Yucheng Chen
  - Cardiology Division, West China Hospital, Sichuan University, No.17 People's South Road, Chengdu, 610041, Chengdu, Sichuan, PR China
  - West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China
|
41
|
Kawakita S, Beaumont JL, Jucaud V, Everly MJ. Personalized prediction of delayed graft function for recipients of deceased donor kidney transplants with machine learning. Sci Rep 2020; 10:18409. [PMID: 33110142 PMCID: PMC7591492 DOI: 10.1038/s41598-020-75473-z] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 04/10/2019] [Accepted: 10/15/2020] [Indexed: 02/06/2023] Open
Abstract
Machine learning (ML) has shown its potential to improve patient care over the last decade. In organ transplantation, delayed graft function (DGF) remains a major concern in deceased donor kidney transplantation (DDKT). To this end, we harnessed ML to build personalized prognostic models to predict DGF. Registry data were obtained on adult DDKT recipients for model development (n = 55,044) and validation (n = 6176). Incidence rates of DGF were 25.1% and 26.3% for the development and validation sets, respectively. Twenty-six predictors were identified via recursive feature elimination with random forest. Five widely used ML algorithms (logistic regression (LR), elastic net, random forest, artificial neural network (ANN), and extreme gradient boosting (XGB)) were trained and compared with a baseline LR model fitted with previously identified risk factors. The new ML models, particularly ANN with an area under the receiver operating characteristic curve (ROC-AUC) of 0.732 and XGB with a ROC-AUC of 0.735, exhibited superior performance to the baseline model (ROC-AUC = 0.705). This study demonstrates the use of ML as a viable strategy to enable personalized risk quantification for medical applications. If successfully implemented, our models may aid in both risk quantification for DGF prevention clinical trials and personalized clinical decision making.
Affiliation(s)
- Vadim Jucaud
  - Terasaki Research Institute, Los Angeles, CA, USA
|
42
|
Li Y, Wang X, Chen C, Jing C, Wu T. Exploring firms’ innovation capabilities through learning systems. Neurocomputing 2020. [DOI: 10.1016/j.neucom.2020.03.100] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/27/2022]
|
43
|
A Machine Learning Approach to Predicting Readmission or Mortality in Patients Hospitalized for Stroke or Transient Ischemic Attack. APPLIED SCIENCES-BASEL 2020. [DOI: 10.3390/app10186337] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Indexed: 12/21/2022]
Abstract
Readmissions after stroke are not only associated with greater levels of disability and a higher risk of mortality but also increase overall medical costs. Predicting readmission risk and understanding its causes are thus essential for healthcare resource allocation and quality improvement planning. By using machine learning techniques on initial admission data, this study aimed to develop prediction models for readmission or mortality after stroke. During model development, resampling methods were implemented to balance the class distribution. Two-layer nested cross-validation was used to build and evaluate the prediction models. A total of 3422 patients were included for analysis. The 90-day rate of readmission or mortality was 17.6%. This study identified several important predictive factors, including age, prior emergency department visits, pre-stroke functional status, stroke severity, body mass index, consciousness level, and use of a nasogastric tube. The Naïve Bayes model with class weighting to compensate for class imbalance achieved the highest discriminatory capacity in terms of the area under the receiver operating characteristic curve (0.661). Despite having room for improvement, the prediction models could be used for early risk assessment of patients with stroke. Identification of patients at high risk for readmission or mortality immediately after admission has the potential of enabling early discharge planning and transitional care interventions.
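The abstract above mentions two-layer nested cross-validation. A minimal sketch of how the index splits of such a scheme can be generated is given below, assuming simple unshuffled, unstratified folds. The fold counts and function names are hypothetical, since the abstract does not state them, and a real pipeline would typically shuffle and stratify.

```python
def k_fold_indices(indices, k):
    """Yield (train, held_out) index lists for k interleaved folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        held_out = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, held_out


def nested_cv_splits(n_samples, outer_k=5, inner_k=3):
    """Yield (outer_train, outer_test, inner_splits) for nested cross-validation.

    The inner splits are drawn only from outer_train, so the outer test fold
    is never seen during model selection or hyperparameter tuning.
    """
    all_indices = list(range(n_samples))
    for outer_train, outer_test in k_fold_indices(all_indices, outer_k):
        inner_splits = list(k_fold_indices(outer_train, inner_k))
        yield outer_train, outer_test, inner_splits


for outer_train, outer_test, inner_splits in nested_cv_splits(20, outer_k=5, inner_k=4):
    print(len(outer_train), len(outer_test), len(inner_splits))
```

In a scheme like this, the inner loop is used to choose settings and the outer loop to estimate performance. Any resampling used to balance classes would normally be applied to the training portion of each split only, so that the evaluation folds keep the original class distribution.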
|
44
|
Gao L, Ding Y. Disease prediction via Bayesian hyperparameter optimization and ensemble learning. BMC Res Notes 2020; 13:205. [PMID: 32276658 PMCID: PMC7146897 DOI: 10.1186/s13104-020-05050-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 02/20/2020] [Accepted: 03/30/2020] [Indexed: 11/10/2022] Open
Abstract
Objective Early disease screening and diagnosis are important for improving patient survival. Thus, identifying early predictive features of disease is necessary. This paper presents a comprehensive comparative analysis of different machine learning (ML) systems and reports the standard deviation of the results obtained through sampling with replacement. The research focuses on (a) analyzing and comparing ML strategies used to predict breast cancer (BC) and cardiovascular disease (CVD) and (b) using feature importance ranking to identify early high-risk features. Results The Bayesian hyperparameter optimization method was more stable than the grid search and random search methods. In a BC diagnosis dataset, the Extreme Gradient Boosting (XGBoost) model had an accuracy of 94.74% and a sensitivity of 93.69%. The mean value of the cell nucleus in the fine needle aspiration (FNA) digital image of a breast lump was identified as the most important predictive feature for BC. In a CVD dataset, the XGBoost model had an accuracy of 73.50% and a sensitivity of 69.54%. Systolic blood pressure was identified as the most important feature for CVD prediction.
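The abstract above reports the standard deviation of results obtained through sampling with replacement. One common way to do this is a bootstrap over the evaluation set, sketched below for accuracy. The function name, the number of resamples, the seed, and the toy labels are assumptions, and the paper may have resampled at a different stage, for example before model fitting.

```python
import random
import statistics


def bootstrap_accuracy(y_true, y_pred, n_resamples=1000, seed=0):
    """Mean and standard deviation of accuracy over bootstrap resamples.

    Each resample draws len(y_true) cases with replacement, so some cases
    appear several times and others not at all.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(y_true)
    accuracies = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        correct = sum(y_true[i] == y_pred[i] for i in idx)
        accuracies.append(correct / n)
    return statistics.mean(accuracies), statistics.stdev(accuracies)


# Hypothetical toy labels: 8 of 10 predictions are correct.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]

mean_acc, sd_acc = bootstrap_accuracy(y_true, y_pred)
print(f"accuracy = {mean_acc:.3f} +/- {sd_acc:.3f}")
```

The spread of the resampled accuracies gives a rough sense of how stable a reported figure is, and it is one way to compare the stability of different tuning methods on the same data.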
Affiliation(s)
- Liyuan Gao
  - College of Science, Wuhan University of Science and Technology, Huangjiahu West Road, Wuhan, 430065, China
- Yongmei Ding
  - College of Science, Wuhan University of Science and Technology, Huangjiahu West Road, Wuhan, 430065, China
|