1. Wan X, Zhang R, Wang Y, Wei W, Song B, Zhang L, Hu Y. Predicting diabetic retinopathy based on routine laboratory tests by machine learning algorithms. Eur J Med Res 2025; 30:183. [PMID: 40102923 PMCID: PMC11921716 DOI: 10.1186/s40001-025-02442-5] [Received: 11/10/2024] [Accepted: 03/09/2025] [Indexed: 03/20/2025]
Abstract
OBJECTIVES This study aimed to identify risk factors for diabetic retinopathy (DR) and to develop machine learning (ML)-based predictive models from routine laboratory data in patients with type 2 diabetes mellitus (T2DM).
METHODS Clinical data from 4259 T2DM inpatients at Beijing Tongren Hospital were analyzed after division into a model construction data set (N = 3936) and an external validation data set (N = 323). Using 39 optimal variables, a prediction model was built with the eXtreme Gradient Boosting (XGBoost) algorithm and compared with four other algorithms: support vector machine (SVM), gradient boosting decision tree (GBDT), neural network (NN), and logistic regression (LR). The SHapley Additive exPlanations (SHAP) method was used to interpret the XGBoost model, and external validation was performed to assess model performance.
RESULTS DR was present in 47.69% (N = 1877) of T2DM patients in the model construction data set. Among the models tested, XGBoost performed best, with an AUC of 0.831, accuracy of 0.757, sensitivity of 0.754, specificity of 0.759, and F1-score of 0.752. SHAP quantified feature importance for the XGBoost model and identified key risk factors for DR. In external validation, the XGBoost model reached an accuracy of 0.650.
CONCLUSIONS The XGBoost-based prediction model effectively assesses DR risk in T2DM patients using routine laboratory data, helping clinicians identify high-risk individuals and guide personalized management strategies, especially in medically underserved areas.
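The reported metrics can be cross-checked with basic confusion-matrix arithmetic. The sketch below assumes, which the abstract does not state, that the evaluation set had the same DR prevalence as the model construction data set (1877/3936). Under that assumption accuracy is a prevalence-weighted mean of sensitivity and specificity, and the F1-score together with sensitivity (recall) implies a precision. The function names are illustrative only.

```python
def accuracy_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Accuracy = sensitivity * P(disease) + specificity * P(no disease)."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


def precision_from_f1(f1: float, recall: float) -> float:
    """Invert F1 = 2PR / (P + R) for precision P, given recall R."""
    # Rearranging F1 * (P + R) = 2PR gives P * (2R - F1) = F1 * R.
    return f1 * recall / (2.0 * recall - f1)


# Metrics from the abstract; the prevalence is an ASSUMPTION (construction-set DR rate).
prevalence = 1877 / 3936
sens, spec, f1 = 0.754, 0.759, 0.752

print(f"implied accuracy:  {accuracy_from_rates(sens, spec, prevalence):.3f}")
print(f"implied precision: {precision_from_f1(f1, sens):.3f}")
```

Under this assumption the implied accuracy is about 0.757, in line with the reported value, and the implied precision is about 0.750. Agreement suggests the reported figures are internally consistent; it does not confirm the evaluation-set prevalence.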
Affiliation(s)
- Xiaohua Wan
  - Department of Clinical Laboratory, Beijing Chao-Yang Hospital, Capital Medical University, Beijing, People's Republic of China
  - Beijing Center for Clinical Laboratories, Beijing, People's Republic of China
  - Department of Clinical Laboratory, Beijing Tongren Hospital, Capital Medical University, Beijing, People's Republic of China
- Ruihuan Zhang
  - The Inner Mongolia Medical Intelligent Diagnostics Big Data Research Institute, Inner Mongolia, People's Republic of China
- Yanan Wang
  - The Inner Mongolia Medical Intelligent Diagnostics Big Data Research Institute, Inner Mongolia, People's Republic of China
- Wei Wei
  - Department of Medical Record, Beijing Tongren Hospital, Capital Medical University, Beijing, People's Republic of China
- Biao Song
  - The Inner Mongolia Medical Intelligent Diagnostics Big Data Research Institute, Inner Mongolia, People's Republic of China
- Lin Zhang
  - Department of Medical Record, Beijing Tongren Hospital, Capital Medical University, Beijing, People's Republic of China
  - Department of Endocrinology, Beijing Tongren Hospital, Capital Medical University, Beijing, People's Republic of China
  - Beijing Diabetes Research Institute, Beijing, People's Republic of China
- Yanwei Hu
  - Department of Clinical Laboratory, Beijing Chao-Yang Hospital, Capital Medical University, Beijing, People's Republic of China
  - Beijing Center for Clinical Laboratories, Beijing, People's Republic of China
2. Stolp J, Weber C, Ammon D, Scherag A, Fischer C, Kloos C, Wolf G, Schulze PC, Settmacher U, Bauer M, Stallmach A, Kiehntopf M, Betz B. Automated sample annotation for diabetes mellitus in healthcare integrated biobanking. Comput Struct Biotechnol J 2024; 24:724-733. [PMID: 39668942 PMCID: PMC11635603 DOI: 10.1016/j.csbj.2024.10.033] [Received: 08/08/2024] [Revised: 10/20/2024] [Accepted: 10/20/2024] [Indexed: 12/14/2024]
Abstract
Healthcare integrated biobanking describes the annotation and collection of residual samples from hospitalized patients for research purposes. The central idea of the current work is to establish an automated workflow for sample annotation, selection and storage for diabetes mellitus. This is challenging because data are often incomplete at the time of sample selection. The study evaluates a two-step procedure based on machine learning (ML) and natural language processing (NLP) for timely and precise annotation of samples for diabetes mellitus. Electronic health record data of 785 persons were extracted from the hospital information system. In the first step, a conditional inference forest (CIF) model was trained and tested on laboratory values from the first 72 h of the hospital stay using test- (n = 550) and training data sets (n = 235). Its performance was compared with that of a simple laboratory cut-off classifier (LCC) and a logistic regression (LR) model. Algorithms based on laboratory values, ICD-10 codes, or information extracted from discharge summaries by natural language processing software (NLP-DS) were evaluated as a second (review) step designed to increase the precision of annotations. For the first step, recall/precision/F1-score/accuracy were 71 %/86 %/0.78/0.82 for CIF and 77 %/70 %/0.74/0.75 for LR, compared with 73 %/68 %/0.70/0.72 for LCC. NLP-DS was the best-performing second (review) step (93 %/100 %/0.97/0.97). Combining first-step models with NLP-DS increased precision to 100 % for all procedures (66 %/100 %/0.80/0.85 for CIF&NLP-DS, 72 %/100 %/0.84/0.872 for LR&NLP-DS and 66 %/100 %/0.80/0.85 for LCC&NLP-DS). NLP-DS removed more samples in LR&NLP-DS and LCC&NLP-DS (removal rates of 35 % and 38 % of initially selected samples, respectively) than in CIF&NLP-DS (removal rate of 20 %). The developed two-step procedure is an efficient, implementable method for timely and precise annotation of samples from hospitalized patients with diabetes.
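The two-step logic (an early selection on laboratory values, followed by a review step that removes samples it cannot confirm) can be sketched as below. Everything concrete here is an assumption for illustration: the `Patient` fields, the ADA diagnostic cut-offs (HbA1c ≥ 6.5 %, plasma glucose ≥ 11.1 mmol/L) standing in for the LCC, and a boolean flag standing in for the NLP-DS output. The study's actual rules, features and software are not given in the abstract.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Patient:
    # Hypothetical stand-ins for EHR data from the first 72 h of the stay.
    patient_id: str
    hba1c_percent: Optional[float]     # HbA1c in %, None if not measured
    glucose_mmol_l: Optional[float]    # highest plasma glucose in mmol/L, None if not measured
    summary_confirms_diabetes: bool    # stand-in for the NLP result on the discharge summary


def lab_cutoff_step(p: Patient) -> bool:
    """Step 1, a cut-off classifier. ASSUMED thresholds (ADA diagnostic values):
    HbA1c >= 6.5 % or glucose >= 11.1 mmol/L. The study's own cut-offs may differ."""
    if p.hba1c_percent is not None and p.hba1c_percent >= 6.5:
        return True
    return p.glucose_mmol_l is not None and p.glucose_mmol_l >= 11.1


def review_step(p: Patient) -> bool:
    """Step 2 (review): keep a sample only if the discharge summary confirms diabetes."""
    return p.summary_confirms_diabetes


def two_step_annotate(
    patients: List[Patient],
    step1: Callable[[Patient], bool],
    step2: Callable[[Patient], bool],
) -> Tuple[List[Patient], float]:
    selected = [p for p in patients if step1(p)]   # initially selected samples
    kept = [p for p in selected if step2(p)]        # samples confirmed by the review step
    removal_rate = 1 - len(kept) / len(selected) if selected else 0.0
    return kept, removal_rate


def precision_recall_f1(predicted: set, actual: set) -> Tuple[float, float, float]:
    tp = len(predicted & actual)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(actual) if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Invented example: "d" is diabetic but missed by the lab cut-offs; "b" is a false alarm.
patients = [
    Patient("a", 7.2, None, True),
    Patient("b", None, 12.0, False),
    Patient("c", 5.4, 6.1, False),
    Patient("d", 6.0, 8.0, True),
]
kept, removal_rate = two_step_annotate(patients, lab_cutoff_step, review_step)
annotated = {p.patient_id for p in kept}
print(annotated, removal_rate, precision_recall_f1(annotated, {"a", "d"}))
```

The sketch shows why the review step can only raise precision: it filters the step-1 selection and never adds samples, so recall is capped by step 1. The same F1 arithmetic reproduces the reported first-step CIF value: 2 × 0.86 × 0.71 / (0.86 + 0.71) ≈ 0.78.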
Affiliation(s)
- Johannes Stolp
  - Department of Clinical Chemistry and Laboratory Diagnostics and Integrated Biobank Jena (IBBJ), Jena University Hospital – Friedrich Schiller University Jena, Jena, Germany
- Christoph Weber
  - Department of Clinical Chemistry and Laboratory Diagnostics and Integrated Biobank Jena (IBBJ), Jena University Hospital – Friedrich Schiller University Jena, Jena, Germany
- Danny Ammon
  - Data Integration Center, Jena University Hospital – Friedrich Schiller University Jena, Jena, Germany
- André Scherag
  - Institute of Medical Statistics, Computer and Data Sciences (IMSID), Jena University Hospital – Friedrich Schiller University Jena, Jena, Germany
- Claudia Fischer
  - Institute of Medical Statistics, Computer and Data Sciences (IMSID), Jena University Hospital – Friedrich Schiller University Jena, Jena, Germany
- Christof Kloos
  - Department of Internal Medicine III, Jena University Hospital – Friedrich Schiller University Jena, Jena, Germany
- Gunter Wolf
  - Department of Internal Medicine III, Jena University Hospital – Friedrich Schiller University Jena, Jena, Germany
- P. Christian Schulze
  - Department of Internal Medicine I, Jena University Hospital – Friedrich Schiller University Jena, Jena, Germany
- Utz Settmacher
  - Department of General Visceral and Vascular Surgery, Jena University Hospital – Friedrich Schiller University Jena, Jena, Germany
- Michael Bauer
  - Department of Anesthesiology and Intensive Care Medicine, Jena University Hospital – Friedrich Schiller University Jena, Jena, Germany
- Andreas Stallmach
  - Department of Internal Medicine IV, Jena University Hospital – Friedrich Schiller University Jena, Jena, Germany
- Michael Kiehntopf
  - Department of Clinical Chemistry and Laboratory Diagnostics and Integrated Biobank Jena (IBBJ), Jena University Hospital – Friedrich Schiller University Jena, Jena, Germany
- Boris Betz
  - Department of Clinical Chemistry and Laboratory Diagnostics and Integrated Biobank Jena (IBBJ), Jena University Hospital – Friedrich Schiller University Jena, Jena, Germany
3. Xu Y, Qiu S, Ye J, Chen D, Wang D, Zhou X, Sun Z. Performance of different machine learning algorithms in identifying undiagnosed diabetes based on nonlaboratory parameters and the influence of muscle strength: A cross-sectional study. J Diabetes Investig 2024; 15:743-750. [PMID: 38439210 PMCID: PMC11143412 DOI: 10.1111/jdi.14166] [Received: 11/03/2023] [Revised: 01/21/2024] [Accepted: 02/08/2024] [Indexed: 03/06/2024]
Abstract
AIMS/INTRODUCTION Machine learning algorithms based on an artificial neural network (ANN), support vector machine, naive Bayes, or logistic regression are commonly used to identify diabetes. This study investigated which approach performed best and whether muscle strength provided any incremental benefit in identifying undiagnosed diabetes in Chinese adults.
METHODS This cross-sectional study enrolled 4,482 eligible participants from eight provinces in China, who were randomly divided into a training dataset (n = 3,586) and a testing dataset (n = 896). Muscle strength was assessed by handgrip strength and by the number of chair stands in the 30-s chair stand test. An oral glucose tolerance test was used to ascertain undiagnosed diabetes. Areas under the receiver operating characteristic curve (AUCs) were calculated for each model and compared.
RESULTS Of the included participants, 233 had newly diagnosed diabetes. All four machine learning algorithms, developed from nonlaboratory parameters, showed acceptable discriminative ability for undiagnosed diabetes (all AUCs > 0.70), with the ANN approach performing best (AUC 0.806). Adding handgrip strength or the 30-s chair stand test to this approach did not increase the AUC further (P = 0.39 and 0.26, respectively). Furthermore, the ANN approach showed a larger AUC than the New Chinese Diabetes Risk Score (P for comparison < 0.01), regardless of whether handgrip strength or the 30-s chair stand test was added.
CONCLUSIONS The ANN approach performed best in identifying undiagnosed diabetes in Chinese adults; however, adding muscle strength might not improve its performance.
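The AUC used to rank these models has a direct probabilistic reading: it is the probability that a randomly chosen person with diabetes receives a higher risk score than a randomly chosen person without it, with ties counted as one half. The minimal sketch below computes it that way from two lists of scores. The function name and scores are hypothetical; the study's own software and scores are not described in the abstract.

```python
from itertools import product
from typing import Sequence


def auc_pairwise(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative), ties = 1/2."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for sp, sn in product(scores_pos, scores_neg):
        if sp > sn:
            wins += 1.0
        elif sp == sn:
            wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical model scores, not data from the study.
undiagnosed = [0.9, 0.8, 0.4]        # OGTT-confirmed undiagnosed diabetes
no_diabetes = [0.3, 0.85, 0.2, 0.4]  # no diabetes
print(f"AUC = {auc_pairwise(undiagnosed, no_diabetes):.3f}")
```

This pairwise form costs O(n_pos × n_neg) comparisons; a rank-based (Mann-Whitney U) computation gives the same value faster. Formally comparing two AUCs measured on the same participants, as the abstract's P values do, needs a test for correlated AUCs such as DeLong's, which is not shown here.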
Affiliation(s)
- Ying Xu
  - Department of Endocrine Metabolism, The First People's Hospital of Yunnan Province, The Affiliated Hospital of Kunming University of Science and Technology, Kunming, China
- Shanhu Qiu
  - Department of General Practice, School of Medicine, Institute of Diabetes, Zhongda Hospital, Southeast University, Nanjing, China
- Jinli Ye
  - School of Mathematics and Statistics, Yunnan University, Kunming, China
- Dan Chen
  - School of Mathematics and Statistics, Yunnan University, Kunming, China
- Donglei Wang
  - Department of Endocrinology, School of Medicine, Institute of Diabetes, Zhongda Hospital, Southeast University, Nanjing, China
- Xiaoying Zhou
  - Department of Endocrinology, School of Medicine, Institute of Diabetes, Zhongda Hospital, Southeast University, Nanjing, China
- Zilin Sun
  - Department of Endocrinology, School of Medicine, Institute of Diabetes, Zhongda Hospital, Southeast University, Nanjing, China
4. Liu J, Wang L, Qian Y, Shen Q, Yang M, Dong Y, Chen H, Yang Z, Liu Y, Cui X, Ma H, Jin G. Metabolic and Genetic Markers Improve Prediction of Incident Type 2 Diabetes: A Nested Case-Control Study in Chinese. J Clin Endocrinol Metab 2022; 107:3120-3127. [PMID: 35977051 PMCID: PMC9681609 DOI: 10.1210/clinem/dgac487] [Received: 03/12/2022] [Indexed: 11/29/2022]
Abstract
CONTEXT Improving the current prediction of type 2 diabetes (T2D) risk is essential.
OBJECTIVE We aimed to identify novel metabolic markers of future T2D in Chinese individuals of Han ethnicity and to determine whether combining metabolic and genetic markers improves the accuracy of prediction models containing clinical factors.
METHODS A nested case-control study of 220 incident T2D patients and 220 age- and sex-matched controls, drawn from normoglycemic Chinese individuals of Han ethnicity, was conducted within the Wuxi Non-Communicable Disease cohort with a 12-year follow-up. Untargeted metabolic profiling was performed by high-performance liquid chromatography–mass spectrometry (HPLC-MS), and 20 single nucleotide polymorphisms (SNPs) associated with T2D were genotyped on the iPLEX Sequenom MassARRAY platform. Machine learning methods were used to identify metabolites associated with future T2D risk.
RESULTS We found that abnormal levels of 5 metabolites were associated with increased risk of future T2D: riboflavin, cnidioside A, 2-methoxy-5-(1H-1,2,4-triazol-5-yl)-4-(trifluoromethyl)pyridine, 7-methylxanthine, and mestranol. The genetic risk score (GRS) based on the 20 SNPs was significantly associated with T2D risk (OR = 1.35 per SD; 95% CI, 1.08-1.70). The area under the receiver operating characteristic curve (AUC) was greater for the model containing metabolites, GRS, and clinical traits than for the model containing clinical traits only (0.960 vs 0.798, P = 7.91 × 10⁻¹⁶).
CONCLUSION In individuals with normal fasting glucose levels, abnormal levels of 5 metabolites were associated with future T2D. Combining the newly discovered metabolic markers with genetic markers could improve the prediction of incident T2D.
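A genetic risk score condenses many SNPs into one number, and an odds ratio "per SD" refers to the score after standardization. The sketch below assumes an unweighted risk-allele count by default; a weighted version multiplies each dosage by a per-SNP weight, often a published log-odds ratio. The abstract does not say which form the study used, and the genotypes are invented. The last function shows what OR = 1.35 per SD implies for a larger shift under a logistic model.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence


def genetic_risk_score(dosages: Sequence[int], weights: Optional[Sequence[float]] = None) -> float:
    """Sum of risk-allele dosages (0, 1 or 2 per SNP), optionally weighted per SNP.
    Unweighted by default: an ASSUMPTION, since the study's scheme is not stated."""
    if weights is None:
        weights = [1.0] * len(dosages)
    if len(weights) != len(dosages):
        raise ValueError("one weight per SNP is required")
    return sum(w * d for w, d in zip(weights, dosages))


def standardize(scores: Sequence[float]) -> List[float]:
    """Express scores in SD units, so that an odds ratio can be read 'per SD'."""
    mu, sd = mean(scores), pstdev(scores)
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sd for s in scores]


def odds_ratio_for_shift(or_per_sd: float, n_sd: float) -> float:
    """Under a logistic model the log-odds change linearly with the GRS,
    so a shift of n SD multiplies the odds by OR_per_SD ** n."""
    return or_per_sd ** n_sd


# Invented genotypes for three people at 4 SNPs (the study genotyped 20).
cohort = [[0, 1, 2, 1], [1, 1, 1, 0], [2, 2, 1, 1]]
raw = [genetic_risk_score(g) for g in cohort]
print(raw, [round(z, 2) for z in standardize(raw)])
print(f"OR for a GRS 2 SD above the mean, at 1.35 per SD: {odds_ratio_for_shift(1.35, 2):.2f}")
```

For example, a person whose GRS is 2 SD above the mean would have about 1.35² ≈ 1.82 times the odds of incident T2D of a person at the mean, other covariates equal, provided the log-linear relationship holds over that range.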
Affiliation(s)
- Yun Qian
  - Correspondence: Yun Qian, PhD, Department of Health Promotion & Chronic Non-Communicable Disease Control, Wuxi Center for Disease Control and Prevention (The Affiliated Wuxi Center for Disease Control and Prevention of Nanjing Medical University), 499 Jincheng Rd, Wuxi 214023, China. E-mail:
- Qian Shen
  - Department of Health Promotion & Chronic Non-Communicable Disease Control, Wuxi Center for Disease Control and Prevention (The Affiliated Wuxi Center for Disease Control and Prevention of Nanjing Medical University), Wuxi 214023, Jiangsu, China
- Man Yang
  - Department of Health Promotion & Chronic Non-Communicable Disease Control, Wuxi Center for Disease Control and Prevention (The Affiliated Wuxi Center for Disease Control and Prevention of Nanjing Medical University), Wuxi 214023, Jiangsu, China
- Yunqiu Dong
  - Department of Health Promotion & Chronic Non-Communicable Disease Control, Wuxi Center for Disease Control and Prevention (The Affiliated Wuxi Center for Disease Control and Prevention of Nanjing Medical University), Wuxi 214023, Jiangsu, China
- Hai Chen
  - Department of Health Promotion & Chronic Non-Communicable Disease Control, Wuxi Center for Disease Control and Prevention (The Affiliated Wuxi Center for Disease Control and Prevention of Nanjing Medical University), Wuxi 214023, Jiangsu, China
- Zhijie Yang
  - Department of Health Promotion & Chronic Non-Communicable Disease Control, Wuxi Center for Disease Control and Prevention (The Affiliated Wuxi Center for Disease Control and Prevention of Nanjing Medical University), Wuxi 214023, Jiangsu, China
- Yaqi Liu
  - Department of Health Promotion & Chronic Non-Communicable Disease Control, Wuxi Center for Disease Control and Prevention (The Affiliated Wuxi Center for Disease Control and Prevention of Nanjing Medical University), Wuxi 214023, Jiangsu, China
- Xuan Cui
  - Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing 211166, Jiangsu, China
- Hongxia Ma
  - Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing 211166, Jiangsu, China
- Guangfu Jin
  - Department of Epidemiology and Biostatistics, School of Public Health, Nanjing Medical University, Nanjing 211166, Jiangsu, China