1
Torres-Martos Á, Anguita-Ruiz A, Bustos-Aibar M, Ramírez-Mena A, Arteaga M, Bueno G, Leis R, Aguilera CM, Alcalá R, Alcalá-Fdez J. Multiomics and eXplainable artificial intelligence for decision support in insulin resistance early diagnosis: A pediatric population-based longitudinal study. Artif Intell Med 2024; 156:102962. [PMID: 39180924 DOI: 10.1016/j.artmed.2024.102962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/27/2024] [Revised: 07/31/2024] [Accepted: 08/16/2024] [Indexed: 08/27/2024]
Abstract
Pediatric obesity can drastically heighten the risk of cardiometabolic alterations later in life, with insulin resistance standing as the cornerstone linking adiposity to the increased cardiovascular risk. Puberty has been pointed out as a critical stage after which obesity-associated insulin resistance is more difficult to revert. Timely prediction of insulin resistance in pediatric obesity is therefore vital for mitigating the risk of its associated comorbidities. The construction of effective and robust predictive systems for a complex health outcome like insulin resistance during the early stages of life demands the adoption of longitudinal designs for more causal inferences, and the integration of factors of varying nature involved in its onset. In this work, we propose an eXplainable Artificial Intelligence-based decision support pipeline for early diagnosis of insulin resistance in a longitudinal cohort of 90 children. To that end, we leverage multi-omics (genomics and epigenomics) and clinical data from the pre-pubertal stage. Different data-layer combinations, pre-processing techniques (missing values, feature selection, class imbalance, etc.), algorithms, and training procedures were considered following good practices for Machine Learning. SHapley Additive exPlanations were provided for specialists to understand both the decision-making mechanisms of the system and the impact of the features on each automatic decision, an essential issue in high-risk areas such as this one, where system decisions may affect people's lives. The system showed a relevant predictive ability (AUC and G-mean of 0.92). A deep exploration, at both the global and the local level, revealed promising biomarkers of insulin resistance in our population, highlighting classical markers, such as Body Mass Index z-score or leptin/adiponectin ratio, and novel ones, such as methylation patterns of relevant genes including HDAC4, PTPRN2, MATN2, RASGRF1 and EBF1. Our findings highlight the importance of integrating multi-omics data and following eXplainable Artificial Intelligence trends when building decision support systems.
Affiliation(s)
- Álvaro Torres-Martos
- Department of Biochemistry and Molecular Biology II, School of Pharmacy, "José Mataix Verdú" Institute of Nutrition and Food Technology (INYTA) and Center of Biomedical Research, University of Granada, Granada, 18071, Spain; Instituto de investigación Biosanitaria ibs.GRANADA, Granada, 18012, Spain; CIBER de Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, 28029, Spain.
- Augusto Anguita-Ruiz
- CIBER de Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, 28029, Spain; Barcelona Institute for Global Health, ISGlobal, Barcelona, 08003, Spain.
- Mireia Bustos-Aibar
- Department of Biochemistry and Molecular Biology II, School of Pharmacy, "José Mataix Verdú" Institute of Nutrition and Food Technology (INYTA) and Center of Biomedical Research, University of Granada, Granada, 18071, Spain; CIBER de Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, 28029, Spain; Growth, Exercise, Nutrition and Development (GENUD) Research Group, Institute for Health Research Aragón (IIS Aragón), Zaragoza, 50009, Spain.
- Alberto Ramírez-Mena
- Bioinformatics Unit, Centre for Genomics and Oncological Research, GENYO Pfizer/University of Granada/Andalusian Regional Government, PTS, Granada, 18016, Spain.
- María Arteaga
- Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), University of Granada, Granada, 18071, Spain.
- Gloria Bueno
- CIBER de Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, 28029, Spain; Growth, Exercise, Nutrition and Development (GENUD) Research Group, Institute for Health Research Aragón (IIS Aragón), Zaragoza, 50009, Spain; Pediatric Endocrinology Unit, Facultad de Medicina, Clinic University Hospital Lozano Blesa, University of Zaragoza, Zaragoza, 50009, Spain.
- Rosaura Leis
- CIBER de Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, 28029, Spain; Unit of Pediatric Gastroenterology, Hepatology and Nutrition, Pediatric Service, Hospital Clínico Universitario de Santiago. Unit of Investigation in Nutrition, Growth and Human Development of Galicia-USC, Pediatric Nutrition Research Group-Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, 15706, Spain.
- Concepción M Aguilera
- Department of Biochemistry and Molecular Biology II, School of Pharmacy, "José Mataix Verdú" Institute of Nutrition and Food Technology (INYTA) and Center of Biomedical Research, University of Granada, Granada, 18071, Spain; Instituto de investigación Biosanitaria ibs.GRANADA, Granada, 18012, Spain; CIBER de Fisiopatología de la Obesidad y Nutrición (CIBEROBN), Instituto de Salud Carlos III, Madrid, 28029, Spain.
- Rafael Alcalá
- Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), University of Granada, Granada, 18071, Spain.
- Jesús Alcalá-Fdez
- Department of Computer Science and Artificial Intelligence, Andalusian Research Institute in Data Science and Computational Intelligence (DaSCI), University of Granada, Granada, 18071, Spain.
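The AUC and G-mean reported in the abstract above can be illustrated with a small sketch. This is not the authors' pipeline: the labels, scores, 0.5 threshold and function names below are hypothetical and chosen only for illustration. AUC is computed here as the fraction of positive/negative pairs that the scores rank correctly, and G-mean as the square root of sensitivity times specificity, which penalises a classifier that neglects the minority class.

```python
import math


def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def g_mean(labels, predictions):
    """Geometric mean of sensitivity and specificity for binary predictions."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# Hypothetical toy data (1 = insulin resistant, 0 = not), not taken from the study.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off

toy_auc = auc(labels, scores)
toy_g_mean = g_mean(labels, predictions)
print(f"AUC = {toy_auc:.3f}, G-mean = {toy_g_mean:.3f}")
```

In imbalanced cohorts, G-mean is often reported alongside AUC because plain accuracy can look high even when most positive cases are missed.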
2
Liu CH, Chang CF, Chen IC, Lin FM, Tzou SJ, Hsieh CB, Chu TW, Pei D. Machine Learning Prediction of Prediabetes in a Young Male Chinese Cohort with 5.8-Year Follow-Up. Diagnostics (Basel) 2024; 14:979. [PMID: 38786280 PMCID: PMC11119884 DOI: 10.3390/diagnostics14100979] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2024] [Revised: 04/26/2024] [Accepted: 04/29/2024] [Indexed: 05/25/2024] Open
Abstract
The identification of risk factors for future prediabetes in young men remains largely unexamined. This study enrolled 6247 young ethnic Chinese men with normal fasting plasma glucose at the baseline (FPGbase), and used machine learning (Mach-L) methods to predict prediabetes after 5.8 years. The study had two aims: (1) to evaluate whether Mach-L outperformed traditional multiple linear regression (MLR), and (2) to identify the most important risk factors. The baseline data included demographic, biochemistry, and lifestyle information. Two models were built, where Model 1 included all variables and Model 2 excluded FPGbase, since it had the most profound effect on prediction. Random forest, stochastic gradient boosting, eXtreme gradient boosting, and elastic net were used, and the model performance was compared using different error metrics. All the Mach-L errors were smaller than those for MLR; thus, Mach-L provided the most accurate results. In descending order of importance, the key factors for Model 1 were FPGbase, body fat (BF), creatinine (Cr), thyroid stimulating hormone (TSH), white blood cell count (WBC), and age, while those for Model 2 were BF, WBC, age, TSH, TG, and LDL-C. We concluded that FPGbase was the most important factor to predict future prediabetes. However, after removing FPGbase, WBC, TSH, BF, HDL-C, and age were the key factors after 5.8 years.
Affiliation(s)
- Chi-Hao Liu
- Division of Nephrology, Department of Internal Medicine, Kaohsiung Armed Forces General Hospital, Kaohsiung 802, Taiwan;
- Chun-Feng Chang
- Divisions of Urology, Department of Surgery, Kaohsiung Armed Forces General Hospital, Kaohsiung 802, Taiwan;
- Divisions of Urology, Department of Surgery, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan
- I-Chien Chen
- Department of Nursing, Kaohsiung Armed Forces General Hospital, Kaohsiung 802, Taiwan;
- Fan-Min Lin
- Division of Pulmonary Medicine, Department of Internal Medicine, Kaohsiung Armed Forces General Hospital, Kaohsiung 802, Taiwan;
- Shiow-Jyu Tzou
- Teaching and Researching Center, Kaohsiung Armed Forces General Hospital, Kaohsiung 802, Taiwan;
- Institute of Medical Science and Technology, National Sun Yat-sen University, Kaohsiung 804, Taiwan
- Chung-Bao Hsieh
- Department of Surgery, Kaohsiung Armed Forces General Hospital, Kaohsiung 802, Taiwan;
- Ta-Wei Chu
- Department of Obstetrics and Gynecology, Tri-Service General Hospital, National Defense Medical Center, Taipei 114, Taiwan;
- MJ Health Research Foundation, Taipei 114, Taiwan
- Dee Pei
- Division of Endocrinology and Metabolism, Department of Internal Medicine, Fu Jen Catholic University Hospital, School of Medicine, College of Medicine, Fu Jen Catholic University, New Taipei 243, Taiwan
3
Xu Z, Hu Y, Shao X, Shi T, Yang J, Wan Q, Liu Y. The Efficacy of Machine Learning Models for Predicting the Prognosis of Heart Failure: A Systematic Review and Meta-Analysis. Cardiology 2024:1-19. [PMID: 38648752 DOI: 10.1159/000538639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Accepted: 03/28/2024] [Indexed: 04/25/2024]
Abstract
INTRODUCTION Heart failure (HF) is a major global public health concern. The application of machine learning (ML) to identify individuals at high risk and enable early intervention is a promising approach for improving HF prognosis. We aim to systematically evaluate the performance and value of ML models for predicting HF prognosis. METHODS PubMed, Web of Science, Scopus, and Embase online databases were searched up to April 30, 2023, to identify studies on the use of ML models to predict HF prognosis. HF prognosis primarily encompasses readmission and mortality. The meta-analysis was conducted using MedCalc software. Subgroup analyses were based on the types of ML models, time intervals, sample sizes, the number of predictive variables, validation methods, whether hyperparameter optimization and calibration were conducted, and data set partitioning methods. RESULTS A total of 31 studies were included. The most common ML models were random forest, boosting, support vector machine, and neural network. The area under the receiver operating characteristic curve (AUC) for predicting HF readmission was 0.675 (95% CI: 0.651-0.699, p < 0.001), and the AUC for predicting HF mortality was 0.790 (95% CI: 0.765-0.816, p < 0.001). Subgroup analyses revealed that models with the prediction time interval of 1 year, sample sizes ≥10,000, the number of predictive variables ≥100, external validation, hyperparameter tuning, calibration adjustment, and data set partitioning using 10-fold cross-validation exhibited favorable performance within their respective subgroups. CONCLUSION The performance of ML models in predicting HF readmission is relatively poor, while their performance in predicting HF mortality is moderate. The quality of the relevant studies is generally low, so it is essential to enhance the predictive capabilities of ML models through targeted improvements in practical applications.
Affiliation(s)
- Zhaohui Xu
- Department of Cardiovascular Disease, ShuGuang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai, China,
- Yinqin Hu
- Department of Cardiovascular Disease, ShuGuang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Xinyi Shao
- The Grier School, Tyrone, Pennsylvania, USA
- Tianyun Shi
- Department of Cardiovascular Disease, ShuGuang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Jiahui Yang
- Department of Cardiovascular Disease, ShuGuang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Qiqi Wan
- Department of Cardiovascular Disease, ShuGuang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Yongming Liu
- Department of Cardiovascular Disease, ShuGuang Hospital Affiliated to Shanghai University of Traditional Chinese Medicine, Shanghai, China
- Department of Cardiovascular Disease, Anhui Provincial Hospital of Integrated Medicine, Hefei Anhui, China
4
Dimitri P, Savage MO. Artificial intelligence in paediatric endocrinology: conflict or cooperation. J Pediatr Endocrinol Metab 2024; 37:209-221. [PMID: 38183676 DOI: 10.1515/jpem-2023-0554] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/17/2023] [Accepted: 12/18/2023] [Indexed: 01/08/2024]
Abstract
Artificial intelligence (AI) in medicine is transforming healthcare by automating system tasks, assisting in diagnostics, predicting patient outcomes and personalising patient care, founded on the ability to analyse vast datasets. In paediatric endocrinology, AI has been developed for diabetes, for insulin dose adjustment, detection of hypoglycaemia and retinopathy screening; bone age assessment and thyroid nodule screening; the identification of growth disorders; the diagnosis of precocious puberty; and the use of facial recognition algorithms in conditions such as Cushing syndrome, acromegaly, congenital adrenal hyperplasia and Turner syndrome. AI can also predict those most at risk from childhood obesity by stratifying future interventions to modify lifestyle. AI will facilitate personalised healthcare by integrating data from 'omics' analysis, lifestyle tracking, medical history, laboratory and imaging, therapy response and treatment adherence from multiple sources. As data acquisition and processing become fundamental, data privacy and the protection of children's health data are crucial. Minimising algorithmic bias generated by AI analysis for rare conditions seen in paediatric endocrinology is an important determinant of AI validity in clinical practice. AI cannot create the patient-doctor relationship or assess the wider holistic determinants of care. Children have individual needs and vulnerabilities and are considered in the context of family relationships and dynamics. Importantly, whilst AI provides value through augmenting efficiency and accuracy, it must not be used to replace clinical skills.
Affiliation(s)
- Paul Dimitri
- Department of Paediatric Endocrinology, Sheffield Children's NHS Foundation Trust, Sheffield, UK
- Martin O Savage
- Centre for Endocrinology, William Harvey Research Institute, Barts and the London School of Medicine & Dentistry, Queen Mary University of London, London, UK
5
Queipo M, Barbado J, Torres AM, Mateo J. Approaching Personalized Medicine: The Use of Machine Learning to Determine Predictors of Mortality in a Population with SARS-CoV-2 Infection. Biomedicines 2024; 12:409. [PMID: 38398012 PMCID: PMC10886784 DOI: 10.3390/biomedicines12020409] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 02/05/2024] [Accepted: 02/07/2024] [Indexed: 02/25/2024] Open
Abstract
The COVID-19 pandemic demonstrated the need to develop strategies to control a new viral infection. However, the different characteristics of the health system and population of each country and hospital would require the implementation of self-systems adapted to their characteristics. The objective of this work was to determine predictors that should identify the most severe patients with COVID-19 infection. Given the poor situation of the hospitals in the first wave, the analysis of the data from that period with an accurate and fast technique can be an important contribution. In this regard, machine learning is able to objectively analyze data in hourly sets and is used in many fields. This study included 291 patients admitted to a hospital in Spain during the first three months of the pandemic. After screening seventy-one features with machine learning methods, the variables with the greatest influence on predicting mortality in this population were lymphocyte count, urea, FiO2, potassium, and serum pH. The XGB method achieved the highest accuracy, with a precision of >95%. Our study shows that the machine learning-based system can identify patterns and, thus, create a tool to help hospitals classify patients according to their severity of illness in order to optimize admission.
Affiliation(s)
- Mónica Queipo
- Autoimmunity and Inflammation Research Group, Río Hortega University Hospital, 47012 Valladolid, Spain
- Cooperative Research Network Focused on Health Results—Advanced Therapies (RICORS TERAV), 28220 Madrid, Spain
- Julia Barbado
- Autoimmunity and Inflammation Research Group, Río Hortega University Hospital, 47012 Valladolid, Spain
- Cooperative Research Network Focused on Health Results—Advanced Therapies (RICORS TERAV), 28220 Madrid, Spain
- Internal Medicine, Río Hortega University Hospital, 47012 Valladolid, Spain
- Ana María Torres
- Medical Analysis Expert Group, Institute of Technology, University of Castilla-La Mancha, 16071 Cuenca, Spain
- Medical Analysis Expert Group, Instituto de Investigación Sanitaria de Castilla-La Mancha (IDISCAM), 45071 Toledo, Spain
- Jorge Mateo
- Medical Analysis Expert Group, Institute of Technology, University of Castilla-La Mancha, 16071 Cuenca, Spain
- Medical Analysis Expert Group, Instituto de Investigación Sanitaria de Castilla-La Mancha (IDISCAM), 45071 Toledo, Spain
6
Cassidy B, Hoon Yap M, Pappachan JM, Ahmad N, Haycocks S, O'Shea C, Fernandez CJ, Chacko E, Jacob K, Reeves ND. Artificial intelligence for automated detection of diabetic foot ulcers: A real-world proof-of-concept clinical evaluation. Diabetes Res Clin Pract 2023; 205:110951. [PMID: 37848163 DOI: 10.1016/j.diabres.2023.110951] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/18/2023] [Revised: 10/02/2023] [Accepted: 10/11/2023] [Indexed: 10/19/2023]
Abstract
OBJECTIVE Conduct a multicenter proof-of-concept clinical evaluation to assess the accuracy of an artificial intelligence system on a smartphone for automated detection of diabetic foot ulcers. METHODS The evaluation was undertaken with patients with diabetes (n = 81) from September 2020 to January 2021. A total of 203 foot photographs were collected using a smartphone, analysed using the artificial intelligence system, and compared against expert clinician judgement, with 162 images showing at least one ulcer, and 41 showing no ulcer. Sensitivity and specificity of the system against clinician decisions were determined and inter- and intra-rater reliability analysed. RESULTS Predictions/decisions made by the system showed excellent sensitivity (0.9157) and high specificity (0.8857). Merging of intersecting predictions improved specificity to 0.9243. High levels of inter- and intra-rater reliability for clinician agreement on the ability of the artificial intelligence system to detect diabetic foot ulcers were also demonstrated (Kα > 0.8000 for all studies, between and within raters). CONCLUSIONS We demonstrate highly accurate automated diabetic foot ulcer detection using an artificial intelligence system with a low-end smartphone. This is the first key stage in the creation of a fully automated diabetic foot ulcer detection and monitoring system, with these findings underpinning medical device development.
Affiliation(s)
- Bill Cassidy
- Department of Computing Mathematics, Manchester Metropolitan University, John Dalton Building, Manchester M1 5GD, UK.
- Moi Hoon Yap
- Department of Computing Mathematics, Manchester Metropolitan University, John Dalton Building, Manchester M1 5GD, UK.
- Joseph M Pappachan
- Lancashire Teaching Hospitals NHS Foundation Trust, Preston PR2 9HT, UK.
- Naseer Ahmad
- Manchester University NHS Foundation Trust, Manchester M13 9WL, UK.
- Claire O'Shea
- Te Whatu Ora Health New Zealand Waikato, Pembroke Street, Hamilton 3240, New Zealand.
- Cornelious J Fernandez
- Department of Endocrinology and Metabolism, Pilgrim Hospital, United Lincolnshire Hospitals NHS Trust, Boston LN2 5QY, UK.
- Elias Chacko
- Jersey General Hospital, The Parade, St Helier, JE1 3QS Jersey, UK.
- Koshy Jacob
- Eastbourne District General Hospital, Kings Drive, Eastbourne BN21 2UD, UK.
- Neil D Reeves
- Faculty of Science & Engineering, Manchester Metropolitan University, John Dalton Building, Manchester M1 5GD, UK.
7
Iparraguirre-Villanueva O, Espinola-Linares K, Flores Castañeda RO, Cabanillas-Carbonell M. Application of Machine Learning Models for Early Detection and Accurate Classification of Type 2 Diabetes. Diagnostics (Basel) 2023; 13:2383. [PMID: 37510127 PMCID: PMC10378239 DOI: 10.3390/diagnostics13142383] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 05/31/2023] [Revised: 06/23/2023] [Accepted: 07/07/2023] [Indexed: 07/30/2023] Open
Abstract
Early detection of diabetes is essential to prevent serious complications in patients. The purpose of this work is to detect and classify type 2 diabetes in patients using machine learning (ML) models, and to select the most optimal model to predict the risk of diabetes. In this paper, five ML models, including K-nearest neighbor (K-NN), Bernoulli Naïve Bayes (BNB), decision tree (DT), logistic regression (LR), and support vector machine (SVM), are investigated to predict diabetic patients. A Kaggle-hosted Pima Indian dataset containing 768 patients with and without diabetes was used, including variables such as number of pregnancies the patient has had, blood glucose concentration, diastolic blood pressure, skinfold thickness, body insulin levels, body mass index (BMI), genetic background, diabetes in the family tree, age, and outcome (with/without diabetes). The results show that the K-NN and BNB models outperform the other models. The K-NN model obtained the best accuracy in detecting diabetes, with 79.6% accuracy, while the BNB model obtained 77.2% accuracy in detecting diabetes. Finally, it can be stated that the use of ML models for the early detection of diabetes is very promising.
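As a rough illustration of the K-NN classifier named in the abstract above, the sketch below classifies a record by majority vote among its k nearest training records. It is not the authors' code: the two features (glucose and BMI), the six training rows, the queries and the choice k = 3 are hypothetical, and a real analysis would normally scale the features before computing distances.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points closest to query."""
    if k < 1 or k > len(train_X):
        raise ValueError("k must be between 1 and the number of training points")
    # Pair each Euclidean distance with its label, then keep the k smallest.
    neighbours = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy records: (glucose, BMI) with 1 = diabetes, 0 = no diabetes.
train_X = [(85, 22.0), (90, 24.5), (95, 23.1), (150, 33.0), (160, 35.5), (170, 31.2)]
train_y = [0, 0, 0, 1, 1, 1]

high_risk = knn_predict(train_X, train_y, (155, 34.0), k=3)
low_risk = knn_predict(train_X, train_y, (88, 23.0), k=3)
print(high_risk, low_risk)
```

An odd k avoids tied votes in a two-class problem such as this one.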
8
Zhao M, Wan J, Qin W, Huang X, Chen G, Zhao X. A machine learning-based diagnosis modelling of type 2 diabetes mellitus with environmental metal exposure. Comput Methods Programs Biomed 2023; 235:107537. [PMID: 37037162 DOI: 10.1016/j.cmpb.2023.107537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/06/2023] [Revised: 03/02/2023] [Accepted: 04/04/2023] [Indexed: 05/08/2023]
Abstract
BACKGROUND AND OBJECTIVE Increasing and compelling evidence suggests that urinary and dietary metal exposure are underappreciated but potentially modifiable biomarkers for type 2 diabetes mellitus (T2DM). The aims of this study were (1) to identify the key potential biomarkers which contributed to T2DM with effective and parsimonious features and (2) to assess the utility of baseline variables and metal exposure in the diagnosis of T2DM. METHODS Based on the National Health and Nutrition Examination Survey (NHANES), we selected 9822 screening records with 82 significant variables covering demographics, lifestyle, anthropometric measures, diet and metal exposure for this study. Combining extreme gradient boosting (XGBoost), random forest and light gradient boosting machine (lightGBM), a soft voting ensemble model was proposed to measure the importance of the 82 features. With this soft voting ensemble model and the variance inflation factor (VIF), strongly multicollinear features with low importance scores were further removed from the candidate biomarkers. Then, a soft voting ensemble classifier was adopted to demonstrate the efficiency of the proposed feature selection method. RESULTS With the novel feature selection method, 12 baseline variables and 3 metal variables were selected to detect patients at risk for T2DM in our study. Among the metal variables, dietary copper (Cu), urinary cadmium (Cd) and urinary mercury (Hg) were selected as the most remarkable metal exposures, and the corresponding P-values were all less than 0.05. In a classification model of T2DM with 12 baseline biomarkers, the addition of the 3 metal exposure variables improved the area under the receiver operating characteristic curve (AUC) from 0.792 to 0.847. CONCLUSIONS This was the first demonstration of T2DM classification with machine learning under urinary and dietary metal exposure. The improved prediction performance illustrated the effectiveness of the proposed machine learning-based diagnosis model, which may facilitate lifestyle/dietary intervention for T2DM prevention.
Affiliation(s)
- Min Zhao
- School of Science, Nantong University, Nantong, 226019, China
- Jin Wan
- School of Science, Nantong University, Nantong, 226019, China
- Wenzhi Qin
- School of Science, Nantong University, Nantong, 226019, China
- Xin Huang
- School of Science, Nantong University, Nantong, 226019, China
- Guangdi Chen
- Bioelectromagnetics Laboratory, and Department of Reproductive Endocrinology of Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310058, China
- Xinyuan Zhao
- Department of occupational Medicine and Environmental Toxicology, School of Public Health, Nantong University, Nantong, 226019, China.
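The soft voting step mentioned in the abstract above can be sketched in a few lines. This is an illustrative assumption, not the authors' implementation: the three probability vectors stand in for the outputs of XGBoost, random forest and lightGBM on a single record, the equal weights are a default chosen here, and the VIF-based feature filtering is not shown.

```python
def soft_vote(probabilities, weights=None):
    """Average per-model class probabilities; return (winning class, averages)."""
    if not probabilities:
        raise ValueError("need at least one model output")
    if weights is None:
        weights = [1.0] * len(probabilities)  # assumed equal weighting
    total = sum(weights)
    n_classes = len(probabilities[0])
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=averaged.__getitem__)
    return winner, averaged


# Hypothetical outputs for one record: [P(no T2DM), P(T2DM)] from three models.
model_probs = [
    [0.55, 0.45],
    [0.55, 0.45],
    [0.10, 0.90],
]

label, averaged = soft_vote(model_probs)
hard_votes = [max(range(2), key=p.__getitem__) for p in model_probs]
print(label, averaged, hard_votes)
```

In this toy case two of the three models lean towards class 0, so a majority (hard) vote would pick class 0, whereas averaging the probabilities lets the one confident model tip the result to class 1.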
9
Zheng Z, Si Z, Wang X, Meng R, Wang H, Zhao Z, Lu H, Wang H, Zheng Y, Hu J, He R, Chen Y, Yang Y, Li X, Xue L, Sun J, Wu J. Risk Prediction for the Development of Hyperuricemia: Model Development Using an Occupational Health Examination Dataset. Int J Environ Res Public Health 2023; 20:3411. [PMID: 36834107 PMCID: PMC9967697 DOI: 10.3390/ijerph20043411] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Revised: 02/13/2023] [Accepted: 02/13/2023] [Indexed: 06/18/2023]
Abstract
OBJECTIVE Hyperuricemia (HUA) has become the second most common metabolic disease in China after diabetes, and its disease burden is a growing concern. METHODS We conducted a retrospective cohort study, with a baseline survey completed from January to September 2017 and a follow-up survey completed from March to September 2019. A group of 2992 steelworkers was used as the study population. Three models (logistic regression, CNN, and XGBoost) were established to predict HUA incidence in steelworkers. The predictive effects of the three models were evaluated in terms of discrimination, calibration, and clinical applicability. RESULTS The training set results show that the accuracy of the logistic regression, CNN, and XGBoost models was 84.4, 86.8, and 86.6, sensitivity was 68.4, 72.3, and 81.5, specificity was 82.0, 85.7, and 86.8, the area under the ROC curve was 0.734, 0.724, and 0.806, and the Brier score was 0.121, 0.194, and 0.095, respectively. The XGBoost model performed better on these evaluation indices than the other two models, and similar results were obtained in the validation set. The XGBoost model also had higher clinical applicability than the logistic regression and CNN models. CONCLUSION The prediction effect of the XGBoost model was better than that of the CNN and logistic regression models, and the model was suitable for predicting HUA onset risk in steelworkers.
Affiliation(s)
- Ziwei Zheng
- Key Laboratory of Coal Mine Health and Safety of Hebei Province, School of Public Health, North China University of Science and Technology, Tangshan 063210, China
- Zhikang Si
- Key Laboratory of Coal Mine Health and Safety of Hebei Province, School of Public Health, North China University of Science and Technology, Tangshan 063210, China
- Xuelin Wang
- Key Laboratory of Coal Mine Health and Safety of Hebei Province, School of Public Health, North China University of Science and Technology, Tangshan 063210, China
- Rui Meng
- Key Laboratory of Coal Mine Health and Safety of Hebei Province, School of Public Health, North China University of Science and Technology, Tangshan 063210, China
- Hui Wang
- Key Laboratory of Coal Mine Health and Safety of Hebei Province, School of Public Health, North China University of Science and Technology, Tangshan 063210, China
- Zekun Zhao
- Key Laboratory of Coal Mine Health and Safety of Hebei Province, School of Public Health, North China University of Science and Technology, Tangshan 063210, China
- Haipeng Lu
- Key Laboratory of Coal Mine Health and Safety of Hebei Province, School of Public Health, North China University of Science and Technology, Tangshan 063210, China
- Huan Wang
- Key Laboratory of Coal Mine Health and Safety of Hebei Province, School of Public Health, North China University of Science and Technology, Tangshan 063210, China
- Yizhan Zheng
- Key Laboratory of Coal Mine Health and Safety of Hebei Province, School of Public Health, North China University of Science and Technology, Tangshan 063210, China
- Jiaqi Hu
- Key Laboratory of Coal Mine Health and Safety of Hebei Province, School of Public Health, North China University of Science and Technology, Tangshan 063210, China
- Runhui He
- College of Science, North China University of Science and Technology, Tangshan 063210, China
- Yuanyu Chen
- Key Laboratory of Coal Mine Health and Safety of Hebei Province, School of Public Health, North China University of Science and Technology, Tangshan 063210, China
- Yongzhong Yang
- Key Laboratory of Coal Mine Health and Safety of Hebei Province, School of Public Health, North China University of Science and Technology, Tangshan 063210, China
- Xiaoming Li
- Key Laboratory of Coal Mine Health and Safety of Hebei Province, School of Public Health, North China University of Science and Technology, Tangshan 063210, China
- Ling Xue
- Key Laboratory of Coal Mine Health and Safety of Hebei Province, School of Public Health, North China University of Science and Technology, Tangshan 063210, China
- Jian Sun
- School of Public Health, North China University of Science and Technology, Tangshan 063210, China
- Jianhui Wu
- Key Laboratory of Coal Mine Health and Safety of Hebei Province, School of Public Health, North China University of Science and Technology, Tangshan 063210, China
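The Brier score used as the calibration measure in the abstract above is the mean squared difference between the predicted probability and the observed 0/1 outcome, so lower values indicate better calibrated predictions. The sketch below shows the calculation on hypothetical values; the outcomes, probabilities and function name are illustrative assumptions and are not data from the study.

```python
def brier_score(outcomes, probabilities):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if not outcomes or len(outcomes) != len(probabilities):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(outcomes, probabilities)) / len(outcomes)


# Hypothetical toy data: 1 = developed hyperuricemia, 0 = did not.
outcomes = [1, 0, 0, 1, 0]
probabilities = [0.8, 0.1, 0.3, 0.6, 0.2]

score = brier_score(outcomes, probabilities)
print(f"Brier score = {score:.3f}")
```

A model that always predicts 0.5 scores 0.25 on any binary data set, which gives a rough yardstick for reading reported values.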