1
MacCarthy G, Pazoki R. Evaluation of Machine Learning and Traditional Statistical Models to Assess the Value of Stroke Genetic Liability for Prediction of Risk of Stroke Within the UK Biobank. Healthcare (Basel) 2025; 13:1003. [PMID: 40361781 PMCID: PMC12071721 DOI: 10.3390/healthcare13091003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/12/2025] [Revised: 04/18/2025] [Accepted: 04/19/2025] [Indexed: 05/15/2025] [Open Access]
Abstract
Background and Objective: Stroke is one of the leading causes of mortality and long-term disability in adults over 18 years of age globally, and its increasing incidence has become a global public health concern. Accurate stroke prediction is highly valuable for early intervention and treatment. There is a scarcity of studies evaluating the prediction value of genetic liability in the prediction of the risk of stroke. Materials and Methods: Our study involved 243,339 participants of European ancestry from the UK Biobank. We created stroke genetic liability using data from MEGASTROKE genome-wide association studies (GWASs). In our study, we built four predictive models with and without stroke genetic liability in the training set, namely a Cox proportional hazard (Coxph) model, gradient boosting model (GBM), decision tree (DT), and random forest (RF), to estimate time-to-event risk for stroke. We then assessed their performances in the testing set. Results: Each unit (standard deviation) increase in genetic liability increases the risk of incident stroke by 7% (HR = 1.07, 95% CI = 1.02, 1.12, p-value = 0.0030). The risk of stroke was greater in the higher genetic liability group, demonstrated by a 14% increased risk (HR = 1.14, 95% CI = 1.02, 1.27, p-value = 0.02) compared with the low genetic liability group. The Coxph model including genetic liability was the best-performing model for stroke prediction achieving an AUC of 69.54 (95% CI = 67.40, 71.68), NRI of 0.202 (95% CI = 0.12, 0.28; p-value = 0.000) and IDI of 1.0 × 10⁻⁴ (95% CI = 0.000, 3.0 × 10⁻⁴; p-value = 0.13) compared with the Cox model without genetic liability. Conclusions: Incorporating genetic liability in prediction models slightly improved prediction models of stroke beyond conventional risk factors.
Affiliation(s)
- Gideon MacCarthy
  - Cardiovascular and Metabolic Research Group, Department of Biosciences, College of Health, Medicine, and Life Sciences, Brunel University of London, Uxbridge UB8 3PH, UK
- Raha Pazoki
  - Cardiovascular and Metabolic Research Group, Department of Biosciences, College of Health, Medicine, and Life Sciences, Brunel University of London, Uxbridge UB8 3PH, UK
  - Department of Epidemiology and Biostatistics, School of Public Health, Imperial College London, London W2 1PG, UK
2
Nagarajah S, Alkandari A, Marques-Vidal P. Genetic risk scores: are they important for diabetes management? results from multiple cross-sectional studies. Diabetol Metab Syndr 2023; 15:227. [PMID: 37950303 PMCID: PMC10636836 DOI: 10.1186/s13098-023-01204-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/02/2023] [Accepted: 11/01/2023] [Indexed: 11/12/2023] [Open Access]
Abstract
BACKGROUND Several genetic risk scores (GRS) for type 2 diabetes (T2DM) have been published, but not replicated. We aimed to 1) replicate previous findings on the association between GRS on prevalence of T2DM and 2) assess the association between GRS and T2DM management in a sample of community-dwelling people from Switzerland. METHODS Four waves from a prospective study conducted in Lausanne. Seven GRS related to T2DM were selected, and compared between participants with and without T2DM, and between controlled and uncontrolled participants treated for T2DM. RESULTS Data from 5426, 4017, 2873 and 2170 participants from the baseline, first, second and third follow-ups, respectively, was used. In all study periods, participants with T2DM scored higher than participants without T2DM in six out of seven GRS. Data from 367, 437, 285 and 207 participants with T2DM was used. In all study periods, approximately half of participants treated for T2DM did not achieve adequate fasting blood glucose or HbA1c levels, and no difference between controlled and uncontrolled participants was found for all seven GRS. Power analyses showed that most GRS needed a sample size above 1000 to consider the difference between controlled and uncontrolled participants as statistically significant at p = 0.05. CONCLUSION In this study, we confirmed the association between most published GRS and diabetes. Conversely, no consistent association between GRS and diabetes control was found. Use of GRS to manage patients with T2DM in clinical practice is not justified.
Affiliation(s)
- Sureka Nagarajah
  - Department of Medicine, Internal Medicine, Lausanne University Hospital and University of Lausanne, Office BH10-642, 46 Rue du Bugnon, 1011, Lausanne, Switzerland
- Pedro Marques-Vidal
  - Department of Medicine, Internal Medicine, Lausanne University Hospital and University of Lausanne, Office BH10-642, 46 Rue du Bugnon, 1011, Lausanne, Switzerland
3
Mohsen F, Al-Absi HRH, Yousri NA, El Hajj N, Shah Z. A scoping review of artificial intelligence-based methods for diabetes risk prediction. NPJ Digit Med 2023; 6:197. [PMID: 37880301 PMCID: PMC10600138 DOI: 10.1038/s41746-023-00933-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 03/25/2023] [Accepted: 09/25/2023] [Indexed: 10/27/2023] [Open Access]
Abstract
The increasing prevalence of type 2 diabetes mellitus (T2DM) and its associated health complications highlight the need to develop predictive models for early diagnosis and intervention. While many artificial intelligence (AI) models for T2DM risk prediction have emerged, a comprehensive review of their advancements and challenges is currently lacking. This scoping review maps out the existing literature on AI-based models for T2DM prediction, adhering to the PRISMA extension for Scoping Reviews guidelines. A systematic search of longitudinal studies was conducted across four databases, including PubMed, Scopus, IEEE-Xplore, and Google Scholar. Forty studies that met our inclusion criteria were reviewed. Classical machine learning (ML) models dominated these studies, with electronic health records (EHR) being the predominant data modality, followed by multi-omics, while medical imaging was the least utilized. Most studies employed unimodal AI models, with only ten adopting multimodal approaches. Both unimodal and multimodal models showed promising results, with the latter being superior. Almost all studies performed internal validation, but only five conducted external validation. Most studies utilized the area under the curve (AUC) for discrimination measures. Notably, only five studies provided insights into the calibration of their models. Half of the studies used interpretability methods to identify key risk predictors revealed by their models. Although a minority highlighted novel risk predictors, the majority reported commonly known ones. Our review provides valuable insights into the current state and limitations of AI-based models for T2DM prediction and highlights the challenges associated with their development and clinical integration.
Affiliation(s)
- Farida Mohsen
  - College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
- Hamada R H Al-Absi
  - College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
- Noha A Yousri
  - Genetic Medicine, Weill Cornell Medicine-Qatar, Qatar Foundation, Doha, Qatar
  - College of Health and Life Sciences, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
  - Computer and Systems Engineering, Alexandria University, Alexandria, Egypt
- Nady El Hajj
  - College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
  - College of Health and Life Sciences, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
- Zubair Shah
  - College of Science and Engineering, Hamad Bin Khalifa University, Qatar Foundation, 34110, Doha, Qatar
4
Li S, Chen Y, Zhang L, Li R, Kang N, Hou J, Wang J, Bao Y, Jiang F, Zhu R, Wang C, Zhang L. An environment-wide association study for the identification of non-invasive factors for type 2 diabetes mellitus: Analysis based on the Henan Rural Cohort study. Diabetes Res Clin Pract 2023; 204:110917. [PMID: 37748711 DOI: 10.1016/j.diabres.2023.110917] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/12/2023] [Revised: 09/16/2023] [Accepted: 09/21/2023] [Indexed: 09/27/2023]
Abstract
AIM To explore the influencing factors of Type 2 diabetes mellitus (T2DM) in the rural population of Henan Province and evaluate the predictive ability of non-invasive factors to T2DM. METHODS A total of 30,020 participants from the Henan Rural Cohort Study in China were included in this study. The dataset was randomly divided into a training set and a testing set with a 50:50 split for validation purposes. We used logistic regression analysis to investigate the association between 56 factors and T2DM in the training set (false discovery rate < 5 %) and significant factors were further validated in the testing set (P < 0.05). Gradient Boosting Machine (GBM) model was used to determine the ability of the non-invasive variables to classify T2DM individuals accurately and the importance ranking of these variables. RESULTS The overall population prevalence of T2DM was 9.10 %. After adjusting for age, sex, educational level, marital status, and body measure index (BMI), we identified 13 non-invasive variables and 6 blood biochemical indexes associated with T2DM in the training and testing dataset. The top three factors according to the GBM importance ranking were pulse pressure (PP), urine glucose (UGLU), and waist-to-hip ratio (WHR). The GBM model achieved a receiver operating characteristic (AUC) curve of 0.837 with non-invasive variables and 0.847 for the full model. CONCLUSIONS Our findings demonstrate that non-invasive variables that can be easily measured and quickly obtained may be used to predict T2DM risk in rural populations in Henan Province.
Affiliation(s)
- Shuoyi Li
  - Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan 450001, PR China
- Ying Chen
  - Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan 450001, PR China
- Liying Zhang
  - Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan 450001, PR China
- Ruiying Li
  - Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan 450001, PR China
- Ning Kang
  - Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan 450001, PR China
- Jian Hou
  - Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan 450001, PR China
- Jing Wang
  - China-Australia Joint Research Center for Infectious Diseases, School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi 710061, PR China
- Yining Bao
  - China-Australia Joint Research Center for Infectious Diseases, School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi 710061, PR China
- Feng Jiang
  - Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan 450001, PR China
- Ruifang Zhu
  - Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan 450001, PR China
- Chongjian Wang
  - Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou, Henan 450001, PR China
- Lei Zhang
  - China-Australia Joint Research Center for Infectious Diseases, School of Public Health, Xi'an Jiaotong University Health Science Center, Xi'an, Shaanxi 710061, PR China
  - Artificial Intelligence and Modelling in Epidemiology Program, Melbourne Sexual Health Centre, Alfred Health, Melbourne, Australia
  - Central Clinical School, Faculty of Medicine, Monash University, Melbourne, Australia
5
Kirk D, Kok E, Tufano M, Tekinerdogan B, Feskens EJM, Camps G. Machine Learning in Nutrition Research. Adv Nutr 2022; 13:2573-2589. [PMID: 36166846 PMCID: PMC9776646 DOI: 10.1093/advances/nmac103] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Received: 06/11/2022] [Revised: 08/02/2022] [Accepted: 09/22/2022] [Indexed: 01/29/2023] [Open Access]
Abstract
Data currently generated in the field of nutrition are becoming increasingly complex and high-dimensional, bringing with them new methods of data analysis. The characteristics of machine learning (ML) make it suitable for such analysis and thus lend itself as an alternative tool to deal with data of this nature. ML has already been applied in important problem areas in nutrition, such as obesity, metabolic health, and malnutrition. Despite this, experts in nutrition are often without an understanding of ML, which limits its application and therefore potential to solve currently open questions. The current article aims to bridge this knowledge gap by supplying nutrition researchers with a resource to facilitate the use of ML in their research. ML is first explained and distinguished from existing solutions, with key examples of applications in the nutrition literature provided. Two case studies of domains in which ML is particularly applicable, precision nutrition and metabolomics, are then presented. Finally, a framework is outlined to guide interested researchers in integrating ML into their work. By acting as a resource to which researchers can refer, we hope to support the integration of ML in the field of nutrition to facilitate modern research.
Affiliation(s)
- Daniel Kirk
  - Division of Human Nutrition and Health, Wageningen University and Research, Wageningen, The Netherlands
- Esther Kok
  - Division of Human Nutrition and Health, Wageningen University and Research, Wageningen, The Netherlands
- Michele Tufano
  - Division of Human Nutrition and Health, Wageningen University and Research, Wageningen, The Netherlands
- Bedir Tekinerdogan
  - Information Technology Group, Wageningen University and Research, Wageningen, The Netherlands
- Edith J M Feskens
  - Division of Human Nutrition and Health, Wageningen University and Research, Wageningen, The Netherlands
- Guido Camps
  - Division of Human Nutrition and Health, Wageningen University and Research, Wageningen, The Netherlands
  - OnePlanet Research Center, Wageningen, The Netherlands
6
Tuppad A, Patil SD. Machine learning for diabetes clinical decision support: a review. Advances in Computational Intelligence 2022; 2:22. [PMID: 35434723 PMCID: PMC9006199 DOI: 10.1007/s43674-022-00034-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 10/17/2021] [Revised: 02/27/2022] [Accepted: 03/03/2022] [Indexed: 12/14/2022]
Abstract
Type 2 diabetes has recently acquired the status of an epidemic silent killer, though it is non-communicable. There are two main reasons behind this perception of the disease. First, a gradual but exponential growth in the disease prevalence has been witnessed irrespective of age groups, geography or gender. Second, the disease dynamics are very complex in terms of multifactorial risks involved, initial asymptomatic period, different short-term and long-term complications posing serious health threat and related co-morbidities. Majority of its risk factors are lifestyle habits like physical inactivity, lack of exercise, high body mass index (BMI), poor diet, smoking except some inevitable ones like family history of diabetes, ethnic predisposition, ageing etc. Nowadays, machine learning (ML) is increasingly being applied for alleviation of diabetes health burden and many research works have been proposed in the literature to offer clinical decision support in different application areas as well. In this paper, we present a review of such efforts for the prevention and management of type 2 diabetes. Firstly, we present the medical gaps in diabetes knowledge base, guidelines and medical practice identified from relevant articles and highlight those that can be addressed by ML. Further, we review the ML research works in three different application areas namely—(1) risk assessment (statistical risk scores and ML-based risk models), (2) diagnosis (using non-invasive and invasive features), (3) prognosis (from normoglycemia/prior morbidity to incident diabetes and prognosis of incident diabetes to related complications). We discuss and summarize the shortcomings or gaps in the existing ML methodologies for diabetes to be addressed in future. This review provides the breadth of ML predictive modeling applications for diabetes while highlighting the medical and technological gaps as well as various aspects involved in ML-based diabetes clinical decision support.
Affiliation(s)
- Ashwini Tuppad
  - School of Computer Science and Engineering, REVA University, Rukmini Knowledge Park, Kattigenahalli, Bangalore, Karnataka, India
- Shantala Devi Patil
  - School of Computer Science and Engineering, REVA University, Rukmini Knowledge Park, Kattigenahalli, Bangalore, Karnataka, India
7
Delpino F, Costa Â, Farias S, Chiavegatto Filho A, Arcêncio R, Nunes B. Machine learning for predicting chronic diseases: a systematic review. Public Health 2022; 205:14-25. [DOI: 10.1016/j.puhe.2022.01.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/27/2021] [Revised: 10/26/2021] [Accepted: 01/11/2022] [Indexed: 12/12/2022]
8
Miyachi Y, Miyazawa T, Ogawa Y. HNF1A Mutations and Beta Cell Dysfunction in Diabetes. Int J Mol Sci 2022; 23:3222. [PMID: 35328643 PMCID: PMC8948720 DOI: 10.3390/ijms23063222] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 02/28/2022] [Revised: 03/14/2022] [Accepted: 03/16/2022] [Indexed: 12/26/2022] [Open Access]
Abstract
Understanding the genetic factors of diabetes is essential for addressing the global increase in type 2 diabetes. HNF1A mutations cause a monogenic form of diabetes called maturity-onset diabetes of the young (MODY), and HNF1A single-nucleotide polymorphisms are associated with the development of type 2 diabetes. Numerous studies have been conducted, mainly using genetically modified mice, to explore the molecular basis for the development of diabetes caused by HNF1A mutations, and to reveal the roles of HNF1A in multiple organs, including insulin secretion from pancreatic beta cells, lipid metabolism and protein synthesis in the liver, and urinary glucose reabsorption in the kidneys. Recent studies using human stem cells that mimic MODY have provided new insights into beta cell dysfunction. In this article, we discuss the involvement of HNF1A in beta cell dysfunction by reviewing previous studies using genetically modified mice and recent findings in human stem cell-derived beta cells.