1
McDonough C, Li YC, Vangeepuram N, Liu B, Pandey G. A comprehensive youth diabetes epidemiological dataset and web portal: Resource Development and Case Studies. JMIR Public Health Surveill 2024. [PMID: 38666756 DOI: 10.2196/53330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2024] Open
Abstract
BACKGROUND The prevalence of type 2 diabetes (DM) and prediabetes (preDM) has been increasing among youth in recent decades in the United States, prompting an urgent need to understand and identify their associated risk factors. Such efforts, however, have been hindered by the lack of easily accessible youth preDM/DM data. OBJECTIVE We aimed first to build a high-quality, comprehensive epidemiological dataset focused on youth preDM/DM. Subsequently, we aimed to make these data accessible by creating a user-friendly web portal to share them and the corresponding code. Through this, we hope to address this significant gap and facilitate youth preDM/DM research. METHODS Building on data from the National Health and Nutrition Examination Survey (NHANES) from 1999 to 2018, we cleaned and harmonized hundreds of variables relevant to preDM/DM (fasting plasma glucose level ≥100 mg/dL and/or HbA1C ≥5.7%) for youth aged 12-19 years (n=15,149). We identified individual factors associated with preDM/DM risk using bivariate statistical analyses and predicted preDM/DM status using our Ensemble Integration (EI) framework for multi-domain machine learning. We then developed a user-friendly web portal named Prediabetes/diabetes in youth ONline Dashboard (POND) to share the data and code. RESULTS We extracted 95 variables potentially relevant to preDM/DM risk organized into 4 domains (sociodemographic, health status, diet, and other lifestyle behaviors). The bivariate analyses identified 27 significant correlates of preDM/DM (P ≤0.0005, Bonferroni adjusted), including race/ethnicity, health insurance, BMI, added sugar intake, and screen time. Sixteen of these factors were also identified based on the EI methodology (Fisher's P of overlap=7.06×10^-6). The EI approach identified 11 additional predictive variables, including some known (e.g., meat and fruit intake and family income) and less recognized factors (e.g., number of rooms in homes). The factors identified in both analyses spanned all 4 domains. These data and results, as well as other exploratory tools, can be accessed on POND (https://rstudio-connect.hpc.mssm.edu/POND/). CONCLUSIONS Using NHANES data, we built one of the largest public epidemiological datasets for studying youth preDM/DM and identified potential risk factors using complementary analytical approaches. Our results align with the multifactorial nature of preDM/DM with correlates across several domains. Also, our data-sharing platform, POND, facilitates a wide range of applications to inform future youth preDM/DM studies.
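The case definition and the significance threshold quoted in this abstract can be illustrated with a short sketch. It assumes that the Bonferroni adjustment divides a family-wise alpha of 0.05 by the 95 extracted variables; the abstract states neither choice explicitly. The function and constant names are hypothetical, and this is not the authors' code.

```python
# Illustrative sketch only; not the authors' code. All names are hypothetical.

FPG_CUTOFF_MG_DL = 100.0   # fasting plasma glucose cutoff quoted in the abstract
HBA1C_CUTOFF_PCT = 5.7     # HbA1c cutoff quoted in the abstract


def has_predm_or_dm(fpg_mg_dl, hba1c_pct):
    """Return True if either available measurement meets its cutoff.

    A missing measurement is passed as None and ignored, which is an
    assumption about how the "and/or" rule treats missing data.
    """
    fpg_high = fpg_mg_dl is not None and fpg_mg_dl >= FPG_CUTOFF_MG_DL
    hba1c_high = hba1c_pct is not None and hba1c_pct >= HBA1C_CUTOFF_PCT
    return fpg_high or hba1c_high


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Assumed: a family-wise alpha of 0.05 spread over the 95 extracted variables.
threshold = bonferroni_threshold(0.05, 95)
print(f"{threshold:.6f}")  # 0.000526
```

Under these assumptions the per-test threshold is about 0.0005, in line with the cutoff reported in the abstract.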
Affiliation(s)
- Catherine McDonough
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, US
- Yan Chak Li
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, US
- Nita Vangeepuram
  - Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, US
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, US
- Bian Liu
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, US
- Gaurav Pandey
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Pl, New York, US
2
Zrubka Z, Kertész G, Gulácsi L, Czere J, Hölgyesi Á, Nezhad HM, Mosavi A, Kovács L, Butte AJ, Péntek M. The Reporting Quality of Machine Learning Studies on Pediatric Diabetes Mellitus: Systematic Review. J Med Internet Res 2024; 26:e47430. [PMID: 38241075 PMCID: PMC10837761 DOI: 10.2196/47430] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2023] [Revised: 04/29/2023] [Accepted: 11/17/2023] [Indexed: 01/23/2024] Open
Abstract
BACKGROUND Diabetes mellitus (DM) is a major health concern among children with the widespread adoption of advanced technologies. However, concerns are growing about the transparency, replicability, biasedness, and overall validity of artificial intelligence studies in medicine. OBJECTIVE We aimed to systematically review the reporting quality of machine learning (ML) studies of pediatric DM using the Minimum Information About Clinical Artificial Intelligence Modelling (MI-CLAIM) checklist, a general reporting guideline for medical artificial intelligence studies. METHODS We searched the PubMed and Web of Science databases from 2016 to 2020. Studies were included if the use of ML was reported in children with DM aged 2 to 18 years, including studies on complications, screening studies, and in silico samples. In studies following the ML workflow of training, validation, and testing of results, reporting quality was assessed via MI-CLAIM by consensus judgments of independent reviewer pairs. Positive answers to the 17 binary items regarding sufficient reporting were qualitatively summarized and counted as a proxy measure of reporting quality. The synthesis of results included testing the association of reporting quality with publication and data type, participants (human or in silico), research goals, level of code sharing, and the scientific field of publication (medical or engineering), as well as with expert judgments of clinical impact and reproducibility. RESULTS After screening 1043 records, 28 studies were included. The sample size of the training cohort ranged from 5 to 561. Six studies featured only in silico patients. The reporting quality was low, with great variation among the 21 studies assessed using MI-CLAIM. The number of items with sufficient reporting ranged from 4 to 12 (mean 7.43, SD 2.62). 
The items on research questions and data characterization were reported adequately most often, whereas items on patient characteristics and model examination were reported adequately least often. The representativeness of the training and test cohorts to real-world settings and the adequacy of model performance evaluation were the most difficult to judge. Reporting quality improved over time (r=0.50; P=.02); it was higher than average in prognostic biomarker and risk factor studies (P=.04) and lower in noninvasive hypoglycemia detection studies (P=.006), higher in studies published in medical versus engineering journals (P=.004), and higher in studies sharing any code of the ML pipeline versus not sharing (P=.003). The association between expert judgments and MI-CLAIM ratings was not significant. CONCLUSIONS The reporting quality of ML studies in the pediatric population with DM was generally low. Important details for clinicians, such as patient characteristics; comparison with the state-of-the-art solution; and model examination for valid, unbiased, and robust results, were often the weak points of reporting. To assess their clinical utility, the reporting standards of ML studies must evolve, and algorithms for this challenging population must become more transparent and replicable.
Affiliation(s)
- Zsombor Zrubka
  - HECON Health Economics Research Center, University Research and Innovation Center, Óbuda University, Budapest, Hungary
- Gábor Kertész
  - John von Neumann Faculty of Informatics, Óbuda University, Budapest, Hungary
- László Gulácsi
  - HECON Health Economics Research Center, University Research and Innovation Center, Óbuda University, Budapest, Hungary
- János Czere
  - Doctoral School of Innovation Management, Óbuda University, Budapest, Hungary
- Áron Hölgyesi
  - HECON Health Economics Research Center, University Research and Innovation Center, Óbuda University, Budapest, Hungary
  - Doctoral School of Molecular Medicine, Semmelweis University, Budapest, Hungary
- Hossein Motahari Nezhad
  - HECON Health Economics Research Center, University Research and Innovation Center, Óbuda University, Budapest, Hungary
  - Doctoral School of Business and Management, Corvinus University of Budapest, Budapest, Hungary
- Amir Mosavi
  - John von Neumann Faculty of Informatics, Óbuda University, Budapest, Hungary
- Levente Kovács
  - Physiological Controls Research Center, University Research and Innovation Center, Óbuda University, Budapest, Hungary
- Atul J Butte
  - Bakar Computational Health Sciences Institute, University of California, San Francisco, CA, United States
- Márta Péntek
  - HECON Health Economics Research Center, University Research and Innovation Center, Óbuda University, Budapest, Hungary
3
Li Z, Wei J, Lu S. Association between diabetic retinopathy and diabetic foot ulcer in patients with diabetes: A meta-analysis. Int Wound J 2023; 20:4077-4082. [PMID: 37554103 PMCID: PMC10681479 DOI: 10.1111/iwj.14299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 06/18/2023] [Accepted: 06/19/2023] [Indexed: 08/10/2023] Open
Abstract
This study aimed to explore the relationship between diabetic retinopathy (DR) and diabetic foot ulcers (DFUs) to provide evidence for the prevention of diabetic complications. PubMed, EMBASE, Web of Science, Cochrane Library, China National Knowledge Infrastructure, Chinese Biomedical Literature Database and Wanfang Data databases were searched from their inception until March 2023 for studies on the relationship between DR and DFU. Two researchers independently screened the literature and extracted data according to the inclusion and exclusion criteria. The meta-analysis was performed using the RevMan 5.3 software. Eleven articles referring to 10 208 patients were included, of whom 2191 patients had DFU and 8017 patients did not have DFU. The meta-analysis results showed that DR significantly increased the incidence of DFU (47.94% vs. 16.38%; OR, 4.13; 95% CI, 2.33-7.33; p < 0.001). The results of this study suggest that patients with DR have a higher risk of developing DFU, highlighting the importance of regular screening for these two complications to prevent serious adverse outcomes of diabetes. However, further high-quality studies are required to validate the conclusions of the present study.
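As a consistency check on the pooled estimate above: a confidence interval for a ratio measure is normally symmetric on the log scale, so the geometric mean of its bounds should recover the point estimate, and the interval width gives the standard error of the log odds ratio. The sketch below assumes a standard Wald-type interval with z = 1.96; the abstract does not say how RevMan computed the interval, so this is an illustration and not a reproduction of the analysis.

```python
import math

# Values reported in the abstract.
or_point, ci_low, ci_high = 4.13, 2.33, 7.33
Z_95 = 1.96  # assumed normal quantile for a two-sided 95% interval

# Geometric mean of the bounds; should be close to the point estimate.
or_from_ci = math.sqrt(ci_low * ci_high)

# Standard error of log(OR) implied by the interval width.
se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)

# Wald z statistic and two-sided p-value under a normal approximation.
z_stat = math.log(or_point) / se_log_or
p_value = math.erfc(z_stat / math.sqrt(2))

print(f"OR recovered from CI: {or_from_ci:.2f}")
print(f"SE of log(OR): {se_log_or:.3f}")
print(f"z = {z_stat:.2f}, p < 0.001: {p_value < 0.001}")
```

Under these assumptions the recovered estimate rounds to 4.13 and the implied p-value is below 0.001, so the reported point estimate, interval, and p-value agree with one another.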
Affiliation(s)
- Ziye Li
  - Department of Ophthalmology, The First Affiliated Hospital of Henan University of Science and Technology, Luoyang, China
- Jing Wei
  - Department of Ophthalmology, The First Affiliated Hospital of Henan University of Science and Technology, Luoyang, China
- Song Lu
  - Department of Ophthalmology, The First Affiliated Hospital of Henan University of Science and Technology, Luoyang, China
4
McDonough C, Li YC, Vangeepuram N, Liu B, Pandey G. Facilitating youth diabetes studies with the most comprehensive epidemiological dataset available through a public web portal. medRxiv 2023:2023.08.02.23293517. [PMID: 37577465 PMCID: PMC10418570 DOI: 10.1101/2023.08.02.23293517] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
The prevalence of type 2 diabetes mellitus (DM) and prediabetes (preDM) is rapidly increasing among youth, posing significant health and economic consequences. To address this growing concern, we created the most comprehensive youth-focused diabetes dataset to date derived from National Health and Nutrition Examination Survey (NHANES) data from 1999 to 2018. The dataset, consisting of 15,149 youth aged 12 to 19 years, encompasses preDM/DM relevant variables from sociodemographic, health status, diet, and other lifestyle behavior domains. An interactive web portal, POND (Prediabetes/diabetes in youth ONline Dashboard), was developed to provide public access to the dataset, allowing users to explore variables potentially associated with youth preDM/DM. Leveraging statistical and machine learning methods, we conducted two case studies, revealing established and lesser-known variables linked to youth preDM/DM. This dataset and portal can facilitate future studies to inform prevention and management strategies for youth prediabetes and diabetes.
Affiliation(s)
- Catherine McDonough
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Yan Chak Li
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Nita Vangeepuram
  - Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Bian Liu
  - Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Gaurav Pandey
  - Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA
5
Afsaneh E, Sharifdini A, Ghazzaghi H, Ghobadi MZ. Recent applications of machine learning and deep learning models in the prediction, diagnosis, and management of diabetes: a comprehensive review. Diabetol Metab Syndr 2022; 14:196. [PMID: 36572938 PMCID: PMC9793536 DOI: 10.1186/s13098-022-00969-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 10/06/2022] [Accepted: 12/16/2022] [Indexed: 12/28/2022] Open
Abstract
Diabetes is a metabolic illness characterized by elevated blood glucose. This abnormal increase can seriously damage other organs, such as the kidneys, eyes, heart, nerves, and blood vessels. Therefore, its prediction, prognosis, and management are essential to prevent harmful effects and to recommend more useful treatments. For these goals, machine learning algorithms have received considerable attention and have been developed successfully. This review surveys recently proposed machine learning (ML) and deep learning (DL) models for the objectives mentioned above. The reported results indicate that ML and DL algorithms are promising approaches for controlling blood glucose and diabetes. However, they should be improved and applied to large datasets to confirm their applicability.
6
Kushwaha S, Srivastava R, Jain R, Sagar V, Aggarwal AK, Bhadada SK, Khanna P. Harnessing machine learning models for non-invasive pre-diabetes screening in children and adolescents. Comput Methods Programs Biomed 2022; 226:107180. [PMID: 36279639 DOI: 10.1016/j.cmpb.2022.107180] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 05/24/2022] [Revised: 10/02/2022] [Accepted: 10/06/2022] [Indexed: 06/16/2023]
Abstract
BACKGROUND AND OBJECTIVES Pre-diabetes has been identified as an intermediate diagnosis and a sign of a relatively high chance of developing diabetes in the future. Diabetes has become one of the most frequent chronic disorders in children and adolescents around the world; therefore, predicting the onset of pre-diabetes allows a person at risk to make efforts to avoid or restrict disease progression. This research aims to create and implement a cross-validated machine learning model that can predict pre-diabetes using non-invasive methods. METHODS We analysed a nationally representative dataset of children and adolescents (5-19 years) to develop a machine learning model for non-invasive pre-diabetes screening. Based on HbA1c levels, the data (n = 26,567) were divided into normal (n = 23,777) and pre-diabetes (n = 2790) groups. We considered eight features, six hyperparameter-tuned machine learning models and several metrics for model evaluation. The final model was selected based on the area under the receiver operating characteristic curve (AUC), Cohen's kappa and cross-validation score. The selected model was integrated into the screening tool for automated pre-diabetes prediction. RESULTS The XGBoost classifier, using all eight features, was the best model. The 10-fold cross-validation score was highest for the XGBoost model (90.13%) and lowest for the support vector machine (61.17%). The AUC was highest for RF (0.970), followed by GB (0.968), XGB (0.959), ETC (0.918), DT (0.908), and SVM (0.574) models. The XGB model was used to develop the screening tool. CONCLUSION We have developed and deployed a machine learning model for automated real-time pre-diabetes screening. The screening tool can be used on computers and can be turned into software for easy usage. The detection of pre-diabetes in the pediatric age group may help prevent its progression. Machine learning can also show great competence in determining important features in pre-diabetes.
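Because only 2790 of the 26,567 participants fall in the pre-diabetes class, an agreement-corrected metric such as Cohen's kappa is informative alongside raw accuracy. The sketch below computes both from a 2x2 confusion matrix for a hypothetical classifier that labels every participant as normal. This baseline is an illustration only and does not appear in the study; the function names are likewise hypothetical.

```python
def accuracy(tn, fp, fn, tp):
    """Fraction of all predictions that are correct."""
    return (tn + tp) / (tn + fp + fn + tp)


def cohens_kappa(tn, fp, fn, tp):
    """Cohen's kappa for a binary confusion matrix."""
    n = tn + fp + fn + tp
    observed = (tn + tp) / n
    # Chance agreement: product of predicted and actual marginals per class.
    expected = ((tn + fn) * (tn + fp) + (tp + fp) * (tp + fn)) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: a single class on both margins
    return (observed - expected) / (1.0 - expected)


# Hypothetical "always normal" baseline using the class sizes in the abstract:
# all 23,777 normal cases are right, all 2790 pre-diabetes cases are missed.
baseline = dict(tn=23777, fp=0, fn=2790, tp=0)

baseline_accuracy = accuracy(**baseline)
baseline_kappa = cohens_kappa(**baseline)

print(f"accuracy = {baseline_accuracy:.3f}")
print(f"kappa    = {baseline_kappa:.3f}")
```

The baseline reaches roughly 89.5% accuracy with a kappa of zero, which is one reason to report kappa and AUC together with an accuracy-style score on an imbalanced dataset.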
Affiliation(s)
- Savitesh Kushwaha
  - Department of Community Medicine and School of Public Health, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India
- Rachana Srivastava
  - Department of Community Medicine and School of Public Health, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India
- Rachita Jain
  - Department of Community Medicine and School of Public Health, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India
- Vivek Sagar
  - Department of Community Medicine and School of Public Health, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India
- Arun Kumar Aggarwal
  - Department of Community Medicine and School of Public Health, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India
- Sanjay Kumar Bhadada
  - Department of Endocrinology, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India
- Poonam Khanna
  - Department of Community Medicine and School of Public Health, Postgraduate Institute of Medical Education and Research, Chandigarh 160012, India
7
Hu H, Lai T, Farid F. Feasibility Study of Constructing a Screening Tool for Adolescent Diabetes Detection Applying Machine Learning Methods. Sensors (Basel) 2022; 22:6155. [PMID: 36015915 PMCID: PMC9416136 DOI: 10.3390/s22166155] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 07/05/2022] [Revised: 08/02/2022] [Accepted: 08/15/2022] [Indexed: 06/02/2023]
Abstract
Prediabetes and diabetes have become alarmingly prevalent among adolescents over the past decade. However, effective screening tools that can assess diabetes risk smoothly are still in their infancy. To help address this gap, this research proposes a machine learning-based predictive model to detect adolescent diabetes. The model applies supervised machine learning and a novel feature selection method to the National Health and Nutrition Examination Survey datasets after an exhaustive search to select reliable and accurate data. The best model achieved an area under the curve (AUC) score of 71%. This research indicates that a screening tool based on supervised machine learning models can assist in the automated detection of youth diabetes. It also identifies some critical predictors for such detection using Lasso Regression, Random Forest Importance and Gradient Boosted Tree Importance feature selection methods. The features contributing most to youth diabetes detection are physical characteristics (e.g., waist, leg length, gender), dietary information (e.g., water, protein, sodium) and demographics. These predictors can be further utilised in other areas of medical research, such as electronic medical history.
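For readers unfamiliar with the metric, the AUC reported above can be read as the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative case, with ties counted as one half. A minimal sketch of that pairwise definition follows. The function name is hypothetical, and the scores and labels are made-up toy values, not data from the study.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    This O(n_pos * n_neg) form is meant for clarity, not for large datasets.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (made-up values): three of the four positive/negative pairs
# are ordered correctly.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_auc = pairwise_auc(toy_labels, toy_scores)
print(toy_auc)  # 0.75
```

On this reading, an AUC of 0.71 means that the model ranks a true case above a non-case in about 71% of such pairs, compared with 50% for random scoring.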
Affiliation(s)
- Hansel Hu
  - Atlas Advisors, Australia Pty Ltd., Sydney, NSW 2000, Australia
- Tin Lai
  - School of Computer Science, Faculty of Engineering, The University of Sydney, Camperdown, NSW 2006, Australia
- Farnaz Farid
  - Cybersecurity and Behavioural Science, School of Social Sciences, Western Sydney University, Penrith, NSW 2751, Australia
8
Liu X, Zhang W, Zhang Q, Chen L, Zeng T, Zhang J, Min J, Tian S, Zhang H, Huang H, Wang P, Hu X, Chen L. Development and validation of a machine learning-augmented algorithm for diabetes screening in community and primary care settings: A population-based study. Front Endocrinol (Lausanne) 2022; 13:1043919. [PMID: 36518245 PMCID: PMC9742532 DOI: 10.3389/fendo.2022.1043919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Accepted: 11/11/2022] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND Timely screening for diabetes is crucial to reduce its related morbidity, mortality, and socioeconomic burden. Machine learning (ML) has excellent capability to maximize predictive accuracy. We aimed to develop ML-augmented models for diabetes screening in community and primary care settings. METHODS A total of 8425 participants from a population-based study conducted in Hubei, China, since 2011 were included. The dataset was split into a development set and a testing set. Seven different ML algorithms were compared to generate predictive models. Non-laboratory features were employed in the ML model for community settings, and laboratory test features were further introduced in the ML+lab models for primary care. The area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (auPR), and the average detection costs per participant of these models were compared with their counterparts based on the New China Diabetes Risk Score (NCDRS) currently recommended for diabetes screening. RESULTS The AUC and auPR of the ML model were 0.697 and 0.303 in the testing set, seemingly outperforming those of NCDRS by 10.99% and 64.67%, respectively. The average detection cost of the ML model was 12.81% lower than that of NCDRS with the same sensitivity (0.72). Moreover, the average detection cost of the ML+FPG model was the lowest among the ML+lab models and lower than that of the ML model and the NCDRS+FPG model. CONCLUSION The ML model and the ML+FPG model achieved higher predictive accuracy and lower detection costs than their counterparts based on NCDRS. Thus, the ML-augmented algorithm has the potential to be employed for diabetes screening in community and primary care settings.
Affiliation(s)
- XiaoHuan Liu
  - Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Hubei provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- Weiyue Zhang
  - Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Hubei provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- Qiao Zhang
  - Department of Cardiovascular Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Long Chen
  - Department of Computer Science and Technology, Tsinghua University, Beijing, China
- TianShu Zeng
  - Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Hubei provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- JiaoYue Zhang
  - Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Hubei provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- Jie Min
  - Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Hubei provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- ShengHua Tian
  - Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Hubei provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- Hao Zhang
  - Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Hubei provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- Ping Wang
  - Precision Health Program, Department of Radiology, College of Human Medicine, Michigan State University, East Lansing, MI, United States
- Xiang Hu
  - Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Hubei provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
  - *Correspondence: LuLu Chen; Xiang Hu
- LuLu Chen
  - Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
  - Hubei provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
  - *Correspondence: LuLu Chen; Xiang Hu
9
Fregoso-Aparicio L, Noguez J, Montesinos L, García-García JA. Machine learning and deep learning predictive models for type 2 diabetes: a systematic review. Diabetol Metab Syndr 2021; 13:148. [PMID: 34930452 PMCID: PMC8686642 DOI: 10.1186/s13098-021-00767-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 07/06/2021] [Accepted: 12/07/2021] [Indexed: 12/12/2022] Open
Abstract
Diabetes mellitus is a severe, chronic disease that occurs when blood glucose levels rise above certain limits. In recent years, machine learning and deep learning techniques have been used to predict diabetes and its complications. However, researchers and developers still face two main challenges when building type 2 diabetes predictive models. First, there is considerable heterogeneity in previous studies regarding the techniques used, making it challenging to identify the optimal one. Second, there is a lack of transparency about the features used in the models, which reduces their interpretability. This systematic review aimed to address these challenges. The review primarily followed the PRISMA methodology, enriched with the one proposed by Keele and Durham Universities. Ninety studies were included, and the type of model, complementary techniques, dataset, and performance parameters reported were extracted. Eighteen different types of models were compared, with tree-based algorithms showing top performance. Deep neural networks proved suboptimal, despite their ability to deal with big and dirty data. Data balancing and feature selection techniques proved helpful in increasing model efficiency. Models trained on tidy datasets achieved almost perfect performance.
Affiliation(s)
- Luis Fregoso-Aparicio
  - School of Engineering and Sciences, Tecnologico de Monterrey, Av Lago de Guadalupe KM 3.5, Margarita Maza de Juarez, 52926 Cd Lopez Mateos, Mexico
- Julieta Noguez
  - School of Engineering and Sciences, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, 64849 Monterrey, Nuevo Leon, Mexico
- Luis Montesinos
  - School of Engineering and Sciences, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, 64849 Monterrey, Nuevo Leon, Mexico
- José A. García-García
  - Hospital General de Mexico Dr. Eduardo Liceaga, Dr. Balmis 148, Doctores, Cuauhtemoc, 06720 Mexico City, Mexico