1
Khalilnejad A, Sun RT, Kompala T, Painter S, James R, Wang Y. Proactive Identification of Patients with Diabetes at Risk of Uncontrolled Outcomes during a Diabetes Management Program: Conceptualization and Development Study Using Machine Learning. JMIR Form Res 2024; 8:e54373. [PMID: 38669074 PMCID: PMC11087850 DOI: 10.2196/54373] [Received: 11/07/2023] [Revised: 01/12/2024] [Accepted: 01/20/2024] [Indexed: 04/28/2024]
Abstract
BACKGROUND The growth in the capabilities of telehealth has made it possible to identify individuals with a higher risk of uncontrolled diabetes and provide them with targeted support and resources to help them manage their condition. Thus, predictive modeling has emerged as a valuable tool for the advancement of diabetes management. OBJECTIVE This study aimed to conceptualize and develop a novel machine learning (ML) approach to proactively identify participants enrolled in a remote diabetes monitoring program (RDMP) who were at risk of uncontrolled diabetes at 12 months in the program. METHODS Registry data from the Livongo for Diabetes RDMP were used to design separate dynamic predictive ML models to predict participant outcomes at each monthly checkpoint of the participants' program journey (month-n models), from the first day of onboarding (month-0 model) up to the 11th month (month-11 model). A participant's program journey began upon onboarding into the RDMP and monitoring their own blood glucose (BG) levels through the RDMP-provided BG meter. Each participant passed through 12 predictive models during their first year enrolled in the RDMP. Four categories of participant attributes (ie, survey data, BG data, medication fills, and health signals) were used for feature construction. The models were trained using the light gradient boosting machine and underwent hyperparameter tuning. The performance of the models was evaluated using standard metrics, including precision, recall, specificity, the area under the curve, the F1-score, and accuracy. RESULTS The ML models exhibited strong performance, accurately identifying observable at-risk participants, with recall ranging from 70% to 94% and precision from 40% to 88% across the 12-month program journey. Performance for unobservable at-risk participants was also promising, with recall ranging from 61% to 82% and precision from 42% to 61%.
Overall, model performance improved as participants progressed through their program journey, demonstrating the importance of engagement data in predicting long-term clinical outcomes. CONCLUSIONS This study explored the temporal and static attributes of Livongo for Diabetes RDMP participants, identified diabetes management patterns and characteristics, and examined their relationship to diabetes management outcomes. Proactive targeting ML models accurately identified participants at risk of uncontrolled diabetes with a high level of precision that was generalizable to future years within the RDMP. The ability to identify participants who are at risk at various time points throughout the program journey allows for personalized interventions to improve outcomes. This approach offers significant advancements in the feasibility of large-scale implementation in remote monitoring programs and can help prevent uncontrolled glycemic levels and diabetes-related complications. Future research should examine the impact of significant changes that can affect a participant's diabetes management.
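Precision, recall, specificity, F1-score, and accuracy are all simple ratios of confusion-matrix counts. A minimal sketch of those standard definitions follows; it is not the study's code, and the counts in the usage line are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts for a binary 'at risk' vs 'not at risk' classifier."""
    tp: int  # at-risk participants correctly flagged
    fp: int  # not-at-risk participants wrongly flagged
    tn: int  # not-at-risk participants correctly cleared
    fn: int  # at-risk participants missed


def _ratio(num: float, den: float) -> float:
    # Return 0.0 instead of raising when a metric is undefined (zero denominator).
    return num / den if den else 0.0


def classification_metrics(c: ConfusionCounts) -> dict:
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)  # also called sensitivity
    specificity = _ratio(c.tn, c.tn + c.fp)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    f1 = _ratio(2 * precision * recall, precision + recall)  # harmonic mean
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    m = classification_metrics(ConfusionCounts(tp=40, fp=10, tn=45, fn=5))
    print({name: round(value, 3) for name, value in m.items()})
```

With these counts, precision is 0.8, recall about 0.889, and F1 about 0.842. Because F1 is a harmonic mean, it is pulled toward the weaker of precision and recall, which matters when precision ranges as widely as the 40% to 88% reported above.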
2
Reza MS, Amin R, Yasmin R, Kulsum W, Ruhi S. Improving diabetes disease patients classification using stacking ensemble method with PIMA and local healthcare data. Heliyon 2024; 10:e24536. [PMID: 38312584 PMCID: PMC10834804 DOI: 10.1016/j.heliyon.2024.e24536] [Received: 05/05/2023] [Revised: 01/06/2024] [Accepted: 01/10/2024] [Indexed: 02/06/2024]
Abstract
Diabetes mellitus, a chronic metabolic disorder, continues to be a major public health issue around the world. It is estimated that one in every two people with diabetes is undiagnosed. Early diagnosis and management of diabetes can prevent or delay the onset of complications. Our study's goal is to detect the disease early using a variety of machine learning and deep learning models, stacking algorithms, and other techniques. In this study, we propose two stacking-based models for diabetes classification using a combination of the PIMA Indian diabetes dataset, simulated data, and additional data collected from a local healthcare facility. We use both classical and deep neural network stacking ensemble methods to combine the predictions of multiple classification models and improve classification accuracy and robustness. In the evaluation protocol, we used both train-test split and cross-validation (CV) techniques to validate our proposed model. The highest accuracy was obtained by a stacking ensemble with three NN architectures, which achieved an accuracy of 95.50 %, precision of 94 %, recall of 97 %, and F1-score of 96 % using 5-fold CV in the simulation study. On the Pima Indian Diabetes dataset, the stacked model built from ML algorithms achieved an accuracy of 75.03 % under the train-test split protocol and 77.10 % under the CV protocol. The range of performance scores that outperformed the CV protocol was 2.23 %-12 %. Our proposed method achieves accuracy ranging from 92 % to 95 %, and precision, recall, and F1-scores ranging from 88 % to 96 %, using classical and deep neural network (NN)-based stacking methods on the primary dataset. The proposed dataset and ensemble method could be useful in the early detection and treatment of diabetes, as well as in the advancement of machine learning and data analysis techniques in the healthcare industry.
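The abstract does not detail how the base models' predictions were combined. As a hedged sketch of the generic stacking technique (not the authors' pipeline), the meta-learner's inputs are usually built from out-of-fold predictions produced with k-fold CV, so that no row is scored by a model that was fitted on it. The `Learner` interface and the two toy base learners below are hypothetical stand-ins for real classifiers; a final classifier would then be fitted on the returned meta-features against `y`.

```python
import random
from typing import Callable, List, Sequence, Tuple

Rows = List[List[float]]
# Hypothetical base-learner interface: fit on (X_train, y_train) and return
# one score per row of X_test.
Learner = Callable[[Rows, List[int], Rows], List[float]]


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffled k-fold split; fold sizes differ by at most one."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds, start = [], 0
    for f in range(k):
        size = n // k + (1 if f < n % k else 0)
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        folds.append((train, test))
        start += size
    return folds


def out_of_fold_meta_features(X: Rows, y: List[int], learners: Sequence[Learner],
                              k: int = 5, seed: int = 0) -> Rows:
    """Level-1 inputs for stacking: row i is scored only by base learners
    fitted on folds that did not contain row i."""
    meta = [[0.0] * len(learners) for _ in X]
    for train, test in k_fold_indices(len(X), k, seed):
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        X_te = [X[i] for i in test]
        for j, learner in enumerate(learners):
            for i, score in zip(test, learner(X_tr, y_tr, X_te)):
                meta[i][j] = score
    return meta


def prior_rate(X_tr: Rows, y_tr: List[int], X_te: Rows) -> List[float]:
    # Toy learner: predicts the training-fold positive rate for every row.
    rate = sum(y_tr) / len(y_tr)
    return [rate] * len(X_te)


def nearest_class_mean(X_tr: Rows, y_tr: List[int], X_te: Rows) -> List[float]:
    # Toy learner on feature 0; assumes both classes occur in every training fold.
    pos = [x[0] for x, t in zip(X_tr, y_tr) if t == 1]
    neg = [x[0] for x, t in zip(X_tr, y_tr) if t == 0]
    mu_pos, mu_neg = sum(pos) / len(pos), sum(neg) / len(neg)
    return [1.0 if abs(x[0] - mu_pos) < abs(x[0] - mu_neg) else 0.0 for x in X_te]


if __name__ == "__main__":
    X = [[float(i)] for i in range(20)]
    y = [0] * 10 + [1] * 10
    meta = out_of_fold_meta_features(X, y, [prior_rate, nearest_class_mean], k=5)
    print(meta[:3])
```

Using out-of-fold rather than in-sample predictions keeps the meta-learner from learning to trust base models that have simply memorised their training rows.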
Affiliation(s)
- Md Shamim Reza
- Department of Statistics, Pabna University of Science and Technology, Pabna, 6600, Bangladesh
- Ruhul Amin
- Department of Statistics, Pabna University of Science and Technology, Pabna, 6600, Bangladesh
- Rubia Yasmin
- Department of Statistics, Pabna University of Science and Technology, Pabna, 6600, Bangladesh
- Woomme Kulsum
- Department of Statistics, Pabna University of Science and Technology, Pabna, 6600, Bangladesh
- Sabba Ruhi
- Department of Statistics, Pabna University of Science and Technology, Pabna, 6600, Bangladesh
3
Shojaee-Mend H, Velayati F, Tayefi B, Babaee E. Prediction of Diabetes Using Data Mining and Machine Learning Algorithms: A Cross-Sectional Study. Healthc Inform Res 2024; 30:73-82. [PMID: 38359851 PMCID: PMC10879823 DOI: 10.4258/hir.2024.30.1.73] [Received: 09/17/2023] [Revised: 01/24/2024] [Accepted: 01/24/2024] [Indexed: 02/17/2024]
Abstract
OBJECTIVES This study aimed to develop a model to predict fasting blood glucose status using machine learning and data mining, since the early diagnosis and treatment of diabetes can improve outcomes and quality of life. METHODS This cross-sectional study analyzed data from 3376 adults over 30 years old at 16 comprehensive health service centers in Tehran, Iran, who participated in a diabetes screening program. The dataset was balanced using random sampling and the synthetic minority over-sampling technique (SMOTE). The dataset was split into a training set (80%) and a test set (20%). Shapley values were calculated to select the most important features. Noise analysis was performed by adding Gaussian noise to the numerical features to evaluate the robustness of feature importance. Five different machine learning algorithms, including CatBoost, random forest, XGBoost, logistic regression, and an artificial neural network, were used to model the dataset. Accuracy, sensitivity, specificity, the F1-score, and the area under the curve were used to evaluate the models. RESULTS Age, waist-to-hip ratio, body mass index, and systolic blood pressure were the most important factors for predicting fasting blood glucose status. Though the models achieved similar predictive ability, the CatBoost model performed slightly better overall, with an area under the curve (AUC) of 0.737. CONCLUSIONS A gradient boosted decision tree model accurately identified the most important risk factors related to diabetes. Age, waist-to-hip ratio, body mass index, and systolic blood pressure were the most important risk factors for diabetes, in that order. This model can support planning for diabetes management and prevention.
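The abstract names SMOTE but not its settings. The core SMOTE step creates a synthetic minority sample at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. The sketch below shows that step in plain Python; `k=5`, Euclidean distance, and drawing the base sample at random (rather than cycling through every minority sample, as in the original SMOTE description) are assumptions, and `smote_samples` is a hypothetical helper, not a library function.

```python
import math
import random
from typing import List, Sequence


def smote_samples(minority: Sequence[Sequence[float]], n_new: int,
                  k: int = 5, seed: int = 0) -> List[List[float]]:
    """Return n_new synthetic minority rows via SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest minority neighbours of x (Euclidean), excluding x itself
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: math.dist(x, minority[j]))[:k]
        z = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1)
        # x + gap * (z - x): a point on the segment from x towards z
        synthetic.append([xi + gap * (zi - xi) for xi, zi in zip(x, z)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for row in smote_samples(minority, n_new=3, k=2):
        print([round(v, 3) for v in row])
```

Because every synthetic row is a convex combination of two real minority rows, it stays inside the region the minority class already occupies. In general, oversampling is applied to the training portion only, so that synthetic rows derived from test rows cannot leak into evaluation.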
Affiliation(s)
- Hassan Shojaee-Mend
- Infectious Diseases Research Center, Gonabad University of Medical Sciences, Gonabad, Iran
- Farnia Velayati
- Telemedicine Research Center, National Research Institute of Tuberculosis and Lung Diseases (NRITLD), Shahid Beheshti University of Medical Sciences, Tehran, Iran
- Batool Tayefi
- Preventive Medicine and Public Health Research Center, Psychosocial Health Research Institute, Department of Community and Family Medicine, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
- Ebrahim Babaee
- Preventive Medicine and Public Health Research Center, Psychosocial Health Research Institute, Department of Community and Family Medicine, School of Medicine, Iran University of Medical Sciences, Tehran, Iran
- Vaccine Research Center, Iran University of Medical Sciences, Tehran, Iran
4
Ojurongbe TA, Afolabi HA, Oyekale A, Bashiru KA, Ayelagbe O, Ojurongbe O, Abbasi SA, Adegoke NA. Predictive model for early detection of type 2 diabetes using patients' clinical symptoms, demographic features, and knowledge of diabetes. Health Sci Rep 2024; 7:e1834. [PMID: 38274131 PMCID: PMC10808992 DOI: 10.1002/hsr2.1834] [Received: 06/03/2023] [Revised: 12/07/2023] [Accepted: 01/05/2024] [Indexed: 01/27/2024]
Abstract
Background and Aims With the global rise in type 2 diabetes, predictive modeling has become crucial for early detection, particularly in populations with low rates of routine medical check-ups. This study aimed to develop a predictive model for type 2 diabetes using health check-up data, focusing on clinical details, demographic features, biochemical markers, and diabetes knowledge. Methods Data from 444 Nigerian patients were collected and analysed. We used 80% of this data set for training and the remaining 20% for testing. Multivariable penalized logistic regression was employed to predict disease onset, incorporating waist-hip ratio (WHR), triglycerides (TG), catalase, and the atherogenic index of plasma (AIP). Results The predictive model demonstrated high accuracy, with an area under the curve of 99% (95% CI = 97%-100%) for the training set and 94% (95% CI = 89%-99%) for the test set. Notably, higher WHR (adjusted odds ratio [AOR] = 70.35; 95% CI = 10.04-493.1, p-value < 0.001) and elevated AIP (AOR = 4.55; 95% CI = 1.48-13.95, p-value = 0.008) were significantly associated with a higher risk of type 2 diabetes, while higher catalase levels (AOR = 0.33; 95% CI = 0.22-0.49, p < 0.001) correlated with a decreased risk. In contrast, TG levels (AOR = 1.04; 95% CI = 0.40-2.71, p-value = 0.94) were not associated with the disease. Conclusion This study emphasizes the importance of using distinct clinical and biochemical markers for early type 2 diabetes detection in Nigeria, reflecting global trends in diabetes modeling and highlighting the need for context-specific methods. The development of a web application based on these results aims to facilitate the early identification of individuals at risk, potentially reducing health complications and improving diabetes management strategies in diverse settings.
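In logistic regression, an adjusted odds ratio is the exponentiated coefficient, AOR = exp(β), and a Wald 95% CI is exp(β ± 1.96·SE). The sketch below applies that formula in both directions. Whether the penalized model in this study reported Wald intervals is an assumption, so back-calculating β and SE from a published interval is illustrative only; the helper names are hypothetical.

```python
import math
from typing import Tuple

Z_95 = 1.959964  # two-sided 95% standard-normal quantile


def odds_ratio_with_ci(beta: float, se: float,
                       z: float = Z_95) -> Tuple[float, float, float]:
    """Odds ratio and Wald CI from a coefficient on the log-odds scale."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def beta_se_from_ci(lower: float, upper: float,
                    z: float = Z_95) -> Tuple[float, float]:
    """Invert a Wald CI: beta is the log-scale midpoint, SE is half-width / z."""
    log_lo, log_hi = math.log(lower), math.log(upper)
    return (log_lo + log_hi) / 2, (log_hi - log_lo) / (2 * z)


if __name__ == "__main__":
    beta, se = beta_se_from_ci(10.04, 493.1)  # the WHR interval reported above
    aor, lo, hi = odds_ratio_with_ci(beta, se)
    print(round(beta, 2), round(se, 2), round(aor, 2))
```

Feeding in the reported WHR interval gives β of about 4.25 and SE of about 0.99, and the implied point estimate exp(β) is about 70.36, close to the reported AOR of 70.35. The very wide interval on the odds-ratio scale comes from an SE near 1 on the log-odds scale.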
Affiliation(s)
- Adesola Oyekale
- Department of Chemical Pathology, Ladoke Akintola University of Technology, Ogbomoso, Nigeria
- Olubunmi Ayelagbe
- Department of Chemical Pathology, Ladoke Akintola University of Technology, Ogbomoso, Nigeria
- Olusola Ojurongbe
- Humboldt Research Hub-Center for Emerging and Re-emerging Infectious Diseases, Ladoke Akintola University of Technology, Ogbomoso, Nigeria
- Department of Medical Microbiology and Parasitology, Ladoke Akintola University of Technology, Ogbomoso, Nigeria
- Saddam Akber Abbasi
- Statistics Program, Department of Mathematics, Statistics, and Physics, College of Arts and Sciences, Qatar University, Doha, Qatar
- Statistical Consulting Unit, College of Arts and Sciences, Qatar University, Doha, Qatar
5
Das A, Dhillon P. Application of machine learning in measurement of ageing and geriatric diseases: a systematic review. BMC Geriatr 2023; 23:841. [PMID: 38087195 PMCID: PMC10717316 DOI: 10.1186/s12877-023-04477-x] [Received: 05/09/2023] [Accepted: 11/10/2023] [Indexed: 12/18/2023]
Abstract
BACKGROUND As the ageing population continues to grow in many countries, the prevalence of geriatric diseases is on the rise. In response, healthcare providers are exploring novel methods to enhance the quality of life for the elderly. Over the last decade, there has been a remarkable surge in the use of machine learning in geriatric diseases and care. Machine learning has emerged as a promising tool for the diagnosis, treatment, and management of these conditions. Hence, our study aims to determine the present state of research in geriatrics and the application of machine learning methods in this area. METHODS This systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and focused on healthy ageing in individuals aged 45 and above, with a specific emphasis on the diseases that commonly occur during this process. The study mainly focused on three areas, namely machine learning, the geriatric population, and diseases. Peer-reviewed articles were searched in the PubMed and Scopus databases with the following inclusion criteria: a study population above 45 years of age, use of machine learning methods, and availability of the full text. To assess the quality of the studies, the Joanna Briggs Institute (JBI) critical appraisal tool was used. RESULTS A total of 70 papers were selected from the 120 identified papers after title screening, abstract screening, and reference searching. Limited research is available on predicting biological or brain age using deep learning and different supervised machine learning methods. Neurodegenerative disorders were found to be the most researched diseases, with Alzheimer's disease receiving the most attention. Among non-communicable diseases, diabetes mellitus, hypertension, cancer, kidney diseases, and cardiovascular diseases were included, and other rare diseases like oral health-related diseases and bone diseases were also explored in some papers.
In terms of the application of machine learning, risk prediction was the most common approach. Half of the studies used supervised machine learning algorithms, among which logistic regression, random forest, and XGBoost were frequently used. These machine learning methods were applied to a variety of datasets, including population-based surveys, hospital records, and digitally traced data. CONCLUSION The review identified a wide range of studies that employed machine learning algorithms to analyse various diseases and datasets. While the application of machine learning in geriatrics and care has been well explored, there is still room for future development, particularly in validating models across diverse populations and utilizing personalized digital datasets for customized patient-centric care in older populations. Further, we suggest a role for machine learning in generating comparable ageing indices, such as a successful ageing index.
Affiliation(s)
- Ayushi Das
- International Institute for Population Sciences, Deonar, Mumbai, 400088, India
- Preeti Dhillon
- Department of Survey Research and Data Analytics, International Institute for Population Sciences, Deonar, Mumbai, 400088, India
6
Li J, Li Y, Wang C, Mao Z, Yang T, Li Y, Xing W, Li Z, Zhao J, Li L. Dietary Potassium and Magnesium Intake with Risk of Type 2 Diabetes Mellitus Among Rural China: the Henan Rural Cohort Study. Biol Trace Elem Res 2023:10.1007/s12011-023-03993-6. [PMID: 38049705 DOI: 10.1007/s12011-023-03993-6] [Received: 11/01/2023] [Accepted: 11/29/2023] [Indexed: 12/06/2023]
Abstract
Previous studies exploring the relationship between dietary potassium and magnesium intake and the risk of type 2 diabetes mellitus (T2DM) have yielded inconsistent results, and evidence from rural China is lacking. Therefore, we aimed to investigate the association between dietary potassium and magnesium intake and the risk of T2DM in rural China. Data were collected from the Henan Rural Cohort Study in 2017. A validated semi-quantitative food frequency questionnaire assessed dietary potassium and magnesium intake. Logistic regression models were used to calculate odds ratios (ORs) and 95% confidence intervals (CIs) to evaluate the effect of dietary potassium, magnesium, and the potassium-magnesium ratio on the risk of T2DM. A total of 38384 individuals were included in the study, and 3616 participants developed T2DM. Logistic regression analysis revealed that, compared with subjects in the lowest quartile of intake, the ORs (95% CIs) for the highest quartile of dietary potassium, magnesium, and potassium-magnesium ratio were 0.67 (0.59, 0.75), 0.76 (0.67, 0.88), and 0.57 (0.50, 0.66), respectively. In addition, gender partially modified the relationship between dietary magnesium and T2DM prevalence (P-interaction = 0.042). The group with the highest intake of both dietary potassium and dietary magnesium had the lowest risk of T2DM, with an OR (95% CI) of 0.63 (0.51-0.77). Dietary potassium and magnesium intake are important modifiable risk factors for T2DM in rural China. Dietary potassium intake > 1.8 g/day, dietary magnesium intake > 358.6 mg/day and < 414.7 mg/day, and a potassium-magnesium ratio > 5.1 should be encouraged to better prevent and manage T2DM.
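The ORs above contrast the highest with the lowest intake quartile. The abstract does not give the quartile cut points, so as a hypothetical illustration, the sketch below assigns sample-based quartiles with Python's `statistics.quantiles`; the interpolation method (the library default, `'exclusive'`) and the intake values are assumptions.

```python
import bisect
import statistics
from typing import List, Sequence


def quartile_labels(values: Sequence[float]) -> List[int]:
    """Label each value 1-4 using the sample's own quartile cut points.

    Values equal to a cut point fall into the lower quartile.
    """
    cuts = statistics.quantiles(values, n=4)  # three cut points: Q1, median, Q3
    return [bisect.bisect_left(cuts, v) + 1 for v in values]


if __name__ == "__main__":
    intakes = [1.2, 1.5, 1.6, 1.8, 2.0, 2.3, 2.7, 3.1]  # made-up intakes, g/day
    print(quartile_labels(intakes))
```

Here it prints `[1, 1, 2, 2, 3, 3, 4, 4]`. A highest-versus-lowest comparison would then set participants labelled 4 against those labelled 1 inside the adjusted logistic model.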
Affiliation(s)
- Jia Li
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, Henan, 450001, People's Republic of China
- Yuqian Li
- Department of Clinical Pharmacology, School of Pharmaceutical Science, Zhengzhou University, Zhengzhou, Henan, China
- Chongjian Wang
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, Henan, 450001, People's Republic of China
- Zhenxing Mao
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, Henan, 450001, People's Republic of China
- Tianyu Yang
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, Henan, 450001, People's Republic of China
- Yan Li
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, Henan, 450001, People's Republic of China
- Wenguo Xing
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, Henan, 450001, People's Republic of China
- Zhuoyang Li
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, Henan, 450001, People's Republic of China
- Jiaoyan Zhao
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, Henan, 450001, People's Republic of China
- Linlin Li
- Department of Epidemiology and Health Statistics, College of Public Health, Zhengzhou University, 100 Kexue Avenue, Zhengzhou, Henan, 450001, People's Republic of China
7
Ou Q, Jin W, Lin L, Lin D, Chen K, Quan H. LASSO-based machine learning algorithm to predict the incidence of diabetes in different stages. Aging Male 2023; 26:2205510. [PMID: 37156752 DOI: 10.1080/13685538.2023.2205510] [Indexed: 05/10/2023]
Abstract
BACKGROUND Formal risk assessment is crucial for diabetes prevention. We aimed to establish a practical nomogram for predicting the risk of incident prediabetes and of conversion from prediabetes to diabetes. METHODS A cohort of 1428 subjects was collected to develop prediction models. LASSO was used to screen for important risk factors for prediabetes and diabetes and was compared with other algorithms (LR, RF, SVM, LDA, NB, and Treebag). Multivariate logistic regression analysis was used to construct the prediction models for prediabetes and diabetes, and the predictive nomograms were drawn. The performance of the nomograms was evaluated by receiver operating characteristic curves and calibration. RESULTS The other six algorithms did not perform as well as LASSO in diabetes risk prediction. The nomogram for individualized prediction of prediabetes included "Age," "FH," "Insulin_F," "hypertension," "Tgab," "HDL-C," "Proinsulin_F," and "TG," and the nomogram for progression from prediabetes to diabetes included "Age," "FH," "Proinsulin_E," and "HDL-C." The two models showed some discriminative ability, with AUCs of 0.78 and 0.70, respectively. The calibration curves of the two models also indicated good consistency. CONCLUSIONS We established early warning models for prediabetes and diabetes, which can help identify populations at high risk of prediabetes and diabetes in advance.
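The abstract does not state which LASSO formulation was used for screening (a penalized logistic model is common for binary outcomes). For simplicity, the sketch below uses the squared-error LASSO objective, (1/2n)·||y − Xb||² + λ·||b||₁, solved by cyclic coordinate descent. Its soft-thresholding step sets weak predictors' coefficients exactly to zero, which is what makes LASSO usable for screening risk factors. The function names and toy data are hypothetical.

```python
from typing import List


def soft_threshold(a: float, lam: float) -> float:
    """S(a, lam) = sign(a) * max(|a| - lam, 0)."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def lasso_coordinate_descent(X: List[List[float]], y: List[float], lam: float,
                             n_sweeps: int = 200) -> List[float]:
    """Minimise (1/2n)*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and y are centred, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # y - X @ beta, with beta starting at zero
    for _ in range(n_sweeps):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(c * c for c in col) / n
            if z == 0.0:
                continue  # all-zero column: coefficient stays at zero
            # Correlation of feature j with the residual, adding back its own contribution
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new_bj = soft_threshold(rho, lam) / z
            delta = new_bj - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[j] = new_bj
    return beta


if __name__ == "__main__":
    X = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
    y = [3.0, -3.0, 0.5, -0.5]
    print(lasso_coordinate_descent(X, y, lam=0.5))
```

On that toy design, least squares would give coefficients (3.0, 0.5); with `lam=0.5` the LASSO shrinks the first to 2.0 and sets the second to exactly 0, dropping it from the model.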
Affiliation(s)
- Qianying Ou
- Department of Endocrinology, Hainan General Hospital, Hainan Affiliated Hospital of Hainan Medical University, Haikou, China
- Wei Jin
- Department of Endocrinology, Hainan General Hospital, Hainan Affiliated Hospital of Hainan Medical University, Haikou, China
- Leweihua Lin
- Department of Endocrinology, Hainan General Hospital, Hainan Affiliated Hospital of Hainan Medical University, Haikou, China
- Danhong Lin
- Department of Endocrinology, Hainan General Hospital, Hainan Affiliated Hospital of Hainan Medical University, Haikou, China
- Kaining Chen
- Department of Endocrinology, Hainan General Hospital, Hainan Affiliated Hospital of Hainan Medical University, Haikou, China
- Huibiao Quan
- Department of Endocrinology, Hainan General Hospital, Hainan Affiliated Hospital of Hainan Medical University, Haikou, China
8
Hendawi R, Li J, Roy S. A Mobile App That Addresses Interpretability Challenges in Machine Learning-Based Diabetes Predictions: Survey-Based User Study. JMIR Form Res 2023; 7:e50328. [PMID: 37955948 PMCID: PMC10682931 DOI: 10.2196/50328] [Received: 06/27/2023] [Revised: 09/12/2023] [Accepted: 10/08/2023] [Indexed: 11/14/2023]
Abstract
BACKGROUND Machine learning approaches, including deep learning, have demonstrated remarkable effectiveness in the diagnosis and prediction of diabetes. However, these approaches often operate as opaque black boxes, leaving health care providers in the dark about the reasoning behind predictions. This opacity poses a barrier to the widespread adoption of machine learning in diabetes and health care, leading to confusion and eroding trust. OBJECTIVE This study aimed to address this critical issue by developing and evaluating an explainable artificial intelligence (AI) platform, XAI4Diabetes, designed to empower health care professionals with a clear understanding of AI-generated predictions and recommendations for diabetes care. XAI4Diabetes not only delivers diabetes risk predictions but also furnishes easily interpretable explanations for complex machine learning models and their outcomes. METHODS XAI4Diabetes features a versatile multimodule explanation framework that leverages machine learning, knowledge graphs, and ontologies. The platform comprises the following four essential modules: (1) knowledge base, (2) knowledge matching, (3) prediction, and (4) interpretation. By harnessing AI techniques, XAI4Diabetes forecasts diabetes risk and provides valuable insights into the prediction process and outcomes. A structured, survey-based user study assessed the app's usability and influence on participants' comprehension of machine learning predictions in real-world patient scenarios. RESULTS A prototype mobile app was meticulously developed and subjected to thorough usability studies and satisfaction surveys. The evaluation study findings underscore the substantial improvement in medical professionals' comprehension of key aspects, including the (1) diabetes prediction process, (2) data sets used for model training, (3) data features used, and (4) relative significance of different features in prediction outcomes. 
Most participants reported heightened understanding of and trust in AI predictions following their use of XAI4Diabetes. The satisfaction survey results further revealed a high level of overall user satisfaction with the tool. CONCLUSIONS This study introduces XAI4Diabetes, a versatile multi-model explainable prediction platform tailored to diabetes care. By enabling transparent diabetes risk predictions and delivering interpretable insights, XAI4Diabetes empowers health care professionals to comprehend the AI-driven decision-making process, thereby fostering transparency and trust. These advancements hold the potential to mitigate biases and facilitate the broader integration of AI in diabetes care.
Affiliation(s)
- Rasha Hendawi
- North Dakota State University, Fargo, ND, United States
- Juan Li
- North Dakota State University, Fargo, ND, United States
- Souradip Roy
- North Dakota State University, Fargo, ND, United States
9
Murtha JA, Birstler J, Stalter L, Jawara D, Hanlon BM, Hanrahan LP, Churpek MM, Funk LM. Identifying Young Adults at High Risk for Weight Gain Using Machine Learning. J Surg Res 2023; 291:7-16. [PMID: 37329635 PMCID: PMC10524852 DOI: 10.1016/j.jss.2023.05.015] [Received: 02/03/2023] [Revised: 04/25/2023] [Accepted: 05/16/2023] [Indexed: 06/19/2023]
Abstract
INTRODUCTION Weight gain among young adults continues to increase. Identifying adults at high risk for weight gain and intervening before they gain weight could have a major public health impact. Our objective was to develop and test electronic health record-based machine learning models to predict weight gain in young adults with overweight/class 1 obesity. METHODS Seven machine learning models were assessed: three regression models, random forest, single-layer neural network, gradient-boosted decision trees, and support vector machine (SVM) models. Four categories of predictors were included: 1) demographics; 2) obesity-related health conditions; 3) laboratory data and vital signs; and 4) neighborhood-level variables. The cohort was split 60:40 for model training and validation. Areas under the receiver operating characteristic curve (AUCs) were calculated to determine model accuracy at predicting high-risk individuals, defined by ≥10% total body weight gain within 2 y. Variable importance was measured via generalized analysis of variance procedures. RESULTS Of the 24,183 patients (mean [SD] age, 32.0 [6.3] y; 55.1% females) in the study, 14.2% gained ≥10% total body weight. AUCs varied from 0.557 (SVM) to 0.675 (gradient-boosted decision trees). Age, sex, and baseline body mass index were the most important predictors in all models except the SVM and neural network. CONCLUSIONS Our machine learning models performed similarly and had modest accuracy for identifying young adults at risk of weight gain. Future models may need to incorporate behavioral and/or genetic information to enhance model accuracy.
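The AUC used to compare these models has a direct interpretation: it is the probability that a randomly chosen high-risk patient receives a higher score than a randomly chosen low-risk patient, so 0.5 is random ranking and values such as 0.557 to 0.675 indicate only modest separation. The sketch below computes AUC from that definition (the Mann-Whitney form). It is a generic illustration, not the study's evaluation code, and the scores are made up.

```python
from typing import Sequence


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise loop is O(n_pos * n_neg), which is
    fine for illustration; rank-based formulas scale better on large cohorts.
    """
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(auc_mann_whitney([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

In the usage line, three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75.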
Affiliation(s)
- Jen Birstler
- Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, Wisconsin
- Lily Stalter
- Department of Surgery, University of Wisconsin, Madison, Wisconsin
- Dawda Jawara
- Department of Surgery, University of Wisconsin, Madison, Wisconsin
- Bret M Hanlon
- Department of Surgery, University of Wisconsin, Madison, Wisconsin; Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, Wisconsin
- Lawrence P Hanrahan
- Department of Family Medicine and Community Health, University of Wisconsin School of Medicine and Public Health, Madison, Wisconsin
- Matthew M Churpek
- Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, Wisconsin; Department of Medicine, University of Wisconsin, Madison, Wisconsin
- Luke M Funk
- Department of Surgery, University of Wisconsin, Madison, Wisconsin; Department of Surgery, William S. Middleton Memorial VA, Madison, Wisconsin
10
Ganie SM, Pramanik PKD, Bashir Malik M, Mallik S, Qin H. An ensemble learning approach for diabetes prediction using boosting techniques. Front Genet 2023; 14:1252159. [PMID: 37953921 PMCID: PMC10639159 DOI: 10.3389/fgene.2023.1252159] [Received: 07/03/2023] [Accepted: 10/16/2023] [Indexed: 11/14/2023]
Abstract
Introduction: Diabetes is considered one of the leading healthcare concerns, affecting millions worldwide. Taking appropriate action at the earliest stages of the disease depends on early diabetes prediction and identification. To support healthcare providers in the diagnosis and prognosis of diseases, machine learning has been explored in the healthcare industry in recent years. Methods: To predict diabetes, this research evaluated five boosting algorithms on the Pima diabetes dataset. The dataset was obtained from the University of California, Irvine (UCI) machine learning repository and contains several important clinical features. Exploratory data analysis was used to identify the characteristics of the dataset. Moreover, upsampling, normalisation, feature selection, and hyperparameter tuning were employed for predictive analytics. Results: The results were analysed using various statistical/machine learning metrics and k-fold cross-validation techniques. Gradient boosting achieved the highest accuracy among all the classifiers, at 92.85%. Precision, recall, F1-score, and receiver operating characteristic (ROC) curves were used to further validate the model. Discussion: The suggested model outperformed existing studies in terms of prediction accuracy, demonstrating its applicability to other diseases with similar predictive indicators.
Affiliation(s)
- Majid Bashir Malik
- Department of Computer Science, Baba Ghulam Shah Badshah University, Rajauri, India
- Saurav Mallik
- Department of Environmental Health, School of Public Health, Harvard University, Boston, MA, United States
- Hong Qin
- College of Engineering and Computer Science, University of Tennessee at Chattanooga, Chattanooga, TN, United States
11
Patro KK, Allam JP, Sanapala U, Marpu CK, Samee NA, Alabdulhafith M, Plawiak P. An effective correlation-based data modeling framework for automatic diabetes prediction using machine and deep learning techniques. BMC Bioinformatics 2023; 24:372. [PMID: 37784049 PMCID: PMC10544445 DOI: 10.1186/s12859-023-05488-6] [Received: 05/22/2023] [Accepted: 09/19/2023] [Indexed: 10/04/2023]
Abstract
The rising risk of diabetes, particularly in emerging countries, highlights the importance of early detection. Manual prediction can be challenging, which creates a need for automatic approaches. The major challenge with biomedical datasets is data scarcity: biomedical data are often difficult to obtain in large quantities, which can limit the ability to train deep learning models effectively. Biomedical data can also be noisy and inconsistent, which makes it difficult to train accurate models. To overcome these challenges, this work presents a new data modeling framework, based on correlation measures between features, that can be used to process data effectively for predicting diabetes. The standard, publicly available Pima Indians Diabetes (PIMA) dataset is used to verify the effectiveness of the proposed techniques. Experiments on the PIMA dataset showed that the proposed data modeling method improved the accuracy of machine learning models by an average of 9%, with deep convolutional neural network models achieving an accuracy of 96.13%. Overall, this study demonstrates the effectiveness of the proposed strategy for the early and reliable prediction of diabetes.
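The abstract does not specify which correlation measures the framework uses or how they are combined. As a minimal sketch of a generic correlation-based filter, assuming Pearson correlation and two hypothetical thresholds, one could keep features that correlate with the outcome and drop features that are nearly redundant with a feature already kept:

```python
import math
from typing import Dict, List, Sequence

def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (z - mean_b) for x, z in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((z - mean_b) ** 2 for z in b))
    if sd_a == 0 or sd_b == 0:
        return 0.0
    return cov / (sd_a * sd_b)

def select_features(X: Sequence[Sequence[float]], y: Sequence[int], names: List[str],
                    min_target_corr: float = 0.1,
                    max_inter_corr: float = 0.9) -> List[str]:
    """Greedy correlation filter (hypothetical thresholds): visit features from the
    strongest to the weakest |r| with the outcome, skipping weak or redundant ones."""
    columns: Dict[str, Sequence[float]] = dict(zip(names, zip(*X)))
    target_corr = {name: abs(pearson(columns[name], y)) for name in names}
    selected: List[str] = []
    for name in sorted(names, key=target_corr.get, reverse=True):
        if target_corr[name] < min_target_corr:
            continue  # too weakly related to the outcome
        if any(abs(pearson(columns[name], columns[s])) >= max_inter_corr for s in selected):
            continue  # nearly duplicates a feature that was already kept
        selected.append(name)
    return selected
```

Pearson correlation between a feature and a 0/1 label is the point-biserial correlation; other measures (Spearman, mutual information) could be swapped in without changing the selection loop.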
Affiliation(s)
- Kiran Kumar Patro
- Department of ECE, Aditya Institute of Technology and Management, Tekkali, AP, 532201, India
- Jaya Prakash Allam
- School of Computer Science and Engineering, VIT Vellore, Katpadi, Vellore, Tamil Nadu, 632014, India.
- Chaitanya Kumar Marpu
- Department of ECE, Aditya Institute of Technology and Management, Tekkali, AP, 532201, India
- Nagwan Abdel Samee
- Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, 11671, Saudi Arabia
- Maali Alabdulhafith
- Department of Information Technology, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh, 11671, Saudi Arabia
- Pawel Plawiak
- Department of Computer Science, Faculty of Computer Science and Telecommunications, Cracow University of Technology, Warszawska 24, 31-155, Krakow, Poland
- Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Bałtycka 5, 44-100, Gliwice, Poland
12
Jiang L, Xia Z, Zhu R, Gong H, Wang J, Li J, Wang L. Diabetes risk prediction model based on community follow-up data using machine learning. Prev Med Rep 2023; 35:102358. [PMID: 37654514 PMCID: PMC10465943 DOI: 10.1016/j.pmedr.2023.102358] [Received: 06/20/2023] [Revised: 07/31/2023] [Accepted: 08/01/2023] [Indexed: 09/02/2023]
Abstract
Diabetes is a chronic metabolic disease characterized by hyperglycemia. Follow-up management of patients with diabetes mostly takes place in the community, but the relationship between the key lifestyle indicators recorded during community follow-up and the risk of diabetes is unclear. To explore this association, 252,176 follow-up records of patients with diabetes from 2016 to 2023 were obtained from Haizhu District, Guangzhou. The key lifestyle indicators affecting diabetes were identified from the follow-up data, and an optimal feature subset was obtained through feature selection to accurately assess diabetes risk. A diabetes risk assessment model based on a random forest classifier was designed using optimal feature parameter selection and comparison of algorithm models; it achieved an accuracy of 91.24% and an area under the ROC curve (AUC) of 97%. To improve the applicability of the model in clinical practice and daily life, a diabetes risk scorecard was designed and tested on the original data, achieving an accuracy of 95.15% and indicating high model reliability. The diabetes risk prediction model, based on mining community follow-up big data, can be used by community doctors for large-scale risk screening and early warning based on patient follow-up data, further promoting diabetes prevention and control strategies. It can also be used in wearable devices or intelligent biosensors for individual patient self-examination, in order to improve lifestyle and reduce risk factor levels.
Affiliation(s)
- Liangjun Jiang
- College of Information and Communication Engineering, State Key Lab of Marine Resource Utilisation in South China Sea, Hainan University, Haikou, China
- Zhenhua Xia
- Electronics & Information School of Yangtze University, Jingzhou, China
- Ronghui Zhu
- Shenzhen Nanshan Medical Group HQ, Shenzhen, China
- Haimei Gong
- College of Information and Communication Engineering, State Key Lab of Marine Resource Utilisation in South China Sea, Hainan University, Haikou, China
- Jing Wang
- E-link Wisdom Co., Ltd, Shenzhen, China
- Juan Li
- Haizhu District Community Health Development Guidance Center, Guangzhou, China
- Lei Wang
- College of Information and Communication Engineering, State Key Lab of Marine Resource Utilisation in South China Sea, Hainan University, Haikou, China
13
Lv K, Cui C, Fan R, Zha X, Wang P, Zhang J, Zhang L, Ke J, Zhao D, Cui Q, Yang L. Detection of diabetic patients in people with normal fasting glucose using machine learning. BMC Med 2023; 21:342. [PMID: 37674168 PMCID: PMC10483877 DOI: 10.1186/s12916-023-03045-9] [Received: 01/20/2023] [Accepted: 08/23/2023] [Indexed: 09/08/2023]
Abstract
BACKGROUND Diabetes mellitus (DM) is a chronic metabolic disease that can cause severe, life-threatening complications. Early detection is therefore important for timely prevention and treatment. Fasting blood glucose (FBG) measured during physical examinations is normally used for large-scale DM screening; however, some people with normal fasting glucose (NFG) actually have diabetes but are missed by this examination. This study aimed to investigate whether common physical examination indexes can be used to identify individuals with diabetes among populations with NFG. METHODS Physical examination data from over 60,000 individuals with NFG in three Chinese cohorts were used. Patients with diabetes were defined by HbA1c ≥ 48 mmol/mol (6.5%). We constructed models using multiple machine learning methods, including logistic regression, random forest, deep neural network, and support vector machine, and selected the optimal one on the validation set. A framework using the permutation feature importance algorithm was devised to discover personalized risk factors. RESULTS The prediction model constructed by logistic regression achieved the best performance, with an AUC, sensitivity, and specificity of 0.899, 85.0%, and 81.1% on the validation set and 0.872, 77.9%, and 81.0% on the test set, respectively. Following feature selection, the final classifier, which requires only 13 features and is named DRING (diabetes risk of individuals with normal fasting glucose), exhibited reliable performance on two newly recruited independent datasets, with AUCs of 0.964 and 0.899, balanced accuracies of 84.2% and 81.1%, sensitivities of 100% and 76.2%, and specificities of 68.3% and 86.0%, respectively. Feature importance ranking revealed that BMI, age, sex, absolute lymphocyte count, and mean corpuscular volume are important factors for diabetes risk stratification.
In a case study, the framework for identifying personalized risk factors revealed FBG, age, and BMI as significant risk factors contributing to an increased incidence of diabetes. The DRING webserver is available for ease of application (http://www.cuilab.cn/dring). CONCLUSIONS DRING was shown to perform well in identifying individuals with diabetes among populations with NFG, which could aid early diagnosis and intervention for those individuals who would most likely be missed.
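DRING's personalized risk factors come from a permutation-importance framework whose individual-level details are not given in the abstract. The sketch below shows only the generic, dataset-level form of permutation feature importance: shuffle one feature column, re-score the fitted model, and record the drop in performance. It assumes a fitted classifier exposed as a plain function and accuracy as the score; the toy data and the `bmi_rule` model are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Model = Callable[[List[float]], int]

def accuracy(model: Model, X: List[List[float]], y: Sequence[int]) -> float:
    """Fraction of rows whose predicted label equals the true label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model: Model, X: List[List[float]], y: Sequence[int],
                           n_repeats: int = 20, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when one feature column is shuffled, breaking its link to y."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild rows with only column j permuted; other features stay intact.
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances

# Hypothetical toy data: [BMI, unrelated measurement], label 1 = diabetes.
X_toy = [[22, 40], [24, 55], [26, 61], [27, 33], [30, 45], [32, 52], [35, 38], [38, 66]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]

def bmi_rule(row: List[float]) -> int:
    """Hypothetical fitted model that only looks at the first feature."""
    return int(row[0] > 28)

print(permutation_importance(bmi_rule, X_toy, y_toy))
```

Because `bmi_rule` ignores the second column, shuffling that column never changes a prediction, so its importance is exactly zero, while shuffling BMI lowers accuracy on average.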
Affiliation(s)
- Kun Lv
- Key Laboratory of Non-Coding RNA Transformation Research of Anhui Higher Education Institutes, Wuhu, China.
- Central Laboratory, First Affiliated Hospital of Wannan Medical College, Wuhu, People's Republic of China.
- Chunmei Cui
- Department of Biomedical Informatics, State Key Laboratory of Vascular Homeostasis and Remodeling, School of Basic Medical Sciences, Peking University, Beijing, People's Republic of China.
- Rui Fan
- Department of Biomedical Informatics, State Key Laboratory of Vascular Homeostasis and Remodeling, School of Basic Medical Sciences, Peking University, Beijing, People's Republic of China
- Xiaojuan Zha
- Laboratory Medicine, First Affiliated Hospital of Wannan Medical College, Wuhu, People's Republic of China
- Pengyu Wang
- Department of Pathophysiology, Harbin Medical University, Harbin, People's Republic of China
- Jun Zhang
- Medical College of Shihezi University, Shihezi, People's Republic of China
- Lina Zhang
- Department of Laboratory Diagnosis, Daqing Oil Field General Hospital, Daqing, People's Republic of China
- Jing Ke
- Beijing Key Laboratory of Diabetes Research and Care, Center for Endocrine Metabolism and Immune Diseases, Beijing Luhe Hospital, Capital Medical University, Beijing, People's Republic of China
- Dong Zhao
- Beijing Key Laboratory of Diabetes Research and Care, Center for Endocrine Metabolism and Immune Diseases, Beijing Luhe Hospital, Capital Medical University, Beijing, People's Republic of China.
- Qinghua Cui
- Department of Biomedical Informatics, State Key Laboratory of Vascular Homeostasis and Remodeling, School of Basic Medical Sciences, Peking University, Beijing, People's Republic of China.
- Liming Yang
- Department of Pathophysiology, Harbin Medical University, Harbin, People's Republic of China.
- National Key Laboratory of Frigid Zone Cardiovascular Diseases (NKLFZCD), Harbin Medical University, Harbin, People's Republic of China.
- NHC Key Laboratory of Cell Transplantation, The First Affiliated Hospital of Harbin Medical University, Harbin, People's Republic of China.
14
Nakamura K, Uchino E, Sato N, Araki A, Terayama K, Kojima R, Murashita K, Itoh K, Mikami T, Tamada Y, Okuno Y. Individual health-disease phase diagrams for disease prevention based on machine learning. J Biomed Inform 2023; 144:104448. [PMID: 37467834 DOI: 10.1016/j.jbi.2023.104448] [Received: 01/07/2023] [Revised: 07/09/2023] [Accepted: 07/16/2023] [Indexed: 07/21/2023]
Abstract
Early disease detection and prevention methods based on effective interventions are gaining attention worldwide. Progress in precision medicine has revealed substantial heterogeneity in individual-level health data and has shown that complex health factors are involved in the development of chronic disease. Machine-learning techniques have enabled precise personal-level disease prediction by capturing individual differences in multivariate data. However, because of the complex relationships among multiple biomarkers, it is challenging to identify which aspects should be improved to prevent disease on the basis of future disease-onset predictions. Here, we present a health-disease phase diagram (HDPD) that represents an individual's health state by visualizing the future-onset boundary values of multiple biomarkers that fluctuate early in the disease progression process. In HDPDs, future-onset predictions are represented by perturbing multiple biomarker values while accounting for dependencies among variables. We constructed HDPDs for 11 diseases using longitudinal health checkup cohort data from 3,238 individuals, comprising 3,215 measurement items and genetic data. Improving biomarker values to within the non-onset region of the HDPD remarkably prevented future disease onset in 7 of the 11 diseases. HDPDs can represent individual physiological states in the onset process and can be used as intervention goals for disease prevention.
Affiliation(s)
- Kazuki Nakamura
- Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan; Research and Business Development Department, Kyowa Hakko Bio Co., Ltd., Tokyo 100-0004, Japan
- Eiichiro Uchino
- Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan
- Noriaki Sato
- Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan
- Ayano Araki
- Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan
- Kei Terayama
- Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan; Graduate School of Medical Life Science, Yokohama City University, Kanagawa 230-0045, Japan
- Ryosuke Kojima
- Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan
- Koichi Murashita
- Center of Innovation Research Initiatives Organization (The Center of Healthy Aging Innovation), Graduate School of Medicine, Hirosaki University, Aomori 036-8562, Japan
- Ken Itoh
- Department of Stress Response Science, Graduate School of Medicine, Hirosaki University, Aomori 036-8562, Japan
- Tatsuya Mikami
- Innovation Center for Health Promotion, Graduate School of Medicine, Hirosaki University, Aomori 036-8562, Japan
- Yoshinori Tamada
- Innovation Center for Health Promotion, Graduate School of Medicine, Hirosaki University, Aomori 036-8562, Japan
- Yasushi Okuno
- Department of Biomedical Data Intelligence, Graduate School of Medicine, Kyoto University, Kyoto 606-8507, Japan.
15
Pyrros A, Borstelmann SM, Mantravadi R, Zaiman Z, Thomas K, Price B, Greenstein E, Siddiqui N, Willis M, Shulhan I, Hines-Shah J, Horowitz JM, Nikolaidis P, Lungren MP, Rodríguez-Fernández JM, Gichoya JW, Koyejo S, Flanders AE, Khandwala N, Gupta A, Garrett JW, Cohen JP, Layden BT, Pickhardt PJ, Galanter W. Opportunistic detection of type 2 diabetes using deep learning from frontal chest radiographs. Nat Commun 2023; 14:4039. [PMID: 37419921 PMCID: PMC10328953 DOI: 10.1038/s41467-023-39631-x] [Received: 12/14/2022] [Accepted: 06/19/2023] [Indexed: 07/09/2023]
Abstract
Deep learning (DL) models can harness electronic health records (EHRs) to predict diseases and extract radiologic findings for diagnosis. Because ambulatory chest radiographs (CXRs) are frequently ordered, we investigated detecting type 2 diabetes (T2D) by combining radiographic and EHR data in a DL model. Our model, developed from 271,065 CXRs and 160,244 patients, was tested on a prospective dataset of 9,943 CXRs. Here we show that the model effectively detected T2D, with an ROC AUC of 0.84 at a 16% prevalence. The algorithm flagged 1,381 cases (14%) as suspicious for T2D. External validation at a distinct institution yielded an ROC AUC of 0.77, with 5% of patients subsequently diagnosed with T2D. Explainable AI techniques revealed correlations between specific adiposity measures and high predictivity, suggesting the potential of CXRs for enhanced T2D screening.
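The ROC AUC values reported here (0.84 internally, 0.77 externally) summarize how well a model's scores rank patients with T2D above patients without it, without committing to any single decision threshold. A minimal sketch of that pairwise interpretation follows; the scores are hypothetical and are not taken from the study.

```python
from typing import Sequence

def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win (the Mann-Whitney U formulation). This O(P*N) loop is
    written for clarity; rank-based implementations are preferable for large datasets.
    """
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical scores for two patients with T2D (label 1) and two without (label 0).
print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 0.75: 3 of 4 pairs ranked correctly
```

An AUC of 0.84 can therefore be read as: in about 84% of (T2D, non-T2D) pairs, the model assigned the higher score to the patient with T2D.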
Affiliation(s)
- Ayis Pyrros
- Duly Health and Care, Department of Radiology, Downers Grove, IL, USA.
- Department of Biomedical and Health Information Sciences, University of Illinois Chicago, Chicago, IL, USA.
- Zachary Zaiman
- Department of Radiology, Emory University, Atlanta, GA, USA
- Kaesha Thomas
- Department of Radiology, Emory University, Atlanta, GA, USA
- Brandon Price
- Department of Radiology, Florida State University, Tallahassee, FL, USA
- Eugene Greenstein
- Department of Cardiology, Duly Health and Care, Downers Grove, IL, USA
- Nasir Siddiqui
- Duly Health and Care, Department of Radiology, Downers Grove, IL, USA
- Melinda Willis
- Duly Health and Care, Department of Radiology, Downers Grove, IL, USA
- John Hines-Shah
- Duly Health and Care, Department of Radiology, Downers Grove, IL, USA
- Paul Nikolaidis
- Department of Radiology, Northwestern University, Chicago, IL, USA
- Matthew P Lungren
- Department of Biomedical and Health Information Sciences, UCSF, San Francisco, CA, USA
- Center for Artificial Intelligence in Medicine, Stanford University, Stanford, CA, USA
- Microsoft, Microsoft Corporation, Redmond, USA
- Sanmi Koyejo
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Adam E Flanders
- Department of Radiology, Thomas Jefferson University, Philadelphia, PA, USA
- Amit Gupta
- Department of Radiology, University Hospitals Cleveland Medical Center, Cleveland, OH, USA
- John W Garrett
- Department of Radiology, University of Wisconsin, Madison, WI, USA
- Joseph Paul Cohen
- Center for Artificial Intelligence in Medicine, Stanford University, Stanford, CA, USA
- Brian T Layden
- Department of Medicine, University of Illinois Chicago, Chicago, IL, USA
- William Galanter
- Department of Medicine, University of Illinois Chicago, Chicago, IL, USA
16
Alhussan AA, Abdelhamid AA, Towfek SK, Ibrahim A, Eid MM, Khafaga DS, Saraya MS. Classification of Diabetes Using Feature Selection and Hybrid Al-Biruni Earth Radius and Dipper Throated Optimization. Diagnostics (Basel) 2023; 13:2038. [PMID: 37370932 DOI: 10.3390/diagnostics13122038] [Received: 04/30/2023] [Revised: 06/03/2023] [Accepted: 06/06/2023] [Indexed: 06/29/2023]
Abstract
INTRODUCTION In public health, machine learning algorithms have been used to predict or diagnose chronic epidemiological disorders such as diabetes mellitus, which has reached epidemic proportions worldwide. Diabetes is one of several diseases for which machine learning techniques can be used in diagnosis, prognosis, and assessment. METHODOLOGY In this paper, we propose a new approach for improving the classification of diabetes based on a new metaheuristic optimization algorithm. The approach introduces a new feature selection algorithm based on a dynamic hybrid of the Al-Biruni earth radius and dipper-throated optimization algorithms (DBERDTO). The selected features are then classified using a random forest classifier whose parameters are optimized using the proposed DBERDTO. RESULTS The proposed methodology is evaluated and compared with recent optimization methods and machine learning models to demonstrate its efficiency and superiority. The overall diabetes classification accuracy achieved by the proposed approach is 98.6%. In addition, analysis of variance (ANOVA) and Wilcoxon signed-rank tests were conducted to assess the significance of the differences between the proposed approach and the compared methods. CONCLUSIONS The results of these tests confirmed the superiority of the proposed approach over the other classification and optimization methods.
Affiliation(s)
- Amel Ali Alhussan
- Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Abdelaziz A Abdelhamid
- Department of Computer Science, College of Computing and Information Technology, Shaqra University, Shaqra 11961, Saudi Arabia
- Department of Computer Science, Faculty of Computer and Information Sciences, Ain Shams University, Cairo 11566, Egypt
- S K Towfek
- Computer Science and Intelligent Systems Research Center, Blacksburg, VA 24060, USA
- Department of Communications and Electronics, Delta Higher Institute of Engineering and Technology, Mansoura 35111, Egypt
- Abdelhameed Ibrahim
- Computer Engineering and Control Systems Department, Faculty of Engineering, Mansoura University, Mansoura 35516, Egypt
- Marwa M Eid
- Faculty of Artificial Intelligence, Delta University for Science and Technology, Mansoura 11152, Egypt
- Doaa Sami Khafaga
- Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, P.O. Box 84428, Riyadh 11671, Saudi Arabia
- Mohamed S Saraya
- Computer Engineering and Control Systems Department, Faculty of Engineering, Mansoura University, Mansoura 35516, Egypt
17
Cheng YL, Wu YR, Lin KD, Lin CHR, Lin IM. Using Machine Learning for the Risk Factors Classification of Glycemic Control in Type 2 Diabetes Mellitus. Healthcare (Basel) 2023; 11:healthcare11081141. [PMID: 37107975 PMCID: PMC10138388 DOI: 10.3390/healthcare11081141] [Received: 03/06/2023] [Revised: 04/05/2023] [Accepted: 04/13/2023] [Indexed: 04/29/2023]
Abstract
Several risk factors are related to glycemic control in patients with type 2 diabetes mellitus (T2DM), including demographics, medical conditions, negative emotions, lipid profiles, and heart rate variability (HRV, which reflects cardiac autonomic activity). The interactions among these risk factors remain unclear. This study aimed to use machine learning methods to explore the relationships between various risk factors and glycemic control in patients with T2DM. The study used a database from Lin et al. (2022) that included 647 patients with T2DM. Regression tree analysis was conducted to identify the interactions among risk factors that contribute to glycated hemoglobin (HbA1c) values, and various machine learning methods were compared for their accuracy in classifying patients with T2DM. The regression tree analysis revealed that high depression scores may be a risk factor in one subgroup but not in others. Among the machine learning classification methods compared, the random forest algorithm performed best with a small set of features, achieving 84% accuracy, a 95% area under the curve (AUC), 77% sensitivity, and 91% specificity. Machine learning methods can provide significant value in accurately classifying patients with T2DM when depression is considered as a risk factor.
Affiliation(s)
- Yi-Ling Cheng
- Department of Psychology, College of Humanities and Social Sciences, Kaohsiung Medical University, Kaohsiung 807378, Taiwan
- Ying-Ru Wu
- Department of Psychology, College of Humanities and Social Sciences, Kaohsiung Medical University, Kaohsiung 807378, Taiwan
- Chun-Hung Richard Lin
- Department of Computer Science and Engineering, National Sun Yat-sen University, Kaohsiung 80424, Taiwan
- I-Mei Lin
- Department of Psychology, College of Humanities and Social Sciences, Kaohsiung Medical University, Kaohsiung 807378, Taiwan
- Department of Medical Research, Kaohsiung Medical University Hospital, Kaohsiung 807378, Taiwan
18
Hyde B, Paoli CJ, Panjabi S, Bettencourt KC, Bell Lynum KS, Selej M. A claims-based, machine-learning algorithm to identify patients with pulmonary arterial hypertension. Pulm Circ 2023; 13:e12237. [PMID: 37287599 PMCID: PMC10243208 DOI: 10.1002/pul2.12237] [Received: 11/02/2022] [Revised: 04/14/2023] [Accepted: 05/01/2023] [Indexed: 06/09/2023]
Abstract
Many patients with pulmonary arterial hypertension (PAH) experience substantial delays in diagnosis, which are associated with worse outcomes and higher costs. Tools for diagnosing PAH sooner may lead to earlier treatment, which may delay disease progression and adverse outcomes, including hospitalization and death. We developed a machine-learning (ML) algorithm to identify patients at risk for PAH earlier in their symptom journey and to distinguish them from patients with similar early symptoms who are not at risk for developing PAH. Our supervised ML model analyzed retrospective, de-identified data from the US-based Optum® Clinformatics® Data Mart claims database (January 2015 to December 2019). Propensity score-matched PAH and non-PAH (control) cohorts were established based on observed differences. Random forest models were used to classify patients as PAH or non-PAH at diagnosis and at 6 months prediagnosis. The PAH and non-PAH cohorts included 1339 and 4222 patients, respectively. At 6 months prediagnosis, the model performed well in distinguishing PAH from non-PAH patients, with an area under the receiver operating characteristic curve of 0.84, recall (sensitivity) of 0.73, and precision of 0.50. Key features distinguishing PAH from non-PAH cohorts were a longer time between first symptom and the prediagnosis model date (i.e., 6 months before diagnosis); more diagnostic and prescription claims, circulatory claims, and imaging procedures, leading to higher overall healthcare resource utilization; and more hospitalizations. Our model distinguishes between patients with and without PAH at 6 months before diagnosis and illustrates the feasibility of using routine claims data to identify patients at a population level who might benefit from PAH-specific screening and/or earlier specialist referral.
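The abstract does not describe how the propensity scores were estimated or how PAH cases were paired with controls. As a minimal sketch, assuming the scores have already been estimated (for example with a logistic regression of case status on covariates), greedy 1:k nearest-neighbour matching without replacement could look like this. The function name, the optional caliper, and the example scores are hypothetical.

```python
from typing import Dict, List, Optional

def greedy_nearest_neighbor_match(
    treated: Dict[str, float],
    controls: Dict[str, float],
    ratio: int = 1,
    caliper: Optional[float] = None,
) -> Dict[str, List[str]]:
    """Match each treated unit to up to `ratio` controls with the closest propensity scores.

    Matching is greedy and without replacement: treated units are processed from the
    highest score down, and each chosen control is removed from the pool. If a caliper
    is given, a control farther than `caliper` from the treated score is not used.
    """
    available = dict(controls)  # copy so the caller's pool is not modified
    matches: Dict[str, List[str]] = {}
    for t_id, t_score in sorted(treated.items(), key=lambda kv: kv[1], reverse=True):
        chosen: List[str] = []
        for _ in range(ratio):
            if not available:
                break
            c_id = min(available, key=lambda c: abs(available[c] - t_score))
            if caliper is not None and abs(available[c_id] - t_score) > caliper:
                break  # the nearest remaining control is already too far away
            chosen.append(c_id)
            del available[c_id]
        matches[t_id] = chosen
    return matches

# Hypothetical propensity scores (estimated probability of being a PAH case).
cases = {"t1": 0.80, "t2": 0.30}
pool = {"c1": 0.75, "c2": 0.35, "c3": 0.10, "c4": 0.90}
print(greedy_nearest_neighbor_match(cases, pool, ratio=2))
# {'t1': ['c1', 'c4'], 't2': ['c2', 'c3']}
```

Greedy matching is simple but depends on processing order; optimal matching instead minimizes the total score distance across all pairs, at a higher computational cost.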
Affiliation(s)
- Bethany Hyde
- Janssen Business Technology Commercial Data Insights & Data Science, Titusville, New Jersey, USA
- Mona Selej
- Janssen R&D Data Science, South San Francisco, California, USA
19
Verma N, Singh S, Prasad D. Performance analysis and comparison of Machine Learning and LoRa-based Healthcare model. Neural Comput Appl 2023; 35:12751-12761. [PMID: 37192938 PMCID: PMC9989556 DOI: 10.1007/s00521-023-08411-5] [Received: 08/02/2022] [Accepted: 02/13/2023] [Indexed: 03/09/2023]
Abstract
Diabetes mellitus (DM) is a widespread condition and one of the main causes of health crises around the world, and health monitoring is an important topic in sustainable development. Currently, Internet of Things (IoT) and machine learning (ML) technologies work together to provide a reliable method for monitoring and predicting DM. In this paper, we present the performance of a model for real-time patient data collection that employs the Hybrid Enhanced Adaptive Data Rate (HEADR) algorithm for the IoT Long-Range (LoRa) protocol. On the Contiki Cooja simulator, the LoRa protocol's performance is measured in terms of high dissemination and dynamic allocation of the data transmission range. Furthermore, ML classification methods are applied to data acquired via the LoRa (HEADR) protocol to detect diabetes severity levels. A variety of ML classifiers, implemented in the Python programming language, are employed for prediction, and the final results are compared with existing models; the Random Forest and Decision Tree classifiers outperform the others in terms of precision, recall, F-measure, and receiver operating characteristic (ROC) curve. We also found that using k-fold cross-validation with the k-nearest neighbors, logistic regression (LR), and Gaussian Naive Bayes (GNB) classifiers boosted their accuracy.
Affiliation(s)
- Navneet Verma
- Computer Science and Engineering Department, DCRUST, Murthal, Sonipat, 131027 India
- Sukhdip Singh
- Computer Science and Engineering Department, DCRUST, Murthal, Sonipat, 131027 India
- Devendra Prasad
- Computer Science and Engineering Department, PIET, Samalkha, Panipat, 132103 India
20
A hybrid super ensemble learning model for the early-stage prediction of diabetes risk. Med Biol Eng Comput 2023; 61:785-797. [PMID: 36602674 DOI: 10.1007/s11517-022-02749-z] [Received: 07/31/2022] [Accepted: 12/22/2022] [Indexed: 01/06/2023]
Abstract
Diabetes mellitus has become a rapidly growing chronic health problem worldwide, with a noticeable increase in cases over the last two decades. Recent advances in ensemble machine learning methods play an important role in the early detection of diabetes mellitus, and these methods are both faster and less costly than traditional methods. This study aims to propose a new super ensemble learning model to enable early diagnosis of diabetes mellitus. The super learner is a cross-validation-based approach that makes better predictions by combining the predictions of more than one machine learning algorithm. Based on a case study, the proposed super learner model was built with four base learners (logistic regression, decision tree, random forest, and gradient boosting) and a meta-learner (support vector machine). Three different datasets were used to measure the robustness of the proposed model. Chi-square was identified as the optimal feature selection technique among five candidate techniques, and hyperparameters were tuned with grid search. The proposed super learner achieved higher accuracy than the base learners in detecting diabetes mellitus on the early-stage diabetes risk prediction (99.6%), PIMA (92%), and Diabetes 130-US hospitals (98%) datasets. This study shows that super learner algorithms can be used effectively to detect diabetes mellitus, and the consistently high statistical scores indicate the robustness of the proposed super learner model.
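A super learner trains its meta-learner on cross-validated (out-of-fold) predictions of the base learners, so the meta-learner never sees a base prediction made on rows that base learner was trained on. The dependency-free sketch below illustrates only that mechanism. The interleaved fold split, the threshold base learners, and the accuracy-weighted vote used as the meta-learner are hypothetical stand-ins; the study itself used logistic regression, decision tree, random forest, and gradient boosting base learners with an SVM meta-learner.

```python
from typing import Callable, List

Model = Callable[[List[float]], float]                    # returns a class-1 score or label
Learner = Callable[[List[List[float]], List[int]], Model]  # trains on (X, y), returns a Model

def kfold_indices(n: int, k: int) -> List[List[int]]:
    """Interleaved k-fold split of row indices (no shuffling, to keep the sketch deterministic)."""
    return [list(range(i, n, k)) for i in range(k)]

def out_of_fold_predictions(X, y, learners, k=3):
    """meta[i][j] = prediction for row i from learner j trained without row i's fold."""
    n = len(X)
    meta = [[0.0] * len(learners) for _ in range(n)]
    for fold in kfold_indices(n, k):
        held_out = set(fold)
        train_idx = [i for i in range(n) if i not in held_out]
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        for j, learn in enumerate(learners):
            model = learn(X_train, y_train)
            for i in fold:
                meta[i][j] = model(X[i])
    return meta

def super_learner(X, y, learners, meta_learner, k=3) -> Model:
    """Fit the meta-learner on out-of-fold predictions, then refit base learners on all rows."""
    meta_model = meta_learner(out_of_fold_predictions(X, y, learners, k), y)
    base_models = [learn(X, y) for learn in learners]
    return lambda row: meta_model([m(row) for m in base_models])

def threshold_learner(feature: int) -> Learner:
    """Hypothetical base learner: cut one feature halfway between the two class means."""
    def learn(X, y):
        mean1 = sum(r[feature] for r, t in zip(X, y) if t == 1) / y.count(1)
        mean0 = sum(r[feature] for r, t in zip(X, y) if t == 0) / y.count(0)
        cut = (mean0 + mean1) / 2
        return lambda row: float((row[feature] > cut) == (mean1 > mean0))
    return learn

def weighted_vote_meta(meta_X, y):
    """Hypothetical stand-in meta-learner: weight each base learner by its
    out-of-fold accuracy, then take a weighted vote."""
    n_learners = len(meta_X[0])
    weights = [sum((row[j] >= 0.5) == (label == 1) for row, label in zip(meta_X, y)) / len(y)
               for j in range(n_learners)]
    total = sum(weights) or 1.0
    return lambda preds: int(sum(w * p for w, p in zip(weights, preds)) / total >= 0.5)

# Hypothetical toy data with two informative features.
X_toy = [[1.0, 5.0], [2.0, 4.0], [3.0, 6.0], [7.0, 1.0], [8.0, 2.0], [9.0, 3.0]]
y_toy = [0, 0, 0, 1, 1, 1]
learners = [threshold_learner(0), threshold_learner(1)]
model = super_learner(X_toy, y_toy, learners, weighted_vote_meta)
print([model(row) for row in X_toy])  # [0, 0, 0, 1, 1, 1]
```

Replacing `weighted_vote_meta` with a trained classifier over the out-of-fold prediction matrix (such as an SVM) gives the structure described in the abstract; the fold logic stays the same.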
21
Predicting the Onset of Diabetes with Machine Learning Methods. J Pers Med 2023; 13:jpm13030406. [PMID: 36983587 PMCID: PMC10057336 DOI: 10.3390/jpm13030406] [Received: 11/23/2022] [Revised: 02/16/2023] [Accepted: 02/22/2023] [Indexed: 03/03/2023]
Abstract
The number of people with diabetes in Taiwan has continued to rise in recent years. According to International Diabetes Federation statistics, about 537 million adults worldwide (10.5% of the adult population) have diabetes; this is projected to reach 643 million (11.3%) by 2030 and, if the trend continues, 783 million (12.2%) by 2045. At present, the number of people with diabetes in Taiwan has reached 2.18 million, about one in ten people. In addition, according to the Bureau of National Health Insurance in Taiwan, the prevalence of diabetes among adults in Taiwan has reached 5% and is increasing each year. Diabetes can cause acute and chronic complications that can be fatal, and chronic complications can also result in a variety of disabilities or organ decline. Without holistic treatment and prevention for diabetic patients, the disease will consume more medical resources and rapidly erode the quality of life of society as a whole. In this study, 15,000 women aged between 20 and 80 years were selected from the outpatient examination data of a Taipei Municipal medical center. These women had visited the medical center during 2018–2020 and 2021–2022, with or without a diagnosis of diabetes. The study examined eight characteristics of the subjects: number of pregnancies, plasma glucose level, diastolic blood pressure, skinfold thickness, insulin level, body mass index, diabetes pedigree function, and age. After cleaning the patients' data, the study used Microsoft Machine Learning Studio to train several kinds of models, including neural networks, and compared their ability to predict diabetes.
Comparing two-class logistic regression, two-class neural network, two-class decision jungle, and two-class boosted decision tree models, the study found that the two-class boosted decision tree performed best, with an area under the curve of 0.991, higher than the other models.
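The area under the curve used to rank these models equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counting as one half. The sketch below is a generic standard-library illustration with made-up scores. It is not tied to Microsoft Machine Learning Studio or to the study's data, and the function name is hypothetical.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC via the Mann-Whitney pairwise comparison (O(n_pos * n_neg), fine for small data)."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


# Illustrative scores only (not from the study).
print(roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Because only the ranking of scores matters, an AUC of 0.991 means almost every diabetic subject was scored above almost every non-diabetic one. It says nothing directly about calibration or about the chosen decision threshold.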
22
Dweekat OY, Lam SS. Optimized design of hybrid genetic algorithm with multilayer perceptron to predict patients with diabetes. Soft Comput 2023. [DOI: 10.1007/s00500-023-07876-9] [Indexed: 02/09/2023]
23
Zaizar-Fregoso SA, Lara-Esqueda A, Hernández-Suarez CM, Delgado-Enciso J, Garcia-Nevares A, Canseco-Avila LM, Guzman-Esquivel J, Rodriguez-Sanchez IP, Martinez-Fierro ML, Ceja-Espiritu G, Ochoa-Díaz-Lopez H, Espinoza-Gomez F, Sanchez-Diaz I, Delgado-Enciso I. Using Artificial Intelligence to Develop a Multivariate Model with a Machine Learning Model to Predict Complications in Mexican Diabetic Patients without Arterial Hypertension (National Nested Case-Control Study): Metformin and Elevated Normal Blood Pressure Are Risk Factors, and Obesity Is Protective. J Diabetes Res 2023; 2023:8898958. [PMID: 36846513 PMCID: PMC9949947 DOI: 10.1155/2023/8898958] [Received: 12/14/2022] [Revised: 01/30/2023] [Accepted: 01/31/2023] [Indexed: 02/18/2023]
Abstract
Diabetes mellitus is an incurable disease that can cause complications and even death, and over time it leads to chronic complications. Predictive models have been used to identify people prone to developing diabetes mellitus, but little information is available on the chronic complications of patients with diabetes. This study aimed to create a machine learning model that identifies the risk factors for chronic complications in diabetic patients, such as amputation, myocardial infarction, stroke, nephropathy, and retinopathy. The design is a national nested case-control study with 63,776 patients, 215 predictors, and four years of data. An XGBoost model predicted chronic complications with an AUC of 84% and identified the corresponding risk factors. Based on SHAP (Shapley additive explanations) values, the most important risk factors were continued management, metformin treatment, age between 68 and 104 years, nutrition consultation, and treatment adherence. Two findings stand out. First, the analysis reaffirms that, in patients with diabetes but without hypertension, blood pressure becomes a significant risk factor at a diastolic pressure > 70 mmHg (OR: 1.095, 95% CI: 1.078-1.113) or a systolic pressure > 120 mmHg (OR: 1.147, 95% CI: 1.124-1.171). Second, a BMI > 32 (overall obesity) was a statistically significant protective factor (OR: 0.816, 95% CI: 0.8-0.833), which the obesity paradox may explain. In conclusion, these results show that artificial intelligence is a powerful and feasible tool for this type of study, although further studies are needed to verify and extend these findings.
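SHAP rankings like the one above are built on Shapley values. For a model with only a few features, Shapley values can be computed exactly by enumerating coalitions of features. The sketch below assumes one common definition of a coalition's value: features outside the coalition are replaced by baseline values. The three-feature risk function and its coefficients are invented for illustration. The SHAP library itself uses faster model-specific algorithms, such as Tree SHAP for XGBoost, rather than brute-force enumeration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley(f: Callable[[Sequence[float]], float],
                  x: Sequence[float],
                  baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values by coalition enumeration (exponential in the number of features)."""
    n = len(x)

    def value(coalition) -> float:
        # Assumption: features in the coalition take their observed values,
        # all other features are set to the baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                phi[i] += weight * (value(set(s) | {i}) - value(set(s)))
    return phi


def toy_risk(z: Sequence[float]) -> float:
    """Hypothetical risk score with one interaction term (not the study's model)."""
    older, on_drug, obese = z
    return 0.3 * older + 0.2 * on_drug - 0.1 * obese + 0.1 * older * on_drug


phi = exact_shapley(toy_risk, x=[1, 1, 1], baseline=[0, 0, 0])
print([round(v, 4) for v in phi])  # [0.35, 0.25, -0.1]
```

The interaction term's contribution (0.1) is split equally between the two interacting features. The attributions also sum to f(x) minus f(baseline), which is the "additive" property in the name. A negative value, like the third feature here, is how a protective factor appears in a SHAP summary.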
Affiliation(s)
- Agustin Lara-Esqueda
- Facultad de Psicología y Terapia de la Comunicación Humana de la Universidad Juárez del Estado Durango, Durango 81301, Mexico
- Josuel Delgado-Enciso
- Fundacion para la Etica Educacion e Investigacion del Cancer del Instituto Estatal de Cancerologia de Colima AC, Colima 28085, Mexico
- Luis M. Canseco-Avila
- Facultad de Ciencias Químicas Campus IV, Universidad Autónoma de Chiapas, Tapachula, 30700 Chiapas, Mexico
- Jose Guzman-Esquivel
- Instituto Mexicano del Seguro Social, Delegación Colima, Villa de Alvarez, 28983 Colima, Mexico
- Iram P. Rodriguez-Sanchez
- Facultad de Ciencias Biológicas, Universidad Autonoma de Nuevo Leon, San Nicolás de los Garza, 66455 Nuevo Leon, Mexico
- Hector Ochoa-Díaz-Lopez
- Departamento de Salud, El Colegio de La Frontera Sur, San Cristóbal de Las Casas, 29290 Chiapas, Mexico
- Iyari Sanchez-Diaz
- Subdirección de Prevención y Protección a la Salud, Instituto de Seguridad y Servicios Sociales de los Trabajadores del Estado, Ciudad de Mexico, 14070, Mexico
- Ivan Delgado-Enciso
- Facultad de Medicina, Universidad de Colima, Colima 28040, Mexico
- Instituto Estatal de Cancerología, Servicios de Salud del Estado de Colima, Colima 28085, Mexico
24
Afsaneh E, Sharifdini A, Ghazzaghi H, Ghobadi MZ. Recent applications of machine learning and deep learning models in the prediction, diagnosis, and management of diabetes: a comprehensive review. Diabetol Metab Syndr 2022; 14:196. [PMID: 36572938 PMCID: PMC9793536 DOI: 10.1186/s13098-022-00969-9] [Received: 10/06/2022] [Accepted: 12/16/2022] [Indexed: 12/28/2022]
Abstract
Diabetes is a metabolic illness characterized by elevated blood glucose. This abnormal increase can seriously damage other organs such as the kidneys, eyes, heart, nerves, and blood vessels. Its prediction, prognosis, and management are therefore essential to prevent these harmful effects and to guide more effective treatment. Machine learning algorithms have attracted considerable attention for these goals and have been developed successfully. This review surveys recently proposed machine learning (ML) and deep learning (DL) models for these objectives. The reported results indicate that ML and DL algorithms are promising approaches for controlling blood glucose and diabetes; however, they need further improvement and validation on large datasets to confirm their applicability.
25
Cardozo G, Tirloni SF, Pereira Moro AR, Marques JLB. Use of Artificial Intelligence in the Search for New Information Through Routine Laboratory Tests: Systematic Review. JMIR Bioinformatics and Biotechnology 2022; 3:e40473. [PMID: 36644762 PMCID: PMC9828303 DOI: 10.2196/40473] [Received: 06/22/2022] [Revised: 08/28/2022] [Accepted: 10/31/2022] [Indexed: 11/05/2022]
Abstract
Background In recent decades, the use of artificial intelligence has been widely explored in health care. At the same time, the amount of data generated by a wide range of medical processes has roughly doubled every year, requiring new methods to analyze and process these data. Aimed mainly at aiding the diagnosis and prevention of diseases, this form of precision medicine has shown great potential in different medical disciplines. Laboratory tests, for example, almost always report their results separately as individual values. However, physicians need to analyze a set of results to propose a presumptive diagnosis, which suggests that a set of laboratory tests may contain more information than its results considered one by one. Medical laboratory processes could therefore be strongly affected by these techniques. Objective We sought to identify scientific research that used laboratory tests and machine learning techniques to predict hidden information and diagnose diseases. Methods The methodology followed the population, intervention, comparison, and outcomes (PICO) principle, searching the main engineering and health sciences databases. Search terms were defined from the terms listed in the Medical Subject Headings (MeSH) database. Data were presented descriptively and followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses; 2020) statement flow diagram and the National Institutes of Health tool for quality assessment of articles. Two authors independently applied the inclusion and exclusion criteria, and a third author was consulted in cases of disagreement. Results Forty studies of good quality that met the defined requirements were selected and evaluated. The number of studies using this methodology has increased significantly in recent years, mainly because of COVID-19.
In general, the studies used machine learning classification models to predict new information, and the most frequently used inputs were routine laboratory test data such as the complete blood count. Conclusions Laboratory tests combined with machine learning techniques can predict the results of other tests, helping the search for new diagnoses. This approach has proved advantageous and innovative for medical laboratories: it makes it possible to uncover hidden information and propose additional tests, reducing false negatives and helping in the early discovery of unknown diseases.
Affiliation(s)
- Glauco Cardozo
- Federal Institute of Santa Catarina, Florianópolis, Brazil
26
Simaiya S, Kaur R, Sandhu JK, Alsafyani M, Alroobaea R, alsekait DM, Margala M, Chakrabarti P. A novel multistage ensemble approach for prediction and classification of diabetes. Front Physiol 2022; 13:1085240. [PMID: 36601350 PMCID: PMC9807241 DOI: 10.3389/fphys.2022.1085240] [Received: 10/31/2022] [Accepted: 11/22/2022] [Indexed: 12/23/2022]
Abstract
Diabetes mellitus is a metabolic syndrome affecting millions of people worldwide, and its incidence rises sharply every year. Diabetes-related problems in several vital organs can be fatal if left untreated, so diabetes must be detected early to receive proper treatment and to prevent the condition from escalating to severe problems. Major advances in health sciences and biotechnology have produced massive volumes of electronic health records and clinical information. This exponential growth in electronically gathered data has enabled more complex and accurate prediction models that can be updated continuously using machine learning techniques. This research focuses on finding the best ensemble model for predicting diabetes and proposes a new multistage ensemble model for diabetes prediction. The model's accuracy is evaluated on the Pima Indian Diabetes dataset and compared with existing machine learning models; the experimental results show that the proposed model achieves higher precision, F-measure, recall, and area under the curve.
Affiliation(s)
- Sarita Simaiya
- Department of Computer Science and Engineering, Institute of Engineering and Technology, Chandigarh University, Mohali, Punjab, India
- School of Computing and Informatics, University of Louisiana, Lafayette, LA, United States
- Correspondence: Sarita Simaiya; Martin Margala
- Rajwinder Kaur
- Chitkara University Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India
- Jasminder Kaur Sandhu
- Department of Computer Science and Engineering, Institute of Engineering and Technology, Chandigarh University, Mohali, Punjab, India
- Majed Alsafyani
- Department Science, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
- Roobaea Alroobaea
- Department of Computer Science, College of Computers and Information Technology, Taif University, Taif, Saudi Arabia
- Deema mohammed alsekait
- Department of Computer Science and Information Technology, Applied College, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
- Martin Margala
- School of Computing and Informatics, University of Louisiana, Lafayette, LA, United States
- Correspondence: Sarita Simaiya; Martin Margala
27
Using Recurrent Neural Networks for Predicting Type-2 Diabetes from Genomic and Tabular Data. Diagnostics (Basel) 2022; 12:diagnostics12123067. [PMID: 36553074 PMCID: PMC9776641 DOI: 10.3390/diagnostics12123067] [Received: 11/04/2022] [Revised: 12/01/2022] [Accepted: 12/04/2022] [Indexed: 12/12/2022]
Abstract
The development of genomic technology for smart diagnosis and therapy of various diseases has recently become one of the most demanding areas of computer-aided diagnosis and treatment research. Rapid breakthroughs in artificial intelligence and machine intelligence could pave the way for identifying challenges facing the healthcare industry. Genomics is paving the way for predicting future illnesses, including cancer, Alzheimer's disease, and diabetes. Advances in machine learning have accelerated biomedical informatics research and inspired new branches of computational biology. Furthermore, knowledge of gene relationships has led to more accurate models that can detect patterns in vast volumes of data, making classification models important in various domains. Recurrent neural network models carry a memory of previous steps, which lets them retain earlier information while processing sequential genetic data. The present work focuses on type 2 diabetes prediction using gene sequences derived from genomic DNA fragments, with automated feature selection and feature extraction procedures that match gene patterns against the training data. The proposed model was also tested on tabular data to predict type 2 diabetes from several parameters. The performance of neural networks incorporating recurrent neural network (RNN), long short-term memory (LSTM), and gated recurrent unit (GRU) components was evaluated. Model performance was assessed with sensitivity, specificity, accuracy, F1-score, and the Matthews correlation coefficient (MCC). The proposed technique predicted future illnesses with fair accuracy. The research also showed that the model could be used in real-world scenarios, with risk variables entered through an end-user Android application stored and evaluated on a secure remote server.
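All the evaluation metrics listed above derive from the four cells of a binary confusion matrix. The helper below is a generic standard-library sketch. The function name and the example counts are hypothetical and are not the study's results. The Matthews correlation coefficient is included because, unlike accuracy, it stays informative when the classes are imbalanced.

```python
from math import sqrt
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard metrics from confusion-matrix counts (hypothetical helper)."""
    def ratio(a: float, b: float) -> float:
        return a / b if b else 0.0  # convention: report 0 when the denominator is 0

    sensitivity = ratio(tp, tp + fn)  # recall / true-positive rate
    specificity = ratio(tn, tn + fp)  # true-negative rate
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = ratio(tp * tn - fp * fn,
                sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f1": f1, "mcc": mcc}


# Made-up counts for illustration.
m = binary_metrics(tp=40, fp=10, tn=45, fn=5)
print({name: round(value, 3) for name, value in m.items()})
```

MCC ranges from -1 to 1, with 0 meaning no better than chance. A classifier that labels every case negative in a 95:5 dataset scores 95% accuracy but an MCC of 0.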
28
Kanda E, Suzuki A, Makino M, Tsubota H, Kanemata S, Shirakawa K, Yajima T. Machine learning models for prediction of HF and CKD development in early-stage type 2 diabetes patients. Sci Rep 2022; 12:20012. [PMID: 36411366 PMCID: PMC9678863 DOI: 10.1038/s41598-022-24562-2] [Received: 09/14/2022] [Accepted: 11/17/2022] [Indexed: 11/23/2022]
Abstract
Chronic kidney disease (CKD) and heart failure (HF) are the first and most frequent comorbidities associated with mortality risk in early-stage type 2 diabetes mellitus (T2DM). However, efficient screening and risk assessment strategies for identifying T2DM patients at high risk of developing CKD and/or HF (CKD/HF) remain to be established. This study aimed to develop a novel machine learning (ML) model to predict the risk of developing CKD/HF in early-stage T2DM patients. The models were derived from a retrospective cohort of 217,054 T2DM patients without a history of cardiovascular or renal disease, extracted from a Japanese claims database. Among the ML algorithms tested, extreme gradient boosting performed best for CKD/HF diagnosis and hospitalization after internal validation, and it was further validated on another dataset of 16,822 patients. In the external validation, the areas under the receiver operating characteristic curve for 5-year prediction of CKD/HF diagnosis and hospitalization were 0.718 and 0.837, respectively. In Kaplan-Meier analysis, patients predicted to be at high risk had significantly more CKD/HF diagnoses and hospitalizations than those predicted to be at low risk. Thus, the developed model predicted the risk of developing CKD/HF in T2DM patients with reasonable accuracy in the external validation cohort. Using ML models to identify T2DM patients at high risk of developing CKD/HF may improve prognosis by promoting early diagnosis and intervention.
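The Kaplan-Meier curves mentioned above estimate the probability of remaining event-free over time while accounting for censored follow-up. Below is a minimal product-limit estimator, given as a sketch. The toy follow-up data are invented, and the function name is hypothetical. Formally comparing high- and low-risk groups, for example with a log-rank test, is beyond this sketch.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate: (time, event-free probability) at each distinct event time.

    events[i] is 1 if subject i had the event at times[i], 0 if censored at times[i].
    Subjects censored at time t are counted as still at risk at t (usual convention).
    """
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        events_at_t = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        if events_at_t:
            survival *= 1 - events_at_t / at_risk
            curve.append((t, survival))
    return curve


# Invented follow-up (years) for five patients; 1 = CKD/HF event, 0 = censored.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Censored patients leave the risk set without lowering the curve. This is why Kaplan-Meier estimates differ from simply dividing event counts by the cohort size.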
Affiliation(s)
- Eiichiro Kanda
- Medical Science, Kawasaki Medical University, Okayama, Japan
- Atsushi Suzuki
- Department of Endocrinology, Diabetes and Metabolism, Fujita Health University, Toyoake, Aichi, Japan
- Masaki Makino
- Department of Endocrinology, Diabetes and Metabolism, Fujita Health University, Toyoake, Aichi, Japan
- Hiroo Tsubota
- AstraZeneca K.K., Osaka, Japan
- Satomi Kanemata
- Ono Pharmaceutical Co., Ltd., Osaka, Japan
29
Diabetes Mellitus Disease Prediction Using Machine Learning Classifiers with Oversampling and Feature Augmentation. Advances in Human-Computer Interaction 2022. [DOI: 10.1155/2022/9220560] [Indexed: 11/18/2022]
Abstract
Technical improvements in the healthcare sector have given rise to many new applications of artificial intelligence. Patterns are identified for disease detection, and the onset of many diseases can be predicted; these diseases include diabetes mellitus, fatal heart disease, and symptomatic cancer. Many algorithms have played a critical role in disease prediction. This paper proposes an ML-based approach for predicting diabetes mellitus. Many ML algorithms are compared, and the three classifiers with the highest accuracy are identified: random forest (RF), gradient boosting machine (GBM), and light gradient boosting machine (LGBM). Prediction accuracy is measured on two datasets: the Pima Indians dataset and a curated dataset. The LGBM, GBM, and RF classifiers are used to build predictive models, and the accuracy of each classifier is recorded and compared. In addition to this general prediction pipeline, a data augmentation technique is applied, and the final prediction accuracy of the LGBM, GBM, and RF classifiers is obtained. The paper also compares and demonstrates results with and without augmentation on both datasets, with the aim of further improving accuracy in predicting diabetes.
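The abstract does not say which oversampling method was used. Random oversampling is one common, simple option: it duplicates randomly chosen minority-class rows until the classes are balanced. The sketch below assumes that variant, which is not necessarily the paper's; SMOTE-style synthetic sampling is another common choice. The function name is hypothetical. Oversampling should be applied to the training split only, so duplicated rows cannot leak into the evaluation data.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def random_oversample(X: Sequence[Sequence[float]], y: Sequence[int],
                      seed: int = 0) -> Tuple[List[Sequence[float]], List[int]]:
    """Duplicate random rows of each smaller class until every class matches the largest.

    Assumption: plain random oversampling with replacement (not SMOTE).
    """
    rng = random.Random(seed)  # seeded for reproducibility
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        members = [i for i, t in enumerate(y) if t == label]
        for _ in range(target - count):
            i = rng.choice(members)  # sample with replacement from this class
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


# Toy imbalanced training set: 8 negatives, 2 positives.
X_train = [[float(i)] for i in range(10)]
y_train = [0] * 8 + [1] * 2
X_bal, y_bal = random_oversample(X_train, y_train)
print(Counter(y_bal))
```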
30
Morgan-Benita JA, Galván-Tejada CE, Cruz M, Galván-Tejada JI, Gamboa-Rosales H, Arceo-Olague JG, Luna-García H, Celaya-Padilla JM. Hard Voting Ensemble Approach for the Detection of Type 2 Diabetes in Mexican Population with Non-Glucose Related Features. Healthcare (Basel) 2022; 10:healthcare10081362. [PMID: 35893185 PMCID: PMC9331873 DOI: 10.3390/healthcare10081362] [Received: 06/16/2022] [Revised: 07/11/2022] [Accepted: 07/15/2022] [Indexed: 11/16/2022]
Abstract
Type 2 diabetes mellitus (T2DM) is one of the biggest health problems in Mexico, and early detection of the disease and its complications is extremely important. For noninvasive detection of T2DM, a fast and effective machine learning (ML) approach that uses ensemble classification models with a dichotomous output can be applied to early detection and prediction. In this article, a hard-voting ensemble is designed and implemented using a generalized linear model (GLM), support vector machines (SVM), and artificial neural networks (ANN) to classify T2DM patients. As a first step, the data are balanced, standardized, and imputed, then fed into the three models, which classify each patient into a dichotomous result. Features are selected with LASSO using 10-fold cross-validation, and the area under the curve (AUC) is used for final validation. LASSO selected 12 features, which were used in the implemented models to obtain the best possible scenario for the ensemble. Of the three algorithms, SVM performed best, with an AUC of 92% ± 3%, while the ensemble of GLM, SVM, and ANN obtained an AUC of 90% ± 3%.
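Hard voting combines the three classifiers by majority rule. Each model casts a 0/1 vote per patient, and the label with more votes wins; with three voters and two classes, a tie cannot occur. The sketch below assumes the per-model predictions are already available. The GLM, SVM and ANN are not reimplemented, the function name is hypothetical, and the example votes are invented.

```python
from collections import Counter
from typing import List, Sequence


def hard_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """predictions[m][i] is model m's label for sample i; returns the majority label per sample.

    With an even number of voters, a tie goes to the label that was voted first
    (Counter.most_common keeps first-seen order among equal counts).
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every model must predict every sample")
    ensemble = []
    for i in range(n_samples):
        votes = Counter(p[i] for p in predictions)
        ensemble.append(votes.most_common(1)[0][0])
    return ensemble


# Invented 0/1 predictions from three hypothetical fitted models for four patients.
glm = [1, 0, 1, 0]
svm = [1, 1, 0, 0]
ann = [0, 1, 1, 0]
print(hard_vote([glm, svm, ann]))  # [1, 1, 1, 0]
```

Hard voting uses only the predicted labels, not the models' scores. A strong single model can therefore be outvoted by two weaker ones, which is one plausible reason why the ensemble's AUC (90%) fell slightly below SVM alone (92%).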
Affiliation(s)
- Jorge A. Morgan-Benita
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Carlos E. Galván-Tejada
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Miguel Cruz
- Unidad de Investigación Médica en Bioquímica, Hospital de Especialidades, Centro Médico Nacional Siglo XXI, Instituto Mexicano del Seguro Social, Av. Cuauhtémoc 330, Col. Doctores, Del. Cuauhtémoc, Mexico City 06720, Mexico
- Jorge I. Galván-Tejada
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Hamurabi Gamboa-Rosales
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Jose G. Arceo-Olague
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Huizilopoztli Luna-García
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Correspondence: (H.L.-G.); (J.M.C.-P.)
- José M. Celaya-Padilla
- Unidad Académica de Ingeniería Eléctrica, Universidad Autónoma de Zacatecas, Jardín Juárez 147, Centro, Zacatecas 98000, Mexico
- Correspondence: (H.L.-G.); (J.M.C.-P.)
31
Application of machine learning methods for the prediction of true fasting status in patients performing blood tests. Sci Rep 2022; 12:11929. [PMID: 35831336 PMCID: PMC9279373 DOI: 10.1038/s41598-022-15161-2] [Received: 07/27/2021] [Accepted: 06/20/2022] [Indexed: 11/28/2022]
Abstract
Existing research assumes that fasting blood glucose (FBG) values extracted from electronic medical records (EMR) are valid, which may cause diagnostic bias through misclassification of fasting status. We proposed a machine learning (ML) algorithm to predict the fasting status of blood samples. This cross-sectional study used the EMR of a medical center from 2003 to 2018 and enrolled a total of 2,196,833 ontological FBGs from the outpatient service. The theoretical true fasting status was identified by comparing ontological FBG values, against multiple criteria, with the average glucose levels derived from concomitantly tested HbA1c. In addition to multiple logistic regression, we extracted 67 features to predict fasting status with eXtreme Gradient Boosting (XGBoost). The discrimination and calibration of the prediction models were assessed, and real-world performance was gauged by the prevalence of ineffective glucose measurement (IGM). Of the 784,340 ontologically labeled fasting samples, 77.1% were considered theoretical FBGs. In patients without diabetes mellitus (DM), the median (IQR) glucose and HbA1c levels were 94.0 (87.0, 102.0) mg/dL and 5.6 (5.4, 5.9)% for ontological fasting samples, and 92.0 (86.0, 99.0) mg/dL and 5.6 (5.4, 5.9)% for theoretical fasting samples. In the parsimonious approach, XGBoost showed comparable calibration and a higher AUROC than multiple logistic regression (0.887 vs 0.868), and it identified glucose level, home-to-hospital distance, age, and concomitant serum creatinine and lipid testing as important predictors. The prevalence of IGM dropped from 27.8% with ontological FBGs to 0.48% with algorithm-verified FBGs. The proposed ML algorithm or the multiple logistic regression model can help verify fasting status.
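The study labels "theoretical" fasting samples by comparing each FBG value with the average glucose implied by a concurrent HbA1c, but the abstract does not give the exact criteria. As an assumption, the sketch uses the widely cited ADAG regression, eAG (mg/dL) = 28.7 × HbA1c (%) − 46.7. The flagging helper and its 30 mg/dL margin are hypothetical placeholders and not the study's multi-criteria rule. The idea is only that a true fasting value tends to sit at or below the day-long average.

```python
def estimated_average_glucose(hba1c_percent: float) -> float:
    """ADAG linear conversion of HbA1c (%) to estimated average glucose (mg/dL)."""
    return 28.7 * hba1c_percent - 46.7


def exceeds_expected(fbg_mg_dl: float, hba1c_percent: float,
                     margin_mg_dl: float = 30.0) -> bool:
    """Hypothetical screen (not the study's rule): flag an 'FBG' well above the HbA1c-implied average.

    The default margin is illustrative only.
    """
    return fbg_mg_dl - estimated_average_glucose(hba1c_percent) > margin_mg_dl


# Median HbA1c of 5.6% reported for patients without DM.
print(round(estimated_average_glucose(5.6), 1))  # 114.0
print(exceeds_expected(94.0, 5.6))               # False
print(exceeds_expected(180.0, 5.6))              # True
```

The reported non-diabetic medians (FBG about 94 mg/dL at HbA1c 5.6%) sit below the roughly 114 mg/dL average implied by the ADAG formula. This is consistent with the intuition behind such a check, though the study's actual thresholds are not reproduced here.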
32
Liu Q, Zhou Q, He Y, Zou J, Guo Y, Yan Y. Predicting the 2-Year Risk of Progression from Prediabetes to Diabetes Using Machine Learning among Chinese Elderly Adults. J Pers Med 2022; 12:jpm12071055. [PMID: 35887552 PMCID: PMC9324396 DOI: 10.3390/jpm12071055] [Received: 04/10/2022] [Revised: 06/06/2022] [Accepted: 06/23/2022] [Indexed: 11/18/2022]
Abstract
Identifying people at high risk of developing diabetes among those with prediabetes may facilitate targeted lifestyle and pharmacological interventions. We aimed to establish machine learning models based on demographic and clinical characteristics to predict the risk of incident diabetes. Using data from a free medical examination service project for people aged 65 years or older, we developed logistic regression (LR), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost) models for the 2019 and 2020 follow-up results and performed internal validation. The area under the receiver operating characteristic (ROC) curve, sensitivity, specificity, accuracy, and F1 score were used to select the best-performing model. The average annual rate of progression to diabetes among elderly people with prediabetes was 14.21%. Each model was trained on eight features and one outcome variable from 9607 individuals with prediabetes, and model performance was assessed in 2402 patients with prediabetes. All four models predicted better in the first year than in the second. The XGBoost model performed relatively well (area under the ROC curve: 0.6742 for 2019 and 0.6707 for 2020). We established and compared four machine learning models to predict the risk of progression from prediabetes to diabetes. Although the four models differed little in performance, XGBoost had a relatively good area under the ROC curve, suggesting it may perform well in future work in this field.
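The 14.21% average annual progression rate can be related to the models' 2-year horizon by compounding it. This rests on a simplifying assumption made here, not by the study: the annual risk is taken as constant and independent from year to year. Under that assumption, the 2-year cumulative risk is 1 − (1 − 0.1421)² ≈ 26.4%. The function name below is hypothetical.

```python
def cumulative_risk(annual_risk: float, years: int) -> float:
    """Cumulative probability of progression, assuming a constant, independent annual risk."""
    if not 0.0 <= annual_risk <= 1.0:
        raise ValueError("annual_risk must be a probability")
    return 1.0 - (1.0 - annual_risk) ** years


print(f"{cumulative_risk(0.1421, 2):.3f}")  # 0.264
```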
Affiliation(s)
- Qing Liu
- Department of Epidemiology, School of Public Health, Wuhan University, Wuhan 430071, China
- Qing Zhou
- Department of Epidemiology, School of Public Health, Wuhan University, Wuhan 430071, China
- Yifeng He
- School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China
- Jingui Zou
- School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China
- Yan Guo
- Wuhan Center for Disease Control and Prevention, Wuhan 430015, China
- Yaqiong Yan
- Wuhan Center for Disease Control and Prevention, Wuhan 430015, China
- Correspondence:
33
Liu Q, Zhang M, He Y, Zhang L, Zou J, Yan Y, Guo Y. Predicting the Risk of Incident Type 2 Diabetes Mellitus in Chinese Elderly Using Machine Learning Techniques. J Pers Med 2022; 12:jpm12060905. [PMID: 35743691 PMCID: PMC9224915 DOI: 10.3390/jpm12060905] [Received: 04/09/2022] [Revised: 05/21/2022] [Accepted: 05/27/2022] [Indexed: 02/04/2023]
Abstract
Early identification of individuals at high risk of diabetes is crucial for implementing early intervention strategies. However, algorithms specific to elderly Chinese adults are lacking. This study aimed to build effective machine learning (ML) prediction models for the risk of type 2 diabetes mellitus (T2DM) in elderly Chinese adults. A retrospective cohort study was conducted using the health screening data of adults older than 65 years in Wuhan, China, from 2018 to 2020. After strict data filtering, 127,031 records from eligible participants were used. Overall, 8298 participants were diagnosed with incident T2DM during the 2-year follow-up (2019–2020). The dataset was randomly split into a training set (n = 101,625) and a test set (n = 25,406). We developed prediction models based on four ML algorithms: logistic regression (LR), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost). LASSO regression selected 21 prediction features. Random under-sampling (RUS) was applied to address class imbalance, and Shapley additive explanations (SHAP) were used to calculate and visualize feature importance. Model performance was evaluated by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and accuracy. The XGBoost model achieved the best performance (AUC = 0.7805, sensitivity = 0.6452, specificity = 0.7577, accuracy = 0.7503). Fasting plasma glucose (FPG), education, exercise, gender, and waist circumference (WC) were the five most important predictors. This study shows that the XGBoost model can be applied to screen individuals at high risk of T2DM in the early phase, with strong potential for intelligent prevention and control of diabetes. The key features could also be useful for developing targeted diabetes prevention interventions.
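Random under-sampling (RUS) was used above to handle the imbalance of 8,298 incident cases among 127,031 records. It keeps every minority-class record plus an equally sized random subset of majority-class records. A minimal standard-library sketch follows; the function name and toy data are hypothetical. As with any resampling, RUS should be applied to the training set only, leaving the test set at its natural class ratio so that sensitivity and specificity remain meaningful.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def random_undersample(X: Sequence[Sequence[float]], y: Sequence[int],
                       seed: int = 0) -> Tuple[List[Sequence[float]], List[int]]:
    """Keep a random subset of each class equal in size to the smallest class."""
    rng = random.Random(seed)  # seeded for reproducibility
    by_class: Dict[int, List[int]] = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep: List[int] = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))  # sample without replacement
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


# Toy training set: 90 non-incident and 10 incident records.
X_train = [[float(i)] for i in range(100)]
y_train = [0] * 90 + [1] * 10
X_rus, y_rus = random_undersample(X_train, y_train)
print(len(y_rus), sum(y_rus))
```

Unlike oversampling, RUS discards majority-class information. It was feasible here because the training set was large, and the cost is higher variance in the fitted model.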
Affiliation(s)
- Qing Liu
- Department of Epidemiology, School of Public Health, Wuhan University, Wuhan 430071, China; (Q.L.); (M.Z.)
- Miao Zhang
- Department of Epidemiology, School of Public Health, Wuhan University, Wuhan 430071, China; (Q.L.); (M.Z.)
- Yifeng He
- School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China; (Y.H.); (J.Z.)
- Lei Zhang
- School of Mathematics and Statistics, Wuhan University, Wuhan 430070, China;
- Jingui Zou
- School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China; (Y.H.); (J.Z.)
- Yaqiong Yan
- Wuhan Center for Disease Control and Prevention, Wuhan 430015, China;
- Yan Guo
- Wuhan Center for Disease Control and Prevention, Wuhan 430015, China;
- Correspondence:
34
Tuppad A, Patil SD. Machine learning for diabetes clinical decision support: a review. ADVANCES IN COMPUTATIONAL INTELLIGENCE 2022; 2:22. [PMID: 35434723 PMCID: PMC9006199 DOI: 10.1007/s43674-022-00034-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 10/17/2021] [Revised: 02/27/2022] [Accepted: 03/03/2022] [Indexed: 12/14/2022]
Abstract
Type 2 diabetes has recently acquired the status of an epidemic silent killer, even though it is non-communicable. There are two main reasons behind this perception of the disease. First, a gradual but exponential growth in disease prevalence has been observed irrespective of age group, geography, or gender. Second, the disease dynamics are very complex in terms of the multifactorial risks involved, the initial asymptomatic period, the various short-term and long-term complications posing serious health threats, and related comorbidities. The majority of its risk factors are lifestyle habits such as physical inactivity, lack of exercise, high body mass index (BMI), poor diet, and smoking, apart from some unavoidable ones such as family history of diabetes, ethnic predisposition, and ageing. Machine learning (ML) is increasingly being applied to alleviate the health burden of diabetes, and many works in the literature propose ML-based clinical decision support in different application areas. In this paper, we present a review of such efforts for the prevention and management of type 2 diabetes. First, we present the medical gaps in the diabetes knowledge base, guidelines, and medical practice identified from relevant articles and highlight those that can be addressed by ML. Further, we review ML research in three application areas: (1) risk assessment (statistical risk scores and ML-based risk models), (2) diagnosis (using non-invasive and invasive features), and (3) prognosis (from normoglycemia/prior morbidity to incident diabetes, and from incident diabetes to related complications). We discuss and summarize the shortcomings or gaps in existing ML methodologies for diabetes to be addressed in future work. This review provides the breadth of ML predictive modeling applications for diabetes while highlighting the medical and technological gaps as well as various aspects involved in ML-based diabetes clinical decision support.
Affiliation(s)
- Ashwini Tuppad
- School of Computer Science and Engineering, REVA University, Rukmini Knowledge Park, Kattigenahalli, Bangalore, Karnataka India
- Shantala Devi Patil
- School of Computer Science and Engineering, REVA University, Rukmini Knowledge Park, Kattigenahalli, Bangalore, Karnataka India
35
Use of Machine Learning and Routine Laboratory Tests for Diabetes Mellitus Screening. BIOMED RESEARCH INTERNATIONAL 2022; 2022:8114049. [PMID: 35392258 PMCID: PMC8983182 DOI: 10.1155/2022/8114049] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 10/21/2021] [Revised: 02/18/2022] [Accepted: 03/10/2022] [Indexed: 12/28/2022]
Abstract
Most patients with diabetes mellitus are asymptomatic, which leads to delayed and more complex treatment. At the same time, most individuals routinely undergo standard clinical laboratory examinations, which create large health datasets over a lifetime. Computer processing has been used to search for health anomalies and predict diseases from clinical examinations. This work studied machine learning models to support diabetes screening through routine laboratory tests, using laboratory data from 62,496 patients. The classification and regression models used were k-nearest neighbors, support vector machines, naïve Bayes, random forests, and artificial neural networks. Glycated hemoglobin, a test used for diabetes diagnosis, was used as the target. Regression models calculated glycated hemoglobin directly, and their outputs were later classified. The performance of the classification models was studied under various subdataset partitions and combinations (e.g., healthy, prediabetic, and diabetic, as well as non-healthy and non-diabetic). The best single performance was achieved by the artificial neural network model when detecting prediabetes or diabetes. The artificial neural network classification model scored 78.1%, 78.7%, and 78.4% for sensitivity, precision, and F1 score, respectively, when identifying the non-healthy group. Other models also performed well, depending on the screening objective. Machine learning-based models can predict glycated hemoglobin values from routine laboratory tests and can be used as a screening tool to refer a patient for further testing.
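As a hedged illustration of the regress-then-classify idea and of scoring the non-healthy group, the Python sketch below maps an HbA1c value to a class and computes precision, recall (sensitivity) and F1 for a chosen set of positive classes. The 5.7% and 6.5% cut-offs are the American Diabetes Association criteria, assumed here rather than taken from the study; the function names and example values are hypothetical.

```python
def hba1c_class(hba1c_percent):
    """Map an HbA1c value (%) to a screening class (ADA cut-offs, assumed)."""
    if hba1c_percent >= 6.5:
        return "diabetes"
    if hba1c_percent >= 5.7:
        return "prediabetes"
    return "healthy"


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall and F1, treating every class in `positive` as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t in positive and p in positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t not in positive and p in positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t in positive and p not in positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical measured vs regression-predicted HbA1c values (%).
measured = [5.2, 5.9, 7.1, 6.0]
predicted = [5.8, 5.5, 6.8, 6.2]
y_true = [hba1c_class(v) for v in measured]
y_pred = [hba1c_class(v) for v in predicted]
NON_HEALTHY = {"prediabetes", "diabetes"}  # merged "non-healthy" partition
p, r, f1 = precision_recall_f1(y_true, y_pred, NON_HEALTHY)
```

Merging classes through the `positive` set is one way to evaluate the same predictions under different partitions without retraining.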
36
Delpino F, Costa Â, Farias S, Chiavegatto Filho A, Arcêncio R, Nunes B. Machine learning for predicting chronic diseases: a systematic review. Public Health 2022; 205:14-25. [DOI: 10.1016/j.puhe.2022.01.007] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 05/27/2021] [Revised: 10/26/2021] [Accepted: 01/11/2022] [Indexed: 12/12/2022]
37
Xu X, Ge Z, Chow EPF, Yu Z, Lee D, Wu J, Ong JJ, Fairley CK, Zhang L. A Machine-Learning-Based Risk-Prediction Tool for HIV and Sexually Transmitted Infections Acquisition over the Next 12 Months. J Clin Med 2022; 11:jcm11071818. [PMID: 35407428 PMCID: PMC8999359 DOI: 10.3390/jcm11071818] [Citation(s) in RCA: 15] [Impact Index Per Article: 7.5] [Received: 03/02/2022] [Revised: 03/18/2022] [Accepted: 03/23/2022] [Indexed: 11/16/2022]
Abstract
Background: More than one million people acquire sexually transmitted infections (STIs) every day globally. Predicting an individual’s future risk of HIV/STIs could contribute to behaviour change or improve testing. We developed a series of machine learning models and a subsequent risk-prediction tool for predicting the risk of HIV/STIs over the next 12 months. Methods: Our data included individuals who were re-tested at the clinic for HIV (65,043 consultations), syphilis (56,889 consultations), gonorrhoea (60,598 consultations), and chlamydia (63,529 consultations) after initial consultations at the largest public sexual health centre in Melbourne from 2 March 2015 to 31 December 2019. We used the area under the receiver operating characteristic curve (AUC) to evaluate model performance. The HIV/STI risk-prediction tool was delivered via a web application. Results: Our risk-prediction tool had acceptable performance on the testing datasets for predicting HIV (AUC = 0.72), syphilis (AUC = 0.75), gonorrhoea (AUC = 0.73), and chlamydia (AUC = 0.67) acquisition. Conclusions: Using machine learning techniques, our risk-prediction tool has acceptable reliability in predicting HIV/STI acquisition over the next 12 months. This tool may be used on clinic websites or digital health platforms as part of an intervention to increase testing or reduce future HIV/STI risk.
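The AUC reported here can be read as the probability that a randomly chosen person who acquired the infection receives a higher risk score than a randomly chosen person who did not. The short Python sketch below computes it directly from that definition as a pairwise count, with ties scored as one half; it is an illustration of the metric, not the tool's implementation, and `roc_auc` is a hypothetical name.

```python
def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The pairwise loop is O(P x N), which is fine for illustration; rank-based formulas give the same value faster on datasets of tens of thousands of consultations.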
Affiliation(s)
- Xianglong Xu
- Melbourne Sexual Health Centre, Alfred Health, Melbourne, VIC 3053, Australia; (X.X.); (E.P.F.C.); (D.L.); (J.J.O.); (C.K.F.)
- Central Clinical School, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, VIC 3800, Australia;
- China Australia Joint Research Center for Infectious Diseases, School of Public Health, Xi’an Jiaotong University Health Science Centre, Xi’an 710061, China
- Zongyuan Ge
- Monash e-Research Centre, Faculty of Engineering, Airdoc Research, Nvidia AI Technology Research Centre, Monash University, Melbourne, VIC 3800, Australia;
- Eric P. F. Chow
- Melbourne Sexual Health Centre, Alfred Health, Melbourne, VIC 3053, Australia; (X.X.); (E.P.F.C.); (D.L.); (J.J.O.); (C.K.F.)
- Central Clinical School, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, VIC 3800, Australia;
- Centre for Epidemiology and Biostatistics, Melbourne School of Population and Global Health, The University of Melbourne, Melbourne, VIC 3053, Australia
- Zhen Yu
- Central Clinical School, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, VIC 3800, Australia;
- Monash e-Research Centre, Faculty of Engineering, Airdoc Research, Nvidia AI Technology Research Centre, Monash University, Melbourne, VIC 3800, Australia;
- David Lee
- Melbourne Sexual Health Centre, Alfred Health, Melbourne, VIC 3053, Australia; (X.X.); (E.P.F.C.); (D.L.); (J.J.O.); (C.K.F.)
- Jinrong Wu
- Research Centre for Data Analytics and Cognition, La Trobe University, Bundoora, VIC 3086, Australia;
- Jason J. Ong
- Melbourne Sexual Health Centre, Alfred Health, Melbourne, VIC 3053, Australia; (X.X.); (E.P.F.C.); (D.L.); (J.J.O.); (C.K.F.)
- Central Clinical School, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, VIC 3800, Australia;
- China Australia Joint Research Center for Infectious Diseases, School of Public Health, Xi’an Jiaotong University Health Science Centre, Xi’an 710061, China
- Christopher K. Fairley
- Melbourne Sexual Health Centre, Alfred Health, Melbourne, VIC 3053, Australia; (X.X.); (E.P.F.C.); (D.L.); (J.J.O.); (C.K.F.)
- Central Clinical School, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, VIC 3800, Australia;
- China Australia Joint Research Center for Infectious Diseases, School of Public Health, Xi’an Jiaotong University Health Science Centre, Xi’an 710061, China
- Lei Zhang
- Melbourne Sexual Health Centre, Alfred Health, Melbourne, VIC 3053, Australia; (X.X.); (E.P.F.C.); (D.L.); (J.J.O.); (C.K.F.)
- Central Clinical School, Faculty of Medicine, Nursing and Health Sciences, Monash University, Melbourne, VIC 3800, Australia;
- China Australia Joint Research Center for Infectious Diseases, School of Public Health, Xi’an Jiaotong University Health Science Centre, Xi’an 710061, China
- Department of Epidemiology and Biostatistics, College of Public Health, Zhengzhou University, Zhengzhou 450001, China
- Correspondence:
38
Tissue-Specific Methylation Biosignatures for Monitoring Diseases: An In Silico Approach. Int J Mol Sci 2022; 23:ijms23062959. [PMID: 35328380 PMCID: PMC8952417 DOI: 10.3390/ijms23062959] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 02/09/2022] [Revised: 03/01/2022] [Accepted: 03/03/2022] [Indexed: 02/06/2023]
Abstract
Tissue-specific gene methylation events are key to the pathogenesis of several diseases and can be used for diagnosis and monitoring. Here, we established an in silico pipeline to analyze high-throughput methylome datasets and identify specific methylation fingerprints in three pathological entities of major burden, i.e., breast cancer (BrCa), osteoarthritis (OA), and diabetes mellitus (DM). Differential methylation analysis was conducted to compare tissues/cells related to each pathology with different types of healthy tissues, revealing differentially methylated genes (DMGs). High-performing biosignatures with a low number of features were built with automated machine learning, including: (1) a five-gene biosignature discriminating BrCa tissue from healthy tissues (AUC 0.987 and precision 0.987), (2) three equivalent OA cartilage-specific biosignatures containing four genes each (AUC 0.978 and precision 0.986), and (3) a four-gene pancreatic β-cell-specific biosignature (AUC 0.984 and precision 0.995). Next, the BrCa biosignature was validated using an independent circulating cell-free DNA (ccfDNA) dataset, showing an AUC and precision of 1.000 and verifying the biosignature’s applicability in liquid biopsy. Functional and protein interaction prediction analysis revealed that most of the identified DMGs are involved in pathways known to be related to the studied diseases or point to new ones. Overall, our data-driven approach contributes to the maximum exploitation of high-throughput methylome readings, helping to establish specific disease profiles to be applied in clinical practice and to understand human pathology.
39
Liquid Biopsy in Type 2 Diabetes Mellitus Management: Building Specific Biosignatures via Machine Learning. J Clin Med 2022; 11:jcm11041045. [PMID: 35207316 PMCID: PMC8876363 DOI: 10.3390/jcm11041045] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/18/2022] [Revised: 02/09/2022] [Accepted: 02/15/2022] [Indexed: 02/05/2023]
Abstract
Background: There is an emerging need for minimally invasive biomarkers for the early diagnosis of type 2 diabetes (T2DM) before clinical onset and for monitoring pancreatic β-cell loss. Here, we focused on circulating cell-free DNA (ccfDNA) as a liquid biopsy biomaterial for the accurate diagnosis/monitoring of T2DM. Methods: ccfDNA levels were directly quantified in sera from 96 T2DM patients and 71 healthy individuals via fluorometry, and fragment DNA size profiling was then performed by capillary electrophoresis. Following this, ccfDNA methylation levels of five β-cell-related genes were measured via qPCR. Data were analyzed by automated machine learning to build classifying predictive models. Results: ccfDNA levels were similar between groups but indicative of apoptosis in T2DM. INS (Insulin), IAPP (Islet Amyloid Polypeptide-Amylin), GCK (Glucokinase), and KCNJ11 (Potassium Inwardly Rectifying Channel Subfamily J Member 11) methylation levels differed significantly between groups. AutoML analysis delivered biosignatures including GCK, IAPP, and KCNJ11 methylation, with the highest discriminating performance of T2DM from healthy individuals reported to date (AUC 0.927). Conclusions: Our data reveal the value of ccfDNA as a minimally invasive biomaterial carrying important clinical information for T2DM. Upon prospective clinical evaluation, the resulting biosignature could be disruptive for T2DM clinical management.
40
Odukoya O, Nwaneri S, Odeniyi I, Akodu B, Oluwole E, Olorunfemi G, Popoola O, Osuntoki A. Development and Comparison of Three Data Models for Predicting Diabetes Mellitus Using Risk Factors in a Nigerian Population. Healthc Inform Res 2022; 28:58-67. [PMID: 35172091 PMCID: PMC8850175 DOI: 10.4258/hir.2022.28.1.58] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2020] [Accepted: 08/11/2021] [Indexed: 11/23/2022]
Abstract
Objectives This study developed and compared the performance of three widely used predictive models, logistic regression (LR), artificial neural network (ANN), and decision tree (DT), to predict diabetes mellitus using the socio-demographic, lifestyle, and physical attributes of a population of Nigerians. Methods We developed three predictive models using 10 input variables. Data preprocessing steps included the removal of missing values and outliers, min-max normalization, and feature extraction using principal component analysis. The models were trained and validated using 10-fold cross-validation. Accuracy, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the receiver operating characteristic curve (AUROC) were used as performance evaluation metrics. Analysis and model development were performed in R version 3.6.1. Results The mean age of the participants was 50.52 ± 16.14 years. The classification accuracy, sensitivity, specificity, PPV, and NPV for LR were, respectively, 81.31%, 84.32%, 77.24%, 72.75%, and 82.49%. Those for ANN were 98.64%, 98.37%, 99.00%, 98.61%, and 98.83%, and those for DT were 99.05%, 99.76%, 98.08%, 98.77%, and 99.82%, respectively. The best-performing and poorest-performing classifiers were DT and LR, with 99.05% and 81.31% accuracy, respectively. Similarly, the DT algorithm achieved the best AUROC (0.992) compared with ANN (0.976) and LR (0.892). Conclusions Our study demonstrated that DT, LR, and ANN models can be used effectively for the prediction of diabetes mellitus in the Nigerian population based on certain risk factors. An overall comparative analysis of the models showed that the DT model performed better than LR and ANN.
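Min-max normalization and the PPV/NPV metrics listed above can be sketched in a few lines of Python. This is a hypothetical illustration, not the authors' R code: under 10-fold cross-validation the minimum and maximum would normally be learned on the training folds and reused on the held-out fold, which is why fitting and applying are separate functions here.

```python
def fit_min_max(values):
    """Learn the scaling range (from training data only)."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Rescale with a previously fitted range; a constant column maps to 0.

    Held-out values outside the training range fall outside [0, 1].
    """
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def ppv_npv(y_true, y_pred):
    """Positive and negative predictive values for binary labels (1 = diabetes)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fp), tn / (tn + fn)
```

For example, ages fitted on `[20, 50, 80]` scale to `[0.0, 0.5, 1.0]`, while a held-out age of 95 scales to 1.25.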
Affiliation(s)
- Oluwakemi Odukoya
- Department of Community Health and Primary Care, College of Medicine, University of Lagos, Lagos State, Nigeria
- Solomon Nwaneri
- Department of Biomedical Engineering, College of Medicine, University of Lagos, Lagos State, Nigeria
- Department of Biomedical Engineering, Faculty of Engineering, University of Lagos, Lagos State, Nigeria
- Ifedayo Odeniyi
- Endocrinology Unit, Department of Internal Medicine, College of Medicine, University of Lagos, Lagos State, Nigeria
- Babatunde Akodu
- Department of Community Health and Primary Care, College of Medicine, University of Lagos, Lagos State, Nigeria
- Esther Oluwole
- Department of Community Health and Primary Care, College of Medicine, University of Lagos, Lagos State, Nigeria
- Gbenga Olorunfemi
- Division of Epidemiology and Biostatistics, School of Public Health, University of Witwatersrand, Johannesburg, South Africa
- Oluwatoyin Popoola
- Department of Biomedical Engineering, College of Medicine, University of Lagos, Lagos State, Nigeria
- Department of Biomedical Engineering, Faculty of Engineering, University of Lagos, Lagos State, Nigeria
- Akinniyi Osuntoki
- Department of Biochemistry, College of Medicine, University of Lagos, Lagos State, Nigeria
41
Liu X, Zhang W, Zhang Q, Chen L, Zeng T, Zhang J, Min J, Tian S, Zhang H, Huang H, Wang P, Hu X, Chen L. Development and validation of a machine learning-augmented algorithm for diabetes screening in community and primary care settings: A population-based study. Front Endocrinol (Lausanne) 2022; 13:1043919. [PMID: 36518245 PMCID: PMC9742532 DOI: 10.3389/fendo.2022.1043919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Accepted: 11/11/2022] [Indexed: 11/29/2022]
Abstract
BACKGROUND Timely screening for diabetes is crucial to reduce its related morbidity, mortality, and socioeconomic burden. Machine learning (ML) is well suited to maximizing predictive accuracy. We aimed to develop ML-augmented models for diabetes screening in community and primary care settings. METHODS A total of 8425 participants were included from a population-based study conducted in Hubei, China, since 2011. The dataset was split into a development set and a testing set. Seven different ML algorithms were compared to generate predictive models. Non-laboratory features were used in the ML model for community settings, and laboratory test features were additionally introduced in the ML+lab models for primary care. The area under the receiver operating characteristic curve (AUC), the area under the precision-recall curve (auPR), and the average detection cost per participant of these models were compared with those of counterparts based on the New China Diabetes Risk Score (NCDRS), currently recommended for diabetes screening. RESULTS The AUC and auPR of the ML model were 0.697 and 0.303 in the testing set, seemingly outperforming those of NCDRS by 10.99% and 64.67%, respectively. The average detection cost of the ML model was 12.81% lower than that of NCDRS at the same sensitivity (0.72). Moreover, the average detection cost of the ML+FPG (fasting plasma glucose) model was the lowest among the ML+lab models and lower than that of both the ML model and the NCDRS+FPG model. CONCLUSION The ML model and the ML+FPG model achieved higher predictive accuracy and lower detection costs than their NCDRS-based counterparts. Thus, the ML-augmented algorithm has the potential to be employed for diabetes screening in community and primary care settings.
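The auPR compared above is often estimated as step-wise average precision: the sum of the precision at each score threshold, weighted by the gain in recall at that threshold. The abstract does not state which estimator the authors used, so the Python sketch below is one common choice rather than their method; `average_precision` is a hypothetical name, and tied scores are treated as a single threshold.

```python
def average_precision(y_true, scores):
    """Step-wise average precision: sum over thresholds of (delta recall) * precision."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("average precision needs at least one positive case")
    # Rank cases from highest to lowest predicted risk.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every case tied at this score before measuring precision and recall.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

Unlike the AUC, this quantity depends on prevalence, which is one reason screening studies with few cases report it alongside the AUC.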
Affiliation(s)
- XiaoHuan Liu
- Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hubei provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- Weiyue Zhang
- Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hubei provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- Qiao Zhang
- Department of Cardiovascular Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Long Chen
- Department of Computer Science and Technology, Tsinghua University, Beijing, China
- TianShu Zeng
- Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hubei provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- JiaoYue Zhang
- Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hubei provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- Jie Min
- Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hubei provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- ShengHua Tian
- Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hubei provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- Hao Zhang
- Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hubei provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- Ping Wang
- Precision Health Program, Department of Radiology, College of Human Medicine, Michigan State University, East Lansing, MI, United States
- Xiang Hu
- Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hubei provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- *Correspondence: LuLu Chen, ; Xiang Hu,
- LuLu Chen
- Department of Endocrinology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
- Hubei provincial Clinical Research Center for Diabetes and Metabolic Disorders, Wuhan, China
- *Correspondence: LuLu Chen, ; Xiang Hu,
42
Li J, Xu Z, Xu T, Lin S. Predicting Diabetes in Patients with Metabolic Syndrome Using Machine-Learning Model Based on Multiple Years' Data. Diabetes Metab Syndr Obes 2022; 15:2951-2961. [PMID: 36186938 PMCID: PMC9525025 DOI: 10.2147/dmso.s381146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Accepted: 09/16/2022] [Indexed: 11/23/2022]
Abstract
PURPOSE To evaluate the performance of machine-learning models based on multiple years of continuous data for predicting incident diabetes among patients with metabolic syndrome. PATIENTS AND METHODS The dataset comprised health records from 2008 to 2020 for 4510 nondiabetic participants who had metabolic syndrome (MetS) at baseline and at least 6 years of records. MetS was defined according to the International Diabetes Federation (IDF) criteria. Overall, 332 patients developed incident diabetes during 7±1.4 years of follow-up. Three popular classification algorithms were evaluated on the dataset: logistic regression, random forest, and XGBoost. Five models, comprising single-year models (year 1, year 2, and year 3) and multiple-year models (year 1-2 and year 1-3), were developed for each algorithm. RESULTS Model performance improved as more longitudinal data were included, with the area under the receiver operating characteristic curve (AUROC) increasing for both the random forest (year 1-3: AUROC=0.893; year 3: AUROC=0.862; year 1-2: AUROC=0.847; year 2: AUROC=0.838) and XGBoost (year 1-3: AUROC=0.897; year 3: AUROC=0.833; year 1-2: AUROC=0.856; year 2: AUROC=0.823) models. In the multiple-year models, the highest fasting plasma glucose value, followed by the mean or lowest HbA1c and BMI, had the greatest predictive value for the onset of diabetes. In the year 1-3 model, "delta weight", which reflects year-to-year fluctuations in weight change, was the fourth most important feature. CONCLUSION This study demonstrated improved performance with the accumulation of longitudinal data when using machine learning for diabetes prediction in MetS patients. For individuals with similar clinical parameters, the trends in these parameters over time could change the risk of future diabetes. This result indicates that models based on longitudinal multiple-year data may provide more personalized tools for risk evaluation.
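Multiple-year models summarize each participant's yearly records into features such as the highest fasting plasma glucose, the mean or lowest HbA1c, and a weight-fluctuation term. The abstract does not define "delta weight" precisely, so the Python sketch below assumes it is the population standard deviation of year-over-year weight changes; the record layout (`fpg`, `hba1c`, `bmi`, `weight` keys) and feature names are hypothetical.

```python
from statistics import mean, pstdev


def longitudinal_features(yearly):
    """Summarize chronologically ordered yearly records (dicts) for one participant."""
    fpg = [r["fpg"] for r in yearly]
    hba1c = [r["hba1c"] for r in yearly]
    bmi = [r["bmi"] for r in yearly]
    weight = [r["weight"] for r in yearly]
    # Year-over-year weight changes, e.g. [70, 72, 71] -> [2, -1].
    yearly_change = [later - earlier for earlier, later in zip(weight, weight[1:])]
    return {
        "fpg_max": max(fpg),
        "hba1c_mean": mean(hba1c),
        "hba1c_min": min(hba1c),
        "bmi_mean": mean(bmi),
        # Assumed definition of "delta weight": spread of the yearly changes.
        "delta_weight": pstdev(yearly_change) if len(yearly_change) > 1 else 0.0,
    }
```

Two participants with identical latest-year values can differ in `delta_weight`, which is the kind of trend information the abstract credits for the gain over single-year models.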
Affiliation(s)
- Jing Li
- Department of Health Management, Peking Union Medical College Hospital, Beijing, People’s Republic of China
- Zheng Xu
- Department of AI Research, Digital Health China Technologies Co. Ltd, Beijing, People’s Republic of China
- Tengda Xu
- Department of Health Management, Peking Union Medical College Hospital, Beijing, People’s Republic of China
- Songbai Lin
- Department of Health Management, Peking Union Medical College Hospital, Beijing, People’s Republic of China
- Correspondence: Songbai Lin, Department of Health Management, Peking Union Medical College Hospital, 1# Shuaifuyuan, Dongcheng District, Beijing, 100730, People’s Republic of China, Tel +86 10 6915 9901, Fax +86 10 6915 9901, Email
43
Samet S, Laouar MR, Bendib I, Eom S. Analysis and Prediction of Diabetes Disease Using Machine Learning Methods. INTERNATIONAL JOURNAL OF DECISION SUPPORT SYSTEM TECHNOLOGY 2022. [DOI: 10.4018/ijdsst.303943] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/15/2023]
Abstract
To improve healthcare quality, early illness prediction helps patients prevent potentially life-threatening health issues before it is too late. Artificial intelligence is a rapidly evolving area, and its applications to diabetes, a worldwide epidemic, have the potential to revolutionize the way diabetes is diagnosed and managed. Six supervised machine learning algorithms were applied to patient data and compared for predicting the diagnosis of diabetes mellitus. The Pima Indians Diabetes Database was used for the experiments, and its missing values were carefully handled using different techniques. With random train-test splits, the Random Forest classification algorithm achieved an accuracy of 92%. This model outperforms other state-of-the-art approaches owing to the proposed combination of techniques for imputing missing values. With this approach, the models in this study achieved better accuracy than prior work on the Pima diabetes data.
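In the Pima Indians Diabetes Database, a value of 0 in Glucose, BloodPressure, SkinThickness, Insulin, or BMI is physiologically implausible and is commonly treated as missing. The abstract does not specify which mixture of imputation techniques the authors used, so the Python sketch below assumes a hypothetical one: median imputation for the skewed Insulin and SkinThickness columns and mean imputation for the rest. In a train-test setting the fill values should be computed on the training rows only.

```python
from statistics import mean, median

# Pima columns where 0 is commonly treated as a missing value.
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]

# Hypothetical mixture of strategies (an assumption, not the authors' scheme).
STRATEGY = {
    "Glucose": mean,
    "BloodPressure": mean,
    "SkinThickness": median,
    "Insulin": median,
    "BMI": mean,
}


def impute_pima(records):
    """Return new dicts with zeros in ZERO_AS_MISSING columns replaced by fill values.

    Fill values come from the non-zero observations of each column; other
    columns (e.g. Outcome) are copied unchanged.
    """
    fill = {}
    for col in ZERO_AS_MISSING:
        observed = [r[col] for r in records if r[col] != 0]
        fill[col] = STRATEGY[col](observed)
    return [
        {k: (fill[k] if k in fill and v == 0 else v) for k, v in r.items()}
        for r in records
    ]
```

Keeping the strategy in a per-column table makes it easy to compare alternative mixtures under the same random train-test splits.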
Affiliation(s)
- Sarra Samet
- Laboratory of Mathematics, Informatics, and Systems (LAMIS), University of Larbi Tebessi, Algeria
- Mohamed Ridda Laouar
- Laboratory of Mathematics, Informatics, and Systems (LAMIS), University of Larbi Tebessi, Algeria
- Issam Bendib
- Laboratory of Mathematics, Informatics, and Systems (LAMIS), University of Larbi Tebessi, Algeria
- Sean Eom
- Department of Management, Southeast Missouri State University, USA
44
Fregoso-Aparicio L, Noguez J, Montesinos L, García-García JA. Machine learning and deep learning predictive models for type 2 diabetes: a systematic review. Diabetol Metab Syndr 2021; 13:148. [PMID: 34930452 PMCID: PMC8686642 DOI: 10.1186/s13098-021-00767-9] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 07/06/2021] [Accepted: 12/07/2021] [Indexed: 12/12/2022]
Abstract
Diabetes mellitus is a severe, chronic disease that occurs when blood glucose levels rise above certain limits. In recent years, machine learning and deep learning techniques have been used to predict diabetes and its complications. However, researchers and developers still face two main challenges when building type 2 diabetes predictive models. First, there is considerable heterogeneity in previous studies regarding the techniques used, making it challenging to identify the optimal one. Second, there is a lack of transparency about the features used in the models, which reduces their interpretability. This systematic review aimed to address these challenges. The review primarily followed the PRISMA methodology, supplemented by the methodology proposed by Keele and Durham Universities. Ninety studies were included, and the type of model, complementary techniques, dataset, and reported performance parameters were extracted. Eighteen different types of models were compared, with tree-based algorithms showing top performance. Deep neural networks proved suboptimal, despite their ability to deal with big and dirty data. Data balancing and feature selection techniques proved helpful in increasing model efficiency. Models trained on tidy datasets achieved almost perfect performance.
Affiliation(s)
- Luis Fregoso-Aparicio
- School of Engineering and Sciences, Tecnologico de Monterrey, Av Lago de Guadalupe KM 3.5, Margarita Maza de Juarez, 52926 Cd Lopez Mateos, Mexico
- Julieta Noguez
- School of Engineering and Sciences, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, 64849 Monterrey, Nuevo Leon Mexico
- Luis Montesinos
- School of Engineering and Sciences, Tecnologico de Monterrey, Ave. Eugenio Garza Sada 2501, 64849 Monterrey, Nuevo Leon Mexico
- José A. García-García
- Hospital General de Mexico Dr. Eduardo Liceaga, Dr. Balmis 148, Doctores, Cuauhtemoc, 06720 Mexico City, Mexico
45
Nomura A, Noguchi M, Kometani M, Furukawa K, Yoneda T. Artificial Intelligence in Current Diabetes Management and Prediction. Curr Diab Rep 2021; 21:61. [PMID: 34902070 PMCID: PMC8668843 DOI: 10.1007/s11892-021-01423-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 6.3] [Accepted: 07/13/2021] [Indexed: 10/28/2022]
Abstract
PURPOSE OF REVIEW Artificial intelligence (AI) can make advanced inferences based on large amounts of data. The mainstream technologies of the 2021 AI boom are machine learning (ML) and deep learning, which have made significant progress thanks to increased computational resources and dramatic improvements in computer performance. In this review, we introduce AI/ML-based medical devices and prediction models for diabetes. RECENT FINDINGS In the field of diabetes, several AI/ML-based medical devices for automatic retinal screening, clinical diagnosis support, and patient self-management have already been approved by the US Food and Drug Administration. For predicting new-onset diabetes, ML methods have so far not outperformed conventional risk stratification models based on statistical approaches. Nevertheless, the predictive performance of AI is expected to be maximized soon with large amounts of organized data and abundant computational resources, which should contribute to a dramatic improvement in the accuracy of disease prediction models for diabetes.
Affiliation(s)
- Akihiro Nomura
- Department of Biomedical Informatics, CureApp Institute, Karuizawa, Japan
- Innovative Clinical Research Center, Kanazawa University, 13-1 Takaramachi, Kanazawa, 9208641, Japan
- Department of Cardiovascular Medicine, Kanazawa University Graduate School of Medical Sciences, Kanazawa, Japan
- Department of Health Promotion and Medicine of the Future, Kanazawa University Graduate School of Medical Sciences, Kanazawa, Japan
- Masahiro Noguchi
- Department of Cardiovascular Medicine, Kanazawa University Graduate School of Medical Sciences, Kanazawa, Japan
- Mitsuhiro Kometani
- Department of Health Promotion and Medicine of the Future, Kanazawa University Graduate School of Medical Sciences, Kanazawa, Japan
- Kenji Furukawa
- Department of Health Promotion and Medicine of the Future, Kanazawa University Graduate School of Medical Sciences, Kanazawa, Japan
- Health Care Center, Japan Advanced Institute of Science and Technology, Nomi, Japan
- Takashi Yoneda
- Department of Health Promotion and Medicine of the Future, Kanazawa University Graduate School of Medical Sciences, Kanazawa, Japan

46
Hatmal MM, Alshaer W, Mahmoud IS, Al-Hatamleh MAI, Al-Ameer HJ, Abuyaman O, Zihlif M, Mohamud R, Darras M, Al Shhab M, Abu-Raideh R, Ismail H, Al-Hamadi A, Abdelhay A. Investigating the association of CD36 gene polymorphisms (rs1761667 and rs1527483) with T2DM and dyslipidemia: Statistical analysis, machine learning based prediction, and meta-analysis. PLoS One 2021; 16:e0257857. [PMID: 34648514 PMCID: PMC8516279 DOI: 10.1371/journal.pone.0257857] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/14/2021] [Accepted: 09/11/2021] [Indexed: 12/15/2022]
Abstract
CD36 (cluster of differentiation 36) is a membrane protein involved in lipid metabolism that has been linked to pathological conditions associated with metabolic disorders, such as diabetes and dyslipidemia. A case-control study including 177 patients with type 2 diabetes mellitus (T2DM) and 173 control subjects was conducted to study the involvement of the CD36 gene polymorphisms rs1761667 (G>A) and rs1527483 (C>T) in the pathogenesis of T2DM and dyslipidemia in the Jordanian population. Lipid profile, blood sugar, gender, and age were measured and recorded, and genotyping analysis was performed for both polymorphisms. Following statistical analysis, 10 different neural network and machine learning (ML) tools were used to predict subjects with diabetes or dyslipidemia. To further understand the role of the CD36 protein and gene in T2DM and dyslipidemia, a protein-protein interaction network analysis and a meta-analysis were carried out. For both polymorphisms, the genotypic frequencies did not differ significantly between the two groups (p > 0.05). On the other hand, some ML tools, such as the multilayer perceptron, achieved high prediction accuracy (≥ 0.75) and Cohen's kappa (κ ≥ 0.5). Interestingly, with the K-star tool, including the genotyping results as inputs improved accuracy and Cohen's κ (0.73 and 0.46, respectively, compared with 0.67 and 0.34 without them). This study showed, for the first time, that there is no association between CD36 polymorphisms and T2DM or dyslipidemia in the Jordanian population. Predicting T2DM and dyslipidemia with these ML tools from such input data is a promising approach for developing diagnostic and prognostic prediction models for a wide spectrum of diseases, especially when large medical databases are available.
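The abstract reports Cohen's κ next to accuracy because κ discounts the agreement expected by chance: κ = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement between true and predicted labels and p_e is the agreement expected from the label frequencies alone. A minimal Python sketch of that standard formula follows; the labels in the example are invented for illustration and do not come from the study.

```python
from collections import Counter

def cohens_kappa(y_true, y_pred):
    """Cohen's kappa for two equal-length label sequences: (p_o - p_e) / (1 - p_e)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: share of items on which the two sequences agree
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of each label's marginal frequency in both sequences
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Both sequences use one identical label throughout; agreement is trivially perfect
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Toy example (invented labels): 7 of 10 predictions are correct, i.e. accuracy 0.7
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
print(round(cohens_kappa(y_true, y_pred), 2))  # 0.4
```

Here an accuracy of 0.7 shrinks to κ = 0.4 because half of the agreement (p_e = 0.5) would be expected by chance, which is why the study reports both numbers.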
Affiliation(s)
- Ma’mon M. Hatmal
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- Walhan Alshaer
- Cell Therapy Centre, The University of Jordan, Amman, Jordan
- Ismail S. Mahmoud
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- Mohammad A. I. Al-Hatamleh
- Department of Immunology, School of Medical Sciences, Universiti Sains Malaysia, Kubang Kerian, Kelantan, Malaysia
- Hamzeh J. Al-Ameer
- Department of Biology and Biotechnology, American University of Madaba, Madaba, Jordan
- Department of Pharmacology, Faculty of Medicine, The University of Jordan, Amman, Jordan
- Omar Abuyaman
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- Malek Zihlif
- Department of Pharmacology, Faculty of Medicine, The University of Jordan, Amman, Jordan
- Rohimah Mohamud
- Department of Immunology, School of Medical Sciences, Universiti Sains Malaysia, Kubang Kerian, Kelantan, Malaysia
- Mais Darras
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- Mohammad Al Shhab
- Department of Pharmacology, Faculty of Medicine, The University of Jordan, Amman, Jordan
- Rand Abu-Raideh
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- Hilweh Ismail
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- Ali Al-Hamadi
- Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, The Hashemite University, Zarqa, Jordan
- Ali Abdelhay
- Department of Pharmacology, Faculty of Medicine, The University of Jordan, Amman, Jordan

47
Nguyen P, Ohnmacht AJ, Galhoz A, Büttner M, Theis F, Menden MP. Künstliche Intelligenz und maschinelles Lernen in der Diabetesforschung [Artificial intelligence and machine learning in diabetes research]. Diabetologe 2021. [DOI: 10.1007/s11428-021-00817-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]

48
Anderson P, Gadgil R, Johnson WA, Schwab E, Davidson JM. Reducing variability of breast cancer subtype predictors by grounding deep learning models in prior knowledge. Comput Biol Med 2021; 138:104850. [PMID: 34536702 DOI: 10.1016/j.compbiomed.2021.104850] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 08/06/2021] [Revised: 08/31/2021] [Accepted: 09/05/2021] [Indexed: 12/23/2022]
Abstract
Deep learning neural networks have improved performance in many cancer informatics problems, including breast cancer subtype classification. However, many networks suffer from underspecification, where multiple combinations of parameters achieve similar performance in both training and validation. Additionally, certain parameter combinations may perform poorly when the test distribution differs from the training distribution. Embedding prior knowledge from the literature may address this issue by boosting predictive models that provide crucial, in-depth information about a given disease. Breast cancer research provides a wealth of such knowledge, particularly in the form of subtype biomarkers and genetic signatures. In this study, we draw on past research on breast cancer subtype biomarkers, label propagation, and neural graph machines to present a novel methodology for embedding knowledge into machine learning systems. We embed prior knowledge into the loss function in the form of inter-subject distances derived from a well-known published breast cancer signature. Our results show that this methodology reduces predictor variability on state-of-the-art deep learning architectures and increases predictor consistency, leading to improved interpretation. We also find that pathway enrichment analysis is more consistent after embedding knowledge. This method applies to a broad range of existing studies and predictive models. It moves the traditional synthesis of predictive models away from an arbitrary assignment of weights to genes and toward a more biologically meaningful approach of incorporating knowledge.
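The abstract states that prior knowledge enters the loss function as inter-subject distances, building on neural graph machines, but gives no formula. The sketch below shows the general neural-graph-machine idea under stated assumptions: a mean cross-entropy term plus a penalty that pulls together the hidden representations of subjects the prior deems similar. The trade-off `alpha`, the conversion of distances into similarity weights via exp(-d), and all function names are hypothetical, not the authors' published formulation.

```python
import math

def similarity_from_distance(d, scale=1.0):
    """Assumed conversion of a prior-knowledge distance into a similarity weight in (0, 1]."""
    return math.exp(-d / scale)

def graph_regularized_loss(probs, labels, embeddings, weights, alpha=0.1):
    """Mean cross-entropy plus alpha * sum over pairs (i, j) of w_ij * ||h_i - h_j||^2.

    probs:      per-subject predicted class probabilities
    labels:     per-subject integer class labels
    embeddings: per-subject hidden representations (lists of floats)
    weights:    dict mapping (i, j) subject pairs to a prior-knowledge similarity weight
    alpha:      hypothetical trade-off between fitting the labels and respecting the prior
    """
    supervised = sum(-math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)
    penalty = 0.0
    for (i, j), w in weights.items():
        # Squared Euclidean distance between the two subjects' hidden representations
        sq_dist = sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
        penalty += w * sq_dist
    return supervised + alpha * penalty

# Toy data: two subjects, binary labels, 2-D hidden representations
probs = [[0.8, 0.2], [0.3, 0.7]]
labels = [0, 1]
embeddings = [[0.0, 0.0], [1.0, 1.0]]
# Hypothetical prior: the published signature places subjects 0 and 1 at distance 0.7
weights = {(0, 1): similarity_from_distance(0.7)}
print(graph_regularized_loss(probs, labels, embeddings, weights))
```

The penalty grows when subjects that the signature considers close end up far apart in the network's hidden space, which is one way a prior can narrow the set of equally good parameter combinations.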
Affiliation(s)
- Paul Anderson
- Department of Computer Science and Software Engineering, California Polytechnic State University, San Luis Obispo, CA, USA
- Richa Gadgil
- Department of Computer Science and Software Engineering, California Polytechnic State University, San Luis Obispo, CA, USA
- William A Johnson
- Department of Biology, California Polytechnic State University, San Luis Obispo, CA, USA
- Ella Schwab
- Department of Biology, California Polytechnic State University, San Luis Obispo, CA, USA
- Jean M Davidson
- Department of Biology, California Polytechnic State University, San Luis Obispo, CA, USA

49
Lee E, Jung SY, Hwang HJ, Jung J. Patient-Level Cancer Prediction Models From a Nationwide Patient Cohort: Model Development and Validation. JMIR Med Inform 2021; 9:e29807. [PMID: 34459743 PMCID: PMC8438609 DOI: 10.2196/29807] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/21/2021] [Revised: 07/07/2021] [Accepted: 07/26/2021] [Indexed: 01/14/2023]
Abstract
Background Nationwide population-based cohorts provide a new opportunity to build automated risk prediction models at the patient level, and claims data are one of the more useful resources for this purpose. To avoid unnecessary diagnostic interventions after cancer screening tests, patient-level prediction models should be developed. Objective We aimed to develop cancer prediction models from nationwide claims databases using machine learning algorithms that are explainable and easily applicable in real-world environments. Methods As source data, we used the Korean National Insurance System Database. Every Korean aged 40 years or older undergoes a national health checkup every 2 years. We gathered all variables from the database, including demographic information, basic laboratory values, anthropometric values, and previous medical history. We applied conventional logistic regression, light gradient boosting, neural networks, survival analysis, and one-class embedding classifier methods based on deep learning–based anomaly detection to analyze the high-dimensional data effectively. Performance was measured with the area under the curve (AUC) and the area under the precision-recall curve (AUPRC). We validated our models externally with a health checkup database from a tertiary hospital. Results The one-class embedding classifier model achieved the highest AUC scores, with values of 0.868, 0.849, 0.798, 0.746, 0.800, 0.749, and 0.790 for liver, lung, colorectal, pancreatic, gastric, breast, and cervical cancers, respectively. For AUPRC, the light gradient boosting models had the highest scores, with values of 0.383, 0.401, 0.387, 0.300, 0.385, 0.357, and 0.296 for liver, lung, colorectal, pancreatic, gastric, breast, and cervical cancers, respectively. Conclusions Our results show that applicable cancer prediction models can be developed easily from nationwide claims data using machine learning. The 7 models showed acceptable performance and explainability and thus can be distributed easily in real-world environments.
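The gap between the AUC values (roughly 0.75 to 0.87) and the AUPRC values (roughly 0.30 to 0.40) reported above is what one would expect when the positive class is rare: AUC depends only on how positives rank against negatives, whereas precision drops as false positives accumulate, and the AUPRC of a random model equals the outcome prevalence. The sketch below computes AUC from its pairwise-ranking definition and average precision, one common estimator of AUPRC; the abstract does not state which estimator the authors used, and the toy labels and scores are invented.

```python
def roc_auc(labels, scores):
    """AUC: probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Average precision: mean precision at the rank of each positive, sorted by descending score.

    Tied scores are ordered arbitrarily (stable sort), so results may vary slightly with ties.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / n_pos

# Toy data: 2 positives among 10 subjects (prevalence 0.2)
labels = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.95, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
print(roc_auc(labels, scores), average_precision(labels, scores))  # 0.8125 0.5
```

The same ranking yields a fairly high AUC but a much lower average precision, because one high-scoring negative ahead of each positive halves the precision at those ranks.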
Affiliation(s)
- Eunsaem Lee
- Department of Mathematics, Pohang University of Science and Technology, Pohang-si, Republic of Korea
- Se Young Jung
- Office of eHealth Research and Businesses, Seoul National University Bundang Hospital, Seongnam-si, Republic of Korea
- Hyung Ju Hwang
- Department of Mathematics, Pohang University of Science and Technology, Pohang-si, Republic of Korea
- Jaewoo Jung
- AMSquare Corporation, Pohang-si, Republic of Korea

50
Development and validation of a new diabetes index for the risk classification of present and new-onset diabetes: multicohort study. Sci Rep 2021; 11:15748. [PMID: 34344964 PMCID: PMC8333254 DOI: 10.1038/s41598-021-95341-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/28/2021] [Accepted: 07/26/2021] [Indexed: 02/07/2023]
Abstract
In this study, we aimed to propose a novel, highly accurate diabetes index for the risk classification of diabetes mellitus based on machine learning techniques. Using participants' demographic and biochemical data, we designated the 2013-16 Korea National Health and Nutrition Examination Survey (KNHANES), the 2017-18 KNHANES, and the Korean Genome and Epidemiology Study (KoGES) as the derivation, internal validation, and external validation sets, respectively. We constructed a new diabetes index using logistic regression (LR) and calculated the probability of diabetes in the validation sets. We used the area under the receiver operating characteristic curve (AUROC) and Cox regression analysis to measure performance in the internal and external validation sets, respectively. We constructed a gender-specific diabetes prediction model, with a resulting AUROC of 0.93 for men and 0.94 for women. Based on this probability, we classified participants into five groups and analyzed cumulative incidence in the KoGES dataset. Group 5 had significantly worse outcomes than the other groups. Our novel model for predicting diabetes, based on two large-scale population-based cohort studies, showed high sensitivity and selectivity. Therefore, our diabetes index can be used to identify individuals at high risk of diabetes.
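The abstract describes turning a logistic regression probability into five risk groups but does not give the coefficients or the group boundaries. A minimal sketch, assuming the boundaries are quintile cut points of the derivation-sample probabilities; the coefficients, feature values, and cut-point rule below are placeholders, not the published index.

```python
import bisect
import math
import statistics

def diabetes_probability(features, coefs, intercept):
    """Logistic regression probability 1 / (1 + exp(-z)), with z = intercept + sum(coef * x)."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    # Numerically stable logistic: avoid overflow in exp() for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def quintile_cutoffs(derivation_probs):
    """Four cut points splitting derivation-sample probabilities into five groups (assumed rule)."""
    return statistics.quantiles(derivation_probs, n=5)

def risk_group(prob, cutoffs):
    """Map a probability to group 1 (lowest risk) through len(cutoffs) + 1 (highest risk)."""
    return bisect.bisect_right(cutoffs, prob) + 1

# Hypothetical coefficients for two standardized features (placeholders, not the study's values)
coefs, intercept = [1.2, 0.5], -2.0
derivation = [[-1.0, 0.2], [0.0, -0.5], [0.5, 0.5], [1.0, 1.0], [2.0, 0.0], [-0.5, -1.0]]
probs = [diabetes_probability(x, coefs, intercept) for x in derivation]
cuts = quintile_cutoffs(probs)
print(risk_group(diabetes_probability([1.5, 0.8], coefs, intercept), cuts))
```

Fixing the cut points on the derivation set and reusing them unchanged on the validation cohorts keeps the group definitions comparable across datasets, which is what a cumulative-incidence comparison by group requires.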