1
Ye X, Wang X, Wang Y, Lin H. Predicting cognitive function among Chinese community-dwelling older adults: A supervised machine learning approach. Prev Med 2025; 196:108307. [PMID: 40349986 DOI: 10.1016/j.ypmed.2025.108307] [Received: 02/03/2025] [Revised: 05/08/2025] [Accepted: 05/08/2025] [Indexed: 05/14/2025]
Abstract
OBJECTIVE Identifying cognitive impairment early could support timely intervention and facilitate successful cognitive aging. We aimed to build more precise prediction models for cognitive function using fewer input variables among Chinese community-dwelling older adults. METHODS We used data from a prospective cohort of 13,906 older adults aged 60 years and above from the nationally representative China Health and Retirement Longitudinal Study (CHARLS) 2011-2020. Gradient boosting classifier (GBC) and gradient boosting regressor (GBR) models were used to predict an individual's current cognitive function. For future cognition prediction, we trained GBR models to analyze the prediction error over the years. RESULTS Among 68 features, ten were finally selected to develop the model: educational attainment, childhood friendship, age, instrumental activities of daily living (IADLs), hukou type, mobility, sleep duration, gender, residence, and social participation. Our model exhibited robust performance in predicting current and future cognitive function. When an individual's current cognitive function was assessed as a dichotomous classification of cognitive impairment presence, the GBC model achieved an area under the receiver operating characteristic (ROC) curve of 0.832. When the outcome was predicted as a continuous variable, the model achieved a root mean square error (RMSE) of 3.356 in the test set. For predicting future cognition, models taking into account the current cognitive state demonstrated superior performance. CONCLUSIONS Our study offers a practical tool to aid in the early identification of cognitive impairment, thus supporting timely interventions in the community environment and potentially contributing to successful cognitive aging.
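The two headline metrics in this abstract, area under the ROC curve for the classifier and RMSE for the regressor, can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy data below are hypothetical, and the AUC is computed by the standard pairwise-ranking definition rather than with any particular library.

```python
import math


def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted scores."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy data, not taken from CHARLS.
toy_labels = [0, 0, 1, 1]            # 1 = cognitive impairment present
toy_risk = [0.1, 0.4, 0.35, 0.8]     # predicted probabilities
toy_auc = roc_auc(toy_labels, toy_risk)

toy_observed = [10, 12, 15]          # continuous cognition scores
toy_predicted = [11, 12, 13]
toy_rmse = rmse(toy_observed, toy_predicted)
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, while RMSE is expressed in the units of the cognition score itself.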
Affiliation(s)
- Xin Ye
  - Institute for Global Public Policy, Fudan University, Shanghai 200433, China; LSE-Fudan Research Centre for Global Public Policy, Fudan University, Shanghai 200433, China.
- Xinfeng Wang
  - Institute for Global Public Policy, Fudan University, Shanghai 200433, China
- Yu Wang
  - Fudan Institute for Advanced Study in Social Sciences, Fudan University, Shanghai 200433, China
- Hugo Lin
  - CentraleSupélec, Paris-Saclay University, Paris 91192, France
2
Gao H, Schneider S, Hernandez R, Harris J, Maupin D, Junghaenel DU, Kapteyn A, Stone A, Zelinski E, Meijer E, Lee PJ, Orriens B, Jin H. Early Identification of Cognitive Impairment in Community Environments Through Modeling Subtle Inconsistencies in Questionnaire Responses: Machine Learning Model Development and Validation. JMIR Form Res 2024; 8:e54335. [PMID: 39536306 PMCID: PMC11602764 DOI: 10.2196/54335] [Received: 11/06/2023] [Revised: 06/18/2024] [Accepted: 09/23/2024] [Indexed: 11/16/2024]
Abstract
BACKGROUND The underdiagnosis of cognitive impairment hinders timely intervention for dementia. Health professionals working in the community play a critical role in the early detection of cognitive impairment, yet still face several challenges such as a lack of suitable tools, a lack of necessary training, and potential stigmatization. OBJECTIVE This study explored a novel application integrating psychometric methods with data science techniques to model subtle inconsistencies in questionnaire response data for early identification of cognitive impairment in community environments. METHODS This study analyzed questionnaire response data from participants aged 50 years and older in the Health and Retirement Study (waves 8-9, n=12,942). Predictors included low-quality response indices generated using the graded response model from four brief questionnaires (optimism, hopelessness, purpose in life, and life satisfaction) assessing aspects of overall well-being, a focus of health professionals in communities. The primary and supplemental predicted outcomes were current cognitive impairment derived from a validated criterion and dementia or mortality in the next ten years. Seven predictive models were trained, and their performance was evaluated and compared. RESULTS The multilayer perceptron exhibited the best performance in predicting current cognitive impairment. Across the four selected questionnaires, the area under the curve values for identifying current cognitive impairment ranged from 0.63 to 0.66 and improved to 0.71 to 0.74 when the low-quality response indices were combined with age and gender for prediction. We set the threshold for assessing cognitive impairment risk in the tool based on the ratio of underdiagnosis costs to overdiagnosis costs, and a ratio of 4 was used as the default choice. Furthermore, the tool was more efficient than age- or health-based screening strategies at identifying individuals at high risk for cognitive impairment, particularly in the 50- to 59-year and 60- to 69-year age groups. The tool is available on a portal website for the public to access freely. CONCLUSIONS We developed a novel prediction tool that integrates psychometric methods with data science to facilitate "passive or backend" cognitive impairment assessments in community settings, aiming to promote early cognitive impairment detection. This tool simplifies the cognitive impairment assessment process, making it more adaptable and reducing burdens. Our approach also presents a new perspective for using questionnaire data: leveraging, rather than dismissing, low-quality data.
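The abstract says the risk threshold was derived from the ratio of underdiagnosis to overdiagnosis costs, with 4 as the default, but it does not give the formula. One common way to turn such a ratio into a probability cutoff is expected-cost minimization: flag a person when p × C_under > (1 − p) × C_over, which rearranges to p > 1 / (1 + ratio). The sketch below assumes that rule; the function names are hypothetical and the tool itself may use a different mapping.

```python
def cost_ratio_threshold(under_to_over_ratio):
    """Probability cutoff that minimizes expected misclassification cost.

    Assumes a missed case costs `under_to_over_ratio` times as much as
    a false alarm (an assumption, not a formula stated in the abstract).
    """
    if under_to_over_ratio <= 0:
        raise ValueError("cost ratio must be positive")
    return 1.0 / (1.0 + under_to_over_ratio)


def flag_high_risk(predicted_probability, under_to_over_ratio=4.0):
    """Return True when the predicted risk exceeds the cost-based cutoff."""
    return predicted_probability > cost_ratio_threshold(under_to_over_ratio)


default_cutoff = cost_ratio_threshold(4.0)  # ratio of 4 gives a cutoff of 0.2
```

Under this rule a higher cost ratio lowers the cutoff, so more people are flagged for follow-up as missed cases become relatively more costly.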
Affiliation(s)
- Hongxin Gao
  - School of Health Sciences, University of Surrey, Guildford, United Kingdom
- Stefan Schneider
  - Center for Self-Report Science, University of Southern California, Los Angeles, CA, United States
  - Department of Psychology, University of Southern California, Los Angeles, CA, United States
  - Leonard Davis School of Gerontology, University of Southern California, Los Angeles, CA, United States
  - Center for Economic and Social Research, University of Southern California, Los Angeles, CA, United States
- Raymond Hernandez
  - Center for Self-Report Science, University of Southern California, Los Angeles, CA, United States
  - Center for Economic and Social Research, University of Southern California, Los Angeles, CA, United States
- Jenny Harris
  - School of Health Sciences, University of Surrey, Guildford, United Kingdom
- Danny Maupin
  - School of Health Sciences, University of Surrey, Guildford, United Kingdom
- Doerte U Junghaenel
  - Center for Self-Report Science, University of Southern California, Los Angeles, CA, United States
  - Department of Psychology, University of Southern California, Los Angeles, CA, United States
  - Leonard Davis School of Gerontology, University of Southern California, Los Angeles, CA, United States
  - Center for Economic and Social Research, University of Southern California, Los Angeles, CA, United States
- Arie Kapteyn
  - Center for Economic and Social Research, University of Southern California, Los Angeles, CA, United States
- Arthur Stone
  - Center for Self-Report Science, University of Southern California, Los Angeles, CA, United States
  - Department of Psychology, University of Southern California, Los Angeles, CA, United States
  - Center for Economic and Social Research, University of Southern California, Los Angeles, CA, United States
- Elizabeth Zelinski
  - Leonard Davis School of Gerontology, University of Southern California, Los Angeles, CA, United States
- Erik Meijer
  - Center for Economic and Social Research, University of Southern California, Los Angeles, CA, United States
- Pey-Jiuan Lee
  - Center for Self-Report Science, University of Southern California, Los Angeles, CA, United States
  - Center for Economic and Social Research, University of Southern California, Los Angeles, CA, United States
- Bart Orriens
  - Center for Economic and Social Research, University of Southern California, Los Angeles, CA, United States
- Haomiao Jin
  - School of Health Sciences, University of Surrey, Guildford, United Kingdom
3
Nabavi A, Safari F, Kashkooli M, Sadat Nabavizadeh S, Molavi Vardanjani H. Early prediction of cognitive impairment in adults aged 20 years and older using machine learning and biomarkers of heavy metal exposure. Curr Res Toxicol 2024; 7:100198. [PMID: 39497907 PMCID: PMC11533558 DOI: 10.1016/j.crtox.2024.100198] [Received: 07/21/2024] [Revised: 09/13/2024] [Accepted: 10/08/2024] [Indexed: 11/07/2024]
Abstract
Background Cognitive impairment poses a growing health challenge as populations age. Heavy metals are implicated as environmental risk factors, but their role is not fully understood. Machine learning can integrate multi-factorial data to predict cognitive outcomes. Objective To develop and validate machine learning models for early prediction of cognitive impairment risk using demographics, clinical factors, and biomarkers of heavy metal exposure. Method A retrospective analysis was conducted using 2011-2014 NHANES data. Participants aged ≥ 20 underwent cognitive testing. Variables included demographics, medical history, lifestyle factors, and blood and urine levels of lead, cadmium, manganese, and other metals. Machine learning algorithms were trained on 90% of the data and evaluated on the remaining 10%. Performance was assessed using metrics such as accuracy, AUC, and sensitivity. Result A final sample of 2,933 participants was analyzed. The stacking ensemble model achieved the best performance, with an AUC of 0.778 and a sensitivity of 0.879 on the test data. Important predictors included age, gender, hypertension, education, and urinary cadmium and blood manganese levels. Conclusion Machine learning can effectively predict cognitive impairment risk using comprehensive clinical and exposure data. Incorporating heavy metal biomarkers enhanced prediction and provided insights into environmental contributions to cognitive decline. Prospective studies are needed to validate models over time.
Affiliation(s)
- Ali Nabavi
  - Student Research Committee, Shiraz University of Medical Sciences, Shiraz, Iran
- Farimah Safari
  - Student Research Committee, Shiraz University of Medical Sciences, Shiraz, Iran
- Mohammad Kashkooli
  - Student Research Committee, Shiraz University of Medical Sciences, Shiraz, Iran
- Sara Sadat Nabavizadeh
  - Department of Otolaryngology, Otolaryngology Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Hossein Molavi Vardanjani
  - Research Center for Traditional Medicine and History of Medicine, Shiraz University of Medical Sciences, Shiraz, Iran
4
Cui X, Zheng X, Lu Y. Prediction Model for Cognitive Impairment among Disabled Older Adults: A Development and Validation Study. Healthcare (Basel) 2024; 12:1028. [PMID: 38786438 PMCID: PMC11121056 DOI: 10.3390/healthcare12101028] [Received: 03/22/2024] [Revised: 05/02/2024] [Accepted: 05/13/2024] [Indexed: 05/25/2024]
Abstract
Disabled older adults exhibit a higher risk of cognitive impairment. Early identification is crucial in alleviating the disease burden. This study aims to develop and validate a prediction model for identifying cognitive impairment among disabled older adults. A total of 2138, 501, and 746 participants were included in the development set and two external validation sets, respectively. Logistic regression, support vector machine, random forest, and XGBoost were used to develop the prediction model. A nomogram was further established to present the prediction model directly and intuitively. Logistic regression exhibited better predictive performance on the test set, with an area under the curve of 0.875. It maintained a high level of precision (0.808), specificity (0.788), sensitivity (0.770), and F1-score (0.788) compared with the machine learning models. We further simplified the logistic regression into a nomogram comprising five variables: age, activities of daily living, instrumental activities of daily living, hearing impairment, and visual impairment. The areas under the curve of the nomogram were 0.871, 0.825, and 0.863 in the internal and two external validation sets, respectively. This nomogram effectively identifies the risk of cognitive impairment in disabled older adults.
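The reported F1-score can be cross-checked against the reported precision and sensitivity, since F1 is the harmonic mean of the two. The short sketch below does only that arithmetic; the function name is hypothetical, and the small gap to the published 0.788 is presumably due to rounding of the inputs.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported for logistic regression in the abstract.
reported_precision = 0.808
reported_sensitivity = 0.770
recomputed_f1 = f1_from_precision_recall(reported_precision, reported_sensitivity)
print(round(recomputed_f1, 4))  # prints 0.7885
```

The recomputed value agrees with the published F1-score to within rounding, which suggests the three figures are internally consistent.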
Affiliation(s)
- Yun Lu
  - School of International Pharmaceutical Business, China Pharmaceutical University, 639 Longmian Avenue, Jiangning District, Nanjing 211198, China; (X.C.); (X.Z.)
5
Zhang Y, Xu J, Zhang C, Zhang X, Yuan X, Ni W, Zhang H, Zheng Y, Zhao Z. Community screening for dementia among older adults in China: a machine learning-based strategy. BMC Public Health 2024; 24:1206. [PMID: 38693495 PMCID: PMC11062005 DOI: 10.1186/s12889-024-18692-7] [Received: 11/05/2023] [Accepted: 04/23/2024] [Indexed: 05/03/2024]
Abstract
BACKGROUND Dementia is a leading cause of disability in people older than 65 years worldwide. However, diagnosing dementia in its earliest symptomatic stages remains challenging. This study combined specific questions from the AD8 scale with comprehensive health-related characteristics, and used machine learning (ML) to construct diagnostic models of cognitive impairment (CI). METHODS The study was based on the Shenzhen Healthy Ageing Research (SHARE) project, and we recruited 823 participants aged 65 years and older, who completed a comprehensive health assessment and cognitive function assessments. Permutation importance was used to select features. Five ML models using BalanceCascade were applied to predict CI: a support vector machine (SVM), multilayer perceptron (MLP), AdaBoost, gradient boosting decision tree (GBDT), and logistic regression (LR). An AD8 score ≥ 2 was used to define CI as a baseline. SHapley Additive exPlanations (SHAP) values were used to interpret the results of ML models. RESULTS The first and sixth items of AD8, platelets, waist circumference, body mass index, carcinoembryonic antigens, age, serum uric acid, white blood cells, abnormal electrocardiogram, heart rate, and sex were selected as predictive features. Compared to the baseline (AUC = 0.65), the MLP showed the highest performance (AUC: 0.83 ± 0.04), followed by AdaBoost (AUC: 0.80 ± 0.04), SVM (AUC: 0.78 ± 0.04), and GBDT (AUC: 0.76 ± 0.04). Furthermore, the accuracy, sensitivity, and specificity of four ML models were higher than the baseline. SHAP summary plots based on the MLP showed that the most influential feature for a positive CI prediction was female sex, followed by older age and lower waist circumference. CONCLUSIONS The diagnostic models of CI applying ML, especially the MLP, were substantially more effective than the traditional AD8 scale with a score of ≥ 2 points. Our findings may provide new ideas for community dementia screening and help promote such screening while minimizing the use of medical and health resources.
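The baseline used for comparison in this abstract is a simple rule: an AD8 score of 2 or more is treated as cognitive impairment. A minimal sketch of such a rule, together with the sensitivity and specificity it yields, might look as follows. The function names and the toy scores are hypothetical and are not drawn from the SHARE data.

```python
def ad8_screen_positive(ad8_score, cutoff=2):
    """Baseline rule from the abstract: AD8 score >= 2 counts as CI."""
    return ad8_score >= cutoff


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both impaired and unimpaired cases")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy cohort: AD8 scores and reference CI status (1 = impaired).
toy_ad8_scores = [0, 1, 2, 3, 5, 1]
toy_ci_status = [0, 0, 0, 1, 1, 1]
toy_flags = [ad8_screen_positive(s) for s in toy_ad8_scores]
toy_sens, toy_spec = sensitivity_specificity(toy_ci_status, toy_flags)
```

A fixed cutoff like this gives a single operating point, whereas the ML models in the study output scores that can be thresholded at different points along the ROC curve.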
Affiliation(s)
- Yan Zhang
  - Department of Elderly Health Management, Shenzhen Center for Chronic Disease Control, No.2021, Buxin Road, Shenzhen, Guangdong, 518020, China
- Jian Xu
  - Department of Elderly Health Management, Shenzhen Center for Chronic Disease Control, No.2021, Buxin Road, Shenzhen, Guangdong, 518020, China
- Chi Zhang
  - Shenzhen Yiwei Technology Company, Shenzhen, Guangdong, 518000, China
- Xu Zhang
  - National Engineering Laboratory of Big Data System Computing Technology, Shenzhen University, Shenzhen, Guangdong, 518060, China
- Xueli Yuan
  - Department of Elderly Health Management, Shenzhen Center for Chronic Disease Control, No.2021, Buxin Road, Shenzhen, Guangdong, 518020, China
- Wenqing Ni
  - Department of Elderly Health Management, Shenzhen Center for Chronic Disease Control, No.2021, Buxin Road, Shenzhen, Guangdong, 518020, China
- Hongmin Zhang
  - Department of Elderly Health Management, Shenzhen Center for Chronic Disease Control, No.2021, Buxin Road, Shenzhen, Guangdong, 518020, China
- Yijin Zheng
  - Department of Elderly Health Management, Shenzhen Center for Chronic Disease Control, No.2021, Buxin Road, Shenzhen, Guangdong, 518020, China
- Zhiguang Zhao
  - Department of Elderly Health Management, Shenzhen Center for Chronic Disease Control, No.2021, Buxin Road, Shenzhen, Guangdong, 518020, China.
6
Lei C, Wu G, Cui Y, Xia H, Chen J, Zhan X, Lv Y, Li M, Zhang R, Zhu X. Development and validation of a cognitive dysfunction risk prediction model for the abdominal obesity population. Front Endocrinol (Lausanne) 2024; 15:1290286. [PMID: 38481441 PMCID: PMC10932956 DOI: 10.3389/fendo.2024.1290286] [Received: 09/07/2023] [Accepted: 01/22/2024] [Indexed: 03/26/2024]
Abstract
Objectives This study aimed to develop a nomogram that can accurately predict the likelihood of cognitive dysfunction in individuals with abdominal obesity by utilizing various predictor factors. Methods A total of 1490 cases of abdominal obesity were randomly selected from the National Health and Nutrition Examination Survey (NHANES) database for the years 2011-2014. The diagnostic criteria for abdominal obesity were waist size ≥ 102 cm for men and waist size ≥ 88 cm for women. Cognitive function was assessed using the Consortium to Establish a Registry for Alzheimer's Disease (CERAD) Word Learning subtest, the Delayed Word Recall Test, the Animal Fluency Test (AFT), and the Digit Symbol Substitution Test (DSST). The cases were divided into two sets: a training set consisting of 1043 cases (70%) and a validation set consisting of 447 cases (30%). To create the model nomogram, multifactor logistic regression models were constructed based on the predictors selected through LASSO regression analysis. The model's performance was assessed using several metrics, including the concordance index (C-index), the area under the receiver operating characteristic (ROC) curve (AUC), calibration curves, and decision curve analysis (DCA) to assess the clinical benefit of the model. Results The multivariate logistic regression analysis revealed that age, sex, education level, 24-hour total fat intake, red blood cell folate concentration, depression, and moderate work activity were significant predictors of cognitive dysfunction in individuals with abdominal obesity (p < 0.05). These predictors were incorporated into the nomogram. The C-indices for the training and validation sets were 0.814 (95% CI: 0.875-0.842) and 0.805 (95% CI: 0.758-0.851), respectively. The corresponding AUC values were 0.814 (95% CI: 0.875-0.842) and 0.795 (95% CI: 0.753-0.847). The calibration curves demonstrated a satisfactory level of agreement between the nomogram model and the observed data. The DCA indicated that early intervention for at-risk populations would provide a net benefit. Conclusion Age, sex, education level, 24-hour total fat intake, red blood cell folate concentration, depression, and moderate work activity were identified as predictive factors for cognitive dysfunction in individuals with abdominal obesity. The nomogram model developed in this study can effectively predict the clinical risk of cognitive dysfunction in individuals with abdominal obesity.
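The cohort definition and the 70/30 split described in this abstract are simple enough to sketch directly. The code below is an illustration only: the function names and the sex labels are hypothetical, and the rounding rule used for the split is an assumption that happens to reproduce the reported 1043/447 sizes.

```python
# Waist circumference cutoffs in cm, as stated in the abstract.
WAIST_CUTOFF_CM = {"male": 102.0, "female": 88.0}


def has_abdominal_obesity(waist_cm, sex):
    """Apply the abstract's criterion: >= 102 cm for men, >= 88 cm for women."""
    return waist_cm >= WAIST_CUTOFF_CM[sex]


def split_sizes(n_total, train_percent=70):
    """Return (n_train, n_validation) using integer round-half-up arithmetic."""
    n_train = (n_total * train_percent + 50) // 100
    return n_train, n_total - n_train


n_train, n_validation = split_sizes(1490)  # gives (1043, 447)
```

Integer arithmetic is used for the split so that the result does not depend on floating-point rounding of 0.7.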
Affiliation(s)
- Chun Lei
  - General Practice, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong, China
- Gangjie Wu
  - General Practice, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong, China
- Yan Cui
  - School of Traditional Chinese Medicine, Jinan University, Guangzhou, Guangdong, China
- Hui Xia
  - General Practice, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong, China
- Jianbing Chen
  - School of Traditional Chinese Medicine, Jinan University, Guangzhou, Guangdong, China
- Xiaoyao Zhan
  - School of Traditional Chinese Medicine, Jinan University, Guangzhou, Guangdong, China
- Yanlan Lv
  - School of Traditional Chinese Medicine, Jinan University, Guangzhou, Guangdong, China
- Meng Li
  - School of Traditional Chinese Medicine, Jinan University, Guangzhou, Guangdong, China
- Ronghua Zhang
  - College of Pharmacy, Jinan University, Guangzhou, Guangdong, China
  - Cancer Research Institution, Jinan University, Guangzhou, Guangdong, China
  - Guangdong Provincial Key Laboratory of Traditional Chinese Medicine Informatization, Jinan University, Guangzhou, Guangdong, China
- Xiaofeng Zhu
  - School of Traditional Chinese Medicine, Jinan University, Guangzhou, Guangdong, China
  - Traditional Chinese Medicine Department, The First Affiliated Hospital of Jinan University, Guangzhou, Guangdong, China
7
Das A, Dhillon P. Application of machine learning in measurement of ageing and geriatric diseases: a systematic review. BMC Geriatr 2023; 23:841. [PMID: 38087195 PMCID: PMC10717316 DOI: 10.1186/s12877-023-04477-x] [Received: 05/09/2023] [Accepted: 11/10/2023] [Indexed: 12/18/2023]
Abstract
BACKGROUND As the ageing population continues to grow in many countries, the prevalence of geriatric diseases is on the rise. In response, healthcare providers are exploring novel methods to enhance the quality of life for the elderly. Over the last decade, there has been a remarkable surge in the use of machine learning in geriatric diseases and care. Machine learning has emerged as a promising tool for the diagnosis, treatment, and management of these conditions. Hence, our study aims to describe the present state of research in geriatrics and the application of machine learning methods in this area. METHODS This systematic review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and focused on healthy ageing in individuals aged 45 and above, with a specific emphasis on the diseases that commonly occur during this process. The study mainly focused on three areas: machine learning, the geriatric population, and diseases. Peer-reviewed articles were searched in the PubMed and Scopus databases, with inclusion criteria of a population above 45 years, use of machine learning methods, and availability of full text. To assess the quality of the studies, the Joanna Briggs Institute's (JBI) critical appraisal tool was used. RESULTS A total of 70 papers were selected from the 120 identified papers after title screening, abstract screening, and reference search. Limited research is available on predicting biological or brain age using deep learning and different supervised machine learning methods. Neurodegenerative disorders were the most researched disease group, with Alzheimer's disease receiving the most attention. Among non-communicable diseases, diabetes mellitus, hypertension, cancer, kidney diseases, and cardiovascular diseases were included, and other rare diseases such as oral health-related diseases and bone diseases were also explored in some papers. In terms of the application of machine learning, risk prediction was the most common approach. Half of the studies used supervised machine learning algorithms, among which logistic regression, random forest, and XGBoost were frequently used methods. These machine learning methods were applied to a variety of datasets including population-based surveys, hospital records, and digitally traced data. CONCLUSION The review identified a wide range of studies that employed machine learning algorithms to analyse various diseases and datasets. While the application of machine learning in geriatrics and care has been well explored, there is still room for future development, particularly in validating models across diverse populations and utilizing personalized digital datasets for customized patient-centric care in older populations. Further, we suggest a role for machine learning in generating comparable ageing indices such as a successful ageing index.
Affiliation(s)
- Ayushi Das
  - International Institute for Population Sciences, Deonar, Mumbai, 400088, India
- Preeti Dhillon
  - Department of Survey Research and Data Analytics, International Institute for Population Sciences, Deonar, Mumbai, 400088, India.
8
Huang Y, Huang Z, Yang Q, Jin H, Xu T, Fu Y, Zhu Y, Zhang X, Chen C. Predicting mild cognitive impairment among Chinese older adults: a longitudinal study based on long short-term memory networks and machine learning. Front Aging Neurosci 2023; 15:1283243. [PMID: 37937119 PMCID: PMC10626462 DOI: 10.3389/fnagi.2023.1283243] [Received: 08/25/2023] [Accepted: 10/10/2023] [Indexed: 11/09/2023]
Abstract
Background Mild cognitive impairment (MCI) is a transitory yet reversible stage of dementia. A systematic, scientific, and population-wide early screening system for MCI is lacking. This study aimed to construct prediction models using longitudinal data to identify potential MCI patients and explore the critical features of MCI among Chinese older adults. Methods A total of 2,128 participants were selected from waves 5-8 of the Chinese Longitudinal Healthy Longevity Study. Cognitive function was measured using the Chinese version of the Mini-Mental State Examination. Long short-term memory (LSTM) and three machine learning techniques, with 8 sociodemographic features and 12 health behavior and health status features as inputs, were used to predict individual risk of MCI in the next year. The performance of the prediction models was evaluated through receiver operating characteristic curves and decision curve analysis. The importance of predictors in the prediction models was explored using SHapley Additive exPlanations (SHAP). Results The area under the curve values of the three models were around 0.90, and decision curve analysis indicated that the net benefits of XGBoost and Random Forest were similar when the threshold was lower than 0.8. SHAP showed that age, education, respiratory disease, gastrointestinal ulcer, and self-rated health were the five most important predictors of MCI. Conclusion This screening method for MCI, combining LSTM and machine learning, successfully predicted the risk of MCI using longitudinal datasets. It enables health care providers to implement early intervention to delay the progression from MCI to dementia, ultimately reducing the incidence and treatment cost of dementia.
Affiliation(s)
- Yucheng Huang
  - School of Public Health and Management, Wenzhou Medical University, Wenzhou, Zhejiang, China
- Zishuo Huang
  - School of Public Health and Management, Wenzhou Medical University, Wenzhou, Zhejiang, China
  - School of Innovation and Entrepreneurship, Wenzhou Medical University, Wenzhou, Zhejiang, China
- Qingren Yang
  - School of Public Health and Management, Wenzhou Medical University, Wenzhou, Zhejiang, China
  - School of Innovation and Entrepreneurship, Wenzhou Medical University, Wenzhou, Zhejiang, China
- Haojie Jin
  - School of Public Health and Management, Wenzhou Medical University, Wenzhou, Zhejiang, China
- Tingke Xu
  - School of Public Health and Management, Wenzhou Medical University, Wenzhou, Zhejiang, China
- Yating Fu
  - School of Public Health and Management, Wenzhou Medical University, Wenzhou, Zhejiang, China
- Yue Zhu
  - School of Public Health and Management, Wenzhou Medical University, Wenzhou, Zhejiang, China
- Xiangyang Zhang
  - The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, China
- Chun Chen
  - School of Public Health and Management, Wenzhou Medical University, Wenzhou, Zhejiang, China
  - Center for Healthy China Research, Wenzhou Medical University, Wenzhou, Zhejiang, China
9
Huang AA, Huang SY. Exploring Depression and Nutritional Covariates Amongst US Adults using Shapely Additive Explanations. Health Sci Rep 2023; 6:e1635. [PMID: 37867784 PMCID: PMC10588337 DOI: 10.1002/hsr2.1635] [Received: 03/02/2023] [Revised: 05/02/2023] [Accepted: 10/10/2023] [Indexed: 10/24/2023]
Abstract
Background Depression affects personal and public well-being, and identifying natural therapeutics such as nutrition is necessary to help alleviate this public health concern. Objective The study aimed to identify feature importance in a machine learning model using solely nutrition covariates. Methods A retrospective analysis was conducted using a modern, nationally representative cohort, the National Health and Nutrition Examination Surveys (NHANES 2017-2020). Depressive symptoms were evaluated using the validated 9-item Patient Health Questionnaire (PHQ-9), and all adult patients (a total of 7929 individuals) who completed the PHQ-9 and the total nutritional intake questionnaire were included in the study. Univariable regression was used to identify significant nutritional covariates to be included in a machine learning model, and feature importance was reported. The acquisition and analysis of the data were authorized by the National Center for Health Statistics Ethics Review Board. Results A total of 7929 patients met the inclusion criteria. The machine learning model included 24 of a total of 60 features, those found to be significant on univariable analysis (p < 0.01). The XGBoost model had an Area Under the Receiver Operator Characteristic Curve (AUROC) of 0.603, a sensitivity of 0.943, and a specificity of 0.163. The four highest-ranked features by gain, a measure of the percentage contribution of the covariate to the overall model prediction, were Potassium Intake (Gain = 6.8%), Vitamin E Intake (Gain = 5.7%), Number of Foods and Beverages Reported (Gain = 5.7%), and Vitamin K Intake (Gain = 5.6%). Conclusion Machine learning models with feature importance can be utilized to identify nutritional covariates for further study in patients with clinical symptoms of depression.
Affiliation(s)
- Samuel Y. Huang
  - Virginia Commonwealth University School of Medicine, Richmond, Virginia, USA