1
Colineaux H, Lepage B, Chauvin P, Dimeglio C, Delpierre C, Lefèvre T. Contribution of Structure Learning Algorithms in Social Epidemiology: Application to Real-World Data. International Journal of Environmental Research and Public Health 2025; 22:348. [PMID: 40238329 PMCID: PMC11941975 DOI: 10.3390/ijerph22030348] [Received: 01/14/2025] [Revised: 01/27/2025] [Accepted: 02/03/2025] [Indexed: 04/18/2025]
Abstract
Epidemiologists often handle large datasets with numerous variables and are currently seeing a growing wealth of techniques for data analysis, such as machine learning. Critical aspects involve addressing causality, often based on observational data, and dealing with the complex relationships between variables to uncover the overall structure of variable interactions, causal or not. Structure learning (SL) methods aim to automatically or semi-automatically reveal the structure of variables' relationships. The objective of this study is to delineate some of the potential contributions and limitations of structure learning methods when applied to social epidemiology topics and the search for determinants of healthcare system access. We applied SL techniques to a real-world dataset, namely the 2010 wave of the SIRS cohort, which included a sample of 3006 adults from the Paris region, France. Healthcare utilization, encompassing both direct and indirect access to care, was the primary outcome. Candidate determinants included health status, demographic characteristics, and socio-cultural and economic positions. We present two approaches: a non-automated epidemiological method (an initial expert knowledge network and stepwise logistic regression models) and three SL techniques using various algorithms, with and without knowledge constraints. We compared the results based on the presence, direction, and strength of specific links within the produced network. Although the interdependencies and relative strengths identified by both approaches were similar, the SL algorithms detected fewer associations with the outcome than the non-automated method. Relationships between variables were sometimes incorrectly oriented when using a purely data-driven approach. SL algorithms can be valuable in exploratory stages, helping to generate new hypotheses or mine novel databases. However, results should be validated against prior knowledge and supplemented with additional confirmatory analyses.
Affiliation(s)
- Helene Colineaux
- EQUITY Team, Centre d’Epidémiologie et de Recherche en Santé des POPulations (CERPOP), Institut National de la Santé et de la Recherche Médicale (INSERM)—Toulouse III University, 37 Allées Jules Guesde, 31062 Toulouse, France
- Benoit Lepage
- EQUITY Team, Centre d’Epidémiologie et de Recherche en Santé des POPulations (CERPOP), Institut National de la Santé et de la Recherche Médicale (INSERM)—Toulouse III University, 37 Allées Jules Guesde, 31062 Toulouse, France
- Epidemiology Department, Toulouse Teaching Hospital, 37 Allées Jules Guesde, 31062 Toulouse, France
- Pierre Chauvin
- UMRS 1136, Pierre Louis Institute of Epidemiology and Public Health, Department of Social Epidemiology, Institut National de la Santé et de la Recherche Médicale (INSERM), Sorbonne University, 75005 Paris, France
- Chloe Dimeglio
- Toulouse Institute for Infectious and Inflammatory Diseases (INFINITY), Institut National de la Santé et de la Recherche Médicale (INSERM), UMR 1291, Centre National de la Recherche Scientifique (CNRS), UMR 5051, 31300 Toulouse, France
- Cyrille Delpierre
- EQUITY Team, Centre d’Epidémiologie et de Recherche en Santé des POPulations (CERPOP), Institut National de la Santé et de la Recherche Médicale (INSERM)—Toulouse III University, 37 Allées Jules Guesde, 31062 Toulouse, France
- Thomas Lefèvre
- UMRS 1136, Pierre Louis Institute of Epidemiology and Public Health, Department of Social Epidemiology, Institut National de la Santé et de la Recherche Médicale (INSERM), Sorbonne University, 75005 Paris, France
2
Abegaz TM, Ahmed M, Ali AA, Bhagavathula AS. Predicting Health-Related Quality of Life Using Social Determinants of Health: A Machine Learning Approach with the All of Us Cohort. Bioengineering (Basel) 2025; 12:166. [PMID: 40001685 PMCID: PMC11851811 DOI: 10.3390/bioengineering12020166] [Received: 01/14/2025] [Revised: 01/28/2025] [Accepted: 02/07/2025] [Indexed: 02/27/2025]
Abstract
This study applied machine learning (ML) algorithms to predict health-related quality of life (HRQOL) using comprehensive social determinants of health (SDOH) features. Data from the All of Us dataset, comprising participants with complete HRQOL and SDOH records, were analyzed. The primary outcome was HRQOL, which encompassed physical and mental health components, while SDOH features included social, educational, economic, environmental, and healthcare access factors. Three ML algorithms, namely logistic regression, XGBoost, and Random Forest, were tested. The models achieved accuracy ranges of 0.73-0.77 for HRQOL, 0.70-0.71 for physical health, and 0.72-0.77 for mental health, with corresponding area under the curve ranges of 0.81-0.84, 0.74-0.76, and 0.83-0.85, respectively. Emotional stability, activity management, spiritual beliefs, and comorbidity were identified as key predictors. These findings underscore the critical role of SDOH in predicting HRQOL and suggest that future research should focus on applying such models to diverse patient populations and specific clinical conditions.
Affiliation(s)
- Tadesse M. Abegaz
- Division of Pharmacy Practice and Science, College of Pharmacy, The Ohio State University, 281 W Lane Ave, Columbus, OH 43210, USA
- Muktar Ahmed
- Adelaide Medical School, Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA 5005, Australia
- Askal Ayalew Ali
- Economic, Social and Administrative Pharmacy (ESAP), Institute of Public Health, College of Pharmacy and Pharmaceutical Sciences, Florida A&M University, Tallahassee, FL 32307, USA
- Akshaya Srikanth Bhagavathula
- Department of Public Health, College of Health and Human Services, North Dakota State University, Fargo, ND 58108, USA
3
Zeng X, Ma Q, Huang CX, Xiao JJ, Fu X, Ren YF, Qu YL, Xiang HX, Lei M, Zheng RY, Zhong Y, Xiao P, Zhuang X, You FM, He JW. Diagnostic potential of salivary microbiota in persistent pulmonary nodules: identifying biomarkers and functional pathways using 16S rRNA sequencing and machine learning. J Transl Med 2024; 22:1079. [PMID: 39609902 PMCID: PMC11603953 DOI: 10.1186/s12967-024-05802-7] [Received: 07/25/2024] [Accepted: 10/23/2024] [Indexed: 11/30/2024]
Abstract
BACKGROUND The aim of this study was to explore the microbial variations and biomarkers in the oral environment of patients with persistent pulmonary nodules (pPNs) and to reveal the potential biological functions of the salivary microbiota in pPNs. MATERIALS AND METHODS This study included a total of 483 participants (141 healthy controls and 342 patients with pPNs) enrolled between June 2022 and January 2024. Saliva samples were subjected to sequencing of the V3-V4 region of the 16S rRNA gene to assess microbial diversity and differential abundance. Seven machine learning algorithms (logistic regression, support vector machine, multi-layer perceptron, naïve Bayes, random forest, gradient boosting decision tree, and LightGBM) were used to evaluate performance and identify key microorganisms, with fivefold cross-validation employed to ensure robustness. The Shapley Additive exPlanations (SHAP) algorithm was employed to explain the contribution of these core microbiota to the predictive model. Additionally, the PICRUSt2 algorithm was used to predict the microbial functions. RESULTS The salivary microbial composition in the pPNs group showed significantly lower α- and β-diversity compared to healthy controls. A high-accuracy LightGBM model was developed, identifying six core genera (Fusobacterium, Solobacterium, Actinomyces, Porphyromonas, Atopobium, and Peptostreptococcus) as pPNs biomarkers. Additionally, a visualization system for pPNs risk prediction was developed. Differences in the immune responses and metabolic activities of the salivary microbiota between patients with pPNs and healthy controls were revealed. CONCLUSIONS This study highlights the potential clinical applications of the salivary microbiota to enable earlier detection and targeted interventions, offering significant promise for advancing clinical management and improving patient outcomes in pPNs.
Affiliation(s)
- Xiao Zeng
- Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072, Sichuan Province, China
- Qiong Ma
- Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072, Sichuan Province, China
- Chun-Xia Huang
- Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072, Sichuan Province, China
- Jun-Jie Xiao
- Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072, Sichuan Province, China
- Xi Fu
- Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072, Sichuan Province, China
- TCM Regulating Metabolic Diseases Key Laboratory of Sichuan Province, Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072, Sichuan Province, China
- Yi-Feng Ren
- Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072, Sichuan Province, China
- TCM Regulating Metabolic Diseases Key Laboratory of Sichuan Province, Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072, Sichuan Province, China
- Yu-Li Qu
- College of Artificial Intelligence, Xi'an Jiaotong University, Xi'an, 710061, Shaanxi Province, China
- Hong-Xia Xiang
- Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072, Sichuan Province, China
- Mao Lei
- Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072, Sichuan Province, China
- Ru-Yi Zheng
- Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072, Sichuan Province, China
- Yang Zhong
- Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072, Sichuan Province, China
- Ping Xiao
- Department of Thoracic Surgery, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, School of Medicine, University of Electronic Science and Technology of China, Chengdu, 610042, Sichuan Province, China
- Xiang Zhuang
- Department of Thoracic Surgery, Sichuan Cancer Hospital & Institute, Sichuan Cancer Center, School of Medicine, University of Electronic Science and Technology of China, Chengdu, 610042, Sichuan Province, China
- Feng-Ming You
- Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072, Sichuan Province, China
- TCM Regulating Metabolic Diseases Key Laboratory of Sichuan Province, Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072, Sichuan Province, China
- Jia-Wei He
- Hospital of Chengdu University of Traditional Chinese Medicine, Chengdu, 610072, Sichuan Province, China
4
Moccia C, Moirano G, Popovic M, Pizzi C, Fariselli P, Richiardi L, Ekstrøm CT, Maule M. Machine learning in causal inference for epidemiology. Eur J Epidemiol 2024; 39:1097-1108. [PMID: 39535572 PMCID: PMC11599438 DOI: 10.1007/s10654-024-01173-x] [Received: 12/07/2023] [Accepted: 10/21/2024] [Indexed: 11/16/2024]
Abstract
In causal inference, parametric models are usually employed to address causal questions by estimating the effect of interest. However, parametric models rely on the assumption of correct model specification, which, if not met, leads to biased effect estimates. Correct model specification is challenging, especially in high-dimensional settings. Incorporating Machine Learning (ML) into causal analyses may reduce the bias arising from model misspecification, since ML methods do not require the specification of a functional form of the relationship between variables. However, when ML predictions are directly plugged into a predefined formula of the effect of interest, there is the risk of introducing a "plug-in bias" in the effect measure. To overcome this problem and to achieve useful asymptotic properties, new estimators that combine the predictive potential of ML and the ability of traditional statistical methods to make inference about population parameters have been proposed. For epidemiologists interested in taking advantage of ML for causal inference investigations, we provide an overview of three estimators that represent the current state of the art, namely Targeted Maximum Likelihood Estimation (TMLE), Augmented Inverse Probability Weighting (AIPW) and Double/Debiased Machine Learning (DML).
Affiliation(s)
- Chiara Moccia
- Cancer Epidemiology Unit, Department of Medical Sciences, University of Turin and CPO Piedmont, Via Santena 7, Turin, 10126, Italy.
- Giovenale Moirano
- Cancer Epidemiology Unit, Department of Medical Sciences, University of Turin and CPO Piedmont, Via Santena 7, Turin, 10126, Italy
- Maja Popovic
- Cancer Epidemiology Unit, Department of Medical Sciences, University of Turin and CPO Piedmont, Via Santena 7, Turin, 10126, Italy
- Costanza Pizzi
- Cancer Epidemiology Unit, Department of Medical Sciences, University of Turin and CPO Piedmont, Via Santena 7, Turin, 10126, Italy
- Piero Fariselli
- Department of Medical Sciences, University of Turin, Turin, Italy
- Lorenzo Richiardi
- Cancer Epidemiology Unit, Department of Medical Sciences, University of Turin and CPO Piedmont, Via Santena 7, Turin, 10126, Italy
- Claus Thorn Ekstrøm
- Section of Biostatistics, Department of Public Health, University of Copenhagen, Copenhagen, Denmark
- Milena Maule
- Cancer Epidemiology Unit, Department of Medical Sciences, University of Turin and CPO Piedmont, Via Santena 7, Turin, 10126, Italy
5
Askar M, Tafavvoghi M, Småbrekke L, Bongo LA, Svendsen K. Using machine learning methods to predict all-cause somatic hospitalizations in adults: A systematic review. PLoS One 2024; 19:e0309175. [PMID: 39178283 PMCID: PMC11343463 DOI: 10.1371/journal.pone.0309175] [Received: 02/01/2024] [Accepted: 08/06/2024] [Indexed: 08/25/2024]
Abstract
AIM In this review, we investigated how Machine Learning (ML) was utilized to predict all-cause somatic hospital admissions and readmissions in adults. METHODS We searched eight databases (PubMed, Embase, Web of Science, CINAHL, ProQuest, OpenGrey, WorldCat, and MedNar) from their inception date to October 2023, and included records that predicted all-cause somatic hospital admissions and readmissions of adults using ML methodology. We used the CHARMS checklist for data extraction, PROBAST for bias and applicability assessment, and TRIPOD for reporting quality. RESULTS We screened 7,543 studies, of which 163 full-text records were read and 116 met the review inclusion criteria. Among these, 45 predicted admission, 70 predicted readmission, and one study predicted both. There was substantial variety in the types of datasets, algorithms, features, data preprocessing steps, evaluation, and validation methods. The most used types of features were demographics, diagnoses, vital signs, and laboratory tests. Area Under the ROC curve (AUC) was the most used evaluation metric. Models trained using boosting tree-based algorithms often performed better than others. ML algorithms commonly outperformed traditional regression techniques. Sixteen studies used natural language processing (NLP) of clinical notes for prediction, and all of them yielded good results. The overall adherence to reporting quality was poor in the reviewed studies. Only five percent of models were implemented in clinical practice. The most frequently inadequately addressed methodological aspects were: providing model interpretations on the individual patient level, full code availability, performing external validation, calibrating models, and handling class imbalance. CONCLUSION This review has identified considerable concerns regarding methodological issues and reporting quality in studies investigating ML to predict hospitalizations. To ensure the acceptability of these models in clinical settings, it is crucial to improve the quality of future studies.
Affiliation(s)
- Mohsen Askar
- Faculty of Health Sciences, Department of Pharmacy, UiT-The Arctic University of Norway, Tromsø, Norway
- Masoud Tafavvoghi
- Faculty of Science and Technology, Department of Computer Science, UiT-The Arctic University of Norway, Tromsø, Norway
- Lars Småbrekke
- Faculty of Health Sciences, Department of Pharmacy, UiT-The Arctic University of Norway, Tromsø, Norway
- Lars Ailo Bongo
- Faculty of Science and Technology, Department of Computer Science, UiT-The Arctic University of Norway, Tromsø, Norway
- Kristian Svendsen
- Faculty of Health Sciences, Department of Pharmacy, UiT-The Arctic University of Norway, Tromsø, Norway
6
Franklin G, Stephens R, Piracha M, Tiosano S, Lehouillier F, Koppel R, Elkin PL. The Sociodemographic Biases in Machine Learning Algorithms: A Biomedical Informatics Perspective. Life (Basel) 2024; 14:652. [PMID: 38929638 PMCID: PMC11204917 DOI: 10.3390/life14060652] [Received: 01/21/2024] [Revised: 04/24/2024] [Accepted: 04/26/2024] [Indexed: 06/28/2024]
Abstract
Artificial intelligence models represented in machine learning algorithms are promising tools for risk assessment used to guide clinical and other health care decisions. Machine learning algorithms, however, may house biases that propagate stereotypes, inequities, and discrimination that contribute to socioeconomic health care disparities. The biases include those related to some sociodemographic characteristics such as race, ethnicity, gender, age, insurance, and socioeconomic status from the use of erroneous electronic health record data. Additionally, there is concern that training data and algorithmic biases in large language models pose potential drawbacks. These biases affect the lives and livelihoods of a significant percentage of the population in the United States and globally. The social and economic consequences of the associated backlash should not be underestimated. Here, we outline some of the sociodemographic, training data, and algorithmic biases that undermine sound health care risk assessment and medical decision-making and that should be addressed in the health care system. We present a perspective and overview of these biases by gender, race, ethnicity, age, historically marginalized communities, algorithmic bias, biased evaluations, implicit bias, selection/sampling bias, socioeconomic status biases, biased data distributions, cultural biases, insurance status bias, confirmation bias, information bias, and anchoring bias. We also make recommendations to improve large language model training data, including de-biasing techniques such as counterfactual role-reversed sentences during knowledge distillation, fine-tuning, prefix attachment at training time, the use of toxicity classifiers, retrieval augmented generation, and algorithmic modification to mitigate the biases moving forward.
Affiliation(s)
- Gillian Franklin
- Department of Biomedical Informatics, University at Buffalo, Buffalo, NY 14203, USA
- Department of Veterans Affairs, Knowledge Based Systems and Western New York, Veterans Affairs, Buffalo, NY 14215, USA
- Rachel Stephens
- Department of Biomedical Informatics, University at Buffalo, Buffalo, NY 14203, USA
- Muhammad Piracha
- Department of Biomedical Informatics, University at Buffalo, Buffalo, NY 14203, USA
- Shmuel Tiosano
- Department of Biomedical Informatics, University at Buffalo, Buffalo, NY 14203, USA
- Frank Lehouillier
- Department of Biomedical Informatics, University at Buffalo, Buffalo, NY 14203, USA
- Department of Veterans Affairs, Knowledge Based Systems and Western New York, Veterans Affairs, Buffalo, NY 14215, USA
- Ross Koppel
- Department of Biomedical Informatics, University at Buffalo, Buffalo, NY 14203, USA
- Institute for Biomedical Informatics, Perelman School of Medicine, and Sociology Department, University of Pennsylvania, Philadelphia, PA 19104, USA
- Peter L. Elkin
- Department of Biomedical Informatics, University at Buffalo, Buffalo, NY 14203, USA
- Department of Veterans Affairs, Knowledge Based Systems and Western New York, Veterans Affairs, Buffalo, NY 14215, USA
7
Bala J, Newson JJ, Thiagarajan TC. Hierarchy of demographic and social determinants of mental health: analysis of cross-sectional survey data from the Global Mind Project. BMJ Open 2024; 14:e075095. [PMID: 38490653 PMCID: PMC10946366 DOI: 10.1136/bmjopen-2023-075095] [Received: 05/02/2023] [Accepted: 02/16/2024] [Indexed: 03/17/2024]
Abstract
OBJECTIVES To understand the extent to which various demographic and social determinants predict mental health status and their relative hierarchy of predictive power in order to prioritise and develop population-based preventative approaches. DESIGN Cross-sectional analysis of survey data. SETTING Internet-based survey from 32 countries across North America, Europe, Latin America, Middle East and North Africa, Sub-Saharan Africa, South Asia and Australia, collected between April 2020 and December 2021. PARTICIPANTS 270 000 adults aged 18-85+ years who participated in the Global Mind Project. OUTCOME MEASURES We used 120+ demographic and social determinants to predict aggregate mental health status and scores of individuals (mental health quotient (MHQ)) and determine their relative predictive influence using various machine learning models including gradient boosting and random forest classification for various demographic stratifications by age, gender, geographical region and language. Outcomes reported include model performance metrics of accuracy, precision, recall, F1 scores and importance of individual factors determined by reduction in the squared error attributable to that factor. RESULTS Across all demographic classification models, 80% of those with negative MHQs were correctly identified, while regression models predicted specific MHQ scores within ±15% of the position on the scale. Predictions were higher for older ages (0.9+ accuracy, 0.9+ F1 Score; 65+ years) and poorer for younger ages (0.68 accuracy, 0.68 F1 Score; 18-24 years). Across all age groups, genders, regions and language groups, lack of social interaction and sufficient sleep were several times more important than all other factors. For younger ages (18-24 years), other highly predictive factors included cyberbullying and sexual abuse while not being able to work was high for ages 45-54 years. 
CONCLUSION Social determinants of traumas, adversities and lifestyle can account for 60%-90% of mental health challenges. However, additional factors are at play, particularly for younger ages, that are not included in these data and need further investigation.
8
Chivardi C, Zamudio Sosa A. Factors influencing the technical efficiency of diabetes care at primary care level in Mexico. Health Policy Plan 2024; 39:318-326. [PMID: 38153766 PMCID: PMC11423844 DOI: 10.1093/heapol/czad122] [Received: 08/16/2023] [Revised: 11/17/2023] [Accepted: 12/20/2023] [Indexed: 12/29/2023]
Abstract
Diabetes prevalence is rising globally, especially in low- and middle-income countries like Mexico, posing challenges for healthcare systems that require efficient primary care to manage the disease. However, healthcare efficiency is influenced by factors beyond decision-makers, including socioeconomic and political conditions. This study aims to evaluate the technical efficiency of primary healthcare for diabetes patients in Mexico over a 12-year period and explore the impact of contextual variables on efficiency. A longitudinal analysis was conducted using administrative and socio-demographic data from 242 health jurisdictions between 2009 and 2020. Data envelopment analysis with bootstrapping and output orientation was used to measure the technical efficiency; health resources in infrastructure and human resources were used as inputs. As outcome, the number of patients receiving treatment for diabetes and the number of patients with controlled diabetes were considered. Machine learning algorithms were employed to analyse multiple factors affecting the provision of diabetes health services and assess heterogeneity and trends in efficiency across different health jurisdictions. The average technical efficiency in primary healthcare for diabetes patients was 0.44 (CI: 0.41-0.46) in 2009, reaching a peak of 0.71 (CI: 0.69-0.72) in 2016, and moderately declining to 0.60 (CI: 0.57-0.62) in 2020; these differences were statistically significant. The random forest analysis identified the marginalization index, primary healthcare coverage, proportion of indigenous population and demand for health services as the most influential variables in predicting efficiency levels. This research underscores the crucial need for the formulation of targeted public policies aimed at extending the scope of primary healthcare services, with a particular focus on addressing the unique challenges faced by marginalized and indigenous populations. 
According to our results, it is necessary that medical care management adjust to the specific demands and needs of these populations to guarantee equitable care in Mexico.
Affiliation(s)
- Carlos Chivardi
- Centre for Health Economics (CHE), University of York, York YO10 5DD, United Kingdom
- Alejandro Zamudio Sosa
- School of Psychology, National Autonomous University of Mexico (UNAM), Mexico City 04510, Mexico
9
Daoud A, Johansson FD. The impact of austerity on children: Uncovering effect heterogeneity by political, economic, and family factors in low- and middle-income countries. Social Science Research 2024; 118:102973. [PMID: 38336420 DOI: 10.1016/j.ssresearch.2023.102973] [Received: 07/20/2022] [Revised: 07/26/2023] [Accepted: 12/07/2023] [Indexed: 02/12/2024]
Abstract
Which children are most vulnerable when their government imposes austerity? Research tends to focus on either the political-economic level or the family level. Using a sample of nearly two million children in 67 countries, this study synthesizes theories from family sociology and political science to examine the heterogeneous effects on child poverty of economic shocks following the implementation of an International Monetary Fund (IMF) program. To discover effect heterogeneity, we apply machine learning to policy evaluation. We find that children's average probability of falling into poverty increases by 14 percentage points. We find substantial effect heterogeneity, with family wealth and governments' education spending as the two most important moderators. In contrast to studies that emphasize the vulnerability of low-income families, we find that middle-class children face an equally high risk of poverty. Our results show that synthesizing family and political factors yields deeper knowledge of how economic shocks affect children.
Affiliation(s)
- Adel Daoud
- Center for Population and Development Studies, Harvard T.H. Chan School of Public Health, Harvard University, Boston MA, USA; Institute for Analytical Sociology, Linköping University, Sweden; The Division of Data Science and Artificial Intelligence, The Department of Computer Science and Engineering, Chalmers University of Technology, Sweden.
- Fredrik D Johansson
- The Division of Data Science and Artificial Intelligence, The Department of Computer Science and Engineering, Chalmers University of Technology, Sweden
10
Afzal HB, Jahangir T, Mei Y, Madden A, Sarker A, Kim S. Can adverse childhood experiences predict chronic health conditions? Development of trauma-informed, explainable machine learning models. Front Public Health 2024; 11:1309490. [PMID: 38332940 PMCID: PMC10851779 DOI: 10.3389/fpubh.2023.1309490] [Received: 10/08/2023] [Accepted: 12/27/2023] [Indexed: 02/10/2024]
Abstract
Introduction Decades of research have established the association between adverse childhood experiences (ACEs) and adult onset of chronic diseases, influenced by health behaviors and social determinants of health (SDoH). Machine Learning (ML) is a powerful tool for computing these complex associations and accurately predicting chronic health conditions. Methods Using the 2021 Behavioral Risk Factor Surveillance Survey, we developed several ML models (random forest, logistic regression, support vector machine, Naïve Bayes, and K-Nearest Neighbor) on data from a sample of 52,268 respondents. We predicted 13 chronic health conditions based on ACE history, health behaviors, SDoH, and demographics. We further assessed each variable's importance in outcome prediction for model interpretability. We evaluated model performance via the Area Under the Curve (AUC) score. Results With the inclusion of data on ACEs, our models outperformed or demonstrated similar accuracies to existing models in the literature that used SDoH to predict health outcomes. The most accurate models predicted diabetes, pulmonary diseases, and heart attacks. The random forest model was the most effective for diabetes (AUC = 0.784) and heart attacks (AUC = 0.732), and the logistic regression model most accurately predicted pulmonary diseases (AUC = 0.753). The strongest predictors across models were age, ever monitored blood sugar or blood pressure, count of the monitoring behaviors for blood sugar or blood pressure, BMI, time of last cholesterol check, employment status, income, count of vaccines received, health insurance status, and total ACEs. A cumulative measure of ACEs was a stronger predictor than individual ACEs. Discussion Our models can provide an interpretable, trauma-informed framework to identify and intervene with at-risk individuals early to prevent chronic health conditions and address their inequalities in the U.S.
Affiliation(s)
- Hanin B. Afzal
- Department of Mechanical and Industrial Engineering, University of Toronto, Toronto, ON, Canada
| | - Tasfia Jahangir
- Department of Behavioral, Social and Health Education Sciences, Rollins School of Public Health, Emory University, Atlanta, GA, United States
| | - Yiyang Mei
- School of Law, Emory University, Atlanta, GA, United States
| | - Annabelle Madden
- Teachers College, Columbia University, New York, NY, United States
| | - Abeed Sarker
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, United States
| | - Sangmi Kim
- Nell Hodgson Woodruff School of Nursing, Emory University, Atlanta, GA, United States
11
Yau FFF, Chiu IM, Wu KH, Cheng CY, Lee WC, Chen HC, Cheng CI, Chen TY. Machine learning-based prediction of coronary care unit readmission: A multihospital validation study. Digit Health 2024; 10:20552076241277030. [PMID: 39224796 PMCID: PMC11367690 DOI: 10.1177/20552076241277030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/13/2024] [Accepted: 08/06/2024] [Indexed: 09/04/2024]
Abstract
Objective: Readmission to the coronary care unit (CCU) has significant implications for patient outcomes and healthcare expenditure, emphasizing the urgency to accurately identify patients at high readmission risk. This study aims to construct and externally validate a predictive model for CCU readmission using machine learning (ML) algorithms across multiple hospitals. Methods: Patient information, including demographics, medical history, and laboratory test results, was collected from the electronic health record system and contributed a total of 40 features. Five ML models (logistic regression, random forest, support vector machine, gradient boosting, and multilayer perceptron) were employed to estimate the readmission risk. Results: The gradient boosting model was selected because it demonstrated superior performance, with an area under the receiver operating characteristic curve (AUC) of 0.887 in the internal validation set. Further external validation in a hold-out test set and three other medical centers upheld the model's robustness with consistently high AUCs, ranging from 0.852 to 0.879. Conclusion: The results endorse the integration of ML algorithms in healthcare to enhance patient risk stratification, potentially optimizing clinical interventions and diminishing the burden of CCU readmissions.
Affiliation(s)
- Fei-Fei Flora Yau
- Department of Emergency Medicine, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan
| | - I-Min Chiu
- Department of Emergency Medicine, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan
- Department of Cardiology, Smidt Heart Institute, Cedars-Sinai Medical Center, Los Angeles, CA, USA
| | - Kuan-Han Wu
- Department of Emergency Medicine, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan
| | - Chi-Yung Cheng
- Department of Emergency Medicine, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan
| | - Wei-Chieh Lee
- Division of Cardiology, Department of Internal Medicine, Chi Mei Medical Center, Tainan, Taiwan
| | - Huang-Chung Chen
- Division of Cardiology, Department of Internal Medicine, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan
| | - Cheng-I Cheng
- Division of Cardiology, Department of Internal Medicine, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan
| | - Tien-Yu Chen
- Division of Cardiology, Department of Internal Medicine, Kaohsiung Chang Gung Memorial Hospital, Kaohsiung, Taiwan
12
Ratnayake I, Pepper S, Anderson A, Alsup A, Mudaranthakam DP. An R Shiny Application (SDOH) for Predictive Modeling Using Regional Social Determinants of Health Survey Responses. International Journal of Social Determinants of Health and Health Services 2024; 54:21-27. [PMID: 37697462 PMCID: PMC10797831 DOI: 10.1177/27551938231201011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/31/2023] [Revised: 07/06/2023] [Accepted: 08/01/2023] [Indexed: 09/13/2023]
Abstract
Social determinants of health (SDoH) surveys are data sets that provide useful health-related information about individuals and communities. This study aims to develop a user-friendly web application that allows clinicians to get a predictive insight into the social needs of their patients before their in-patient visits using SDoH survey data to provide an improved and personalized service. The study used a longitudinal survey that consisted of 108,563 patient responses to 12 questions. Questions were designed to have a binary outcome as the response and the patient's most recent responses for each of these questions were modeled independently by incorporating explanatory variables. Multiple classification and regression techniques were used, including logistic regression, Bayesian generalized linear model, extreme gradient boosting, gradient boosting, neural networks, and random forests. Based on the area under the curve values, gradient boosting models provided the highest precision values. Finally, the models were incorporated into an R Shiny application, enabling users to predict and compare the impact of SDoH on patients' lives. The tool is freely hosted online by the University of Kansas Medical Center's Department of Biostatistics and Data Science. The supporting materials for the application are publicly accessible on GitHub.
Affiliation(s)
- Isuru Ratnayake
- Department of Biostatistics & Data Science, The University of Kansas Medical Center, Kansas City, KS, USA
| | - Sam Pepper
- Department of Biostatistics & Data Science, The University of Kansas Medical Center, Kansas City, KS, USA
| | - Aliyah Anderson
- Department of Biostatistics & Data Science, The University of Kansas Medical Center, Kansas City, KS, USA
| | - Alexander Alsup
- PULM Pulmonary and Critical Care Medicine, The University of Kansas Medical Center, Kansas City, KS, USA
13
Wright L, Staatz CB, Silverwood RJ, Bann D. Trends in the ability of socioeconomic position to predict individual body mass index: an analysis of repeated cross-sectional data, 1991-2019. BMC Med 2023; 21:434. [PMID: 37957618 PMCID: PMC10644438 DOI: 10.1186/s12916-023-03103-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2023] [Accepted: 10/04/2023] [Indexed: 11/15/2023]
Abstract
BACKGROUND: The widening of group-level socioeconomic differences in body mass index (BMI) has received considerable research attention. However, the predictive power of socioeconomic position (SEP) indicators at the individual level remains uncertain, as does the potential temporal variation in their predictive value. Examining this is important given the increasing incorporation of SEP indicators into predictive algorithms and calls to reduce social inequality to tackle the obesity epidemic. We thus investigated SEP differences in BMI over three decades of the obesity epidemic in England, comparing population-wide (SEP group differences in mean BMI) and individual-level (out-of-sample prediction of individuals' BMI) approaches to understanding social inequalities. METHODS: We used repeated cross-sectional data from the Health Survey for England, 1991-2019. BMI (kg/m²) was measured objectively, and SEP was measured via educational attainment, occupational class, and neighbourhood index of deprivation. We ran random forest models for each survey year and measure of SEP adjusting for age and sex. RESULTS: The mean and variance of BMI increased within each SEP group over the study period. Mean differences in BMI by SEP group also increased: differences between lowest and highest education groups were 1.0 kg/m² (0.4, 1.6) in 1991 and 1.3 kg/m² (0.7, 1.8) in 2019. At the individual level, the predictive capacity of SEP was low, though increased in later years: including education in models improved predictive accuracy (mean absolute error) by 0.14% (-0.9, 1.08) in 1991 and 1.05% (0.18, 1.82) in 2019. Similar patterns were obtained for occupational class and neighbourhood deprivation and when analysing obesity as an outcome. CONCLUSIONS: SEP has become increasingly important at the population (group difference) and individual (prediction) levels. However, predictive ability remains low, suggesting limited utility of including SEP in prediction algorithms. Assuming links are causal, abolishing SEP differences in BMI could have a large effect on population health but would neither reverse the obesity epidemic nor reduce much of the variation in BMI.
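The abstract above expresses predictive gain as a percentage change in mean absolute error (MAE). The sketch below illustrates one plausible reading of that quantity, the relative reduction in out-of-sample MAE when a predictor is added. The abstract does not state the exact formula, so this definition is an assumption, as are the function names and the toy BMI values, which are invented for illustration.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pct_mae_improvement(mae_baseline, mae_extended):
    """Relative MAE reduction (in percent) of the extended model over the
    baseline. Assumed definition; negative values mean the model got worse."""
    return 100.0 * (mae_baseline - mae_extended) / mae_baseline


# Toy BMI values (kg/m^2), made up for illustration only.
bmi_observed = [22.0, 27.0, 31.0, 25.0]
pred_age_sex_only = [26.0, 26.0, 26.0, 26.0]   # hypothetical baseline model
pred_with_sep = [25.0, 26.0, 28.0, 26.0]       # hypothetical model adding SEP

mae_base = mean_absolute_error(bmi_observed, pred_age_sex_only)
mae_sep = mean_absolute_error(bmi_observed, pred_with_sep)
print(mae_base, mae_sep, round(pct_mae_improvement(mae_base, mae_sep), 2))
```

The toy gain is far larger than the 0.14% to 1.05% reported in the abstract; the point is only the arithmetic, not the magnitude.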
Affiliation(s)
- Liam Wright
- Centre for Longitudinal Studies, Social Research Institute, University College London, 55-59 Gordon Square, London, WC1H 0NT, UK.
| | - Charis Bridger Staatz
- Centre for Longitudinal Studies, Social Research Institute, University College London, 55-59 Gordon Square, London, WC1H 0NT, UK
| | - Richard J Silverwood
- Centre for Longitudinal Studies, Social Research Institute, University College London, 55-59 Gordon Square, London, WC1H 0NT, UK
| | - David Bann
- Centre for Longitudinal Studies, Social Research Institute, University College London, 55-59 Gordon Square, London, WC1H 0NT, UK
14
Morris MC, Moradi H, Aslani M, Sims M, Schlundt D, Kouros CD, Goodin B, Lim C, Kinney K. Predicting incident cardiovascular disease among African-American adults: A deep learning approach to evaluate social determinants of health in the Jackson heart study. PLoS One 2023; 18:e0294050. [PMID: 37948388 PMCID: PMC10637695 DOI: 10.1371/journal.pone.0294050] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/12/2023] [Accepted: 10/24/2023] [Indexed: 11/12/2023]
Abstract
The present study sought to leverage machine learning approaches to determine whether social determinants of health improve prediction of incident cardiovascular disease (CVD). Participants in the Jackson Heart Study with no history of CVD at baseline were followed over a 10-year period to determine first CVD events (i.e., coronary heart disease, stroke, heart failure). Three modeling algorithms (i.e., Deep Neural Network, Random Survival Forest, Penalized Cox Proportional Hazards) were used to evaluate three feature sets (i.e., demographics and standard/biobehavioral CVD risk factors [FS1], FS1 combined with psychosocial and socioeconomic CVD risk factors [FS2], and FS2 combined with environmental features [FS3]) as predictors of 10-year CVD risk. Contrary to our hypothesis, overall predictive accuracy did not improve when adding social determinants of health. However, social determinants of health comprised eight of the top 15 predictors of first CVD events. The social determinants of health indicators included four socioeconomic factors (insurance status and types), one psychosocial factor (discrimination burden), and three environmental factors (density of outdoor physical activity resources, including instructional and water activities; modified retail food environment index excluding alcohol; and favorable food stores). Findings suggest that whereas understanding biological determinants may identify who is currently at risk for developing CVD and in need of secondary prevention, understanding upstream social determinants of CVD risk could guide primary prevention efforts by identifying where and how policy and community-level interventions could be targeted to facilitate changes in individual health behaviors.
Affiliation(s)
- Matthew C. Morris
- Department of Anesthesiology, Vanderbilt University Medical Center, Nashville, Tennessee, United States of America
- Department of Psychiatry and Human Behavior, University of Mississippi Medical Center, Jackson, Mississippi, United States of America
| | - Hamidreza Moradi
- Department of Data Science, University of Mississippi Medical Center, Jackson, Mississippi, United States of America
- Department of Computer Science, University of North Carolina Agricultural and Technical State University, Greensboro, North Carolina, United States of America
| | - Maryam Aslani
- Department of Data Analytics, University of North Texas, Denton, Texas, United States of America
| | - Mario Sims
- Department of Social Medicine, Population, and Public Health, University of California, Riverside, California, United States of America
| | - David Schlundt
- Department of Psychology, Vanderbilt University, Nashville, Tennessee, United States of America
| | - Chrystyna D. Kouros
- Department of Psychology, Southern Methodist University, Dallas, Texas, United States of America
| | - Burel Goodin
- Department of Psychology, University of Alabama at Birmingham, Birmingham, Alabama, United States of America
- Department of Anesthesiology, Washington University in St. Louis, St. Louis, Missouri, United States of America
| | - Crystal Lim
- Department of Health Psychology, University of Missouri, Columbia, Missouri, United States of America
| | - Kerry Kinney
- Department of Psychology, Vanderbilt University, Nashville, Tennessee, United States of America
15
McNeill E, Lindenfeld Z, Mostafa L, Zein D, Silver D, Pagán J, Weeks WB, Aerts A, Des Rosiers S, Boch J, Chang JE. Uses of Social Determinants of Health Data to Address Cardiovascular Disease and Health Equity: A Scoping Review. J Am Heart Assoc 2023; 12:e030571. [PMID: 37929716 PMCID: PMC10727404 DOI: 10.1161/jaha.123.030571] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/12/2023] [Accepted: 09/06/2023] [Indexed: 11/07/2023]
Abstract
Background: Cardiovascular disease is the leading cause of morbidity and mortality worldwide. Prior research suggests that social determinants of health have a compounding effect on health and are associated with cardiovascular disease. This scoping review explores what and how social determinants of health data are being used to address cardiovascular disease and improve health equity. Methods and Results: After removing duplicate citations, the initial search yielded 4110 articles for screening, and 50 studies were identified for data extraction. Most studies relied on similar data sources for social determinants of health, including geocoded electronic health record data, national survey responses, and census data, and largely focused on health care access and quality, and the neighborhood and built environment. Most focused on developing interventions to improve health care access and quality or characterizing neighborhood risk and individual risk. Conclusions: Given that few interventions addressed economic stability, education access and quality, or community context and social risk, the potential for harnessing social determinants of health data to reduce the burden of cardiovascular disease remains unrealized.
Affiliation(s)
- Elizabeth McNeill
- Department of Public Health Policy and Management, New York University School of Global Public Health, New York, NY, USA
| | - Zoe Lindenfeld
- Department of Public Health Policy and Management, New York University School of Global Public Health, New York, NY, USA
| | - Logina Mostafa
- Department of Public Health Policy and Management, New York University School of Global Public Health, New York, NY, USA
| | - Dina Zein
- Department of Public Health Policy and Management, New York University School of Global Public Health, New York, NY, USA
| | - Diana Silver
- Department of Public Health Policy and Management, New York University School of Global Public Health, New York, NY, USA
| | - José Pagán
- Department of Public Health Policy and Management, New York University School of Global Public Health, New York, NY, USA
| | - William B. Weeks
- Microsoft Corporation, Precision Population Health, Microsoft Research, Redmond, WA, USA
| | - Ann Aerts
- The Novartis Foundation, Basel, Switzerland
| | | | | | - Ji Eun Chang
- Department of Public Health Policy and Management, New York University School of Global Public Health, New York, NY, USA
16
Shen H, Zhao H, Jiang Y. Machine Learning Algorithms for Predicting Stunting among Under-Five Children in Papua New Guinea. Children (Basel, Switzerland) 2023; 10:1638. [PMID: 37892302 PMCID: PMC10605317 DOI: 10.3390/children10101638] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 09/27/2023] [Accepted: 09/28/2023] [Indexed: 10/29/2023]
Abstract
Preventing stunting is particularly important for healthy development across the life course. In Papua New Guinea (PNG), the prevalence of stunting in children under five years old has consistently not improved. Therefore, the primary objective of this study was to employ multiple machine learning algorithms to identify the most effective model and key predictors for stunting prediction in children in PNG. The study used data from the 2016-2018 Papua New Guinea Demographic Health Survey, including from 3380 children with complete height-for-age data. The least absolute shrinkage and selection operator (LASSO) and random-forest-recursive feature elimination were used for feature selection. Logistic regression, a conditional decision tree, a support vector machine with a radial basis function kernel, and an extreme gradient boosting machine (XGBoost) were employed to construct the prediction model. The performance of the final model was evaluated using accuracy, precision, recall, F1 score, and area under the curve (AUC). The results of the study showed that LASSO-XGBoost has the best performance for predicting stunting in PNG (AUC: 0.765; 95% CI: 0.714-0.819) with accuracy, precision, recall, and F1 scores of 0.728, 0.715, 0.628, and 0.669, respectively. Combined with the SHAP value method, the optimal prediction model identified living in the Highlands Region, the age of the child, being in the richest family, and having a larger or smaller birth size as the top five important characteristics for predicting stunting. Based on the model, the findings support the necessity of preventing stunting early in life. Emphasizing the nutritional status of vulnerable maternal and child populations in PNG is recommended to promote maternal and child health and overall well-being.
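The F1 score reported above follows from the reported precision and recall, since F1 is their harmonic mean. The short check below is a reading aid rather than the authors' code; the function name `f1_score` is an illustrative assumption, while the two input values are the precision and recall quoted in the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported for the LASSO-XGBoost model.
print(round(f1_score(0.715, 0.628), 3))  # 0.669
```

The result matches the F1 score of 0.669 stated in the abstract, so the three reported figures are internally consistent.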
Affiliation(s)
| | | | - Yi Jiang
- School of Public Health, Chongqing Medical University, Chongqing 400016, China; (H.S.); (H.Z.)
17
Qasrawi R, Hoteit M, Tayyem R, Bookari K, Al Sabbah H, Kamel I, Dashti S, Allehdan S, Bawadi H, Waly M, Ibrahim MO, Polo SV, Al-Halawa DA. Machine learning techniques for the identification of risk factors associated with food insecurity among adults in Arab countries during the COVID-19 pandemic. BMC Public Health 2023; 23:1805. [PMID: 37716999 PMCID: PMC10505318 DOI: 10.1186/s12889-023-16694-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/06/2023] [Accepted: 09/01/2023] [Indexed: 09/18/2023]
Abstract
BACKGROUND: A direct consequence of global warming, and strongly correlated with poor physical and mental health, food insecurity is a rising global concern associated with low dietary intake. The Coronavirus pandemic has further aggravated food insecurity among vulnerable communities, and thus has sparked the global conversation on equal food access, food distribution, and improvement of food support programs. This research was designed to identify the key features associated with food insecurity during the COVID-19 pandemic using machine learning techniques. Seven machine learning algorithms were used in the model, which used a dataset of 32 features. The model was designed to predict food insecurity across ten Arab countries in the Gulf and Mediterranean regions. A total of 13,443 participants were extracted from the international Corona Cooking Survey conducted in 38 different countries during the COVID-19 pandemic. RESULTS: The findings indicate that Jordanian, Palestinian, Lebanese, and Saudi Arabian respondents reported the highest rates of food insecurity in the region (15.4%, 13.7%, 13.7% and 11.3% respectively). On the other hand, Oman and Bahrain reported the lowest rates (5.4% and 5.5% respectively). Our model obtained accuracy levels of 70%-82% across all algorithms. Gradient Boosting and Random Forest techniques had the highest performance levels in predicting food insecurity (82% and 80% respectively). Place of residence, age, financial instability, difficulties in accessing food, and depression were found to be the most relevant features associated with food insecurity. CONCLUSIONS: The ML algorithms seem to be an effective method for early detection and prediction of food insecurity and can profoundly aid policymaking. The integration of ML approaches in public health strategies could potentially improve the development of targeted and effective interventions to combat food insecurity in these regions and globally.
Affiliation(s)
- Radwan Qasrawi
- Department of Computer Science, Al-Quds University, Jerusalem, Palestine.
- Department of Computer Engineering, Istinye University, Istanbul, 34010, Turkey.
| | - Maha Hoteit
- Faculty of Public Health, Lebanese University, Beirut, Lebanon
- PHENOL Research Group (Public Health Nutrition Program Lebanon), Faculty of Public Health, Lebanese University, Beirut, Lebanon
- Lebanese University Nutrition Surveillance Center (LUNSC), Lebanese Food Drugs and Chemical Administrations, Lebanese University, Beirut, Lebanon
| | - Reema Tayyem
- Department of Human Nutrition, College of Health Sciences, QU-Health, Qatar University, Doha, Qatar
- Department of Nutrition and Food Technology, Faculty of Agriculture, University of Jordan, Amman, 11942, Jordan
| | - Khlood Bookari
- National Nutrition Committee, Saudi Food and Drug Authority, Riyadh, Saudi Arabia
- Department of Clinical Nutrition, Faculty of Applied Medical Sciences, Taibah University, Madinah, Saudi Arabia
| | - Haleama Al Sabbah
- Department of Health Sciences, College of Natural and Health Sciences, Zayed University, Dubai, United Arab Emirates
| | | | - Somaia Dashti
- Public Authority for Applied Education and Training, Kuwait City, Kuwait
| | - Sabika Allehdan
- Department of Biology, College of Science, University of Bahrain, Zallaq, Bahrain
| | - Hiba Bawadi
- Department of Human Nutrition, College of Health Sciences, QU-Health, Qatar University, Doha, Qatar
| | - Mostafa Waly
- Food Science and Nutrition Department, College of Agricultural and Marine Sciences, Sultan Qaboos University, Muscat, Oman
| | - Mohammed O Ibrahim
- Department of Nutrition and Food Technology, Faculty of Agriculture, Mu'tah University, Karak, Jordan
| | | | - Diala Abu Al-Halawa
- Department of Faculty of Medicine, Al Quds University, Jerusalem, Palestine.
18
Lotfata A, Moosazadeh M, Helbich M, Hoseini B. Socioeconomic and environmental determinants of asthma prevalence: a cross-sectional study at the U.S. County level using geographically weighted random forests. Int J Health Geogr 2023; 22:18. [PMID: 37563691 PMCID: PMC10413687 DOI: 10.1186/s12942-023-00343-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2023] [Accepted: 08/04/2023] [Indexed: 08/12/2023]
Abstract
BACKGROUND: Some studies have established associations between the prevalence of new-onset asthma and asthma exacerbation and socioeconomic and environmental determinants. However, research remains limited concerning the shape of these associations, the importance of the risk factors, and how these factors vary geographically. OBJECTIVE: We aimed (1) to examine ecological associations between asthma prevalence and multiple socio-physical determinants in the United States; and (2) to assess geographic variations in their relative importance. METHODS: Our study design is cross-sectional, based on county-level data for 2020 across the United States. We obtained self-reported asthma prevalence data of adults aged 18 years or older for each county. We applied conventional and geographically weighted random forest (GWRF) to investigate the associations between asthma prevalence and socioeconomic (e.g., poverty) and environmental determinants (e.g., air pollution and green space). To enhance the interpretability of the GWRF, we (1) assessed the shape of the associations through partial dependence plots, (2) ranked the determinants according to their global importance scores, and (3) mapped the local variable importance spatially. RESULTS: Of the 3059 counties, the average asthma prevalence was 9.9 (standard deviation ± 0.99). The GWRF outperformed the conventional random forest. We found an indication, for example, that temperature was inversely associated with asthma prevalence, while poverty showed positive associations. The partial dependence plots showed that these associations had a non-linear shape. Ranking the socio-physical environmental factors concerning their global importance showed that smoking prevalence and depression prevalence were most relevant, while green space and limited language were of minor relevance. The local variable importance measures showed striking geographical differences. CONCLUSION: Our findings strengthen the evidence that socio-physical environments play a role in explaining asthma prevalence, but their relevance seems to vary geographically. The results are vital for implementing future asthma prevention programs that should be tailor-made for specific areas.
Affiliation(s)
- Aynaz Lotfata
- Department of Pathology, Microbiology, and Immunology, School of Veterinary Medicine, University of California, Davis, CA, USA
| | - Mohammad Moosazadeh
- Integrated Engineering, Department of Environmental Science and Engineering, College of Engineering, KyungHee University, Yongin, 446-701, Republic of Korea
| | - Marco Helbich
- Department of Human Geography and Spatial Planning, Faculty of Geosciences, University Utrecht, Utrecht, The Netherlands
| | - Benyamin Hoseini
- Pharmaceutical Research Center, Pharmaceutical Technology Institute, Mashhad University of Medical Sciences, Mashhad, Iran.
- Department of Medical Informatics, Faculty of Medicine, Mashhad University of Medical Sciences, Mashhad, Iran.
19
Hobensack M, Song J, Chae S, Kennedy E, Zolnoori M, Bowles KH, McDonald MV, Evans L, Topaz M. Capturing Concerns about Patient Deterioration in Narrative Documentation in Home Healthcare. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2023; 2022:552-559. [PMID: 37128448 PMCID: PMC10148365] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/03/2023]
Abstract
Home healthcare (HHC) agencies provide care to more than 3.4 million adults per year. There is value in studying HHC narrative notes to identify patients at risk for deterioration. This study aimed to build machine learning algorithms to identify "concerning" narrative notes of HHC patients and identify emerging themes. Six algorithms were applied to narrative notes (n = 4,000) from an HHC agency to classify notes as either "concerning" or "not concerning." Topic modeling using Latent Dirichlet Allocation with a bag-of-words representation was conducted to identify emerging themes from the concerning notes. Gradient Boosted Trees demonstrated the best performance, with an F-score of 0.74 and an AUC of 0.96. Emerging themes were related to patient-clinician communication, HHC services provided, gait challenges, mobility concerns, wounds, and caregivers. Most themes have been cited in previous literature as increasing risk for adverse events. In the future, such algorithms can support early identification of patients at risk for deterioration.
Affiliation(s)
| | - Jiyoun Song
- Columbia University School of Nursing, New York, NY, USA
| | - Sena Chae
- University of Iowa College of Nursing, Iowa City, IA, USA
| | - Erin Kennedy
- University of Pennsylvania School of Nursing, Philadelphia, PA, USA
| | | | - Kathryn H Bowles
- Center for Home Care Policy & Research, Visiting Nurse Service of New York, New York, NY, USA
- University of Pennsylvania School of Nursing, Philadelphia, PA, USA
| | - Margaret V McDonald
- Center for Home Care Policy & Research, Visiting Nurse Service of New York, New York, NY, USA
| | - Lauren Evans
- Center for Home Care Policy & Research, Visiting Nurse Service of New York, New York, NY, USA
| | - Maxim Topaz
- Columbia University School of Nursing, New York, NY, USA
- Center for Home Care Policy & Research, Visiting Nurse Service of New York, New York, NY, USA
20
Iacobelli F, Yang A, Tom L, Leung IS, Crissman J, Salgado R, Simon M. Predicting Social Determinants of Health in Patient Navigation: Case Study. JMIR Form Res 2023; 7:e42683. [PMID: 36976634 PMCID: PMC10131925 DOI: 10.2196/42683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2022] [Revised: 01/12/2023] [Accepted: 02/07/2023] [Indexed: 02/10/2023]
Abstract
BACKGROUND: Patient navigation (PN) programs have demonstrated efficacy in improving health outcomes for marginalized populations across a range of clinical contexts by addressing barriers to health care, including social determinants of health (SDoHs). However, it can be challenging for navigators to identify SDoHs by asking patients directly because of many factors, including patients' reluctance to disclose information, communication barriers, and the variable resources and experience levels of patient navigators. Navigators could benefit from strategies that augment their ability to gather SDoH data. Machine learning can be leveraged as one of these strategies to identify SDoH-related barriers. This could further improve health outcomes, particularly in underserved populations. OBJECTIVE: In this formative study, we explored novel machine learning-based approaches to predict SDoHs in 2 Chicago area PN studies. In the first approach, we applied machine learning to data that include comments and interaction details between patients and navigators, whereas the second approach augmented patients' demographic information. This paper presents the results of these experiments and provides recommendations for data collection and the application of machine learning techniques more generally to the problem of predicting SDoHs. METHODS: We conducted 2 experiments to explore the feasibility of using machine learning to predict patients' SDoHs using data collected from PN research. The machine learning algorithms were trained on data collected from 2 Chicago area PN studies. In the first experiment, we compared several machine learning algorithms (logistic regression, random forest, support vector machine, artificial neural network, and Gaussian naive Bayes) to predict SDoHs from both patient demographics and navigator's encounter data over time. In the second experiment, we used multiclass classification with augmented information, such as transportation time to a hospital, to predict multiple SDoHs for each patient. RESULTS: In the first experiment, the random forest classifier achieved the highest accuracy among the classifiers tested. The overall accuracy to predict SDoHs was 71.3%. In the second experiment, multiclass classification effectively predicted a few patients' SDoHs based purely on demographic and augmented data. The best accuracy of these predictions overall was 73%. However, both experiments yielded high variability in individual SDoH predictions and correlations that become salient among SDoHs. CONCLUSIONS: To our knowledge, this study is the first approach to applying PN encounter data and multiclass learning algorithms to predict SDoHs. The experiments discussed yielded valuable lessons, including the awareness of model limitations and bias, planning for standardization of data sources and measurement, and the need to identify and anticipate the intersectionality and clustering of SDoHs. Although our focus was on predicting patients' SDoHs, machine learning can have a broad range of applications in the field of PN, from tailoring intervention delivery (eg, supporting PN decision-making) to informing resource allocation for measurement, and PN supervision.
Affiliation(s)
- Francisco Iacobelli
  - Department of Computer Science, Northeastern Illinois University, Chicago, IL, United States
  - Center for Advancing Safety of Machine Intelligence, Northwestern University, Evanston, IL, United States
- Anna Yang
  - Center for Health Equity Transformation, Feinberg School of Medicine Chicago, Northwestern University, Chicago, IL, United States
  - Department of Obstetrics and Gynecology, Feinberg School of Medicine Chicago, Northwestern University, Chicago, IL, United States
- Laura Tom
  - Center for Health Equity Transformation, Feinberg School of Medicine Chicago, Northwestern University, Chicago, IL, United States
  - Department of Obstetrics and Gynecology, Feinberg School of Medicine Chicago, Northwestern University, Chicago, IL, United States
- Ivy S Leung
  - Center for Health Equity Transformation, Feinberg School of Medicine Chicago, Northwestern University, Chicago, IL, United States
  - Department of Obstetrics and Gynecology, Feinberg School of Medicine Chicago, Northwestern University, Chicago, IL, United States
- John Crissman
  - Department of Computer Science, Northeastern Illinois University, Chicago, IL, United States
- Rufino Salgado
  - Department of Computer Science, Northeastern Illinois University, Chicago, IL, United States
- Melissa Simon
  - Center for Health Equity Transformation, Feinberg School of Medicine Chicago, Northwestern University, Chicago, IL, United States
  - Department of Obstetrics and Gynecology, Feinberg School of Medicine Chicago, Northwestern University, Chicago, IL, United States
  - Robert H. Lurie Comprehensive Cancer Center, Feinberg School of Medicine Chicago, Northwestern University, Chicago, IL, United States
|
21
|
Alpert J, Kim HJ, McDonnell C, Guo Y, George TJ, Bian J, Wu Y. Barriers and Facilitators of Obtaining Social Determinants of Health of Patients With Cancer Through the Electronic Health Record Using Natural Language Processing Technology: Qualitative Feasibility Study With Stakeholder Interviews. JMIR Form Res 2022; 6:e43059. [PMID: 36574288 PMCID: PMC9832350 DOI: 10.2196/43059] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 09/28/2022] [Revised: 12/01/2022] [Accepted: 12/05/2022] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Social determinants of health (SDoH), such as geographic neighborhoods, access to health care, education, and social structure, are important factors affecting people's health and health outcomes. The SDoH of patients are scarcely documented in a discrete format in electronic health records (EHRs) but are often available in free-text clinical narratives such as physician notes. Innovative methods like natural language processing (NLP) are being developed to identify and extract SDoH from EHRs, but it is imperative that the input of key stakeholders is included as NLP systems are designed. OBJECTIVE This study aims to understand the feasibility, challenges, and benefits of developing an NLP system to uncover SDoH from clinical narratives by conducting interviews with key stakeholders: (1) oncologists, (2) data analysts, (3) citizen scientists, and (4) patient navigators. METHODS Individuals who frequently work with SDoH data were invited to participate in semistructured interviews. All interviews were recorded and subsequently transcribed. After coding transcripts and developing a codebook, the constant comparative method was used to generate themes. RESULTS A total of 16 participants were interviewed (5 data analysts, 4 patient navigators, 4 physicians, and 3 citizen scientists). Three main themes emerged, accompanied by subthemes. The first theme, importance and approaches to obtaining SDoH, describes how every participant (n=16, 100%) regarded SDoH as important. In particular, proximity to the hospital and income levels were frequently relied upon. Communication about SDoH typically occurs during the initial conversation with the oncologist, but more personal information is often acquired by patient navigators. The second theme, SDoH exists in numerous forms, exemplified how SDoH arises during informal communication and can be difficult to enter into the EHR. 
The final theme, incorporating SDoH into health services research, addresses how more informed SDoH can be collected. One strategy is to empower patients so that they are aware of the importance of SDoH; another is to employ NLP techniques to make narrative data available in a discrete format, which can provide oncologists with actionable data summaries. CONCLUSIONS Extracting SDoH from EHRs was considered valuable and necessary, but obstacles such as narrative data format can make the process difficult. NLP can be a potential solution, but as the technology is developed, it is important to consider how key stakeholders document SDoH, apply the NLP systems, and use the extracted SDoH in health outcome studies.
Affiliation(s)
- Jordan Alpert
  - Cleveland Clinic, Center for Value-Based Care Research, Cleveland, OH, United States
- Hyehyun Julia Kim
  - College of Journalism and Communications, University of Florida, Gainesville, FL, United States
- Cara McDonnell
  - Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Yi Guo
  - Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Thomas J George
  - Division of Hematology and Oncology, Department of Medicine, College of Medicine, University of Florida, Gainesville, FL, United States
- Jiang Bian
  - Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
- Yonghui Wu
  - Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, United States
|
22
|
Wright E, Chen JT, Beckfield J, Theodore N, Krieger N. Workplace hazards and health among informally employed domestic workers in 14 cities, United States, 2011-2012: Using four approaches to characterize workers' patterns of exposures. Am J Ind Med 2022; 65:959-974. [PMID: 36222491 DOI: 10.1002/ajim.23433] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 09/22/2022] [Accepted: 09/23/2022] [Indexed: 02/01/2023]
Abstract
BACKGROUND We characterized informally employed US domestic workers' (DWers) exposure to patterns of workplace hazards, as well as to single hazards, and examined associations with DWers' work-related and general health. METHODS We analyzed cross-sectional data from the sole nationwide survey of informally employed US DWers with work-related hazards data, conducted in 14 cities (2011-2012; N = 2086). We characterized DWers' exposures using four approaches: single exposures (n = 19 hazards), composite exposure to hazards selected a priori, classification trees, and latent class analysis. We used city fixed effects regression to estimate the risk ratio (RR) of work-related back injury, work-related illness, and fair-to-poor self-rated health associated with exposure as defined by each approach. RESULTS Across all four approaches (net of individual, household, and occupational characteristics, and city fixed effects), exposure to workplace hazards was associated with increased risk of the three health outcomes. For work-related back injury, the estimated RR associated with heavy lifting (the single hazard with the largest RR), exposure to all three hazards selected a priori (worker did heavy lifting, climbed to clean, and worked long hours) versus none, exposure to the two hazards identified by classification trees (heavy lifting, verbally abused) versus "no heavy lifting," and membership in the most- versus least-exposed latent class were, respectively, 3.4 (95% confidence interval [CI] 2.7-4.1), 6.5 (95% CI 4.8-8.7), 4.4 (95% CI 3.6-5.3), and 6.6 (95% CI 4.6-9.4). CONCLUSIONS Measures of joint work-related exposures were more strongly associated than single exposures with informally employed US DWers' health profiles.
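As a reading aid for the RR and 95% CI figures quoted above, the following is a minimal sketch of an unadjusted risk ratio with a log-scale Wald interval computed from a 2x2 table. It is not the city fixed effects regression the abstract describes, which also adjusts for covariates; the function name `risk_ratio` and all counts are hypothetical.

```python
import math


def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total, z=1.96):
    """Unadjusted risk ratio with a Wald confidence interval on the log scale."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for two independent binomial samples.
    se_log_rr = math.sqrt(
        1 / exposed_cases - 1 / exposed_total
        + 1 / unexposed_cases - 1 / unexposed_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts for illustration only; they are not taken from the study.
rr, lower, upper = risk_ratio(60, 200, 30, 300)
print(f"RR = {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Because the interval is built on the log scale, it is symmetric around the RR multiplicatively rather than additively, which is why published intervals such as those above look lopsided.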
Affiliation(s)
- Emily Wright
  - Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Jarvis T Chen
  - Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Jason Beckfield
  - Department of Sociology, Harvard University, Cambridge, Massachusetts, USA
- Nik Theodore
  - Department of Urban Planning and Policy, University of Illinois Chicago, Chicago, Illinois, USA
- Nancy Krieger
  - Department of Social and Behavioral Sciences, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
|
23
|
Wang B, Liu F, Deveaux L, Ash A, Gerber B, Allison J, Herbert C, Poitier M, MacDonell K, Li X, Stanton B. Predicting Adolescent Intervention Non-responsiveness for Precision HIV Prevention Using Machine Learning. AIDS Behav 2022; 27:1392-1402. [PMID: 36255592 PMCID: PMC10129965 DOI: 10.1007/s10461-022-03874-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Accepted: 08/23/2022] [Indexed: 11/28/2022]
Abstract
Interventions to teach protective behaviors may be differentially effective within an adolescent population. Identifying the characteristics of youth who are less likely to respond to an intervention can guide program modifications to improve its effectiveness. Using comprehensive longitudinal data on adolescent risk behaviors, perceptions, sensation-seeking, peer and family influence, and neighborhood risk factors from 2564 grade 10-12 students in The Bahamas, this study employs machine learning approaches (support vector machines, logistic regression, decision tree, and random forest) to identify important predictors of non-responsiveness for precision prevention. We used 80% of the data to train the models and the rest for model testing. Among different machine learning algorithms, the random forest model using longitudinal data and the Boruta feature selection approach predicted intervention non-responsiveness best, achieving sensitivity of 85.4%, specificity of 78.4% and AUROC of 0.93 on the training data, and sensitivity of 84.3%, specificity of 67.1%, and AUROC of 0.85 on the test data. Key predictors include self-efficacy, perceived response cost, parent monitoring, vulnerability, response efficacy, HIV/AIDS knowledge, communication about condom use, and severity of HIV/STI. Machine learning can yield powerful predictive models to identify adolescents who are unlikely to respond to an intervention. Such models can guide the development of alternative strategies that may be more effective with intervention non-responders.
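The sensitivity, specificity, and AUROC figures reported above can be read against their definitions. The sketch below computes all three in plain Python; the helper names, the 0.5 decision threshold, and the labels and scores are hypothetical and do not reproduce the study's models or data.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auroc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical held-out labels (1 = non-responder) and model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = auroc(y_true, scores)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUROC={auc:.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, so a gap between training and test values like the one reported above can show up differently in each metric.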
Affiliation(s)
- Bo Wang
  - Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, 368 Plantation Street, Albert Sherman Center, Worcester, MA, 01605, USA
- Feifan Liu
  - Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, 368 Plantation Street, Albert Sherman Center, Worcester, MA, 01605, USA
- Lynette Deveaux
  - Office of HIV/AIDS, Ministry of Health, Shirley Street, Nassau, Bahamas
- Arlene Ash
  - Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, 368 Plantation Street, Albert Sherman Center, Worcester, MA, 01605, USA
- Ben Gerber
  - Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, 368 Plantation Street, Albert Sherman Center, Worcester, MA, 01605, USA
- Jeroan Allison
  - Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, 368 Plantation Street, Albert Sherman Center, Worcester, MA, 01605, USA
- Carly Herbert
  - Department of Population and Quantitative Health Sciences, University of Massachusetts Chan Medical School, 368 Plantation Street, Albert Sherman Center, Worcester, MA, 01605, USA
- Maxwell Poitier
  - Office of HIV/AIDS, Ministry of Health, Shirley Street, Nassau, Bahamas
- Karen MacDonell
  - Department of Family Medicine and Public Health Sciences, Wayne State University School of Medicine, Detroit, MI, USA
- Xiaoming Li
  - Department of Health Promotion, Education, and Behavior, University of South Carolina Arnold School of Public, Columbia, SC, USA
- Bonita Stanton
  - Hackensack Meridian School of Medicine, 340 Kingsland ST., Nutley, NJ, 07110, USA
|
24
|
Baird A, Cheng Y, Xia Y. Use of machine learning to examine disparities in completion of substance use disorder treatment. PLoS One 2022; 17:e0275054. [PMID: 36149868 PMCID: PMC9506659 DOI: 10.1371/journal.pone.0275054] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/01/2022] [Accepted: 09/11/2022] [Indexed: 11/19/2022] Open
Abstract
The objective of this work is to examine disparities in the completion of substance use disorder treatment in the U.S. Our data are from the Treatment Episode Dataset Discharge (TEDS-D) datasets from the U.S. Substance Abuse and Mental Health Services Administration (SAMHSA) for 2017-2019. We apply a two-stage virtual twins model (random forest + decision tree) where, in the first stage (random forest), we determine differences in treatment completion probability associated with race/ethnicity, income source, no co-occurrence of mental health disorders, gender (biological), no health insurance, veteran status, age, and primary substance (alcohol or opioid). In the second stage (decision tree), we identify subgroups associated with probability differences, where such subgroups are more or less likely to complete treatment. We find the subgroups most likely to complete substance use disorder treatment, when the subgroup represents more than 1% of the sample, are those with no mental health condition co-occurrence (4.8% more likely when discharged from an ambulatory outpatient treatment program, representing 62% of the sample; and 10% more likely for one of the more specifically defined subgroups representing 10% of the sample), an income source of job-related wages/salary (4.3% more likely when not having used in the 30 days prior to discharge and when primary substance is not alcohol only, representing 28% of the sample), and white non-Hispanics (2.7% more likely when discharged from residential long-term treatment, representing 9% of the sample). Important implications are that: 1) those without a co-occurring mental health condition are the most likely to complete treatment, 2) those with job-related wages or income are more likely to complete treatment, and 3) racial/ethnic disparities persist in favor of white non-Hispanic individuals seeking to complete treatment. Thus, additional resources may be needed to combat such disparities.
Affiliation(s)
- Aaron Baird
  - Institute of Health Administration, Robinson College of Business, Georgia State University, Atlanta, Georgia, United States of America
- Yichen Cheng
  - Institute for Insight, Robinson College of Business, Georgia State University, Atlanta, Georgia, United States of America
- Yusen Xia
  - Institute for Insight, Robinson College of Business, Georgia State University, Atlanta, Georgia, United States of America
|
25
|
Palanivinayagam A, Kumar VV, Mahesh TR, Singh KK, Singh A. Machine Learning-Based COVID-19 Classification Using E-Adopted CT Scans. INTERNATIONAL JOURNAL OF E-ADOPTION 2022. [DOI: 10.4018/ijea.310001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
In recent years, several machine learning models have been successfully deployed in various fields. However, a huge quantity of data is required to train a good machine learning model. Data are stored in a distributed fashion across multiple sources, and centralizing those data leads to privacy and security issues. To solve this problem, the proposed federated-based method works by exchanging the parameters of three locally trained machine learning models without compromising privacy. Each machine learning model uses the e-adoption of CT scans to improve its training knowledge. The CT scans are electronically transferred between various medical centers. Proper care is taken to prevent identity loss from the e-adopted data. To normalize the parameters, a novel weighting scheme is also exchanged along with the parameters. Thus, the global model is trained with more heterogeneous samples to increase performance. In the experiment, the proposed algorithm achieved 89% accuracy, which is 32% more than the existing machine learning models.
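The parameter exchange described above can be illustrated with a generic weighted average of locally trained parameter vectors. This is a minimal sketch that assumes weights proportional to each site's sample count, in the style of standard federated averaging; the abstract does not specify its novel weighting scheme, so the weights, the function name `federated_average`, and all numbers below are hypothetical.

```python
def federated_average(local_params, weights):
    """Combine per-site parameter vectors into one global vector by weighted averaging."""
    total = sum(weights)
    n_params = len(local_params[0])
    return [
        sum(w * params[i] for params, w in zip(local_params, weights)) / total
        for i in range(n_params)
    ]


# Hypothetical parameter vectors from three sites; only these leave each site,
# never the underlying scans.
site_params = [[0.2, 1.0], [0.4, 0.0], [0.6, 2.0]]
# Assumed weighting: number of local training samples at each site.
site_sizes = [100, 300, 100]

global_params = federated_average(site_params, site_sizes)
print(global_params)
```

Weighting by sample count lets larger sites influence the global model more; other schemes, such as weighting by validation performance, would change only the `weights` argument.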
|
26
|
Classification of Parkinson's disease and its stages using machine learning. Sci Rep 2022; 12:14036. [PMID: 35982070 PMCID: PMC9388671 DOI: 10.1038/s41598-022-18015-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 01/18/2022] [Accepted: 08/03/2022] [Indexed: 11/19/2022] Open
Abstract
As digital health technology becomes more pervasive, machine learning (ML) provides a robust way to analyze and interpret the myriad of collected features. The purpose of this preliminary work was to use ML classification to assess the benefits and relevance of neurocognitive features, from both tablet-based assessments and self-reported metrics, as they relate to Parkinson’s Disease (PD) and its stages [Hoehn and Yahr (H&Y) Stages 1–5]. Further, this work aims to compare perceived versus sensor-based neurocognitive abilities. In this study, 75 participants (n = 50 PD; n = 25 control) completed 14 tablet-based neurocognitive functional tests (e.g., motor, memory, speech, executive, and multifunction), functional movement assessments (e.g., Berg Balance Scale), and standardized health questionnaires (e.g., PDQ-39). Decision tree classification of sensor-based features allowed for the discrimination of PD from healthy controls with an accuracy of 92.6%, and early and advanced stages of PD with an accuracy of 73.7%, compared with the current gold standard tools [e.g., standardized health questionnaires (78.3% accuracy) and functional movement assessments (70% accuracy)]. Significant features were also identified using decision tree classification. Device magnitude of acceleration was significant in 12 of 14 tests (85.7%), regardless of test type. For classification between diagnosed and control populations, 17 motor (e.g., device magnitude of acceleration), 9 accuracy (e.g., number of correct/incorrect interactions), and 8 timing features (e.g., time between interactions) were significant. For classification between early (H&Y Stages 1 and 2) and advanced (H&Y Stages 3, 4, and 5) stages of PD, 7 motor, 12 accuracy, and 14 timing features were significant. Finally, this work shows that the perceived functionality of individuals with PD differed from sensor-based functionalities. In early-stage PD, perceived functionality was shown to be 21.6% lower than sensor-based scores, with notable perceived deficits in memory and executive function. However, individuals in advanced stages had elevated perceptions (1.57x) for executive and behavioral functions compared to early-stage populations. Machine learning in digital health systems allows for a more comprehensive understanding of neurodegenerative diseases and their stages and may also depict new features that influence the ways digital health technology should be configured.
|
27
|
Rundle AG, Bader MDM, Mooney SJ. Machine Learning Approaches for Measuring Neighborhood Environments in Epidemiologic Studies. CURR EPIDEMIOL REP 2022; 9:175-182. [PMID: 35789918 PMCID: PMC9244309 DOI: 10.1007/s40471-022-00296-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Accepted: 04/03/2022] [Indexed: 11/30/2022]
Abstract
Purpose of review Innovations in information technology, initiatives by local governments to share administrative data, and growing inventories of data available from commercial data aggregators have immensely expanded the information available to describe neighborhood environments, supporting an approach to research we call Urban Health Informatics. This review evaluates the application of machine learning to this new wealth of data for studies of the effects of neighborhood environments on health. Recent findings Prominent machine learning applications in this field include automated image analysis of archived imagery such as Google Street View images, variable selection methods to identify neighborhood environment factors that predict health outcomes from large pools of exposure variables, and spatial interpolation methods to estimate neighborhood conditions across large geographic areas. Summary In each domain, we highlight successes and cautions in the application of machine learning, particularly highlighting legal issues in applying machine learning approaches to Google’s geo-spatial data.
Affiliation(s)
- Andrew G. Rundle
  - Department of Epidemiology, Mailman School of Public Health, Columbia University, New York City, NY USA
- Stephen J. Mooney
  - Department of Epidemiology, School of Public Health, University of Washington, Seattle, WA USA
|
28
|
Berg K, Doktorchik C, Quan H, Saini V. Automating data collection methods in electronic health record systems: a Social Determinant of Health (SDOH) viewpoint. Health Syst (Basingstoke) 2022; 12:472-480. [PMID: 38235302 PMCID: PMC10791104 DOI: 10.1080/20476965.2022.2075796] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/12/2021] [Accepted: 04/26/2022] [Indexed: 10/18/2022] Open
Abstract
Social Determinant of Health (SDOH) data are important targets for research and innovation in Health Information Systems (HIS). The ways we envision SDOH in "smart" information systems will play a considerable role in shaping future population health landscapes. Current methods for data collection can capture wide ranges of SDOH factors, in standardised and non-standardised formats, from both primary and secondary sources. Advances in automating data linkage and text classification show particular promise for enhancing SDOH in HIS. One challenge is that social communication processes embedded in data collection are directly related to the inequalities that HIS attempt to measure and redress. To advance equity, it is imperative that care-providers, researchers, technicians, and administrators attend to power dynamics in HIS standards and practices. We recommend: 1. Investing in interdisciplinary and intersectoral knowledge generation and translation. 2. Developing novel methods for data discovery, linkage and analysis through participatory research. 3. Channelling information into upstream evidence-informed policy.
Affiliation(s)
- Kelsey Berg
  - Alberta Health Services, University of Lethbridge
|
29
|
Muhajarine N, Adeyinka DA, McCutcheon J, Green KL, Fahlman M, Kallio N. COVID-19 vaccine hesitancy and refusal and associated factors in an adult population in Saskatchewan, Canada: Evidence from predictive modelling. PLoS One 2021; 16:e0259513. [PMID: 34767603 PMCID: PMC8589208 DOI: 10.1371/journal.pone.0259513] [Citation(s) in RCA: 40] [Impact Index Per Article: 10.0] [Received: 06/15/2021] [Accepted: 10/20/2021] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND A high population level of vaccination is required to control the COVID-19 pandemic, but not all Canadians are convinced of the value and safety of vaccination. Understanding more about these individuals can aid in developing strategies to increase their acceptance of a COVID-19 vaccine. The objectives of this study were to describe COVID-19 vaccine acceptance, hesitancy and refusal rates and associated factors in Saskatchewan, Canada. METHODS This is a cross-sequential study that consisted of pooled responses from weighted samples of 9,252 Saskatchewan adults (≥18 years) across nine rounds of data collection between May 4, 2020 and April 3, 2021. The outcome variable was vaccine intention: vaccine acceptance, hesitancy, and refusal. The independent variables were layered into socio-demographic factors, risk of exposure to coronavirus, mitigating behaviours, and perceptions of COVID-19. Data were analyzed using multinomial logistic regression and a classification and regression tree. RESULTS Seventy-six percent of the respondents indicated that they had been or were willing to be vaccinated, 13% had not yet decided, and the remaining 11% said they would not be vaccinated. Factors that increased the likelihood of vaccine refusal and hesitancy were lower education level, financial instability, Indigenous status, and not being concerned about spreading the coronavirus. Perceiving COVID-19 to be more of a threat to one's community and believing that one had a higher risk of illness or death from COVID-19 decreased the likelihood of both vaccine refusal and hesitancy. Women and newcomers to Canada were more likely to be unsure about getting vaccinated. Respondents who did not plan to be vaccinated were less likely to wear face masks and practice physical distancing. 
CONCLUSION While many Canadians have voluntarily and eagerly become vaccinated already, reaching sufficient coverage of the population is likely to require targeted efforts to convince those who are resistant or unsure. Identifying and overcoming any barriers to vaccination that exist within the socio-demographic groups we found were least likely to be vaccinated is a crucial component.
Affiliation(s)
- Nazeem Muhajarine
  - Department of Community Health and Epidemiology, College of Medicine, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
  - Saskatchewan Population Health and Evaluation Research Unit, Saskatoon, Saskatchewan, Canada
- Daniel A. Adeyinka
  - Department of Community Health and Epidemiology, College of Medicine, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
  - Saskatchewan Population Health and Evaluation Research Unit, Saskatoon, Saskatchewan, Canada
- Jessica McCutcheon
  - Canadian Hub for Applied and Social Research, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Kathryn L. Green
  - Department of Community Health and Epidemiology, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
- Miles Fahlman
  - HACAN Consulting Ltd., Saskatoon, Saskatchewan, Canada
- Natalie Kallio
  - Department of Community Health and Epidemiology, College of Medicine, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
  - Saskatchewan Population Health and Evaluation Research Unit, Saskatoon, Saskatchewan, Canada
|
30
|
Abstract
OBJECTIVE Child undernutrition is a global public health problem with serious implications. In this study, we estimate predictive algorithms for the determinants of childhood stunting by using various machine learning (ML) algorithms. DESIGN This study draws on data from the Ethiopian Demographic and Health Survey of 2016. Five ML algorithms including eXtreme gradient boosting, k-nearest neighbours (k-NN), random forest, neural network and the generalised linear models were considered to predict the socio-demographic risk factors for undernutrition in Ethiopia. SETTING Households in Ethiopia. PARTICIPANTS A total of 9471 children below 5 years of age participated in this study. RESULTS The descriptive results show substantial regional variations in child stunting, wasting and underweight in Ethiopia. Also, among the five ML algorithms, xgbTree algorithm shows a better prediction ability than the generalised linear mixed algorithm. The best predicting algorithm (xgbTree) shows diverse important predictors of undernutrition across the three outcomes which include time to water source, anaemia history, child age greater than 30 months, small birth size and maternal underweight, among others. CONCLUSIONS The xgbTree algorithm was a reasonably superior ML algorithm for predicting childhood undernutrition in Ethiopia compared to other ML algorithms considered in this study. The findings support improvement in access to water supply, food security and fertility regulation, among others, in the quest to considerably improve childhood nutrition in Ethiopia.
|