1. Davis VH, Qiang JR, Adekoya MacCarthy I, Howse D, Seshie AZ, Kosowan L, Delahunty-Pike A, Abaga E, Cooney J, Robinson M, Senior D, Zsager A, Aubrey-Bassler K, Irwin M, Jackson LA, Katz A, Marshall EG, Muhajarine N, Neudorf C, Garies S, Pinto AD. Perspectives on Using Artificial Intelligence to Derive Social Determinants of Health Data From Medical Records in Canada: Large Multijurisdictional Qualitative Study. J Med Internet Res 2025; 27:e52244. [PMID: 40053728 PMCID: PMC11926464 DOI: 10.2196/52244] [Received: 08/28/2023] [Revised: 10/31/2024] [Accepted: 11/29/2024] [Indexed: 03/09/2025] Open
Abstract
BACKGROUND Data on the social determinants of health could be used to improve care, support quality improvement initiatives, and track progress toward health equity. However, this data collection is not widespread. Artificial intelligence (AI), specifically natural language processing and machine learning, could be used to derive social determinants of health data from electronic medical records. This could reduce the time and resources required to obtain social determinants of health data. OBJECTIVE This study aimed to understand perspectives of a diverse sample of Canadians on the use of AI to derive social determinants of health information from electronic medical record data, including benefits and concerns. METHODS Using a qualitative description approach, in-depth interviews were conducted with 195 participants purposefully recruited from Ontario, Newfoundland and Labrador, Manitoba, and Saskatchewan. Transcripts were analyzed using an inductive and deductive content analysis. RESULTS A total of 4 themes were identified. First, AI was described as the inevitable future, facilitating more efficient, accessible social determinants of health information and use in primary care. Second, participants expressed concerns about potential health care harms and a distrust in AI and public systems. Third, some participants indicated that AI could lead to a loss of the human touch in health care, emphasizing a preference for strong relationships with providers and individualized care. Fourth, participants described the critical importance of consent and the need for strong safeguards to protect patient data and trust. CONCLUSIONS These findings provide important considerations for the use of AI in health care, and particularly when health care administrators and decision makers seek to derive social determinants of health data.
Affiliation(s)
- Victoria H Davis
  - Department of Health Behavior and Health Equity, School of Public Health, University of Michigan-Ann Arbor, Ann Arbor, MI, United States
- Jinfan Rose Qiang
  - Upstream Lab, MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, Unity Health Toronto, Toronto, ON, Canada
- Itunuoluwa Adekoya MacCarthy
  - Upstream Lab, MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, Unity Health Toronto, Toronto, ON, Canada
- Dana Howse
  - Primary Healthcare Research Unit, Memorial University of Newfoundland and Labrador, St. John's, NL, Canada
- Abigail Zita Seshie
  - Upstream Lab, MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, Unity Health Toronto, Toronto, ON, Canada
- Leanne Kosowan
  - Department of Family Medicine, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, Canada
- Eunice Abaga
  - Upstream Lab, MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, Unity Health Toronto, Toronto, ON, Canada
- Jane Cooney
  - Upstream Lab, MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, Unity Health Toronto, Toronto, ON, Canada
- Marjeiry Robinson
  - Upstream Lab, MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, Unity Health Toronto, Toronto, ON, Canada
- Dorothy Senior
  - Upstream Lab, MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, Unity Health Toronto, Toronto, ON, Canada
- Alexander Zsager
  - Upstream Lab, MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, Unity Health Toronto, Toronto, ON, Canada
- Kris Aubrey-Bassler
  - Primary Healthcare Research Unit, Memorial University of Newfoundland and Labrador, St. John's, NL, Canada
- Mandi Irwin
  - Department of Family Medicine, Dalhousie University, Halifax, NS, Canada
- Lois A Jackson
  - School of Health and Human Performance, Dalhousie University, Halifax, NS, Canada
- Alan Katz
  - Department of Family Medicine, Rady Faculty of Health Sciences, University of Manitoba, Winnipeg, MB, Canada
- Nazeem Muhajarine
  - Department of Community Health & Epidemiology, College of Medicine, University of Saskatchewan, Saskatoon, SK, Canada
- Cory Neudorf
  - Department of Community Health & Epidemiology, College of Medicine, University of Saskatchewan, Saskatoon, SK, Canada
- Stephanie Garies
  - Department of Family Medicine, University of Calgary, Calgary, Canada
- Andrew D Pinto
  - Upstream Lab, MAP Centre for Urban Health Solutions, Li Ka Shing Knowledge Institute, Unity Health Toronto, Toronto, ON, Canada
  - Department of Family and Community Medicine, Faculty of Medicine, University of Toronto, Toronto, ON, Canada
  - Department of Family and Community Medicine, St. Michael's Hospital, Toronto, ON, Canada
  - Dalla Lana School of Public Health, University of Toronto, Toronto, ON, Canada
2. Baminiwatte R, Torsu B, Scherbakov D, Mollalo A, Obeid JS, Alekseyenko AV, Lenert LA. Machine learning in healthcare citizen science: A scoping review. Int J Med Inform 2025; 195:105766. [PMID: 39740357 PMCID: PMC11810576 DOI: 10.1016/j.ijmedinf.2024.105766] [Received: 07/10/2024] [Revised: 11/20/2024] [Accepted: 12/15/2024] [Indexed: 01/02/2025]
Abstract
OBJECTIVES This scoping review aims to clarify the definition and trajectory of citizen-led scientific research (so-called citizen science) within the healthcare domain, and to examine the degree of integration of machine learning (ML) and the participation levels of citizen scientists in health-related projects. MATERIALS AND METHODS In January and September 2024, we conducted a comprehensive search in PubMed, Scopus, Web of Science, and the EBSCOhost platform for peer-reviewed publications that combine citizen science and ML in healthcare. Articles were excluded if citizens were merely passive data providers or if only professional scientists were involved. RESULTS Of an initial 1,395 articles screened, 56 articles spanning from 2013 to 2024 met the inclusion criteria. The majority of research projects were conducted in the U.S. (n = 20, 35.7%), followed by Germany (n = 6, 10.7%), with Spain, Canada, and the UK each contributing three studies (5.4%). Data collection was the primary form of citizen scientist involvement (n = 29, 51.8%), which included capturing images, sharing data online, and mailing samples. Data annotation was the next most common activity (n = 15, 26.8%), followed by participation in ML model challenges (n = 8, 14.3%) and decision-making contributions (n = 3, 5.4%). Mosquitoes (n = 10, 34.5%) and air pollution samples (n = 7, 24.2%) were the main data objects collected by citizens for ML analysis. Classification tasks were the most prevalent ML method (n = 30, 52.6%), with Convolutional Neural Networks being the most frequently used algorithm (n = 13, 20%). DISCUSSION AND CONCLUSIONS Citizen science in healthcare is currently an American and European construct with growing expansion in Asia. Citizens are contributing data and labeling data for ML methods, but only infrequently analyzing data or leading studies. Projects that use "crowd-sourced" data and "citizen science" should be differentiated depending on the degree of involvement of citizens.
Affiliation(s)
- Ranga Baminiwatte
  - Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina (MUSC), Charleston, SC 29425, USA
- Blessing Torsu
  - Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina (MUSC), Charleston, SC 29425, USA
- Dmitry Scherbakov
  - Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina (MUSC), Charleston, SC 29425, USA
- Abolfazl Mollalo
  - Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina (MUSC), Charleston, SC 29425, USA
- Jihad S Obeid
  - Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina (MUSC), Charleston, SC 29425, USA
- Alexander V Alekseyenko
  - Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina (MUSC), Charleston, SC 29425, USA
- Leslie A Lenert
  - Biomedical Informatics Center, Department of Public Health Sciences, Medical University of South Carolina (MUSC), Charleston, SC 29425, USA
3. Roy S, Morrell S, Zhao L, Homayouni R. Large-scale identification of social and behavioral determinants of health from clinical notes: comparison of Latent Semantic Indexing and Generative Pretrained Transformer (GPT) models. BMC Med Inform Decis Mak 2024; 24:296. [PMID: 39390479 PMCID: PMC11465786 DOI: 10.1186/s12911-024-02705-x] [Received: 03/21/2024] [Accepted: 09/30/2024] [Indexed: 10/12/2024] Open
Abstract
BACKGROUND Social and behavioral determinants of health (SBDH) are associated with a variety of health and utilization outcomes, yet these factors are not routinely documented in the structured fields of electronic health records (EHR). The objective of this study was to evaluate different machine learning approaches for detection of SBDH from the unstructured clinical notes in the EHR. METHODS Latent Semantic Indexing (LSI) was applied to 2,083,180 clinical notes corresponding to 46,146 patients in the MIMIC-III dataset. Using LSI, patients were ranked based on conceptual relevance to a set of keywords (lexicons) pertaining to 15 different SBDH categories. For Generative Pretrained Transformer (GPT) models, API requests were made with a Python script to connect to the OpenAI services in Azure, using gpt-3.5-turbo-1106 and gpt-4-1106-preview models. Prediction of SBDH categories was performed using a logistic regression model that included age, gender, race and SBDH ICD-9 codes. RESULTS LSI retrieved patients according to 15 SBDH domains, with an overall average PPV ≥ 83%. Using manually curated gold standard (GS) sets for nine SBDH categories, the macro-F1 score of LSI (0.74) was better than ICD-9 (0.71) and GPT-3.5 (0.54), but lower than GPT-4 (0.80). Due to document size limitations, only a subset of the GS cases could be processed by GPT-3.5 (55.8%) and GPT-4 (94.2%), compared to LSI (100%). Using common GS subsets for nine different SBDH categories, the macro-F1 of ICD-9 combined with either LSI (mean 0.88, 95% CI 0.82-0.93), GPT-3.5 (0.86, 0.82-0.91) or GPT-4 (0.88, 0.83-0.94) was not significantly different. After including age, gender, race and ICD-9 in a logistic regression model, the AUC for prediction of six out of the nine SBDH categories was higher for LSI compared to GPT-4. CONCLUSIONS These results demonstrate that the LSI approach performs comparably to more recent large language models, such as GPT-3.5 and GPT-4, when using the same set of documents. Importantly, LSI is robust, deterministic, and does not have document-size limitations or cost implications, making it more amenable to real-world applications in health systems.
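The macro-F1 figures quoted in this abstract average one F1 score per SBDH category, so that rare categories count as much as common ones. A minimal sketch of that calculation follows, assuming binary per-patient labels for each category; the function names, category names, and toy labels are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def f1_score(gold: List[int], pred: List[int]) -> float:
    """F1 for one category, given binary gold and predicted labels."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold_by_cat: Dict[str, List[int]],
             pred_by_cat: Dict[str, List[int]]) -> float:
    """Unweighted mean of the per-category F1 scores."""
    scores = [f1_score(gold_by_cat[c], pred_by_cat[c]) for c in gold_by_cat]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy labels for two categories and four patients.
    gold = {"housing": [1, 1, 0, 0], "tobacco": [1, 0, 1, 0]}
    pred = {"housing": [1, 0, 0, 0], "tobacco": [1, 0, 1, 1]}
    print(round(macro_f1(gold, pred), 4))
```

Because the mean is unweighted, a method that does poorly on a single small category loses as much macro-F1 as one that does poorly on a large category.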
Affiliation(s)
- Sujoy Roy
  - Foundational Medical Studies, Population Health Informatics, Oakland University William Beaumont School of Medicine, Oakland University, 586 Pioneer Dr, 460 O'Dowd Hall, Rochester, MI, 48309-4482, USA
- Lili Zhao
  - Biostatistics, Beaumont Research Institute, Corewell Health, Royal Oak, Michigan, USA
- Ramin Homayouni
  - Foundational Medical Studies, Population Health Informatics, Oakland University William Beaumont School of Medicine, Oakland University, 586 Pioneer Dr, 460 O'Dowd Hall, Rochester, MI, 48309-4482, USA
  - Population Health & Health Equity Research, Beaumont Research Institute, Corewell Health, Royal Oak, Michigan, USA
4. Yu Z, Peng C, Yang X, Dang C, Adekkanattu P, Gopal Patra B, Peng Y, Pathak J, Wilson DL, Chang CY, Lo-Ciganic WH, George TJ, Hogan WR, Guo Y, Bian J, Wu Y. Identifying social determinants of health from clinical narratives: A study of performance, documentation ratio, and potential bias. J Biomed Inform 2024; 153:104642. [PMID: 38621641 PMCID: PMC11141428 DOI: 10.1016/j.jbi.2024.104642] [Received: 10/25/2023] [Revised: 04/09/2024] [Accepted: 04/12/2024] [Indexed: 04/17/2024]
Abstract
OBJECTIVE To develop a natural language processing (NLP) package to extract social determinants of health (SDoH) from clinical narratives, examine the bias among race and gender groups, test the generalizability of extracting SDoH for different disease groups, and examine the population-level extraction ratio. METHODS We developed SDoH corpora using clinical notes identified at the University of Florida (UF) Health. We systematically compared 7 transformer-based large language models (LLMs) and developed an open-source package, SODA (i.e., SOcial DeterminAnts), to facilitate SDoH extraction from clinical narratives. We examined the performance and potential bias of SODA for different race and gender groups, tested the generalizability of SODA using two disease domains including cancer and opioid use, and explored strategies for improvement. We applied SODA to extract 19 categories of SDoH from the breast (n = 7,971), lung (n = 11,804), and colorectal cancer (n = 6,240) cohorts to assess patient-level extraction ratio and examine the differences among race and gender groups. RESULTS We developed an SDoH corpus using 629 clinical notes of cancer patients with annotations of 13,193 SDoH concepts/attributes from 19 categories of SDoH, and another cross-disease validation corpus using 200 notes from opioid use patients with 4,342 SDoH concepts/attributes. We compared 7 transformer models and the GatorTron model achieved the best mean average strict/lenient F1 scores of 0.9122 and 0.9367 for SDoH concept extraction and 0.9584 and 0.9593 for linking attributes to SDoH concepts. There is a small performance gap (∼4%) between males and females, but a large performance gap (>16%) among race groups. The performance dropped when we applied the cancer SDoH model to the opioid cohort; fine-tuning using a smaller opioid SDoH corpus improved the performance. The extraction ratio varied in the three cancer cohorts, in which 10 SDoH could be extracted from over 70% of cancer patients, but 9 SDoH could be extracted from less than 70% of cancer patients. Individuals from the White and Black groups have a higher extraction ratio than other minority race groups. CONCLUSIONS Our SODA package achieved good performance in extracting 19 categories of SDoH from clinical narratives. The SODA package with pre-trained transformer models is available at https://github.com/uf-hobi-informatics-lab/SODA_Docker.
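The abstract reports paired strict and lenient F1 scores without defining them. Under a common convention for concept extraction, which is an assumption here and not confirmed by the source, a strict match requires identical span boundaries and label, while a lenient match only requires overlapping spans with the same label. A minimal sketch under that assumption, with hypothetical spans and labels:

```python
from typing import List, Tuple

# A span is (start, end_exclusive, label); this layout is an illustrative choice.
Span = Tuple[int, int, str]


def _matches(a: Span, b: Span, lenient: bool) -> bool:
    """Strict: same boundaries and label. Lenient: overlap and same label."""
    if a[2] != b[2]:
        return False
    if lenient:
        return a[0] < b[1] and b[0] < a[1]
    return a[0] == b[0] and a[1] == b[1]


def span_f1(gold: List[Span], pred: List[Span], lenient: bool = False) -> float:
    """F1 over extracted spans under strict or lenient matching."""
    pred_hits = sum(1 for p in pred if any(_matches(p, g, lenient) for g in gold))
    gold_hits = sum(1 for g in gold if any(_matches(g, p, lenient) for p in pred))
    precision = pred_hits / len(pred) if pred else 0.0
    recall = gold_hits / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical annotations: one exact match, one partial overlap, one spurious span.
    gold = [(0, 5, "smoking"), (10, 18, "employment")]
    pred = [(0, 5, "smoking"), (12, 18, "employment"), (20, 25, "housing")]
    print(span_f1(gold, pred, lenient=False))
    print(span_f1(gold, pred, lenient=True))
```

Lenient scores are never lower than strict scores under this definition, which is consistent with the ordering of the paired figures reported above.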
Affiliation(s)
- Zehao Yu
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Cheng Peng
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Xi Yang
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Chong Dang
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Prakash Adekkanattu
  - Information Technologies and Services, Weill Cornell Medicine, New York, NY, USA
- Braja Gopal Patra
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Yifan Peng
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Jyotishman Pathak
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Debbie L Wilson
  - Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32611, USA
- Ching-Yuan Chang
  - Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32611, USA
- Wei-Hsuan Lo-Ciganic
  - Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32611, USA
- Thomas J George
  - Division of Hematology & Oncology, Department of Medicine, College of Medicine, University of Florida, Gainesville, FL, USA
- William R Hogan
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Yi Guo
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Jiang Bian
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Yonghui Wu
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA