1.
Xie K, Ojemann WKS, Gallagher RS, Shinohara RT, Lucas A, Hill CE, Hamilton RH, Johnson KB, Roth D, Litt B, Ellis CA. Disparities in seizure outcomes revealed by large language models. J Am Med Inform Assoc 2024;31:1348-1355. PMID: 38481027; PMCID: PMC11105138; DOI: 10.1093/jamia/ocae047.
Abstract
OBJECTIVE: Large language models (LLMs) could revolutionize health care delivery and research but risk propagating existing biases or introducing new ones. In epilepsy, social determinants of health are associated with disparities in access to care, but their impact on seizure outcomes among those with access remains unclear. Here we (1) evaluated our validated, epilepsy-specific LLM for intrinsic bias, and (2) used LLM-extracted seizure outcomes to determine whether different demographic groups have different seizure outcomes.
MATERIALS AND METHODS: We tested our LLM for differences and equivalences in prediction accuracy and confidence across demographic groups defined by race, ethnicity, sex, income, and health insurance, using manually annotated notes. Next, we used LLM-classified seizure freedom at each office visit to test for demographic outcome disparities, using univariable and multivariable analyses.
RESULTS: We analyzed 84,675 clinic visits from 25,612 unique patients seen at our epilepsy center. We found little evidence of bias in the prediction accuracy or confidence of outcome classifications across demographic groups. Multivariable analysis indicated worse seizure outcomes for female patients (OR 1.33, P ≤ .001), those with public insurance (OR 1.53, P ≤ .001), and those from lower-income zip codes (OR ≥ 1.22, P ≤ .007). Black patients had worse outcomes than White patients in univariable but not multivariable analysis (OR 1.03, P = .66).
CONCLUSION: We found little evidence that our LLM was intrinsically biased against any demographic group. LLM-extracted seizure freedom revealed disparities in seizure outcomes across several demographic groups. These findings quantify the critical need to reduce disparities in the care of people with epilepsy.
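The odds ratios reported in this abstract come from regression analyses of visit-level outcomes; in the univariable case, an odds ratio reduces to a ratio computed from a 2×2 table. A minimal sketch of that calculation, using illustrative counts rather than any data from the study:

```python
def odds_ratio(a, b, c, d):
    """Unadjusted odds ratio from a 2x2 contingency table:
    a = group 1, outcome present;  b = group 1, outcome absent
    c = group 2, outcome present;  d = group 2, outcome absent
    """
    return (a / b) / (c / d)

# Illustrative counts only (not from the study): 40 of 100 group-1 visits
# and 20 of 100 group-2 visits had the outcome of interest.
print(odds_ratio(40, 60, 20, 80))  # (40/60)/(20/80) = 2.666...
```

An OR above 1 indicates higher odds of the outcome in group 1; the multivariable ORs in the abstract are the adjusted analogue of this quantity.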
Affiliation(s)
- Kevin Xie
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, PA 19104, United States
  - Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, United States
- William K S Ojemann
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, PA 19104, United States
  - Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, United States
- Ryan S Gallagher
  - Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, United States
  - Department of Neurology, University of Pennsylvania, Philadelphia, PA 19104, United States
- Russell T Shinohara
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104, United States
- Alfredo Lucas
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, PA 19104, United States
  - Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, United States
  - Department of Neurology, University of Pennsylvania, Philadelphia, PA 19104, United States
- Chloé E Hill
  - Department of Neurology, University of Michigan, Ann Arbor, MI 48109, United States
- Roy H Hamilton
  - Department of Neurology, University of Pennsylvania, Philadelphia, PA 19104, United States
- Kevin B Johnson
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, PA 19104, United States
  - Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104, United States
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, United States
  - Department of Pediatrics, University of Pennsylvania, Philadelphia, PA 19104, United States
- Dan Roth
  - Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104, United States
- Brian Litt
  - Department of Bioengineering, University of Pennsylvania, Philadelphia, PA 19104, United States
  - Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, United States
  - Department of Neurology, University of Pennsylvania, Philadelphia, PA 19104, United States
- Colin A Ellis
  - Center for Neuroengineering and Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, United States
  - Department of Neurology, University of Pennsylvania, Philadelphia, PA 19104, United States
2.
Liu Y, Joly R, Reading Turchioe M, Benda N, Hermann A, Beecy A, Pathak J, Zhang Y. Preparing for the bedside-optimizing a postpartum depression risk prediction model for clinical implementation in a health system. J Am Med Inform Assoc 2024;31:1258-1267. PMID: 38531676; PMCID: PMC11105144; DOI: 10.1093/jamia/ocae056.
Abstract
OBJECTIVE: We developed and externally validated a machine-learning model to predict postpartum depression (PPD) using data from electronic health records (EHRs). Work is under way to implement the PPD prediction model within the EHR system for clinical decision support. We describe the pre-implementation evaluation process, which considered model performance, fairness, and clinical appropriateness.
MATERIALS AND METHODS: We used EHR data from an academic medical center (AMC) and a clinical research network database from 2014 to 2020 to evaluate the predictive performance and net benefit of the PPD risk model. We used area under the curve and sensitivity as predictive performance metrics and conducted a decision curve analysis. In assessing model fairness, we employed metrics such as disparate impact, equal opportunity, and predictive parity, with White race as the privileged value. The model was also reviewed by multidisciplinary experts for clinical appropriateness. Lastly, we debiased the model by comparing 5 debiasing approaches, including fairness through blindness and reweighing.
RESULTS: We determined the classification threshold through a performance evaluation that prioritized sensitivity and through decision curve analysis. The baseline PPD model exhibited some unfairness in the AMC data but performed fairly in the clinical research network data. We revised the model using fairness through blindness, the debiasing approach that yielded the best overall performance and fairness, while accounting for the clinical appropriateness considerations raised by the expert reviewers.
DISCUSSION AND CONCLUSION: The findings emphasize the need for a thorough evaluation of intervention-specific models, considering predictive performance, fairness, and clinical appropriateness, before clinical implementation.
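The three fairness metrics named in this abstract have standard definitions over binary predictions and a binary protected attribute. A minimal sketch under the common formulations (the group coding, with 1 as the privileged group, and the variable names are illustrative assumptions, not taken from the paper):

```python
def disparate_impact(y_pred, group):
    """Ratio of positive-prediction rates: unprivileged (0) over privileged (1).
    Values near 1.0 indicate similar selection rates."""
    unpriv = [p for p, g in zip(y_pred, group) if g == 0]
    priv = [p for p, g in zip(y_pred, group) if g == 1]
    return (sum(unpriv) / len(unpriv)) / (sum(priv) / len(priv))

def equal_opportunity_gap(y_true, y_pred, group):
    """Difference in true-positive rates (sensitivity) between groups."""
    def tpr(g):
        tp = sum(1 for t, p, gg in zip(y_true, y_pred, group)
                 if gg == g and t == 1 and p == 1)
        pos = sum(1 for t, gg in zip(y_true, group) if gg == g and t == 1)
        return tp / pos
    return tpr(1) - tpr(0)

def predictive_parity_gap(y_true, y_pred, group):
    """Difference in positive predictive values (precision) between groups."""
    def ppv(g):
        tp = sum(1 for t, p, gg in zip(y_true, y_pred, group)
                 if gg == g and t == 1 and p == 1)
        pred_pos = sum(1 for p, gg in zip(y_pred, group) if gg == g and p == 1)
        return tp / pred_pos
    return ppv(1) - ppv(0)
```

A model can satisfy one of these criteria while violating the others, which is why the evaluation described above examines several metrics together.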
Affiliation(s)
- Yifan Liu
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States
- Rochelle Joly
  - Department of Obstetrics and Gynecology, Weill Cornell Medicine, New York, NY 10065, United States
- Natalie Benda
  - Columbia University School of Nursing, New York, NY, United States
- Alison Hermann
  - Department of Psychiatry, Weill Cornell Medicine, New York, NY 10065, United States
- Ashley Beecy
  - Department of Medicine, Weill Cornell Medicine, New York, NY 10065, United States
  - NewYork-Presbyterian Hospital, New York, NY 10065, United States
- Jyotishman Pathak
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States
  - Department of Psychiatry, Weill Cornell Medicine, New York, NY 10065, United States
- Yiye Zhang
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States
  - NewYork-Presbyterian Hospital, New York, NY 10065, United States
3.
Yu Z, Peng C, Yang X, Dang C, Adekkanattu P, Gopal Patra B, Peng Y, Pathak J, Wilson DL, Chang CY, Lo-Ciganic WH, George TJ, Hogan WR, Guo Y, Bian J, Wu Y. Identifying social determinants of health from clinical narratives: A study of performance, documentation ratio, and potential bias. J Biomed Inform 2024;153:104642. PMID: 38621641; PMCID: PMC11141428; DOI: 10.1016/j.jbi.2024.104642.
Abstract
OBJECTIVE: To develop a natural language processing (NLP) package to extract social determinants of health (SDoH) from clinical narratives, examine bias across race and gender groups, test the generalizability of SDoH extraction across disease groups, and examine population-level extraction ratios.
METHODS: We developed SDoH corpora using clinical notes identified at University of Florida (UF) Health. We systematically compared 7 transformer-based large language models (LLMs) and developed an open-source package, SODA (SOcial DeterminAnts), to facilitate SDoH extraction from clinical narratives. We examined the performance and potential bias of SODA for different race and gender groups, tested its generalizability across two disease domains (cancer and opioid use), and explored strategies for improvement. We applied SODA to extract 19 categories of SDoH from the breast (n = 7,971), lung (n = 11,804), and colorectal cancer (n = 6,240) cohorts to assess patient-level extraction ratios and examine differences among race and gender groups.
RESULTS: We developed an SDoH corpus using 629 clinical notes of cancer patients annotated with 13,193 SDoH concepts/attributes from 19 SDoH categories, and a cross-disease validation corpus using 200 notes from patients with opioid use annotated with 4,342 SDoH concepts/attributes. Among the 7 transformer models compared, GatorTron achieved the best mean average strict/lenient F1 scores of 0.9122 and 0.9367 for SDoH concept extraction and 0.9584 and 0.9593 for linking attributes to SDoH concepts. There was a small performance gap (~4%) between male and female groups but a large performance gap (>16%) among race groups. Performance dropped when we applied the cancer SDoH model to the opioid cohort; fine-tuning on a smaller opioid SDoH corpus improved it. The extraction ratio varied across the three cancer cohorts: 10 SDoH categories could be extracted from over 70% of cancer patients, while 9 could be extracted from fewer than 70%. Individuals from the White and Black groups had higher extraction ratios than other minority race groups.
CONCLUSIONS: Our SODA package achieved good performance in extracting 19 categories of SDoH from clinical narratives. The SODA package with pre-trained transformer models is available at https://github.com/uf-hobi-informatics-lab/SODA_Docker.
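The strict versus lenient F1 distinction reported above is standard in clinical concept extraction: strict scoring requires exact span boundaries, while lenient scoring credits any overlap with a gold span. A minimal sketch of that scoring logic (the span representation and matching rule are illustrative assumptions, not SODA's actual evaluation code):

```python
def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def span_f1(gold, pred, lenient=False):
    """gold, pred: lists of (start, end) character offsets of extracted concepts.
    strict: boundaries must match exactly; lenient: any character overlap counts."""
    def match(g, p):
        return (g[0] < p[1] and p[0] < g[1]) if lenient else g == p
    tp = sum(1 for p in pred if any(match(g, p) for g in gold))
    fn = sum(1 for g in gold if not any(match(g, p) for p in gold and [g] or pred for p in [p]))  # see below
    fn = sum(1 for g in gold if not any(match(g, p) for p in pred))
    fp = len(pred) - tp
    return f1_score(tp, fp, fn)

gold = [(0, 5), (10, 15)]
pred = [(0, 5), (11, 14), (20, 25)]
print(span_f1(gold, pred))                # strict: 0.4
print(span_f1(gold, pred, lenient=True))  # lenient: 0.8
```

Lenient F1 is always at least as high as strict F1, which matches the ordering of the scores reported in the abstract.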
Affiliation(s)
- Zehao Yu
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Cheng Peng
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
  - Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Xi Yang
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
  - Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Chong Dang
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Prakash Adekkanattu
  - Information Technologies and Services, Weill Cornell Medicine, New York, NY, USA
- Braja Gopal Patra
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Yifan Peng
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Jyotishman Pathak
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
- Debbie L Wilson
  - Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32611, USA
- Ching-Yuan Chang
  - Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32611, USA
- Wei-Hsuan Lo-Ciganic
  - Department of Pharmaceutical Outcomes & Policy, College of Pharmacy, University of Florida, Gainesville, FL 32611, USA
- Thomas J George
  - Division of Hematology & Oncology, Department of Medicine, College of Medicine, University of Florida, Gainesville, FL, USA
- William R Hogan
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
- Yi Guo
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
  - Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Jiang Bian
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
  - Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
- Yonghui Wu
  - Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA
  - Cancer Informatics Shared Resource, University of Florida Health Cancer Center, Gainesville, FL, USA
4.
Siddique SM, Tipton K, Leas B, Jepson C, Aysola J, Cohen JB, Flores E, Harhay MO, Schmidt H, Weissman GE, Fricke J, Treadwell JR, Mull NK. The Impact of Health Care Algorithms on Racial and Ethnic Disparities: A Systematic Review. Ann Intern Med 2024;177:484-496. PMID: 38467001; DOI: 10.7326/m23-2960.
Abstract
BACKGROUND: There is increasing concern about the potential impact of health care algorithms on racial and ethnic disparities.
PURPOSE: To examine the evidence on how health care algorithms and associated mitigation strategies affect racial and ethnic disparities.
DATA SOURCES: Several databases were searched for relevant studies published from 1 January 2011 to 30 September 2023.
STUDY SELECTION: Using predefined criteria and dual review, studies were screened and selected to determine: 1) the effect of algorithms on racial and ethnic disparities in health and health care outcomes, and 2) the effect of strategies or approaches to mitigate racial and ethnic bias in the development, validation, dissemination, and implementation of algorithms.
DATA EXTRACTION: Outcomes of interest (that is, access to health care, quality of care, and health outcomes) were extracted, with risk-of-bias assessment using the ROBINS-I (Risk Of Bias In Non-randomised Studies - of Interventions) tool and an adapted CARE-CPM (Critical Appraisal for Racial and Ethnic Equity in Clinical Prediction Models) equity extension.
DATA SYNTHESIS: Sixty-three studies (51 modeling, 4 retrospective, 2 prospective, 5 prepost studies, and 1 randomized controlled trial) were included. Algorithms were found, with heterogeneous evidence, to: a) reduce disparities (for example, the revised kidney allocation system), b) perpetuate or exacerbate disparities (for example, severity-of-illness scores applied to critical care resource allocation), and/or c) have no statistically significant effect on select outcomes (for example, the HEART Pathway [history, electrocardiogram, age, risk factors, and troponin]). To mitigate disparities, 7 strategies were identified: removing an input variable, replacing a variable, adding race, adding a non-race-based variable, changing the racial and ethnic composition of the population used in model development, creating separate thresholds for subpopulations, and modifying algorithmic analytic techniques.
LIMITATION: Results are mostly based on modeling studies and may be highly context-specific.
CONCLUSION: Algorithms can mitigate, perpetuate, or exacerbate racial and ethnic disparities, regardless of the explicit use of race and ethnicity, but the evidence is heterogeneous. The intentionality and implementation of an algorithm can affect its impact on disparities, and there may be tradeoffs in outcomes.
PRIMARY FUNDING SOURCE: Agency for Healthcare Research and Quality.
Affiliation(s)
- Shazia Mehmood Siddique
  - Division of Gastroenterology, University of Pennsylvania; Leonard Davis Institute of Health Economics, University of Pennsylvania; and Center for Evidence-Based Practice, Penn Medicine, Philadelphia, Pennsylvania (S.M.S.)
- Kelley Tipton
  - ECRI-Penn Medicine Evidence-based Practice Center, ECRI, Plymouth Meeting, Pennsylvania (K.T., C.J., J.R.T.)
- Brian Leas
  - Center for Evidence-Based Practice, Penn Medicine, Philadelphia, Pennsylvania (B.L., E.F., J.F.)
- Christopher Jepson
  - ECRI-Penn Medicine Evidence-based Practice Center, ECRI, Plymouth Meeting, Pennsylvania (K.T., C.J., J.R.T.)
- Jaya Aysola
  - Leonard Davis Institute of Health Economics, University of Pennsylvania; Division of General Internal Medicine, University of Pennsylvania; and Penn Medicine Center for Health Equity Advancement, Penn Medicine, Philadelphia, Pennsylvania (J.A.)
- Jordana B Cohen
  - Division of Renal-Electrolyte and Hypertension, University of Pennsylvania; and Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania (J.B.C.)
- Emilia Flores
  - Center for Evidence-Based Practice, Penn Medicine, Philadelphia, Pennsylvania (B.L., E.F., J.F.)
- Michael O Harhay
  - Leonard Davis Institute of Health Economics, University of Pennsylvania; Center for Evidence-Based Practice, Penn Medicine; Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania; and Division of Pulmonary and Critical Care, University of Pennsylvania, Philadelphia, Pennsylvania (M.O.H.)
- Harald Schmidt
  - Department of Medical Ethics & Health Policy, University of Pennsylvania, Philadelphia, Pennsylvania (H.S.)
- Gary E Weissman
  - Leonard Davis Institute of Health Economics, University of Pennsylvania; Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania; and Division of Pulmonary and Critical Care, University of Pennsylvania, Philadelphia, Pennsylvania (G.E.W.)
- Julie Fricke
  - Center for Evidence-Based Practice, Penn Medicine, Philadelphia, Pennsylvania (B.L., E.F., J.F.)
- Jonathan R Treadwell
  - ECRI-Penn Medicine Evidence-based Practice Center, ECRI, Plymouth Meeting, Pennsylvania (K.T., C.J., J.R.T.)
- Nikhil K Mull
  - Center for Evidence-Based Practice, Penn Medicine; and Division of Hospital Medicine, University of Pennsylvania, Philadelphia, Pennsylvania (N.K.M.)
5.
Mashima Y, Tanigawa M, Yokoi H. Information heterogeneity between progress notes by physicians and nurses for inpatients with digestive system diseases. Sci Rep 2024;14:7656. PMID: 38561333; PMCID: PMC10984979; DOI: 10.1038/s41598-024-56324-7.
Abstract
This study focused on the heterogeneity in progress notes written by physicians or nurses. A total of 806 days of progress notes written by physicians or nurses from 83 randomly selected patients hospitalized in the Gastroenterology Department at Kagawa University Hospital from January to December 2021 were analyzed. We extracted symptoms as the International Classification of Diseases (ICD) Chapter 18 (R00-R99, hereinafter R codes) from each progress note using MedNER-J natural language processing software and counted the days one or more symptoms were extracted to calculate the extraction rate. The R-code extraction rate was significantly higher from progress notes by nurses than by physicians (physicians 68.5% vs. nurses 75.2%; p = 0.00112), regardless of specialty. By contrast, the R-code subcategory R10-R19 for digestive system symptoms (44.2 vs. 37.5%, respectively; p = 0.00299) and many chapters of ICD codes for disease names, as represented by Chapter 11 K00-K93 (68.4 vs. 30.9%, respectively; p < 0.001), were frequently extracted from the progress notes by physicians, reflecting their specialty. We believe that understanding the information heterogeneity of medical documents, which can be the basis of medical artificial intelligence, is crucial, and this study is a pioneering step in that direction.
Affiliation(s)
- Yukinori Mashima
  - Clinical Research Support Center, Kagawa University Hospital, 1750-1 Ikenobe, Miki-cho, Kita-gun, Kagawa, 761-0793, Japan
  - Department of Medical Informatics, Faculty of Medicine, Kagawa University, Kagawa, Japan
- Masatoshi Tanigawa
  - Clinical Research Support Center, Kagawa University Hospital, 1750-1 Ikenobe, Miki-cho, Kita-gun, Kagawa, 761-0793, Japan
- Hideto Yokoi
  - Clinical Research Support Center, Kagawa University Hospital, 1750-1 Ikenobe, Miki-cho, Kita-gun, Kagawa, 761-0793, Japan
  - Department of Medical Informatics, Faculty of Medicine, Kagawa University, Kagawa, Japan
6.
Huang Y, Guo J, Chen WH, Lin HY, Tang H, Wang F, Xu H, Bian J. A scoping review of fair machine learning techniques when using real-world data. J Biomed Inform 2024;151:104622. PMID: 38452862; PMCID: PMC11146346; DOI: 10.1016/j.jbi.2024.104622.
Abstract
OBJECTIVE: The integration of artificial intelligence (AI) and machine learning (ML) into health care to aid clinical decisions is widespread. However, as AI and ML take on important roles in health care, there are concerns about associated fairness and bias. An AI tool may have a disparate impact, with its benefits and drawbacks unevenly distributed across societal strata and subpopulations, potentially exacerbating existing health inequities. The objectives of this scoping review were therefore to summarize existing literature and identify gaps in tackling algorithmic bias and optimizing fairness in AI/ML models using real-world data (RWD) in health care domains.
METHODS: We conducted a thorough review of techniques for assessing and optimizing AI/ML model fairness when using RWD in health care domains. The focus lies on appraising quantification metrics for assessing fairness, publicly accessible datasets for ML fairness research, and bias mitigation approaches.
RESULTS: We identified 11 papers focused on optimizing model fairness in health care applications. Current research on mitigating bias in RWD is limited, both in the variety of diseases and health care applications covered and in the accessibility of public datasets for ML fairness research. Existing studies often report positive outcomes when using pre-processing techniques to address algorithmic bias. Unresolved questions remain that require further research, including pinpointing the root causes of bias in ML models, broadening fairness research in AI/ML using RWD and exploring its implications in health care settings, and evaluating and addressing bias in multi-modal data.
CONCLUSION: This paper provides useful reference material and insights for researchers regarding AI/ML fairness in real-world health care data and reveals the gaps in the field. Fair AI/ML in health care is a burgeoning field that requires a heightened research focus to cover diverse applications and different types of RWD.
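Among the pre-processing techniques this kind of review covers, reweighing is one of the most commonly cited. A minimal sketch of the standard Kamiran-Calders formulation, in which each (group, label) cell is weighted so that group membership and outcome become statistically independent (this is the textbook method, not code from any reviewed study):

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Kamiran-Calders reweighing: per-instance weight
    w(g, y) = P(g) * P(y) / P(g, y), computed from empirical frequencies.
    Training on these weights removes the observed group-outcome association."""
    n = len(labels)
    count_g = Counter(groups)
    count_y = Counter(labels)
    count_gy = Counter(zip(groups, labels))
    return [
        (count_g[g] / n) * (count_y[y] / n) / (count_gy[(g, y)] / n)
        for g, y in zip(groups, labels)
    ]

# If outcomes are already balanced across groups, every weight is 1.0:
print(reweighing_weights([0, 0, 1, 1], [1, 0, 1, 0]))  # [1.0, 1.0, 1.0, 1.0]
```

The resulting weights can be passed to any learner that accepts per-sample weights (for example, a `sample_weight` argument), which is what makes this a pre-processing rather than an in-processing approach.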
Affiliation(s)
- Yu Huang
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA
- Jingchuan Guo
  - Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, FL, USA
- Wei-Han Chen
  - Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, FL, USA
- Hsin-Yueh Lin
  - Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, FL, USA
- Huilin Tang
  - Pharmaceutical Outcomes & Policy, University of Florida, Gainesville, FL, USA
- Fei Wang
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY, USA
  - Institute of Artificial Intelligence for Digital Health, Weill Cornell Medicine, New York, NY, USA
- Hua Xu
  - Section of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT, USA
- Jiang Bian
  - Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA
7.
Davenport MA, Sirrianni JW, Chisolm DJ. Machine learning data sources in pediatric sleep research: assessing racial/ethnic differences in electronic health record-based clinical notes prior to model training. Frontiers in Sleep 2024;3:1271167. PMID: 38817450; PMCID: PMC11138315; DOI: 10.3389/frsle.2024.1271167.
Abstract
Introduction: Pediatric sleep problems can be detected across racial/ethnic subpopulations in primary care settings. However, the electronic health record (EHR) documentation that describes patients' sleep problems may be inherently biased due to both historical biases and informed presence. This study assessed racial/ethnic differences in natural language processing (NLP) training data (e.g., pediatric sleep-related keywords in primary care clinical notes) prior to model training.
Methods: We used a predefined keyword feature set containing 178 Peds B-SATED keywords. We then queried all clinical notes from patients aged 5 to 18 seen in pediatric primary care from January 2018 to December 2021. A least absolute shrinkage and selection operator (LASSO) regression model was used to investigate whether there were racial/ethnic differences in the documentation of Peds B-SATED keywords. Mixed-effects logistic regression was then used to determine whether the odds of the presence of global Peds B-SATED dimensions also differed across racial/ethnic subpopulations.
Results: Using both the LASSO and multilevel modeling approaches, we found racial/ethnic differences in providers' documentation of Peds B-SATED keywords and global dimensions. In addition, the rankings of the most frequently documented Peds B-SATED keywords differed qualitatively across racial/ethnic subpopulations.
Conclusion: This study revealed differential patterns in providers' documentation of Peds B-SATED keywords and global dimensions that may account for the under-detection of pediatric sleep problems among racial/ethnic subpopulations. These findings have important implications for the equitable clinical documentation of sleep problems in pediatric primary care settings and extend prior retrospective work in pediatric sleep specialty settings.
Affiliation(s)
- Mattina A. Davenport
  - Abigail Wexner Research Institute, Center for Child Health Equity and Outcomes Research, Nationwide Children’s Hospital, Columbus, OH, United States
  - Department of Pediatrics, College of Medicine, The Ohio State University, Columbus, OH, United States
- Joseph W. Sirrianni
  - Abigail Wexner Research Institute, IT Research and Innovation, Nationwide Children’s Hospital, Columbus, OH, United States
- Deena J. Chisolm
  - Abigail Wexner Research Institute, Center for Child Health Equity and Outcomes Research, Nationwide Children’s Hospital, Columbus, OH, United States
  - Department of Pediatrics, College of Medicine, The Ohio State University, Columbus, OH, United States
8.
Cary MP, Zink A, Wei S, Olson A, Yan M, Senior R, Bessias S, Gadhoumi K, Jean-Pierre G, Wang D, Ledbetter LS, Economou-Zavlanos NJ, Obermeyer Z, Pencina MJ. Mitigating Racial And Ethnic Bias And Advancing Health Equity In Clinical Algorithms: A Scoping Review. Health Aff (Millwood) 2023;42:1359-1368. PMID: 37782868; PMCID: PMC10668606; DOI: 10.1377/hlthaff.2023.00553.
Abstract
In August 2022 the Department of Health and Human Services (HHS) issued a notice of proposed rulemaking prohibiting covered entities, which include health care providers and health plans, from discriminating against individuals when using clinical algorithms in decision making. However, HHS did not provide specific guidelines on how covered entities should prevent discrimination. We conducted a scoping review of literature published during the period 2011-22 to identify health care applications, frameworks, reviews and perspectives, and assessment tools that identify and mitigate bias in clinical algorithms, with a specific focus on racial and ethnic bias. Our scoping review encompassed 109 articles comprising 45 empirical health care applications that included tools tested in health care settings, 16 frameworks, and 48 reviews and perspectives. We identified a wide range of technical, operational, and systemwide bias mitigation strategies for clinical algorithms, but there was no consensus in the literature on a single best practice that covered entities could employ to meet the HHS requirements. Future research should identify optimal bias mitigation methods for various scenarios, depending on factors such as patient population, clinical setting, algorithm design, and types of bias to be addressed.
Affiliation(s)
- Michael P Cary
  - Michael P. Cary Jr., Duke University, Durham, North Carolina
- Anna Zink
  - Anna Zink, University of Chicago, Chicago, Illinois
- Sijia Wei
  - Sijia Wei, Northwestern University, Chicago, Illinois
- Ziad Obermeyer
  - Ziad Obermeyer, University of California Berkeley, Berkeley, California
9.
Xie K, Ojemann WKS, Gallagher RS, Lucas A, Hill CE, Hamilton RH, Johnson KB, Roth D, Litt B, Ellis CA. Disparities in seizure outcomes revealed by large language models. medRxiv [Preprint] 2023:2023.09.20.23295842. PMID: 37790442; PMCID: PMC10543059; DOI: 10.1101/2023.09.20.23295842.
Abstract
Objective: Large language models (LLMs) in health care have the potential to propagate existing biases or introduce new ones. For people with epilepsy, social determinants of health are associated with disparities in access to care, but their impact on seizure outcomes among those with access to specialty care remains unclear. Here we (1) evaluated our validated, epilepsy-specific LLM for intrinsic bias, and (2) used LLM-extracted seizure outcomes to test the hypothesis that different demographic groups have different seizure outcomes.
Methods: First, we tested our LLM for intrinsic bias in the form of differential performance across demographic groups defined by race, ethnicity, sex, income, and health insurance, using manually annotated notes. Next, we used LLM-classified seizure freedom at each office visit to test for outcome disparities in the same demographic groups, using univariable and multivariable analyses.
Results: We analyzed 84,675 clinic visits from 25,612 patients seen at our epilepsy center from 2005 to 2022. We found no differences in the accuracy, or positive or negative class balance, of outcome classifications across demographic groups. Multivariable analysis indicated worse seizure outcomes for female patients (OR 1.33, p = 3×10^-8), those with public insurance (OR 1.53, p = 2×10^-13), and those from lower-income zip codes (OR ≥ 1.22, p ≤ 6.6×10^-3). Black patients had worse outcomes than White patients in univariable but not multivariable analysis (OR 1.03, p = 0.66).
Significance: We found no evidence that our LLM was intrinsically biased against any demographic group. Seizure freedom extracted by the LLM revealed disparities in seizure outcomes across several demographic groups. These findings highlight the critical need to reduce disparities in the care of people with epilepsy.
Affiliation(s)
- Kevin Xie
- University of Pennsylvania, Dept. of Bioengineering, Philadelphia, PA, USA
- University of Pennsylvania, Center for Neuroengineering and Therapeutics, Philadelphia, PA, USA
- William K S Ojemann
- University of Pennsylvania, Dept. of Bioengineering, Philadelphia, PA, USA
- University of Pennsylvania, Center for Neuroengineering and Therapeutics, Philadelphia, PA, USA
- Ryan S Gallagher
- University of Pennsylvania, Center for Neuroengineering and Therapeutics, Philadelphia, PA, USA
- University of Pennsylvania, Dept. of Neurology, Philadelphia, PA, USA
- Alfredo Lucas
- University of Pennsylvania, Dept. of Bioengineering, Philadelphia, PA, USA
- University of Pennsylvania, Center for Neuroengineering and Therapeutics, Philadelphia, PA, USA
- University of Pennsylvania, Dept. of Neurology, Philadelphia, PA, USA
- Chloé E Hill
- University of Michigan, Dept. of Neurology, Ann Arbor, MI, USA
- Roy H Hamilton
- University of Pennsylvania, Dept. of Neurology, Philadelphia, PA, USA
- Kevin B Johnson
- University of Pennsylvania, Dept. of Bioengineering, Philadelphia, PA, USA
- University of Pennsylvania, Dept. of Biostatistics, Epidemiology and Informatics, Philadelphia, PA, USA
- University of Pennsylvania, Dept. of Computer and Information Science, Philadelphia, PA, USA
- University of Pennsylvania, Dept. of Pediatrics, Philadelphia, PA, USA
- Dan Roth
- University of Pennsylvania, Dept. of Computer and Information Science, Philadelphia, PA, USA
- Brian Litt
- University of Pennsylvania, Dept. of Bioengineering, Philadelphia, PA, USA
- University of Pennsylvania, Center for Neuroengineering and Therapeutics, Philadelphia, PA, USA
- University of Pennsylvania, Dept. of Neurology, Philadelphia, PA, USA
- Colin A Ellis
- University of Pennsylvania, Center for Neuroengineering and Therapeutics, Philadelphia, PA, USA
- University of Pennsylvania, Dept. of Neurology, Philadelphia, PA, USA
10
Abstract
OBJECTIVES Through a scoping review, we examine the ways health equity has been promoted in clinical research informatics with patient implications, focusing on work published in 2021 (with some from 2022). METHOD A scoping review was conducted using the methods described in the Joanna Briggs Institute Manual. The review process consisted of five stages: 1) development of the aim and research question, 2) literature search, 3) literature screening and selection, 4) data extraction, and 5) accumulation and reporting of results. RESULTS Of the 478 papers identified from 2021 on the topic of clinical research informatics with a focus on health equity as a patient implication, 8 papers met our inclusion criteria. All included papers focused on artificial intelligence (AI) technology. The papers addressed health equity in clinical research informatics either by exposing inequity in AI-based solutions or by using AI as a tool for promoting health equity in the delivery of healthcare services. While algorithmic bias poses a risk to health equity within AI-based solutions, AI has also uncovered inequity in traditional treatment and demonstrated effective complements and alternatives that promote health equity. CONCLUSIONS Clinical research informatics with implications for patients still faces challenges of an ethical nature and of clinical value. However, used prudently, for the right purpose in the right context, clinical research informatics could provide powerful tools for advancing health equity in patient care.
11
Ge Y, Guo Y, Das S, Al-Garadi MA, Sarker A. Few-shot learning for medical text: A review of advances, trends, and opportunities. J Biomed Inform 2023; 144:104458. [PMID: 37488023 PMCID: PMC10940971 DOI: 10.1016/j.jbi.2023.104458] [Received: 03/21/2023] [Revised: 06/19/2023] [Accepted: 07/19/2023] [Indexed: 07/26/2023]
Abstract
BACKGROUND Few-shot learning (FSL) is a class of machine learning methods that require small numbers of labeled instances for training. With many medical topics having limited annotated text-based data in practical settings, FSL-based natural language processing (NLP) holds substantial promise. We aimed to conduct a review to explore the current state of FSL methods for medical NLP. METHODS We searched for articles published between January 2016 and October 2022 using PubMed/Medline, Embase, ACL Anthology, and IEEE Xplore Digital Library. We also searched preprint servers (e.g., arXiv, medRxiv, and bioRxiv) via Google Scholar to identify the latest relevant methods. We included all articles that involved FSL and any form of medical text. We abstracted articles based on the data source, target task, training set size, primary method(s)/approach(es), and evaluation metric(s). RESULTS Fifty-one articles met our inclusion criteria; all were published after 2018, and most since 2020 (42/51; 82%). Concept extraction/named entity recognition was the most frequently addressed task (21/51; 41%), followed by text classification (16/51; 31%). Thirty-two (61%) articles reconstructed existing datasets to fit few-shot scenarios, and MIMIC-III was the most frequently used dataset (10/51; 20%). Seventy-seven percent of the articles attempted to incorporate prior knowledge to augment the small datasets available for training. Common methods included FSL with attention mechanisms (20/51; 39%), prototypical networks (11/51; 22%), meta-learning (7/51; 14%), and prompt-based learning methods, the latter being particularly popular since 2021. Benchmarking experiments demonstrated relative underperformance of FSL methods on biomedical NLP tasks. CONCLUSION Despite the potential for FSL in biomedical NLP, progress has been limited. This may be attributed to the rarity of specialized data, the lack of standardized evaluation criteria, and the underperformance of FSL methods on biomedical topics. The creation of publicly available specialized datasets for biomedical FSL may aid method development by facilitating comparative analyses.
Affiliation(s)
- Yao Ge
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, United States of America
- Yuting Guo
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, United States of America
- Sudeshna Das
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, United States of America
- Mohammed Ali Al-Garadi
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Vanderbilt University, Nashville, TN, United States of America
- Abeed Sarker
- Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, GA, United States of America; Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, GA, United States of America
12
Huang D, Cogill S, Hsia RY, Yang S, Kim D. Development and external validation of a pretrained deep learning model for the prediction of non-accidental trauma. NPJ Digit Med 2023; 6:131. [PMID: 37468526 DOI: 10.1038/s41746-023-00875-y] [Received: 11/03/2022] [Accepted: 07/07/2023] [Indexed: 07/21/2023]
Abstract
Non-accidental trauma (NAT) is deadly and difficult to predict. Transformer models pretrained on large datasets have recently produced state-of-the-art performance on diverse prediction tasks, but the optimal pretraining strategies for diagnostic predictions are not known. Here we report the development and external validation of Pretrained and Adapted BERT for Longitudinal Outcomes (PABLO), a transformer-based deep learning model with multitask clinical pretraining, to identify patients who will receive a diagnosis of NAT in the next year. We develop a clinical interface to visualize patient trajectories, model predictions, and individual risk factors. In two comprehensive statewide databases, approximately 1% of patients experience NAT within one year of prediction. PABLO predicts NAT events with an area under the receiver operating characteristic curve (AUROC) of 0.844 (95% CI 0.838-0.851) in the California test set, and 0.849 (95% CI 0.846-0.851) on external validation in Florida, outperforming comparator models. Multitask pretraining significantly improves model performance. Attribution analysis shows substance use, psychiatric, and injury diagnoses, in the context of age and racial demographics, as influential predictors of NAT. As a clinical decision support system, PABLO can identify high-risk patients and patient-specific risk factors, which can be used to target secondary screening and preventive interventions at the point of care.
Affiliation(s)
- David Huang
- Department of Computer Science, Stanford University, Stanford, CA, USA
- Renee Y Hsia
- Department of Emergency Medicine, UCSF School of Medicine, San Francisco, CA, USA
- Samuel Yang
- Department of Emergency Medicine, Stanford University, Stanford, CA, USA
- David Kim
- Department of Emergency Medicine, Stanford University, Stanford, CA, USA
13
Banda JM, Shah NH, Periyakoil VS. Characterizing subgroup performance of probabilistic phenotype algorithms within older adults: a case study for dementia, mild cognitive impairment, and Alzheimer's and Parkinson's diseases. JAMIA Open 2023; 6:ooad043. [PMID: 37397506 PMCID: PMC10307941 DOI: 10.1093/jamiaopen/ooad043] [Received: 03/30/2023] [Revised: 06/06/2023] [Accepted: 06/22/2023] [Indexed: 07/04/2023]
Abstract
Objective Biases within probabilistic electronic phenotyping algorithms are largely unexplored. In this work, we characterize differences in subgroup performance of phenotyping algorithms for Alzheimer's disease and related dementias (ADRD) in older adults. Materials and methods We created an experimental framework to characterize the performance of probabilistic phenotyping algorithms under different racial distributions, allowing us to identify which algorithms may have differential performance, by how much, and under what conditions. We relied on rule-based phenotype definitions as a reference to evaluate probabilistic phenotype algorithms created using the Automated PHenotype Routine for Observational Definition, Identification, Training and Evaluation framework. Results We demonstrate that some algorithms have performance variations of anywhere from 3% to 30% across populations, even when race is not used as an input variable. We show that while performance differences in subgroups are not present for all phenotypes, they do affect some phenotypes and groups disproportionately more than others. Discussion Our analysis establishes the need for a robust evaluation framework for subgroup differences. The underlying patient populations for the algorithms showing subgroup performance differences have great variance in model features when compared with the phenotypes showing little to no differences. Conclusion We have created a framework to identify systematic differences in the performance of probabilistic phenotyping algorithms, using ADRD as a use case. Differences in subgroup performance of probabilistic phenotyping algorithms are neither widespread nor consistent. This highlights the great need for careful ongoing monitoring to evaluate, measure, and mitigate such differences.
Affiliation(s)
- Juan M Banda
- Corresponding Author: Juan M. Banda, PhD, Department of Computer Science, College of Arts and Sciences, Georgia State University, 25 Park Place, Suite 752, Atlanta, GA 30303, USA
- Nigam H Shah
- Stanford Center for Biomedical Informatics Research, Stanford University School of Medicine, Stanford, California, USA
- Vyjeyanthi S Periyakoil
- Stanford Department of Medicine, Palo Alto, California, USA
- VA Palo Alto Health Care System, Palo Alto, California, USA
14
Malerbi FK, Nakayama LF, Gayle Dychiao R, Zago Ribeiro L, Villanueva C, Celi LA, Regatieri CV. Digital Education for the Deployment of Artificial Intelligence in Health Care. J Med Internet Res 2023; 25:e43333. [PMID: 37347537 PMCID: PMC10337407 DOI: 10.2196/43333] [Received: 10/08/2022] [Revised: 01/19/2023] [Accepted: 04/05/2023] [Indexed: 06/23/2023]
Abstract
Artificial intelligence (AI) represents a significant milestone in health care's digital transformation. However, traditional health care education and training often lack digital competencies. To promote safe and effective AI implementation, health care professionals must acquire basic knowledge of machine learning and neural networks, critical evaluation of data sets, integration within clinical workflows, bias control, and human-machine interaction in clinical settings. Additionally, they should understand the legal and ethical aspects of digital health care and the impact of AI adoption. Misconceptions and fears about AI systems could jeopardize their real-life implementation. However, there are multiple barriers to promoting electronic health literacy, including time constraints, overburdened curricula, and the shortage of qualified professionals. To overcome these challenges, partnerships among developers, professional societies, and academia are essential. Integrating specialists from different backgrounds, including data specialists, lawyers, and social scientists, can significantly contribute to combating digital illiteracy and promoting safe AI implementation in health care.
Affiliation(s)
- Luis Filipe Nakayama
- Ophthalmology Department, Sao Paulo Federal University, Sao Paulo, Brazil
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, United States
- Lucas Zago Ribeiro
- Ophthalmology Department, Sao Paulo Federal University, Sao Paulo, Brazil
- Cleva Villanueva
- Escuela Superior de Medicina, Instituto Politecnico Nacional, Mexico City, Mexico
- Leo Anthony Celi
- Laboratory for Computational Physiology, Massachusetts Institute of Technology, Cambridge, MA, United States
- Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, United States
15
Khor S, Heagerty PJ, Basu A, Haupt EC, Lyons LJL, Hahn EE, Bansal A. Racial Disparities in the Ascertainment of Cancer Recurrence in Electronic Health Records. JCO Clin Cancer Inform 2023; 7:e2300004. [PMID: 37267516 PMCID: PMC10530597 DOI: 10.1200/cci.23.00004] [Received: 01/13/2023] [Revised: 03/20/2023] [Accepted: 04/05/2023] [Indexed: 06/04/2023]
Abstract
PURPOSE There is growing interest in using computable phenotypes or proxies to identify important clinical outcomes, such as cancer recurrence, in rich electronic health records data. However, the race/ethnicity-specific accuracies of these proxies remain unclear. We examined whether the accuracy of a proxy for colorectal cancer (CRC) recurrence differed by race/ethnicity and the possible mechanisms that drove the differences. METHODS Using data from a large integrated health care system, we identified a stratified random sample of 282 Black/African American (AA), Hispanic, and non-Hispanic White (NHW) patients with CRC who received primary treatment. Patient 5-year recurrence status was estimated using a utilization-based proxy and evaluated, by race/ethnicity, against the true recurrence status obtained through detailed chart review. We used covariate-adjusted probit regression models to estimate the associations between race/ethnicity and misclassification. RESULTS The recurrence proxy had excellent overall accuracy (positive predictive value [PPV] 89.4%; negative predictive value 96.5%; mean difference in timing 1.96 months); however, accuracy varied by race/ethnicity. Compared with NHW patients, PPV was 14.9% lower (95% CI, 2.53 to 28.6) among Hispanic patients and 4.3% lower (95% CI, -4.8 to 14.8) among Black/AA patients. The proxy disproportionately inflated the 5-year recurrence incidence for Hispanic patients by 10.6% (95% CI, 4.2 to 18.2). Compared with NHW patients, proxy recurrences for Hispanic patients were almost three times as likely to have been misclassified as positive (adjusted risk ratio 2.91 [95% CI, 1.21 to 8.31]). Higher false positive rates among racial/ethnic minorities may be related to a higher prevalence of noncancerous lung-related problems and substantial delays in primary treatment because of insufficient patient-provider communication and abnormal treatment patterns. CONCLUSION Using a proxy with worse accuracy among racial/ethnic minority patients to estimate population health may misdirect resources and support erroneous conclusions about treatment benefit for these patients.
Affiliation(s)
- Sara Khor
- Comparative Health Outcomes, Policy, and Economics (CHOICE) Institute, University of Washington, Seattle, WA
- Anirban Basu
- Comparative Health Outcomes, Policy, and Economics (CHOICE) Institute, University of Washington, Seattle, WA
- Eric C. Haupt
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA
- Lindsay Joe L. Lyons
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA
- Erin E. Hahn
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA
- Aasthaa Bansal
- Comparative Health Outcomes, Policy, and Economics (CHOICE) Institute, University of Washington, Seattle, WA
16
Booth GJ, Ross B, Cronin WA, McElrath A, Cyr KL, Hodgson JA, Sibley C, Ismawan JM, Zuehl A, Slotto JG, Higgs M, Haldeman M, Geiger P, Jardine D. Competency-Based Assessments: Leveraging Artificial Intelligence to Predict Subcompetency Content. Acad Med 2023; 98:497-504. [PMID: 36477379 DOI: 10.1097/acm.0000000000005115] [Indexed: 06/17/2023]
Abstract
PURPOSE Faculty feedback on trainees is critical to guiding trainee progress in a competency-based medical education framework. The authors aimed to develop and evaluate a natural language processing (NLP) algorithm that automatically categorizes narrative feedback into corresponding Accreditation Council for Graduate Medical Education Milestone 2.0 subcompetencies. METHOD Ten academic anesthesiologists analyzed 5,935 narrative evaluations of anesthesiology trainees at 4 graduate medical education (GME) programs between July 1, 2019, and June 30, 2021. Each sentence (n = 25,714) was labeled with the Milestone 2.0 subcompetency that best captured its content or was labeled as demographic or not useful. Inter-rater agreement was assessed by Fleiss' kappa. The authors trained an NLP model to predict feedback subcompetencies using data from 3 sites and evaluated its performance at a fourth site. Performance metrics included area under the receiver operating characteristic curve (AUC), positive predictive value, sensitivity, F1, and calibration curves. The model was implemented at 1 site in a self-assessment exercise. RESULTS Fleiss' kappa for subcompetency agreement was moderate (0.44). Model performance was good for professionalism, interpersonal and communication skills, and practice-based learning and improvement (AUC 0.79, 0.79, and 0.75, respectively). Subcompetencies within medical knowledge and patient care ranged from fair to excellent (AUC 0.66-0.84 and 0.63-0.88, respectively). Performance for systems-based practice was poor (AUC 0.59). Performance for the demographic and not useful categories was excellent (AUC 0.87 for both). In approximately 1 minute, the model interpreted several hundred evaluations and produced individual trainee reports with organized feedback to guide a self-assessment exercise. The model was built into a web-based application. CONCLUSIONS The authors developed an NLP model that recognized the feedback language of anesthesiologists across multiple GME programs. The model was operationalized in a self-assessment exercise. It is a powerful tool that rapidly organizes large amounts of narrative feedback.
Affiliation(s)
- Gregory J Booth
- G.J. Booth is assistant professor, Uniformed Services University of the Health Sciences, and residency program director, Department of Anesthesiology and Pain Medicine, Naval Medical Center Portsmouth, Portsmouth, Virginia
- Benjamin Ross
- William A Cronin
- Angela McElrath
- Kyle L Cyr
- John A Hodgson
- Charles Sibley
- J Martin Ismawan
- Alyssa Zuehl
- James G Slotto
- Maureen Higgs
- Matthew Haldeman
- Phillip Geiger
- Dink Jardine
17
Yang S, Varghese P, Stephenson E, Tu K, Gronsbell J. Machine learning approaches for electronic health records phenotyping: a methodical review. J Am Med Inform Assoc 2023; 30:367-381. [PMID: 36413056 PMCID: PMC9846699 DOI: 10.1093/jamia/ocac216] [Received: 07/28/2022] [Revised: 09/27/2022] [Accepted: 10/27/2022] [Indexed: 11/23/2022]
Abstract
OBJECTIVE Accurate and rapid phenotyping is a prerequisite to leveraging electronic health records for biomedical research. While early phenotyping relied on rule-based algorithms curated by experts, machine learning (ML) approaches have emerged as an alternative to improve scalability across phenotypes and healthcare settings. This study evaluates ML-based phenotyping with respect to (1) the data sources used, (2) the phenotypes considered, (3) the methods applied, and (4) the reporting and evaluation methods used. MATERIALS AND METHODS We searched PubMed and Web of Science for articles published between 2018 and 2022. After screening 850 articles, we recorded 37 variables on 100 studies. RESULTS Most studies utilized data from a single institution and included information in clinical notes. Although chronic conditions were most commonly considered, ML also enabled the characterization of nuanced phenotypes such as social determinants of health. Supervised deep learning was the most popular ML paradigm, while semi-supervised and weakly supervised learning were applied to expedite algorithm development and unsupervised learning to facilitate phenotype discovery. ML approaches did not uniformly outperform rule-based algorithms, but deep learning offered a marginal improvement over traditional ML for many conditions. DISCUSSION Despite the progress in ML-based phenotyping, most articles focused on binary phenotypes and few articles evaluated external validity or used multi-institution data. Study settings were infrequently reported and analytic code was rarely released. CONCLUSION Continued research in ML-based phenotyping is warranted, with emphasis on characterizing nuanced phenotypes, establishing reporting and evaluation standards, and developing methods to accommodate misclassified phenotypes due to algorithm errors in downstream applications.
Affiliation(s)
- Siyue Yang
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Ellen Stephenson
- Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
- Karen Tu
- Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
- Jessica Gronsbell
- Department of Statistical Sciences, University of Toronto, Toronto, Ontario, Canada
- Department of Family & Community Medicine, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
18
Davoudi A, Sajdeya R, Ison R, Hagen J, Rashidi P, Price CC, Tighe PJ. Fairness in the prediction of acute postoperative pain using machine learning models. Front Digit Health 2023; 4:970281. [PMID: 36714611 PMCID: PMC9874861 DOI: 10.3389/fdgth.2022.970281] [Received: 06/15/2022] [Accepted: 10/24/2022] [Indexed: 01/12/2023]
Abstract
Introduction The overall performance of machine learning-based prediction models is promising; however, their generalizability and fairness must be vigorously investigated to ensure they perform sufficiently well for all patients. Objective This study aimed to evaluate prediction bias in machine learning models used for predicting acute postoperative pain. Methods We conducted a retrospective review of electronic health records for patients undergoing orthopedic surgery from June 1, 2011, to June 30, 2019, at the University of Florida Health system/Shands Hospital. CatBoost machine learning models were trained to predict the binary outcome of low (≤4) and high (>4) pain. Model biases were assessed against seven protected attributes: age, sex, race, area deprivation index (ADI), spoken language, health literacy, and insurance type. Reweighing of protected attributes was investigated for reducing model bias compared with base models. The fairness metrics of equal opportunity, predictive parity, predictive equality, statistical parity, and overall accuracy equality were examined. Results The final dataset included 14,263 patients [age: 60.72 (16.03) years, 53.87% female, 39.13% low acute postoperative pain]. The machine learning model (area under the curve, 0.71) was biased in terms of age, race, ADI, and insurance type, but not in terms of sex, language, and health literacy. Despite promising overall performance in predicting acute postoperative pain, machine learning-based prediction models may be biased with respect to protected attributes. Conclusion These findings show the need to evaluate fairness in machine learning models involved in perioperative pain before they are implemented as clinical decision support tools.
Affiliation(s)
- Anis Davoudi
- Department of Anesthesiology, University of Florida College of Medicine, Gainesville, FL, United States
- Ruba Sajdeya
- Department of Epidemiology, University of Florida College of Public Health and Health Professions, Gainesville, FL, United States
- Ron Ison
- Department of Anesthesiology, University of Florida College of Medicine, Gainesville, FL, United States
- Jennifer Hagen
- Department of Orthopedic Surgery, University of Florida College of Medicine, Gainesville, FL, United States
- Parisa Rashidi
- Department of Biomedical Engineering, University of Florida Herbert Wertheim College of Engineering, Gainesville, FL, United States
- Catherine C. Price
- Department of Anesthesiology, University of Florida College of Medicine, Gainesville, FL, United States
- Department of Clinical and Health Psychology, University of Florida College of Public Health and Health Professions, Gainesville, FL, United States
- Patrick J. Tighe
- Department of Anesthesiology, University of Florida College of Medicine, Gainesville, FL, United States
19
Thompson HM, Sharma B, Smith DL, Bhalla S, Erondu I, Hazra A, Ilyas Y, Pachwicewicz P, Sheth NK, Chhabra N, Karnik NS, Afshar M. Machine Learning Techniques to Explore Clinical Presentations of COVID-19 Severity and to Test the Association With Unhealthy Opioid Use: Retrospective Cross-sectional Cohort Study. JMIR Public Health Surveill 2022; 8:e38158. [PMID: 36265163 PMCID: PMC9746674 DOI: 10.2196/38158] [Received: 03/21/2022] [Revised: 05/23/2022] [Accepted: 10/18/2022] [Indexed: 11/07/2022]
Abstract
BACKGROUND The COVID-19 pandemic has exacerbated health inequities in the United States. People with unhealthy opioid use (UOU) may face disproportionate challenges with COVID-19 precautions, and the pandemic has disrupted access to opioids and UOU treatments. UOU impairs the immunological, cardiovascular, pulmonary, renal, and neurological systems and may increase severity of outcomes for COVID-19. OBJECTIVE We applied machine learning techniques to explore clinical presentations of hospitalized patients with UOU and COVID-19 and to test the association between UOU and COVID-19 disease severity. METHODS This retrospective, cross-sectional cohort study was conducted based on data from 4110 electronic health record patient encounters at an academic health center in Chicago between January 1, 2020, and December 31, 2020. The inclusion criterion was an unplanned admission of a patient aged ≥18 years; encounters were counted as COVID-19-positive if there was a positive test for COVID-19 or 2 COVID-19 International Classification of Disease, Tenth Revision codes. Using a predefined cutoff with optimal sensitivity and specificity to identify UOU, we ran a machine learning UOU classifier on the data for patients with COVID-19 to estimate the subcohort of patients with UOU. Topic modeling was used to explore and compare the clinical presentations documented for 2 subgroups: encounters with UOU and COVID-19 and those with no UOU and COVID-19. Mixed effects logistic regression accounted for multiple encounters for some patients and tested the association between UOU and COVID-19 outcome severity. Severity was measured with 3 utilization metrics: low-severity unplanned admission, medium-severity unplanned admission and receiving mechanical ventilation, and high-severity unplanned admission with in-hospital death. All models controlled for age, sex, race/ethnicity, insurance status, and BMI. 
RESULTS Topic modeling yielded 10 topics per subgroup and highlighted unique comorbidities associated with UOU and COVID-19 (eg, HIV) and no UOU and COVID-19 (eg, diabetes). In the regression analysis, each incremental increase in the classifier's predicted probability of UOU was associated with 1.16 higher odds of COVID-19 outcome severity (odds ratio 1.16, 95% CI 1.04-1.29; P=.009). CONCLUSIONS Among patients hospitalized with COVID-19, UOU is an independent risk factor associated with greater outcome severity, including in-hospital death. Social determinants of health and opioid-related overdose are unique comorbidities in the clinical presentation of the UOU patient subgroup. Additional research is needed on the role of COVID-19 therapeutics and inpatient management of acute COVID-19 pneumonia for patients with UOU. Further research is needed to test associations between expanded evidence-based harm reduction strategies for UOU and vaccination rates, hospitalizations, and risks for overdose and death among people with UOU and COVID-19. Machine learning techniques may offer more exhaustive means for cohort discovery and a novel mixed methods approach to population health.
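The effect sizes reported above are odds ratios with Wald confidence intervals from the fitted mixed-effects logistic model. As a minimal sketch of how an odds ratio and its interval come out of a fitted log-odds coefficient, the coefficient and standard error below are back-calculated from the reported OR 1.16 (95% CI 1.04-1.29) purely for illustration, not taken from the paper:

```python
import math

def odds_ratio_ci(beta: float, se: float, z: float = 1.96):
    """Exponentiate a fitted log-odds coefficient and its Wald confidence bounds."""
    return (math.exp(beta),          # point estimate (odds ratio)
            math.exp(beta - z * se), # lower 95% bound
            math.exp(beta + z * se)) # upper 95% bound

# Illustrative values back-calculated from the reported OR 1.16 (95% CI 1.04-1.29)
or_, lo, hi = odds_ratio_ci(beta=0.1484, se=0.0542)
```

The exponentiation is what turns an additive effect on the log-odds scale into the multiplicative "1.16 higher odds per incremental increase" interpretation used in the abstract.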
Affiliation(s)
- Hale M Thompson
- Section of Community Behavioral Health, Department of Psychiatry and Behavioral Sciences, Rush University Medical Center, Chicago, IL, United States
- Center for Education, Research, and Advocacy, Department of Social and Behavioral Research, Howard Brown Health, Chicago, IL, United States
- Brihat Sharma
- Section of Community Behavioral Health, Department of Psychiatry and Behavioral Sciences, Rush University Medical Center, Chicago, IL, United States
- Dale L Smith
- Section of Community Behavioral Health, Department of Psychiatry and Behavioral Sciences, Rush University Medical Center, Chicago, IL, United States
- Sameer Bhalla
- Department of Internal Medicine, Rush University Medical Center, Chicago, IL, United States
- Ihuoma Erondu
- Section of Community Behavioral Health, Department of Psychiatry and Behavioral Sciences, Rush University Medical Center, Chicago, IL, United States
- Aniruddha Hazra
- Section of Infectious Diseases and Global Health, Department of Medicine, University of Chicago, Chicago, IL, United States
- Yousaf Ilyas
- Section of Community Behavioral Health, Department of Psychiatry and Behavioral Sciences, Rush University Medical Center, Chicago, IL, United States
- Paul Pachwicewicz
- Section of Community Behavioral Health, Department of Psychiatry and Behavioral Sciences, Rush University Medical Center, Chicago, IL, United States
- Neeral K Sheth
- Department of Psychiatry and Behavioral Sciences, Rush University Medical Center, Chicago, IL, United States
- Neeraj Chhabra
- Department of Emergency Medicine, Rush University Medical College, Rush University Medical Center, Chicago, IL, United States
- Niranjan S Karnik
- Section of Community Behavioral Health, Department of Psychiatry and Behavioral Sciences, Rush University Medical Center, Chicago, IL, United States
- Majid Afshar
- Division of Pulmonary and Critical Care, Department of Medicine, School of Medicine and Public Health, University of Wisconsin, Madison, WI, United States
20
Nelson AE, Arbeeva L. Narrative Review of Machine Learning in Rheumatic and Musculoskeletal Diseases for Clinicians and Researchers: Biases, Goals, and Future Directions. J Rheumatol 2022; 49:1191-1200. [PMID: 35840150 PMCID: PMC9633365 DOI: 10.3899/jrheum.220326] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 06/21/2022] [Indexed: 11/22/2022]
Abstract
There has been rapid growth in the use of artificial intelligence (AI) analytics in medicine in recent years, including in rheumatic and musculoskeletal diseases (RMDs). Such methods represent a challenge to clinicians, patients, and researchers, given the "black box" nature of most algorithms, the unfamiliarity of the terms, and the lack of awareness of potential issues around these analyses. Therefore, this review aims to introduce this subject area in a way that is relevant and meaningful to clinicians and researchers. We hope to provide some insights into relevant strengths and limitations, reporting guidelines, as well as recent examples of such analyses in key areas, with a focus on lessons learned and future directions in diagnosis, phenotyping, prognosis, and precision medicine in RMDs.
Affiliation(s)
- Amanda E Nelson
- A.E. Nelson, MD, MSCR, Department of Medicine, Division of Rheumatology, Allergy, and Immunology, University of North Carolina at Chapel Hill;
- Liubov Arbeeva
- L. Arbeeva, MS, Thurston Arthritis Research Center, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
21
Gao Y, Miller T, Xu D, Dligach D, Churpek MM, Afshar M. Summarizing Patients' Problems from Hospital Progress Notes Using Pre-trained Sequence-to-Sequence Models. PROCEEDINGS OF COLING. INTERNATIONAL CONFERENCE ON COMPUTATIONAL LINGUISTICS 2022; 2022:2979-2991. [PMID: 36268128 PMCID: PMC9581107] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/16/2023]
Abstract
Automatically summarizing patients' main problems from daily progress notes using natural language processing methods helps to battle against information and cognitive overload in hospital settings and potentially assists providers with computerized diagnostic decision support. Problem list summarization requires a model to understand, abstract, and generate clinical documentation. In this work, we propose a new NLP task that aims to generate a list of problems in a patient's daily care plan using input from the provider's progress notes during hospitalization. We investigate the performance of T5 and BART, two state-of-the-art seq2seq transformer architectures, in solving this problem. We provide a corpus built on top of progress notes from publicly available electronic health record progress notes in the Medical Information Mart for Intensive Care (MIMIC)-III. T5 and BART are trained on general domain text, and we experiment with a data augmentation method and a domain adaptation pre-training method to increase exposure to medical vocabulary and knowledge. Evaluation methods include ROUGE, BERTScore, cosine similarity on sentence embedding, and F-score on medical concepts. Results show that T5 with domain adaptive pre-training achieves significant performance gains compared to a rule-based system and general domain pre-trained language models, indicating a promising direction for tackling the problem summarization task.
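Among the evaluation metrics listed, the ROUGE family is n-gram overlap between generated and reference text. A minimal pure-Python ROUGE-1 F1 (a simplification for illustration — the study also uses BERTScore, sentence-embedding cosine similarity, and concept-level F-score) can be sketched as:

```python
from collections import Counter

def rouge1_f(candidate: str, reference: str) -> float:
    """Unigram-overlap ROUGE-1 F1 between a generated and a reference summary."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Perfect overlap scores 1.0 and disjoint vocabularies score 0.0; partial credit accrues per matched unigram, which is why the study pairs ROUGE with embedding- and concept-based metrics that tolerate paraphrase.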
Affiliation(s)
- Yanjun Gao
- ICU Data Science Lab, School of Medicine and Public Health, University of Wisconsin-Madison
- Dongfang Xu
- Boston Children's Hospital and Harvard Medical School
- Matthew M Churpek
- ICU Data Science Lab, School of Medicine and Public Health, University of Wisconsin-Madison
- Majid Afshar
- ICU Data Science Lab, School of Medicine and Public Health, University of Wisconsin-Madison
22
Hammouda N, Neyra JA. Can Artificial Intelligence Assist in Delivering Continuous Renal Replacement Therapy? Adv Chronic Kidney Dis 2022; 29:439-449. [PMID: 36253027 PMCID: PMC9586461 DOI: 10.1053/j.ackd.2022.08.001] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2022] [Revised: 08/02/2022] [Accepted: 08/11/2022] [Indexed: 01/25/2023]
Abstract
Continuous renal replacement therapy (CRRT) is widely utilized to support critically ill patients with acute kidney injury. Artificial intelligence (AI) has the potential to enhance CRRT delivery, but evidence is limited. We reviewed existing literature on the utilization of AI in CRRT with the objective of identifying current gaps in evidence and research considerations. We conducted a scoping review focusing on the development or use of AI-based tools in patients receiving CRRT. Ten papers were identified; 6 of 10 (60%) were published in 2021, and 6 of 10 (60%) focused on machine learning models to augment CRRT delivery. All innovations were in the design/early validation phase of development. Primary research interests focused on early indicators of CRRT need, prognostication of mortality and kidney recovery, and identification of risk factors for mortality. Secondary research priorities included dynamic CRRT monitoring, predicting CRRT-related complications, and automated data pooling for point-of-care analysis. Literature gaps included prospective validation and implementation, ascertainment of bias, and evaluation of AI-generated health care disparities. Research on AI applications to enhance CRRT delivery has grown exponentially in recent years, but the field remains immature. There is a need to evaluate how these applications could enhance bedside decision-making capacity and support the structure and processes of CRRT delivery.
Affiliation(s)
- Nada Hammouda
- Department of Applied Clinical Research, University of Texas Southwestern, Dallas, TX
- Javier A Neyra
- Department of Medicine, Division of Nephrology, University of Alabama at Birmingham, Birmingham, AL
23
Conditional generation of medical time series for extrapolation to underrepresented populations. PLOS DIGITAL HEALTH 2022; 1:e0000074. [PMID: 36812549 PMCID: PMC9931259 DOI: 10.1371/journal.pdig.0000074] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 02/15/2022] [Accepted: 06/10/2022] [Indexed: 11/19/2022]
Abstract
The widespread adoption of electronic health records (EHRs) and subsequent increased availability of longitudinal healthcare data has led to significant advances in our understanding of health and disease with direct and immediate impact on the development of new diagnostics and therapeutic treatment options. However, access to EHRs is often restricted due to their perceived sensitive nature and associated legal concerns, and the cohorts therein typically are those seen at a specific hospital or network of hospitals and therefore not representative of the wider population of patients. Here, we present HealthGen, a new approach for the conditional generation of synthetic EHRs that maintains an accurate representation of real patient characteristics, temporal information and missingness patterns. We demonstrate experimentally that HealthGen generates synthetic cohorts that are significantly more faithful to real patient EHRs than the current state-of-the-art, and that augmenting real data sets with conditionally generated cohorts of underrepresented subpopulations of patients can significantly enhance the generalisability of models derived from these data sets to different patient populations. Synthetic conditionally generated EHRs could help increase the accessibility of longitudinal healthcare data sets and improve the generalisability of inferences made from these data sets to underrepresented populations.
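HealthGen itself is a learned conditional generative model; the downstream step it enables — padding an underrepresented subgroup with synthetic records before model training — can be sketched generically. In the sketch below the generator is a caller-supplied stand-in function (here just a record copy), not HealthGen's actual architecture:

```python
import random

def augment_underrepresented(cohort, is_minority, generate, target_share, seed=0):
    """Append synthetic records for an underrepresented subgroup until that
    subgroup makes up at least target_share of the cohort. `generate` stands
    in for a learned conditional generator such as HealthGen."""
    rng = random.Random(seed)
    out = list(cohort)
    minority = [r for r in out if is_minority(r)]
    if not minority:
        raise ValueError("no minority records to condition on")
    while len(minority) / len(out) < target_share:
        synthetic = generate(rng.choice(minority), rng)
        out.append(synthetic)
        minority.append(synthetic)
    return out

# Grow group "b" from 10% of the cohort to at least 30%,
# using a plain dict copy as the stand-in generator.
cohort = [{"group": "a"}] * 9 + [{"group": "b"}]
augmented = augment_underrepresented(cohort,
                                     lambda r: r["group"] == "b",
                                     lambda template, rng: dict(template),
                                     target_share=0.3)
```

The point of conditioning on real minority records, rather than naively duplicating them, is that a generative model can produce varied yet realistic trajectories, which is what drives the generalisability gains reported above.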
24
Natural language processing to identify substance misuse in the electronic health record. Lancet Digit Health 2022; 4:e401-e402. [DOI: 10.1016/s2589-7500(22)00096-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2022] [Accepted: 05/10/2022] [Indexed: 11/24/2022]
25
Afshar M. To err is machine: Considerations on the clinical impact of machine learning models in patients with unhealthy alcohol use. Alcohol Clin Exp Res 2022; 46:912-914. [PMID: 35429003 DOI: 10.1111/acer.14842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2022] [Revised: 04/07/2022] [Accepted: 04/09/2022] [Indexed: 11/28/2022]
Affiliation(s)
- Majid Afshar
- Department of Medicine, School of Medicine and Public Health, University of Wisconsin, Madison, Wisconsin, USA
26
Afshar M, Sharma B, Dligach D, Oguss M, Brown R, Chhabra N, Thompson HM, Markossian T, Joyce C, Churpek MM, Karnik NS. Development and multimodal validation of a substance misuse algorithm for referral to treatment using artificial intelligence (SMART-AI): a retrospective deep learning study. THE LANCET DIGITAL HEALTH 2022; 4:e426-e435. [PMID: 35623797 PMCID: PMC9159760 DOI: 10.1016/s2589-7500(22)00041-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/10/2021] [Revised: 02/12/2022] [Accepted: 02/16/2022] [Indexed: 01/02/2023]
Abstract
Background Substance misuse is a heterogeneous and complex set of behavioural conditions that are highly prevalent in hospital settings and frequently co-occur. Few hospital-wide solutions exist to comprehensively and reliably identify these conditions to prioritise care and guide treatment. The aim of this study was to apply natural language processing (NLP) to clinical notes collected in the electronic health record (EHR) to accurately screen for substance misuse. Methods The model was trained and developed on a reference dataset derived from a hospital-wide programme at Rush University Medical Center (RUMC), Chicago, IL, USA, that used structured diagnostic interviews to manually screen admitted patients over 27 months (between Oct 1, 2017, and Dec 31, 2019; n=54 915). The Alcohol Use Disorder Identification Test and Drug Abuse Screening Tool served as reference standards. The first 24 h of notes in the EHR were mapped to standardised medical vocabulary and fed into single-label, multilabel, and multilabel with auxiliary-task neural network models. Temporal validation of the model was done using data from the subsequent 12 months on a subset of RUMC patients (n=16 917). External validation was done using data from Loyola University Medical Center, Chicago, IL, USA between Jan 1, 2007, and Sept 30, 2017 (n=1991 adult patients). The primary outcome was discrimination for alcohol misuse, opioid misuse, or non-opioid drug misuse. Discrimination was assessed by the area under the receiver operating characteristic curve (AUROC). Calibration slope and intercept were measured with the unreliability index. Bias assessments were performed across demographic subgroups. Findings The model was trained on a cohort that had 3·5% misuse (n=1 921) with any type of substance. 220 (11%) of 1921 patients with substance misuse had more than one type of misuse.
The multilabel convolutional neural network classifier had a mean AUROC of 0·97 (95% CI 0·96–0·98) during temporal validation for all types of substance misuse. The model was well calibrated and showed good face validity with model features containing explicit mentions of aberrant drug-taking behaviour. A false-negative rate of 0·18–0·19 and a false-positive rate of 0·03 between non-Hispanic Black and non-Hispanic White groups occurred. In external validation, the AUROCs for alcohol and opioid misuse were 0·88 (95% CI 0·86–0·90) and 0·94 (0·92–0·95), respectively. Interpretation We developed a novel and accurate approach to leveraging the first 24 h of EHR notes for screening multiple types of substance misuse. Funding National Institute On Drug Abuse, National Institutes of Health.
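AUROC, the study's discrimination metric, is the probability that a randomly chosen positive case is scored above a randomly chosen negative one, and the subgroup bias check above compares error rates such as the false-negative rate across demographic groups. A small pure-Python sketch of both quantities (not the study's code):

```python
def auroc(scores, labels):
    """Probability a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def false_negative_rate(scores, labels, threshold):
    """Fraction of true positives the screen misses at a given operating threshold."""
    flagged = [s >= threshold for s, y in zip(scores, labels) if y == 1]
    return 1 - sum(flagged) / len(flagged)
```

Computing `false_negative_rate` separately per demographic subgroup, as the study does for non-Hispanic Black versus non-Hispanic White patients, turns a single aggregate metric into a bias assessment.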
27
Huang J, Galal G, Etemadi M, Vaidyanathan M. Evaluation and Mitigation of Racial Bias in Clinical Machine Learning Models: A Scoping Review. JMIR Med Inform 2022; 10:e36388. [PMID: 35639450 PMCID: PMC9198828 DOI: 10.2196/36388] [Citation(s) in RCA: 27] [Impact Index Per Article: 13.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/12/2022] [Revised: 02/17/2022] [Accepted: 03/27/2022] [Indexed: 01/12/2023] Open
Abstract
Background Racial bias is a key concern regarding the development, validation, and implementation of machine learning (ML) models in clinical settings. Despite the potential of bias to propagate health disparities, racial bias in clinical ML has yet to be thoroughly examined and best practices for bias mitigation remain unclear. Objective Our objective was to perform a scoping review to characterize the methods by which the racial bias of ML has been assessed and describe strategies that may be used to enhance algorithmic fairness in clinical ML. Methods A scoping review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) Extension for Scoping Reviews. A literature search using PubMed, Scopus, and Embase databases, as well as Google Scholar, identified 635 records, of which 12 studies were included. Results Applications of ML were varied and involved diagnosis, outcome prediction, and clinical score prediction performed on data sets including images, diagnostic studies, clinical text, and clinical variables. Of the 12 studies, 1 (8%) described a model in routine clinical use, 2 (17%) examined prospectively validated clinical models, and the remaining 9 (75%) described internally validated models. In addition, 8 (67%) studies concluded that racial bias was present, 2 (17%) concluded that it was not, and 2 (17%) assessed the implementation of bias mitigation strategies without comparison to a baseline model. Fairness metrics used to assess algorithmic racial bias were inconsistent. The most commonly observed metrics were equal opportunity difference (5/12, 42%), accuracy (4/12, 25%), and disparate impact (2/12, 17%). All 8 (67%) studies that implemented methods for mitigation of racial bias successfully increased fairness, as measured by the authors’ chosen metrics. Preprocessing methods of bias mitigation were most commonly used across all studies that implemented them. 
Conclusions The broad scope of medical ML applications and potential patient harms demand an increased emphasis on evaluation and mitigation of racial bias in clinical ML. However, the adoption of algorithmic fairness principles in medicine remains inconsistent and is limited by poor data availability and ML model reporting. We recommend that researchers and journal editors emphasize standardized reporting and data availability in medical ML studies to improve transparency and facilitate evaluation for racial bias.
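Two of the fairness metrics the review found most common — equal opportunity difference (a true-positive-rate gap) and disparate impact (a selection-rate ratio) — reduce to simple arithmetic on per-group counts. A hedged sketch, with group labels and counts chosen purely for illustration:

```python
def equal_opportunity_difference(tp_a, fn_a, tp_b, fn_b):
    """TPR(group A) - TPR(group B); 0 indicates equal opportunity."""
    return tp_a / (tp_a + fn_a) - tp_b / (tp_b + fn_b)

def disparate_impact(selected_a, total_a, selected_b, total_b):
    """Ratio of group A's positive-prediction rate to group B's; values
    below 0.8 are conventionally flagged under the 'four-fifths rule'."""
    return (selected_a / total_a) / (selected_b / total_b)
```

The inconsistency the review highlights is less about computing these numbers than about which metric a study chooses and which baseline model it compares against.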
Affiliation(s)
- Jonathan Huang
- Department of Anesthesiology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Galal Galal
- Department of Anesthesiology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Mozziyar Etemadi
- Department of Anesthesiology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Department of Biomedical Engineering, Northwestern University, Evanston, IL, United States
- Mahesh Vaidyanathan
- Department of Anesthesiology, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
- Digital Health & Data Science Curricular Thread, Northwestern University Feinberg School of Medicine, Chicago, IL, United States
28
Aboalshamat K, Alhuzali R, Alalyani A, Alsharif S, Qadhi H, Almatrafi R, Ammash D, Alotaibi S. Medical and Dental Professionals Readiness for Artificial Intelligence for Saudi Arabia Vision 2030. INTERNATIONAL JOURNAL OF PHARMACEUTICAL RESEARCH AND ALLIED SCIENCES 2022. [DOI: 10.51847/nu8y6y6q1m] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]