1
Barcelona V, Scharp D, Idnay BR, Moen H, Cato K, Topaz M. Identifying stigmatizing language in clinical documentation: A scoping review of emerging literature. PLoS One 2024; 19:e0303653. [PMID: 38941299] [PMCID: PMC11213326] [DOI: 10.1371/journal.pone.0303653] [Received: 04/13/2023] [Accepted: 04/30/2024]
Abstract
BACKGROUND Racism and implicit bias underlie disparities in health care access, treatment, and outcomes. An emerging area of study in examining health disparities is the use of stigmatizing language in the electronic health record (EHR).
OBJECTIVES We sought to summarize the existing literature on stigmatizing language documented in the EHR. To this end, we conducted a scoping review to identify, describe, and evaluate the current body of literature on stigmatizing language in clinician notes.
METHODS We searched the PubMed, Cumulative Index of Nursing and Allied Health Literature (CINAHL), and Embase databases in May 2022 and also conducted a hand search of IEEE to identify studies investigating stigmatizing language in clinical documentation. We included all studies published through April 2022. The results of each search were uploaded into EndNote X9 software, de-duplicated using the Bramer method, and then exported to Covidence software for title and abstract screening.
RESULTS Studies (N = 9) used cross-sectional (n = 3), qualitative (n = 3), mixed methods (n = 2), and retrospective cohort (n = 1) designs. Stigmatizing language was defined via content analysis of clinical documentation (n = 4), literature review (n = 2), interviews with clinicians (n = 3) and patients (n = 1), expert panel consultation, and task force guidelines (n = 1). Natural language processing (NLP) was used in four studies to identify and extract stigmatizing words from clinical notes. All of the studies reviewed concluded that negative clinician attitudes and the use of stigmatizing language in documentation could negatively affect patient perceptions of care or health outcomes.
DISCUSSION The current literature indicates that NLP is an emerging approach to identifying stigmatizing language documented in the EHR. NLP-based solutions can be developed and integrated into routine documentation systems to screen for stigmatizing language and alert clinicians or their supervisors. Potential interventions resulting from this research could raise awareness of how implicit biases affect communication patterns and help achieve equitable health care for diverse populations.
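The screen-and-alert workflow this review envisions could be sketched as follows. This is a minimal illustration only: the three flagged terms and the sample note are illustrative placeholders, not a validated stigmatizing-language lexicon, and a production system would integrate with the EHR rather than print to the console.

```python
import re

# Illustrative placeholder terms only; a real screener would draw on a
# validated stigmatizing-language lexicon, not this toy list.
FLAGGED_TERMS = ["non-compliant", "drug-seeking", "difficult patient"]

def screen_note(note_text):
    """Return the flagged terms found in a clinical note, if any."""
    hits = []
    for term in FLAGGED_TERMS:
        if re.search(r"\b" + re.escape(term) + r"\b", note_text, re.IGNORECASE):
            hits.append(term)
    return hits

def alert_if_needed(note_text):
    """Build the clinician-facing message for a screened note."""
    hits = screen_note(note_text)
    if hits:
        return "ALERT: possible stigmatizing language: " + ", ".join(hits)
    return "No flagged terms found."

print(alert_if_needed("Patient is non-compliant with medication."))
```

In practice, such a screener would sit downstream of note signing and route alerts to the clinician or a supervisor, as the review's discussion suggests.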
Affiliation(s)
- Veronica Barcelona
- Columbia University School of Nursing, New York, New York, United States of America
- Danielle Scharp
- Columbia University School of Nursing, New York, New York, United States of America
- Betina R. Idnay
- Department of Biomedical Informatics, Columbia University, New York, New York, United States of America
- Hans Moen
- Department of Computer Science, Aalto University, Aalto, Finland
- Kenrick Cato
- University of Pennsylvania School of Nursing, Philadelphia, Pennsylvania, United States of America
- Maxim Topaz
- Columbia University School of Nursing, New York, New York, United States of America
2
Barcelona V, Scharp D, Moen H, Davoudi A, Idnay BR, Cato K, Topaz M. Using Natural Language Processing to Identify Stigmatizing Language in Labor and Birth Clinical Notes. Matern Child Health J 2024; 28:578-586. [PMID: 38147277] [DOI: 10.1007/s10995-023-03857-4] [Accepted: 12/10/2023]
Abstract
INTRODUCTION Stigma and bias related to race and other minoritized statuses may underlie disparities in pregnancy and birth outcomes. One emerging method for identifying bias is the study of stigmatizing language in the electronic health record. The objective of our study was to develop automated natural language processing (NLP) methods to accurately and automatically identify two types of stigmatizing language in labor and birth notes: marginalizing language and its complement, power/privilege language.
METHODS We analyzed notes for all birthing people > 20 weeks' gestation admitted for labor and birth at two hospitals during 2017. We then applied text preprocessing techniques, using TF-IDF values as inputs, and tested machine learning classification algorithms to identify stigmatizing and power/privilege language in clinical notes. The algorithms assessed included Decision Trees, Random Forest, and Support Vector Machines. Additionally, we applied a feature importance evaluation method (InfoGain) to discern words that are highly correlated with these language categories.
RESULTS For marginalizing language, Decision Trees yielded the best classification, with an F-score of 0.73. For power/privilege language, Support Vector Machines performed best, achieving an F-score of 0.91. These results demonstrate the effectiveness of the selected machine learning methods for classifying language categories in clinical notes.
CONCLUSION We identified well-performing machine learning methods that automatically detect stigmatizing language in clinical notes. To our knowledge, this is the first study to use NLP performance metrics to evaluate machine learning methods for discerning stigmatizing language. Future studies should delve deeper into refining and evaluating NLP methods, incorporating the latest deep learning algorithms.
Affiliation(s)
- Veronica Barcelona
- School of Nursing, Columbia University, 560 West 168th St, Mail Code 6, New York, NY, 10032, USA
- Danielle Scharp
- School of Nursing, Columbia University, 560 West 168th St, Mail Code 6, New York, NY, 10032, USA
- Hans Moen
- Department of Computer Science, Aalto University, Espoo, Finland
- Betina R Idnay
- Department of Biomedical Informatics, Columbia University, New York, NY, USA
- Kenrick Cato
- School of Nursing, Columbia University, 560 West 168th St, Mail Code 6, New York, NY, 10032, USA
- University of Pennsylvania, Philadelphia, PA, USA
- Maxim Topaz
- School of Nursing, Columbia University, 560 West 168th St, Mail Code 6, New York, NY, 10032, USA
3
Hanson RF, Zhu V, Are F, Espeleta H, Wallis E, Heider P, Kautz M, Lenert L. Initial development of tools to identify child abuse and neglect in pediatric primary care. BMC Med Inform Decis Mak 2023; 23:266. [PMID: 37978498] [PMCID: PMC10656827] [DOI: 10.1186/s12911-023-02361-7] [Received: 10/17/2022] [Accepted: 11/02/2023]
Abstract
BACKGROUND Child abuse and neglect (CAN) is prevalent, associated with long-term adversities, and often undetected. Primary care settings offer a unique opportunity to identify CAN and facilitate referrals when warranted. Electronic health records (EHR) contain extensive information to support healthcare decisions, yet time constraints preclude most providers from the thorough EHR reviews that could indicate CAN. Strategies that summarize EHR data to identify CAN and convey this information to providers have the potential to mitigate CAN-related sequelae. This study used expert review/consensus and natural language processing (NLP) to develop and test a lexicon that characterizes children who have experienced or are at risk for CAN, and compared a machine learning approach with the lexicon + NLP approach to determine each algorithm's performance in identifying CAN.
METHODS Study investigators identified 90 CAN terms and invited an interdisciplinary group of child abuse experts to review and validate them. We then used NLP to develop pipelines to finalize the CAN lexicon. Data for pipeline development and refinement were drawn from a randomly selected sample of EHRs from patients seen at pediatric primary care clinics within a U.S. academic health center. To explore a machine learning approach to CAN identification, we used Support Vector Machine (SVM) algorithms.
RESULTS The investigator-generated list of 90 CAN terms was reviewed and validated by 25 invited experts, resulting in a final pool of 133 terms. NLP used a randomly selected sample of 14,393 clinical notes from 153 patients to test the lexicon, and 0.03% of notes were identified as CAN positive. CAN identification varied by clinical note type, with few differences found by provider type (physicians versus nurses, social workers, etc.). An evaluation of the final NLP pipelines indicated a 93.8% CAN-positive rate for the training set and 71.4% for the test set, with the decreased precision attributed primarily to false positives. For the machine learning approach, SVM pipeline performance was 92% for CAN-positive notes and 100% for non-CAN notes, indicating higher sensitivity than specificity.
CONCLUSIONS The NLP algorithm's development and refinement suggest that innovative tools can identify youth at risk for CAN. The next key step is to refine the NLP algorithm so that this information can eventually be funneled to care providers to guide clinical decision making.
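A lexicon-based flagging pass of the kind this pipeline performs could be sketched as follows. The three-term lexicon and four sample notes are illustrative placeholders only, not the study's validated 133-term CAN lexicon or its clinical data.

```python
import re

# Illustrative placeholder lexicon, not the study's validated term list
CAN_LEXICON = ["suspected abuse", "unexplained bruising", "neglect"]

def flag_note(note, lexicon):
    """Return True if any lexicon term appears in the note."""
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", note, re.IGNORECASE)
        for term in lexicon
    )

def positive_rate(notes, lexicon):
    """Fraction of notes flagged as CAN positive by the lexicon."""
    flagged = sum(flag_note(n, lexicon) for n in notes)
    return flagged / len(notes)

notes = [
    "Well child visit, growth on track.",
    "Unexplained bruising noted on both arms; social work consulted.",
    "Immunizations up to date.",
    "Concern for neglect raised by school nurse.",
]
print(positive_rate(notes, CAN_LEXICON))  # → 0.5 (2 of 4 notes flagged)
```

The study's full pipeline adds expert validation of the term list and an SVM comparison; this sketch covers only the term-matching step applied per note.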
Affiliation(s)
- Vivienne Zhu
- Medical University of South Carolina, Charleston, SC, USA
- Paul Heider
- Medical University of South Carolina, Charleston, SC, USA
- Marin Kautz
- Medical University of South Carolina, Charleston, SC, USA
- Leslie Lenert
- Medical University of South Carolina, Charleston, SC, USA
4
Landau AY, Blanchard A, Atkins N, Salazar S, Cato K, Patton DU, Topaz M. Black and Latinx Primary Caregiver Considerations for Developing and Implementing a Machine Learning-Based Model for Detecting Child Abuse and Neglect With Implications for Racial Bias Reduction: Qualitative Interview Study With Primary Caregivers. JMIR Form Res 2023; 7:e40194. [PMID: 36719717] [PMCID: PMC9929722] [DOI: 10.2196/40194] [Received: 06/09/2022] [Revised: 07/22/2022] [Accepted: 08/15/2022]
Abstract
BACKGROUND Child abuse and neglect, once viewed as a social problem, is now an epidemic. Moreover, health providers agree that existing stereotypes may link racial and social class issues to child abuse. The broad adoption of electronic health records (EHRs) in clinical settings offers a new avenue for addressing this epidemic. To reduce racial bias and improve the development, implementation, and outcomes of machine learning (ML)-based models that use EHR data, it is crucial to involve marginalized members of the community in the process.
OBJECTIVE This study elicited Black and Latinx primary caregivers' viewpoints regarding child abuse and neglect while living in underserved communities to highlight considerations for designing an ML-based model for detecting child abuse and neglect in emergency departments (EDs), with implications for racial bias reduction and future interventions.
METHODS We conducted a qualitative study using in-depth interviews with 20 Black and Latinx primary caregivers whose children were cared for at a single pediatric tertiary-care ED to gain insights about child abuse and neglect and their experiences with health providers.
RESULTS Three central themes were developed in the coding process: (1) primary caregivers' perspectives on the definition of child abuse and neglect, (2) primary caregivers' experiences with health providers and medical documentation, and (3) primary caregivers' perceptions of child protective services.
CONCLUSIONS Our findings highlight essential considerations from primary caregivers for developing an ML-based model for detecting child abuse and neglect in ED settings, including how to define child abuse and neglect from a primary caregiver lens. Miscommunication between patients and health providers can lead to misdiagnosis and, therefore, negatively affect medical documentation. Additionally, the outcome and application of ML-based models for detecting abuse and neglect may cause more harm to the community than expected. Further research is needed to validate these findings and integrate them into the creation of an ML-based model.
Affiliation(s)
- Aviv Y Landau
- School of Social Policy & Practice, University of Pennsylvania, Philadelphia, PA, United States
- Ashley Blanchard
- New York Presbyterian Morgan Stanley Children's Hospital, Columbia University Irving Medical Center, New York, NY, United States
- Nia Atkins
- Columbia College, Columbia University, New York, NY, United States
- Stephanie Salazar
- Columbia School of Social Work, Columbia University, New York, NY, United States
- Kenrick Cato
- University of Pennsylvania School of Nursing, University of Pennsylvania, Philadelphia, PA, United States
- Children's Hospital of Philadelphia, University of Pennsylvania, Philadelphia, PA, United States
- Desmond U Patton
- School of Social Policy & Practice, University of Pennsylvania, Philadelphia, PA, United States
- Annenberg School for Communication, University of Pennsylvania, Philadelphia, PA, United States
- Department of Child and Adolescent Psychiatry and Behavioral Sciences, University of Pennsylvania, Philadelphia, PA, United States
- Maxim Topaz
- Columbia University School of Nursing, Columbia University, New York, NY, United States
- Columbia University Data Science Institute, Columbia University, New York, NY, United States
5
Bakken S. Addressing Consequential Public Health Problems Through Informatics and Data Science. J Am Med Inform Assoc 2022; 29:413-414. [PMID: 35092686] [PMCID: PMC8800529] [DOI: 10.1093/jamia/ocab294] [Received: 12/28/2021] [Accepted: 12/28/2021]
Affiliation(s)
- Suzanne Bakken
- School of Nursing, Department of Biomedical Informatics, and Data Science Institute, Columbia University, New York, New York, USA