1
Zolnoori M, Zolnour A, Vergez S, Sridharan S, Spens I, Topaz M, Noble JM, Bakken S, Hirschberg J, Bowles K, Onorato N, McDonald MV. Beyond electronic health record data: leveraging natural language processing and machine learning to uncover cognitive insights from patient-nurse verbal communications. J Am Med Inform Assoc 2025; 32:328-340. [PMID: 39667364 PMCID: PMC11756603 DOI: 10.1093/jamia/ocae300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2024] [Revised: 11/18/2024] [Accepted: 11/21/2024] [Indexed: 12/14/2024] Open
Abstract
BACKGROUND Mild cognitive impairment and early-stage dementia significantly impact healthcare utilization and costs, yet more than half of affected patients remain underdiagnosed. This study leverages audio-recorded patient-nurse verbal communication in home healthcare settings to develop an artificial intelligence-based screening tool for early detection of cognitive decline.
OBJECTIVE To develop a speech processing algorithm using routine patient-nurse verbal communication and evaluate its performance when combined with electronic health record (EHR) data in detecting early signs of cognitive decline.
METHOD We analyzed 125 audio recordings of patient-nurse verbal communication for 47 patients from a major home healthcare agency in New York City. Of the 47 patients, 19 experienced symptoms associated with the onset of cognitive decline. A natural language processing algorithm was developed to extract domain-specific linguistic and interaction features from these recordings. The algorithm's performance was compared against EHR-based screening methods. Both standalone and combined data approaches were assessed using F1-score and area under the curve (AUC) metrics.
RESULTS The initial model using only patient-nurse verbal communication achieved an F1-score of 85 and an AUC of 86.47. The model based on EHR data achieved an F1-score of 75.56 and an AUC of 79. Combining patient-nurse verbal communication with EHR data yielded the highest performance, with an F1-score of 88.89 and an AUC of 90.23. Key linguistic indicators of cognitive decline included reduced linguistic diversity, grammatical challenges, repetition, and altered speech patterns. Incorporating audio data significantly enhanced the risk prediction models for hospitalization and emergency department visits.
DISCUSSION Routine verbal communication between patients and nurses contains critical linguistic and interactional indicators for identifying cognitive impairment. Integrating audio-recorded patient-nurse communication with EHR data provides a more comprehensive and accurate method for early detection of cognitive decline, potentially improving patient outcomes through timely interventions. This combined approach could revolutionize cognitive impairment screening in home healthcare settings.
Affiliation(s)
- Maryam Zolnoori
  - Columbia University Irving Medical Center, New York, NY 10032, United States
  - School of Nursing, Columbia University, New York, NY 10032, United States
  - Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
- Ali Zolnour
  - Columbia University Irving Medical Center, New York, NY 10032, United States
- Sasha Vergez
  - Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
- Sridevi Sridharan
  - Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
- Ian Spens
  - Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
- Maxim Topaz
  - Columbia University Irving Medical Center, New York, NY 10032, United States
  - School of Nursing, Columbia University, New York, NY 10032, United States
  - Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
  - Data Science Institute, Columbia University, New York, NY 10027, United States
- James M Noble
  - Columbia University Irving Medical Center, New York, NY 10032, United States
  - Department of Neurology, Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, GH Sergievsky Center, Columbia University, New York, NY 10032, United States
- Suzanne Bakken
  - School of Nursing, Columbia University, New York, NY 10032, United States
  - Data Science Institute, Columbia University, New York, NY 10027, United States
  - Department of Biomedical Informatics, Columbia University, New York, NY 10032, United States
- Julia Hirschberg
  - Department of Computer Science, Columbia University, New York, NY 10027, United States
- Kathryn Bowles
  - Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
  - University of Pennsylvania School of Nursing, Philadelphia, PA 19104, United States
- Nicole Onorato
  - Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
- Margaret V McDonald
  - Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
2
Scroggins JK, Topaz M, Song J, Zolnoori M. Does synthetic data augmentation improve the performances of machine learning classifiers for identifying health problems in patient-nurse verbal communications in home healthcare settings? J Nurs Scholarsh 2025; 57:47-58. [PMID: 38961517 DOI: 10.1111/jnu.13004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/07/2024] [Revised: 05/21/2024] [Accepted: 06/19/2024] [Indexed: 07/05/2024]
Abstract
BACKGROUND Identifying health problems in audio-recorded patient-nurse communication is important to improve outcomes in home healthcare patients who have complex conditions with increased risks of hospital utilization. Training machine learning classifiers for identifying problems requires resource-intensive human annotation.
OBJECTIVE To generate synthetic patient-nurse communication and to automatically annotate for common health problems encountered in home healthcare settings using GPT-4. We also examined whether augmenting real-world patient-nurse communication with synthetic data can improve the performance of machine learning to identify health problems.
DESIGN Secondary data analysis of patient-nurse verbal communication data in home healthcare settings.
METHODS The data were collected from one of the largest home healthcare organizations in the United States. We used 23 audio recordings of patient-nurse communications from 15 patients. The audio recordings were transcribed verbatim and manually annotated for health problems (e.g., circulation, skin, pain) indicated in the Omaha System Classification scheme. Synthetic data of patient-nurse communication were generated using the in-context learning prompting method, enhanced by chain-of-thought prompting to improve the automatic annotation performance. Machine learning classifiers were applied to three training datasets: real-world communication, synthetic communication, and real-world communication augmented by synthetic communication.
RESULTS Average F1 scores improved from 0.62 to 0.63 after training data were augmented with synthetic communication. The largest increase was observed using the XGBoost classifier, where F1 scores improved from 0.61 to 0.64 (about 5% improvement). When trained solely on either real-world communication or synthetic communication, the classifiers showed comparable F1 scores of 0.62 and 0.61, respectively.
CONCLUSION Integrating synthetic data improves machine learning classifiers' ability to identify health problems in home healthcare, with performance comparable to training on real-world data alone, highlighting the potential of synthetic data in healthcare analytics.
CLINICAL RELEVANCE This study demonstrates the clinical relevance of leveraging synthetic patient-nurse communication data to enhance machine learning classifier performance in identifying health problems in home healthcare settings, which will contribute to more accurate and efficient problem identification and detection of home healthcare patients with complex health conditions.
Affiliation(s)
- Maxim Topaz
  - Columbia University School of Nursing, New York, New York, USA
  - Data Science Institute, Columbia University, New York, New York, USA
  - Center for Home Care Policy & Research, VNS Health, New York, New York, USA
- Jiyoun Song
  - University of Pennsylvania School of Nursing, Philadelphia, Pennsylvania, USA
- Maryam Zolnoori
  - Columbia University School of Nursing, New York, New York, USA
  - Center for Home Care Policy & Research, VNS Health, New York, New York, USA
3
Ma JE, Schlichte L, Haverfield M, Gambino J, Lange A, Blanchard K, Morgan B, Bekelman DB. Do goals of care documentation reflect the conversation?: Evaluating conversation-documentation accuracy. J Am Geriatr Soc 2024; 72:2500-2507. [PMID: 38593240 PMCID: PMC11323159 DOI: 10.1111/jgs.18913] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/05/2023] [Revised: 03/07/2024] [Accepted: 03/17/2024] [Indexed: 04/11/2024]
Abstract
BACKGROUND Documenting goals of care in the electronic health record is meant to relay patient preferences to other clinicians. Evaluating the content and documentation of nurse- and social worker-led goals of care conversations can inform future goals of care initiative efforts.
METHODS As part of the ADvancing symptom Alleviation with Palliative Treatment trial, this study analyzed goals of care conversations led by nurses and social workers and documented in the electronic health record. Informed by a goals of care communication guide, we identified five goals of care components: illness understanding, goals and values, end of life planning, surrogate, and advance directives. Forty conversation transcripts underwent content analysis. Through an iterative team process, we defined documentation accuracy as four categories: (1) Complete: comprehensive, accurate documentation of the conversation; (2) Incomplete: partial documentation of the conversation; (3) Missing: discussed and not documented; and (4) Incorrect: misrepresented in documentation. We also defined "Not Discussed" for communication guide questions that were neither discussed nor documented. A constant comparative approach was used to determine the presence or absence of conversation content in the documentation.
RESULTS All five goals of care components were discussed in 67% (27/40) of conversation transcripts. Compared to the transcripts, surrogate (37/40, 93%) and advance directives (36/40, 90%) were often documented completely. Almost 40% of goals and values (15/40, 38%) and half of end of life planning (19/40, 48%) were incomplete. Illness understanding was missing (13/40, 33%), not discussed (13/40, 33%), or incorrect (2/40, 5%).
CONCLUSION Nurse- and social worker-led goals of care conversations discussed and documented most components of the goals of care communication guide. Further research may guide how best to determine the relative importance of accuracy, especially in the broad setting of incomplete, missing, and incorrect EHR documentation.
Affiliation(s)
- Jessica E Ma
  - Geriatric Research Education and Clinical Center, Durham VA Health System, Durham, North Carolina, USA
  - Division of General Internal Medicine, Department of Medicine, Duke University School of Medicine, Durham, North Carolina, USA
- Marie Haverfield
  - Department of Communication Studies, San José State University, San Jose, California, USA
- Allison Lange
  - Department of Medicine, University of Colorado School of Medicine, Anschutz Medical Campus, Aurora, Colorado, USA
- Kelly Blanchard
  - VA Eastern Colorado Health Care System, Aurora, Colorado, USA
- Brianne Morgan
  - VA Eastern Colorado Health Care System, Aurora, Colorado, USA
- David B Bekelman
  - VA Eastern Colorado Health Care System, Aurora, Colorado, USA
  - Division of General Internal Medicine, Department of Medicine, University of Colorado School of Medicine, Anschutz Medical Campus, Aurora, Colorado, USA
4
Zolnoori M, Sridharan S, Zolnour A, Vergez S, McDonald MV, Kostic Z, Bowles KH, Topaz M. Utilizing patient-nurse verbal communication in building risk identification models: the missing critical data stream in home healthcare. J Am Med Inform Assoc 2024; 31:435-444. [PMID: 37847651 PMCID: PMC10797261 DOI: 10.1093/jamia/ocad195] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/08/2023] [Accepted: 09/21/2023] [Indexed: 10/19/2023] Open
Abstract
BACKGROUND In the United States, over 12 000 home healthcare agencies annually serve 6+ million patients, mostly aged 65+ years with chronic conditions. One in three of these patients ends up visiting the emergency department (ED) or being hospitalized. Existing risk identification models based on electronic health record (EHR) data have suboptimal performance in detecting these high-risk patients.
OBJECTIVES To measure the added value of integrating audio-recorded home healthcare patient-nurse verbal communication into a risk identification model built on home healthcare EHR data and clinical notes.
METHODS This pilot study was conducted at one of the largest not-for-profit home healthcare agencies in the United States. We audio-recorded 126 patient-nurse encounters for 47 patients, out of which 8 patients experienced ED visits and hospitalization. The risk model was developed and tested iteratively using: (1) structured data from the Outcome and Assessment Information Set (OASIS), (2) clinical notes, and (3) verbal communication features. We used various natural language processing methods to model the communication between patients and nurses.
RESULTS Using a Support Vector Machine classifier, trained on the most informative features from OASIS, clinical notes, and verbal communication, we achieved an AUC-ROC = 99.68 and an F1-score = 94.12. By integrating verbal communication into the risk models, the F1-score improved by 26%. The analysis revealed patients at high risk tended to interact more with risk-associated cues, exhibit more "sadness" and "anxiety," and have extended periods of silence during conversation.
CONCLUSION This innovative study underscores the immense value of incorporating patient-nurse verbal communication in enhancing risk prediction models for hospitalizations and ED visits, suggesting the need for an evolved clinical workflow that integrates routine patient-nurse verbal communication recording into the medical record.
Affiliation(s)
- Maryam Zolnoori
  - School of Nursing, Columbia University, New York, NY 10032, United States
  - Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
- Ali Zolnour
  - School of Electrical and Computer Engineering, University of Tehran, Tehran 14395-515, Iran
- Sasha Vergez
  - Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
- Margaret V McDonald
  - Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
- Zoran Kostic
  - Electrical Engineering Department, Columbia University, New York, NY 10027, United States
- Kathryn H Bowles
  - Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
  - School of Nursing, University of Pennsylvania, Philadelphia, PA 19104, United States
- Maxim Topaz
  - School of Nursing, Columbia University, New York, NY 10032, United States
  - Center for Home Care Policy & Research, VNS Health, New York, NY 10017, United States
5
Song J, Min SH, Chae S, Bowles KH, McDonald MV, Hobensack M, Barrón Y, Sridharan S, Davoudi A, Oh S, Evans L, Topaz M. Uncovering hidden trends: identifying time trajectories in risk factors documented in clinical notes and predicting hospitalizations and emergency department visits during home health care. J Am Med Inform Assoc 2023; 30:1801-1810. [PMID: 37339524 PMCID: PMC10586044 DOI: 10.1093/jamia/ocad101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/06/2023] [Revised: 05/04/2023] [Accepted: 06/02/2023] [Indexed: 06/22/2023] Open
Abstract
OBJECTIVE This study aimed to identify temporal risk factor patterns documented in home health care (HHC) clinical notes and examine their association with hospitalizations or emergency department (ED) visits.
MATERIALS AND METHODS Data for 73 350 episodes of care from one large HHC organization were analyzed using dynamic time warping and hierarchical clustering analysis to identify the temporal patterns of risk factors documented in clinical notes. The Omaha System nursing terminology represented risk factors. First, clinical characteristics were compared between clusters. Next, multivariate logistic regression was used to examine the association between clusters and risk for hospitalizations or ED visits. Omaha System domains corresponding to risk factors were analyzed and described in each cluster.
RESULTS Six temporal clusters emerged, showing different patterns in how risk factors were documented over time. Patients with a steep increase in documented risk factors over time had a 3 times higher likelihood of hospitalization or ED visit than patients with no documented risk factors. Most risk factors belonged to the physiological domain, and only a few were in the environmental domain.
DISCUSSION An analysis of risk factor trajectories reflects a patient's evolving health status during an HHC episode. Using standardized nursing terminology, this study provided new insights into the complex temporal dynamics of HHC, which may lead to improved patient outcomes through better treatment and management plans.
CONCLUSION Incorporating temporal patterns in documented risk factors and their clusters into early warning systems may activate interventions to prevent hospitalizations or ED visits in HHC.
Affiliation(s)
- Jiyoun Song
  - Columbia University School of Nursing, New York City, New York, USA
- Se Hee Min
  - Columbia University School of Nursing, New York City, New York, USA
- Sena Chae
  - College of Nursing, University of Iowa, Iowa City, Iowa, USA
- Kathryn H Bowles
  - Department of Biobehavioral Health Sciences, University of Pennsylvania School of Nursing, Philadelphia, Pennsylvania, USA
  - Center for Home Care Policy & Research, VNS Health, New York, New York, USA
- Mollie Hobensack
  - Columbia University School of Nursing, New York City, New York, USA
- Yolanda Barrón
  - Center for Home Care Policy & Research, VNS Health, New York, New York, USA
- Sridevi Sridharan
  - Center for Home Care Policy & Research, VNS Health, New York, New York, USA
- Anahita Davoudi
  - Center for Home Care Policy & Research, VNS Health, New York, New York, USA
- Sungho Oh
  - Department of Biobehavioral Health Sciences, University of Pennsylvania School of Nursing, Philadelphia, Pennsylvania, USA
- Lauren Evans
  - Center for Home Care Policy & Research, VNS Health, New York, New York, USA
- Maxim Topaz
  - Columbia University School of Nursing, New York City, New York, USA
  - Center for Home Care Policy & Research, VNS Health, New York, New York, USA
  - Data Science Institute, Columbia University, New York City, New York, USA
6
Zolnoori M, Vergez S, Sridharan S, Zolnour A, Bowles K, Kostic Z, Topaz M. Is the patient speaking or the nurse? Automatic speaker type identification in patient-nurse audio recordings. J Am Med Inform Assoc 2023; 30:1673-1683. [PMID: 37478477 PMCID: PMC10531109 DOI: 10.1093/jamia/ocad139] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/23/2023] [Revised: 06/06/2023] [Accepted: 07/16/2023] [Indexed: 07/23/2023] Open
Abstract
OBJECTIVES Patient-clinician communication provides valuable explicit and implicit information that may indicate adverse medical conditions and outcomes. However, practical and analytical approaches for audio-recording and analyzing this data stream remain underexplored. This study aimed to 1) analyze patients' and nurses' speech in audio-recorded verbal communication, and 2) develop machine learning (ML) classifiers to effectively differentiate between patient and nurse language.
MATERIALS AND METHODS Pilot studies were conducted at VNS Health, the largest not-for-profit home healthcare agency in the United States, to optimize the audio recording of patient-nurse interactions. We recorded and transcribed 46 interactions, resulting in 3494 "utterances" that were annotated to identify the speaker. We employed natural language processing techniques to generate linguistic features and built various ML classifiers to distinguish between patient and nurse language at both individual and encounter levels.
RESULTS A support vector machine classifier trained on selected linguistic features from term frequency-inverse document frequency, Linguistic Inquiry and Word Count, Word2Vec, and Medical Concepts in the Unified Medical Language System achieved the highest performance with an AUC-ROC = 99.01 ± 1.97 and an F1-score = 96.82 ± 4.1. The analysis revealed patients' tendency to use informal language and keywords related to "religion," "home," and "money," while nurses utilized more complex sentences focusing on health-related matters and medical issues and were more likely to ask questions.
CONCLUSION The methods and analytical approach we developed to differentiate patient and nurse language are an important precursor for downstream tasks that aim to analyze patient speech to identify patients at risk of disease and negative health outcomes.
Affiliation(s)
- Maryam Zolnoori
  - School of Nursing, Columbia University, New York, New York, USA
  - Center for Home Care Policy & Research, VNS Health, New York, New York, USA
- Sasha Vergez
  - Center for Home Care Policy & Research, VNS Health, New York, New York, USA
- Sridevi Sridharan
  - Center for Home Care Policy & Research, VNS Health, New York, New York, USA
- Ali Zolnour
  - School of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
- Kathryn Bowles
  - Center for Home Care Policy & Research, VNS Health, New York, New York, USA
- Zoran Kostic
  - Department of Electrical Engineering, Columbia University, New York, New York, USA
- Maxim Topaz
  - School of Nursing, Columbia University, New York, New York, USA
  - Center for Home Care Policy & Research, VNS Health, New York, New York, USA
7
Sillner AY, Berish D, Mailhot T, Sweeder L, Fick DM, Kolanowski AM. Delirium superimposed on dementia in post-acute care: Nurse documentation of symptoms and interventions. Geriatr Nurs 2023; 49:122-126. [PMID: 36495794 PMCID: PMC9892266 DOI: 10.1016/j.gerinurse.2022.11.015] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 09/13/2022] [Revised: 11/21/2022] [Accepted: 11/22/2022] [Indexed: 12/12/2022]
Abstract
Delirium superimposed on dementia (DSD) is common in older adults being discharged to post-acute care (PAC) settings. Nurse documentation of DSD remains poorly understood. The aims were to describe nurse documentation and to determine associations in a secondary data analysis of a large, single-blinded randomized controlled trial (Recreational Stimulation For Elders As A Vehicle To Resolve DSD [Reserve For DSD]). Just under 75% of the sample had at least one symptom of delirium documented by the nursing staff, while 25.9% had none despite being CAM positive by expert adjudication. Only 32% had an intervention documented. The number of documented interventions was significantly associated with the number of documented symptoms. There is a need for research and innovation related to nurse documentation and communication of DSD symptoms and interventions in an efficient and accurate manner to impact care for vulnerable older adults in these settings.
Affiliation(s)
- Andrea Yevchak Sillner
  - Ross and Carol Nese College of Nursing, The Pennsylvania State University, University Park, PA
- Diane Berish
  - Ross and Carol Nese College of Nursing, The Pennsylvania State University, University Park, PA
- Tanya Mailhot
  - Montreal Heart Institute Research Center, Université de Montréal, Montreal, QC, CA
- Logan Sweeder
  - Ross and Carol Nese College of Nursing, The Pennsylvania State University, University Park, PA
- Donna M Fick
  - Ross and Carol Nese College of Nursing, The Pennsylvania State University, University Park, PA
- Ann M Kolanowski
  - Ross and Carol Nese College of Nursing, The Pennsylvania State University, University Park, PA