1
Gao M, Pejavara K, Balu S, Henao R. Development of a Flexible Chain of Thought Framework for Automated Routing of Patient Portal Messages. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2025; 2024:443-452. [PMID: 40417581 PMCID: PMC12099328] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/27/2025]
Abstract
The increase in utilization of patient portal messages has imposed a considerable burden on healthcare providers, contributing to an increased incidence of provider burnout. This study introduces a framework for leveraging Large Language Models (LLMs) and Chain-of-Thought (CoT) prompting in order to automatically categorize and route messages to their appropriate location. The modeling framework, which utilizes gold standard annotations from triage nurses, not only facilitates the dynamic adaptation of the model to evolving healthcare workflows and emerging edge-case scenarios, but also significantly improves the model's classification accuracy compared to traditional zero-shot methods. In addition, the framework allows for flexibility in its task and continuous improvement via annotation of exemplar messages. The model is able to accurately categorize messages in an automated fashion, which has potential to dramatically ease the burden on providers and provide faster and safer responses to patients. This framework can also be readily extended to work in a variety of clinical and documentation settings.
2
Azarpey A, Thomas J, Ring D, Franko O. Natural Language Processing of Sentiments Identified in Patient Comments Associated with Less Than Top-Rated Care. J Patient Exp 2025; 12:23743735251323677. [PMID: 40125346 PMCID: PMC11930458 DOI: 10.1177/23743735251323677] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2025]
Abstract
Background: Natural language processing (NLP) analysis of patient comments about their care can inform improvement initiatives. Objective: We used NLP to quantify sentiments and identify topics in patient comments associated with submaximal ratings of experience. Methods: Using a set of 1117 patient comments associated with ratings 1-4 out of 5 from a commercial source, we analyzed associated sentiments measured by Linguistic Inquiry and Word Count software and associated themes using topic modeling. Results: In the sentiment analysis, positive sentiments were associated with better numerical ratings while word count, numbers, ethnicity, and negative tones were associated with lower ratings. Topics of "listening, concern, and collaboration" were associated with 1-star ratings and "logistics" and "pain" with 4-star ratings. Conclusion: The finding that NLP analysis of comments from submaximal patient ratings of experience is consistent with evidence that the worst ratings are associated with relationship issues and more moderate ratings are associated with process issues affirms the ability of NLP to analyze large amounts of patient comments to identify opportunities to improve patient experience of care.
Affiliation(s)
- Ali Azarpey
  - Department of Surgery and Perioperative Care, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Jacob Thomas
  - Department of Surgery and Perioperative Care, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- David Ring
  - Department of Surgery and Perioperative Care, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Orrin Franko
  - East Bay Hand Medical Center, San Leandro, CA, USA
3
Lester RT, Manson M, Semakula M, Jang H, Mugabo H, Magzari A, Blackmer JM, Fattah F, Niyonsenga SP, Rwagasore E, Ruranga C, Remera E, Ngabonziza JCS, Carenini G, Nsanzimana S. Natural language processing to evaluate texting conversations between patients and healthcare providers during COVID-19 Home-Based Care in Rwanda at scale. PLOS DIGITAL HEALTH 2025; 4:e0000625. [PMID: 39813181 PMCID: PMC11734906 DOI: 10.1371/journal.pdig.0000625] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/27/2024] [Accepted: 11/19/2024] [Indexed: 01/18/2025]
Abstract
Community isolation of patients with communicable infectious diseases limits the spread of pathogens, but our understanding of isolated patients' needs and challenges is incomplete. Rwanda deployed a digital health service nationally to assist public health clinicians to remotely monitor and support SARS-CoV-2 cases via their mobile phones using daily interactive short message service (SMS) check-ins. We aimed to assess the texting patterns and communicated topics to better understand patient experiences. We extracted data on all COVID-19 cases and exposed contacts who were enrolled in the WelTel text messaging program between March 18, 2020, and March 31, 2022, and linked demographic and clinical data from the national COVID-19 registry. A sample of the text conversation corpus was translated into English and labeled with topics of interest defined by medical experts. Multiple natural language processing (NLP) topic classification models were trained and compared using F1 scores. The best-performing models were applied to classify unlabeled conversations. A total of 33,081 isolated patients (mean age 33.9, range 0-100; 44% female), including 30,398 cases and 2,683 contacts, were registered in WelTel. Registered patients generated 12,119 interactive text conversations in Kinyarwanda (n = 8,183, 67%), English (n = 3,069, 25%) and other languages. Sufficiently trained large language models (LLMs) were unavailable for Kinyarwanda. Traditional machine learning (ML) models outperformed fine-tuned transformer architecture language models on the native untranslated language corpus; however, the reverse was observed for models trained on English-only data. The most frequently identified topics discussed included symptoms (69%), diagnostics (38%), social issues (19%), prevention (18%), healthcare logistics (16%), and treatment (8.5%). Education, advice, and triage on these topics were provided to patients. Interactive text messaging can be used to remotely support isolated patients in pandemics at scale. NLP can help evaluate the medical and social factors that affect isolated patients, which could ultimately inform precision public health responses to future pandemics.
Affiliation(s)
- Richard T. Lester
  - Division of Infectious Diseases, Department of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Matthew Manson
  - Division of Infectious Diseases, Department of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Muhammed Semakula
  - Rwanda Ministry of Health, Kigali, Rwanda
  - Rwanda Biomedical Centre, Kigali, Rwanda
- Hyeju Jang
  - Luddy School of Informatics, Computing, and Engineering, Department of Computer Science, Indiana University Indianapolis, Indianapolis, Indiana, United States
  - Department of Computer Science, Faculty of Science, University of British Columbia, Vancouver, British Columbia, Canada
- Ali Magzari
  - Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, British Columbia, Canada
- Junhong Ma Blackmer
  - Department of Mathematics, University of British Columbia, Vancouver, British Columbia, Canada
- Fanan Fattah
  - Division of Infectious Diseases, Department of Medicine, University of British Columbia, Vancouver, British Columbia, Canada
- Charles Ruranga
  - African Center of Excellence in Data Science, University of Rwanda, Kigali, Rwanda
- Jean Claude S. Ngabonziza
  - Rwanda Biomedical Centre, Kigali, Rwanda
  - Department of Clinical Biology, University of Rwanda, Kigali, Rwanda
- Giuseppe Carenini
  - Department of Computer Science, Faculty of Science, University of British Columbia, Vancouver, British Columbia, Canada
4
Bhattarai K, Oh IY, Sierra JM, Tang J, Payne PRO, Abrams Z, Lai AM. Leveraging GPT-4 for identifying cancer phenotypes in electronic health records: a performance comparison between GPT-4, GPT-3.5-turbo, Flan-T5, Llama-3-8B, and spaCy's rule-based and machine learning-based methods. JAMIA Open 2024; 7:ooae060. [PMID: 38962662 PMCID: PMC11221943 DOI: 10.1093/jamiaopen/ooae060] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/05/2024] [Revised: 06/12/2024] [Accepted: 06/18/2024] [Indexed: 07/05/2024]
Abstract
Objective: Accurately identifying clinical phenotypes from Electronic Health Records (EHRs) provides additional insights into patients' health, especially when such information is unavailable in structured data. This study evaluates the application of OpenAI's Generative Pre-trained Transformer (GPT)-4 model to identify clinical phenotypes from EHR text in non-small cell lung cancer (NSCLC) patients. The goal was to identify disease stages, treatments and progression utilizing GPT-4, and compare its performance against GPT-3.5-turbo, Flan-T5-xl, Flan-T5-xxl, Llama-3-8B, and 2 rule-based and machine learning-based methods, namely, scispaCy and medspaCy. Materials and Methods: Phenotypes such as initial cancer stage, initial treatment, evidence of cancer recurrence, and affected organs during recurrence were identified from 13 646 clinical notes for 63 NSCLC patients from Washington University in St. Louis, Missouri. The performance of the GPT-4 model is evaluated against GPT-3.5-turbo, Flan-T5-xxl, Flan-T5-xl, Llama-3-8B, medspaCy, and scispaCy by comparing precision, recall, and micro-F1 scores. Results: GPT-4 achieved higher F1 score, precision, and recall compared to the Flan-T5-xl, Flan-T5-xxl, Llama-3-8B, medspaCy, and scispaCy models. GPT-3.5-turbo performed similarly to GPT-4. GPT, Flan-T5, and Llama models were not constrained by explicit rule requirements for contextual pattern recognition. spaCy models relied on predefined patterns, leading to their suboptimal performance. Discussion and Conclusion: GPT-4 improves clinical phenotype identification due to its robust pre-training and remarkable pattern recognition capability on the embedded tokens. It demonstrates data-driven effectiveness even with limited context in the input. While rule-based models remain useful for some tasks, GPT models offer improved contextual understanding of the text, and robust clinical phenotype extraction.
Affiliation(s)
- Kriti Bhattarai
  - Institute for Informatics, Data Science & Biostatistics, Washington University School of Medicine, St. Louis, MO 63110, United States
  - Department of Computer Science, Washington University in St Louis, St. Louis, MO 63110, United States
- Inez Y Oh
  - Institute for Informatics, Data Science & Biostatistics, Washington University School of Medicine, St. Louis, MO 63110, United States
- Jonathan Moran Sierra
  - Medical Scientist Training Program, Washington University School of Medicine, St. Louis, MO 63110, United States
- Jonathan Tang
  - Department of Internal Medicine, Washington University School of Medicine, St. Louis, MO 63110, United States
- Philip R O Payne
  - Institute for Informatics, Data Science & Biostatistics, Washington University School of Medicine, St. Louis, MO 63110, United States
  - Department of Computer Science, Washington University in St Louis, St. Louis, MO 63110, United States
- Zach Abrams
  - Institute for Informatics, Data Science & Biostatistics, Washington University School of Medicine, St. Louis, MO 63110, United States
- Albert M Lai
  - Institute for Informatics, Data Science & Biostatistics, Washington University School of Medicine, St. Louis, MO 63110, United States
  - Department of Computer Science, Washington University in St Louis, St. Louis, MO 63110, United States
5
Ren Y, Wu Y, Fan JW, Khurana A, Fu S, Wu D, Liu H, Huang M. Automatic uncovering of patient primary concerns in portal messages using a fusion framework of pretrained language models. J Am Med Inform Assoc 2024; 31:1714-1724. [PMID: 38934289 PMCID: PMC11258404 DOI: 10.1093/jamia/ocae144] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/28/2023] [Revised: 05/24/2024] [Accepted: 06/03/2024] [Indexed: 06/28/2024]
Abstract
OBJECTIVES: The surge in patient portal messages (PPMs), with increasing needs and workloads for efficient PPM triage in healthcare settings, has spurred the exploration of AI-driven solutions to streamline healthcare workflow processes and ensure timely responses that satisfy patients' healthcare needs. However, there has been less focus on isolating and understanding patient primary concerns in PPMs, a practice which holds the potential to yield more nuanced insights and enhance the quality of healthcare delivery and patient-centered care. MATERIALS AND METHODS: We propose a fusion framework to leverage pretrained language models (LMs) with different language advantages via a Convolution Neural Network for precise identification of patient primary concerns via multi-class classification. We examined 3 traditional machine learning models, 9 BERT-based language models, 6 fusion models, and 2 ensemble models. RESULTS: The outcomes of our experimentation underscore the superior performance achieved by BERT-based models in comparison to traditional machine learning models. Remarkably, our fusion model emerges as the top-performing solution, delivering a notably improved accuracy score of 77.67 ± 2.74% and an F1 score of 74.37 ± 3.70% in macro-average. DISCUSSION: This study highlights the feasibility and effectiveness of multi-class classification for patient primary concern detection and the proposed fusion framework for enhancing primary concern detection. CONCLUSIONS: The use of multi-class classification enhanced by a fusion of multiple pretrained LMs not only improves the accuracy and efficiency of patient primary concern identification in PPMs but also aids in managing the rising volume of PPMs in healthcare, ensuring critical patient communications are addressed promptly and accurately.
Affiliation(s)
- Yang Ren
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, SC 29208, United States
- Yuqi Wu
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN 55905, United States
- Jungwei W Fan
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN 55905, United States
- Aditya Khurana
  - Department of Radiation Oncology, Mayo Clinic, Rochester, MN 55905, United States
- Sunyang Fu
  - Department of Health Data Science and Artificial Intelligence, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Dezhi Wu
  - Department of Integrated Information Technology, University of South Carolina, Columbia, SC 29208, United States
- Hongfang Liu
  - Department of Health Data Science and Artificial Intelligence, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
- Ming Huang
  - Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN 55905, United States
  - Department of Health Data Science and Artificial Intelligence, University of Texas Health Science Center at Houston, Houston, TX 77030, United States
6
Bhattarai K, Oh IY, Sierra JM, Tang J, Payne PRO, Abrams ZB, Lai AM. Leveraging GPT-4 for Identifying Cancer Phenotypes in Electronic Health Records: A Performance Comparison between GPT-4, GPT-3.5-turbo, Flan-T5 and spaCy's Rule-based & Machine Learning-based methods. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2024:2023.09.27.559788. [PMID: 37808763 PMCID: PMC10557629 DOI: 10.1101/2023.09.27.559788] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/10/2023]
Abstract
Objective: Accurately identifying clinical phenotypes from Electronic Health Records (EHRs) provides additional insights into patients' health, especially when such information is unavailable in structured data. This study evaluates the application of OpenAI's Generative Pre-trained Transformer (GPT)-4 model to identify clinical phenotypes from EHR text in non-small cell lung cancer (NSCLC) patients. The goal was to identify disease stages, treatments and progression utilizing GPT-4, and compare its performance against GPT-3.5-turbo, Flan-T5-xl, Flan-T5-xxl, and two rule-based and machine learning-based methods, namely, scispaCy and medspaCy. Materials and Methods: Phenotypes such as initial cancer stage, initial treatment, evidence of cancer recurrence, and affected organs during recurrence were identified from 13,646 records for 63 NSCLC patients from Washington University in St. Louis, Missouri. The performance of the GPT-4 model is evaluated against GPT-3.5-turbo, Flan-T5-xxl, Flan-T5-xl, medspaCy and scispaCy by comparing precision, recall, and micro-F1 scores. Results: GPT-4 achieved higher F1 score, precision, and recall compared to the Flan-T5-xl, Flan-T5-xxl, medspaCy and scispaCy models. GPT-3.5-turbo performed similarly to GPT-4. GPT and Flan-T5 models were not constrained by explicit rule requirements for contextual pattern recognition. spaCy models relied on predefined patterns, leading to their suboptimal performance. Discussion and Conclusion: GPT-4 improves clinical phenotype identification due to its robust pre-training and remarkable pattern recognition capability on the embedded tokens. It demonstrates data-driven effectiveness even with limited context in the input. While rule-based models remain useful for some tasks, GPT models offer improved contextual understanding of the text, and robust clinical phenotype extraction.
7
Sim JA, Huang X, Horan MR, Stewart CM, Robison LL, Hudson MM, Baker JN, Huang IC. Natural language processing with machine learning methods to analyze unstructured patient-reported outcomes derived from electronic health records: A systematic review. Artif Intell Med 2023; 146:102701. [PMID: 38042599 PMCID: PMC10693655 DOI: 10.1016/j.artmed.2023.102701] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 04/06/2023] [Revised: 09/30/2023] [Accepted: 10/29/2023] [Indexed: 12/04/2023]
Abstract
OBJECTIVE Natural language processing (NLP) combined with machine learning (ML) techniques are increasingly used to process unstructured/free-text patient-reported outcome (PRO) data available in electronic health records (EHRs). This systematic review summarizes the literature reporting NLP/ML systems/toolkits for analyzing PROs in clinical narratives of EHRs and discusses the future directions for the application of this modality in clinical care. METHODS We searched PubMed, Scopus, and Web of Science for studies written in English between 1/1/2000 and 12/31/2020. Seventy-nine studies meeting the eligibility criteria were included. We abstracted and summarized information related to the study purpose, patient population, type/source/amount of unstructured PRO data, linguistic features, and NLP systems/toolkits for processing unstructured PROs in EHRs. RESULTS Most of the studies used NLP/ML techniques to extract PROs from clinical narratives (n = 74) and mapped the extracted PROs into specific PRO domains for phenotyping or clustering purposes (n = 26). Some studies used NLP/ML to process PROs for predicting disease progression or onset of adverse events (n = 22) or developing/validating NLP/ML pipelines for analyzing unstructured PROs (n = 19). Studies used different linguistic features, including lexical, syntactic, semantic, and contextual features, to process unstructured PROs. Among the 25 NLP systems/toolkits we identified, 15 used rule-based NLP, 6 used hybrid NLP, and 4 used non-neural ML algorithms embedded in NLP. CONCLUSIONS This study supports the potential utility of different NLP/ML techniques in processing unstructured PROs available in EHRs for clinical care. Though using annotation rules for NLP/ML to analyze unstructured PROs is dominant, deploying novel neural ML-based methods is warranted.
Affiliation(s)
- Jin-Ah Sim
  - Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States; School of AI Convergence, Hallym University, Chuncheon, Republic of Korea
- Xiaolei Huang
  - Department of Computer Science, University of Memphis, Memphis, TN, United States
- Madeline R Horan
  - Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States
- Christopher M Stewart
  - Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States
- Leslie L Robison
  - Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States
- Melissa M Hudson
  - Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States; Department of Oncology, St. Jude Children's Research Hospital, Memphis, TN, United States
- Justin N Baker
  - Department of Pediatrics, Stanford University, Stanford, CA, United States
- I-Chan Huang
  - Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States
8
Gleason KT, Powell DS, Wec A, Zou X, Gamper MJ, Peereboom D, Wolff JL. Patient portal interventions: a scoping review of functionality, automation used, and therapeutic elements of patient portal interventions. JAMIA Open 2023; 6:ooad077. [PMID: 37663406 PMCID: PMC10469545 DOI: 10.1093/jamiaopen/ooad077] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2023] [Revised: 08/03/2023] [Accepted: 08/14/2023] [Indexed: 09/05/2023]
Abstract
Objectives: We sought to understand the objectives, targeted populations, therapeutic elements, and delivery characteristics of patient portal interventions. Materials and Methods: Following Arksey and O'Malley's methodological framework, we conducted a scoping review of manuscripts published through June 2022 by hand searching and systematically searching PubMed, PSYCHInfo, Embase, and Web of Science. The search yielded 5403 manuscripts; 248 were selected for full-text review; 81 met the eligibility criteria for examining outcomes of a patient portal intervention. Results: The 81 articles described: trials involving comparison groups (n = 37; 45.7%), quality improvement initiatives (n = 15; 18.5%), pilot studies (n = 7; 8.6%), and single-arm studies (n = 22; 27.2%). Studies were conducted in primary care (n = 33, 40.7%), specialty outpatient (n = 24, 29.6%), or inpatient settings (n = 4, 4.9%), or they were deployed system wide (n = 9, 11.1%). Interventions targeted specific health conditions (n = 35, 43.2%), promoted preventive services (n = 19, 23.5%), or addressed communication (n = 19, 23.4%); few specifically sought to improve the patient experience (n = 3, 3.7%). About half of the studies (n = 40, 49.4%) relied on human involvement, and about half involved personalized (vs exclusively standardized) elements (n = 42, 51.8%). Interventions commonly collected patient-reported information (n = 36, 44.4%), provided education (n = 35, 43.2%), or deployed preventive service reminders (n = 14, 17.3%). Discussion: This scoping review finds that most patient portal interventions have delivered education or facilitated collection of patient-reported information. Few interventions have involved pragmatic designs or been deployed system wide. Conclusion: The patient portal is an important tool in real-world efforts to more effectively support patients, but interventions to date rely largely on evidence from consented participants rather than pragmatically implemented systems-level initiatives.
Affiliation(s)
- Kelly T Gleason
  - Johns Hopkins University School of Nursing, Baltimore, MD 21225, United States
- Danielle S Powell
  - Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, United States
- Aleksandra Wec
  - Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, United States
- Xingyuan Zou
  - Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, United States
- Mary Jo Gamper
  - Johns Hopkins University School of Nursing, Baltimore, MD 21225, United States
- Danielle Peereboom
  - Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, United States
- Jennifer L Wolff
  - Department of Health Policy and Management, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, United States
9
Mermin-Bunnell K, Zhu Y, Hornback A, Damhorst G, Walker T, Robichaux C, Mathew L, Jaquemet N, Peters K, Johnson TM, Wang MD, Anderson B. Use of Natural Language Processing of Patient-Initiated Electronic Health Record Messages to Identify Patients With COVID-19 Infection. JAMA Netw Open 2023; 6:e2322299. [PMID: 37418261 PMCID: PMC10329205 DOI: 10.1001/jamanetworkopen.2023.22299] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 01/06/2023] [Accepted: 05/19/2023] [Indexed: 07/08/2023]
Abstract
Importance: Natural language processing (NLP) has the potential to enable faster treatment access by reducing clinician response time and improving electronic health record (EHR) efficiency. Objective: To develop an NLP model that can accurately classify patient-initiated EHR messages and triage COVID-19 cases to reduce clinician response time and improve access to antiviral treatment. Design, Setting, and Participants: This retrospective cohort study assessed development of a novel NLP framework to classify patient-initiated EHR messages and subsequently evaluate the model's accuracy. Included patients sent messages via the EHR patient portal from 5 Atlanta, Georgia, hospitals between March 30 and September 1, 2022. Assessment of the model's accuracy consisted of manual review of message contents to confirm the classification label by a team of physicians, nurses, and medical students, followed by retrospective propensity score-matched clinical outcomes analysis. Exposure: Prescription of antiviral treatment for COVID-19. Main Outcomes and Measures: The 2 primary outcomes were (1) physician-validated evaluation of the NLP model's message classification accuracy and (2) analysis of the model's potential clinical effect via increased patient access to treatment. The model classified messages into COVID-19-other (pertaining to COVID-19 but not reporting a positive test), COVID-19-positive (reporting a positive at-home COVID-19 test result), and non-COVID-19 (not pertaining to COVID-19). Results: Among 10 172 patients whose messages were included in analyses, the mean (SD) age was 58 (17) years; 6509 patients (64.0%) were women and 3663 (36.0%) were men. In terms of race and ethnicity, 2544 patients (25.0%) were African American or Black, 20 (0.2%) were American Indian or Alaska Native, 1508 (14.8%) were Asian, 28 (0.3%) were Native Hawaiian or other Pacific Islander, 5980 (58.8%) were White, 91 (0.9%) were more than 1 race or ethnicity, and 1 (0.01%) chose not to answer. The NLP model had high accuracy and sensitivity, with a macro F1 score of 94% and sensitivity of 85% for COVID-19-other, 96% for COVID-19-positive, and 100% for non-COVID-19 messages. Among the 3048 patient-generated messages reporting positive SARS-CoV-2 test results, 2982 (97.8%) were not documented in structured EHR data. Mean (SD) message response time for COVID-19-positive patients who received treatment (364.10 [784.47] minutes) was faster than for those who did not (490.38 [1132.14] minutes; P = .03). Likelihood of antiviral prescription was inversely correlated with message response time (odds ratio, 0.99 [95% CI, 0.98-1.00]; P = .003). Conclusions and Relevance: In this cohort study of 2982 COVID-19-positive patients, a novel NLP model classified patient-initiated EHR messages reporting positive COVID-19 test results with high sensitivity. Furthermore, when responses to patient messages occurred faster, patients were more likely to receive antiviral medical prescription within the 5-day treatment window. Although additional analysis on the effect on clinical outcomes is needed, these findings represent a possible use case for integration of NLP algorithms into clinical care.
Affiliation(s)
- Kellen Mermin-Bunnell
  - Currently a medical student at Emory University School of Medicine, Atlanta, Georgia
- Yuanda Zhu
  - School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta
- Andrew Hornback
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta
- Gregory Damhorst
  - Division of Infectious Diseases, Emory University School of Medicine, Atlanta, Georgia
- Tiffany Walker
  - Division of General Internal Medicine, Emory University School of Medicine, Atlanta, Georgia
- Chad Robichaux
  - Department of Biomedical Informatics, Emory University School of Medicine, Atlanta, Georgia
- Lejy Mathew
  - Division of General Internal Medicine, Emory University School of Medicine, Atlanta, Georgia
- Nour Jaquemet
  - Currently a medical student at Emory University School of Medicine, Atlanta, Georgia
- Theodore M. Johnson
  - Division of General Internal Medicine, Emory University School of Medicine, Atlanta, Georgia
  - Atlanta Veterans Affairs Healthcare System, Decatur, Georgia
- May Dongmei Wang
  - School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta
  - School of Computational Science and Engineering, Georgia Institute of Technology, Atlanta
  - Department of Biomedical Engineering, Georgia Institute of Technology and Emory University, Atlanta, Georgia
- Blake Anderson
  - Division of General Internal Medicine, Emory University School of Medicine, Atlanta, Georgia
  - Atlanta Veterans Affairs Healthcare System, Decatur, Georgia
10
Armstrong M, Benda NC, Seier K, Rogers C, Ancker JS, Stetson PD, Peng Y, Diamond LC. Improving Cancer Care Communication: Identifying Sociodemographic Differences in Patient Portal Secure Messages Not Authored by the Patient. Appl Clin Inform 2023; 14:296-299. [PMID: 36657471 PMCID: PMC10115514 DOI: 10.1055/a-2015-8679] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/30/2022] [Accepted: 01/16/2023] [Indexed: 01/21/2023]
Affiliation(s)
- Misha Armstrong
  - Department of Surgery, New York Presbyterian-Weill Cornell Medicine, New York, New York
- Natalie C. Benda
  - Department of Population Health Science, Weill Cornell Medicine, New York, New York
- Kenneth Seier
  - Department of Epidemiology- Biostatistics, Memorial Sloan Kettering Cancer Center, New York, New York
- Christopher Rogers
  - Department of Health Informatics, Memorial Sloan Kettering Cancer Center, New York, New York
- Jessica S. Ancker
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee
- Peter D. Stetson
  - Department of Health Informatics, Memorial Sloan Kettering Cancer Center, New York, New York
- Yifan Peng
  - Department of Population Health Science, Weill Cornell Medicine, New York, New York
- Lisa C. Diamond
  - Department of Population Health Science, Weill Cornell Medicine, New York, New York
  - Department of Psychiatry and Behavioral Sciences, Immigrant Health and Cancer Disparities Service, Memorial Sloan Kettering Cancer Center, New York, New York
  - Department of Medicine, Hospital Medicine Service, Memorial Sloan Kettering Cancer Center, New York, New York
11
|
Ilicki J. Challenges in evaluating the accuracy of AI-containing digital triage systems: A systematic review. PLoS One 2022; 17:e0279636. [PMID: 36574438 PMCID: PMC9794085 DOI: 10.1371/journal.pone.0279636] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 05/27/2022] [Accepted: 12/12/2022] [Indexed: 12/28/2022]
Abstract
INTRODUCTION Patient-operated digital triage systems with AI components are becoming increasingly common. However, previous reviews have found a limited amount of research on such systems' accuracy. This systematic review of the literature aimed to identify the main challenges in determining the accuracy of patient-operated digital AI-based triage systems. METHODS A systematic review was designed and conducted in accordance with PRISMA guidelines in October 2021 using PubMed, Scopus and Web of Science. Articles were included if they assessed the accuracy of a patient-operated digital triage system that had an AI-component and could triage a general primary care population. Limitations and other pertinent data were extracted, synthesized and analysed. Risk of bias was not analysed as this review studied the included articles' limitations (rather than results). Results were synthesized qualitatively using a thematic analysis. RESULTS The search generated 76 articles and following exclusion 8 articles (6 primary articles and 2 reviews) were included in the analysis. Articles' limitations were synthesized into three groups: epistemological, ontological and methodological limitations. Limitations varied with regards to intractability and the level to which they can be addressed through methodological choices. Certain methodological limitations related to testing triage systems using vignettes can be addressed through methodological adjustments, whereas epistemological and ontological limitations require that readers of such studies appraise the studies with limitations in mind. DISCUSSION The reviewed literature highlights recurring limitations and challenges in studying the accuracy of patient-operated digital triage systems with AI components. Some of these challenges can be addressed through methodology whereas others are intrinsic to the area of inquiry and involve unavoidable trade-offs. 
Future studies should take these limitations into consideration in order to better address the current knowledge gaps in the literature.
|
12
|
Benda NC, Rogers C, Sharma M, Narain W, Diamond LC, Ancker J, Seier K, Stetson PD, Sulieman L, Armstrong M, Peng Y. Identifying Nonpatient Authors of Patient Portal Secure Messages in Oncology: A Proof-of-Concept Demonstration of Natural Language Processing Methods. JCO Clin Cancer Inform 2022; 6:e2200071. [PMID: 36542818 PMCID: PMC10476725 DOI: 10.1200/cci.22.00071] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/11/2022] [Revised: 10/03/2022] [Accepted: 10/26/2022] [Indexed: 12/24/2022]
Abstract
PURPOSE Patient portal secure messages are not always authored by the patient account holder. Understanding who authored the message is particularly important in an oncology setting where symptom reporting is crucial to patient treatment. Natural language processing has the potential to detect messages not authored by the patient automatically. METHODS Patient portal secure messages from the Memorial Sloan Kettering Cancer Center were retrieved and manually annotated as a predicted unregistered proxy (ie, not written by the patient) or a presumed patient. After randomly splitting the annotated messages into training and test sets in a 70:30 ratio, a bag-of-words approach was used to extract features and then a Least Absolute Shrinkage and Selection Operator (LASSO) model was trained and used for classification. RESULTS Portal secure messages (n = 2,000) were randomly selected from unique patient accounts and manually annotated. We excluded 335 messages from the data set as the annotators could not determine if they were written by a patient or proxy. Using the remaining 1,665 messages, a LASSO model was developed that achieved an area under the curve of 0.932 and an area under the precision recall curve of 0.748. The sensitivity and specificity related to classifying true-positive cases (predicted unregistered proxy-authored messages) and true negatives (presumed patient-authored messages) were 0.681 and 0.960, respectively. CONCLUSION Our work demonstrates the feasibility of using unstructured, heterogenous patient portal secure messages to determine portal secure message authorship. Identifying patient authorship in real time can improve patient portal account security and can be used to improve the quality of the information extracted from the patient portal, such as patient-reported outcomes.
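The sensitivity and specificity quoted in this abstract follow the standard confusion-matrix definitions. The sketch below is only an illustration of those definitions, not the authors' code: the function name `sensitivity_specificity` and the toy labels are hypothetical, with 1 standing for a predicted unregistered proxy and 0 for a presumed patient.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # proxy message correctly flagged
            else:
                fn += 1  # proxy message missed
        else:
            if pred == positive:
                fp += 1  # patient message wrongly flagged
            else:
                tn += 1  # patient message correctly passed
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy data invented for illustration (not from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

A high specificity paired with a lower sensitivity, as reported above, means few patient-authored messages are flagged by mistake while a larger share of proxy-authored messages go undetected.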
Affiliation(s)
- Natalie C. Benda
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY

- Christopher Rogers
- Department of Health Informatics, Memorial Sloan Kettering Cancer Center, New York, NY

- Mohit Sharma
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY

- Wazim Narain
- Department of Health Informatics, Memorial Sloan Kettering Cancer Center, New York, NY

- Lisa C. Diamond
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY
- Immigrant Health and Cancer Disparities Service, Department of Psychiatry and Behavioral Sciences, Memorial Sloan Kettering Cancer Center, New York, NY
- Hospital Medicine Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY

- Jessica Ancker
- Hospital Medicine Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN

- Kenneth Seier
- Department of Epidemiology- Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY

- Peter D. Stetson
- Department of Health Informatics, Memorial Sloan Kettering Cancer Center, New York, NY

- Lina Sulieman
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN

- Misha Armstrong
- Department of Surgery, New York Presbyterian-Weill Cornell Medicine, New York, NY

- Yifan Peng
- Department of Population Health Sciences, Weill Cornell Medicine, New York, NY
|
13
|
Watanabe T, Yada S, Aramaki E, Yajima H, Kizaki H, Hori S. Extracting Multiple Worries from Breast Cancer Patient Blogs Using Multi-Label Classification with a Natural Language-Processing Model BERT (Bidirectional Encoder Representations from Transformers): Infodemiology Study of Blogs (Preprint). JMIR Cancer 2022; 8:e37840. [PMID: 35657664 PMCID: PMC9206207 DOI: 10.2196/37840] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 03/11/2022] [Revised: 05/10/2022] [Accepted: 05/23/2022] [Indexed: 12/26/2022]
Abstract
Background Patients with breast cancer have a variety of worries and need multifaceted information support. Their accumulated posts on social media contain rich descriptions of their daily worries concerning issues such as treatment, family, and finances. It is important to identify these issues to help patients with breast cancer to resolve their worries and obtain reliable information. Objective This study aimed to extract and classify multiple worries from text generated by patients with breast cancer using Bidirectional Encoder Representations From Transformers (BERT), a context-aware natural language processing model. Methods A total of 2272 blog posts by patients with breast cancer in Japan were collected. Five worry labels, “treatment,” “physical,” “psychological,” “work/financial,” and “family/friends,” were defined and assigned to each post. Multiple labels were allowed. To assess the label criteria, 50 blog posts were randomly selected and annotated by two researchers with medical knowledge. After the interannotator agreement had been assessed by means of Cohen kappa, one researcher annotated all the blogs. A multilabel classifier that simultaneously predicts five worries in a text was developed using BERT. This classifier was fine-tuned by using the posts as input and adding a classification layer to the pretrained BERT. The performance was evaluated for precision using the average of 5-fold cross-validation results. Results Among the blog posts, 477 included “treatment,” 1138 included “physical,” 673 included “psychological,” 312 included “work/financial,” and 283 included “family/friends.” The interannotator agreement values were 0.67 for “treatment,” 0.76 for “physical,” 0.56 for “psychological,” 0.73 for “work/financial,” and 0.73 for “family/friends,” indicating a high degree of agreement. Among all blog posts, 544 contained no label, 892 contained one label, and 836 contained multiple labels. 
It was found that the worries varied from user to user, and the worries posted by the same user changed over time. The model performed well, though prediction performance differed for each label. The values of precision were 0.59 for “treatment,” 0.82 for “physical,” 0.64 for “psychological,” 0.67 for “work/financial,” and 0.58 for “family/friends.” The higher the interannotator agreement and the greater the number of posts, the higher the precision tended to be. Conclusions This study showed that the BERT model can extract multiple worries from text generated from patients with breast cancer. This is the first application of a multilabel classifier using the BERT model to extract multiple worries from patient-generated text. The results will be helpful to identify breast cancer patients’ worries and give them timely social support.
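The interannotator agreement values in this abstract are Cohen kappa scores, which correct raw agreement for the agreement expected by chance. A minimal sketch of that calculation is given here, assuming two annotators who each mark whether a single worry label applies to a post; the function name `cohen_kappa` and the toy annotations are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: (p_observed - p_expected) / (1 - p_expected)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy annotations for one label on ten posts (invented for illustration):
# 1 = label applies, 0 = label does not apply.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
annotator_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa={kappa:.2f}")
```

In a multilabel setting such as this one, the calculation would be repeated once per label, which is consistent with the abstract reporting a separate value for each of the five worries.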
Affiliation(s)
- Tomomi Watanabe
- Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan

- Shuntaro Yada
- Nara Institute of Science and Technology, Nara, Japan

- Eiji Aramaki
- Nara Institute of Science and Technology, Nara, Japan

- Hayato Kizaki
- Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan

- Satoko Hori
- Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
|
14
|
Abstract
OBJECTIVES We aim to extract a subset of social factors from clinical notes using common text classification methods. DESIGN Retrospective chart review. SETTING We collaborated with a local level I trauma hospital located in an underserved area that has a housing unstable patient population of about 6.5% and extracted text notes related to various social determinants for acute care patients. PARTICIPANTS Notes were retrospectively extracted from 43 798 acute care patients. METHODS We solely use open source Python packages to test simple text classification methods that can potentially be easily generalisable and implemented. We extracted social history text from various sources, such as admission and emergency department notes, over a 5-year timeframe and performed manual chart reviews to ensure data quality. We manually labelled the sentiment of the notes, treating each text entry independently. Four different models with two different feature selection methods (bag of words and bigrams) were used to classify and predict housing stability, tobacco use and alcohol use status for the extracted clinical text. RESULTS From our analysis, we found overall positive results and metrics in applying open-source classification techniques; the accuracy scores were 91.2%, 84.7%, 82.8% for housing stability, tobacco use and alcohol use, respectively. There were many limitations in our analysis including social factors not present due to patient condition, multiple copy-forward entries and shorthand. Additionally, it was difficult to translate usage degrees for tobacco and alcohol use. However, when compared with structured data sources, our classification approach on unstructured notes yielded more results for housing and alcohol use; tobacco use proved less fruitful for unstructured notes.
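The two feature representations named in this abstract, bag of words and bigrams, can be illustrated with a short standard-library sketch. This is not the authors' pipeline: the tokenizer, the helper `ngram_counts`, and the example note are all assumptions made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngram_counts(text, n=1):
    """Count n-grams: n=1 gives bag-of-words counts, n=2 gives bigram counts."""
    tokens = tokenize(text)
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


# Invented social-history snippet (not real patient data).
note = "Patient denies tobacco use. Patient reports alcohol use."
unigrams = ngram_counts(note, 1)
bigrams = ngram_counts(note, 2)
print(unigrams.most_common(2))
print(bigrams["denies tobacco"], bigrams["reports alcohol"])
```

Bigrams retain some local context that single words lose: here "denies tobacco" and "reports alcohol" separate the two statuses, whereas the unigram "use" appears in both sentences. Note that this simple tokenizer ignores sentence boundaries, so a bigram such as "use patient" spans two sentences.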
Affiliation(s)
- Andrew Teng
- Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA

- Adam Wilcox
- Biomedical Informatics and Medical Education, University of Washington, Seattle, Washington, USA
|
15
|
Roy M, Purington N, Liu M, Blayney DW, Kurian AW, Schapira L. Limited English Proficiency and Disparities in Health Care Engagement Among Patients With Breast Cancer. JCO Oncol Pract 2021; 17:e1837-e1845. [PMID: 33844591 PMCID: PMC9810131 DOI: 10.1200/op.20.01093] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Indexed: 01/07/2023]
Abstract
PURPOSE Race and ethnicity have been shown to affect quality of cancer care, and patients with low English proficiency (LEP) have increased risk for serious adverse events. We sought to assess the impact of primary language on health care engagement as indicated by clinical trial screening and engagement, use of genetic counseling, and communication via an electronic patient portal. METHODS Clinical and demographic data on patients with breast cancer diagnosed and treated from 2013 to 2018 within the Stanford University Health Care system were compiled via linkage of electronic health records, an internal clinical trial database, and the California Cancer Registry. Logistic and linear regression models were used to evaluate for association of clinical trial engagement and patient portal message rates with primary language group. RESULTS Patients with LEP had significantly lower rates of clinical trial engagement compared with their English-speaking counterparts (adjusted odds ratio [OR], 0.29; 95% CI, 0.16 to 0.51). Use of genetic counseling was similar between language groups. Rates of patient portal messaging did not differ between English-speaking and LEP groups on multivariable analysis; however, patients with LEP were less likely to have a portal account (adjusted OR, 0.89; 95% CI, 0.83 to 0.96). Among LEP subgroups, Spanish speakers were significantly less likely to engage with the patient portal compared with English speakers (estimated difference in monthly rate: OR, 0.43; 95% CI, 0.24 to 0.77). CONCLUSION We found that patients with LEP had lower rates of clinical trial engagement and odds of electronic patient portal enrollment. Interventions designed to overcome language and cultural barriers are essential to optimize the experience of patients with LEP.
Affiliation(s)
- Mohana Roy
- Stanford University School of Medicine and Stanford Cancer Institute, Stanford, CA
- Correspondence: Mohana Roy, MD, Division of Hematology and Oncology, Stanford University School of Medicine, 875 Blake Wilbur Rd, Stanford, CA 94305

- Natasha Purington
- Quantitative Sciences Unit, Stanford University School of Medicine, Stanford, CA

- Mina Liu
- Research Informatics Center, Stanford University School of Medicine, Stanford, CA

- Douglas W. Blayney
- Stanford University School of Medicine and Stanford Cancer Institute, Stanford, CA

- Allison W. Kurian
- Stanford University School of Medicine and Stanford Cancer Institute, Stanford, CA
- Departments of Medicine and of Epidemiology and Population Health, Stanford University, Stanford, CA

- Lidia Schapira
- Stanford University School of Medicine and Stanford Cancer Institute, Stanford, CA
|
16
|
Lenivtceva ID, Kopanitsa G. The Pipeline for Standardizing Russian Unstructured Allergy Anamnesis Using FHIR AllergyIntolerance Resource. Methods Inf Med 2021; 60:95-103. [PMID: 34425626 DOI: 10.1055/s-0041-1733945] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022]
Abstract
BACKGROUND The larger part of essential medical knowledge is stored as free text, which is complicated to process. Standardization of medical narratives is an important task for data exchange, integration, and semantic interoperability. OBJECTIVES The article aims to develop an end-to-end pipeline for structuring Russian free-text allergy anamnesis using international standards. METHODS The pipeline for free-text data standardization is based on FHIR (Fast Healthcare Interoperability Resources) and SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) to ensure semantic interoperability. The pipeline solves common tasks such as data preprocessing, classification, categorization, entity extraction, and semantic code assignment. Machine learning methods, rule-based, and dictionary-based approaches were used to compose the pipeline. The pipeline was evaluated on 166 randomly chosen medical records. RESULTS The AllergyIntolerance resource was used to represent allergy anamnesis. The module for data preprocessing included a dictionary with over 90,000 words, including specific medication terms, and more than 20 regular expressions for error correction; the classification and categorization modules resulted in four dictionaries with allergy terms (total 2,675 terms), which were mapped to SNOMED CT concepts. F-scores for the different steps are 0.945 for filtering, 0.90 to 0.96 for allergy categorization, and 0.90 and 0.93 for allergen and reaction extraction, respectively. The allergy terminology coverage is more than 95%. CONCLUSION The proposed pipeline is a step toward ensuring semantic interoperability of Russian free-text medical records and could be effective in standardization systems for further data exchange and integration.
Affiliation(s)
- Iuliia D Lenivtceva
- National Center for Cognitive Research, ITMO University, Saint-Petersburg, Russia

- Georgy Kopanitsa
- National Center for Cognitive Research, ITMO University, Saint-Petersburg, Russia
|
17
|
De A, Huang M, Feng T, Yue X, Yao L. Analyzing Patient Secure Messages Using a Fast Health Care Interoperability Resources (FIHR)-Based Data Model: Development and Topic Modeling Study. J Med Internet Res 2021; 23:e26770. [PMID: 34328444 PMCID: PMC8367168 DOI: 10.2196/26770] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/06/2021] [Revised: 05/10/2021] [Accepted: 05/24/2021] [Indexed: 01/26/2023]
Abstract
Background Patient portals tethered to electronic health records systems have become attractive web platforms since the enacting of the Medicare Access and Children’s Health Insurance Program Reauthorization Act and the introduction of the Meaningful Use program in the United States. Patients can conveniently access their health records and seek consultation from providers through secure web portals. With increasing adoption and patient engagement, the volume of patient secure messages has risen substantially, which opens up new research and development opportunities for patient-centered care. Objective This study aims to develop a data model for patient secure messages based on the Fast Healthcare Interoperability Resources (FHIR) standard to identify and extract significant information. Methods We initiated the first draft of the data model by analyzing FHIR and manually reviewing 100 sentences randomly sampled from more than 2 million patient-generated secure messages obtained from the online patient portal at the Mayo Clinic Rochester between February 18, 2010, and December 31, 2017. We then annotated additional sets of 100 randomly selected sentences using the Multi-purpose Annotation Environment tool and updated the data model and annotation guideline iteratively until the interannotator agreement was satisfactory. We then created a larger corpus by annotating 1200 randomly selected sentences and calculated the frequency of the identified medical concepts in these sentences. Finally, we performed topic modeling analysis to learn the hidden topics of patient secure messages related to 3 highly mentioned microconcepts, namely, fatigue, prednisone, and patient visit, and to evaluate the proposed data model independently. Results The proposed data model has a 3-level hierarchical structure of health system concepts, including 3 macroconcepts, 28 mesoconcepts, and 85 microconcepts. 
Foundation and base macroconcepts comprise 33.99% (841/2474), clinical macroconcepts comprise 64.38% (1593/2474), and financial macroconcepts comprise 1.61% (40/2474) of the annotated corpus. The top 3 mesoconcepts among the 28 mesoconcepts are condition (505/2474, 20.41%), medication (424/2474, 17.13%), and practitioner (243/2474, 9.82%). Topic modeling identified hidden topics of patient secure messages related to fatigue, prednisone, and patient visit. A total of 89.2% (107/120) of the top-ranked topic keywords are actually the health concepts of the data model. Conclusions Our data model and annotated corpus enable us to identify and understand important medical concepts in patient secure messages and prepare us for further natural language processing analysis of such free texts. The data model could be potentially used to automatically identify other types of patient narratives, such as those in various social media and patient forums. In the future, we plan to develop a machine learning and natural language processing solution to enable automatic triaging solutions to reduce the workload of clinicians and perform more granular content analysis to understand patients’ needs and improve patient-centered care.
Affiliation(s)
- Amrita De
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States

- Ming Huang
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States

- Tinghao Feng
- Department of Computer Science, University of North Carolina at Charlotte, Charlotte, NC, United States

- Xiaomeng Yue
- Division of Pharmacy Practice and Administrative Sciences, James L. Winkle College of Pharmacy, University of Cincinnati, Cincinnati, OH, United States

- Lixia Yao
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
|
18
|
Steitz BD, Sulieman L, Warner JL, Fabbri D, Brown JT, Davis AL, Unertl KM. Classification and analysis of asynchronous communication content between care team members involved in breast cancer treatment. JAMIA Open 2021; 4:ooab049. [PMID: 34396056 PMCID: PMC8358477 DOI: 10.1093/jamiaopen/ooab049] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/11/2021] [Revised: 05/06/2021] [Accepted: 06/16/2021] [Indexed: 12/03/2022]
Abstract
OBJECTIVE A growing research literature has highlighted the work of managing and triaging clinical messages as a major contributor to professional exhaustion and burnout. The goal of this study was to discover and quantify the distribution of message content sent among care team members treating patients with breast cancer. MATERIALS AND METHODS We analyzed nearly two years of communication data from the electronic health record (EHR) between care team members at Vanderbilt University Medical Center. We applied natural language processing to perform sentence-level annotation into one of five information types: clinical, medical logistics, nonmedical logistics, social, and other. We combined sentence-level annotations for each respective message. We evaluated message content by team member role and clinic activity. RESULTS Our dataset included 81 857 messages containing 613 877 sentences. Across all roles, 63.4% and 21.8% of messages contained logistical information and clinical information, respectively. Individuals in administrative or clinical staff roles sent 81% of all messages containing logistical information. Of the messages sent by physicians, 33.2% contained clinical information, the most of any role. DISCUSSION AND CONCLUSION Our results demonstrate that EHR-based asynchronous communication is integral to coordinating care for patients with breast cancer. By understanding the content of messages sent by care team members, we can devise informatics initiatives to improve physicians' clerical burden and reduce unnecessary interruptions.
Affiliation(s)
- Bryan D Steitz
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA

- Lina Sulieman
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA

- Jeremy L Warner
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Division of Hematology/Oncology, Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA

- Daniel Fabbri
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA

- J Thomas Brown
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA

- Alyssa L Davis
- Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA

- Kim M Unertl
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
|
19
|
Understanding current states of machine learning approaches in medical informatics: a systematic literature review. HEALTH AND TECHNOLOGY 2021. [DOI: 10.1007/s12553-021-00538-6] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 12/25/2022]
|
20
|
Decker BM, Hill CE, Baldassano SN, Khankhanian P. Can antiepileptic efficacy and epilepsy variables be studied from electronic health records? A review of current approaches. Seizure 2021; 85:138-144. [PMID: 33461032 DOI: 10.1016/j.seizure.2020.11.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/30/2020] [Revised: 11/16/2020] [Accepted: 11/17/2020] [Indexed: 12/16/2022]
Abstract
As automated data extraction and natural language processing (NLP) are rapidly evolving, improving healthcare delivery by harnessing large data is garnering great interest. Assessing antiepileptic drug (AED) efficacy and other epilepsy variables pertinent to healthcare delivery remains a critical barrier to improving patient care. In this systematic review, we examined automatic electronic health record (EHR) extraction methodologies pertinent to epilepsy. We also reviewed more generalizable NLP pipelines to extract other critical patient variables. Our review found varying reports of performance measures. Whereas automated data extraction pipelines are a crucial advancement, this review calls attention to standardizing NLP methodology and accuracy reporting for greater generalizability. Moreover, the use of crowdsourcing competitions to spur innovative NLP pipelines would further advance this field.
Affiliation(s)
- Barbara M Decker
- Center for Neuroengineering and Therapeutics, Department of Neurology, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, United States

- Chloé E Hill
- Department of Neurology, University of Michigan, 1500 East Medical Center Drive, Ann Arbor, MI, 48109, United States

- Steven N Baldassano
- Center for Neuroengineering and Therapeutics, Department of Neurology, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, United States

- Pouya Khankhanian
- Center for Neuroengineering and Therapeutics, Department of Neurology, University of Pennsylvania, 3400 Spruce Street, Philadelphia, PA, 19104, United States
|
21
|
Coombs C, Hislop D, Taneva SK, Barnard S. The strategic impacts of Intelligent Automation for knowledge and service work: An interdisciplinary review. JOURNAL OF STRATEGIC INFORMATION SYSTEMS 2020. [DOI: 10.1016/j.jsis.2020.101600] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Indexed: 12/17/2022]
|
22
|
Borjali A, Magnéli M, Shin D, Malchau H, Muratoglu OK, Varadarajan KM. Natural language processing with deep learning for medical adverse event detection from free-text medical narratives: A case study of detecting total hip replacement dislocation. Comput Biol Med 2020; 129:104140. [PMID: 33278631 DOI: 10.1016/j.compbiomed.2020.104140] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 10/01/2020] [Revised: 11/18/2020] [Accepted: 11/19/2020] [Indexed: 12/11/2022]
Abstract
BACKGROUND Accurate and timely detection of medical adverse events (AEs) from free-text medical narratives can be challenging. Natural language processing (NLP) with deep learning has already shown great potential for analyzing free-text data, but its application for medical AE detection has been limited. METHOD In this study, we developed deep learning-based NLP (DL-NLP) models for efficient and accurate hip dislocation AE detection following primary total hip replacement from standard (radiology notes) and non-standard (follow-up telephone notes) free-text medical narratives. We benchmarked these proposed models with traditional machine learning-based NLP (ML-NLP) models, and also assessed the accuracy of International Classification of Diseases (ICD) and Current Procedural Terminology (CPT) codes in capturing these hip dislocation AEs in a multi-center orthopaedic registry. RESULTS All DL-NLP models outperformed all of the ML-NLP models, with a convolutional neural network (CNN) model achieving the best overall performance (Kappa = 0.97 for radiology notes, and Kappa = 1.00 for follow-up telephone notes). On the other hand, the ICD/CPT codes of the patients who sustained a hip dislocation AE were only 75.24% accurate. CONCLUSIONS We demonstrated that a DL-NLP model can be used in large-scale orthopaedic registries for accurate and efficient detection of hip dislocation AEs. The NLP model in this study was developed with data from the most frequently used electronic medical record (EMR) system in the U.S., Epic. This NLP model could potentially be implemented in other Epic-based EMR systems to improve AE detection, and consequently, quality of care and patient outcomes.
Affiliation(s)
- Alireza Borjali
- Department of Orthopaedic Surgery, Harris Orthopaedics Laboratory, Massachusetts General Hospital, Boston, MA, USA; Department of Orthopaedic Surgery, Harvard Medical School, Boston, MA, USA
- Martin Magnéli
- Department of Orthopaedic Surgery, Harris Orthopaedics Laboratory, Massachusetts General Hospital, Boston, MA, USA; Department of Orthopaedic Surgery, Harvard Medical School, Boston, MA, USA; Karolinska Institutet, Department of Clinical Sciences, Danderyd Hospital, Stockholm, Sweden
- David Shin
- Department of Orthopaedic Surgery, Harris Orthopaedics Laboratory, Massachusetts General Hospital, Boston, MA, USA
- Henrik Malchau
- Department of Orthopaedic Surgery, Harris Orthopaedics Laboratory, Massachusetts General Hospital, Boston, MA, USA; Department of Orthopaedic Surgery, Sahlgrenska University Hospital, Sweden
- Orhun K Muratoglu
- Department of Orthopaedic Surgery, Harris Orthopaedics Laboratory, Massachusetts General Hospital, Boston, MA, USA; Department of Orthopaedic Surgery, Harvard Medical School, Boston, MA, USA
- Kartik M Varadarajan
- Department of Orthopaedic Surgery, Harris Orthopaedics Laboratory, Massachusetts General Hospital, Boston, MA, USA; Department of Orthopaedic Surgery, Harvard Medical School, Boston, MA, USA
|
23
|
Sulieman L, Robinson JR, Jackson GP. Automating the Classification of Complexity of Medical Decision-Making in Patient-Provider Messaging in a Patient Portal. J Surg Res 2020; 255:224-232. [PMID: 32570124 PMCID: PMC7303623 DOI: 10.1016/j.jss.2020.05.039] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 01/31/2020] [Revised: 04/09/2020] [Accepted: 05/05/2020] [Indexed: 01/24/2023]
Abstract
BACKGROUND Patient portals are consumer health applications that allow patients to view their health information. Portals facilitate the interactions between patients and their caregivers by offering secure messaging. Patients communicate different needs through portal messages. Medical needs contain requests for delivery of care (e.g., reporting new symptoms). Automating the classification of medical decision complexity in portal messages has not been investigated. MATERIALS AND METHODS We trained two multiclass classifiers, multinomial Naïve Bayes and random forest, on 500 message threads to quantify and label the complexity of decision-making into four classes: no decision, straightforward, low, and moderate. We compared the performance of the models to using only the number of medical terms without training a machine learning model. RESULTS Our analysis demonstrated that machine learning models have better performance than the model that did not use machine learning. Moreover, machine learning models could quantify the complexity of decision-making that the messages contained, with macro, micro, and weighted precision of 0.59, 0.45, and 0.58 and macro, micro, and weighted recall of 0.63, 0.41, and 0.63, respectively. CONCLUSIONS This study is one of the first to attempt to classify patient portal messages by whether they involve medical decision-making and the complexity of that decision-making. Machine learning classifiers trained on message content resulted in better message thread classification than classifiers that employed medical terms in the messages alone.
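The macro, micro, and weighted averages reported above aggregate per-class precision and recall in different ways. The following is a minimal sketch of the common definitions for a single-label multiclass task, computed from a confusion matrix; the function name and the 4-class counts are hypothetical, and the study's own computation may differ.

```python
def averaged_scores(confusion):
    """Macro, micro, and weighted precision/recall from a confusion matrix.

    Rows are true classes, columns are predicted classes.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = [confusion[i][i] for i in range(n)]
    support = [sum(confusion[i]) for i in range(n)]  # true count per class
    predicted = [sum(row[j] for row in confusion) for j in range(n)]
    precision = [tp[i] / predicted[i] if predicted[i] else 0.0 for i in range(n)]
    recall = [tp[i] / support[i] if support[i] else 0.0 for i in range(n)]
    return {
        # Macro: unweighted mean over classes.
        "macro_precision": sum(precision) / n,
        "macro_recall": sum(recall) / n,
        # Weighted: mean over classes, weighted by true class frequency.
        "weighted_precision": sum(p * s for p, s in zip(precision, support)) / total,
        "weighted_recall": sum(r * s for r, s in zip(recall, support)) / total,
        # Micro: pool counts over all classes before dividing.
        "micro_precision": sum(tp) / sum(predicted),
        "micro_recall": sum(tp) / sum(support),
    }


# Hypothetical counts for the four classes: no decision, straightforward,
# low, moderate.
example = [
    [30, 5, 3, 2],
    [6, 20, 8, 1],
    [2, 7, 15, 6],
    [1, 2, 5, 12],
]
for name, value in averaged_scores(example).items():
    print(f"{name}: {value:.2f}")
```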
Affiliation(s)
- Lina Sulieman
- Vanderbilt University Medical Center, Nashville, Tennessee
- Jamie R Robinson
- Vanderbilt University Medical Center, Nashville, Tennessee; IBM Watson Health, IBM, Cambridge, Massachusetts
- Gretchen P Jackson
- Vanderbilt University Medical Center, Nashville, Tennessee; IBM Watson Health, IBM, Cambridge, Massachusetts
|
24
|
Coquet J, Blayney DW, Brooks JD, Hernandez-Boussard T. Association between patient-initiated emails and overall 2-year survival in cancer patients undergoing chemotherapy: Evidence from the real-world setting. Cancer Med 2020; 9:8552-8561. [PMID: 32986931 PMCID: PMC7666724 DOI: 10.1002/cam4.3483] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.6] [Received: 04/26/2020] [Revised: 07/09/2020] [Accepted: 09/02/2020] [Indexed: 11/12/2022] Open
Abstract
PURPOSE Prior studies suggest email communication between patients and providers may improve patient engagement and health outcomes. The purpose of this study was to determine whether patient-initiated emails are associated with overall survival benefits among cancer patients undergoing chemotherapy. PATIENTS AND METHODS We identified patient-initiated emails through the patient portal in electronic health records (EHR) among 9900 cancer patients receiving chemotherapy between 2013 and 2018. Email users were defined as patients who sent at least one email from 12 months before to 2 months after chemotherapy started. A propensity score-matched cohort analysis was carried out to reduce bias due to confounding (age, primary cancer type, gender, insurance payor, ethnicity, race, stage, income, Charlson score, county of residence). The cohort included 3223 email users and 3223 non-email users. The primary outcome was overall 2-year survival stratified by email use. Secondary outcomes included number of face-to-face visits, prescriptions, and telephone calls. The healthcare teams' response to emails and other forms of communication was also investigated. Finally, a quality measure related to chemotherapy-related inpatient and emergency department visits was evaluated. RESULTS Overall 2-year survival was higher in patients who were email users, with an adjusted hazard ratio of 0.80 (95% CI 0.72-0.90; p < 0.001). Email users had higher rates of healthcare utilization, including face-to-face visits (63 vs. 50; p < 0.001), drug prescriptions (28 vs. 21; p < 0.001), and phone calls (18 vs. 16; p < 0.001). Clinical quality outcome measure of inpatient use was better among email users (p = 0.015). CONCLUSION Patient-initiated emails are associated with a survival benefit among cancer patients receiving chemotherapy and may be a proxy for patient engagement. As value-based payment models emphasize incorporating the patients' voice into their care, email communications could serve as a novel source of patient-generated data.
Affiliation(s)
- Jean Coquet
- Department of Medicine, Stanford University, Stanford, CA, USA
- Douglas W Blayney
- Department of Medicine, Stanford University, Stanford, CA, USA; Stanford Cancer Institute, Stanford University School of Medicine, Stanford, CA, USA
- James D Brooks
- Department of Urology, Stanford University School of Medicine, Stanford, CA, USA
- Tina Hernandez-Boussard
- Department of Medicine, Stanford University, Stanford, CA, USA; Department of Biomedical Data Science, Stanford University, Stanford, CA, USA; Department of Surgery, Stanford University School of Medicine, Stanford, CA, USA
|
25
|
Crossley SA, Balyan R, Liu J, Karter AJ, McNamara D, Schillinger D. Predicting the readability of physicians' secure messages to improve health communication using novel linguistic features: Findings from the ECLIPPSE study. J Commun Healthc 2020; 13:1-13. [PMID: 34306181 DOI: 10.1080/17538068.2020.1822726] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 10/23/2022]
Abstract
Background Low literacy skills impact important aspects of communication, including health-related information exchanges. Unsuccessful communication on the part of physician or patient contributes to lower quality of care, is associated with poorer chronic disease control, jeopardizes patient safety and can lead to unfavorable healthcare utilization patterns. To date, very little research has focused on digital communication between physicians and patients, such as secure messages sent via electronic patient portals. Method The purpose of the current study is to develop an automated readability formula to better understand what elements of physicians' digital messages make them more or less difficult to understand. The formula is developed using advanced natural language processing (NLP) to predict human ratings of physician text difficulty. Results The results indicate that NLP indices that capture a diverse set of linguistic features predict the difficulty of physician messages better than classic readability tools such as Flesch Kincaid Grade Level. Our results also provide information about the textual features that best explain text readability. Conclusion Implications for how the readability formula could provide feedback to physicians to improve digital health communication by promoting linguistic concordance between physician and patient are discussed.
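For context on the classic baseline named above, the Flesch-Kincaid Grade Level is a fixed formula over average sentence length and average syllables per word. The sketch below shows that formula with a deliberately crude syllable counter; the helper names and the vowel-group heuristic are assumptions for illustration and are unrelated to the NLP indices developed in the study.

```python
import re


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid Grade Level from raw counts."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def count_syllables(word):
    # Crude heuristic (an assumption): one syllable per run of vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def text_grade(text):
    """Grade level of a non-empty text using naive tokenization."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_kincaid_grade(len(words), len(sentences), syllables)


# Hypothetical message text, not from the study corpus.
print(round(text_grade("Please take one tablet daily. Call us if you feel dizzy."), 2))
```

Because the formula sees only word and sentence length, it cannot capture the lexical and syntactic features that the abstract reports as better predictors of difficulty.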
Affiliation(s)
- Scott A Crossley
- Department of Applied Linguistics/ESL, Georgia State University, Atlanta, GA, USA
- Renu Balyan
- Department of Psychology, Arizona State University, Tempe, AZ, USA
- Jennifer Liu
- Kaiser Permanente Northern California, Oakland, CA, USA
- Dean Schillinger
- Division of General Internal Medicine and Health Communications Research Program, University of California San Francisco, San Francisco, CA, USA
|
26
|
North F, Luhman KE, Mallmann EA, Mallmann TJ, Tulledge-Scheitel SM, North EJ, Pecina JL. A Retrospective Analysis of Provider-to-Patient Secure Messages: How Much Are They Increasing, Who Is Doing the Work, and Is the Work Happening After Hours? JMIR Med Inform 2020; 8:e16521. [PMID: 32673238 PMCID: PMC7381047 DOI: 10.2196/16521] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 11/24/2019] [Revised: 03/17/2020] [Accepted: 04/15/2020] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Patient portal registration and the use of secure messaging are increasing. However, little is known about how the work of responding to and initiating patient messages is distributed among care team members and how these messages may affect work after hours. OBJECTIVE This study aimed to examine the growth of secure messages and determine how the work of provider responses to patient-initiated secure messages and provider-initiated secure messages is distributed across care teams and across work and after-work hours. METHODS We collected secure messages sent from providers from January 1, 2013, to March 15, 2018, at Mayo Clinic, Rochester, Minnesota, both in response to patient secure messages and provider-initiated secure messages. We examined counts of messages over time, how the work of responding to messages and initiating messages was distributed among health care workers, messages sent per provider, messages per unique patient, and when the work was completed (proportion of messages sent after standard work hours). RESULTS Portal registration for patients having clinic visits increased from 33% to 62%, and increasingly more patients and providers were engaged in messaging. Provider message responses to individual patients increased significantly in both primary care and specialty practices. Message responses per specialty physician provider increased from 15 responses per provider per year to 53 responses per provider per year from 2013 to 2018, resulting in a 253% increase. Primary care physician message responses increased from 153 per provider per year to 322 from 2013 to 2018, resulting in a 110% increase. Physicians, nurse practitioners, physician assistants, and registered nurses all contributed to the substantial increases in the number of messages sent. CONCLUSIONS Provider-sent secure messages at a large health care institution have increased substantially since implementation of secure messaging between patients and providers. The effort of responding to and initiating messages to patients was distributed across multiple provider categories. The percentage of message responses occurring after hours showed little substantial change over time compared with the overall increase in message volume.
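The percentage increases quoted in the results follow directly from the per-provider response counts given in the abstract. A short check of that arithmetic, with a hypothetical helper name:

```python
def percent_increase(old, new):
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100


# Specialty physicians: 15 to 53 responses per provider per year.
print(round(percent_increase(15, 53)))    # 253
# Primary care physicians: 153 to 322 responses per provider per year.
print(round(percent_increase(153, 322)))  # 110
```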
Affiliation(s)
- Frederick North
- Division of Community Internal Medicine, Department of Medicine, Mayo Clinic, Rochester, MN, United States
- Eric A Mallmann
- Undergraduate Research Education Program, Mayo Clinic, Rochester, MN, United States
- Toby J Mallmann
- Undergraduate Research Education Program, Mayo Clinic, Rochester, MN, United States
- Sidna M Tulledge-Scheitel
- Division of Community Internal Medicine, Department of Medicine, Mayo Clinic, Rochester, MN, United States
- Emily J North
- Department of Medicine, NYU Grossman School of Medicine, New York, NY, United States
- Jennifer L Pecina
- Department of Family Medicine, Mayo Clinic, Rochester, MN, United States
|
27
|
Leightley D, Pernet D, Velupillai S, Stewart RJ, Mark KM, Opie E, Murphy D, Fear NT, Stevelink SAM. The Development of the Military Service Identification Tool: Identifying Military Veterans in a Clinical Research Database Using Natural Language Processing and Machine Learning. JMIR Med Inform 2020; 8:e15852. [PMID: 32348287 PMCID: PMC7281146 DOI: 10.2196/15852] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/13/2019] [Revised: 12/11/2019] [Accepted: 01/26/2020] [Indexed: 02/07/2023] Open
Abstract
BACKGROUND Electronic health care records (EHRs) are a rich source of health-related information, with potential for secondary research use. In the United Kingdom, there is no national marker for identifying those who have previously served in the Armed Forces, making analysis of the health and well-being of veterans using EHRs difficult. OBJECTIVE This study aimed to develop a tool to identify veterans from free-text clinical documents recorded in a psychiatric EHR database. METHODS Veterans were manually identified using the South London and Maudsley (SLaM) Biomedical Research Centre Clinical Record Interactive Search, a database holding secondary mental health care electronic records for the SLaM National Health Service Foundation Trust. An iterative approach was taken; first, a structured query language (SQL) method was developed, which was then refined using natural language processing and machine learning to create the Military Service Identification Tool (MSIT) to identify if a patient was a civilian or veteran. Performance, defined as correct classification of veterans compared with incorrect classification, was measured using positive predictive value, negative predictive value, sensitivity, F1 score, and accuracy (otherwise termed Youden Index). RESULTS A gold standard dataset of 6672 free-text clinical documents was manually annotated by human coders. Of these documents, 66.00% (4470/6672) were then used to train the SQL and MSIT approaches and 34.00% (2202/6672) were used for testing the approaches. To develop the MSIT, an iterative 2-stage approach was undertaken. In the first stage, an SQL method was developed to identify veterans using a keyword rule-based approach. This approach obtained an accuracy of 0.93 in correctly predicting civilians and veterans, a positive predictive value of 0.81, a sensitivity of 0.75, and a negative predictive value of 0.95. This method informed the second stage, which was the development of the MSIT using machine learning, which, when tested, obtained an accuracy of 0.97, a positive predictive value of 0.90, a sensitivity of 0.91, and a negative predictive value of 0.98. CONCLUSIONS The MSIT has the potential to be used in identifying veterans in the United Kingdom from free-text clinical documents, providing new and unique insights into the health and well-being of this population and their use of mental health care services.
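The performance measures listed in this abstract can all be derived from a binary confusion matrix. The sketch below uses the standard textbook definitions with hypothetical counts (veteran treated as the positive class); it is not the authors' evaluation code. Note that Youden's index is conventionally defined as sensitivity + specificity - 1, so the sketch reports it alongside accuracy rather than treating the two as the same quantity.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall for the positive class
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)          # positive predictive value (precision)
    npv = tn / (tn + fn)          # negative predictive value
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
        "youden_j": sensitivity + specificity - 1,
    }


# Hypothetical counts, not the study's test set.
for name, value in binary_metrics(tp=90, fp=10, fn=10, tn=890).items():
    print(f"{name}: {value:.2f}")
```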
Affiliation(s)
- Daniel Leightley
- King's Centre for Military Health Research, King's College London, London, United Kingdom
- David Pernet
- King's Centre for Military Health Research, King's College London, London, United Kingdom
- Sumithra Velupillai
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
- South London and Maudsley NHS Foundation Trust, London, United Kingdom
- Robert J Stewart
- Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
- South London and Maudsley NHS Foundation Trust, London, United Kingdom
- Katharine M Mark
- King's Centre for Military Health Research, King's College London, London, United Kingdom
- Elena Opie
- King's Centre for Military Health Research, King's College London, London, United Kingdom
- Dominic Murphy
- King's Centre for Military Health Research, King's College London, London, United Kingdom
- Combat Stress, Leatherhead, United Kingdom
- Nicola T Fear
- King's Centre for Military Health Research, King's College London, London, United Kingdom
- Academic Department of Military Mental Health, King's College London, London, United Kingdom
- Sharon A M Stevelink
- King's Centre for Military Health Research, King's College London, London, United Kingdom
- Department of Psychological Medicine, Institute of Psychiatry, Psychology and Neuroscience, King's College London, London, United Kingdom
|
28
|
Sulieman L, Yin Z, Malin BA. Why Patient Portal Messages Indicate Risk of Readmission for Patients with Ischemic Heart Disease. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2020; 2019:828-837. [PMID: 32308879 PMCID: PMC7153079] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Online portals enable patients to exchange messages with healthcare providers. After discharge, patients message providers to ask questions and report problems. Care providers read and respond accordingly, which requires a nontrivial amount of human effort and is unlikely to scale up as portals become more popular. Automatically detecting when a message indicates a worsening in a patient's condition can help providers identify patients at risk of readmission. We investigated the association between messages that patients diagnosed with ischemic heart disease sent after discharge and the risk of readmission. We studied 4,052 messages sent after discharge for 1,552 patients. We represented messages using inferred latent topics, linguistic features (e.g., emotions, activities), and clusters of medical terms. Our analysis indicates that mentions of medication dosage and additional procedures are associated with readmission. Moreover, patients who were readmitted rarely mentioned leisure activities or described their insights about their health information.
|
29
|
Yin Z, Harrell M, Warner JL, Chen Q, Fabbri D, Malin BA. The therapy is making me sick: how online portal communications between breast cancer patients and physicians indicate medication discontinuation. J Am Med Inform Assoc 2019; 25:1444-1451. [PMID: 30380083 DOI: 10.1093/jamia/ocy118] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 04/07/2018] [Accepted: 08/10/2018] [Indexed: 12/13/2022] Open
Abstract
Objective Online platforms have created a variety of opportunities for breast cancer patients to discuss their hormonal therapy, a long-term adjuvant treatment to reduce the chance of breast cancer occurrence and mortality. The goal of this investigation is to ascertain the extent to which the messages breast cancer patients communicated through an online portal can indicate their potential for discontinuing hormonal therapy. Materials and Methods We studied the de-identified electronic medical records of 1106 breast cancer patients who were prescribed hormonal therapy at Vanderbilt University Medical Center over a 12-year period. We designed a data-driven approach to investigate patients' patterns of messaging with healthcare providers, the topics they communicated, and the extent to which these messaging behaviors associate with the likelihood that a patient will discontinue a prescribed 5-year regimen of therapy. Results The results indicate that messaging rate over time [hazard ratio (HR) = 1.373, P = 0.002], mentions of side effects (HR = 1.214, P = 0.006), and surgery-related topics (HR = 1.170, P = 0.034) were associated with increased risk of early medication discontinuation. In contrast, seeking professional suggestions (HR = 0.766, P = 0.002), expressing gratitude to healthcare providers (HR = 0.872, P = 0.044), and mentions of drugs used to treat side effects (HR = 0.807, P = 0.013) were associated with decreased risk of medication discontinuation. Discussion and Conclusion This investigation suggests that patient-generated content can inform the study of health-related behaviors. Given that approximately 50% of breast cancer patients do not complete a course of hormonal therapy as described, the identification of factors associated with medication discontinuation can facilitate real-time interventions to prevent early discontinuation.
Affiliation(s)
- Zhijun Yin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Jeremy L Warner
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Qingxia Chen
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Daniel Fabbri
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, USA
- Bradley A Malin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, USA
|
30
|
Hughes KS, Zhou J, Bao Y, Singh P, Wang J, Yin K. Natural language processing to facilitate breast cancer research and management. Breast J 2019; 26:92-99. [PMID: 31854067 DOI: 10.1111/tbj.13718] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 10/01/2019] [Accepted: 10/02/2019] [Indexed: 12/23/2022]
Abstract
The medical literature has been growing exponentially, and its size has become a barrier for physicians to locate and extract clinically useful information. Natural language processing (NLP), especially machine learning (ML)-based NLP, is a technology that potentially provides a promising solution. ML-based NLP is based on training a computational algorithm with a large number of annotated examples to allow the computer to "learn" and "predict" the meaning of human language. Although NLP has been widely applied in industry and business, most physicians still are not aware of the huge potential of this technology in medicine, and the implementation of NLP in breast cancer research and management is fairly limited. With a real-world successful project of identifying penetrance papers for breast and other cancer susceptibility genes, this review illustrates how to train and evaluate an NLP-based medical abstract classifier, incorporate it into a semiautomatic meta-analysis procedure, and validate the effectiveness of this procedure. Other implementations of NLP technology in breast cancer research, such as parsing pathology reports and mining electronic healthcare records, are also discussed. We hope this review will help breast cancer physicians and researchers to recognize, understand, and apply this technology to meet their own clinical or research needs.
Affiliation(s)
- Kevin S Hughes
- Division of Surgical Oncology, Massachusetts General Hospital and Harvard Medical School, Boston, MA
- Jingan Zhou
- Division of Surgical Oncology, Massachusetts General Hospital and Harvard Medical School, Boston, MA; Department of General Surgery, Beijing Anzhen Hospital, Capital Medical University, Beijing, China
- Yujia Bao
- Computer Science & Artificial Intelligence, Massachusetts Institute of Technology, Boston, MA
- Preeti Singh
- Division of Surgical Oncology, Massachusetts General Hospital and Harvard Medical School, Boston, MA
- Jin Wang
- Division of Surgical Oncology, Massachusetts General Hospital and Harvard Medical School, Boston, MA; Department of Breast Oncology, Sun Yat-sen University Cancer Center, State Key Laboratory of Oncology in South China, Collaborative Innovation Center of Cancer Medicine, Guangzhou, China
- Kanhua Yin
- Division of Surgical Oncology, Massachusetts General Hospital and Harvard Medical School, Boston, MA
|
31
|
Duvall M, North F, Leasure W, Pecina J. Patient portal message characteristics and reported thoughts of self-harm and suicide: A retrospective cohort study. J Telemed Telecare 2019; 27:501-508. [PMID: 31726902 DOI: 10.1177/1357633x19887262] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/24/2022]
Abstract
INTRODUCTION As use of electronic portal communication with healthcare teams increases, processes that effectively recognize messages that contain critical information are needed. This study aims to evaluate whether certain language and other characteristics of patient portal messages are associated with expressions of self-harm and suicidal ideation. METHODS Using patient portal messages sent between 1 January 2013 and 30 June 2017, we searched for words and letter combinations 'suicid' (to identify words suicide and suicidal), 'depress' (for depression, depressed, depressing), 'harm himself' (or 'herself' or 'myself'), 'hurt himself' ('herself' or 'myself'), 'kill', 'shoot', 'cutting', 'knife', 'gun', 'overdose', 'over dose' and 'jump'. RESULTS Of 831,009 messages, 11,174 messages contained one or more search terms. We manually reviewed 7,736 messages for content expressing self-harm or suicidality. Of the reviewed messages, 3.2% indicated thoughts of self-harm or suicide and 2.2% of messages suggested active suicidality. Of those expressing any thoughts of self-harm or suicide, 13.4% mentioned a specific plan and 20% were passively suicidal. Messages indicating thoughts of self-harm and suicide were more common in patients who were unmarried, non-white and younger than 18 years. Factors significantly associated with thoughts of self-harm were messages addressed to psychiatry or containing the letter combinations 'suicide', 'die', 'depress' and 'harm/hurt my/her/himself'. DISCUSSION Certain letter combinations and patient portal message characteristics may be associated with expressions of self-harm and suicide. These factors should be considered as we develop systems of effectively screening patient portal messages for critical clinical information.
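The screening step described in the methods amounts to a substring search over message text. A minimal sketch of such a filter follows, using the search terms listed in the abstract; the function names and the case-insensitive matching are assumptions, and the study's actual query implementation is not described here.

```python
# Letter combinations listed in the abstract; 'harm' and 'hurt' phrases
# are expanded for himself/herself/myself.
SEARCH_TERMS = [
    "suicid", "depress",
    "harm himself", "harm herself", "harm myself",
    "hurt himself", "hurt herself", "hurt myself",
    "kill", "shoot", "cutting", "knife", "gun",
    "overdose", "over dose", "jump",
]


def matched_terms(message):
    """Return the search terms found in a message (case-insensitive)."""
    text = message.lower()
    return [term for term in SEARCH_TERMS if term in text]


def flag_for_review(messages):
    """Keep only messages containing at least one search term."""
    return [m for m in messages if matched_terms(m)]


# Hypothetical messages, not study data.
inbox = [
    "I have been feeling depressed lately",
    "Refill request for lisinopril",
]
print(flag_for_review(inbox))
```

Plain substring matching also catches unrelated words (for example, "skills" contains 'kill'), which is consistent with the need for manual review of flagged messages reported in the results.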
Affiliation(s)
- Michelle Duvall
- Department of Family Medicine, Mayo Clinic, Rochester, MN, USA
- Frederick North
- Department of Community Internal Medicine, Mayo Clinic, Rochester, MN, USA
- William Leasure
- Department of Psychiatry & Psychology, Mayo Clinic, Rochester, MN, USA
- Jennifer Pecina
- Department of Family Medicine, Mayo Clinic, Rochester, MN, USA
|
32
|
Yin Z, Sulieman LM, Malin BA. A systematic literature review of machine learning in online personal health data. J Am Med Inform Assoc 2019; 26:561-576. [PMID: 30908576 PMCID: PMC7647332 DOI: 10.1093/jamia/ocz009] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 09/23/2018] [Revised: 01/06/2019] [Accepted: 01/11/2019] [Indexed: 02/07/2023] Open
Abstract
OBJECTIVE User-generated content (UGC) in online environments provides opportunities to learn an individual's health status outside of clinical settings. However, the nature of UGC brings challenges in both data collection and processing. The purpose of this study is to systematically review the effectiveness of applying machine learning (ML) methodologies to UGC for personal health investigations. MATERIALS AND METHODS We searched PubMed, Web of Science, IEEE Library, ACM library, AAAI library, and the ACL anthology. We focused on research articles that were published in English and in peer-reviewed journals or conference proceedings between 2010 and 2018. Publications that applied ML to UGC with a focus on personal health were identified for further systematic review. RESULTS We identified 103 eligible studies, which we summarized with respect to 5 research categories, 3 data collection strategies, 3 gold standard dataset creation methods, and 4 types of features applied in ML models. Popular off-the-shelf ML models were logistic regression (n = 22), support vector machines (n = 18), naive Bayes (n = 17), ensemble learning (n = 12), and deep learning (n = 11). The most investigated problems were mental health (n = 39) and cancer (n = 15). Common health-related aspects extracted from UGC were treatment experience, sentiments and emotions, coping strategies, and social support. CONCLUSIONS The systematic review indicated that ML can be effectively applied to UGC in facilitating the description and inference of personal health. Future research needs to focus on mitigating bias introduced when building study cohorts, creating features from free text, improving clinical credibility of UGC, and model interpretability.
Affiliation(s)
- Zhijun Yin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Lina M Sulieman
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Bradley A Malin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville, Tennessee, USA
|
33
|
Leviton A, Oppenheimer J, Chiujdea M, Antonetty A, Ojo OW, Garcia S, Weas S, Fleegler E, Chan E, Loddenkemper T. Characteristics of Future Models of Integrated Outpatient Care. Healthcare (Basel) 2019; 7:healthcare7020065. [PMID: 31035586 PMCID: PMC6627383 DOI: 10.3390/healthcare7020065] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 04/01/2019] [Revised: 04/23/2019] [Accepted: 04/24/2019] [Indexed: 01/01/2023] Open
Abstract
Replacement of fee-for-service with capitation arrangements forces physicians and institutions to minimize health care costs while maintaining high-quality care. In this report, we describe how patients and their families (or caregivers) can work with members of the medical care team to achieve these twin goals of maintaining (and perhaps improving) high-quality care and minimizing costs. We describe how increased self-management enables patients and their families/caregivers to provide electronic patient-reported outcomes (ePROs; i.e., symptoms, events) as frequently as the patient or the medical care team considers appropriate. These capabilities also allow ongoing assessments of physiological measurements/phenomena (mHealth). Remote surveillance of these communications allows longer intervals between (fewer) patient visits to the medical care team when this is appropriate, or earlier interventions when those are appropriate. Systems are now available that alert medical care providers to situations when interventions might be needed.
Collapse
Affiliation(s)
- Alan Leviton
- Division of Epilepsy and Clinical Neurophysiology, Department of Neurology, Boston Children's Hospital and Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA.
| | - Julia Oppenheimer
- Division of Epilepsy and Clinical Neurophysiology, Department of Neurology, Boston Children's Hospital and Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA.
| | - Madeline Chiujdea
- Division of Epilepsy and Clinical Neurophysiology, Department of Neurology, Boston Children's Hospital and Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA.
| | - Annalee Antonetty
- Division of Epilepsy and Clinical Neurophysiology, Department of Neurology, Boston Children's Hospital and Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA.
| | - Oluwafemi William Ojo
- Division of Epilepsy and Clinical Neurophysiology, Department of Neurology, Boston Children's Hospital and Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA.
| | - Stephanie Garcia
- Division of Epilepsy and Clinical Neurophysiology, Department of Neurology, Boston Children's Hospital and Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA.
| | - Sarah Weas
- Division of Developmental Medicine, Department of Medicine, Boston Children's Hospital and Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA.
| | - Eric Fleegler
- Division of Emergency Medicine, Department of Medicine, Boston Children's Hospital and Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA.
| | - Eugenia Chan
- Division of Developmental Medicine, Department of Medicine, Boston Children's Hospital and Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA.
| | - Tobias Loddenkemper
- Division of Epilepsy and Clinical Neurophysiology, Department of Neurology, Boston Children's Hospital and Harvard Medical School, 300 Longwood Avenue, Boston, MA 02115, USA.
|
34
|
Sheikhalishahi S, Miotto R, Dudley JT, Lavelli A, Rinaldi F, Osmani V. Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review. JMIR Med Inform 2019; 7:e12239. [PMID: 31066697 PMCID: PMC6528438 DOI: 10.2196/12239] [Citation(s) in RCA: 265] [Impact Index Per Article: 44.2] [Received: 09/17/2018] [Revised: 03/04/2019] [Accepted: 03/24/2019] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Novel approaches that complement and go beyond evidence-based medicine are required in the domain of chronic diseases, given the growing incidence of such conditions in the worldwide population. A promising avenue is the secondary use of electronic health records (EHRs), where patient data are analyzed to conduct clinical and translational research. Methods based on machine learning to process EHRs are resulting in improved understanding of patient clinical trajectories and chronic disease risk prediction, creating a unique opportunity to derive previously unknown clinical insights. However, a wealth of clinical histories remains locked behind clinical narratives in free-form text. Consequently, unlocking the full potential of EHR data is contingent on the development of natural language processing (NLP) methods to automatically transform clinical text into structured clinical data that can guide clinical decisions and potentially delay or prevent disease onset. OBJECTIVE The goal of the research was to provide a comprehensive overview of the development and uptake of NLP methods applied to free-text clinical notes related to chronic diseases, including the investigation of challenges faced by NLP methodologies in understanding clinical narratives. METHODS Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed and searches were conducted in 5 databases using "clinical notes," "natural language processing," and "chronic disease" and their variations as keywords to maximize coverage of the articles. RESULTS Of the 2652 articles considered, 106 met the inclusion criteria. Review of the included papers resulted in identification of 43 chronic diseases, which were then further classified into 10 disease categories using the International Classification of Diseases, 10th Revision. The majority of studies focused on diseases of the circulatory system (n=38) while endocrine and metabolic diseases were fewest (n=14). This was due to the structure of clinical records related to metabolic diseases, which typically contain much more structured data, compared with medical records for diseases of the circulatory system, which focus more on unstructured data and consequently have seen a stronger focus of NLP. The review has shown that there is a significant increase in the use of machine learning methods compared to rule-based approaches; however, deep learning methods remain emergent (n=3). Consequently, the majority of works focus on classification of disease phenotype with only a handful of papers addressing extraction of comorbidities from the free text or integration of clinical notes with structured data. There is a notable use of relatively simple methods, such as shallow classifiers (or combination with rule-based methods), due to the interpretability of predictions, which still represents a significant issue for more complex methods. Finally, scarcity of publicly available data may also have contributed to insufficient development of more advanced methods, such as extraction of word embeddings from clinical notes. CONCLUSIONS Efforts are still required to improve (1) progression of clinical NLP methods from extraction toward understanding; (2) recognition of relations among entities rather than entities in isolation; (3) temporal extraction to understand past, current, and future clinical events; (4) exploitation of alternative sources of clinical knowledge; and (5) availability of large-scale, de-identified clinical corpora.
Affiliation(s)
- Seyedmostafa Sheikhalishahi
- eHealth Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
- Department of Information Engineering and Computer Science, University of Trento, Trento, Italy
| | - Riccardo Miotto
- Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Joel T Dudley
- Institute for Next Generation Healthcare, Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, United States
| | - Alberto Lavelli
- NLP Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
| | - Fabio Rinaldi
- Institute of Computational Linguistics, University of Zurich, Zurich, Switzerland
| | - Venet Osmani
- eHealth Research Group, Fondazione Bruno Kessler Research Institute, Trento, Italy
|
35
|
Chen J, Lalor J, Liu W, Druhl E, Granillo E, Vimalananda VG, Yu H. Detecting Hypoglycemia Incidents Reported in Patients' Secure Messages: Using Cost-Sensitive Learning and Oversampling to Reduce Data Imbalance. J Med Internet Res 2019; 21:e11990. [PMID: 30855231 PMCID: PMC6431826 DOI: 10.2196/11990] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 08/21/2018] [Revised: 01/19/2019] [Accepted: 02/10/2019] [Indexed: 12/31/2022] Open
Abstract
BACKGROUND Improper dosing of medications such as insulin can cause hypoglycemic episodes, which may lead to severe morbidity or even death. Although secure messaging was designed for exchanging nonurgent messages, patients sometimes report hypoglycemia events through secure messaging. Detecting these patient-reported adverse events may help alert clinical teams and enable early corrective actions to improve patient safety. OBJECTIVE We aimed to develop a natural language processing system, called HypoDetect (Hypoglycemia Detector), to automatically identify hypoglycemia incidents reported in patients' secure messages. METHODS An expert in public health annotated 3000 secure message threads between patients with diabetes and US Department of Veterans Affairs clinical teams as containing patient-reported hypoglycemia incidents or not. A physician independently annotated 100 threads randomly selected from this dataset to determine interannotator agreement. We used this dataset to develop and evaluate HypoDetect. HypoDetect incorporates 3 machine learning algorithms widely used for text classification: linear support vector machines, random forest, and logistic regression. We explored different learning features, including new knowledge-driven features. Because only 114 (3.80%) messages were annotated as positive, we investigated cost-sensitive learning and oversampling methods to mitigate the challenge of imbalanced data. RESULTS The interannotator agreement was Cohen kappa=0.976. Using cross-validation, logistic regression with cost-sensitive learning achieved the best performance (area under the receiver operating characteristic curve=0.954, sensitivity=0.693, specificity=0.974, F1 score=0.590). Cost-sensitive learning and the ensembled synthetic minority oversampling technique improved the sensitivity of the baseline systems substantially (by 0.123 to 0.728 absolute gains). Our results show that a variety of features contributed to the best performance of HypoDetect. CONCLUSIONS Despite the challenge of data imbalance, HypoDetect achieved promising results for the task of detecting hypoglycemia incidents from secure messages. The system has great potential to facilitate early detection and treatment of hypoglycemia.
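To make the cost-sensitive idea in this abstract concrete, the sketch below derives per-class weights from the class counts reported above (114 positive threads out of 3000) and applies them in a weighted log loss. The inverse-frequency weighting rule and the function names `balanced_class_weights` and `weighted_log_loss` are assumptions chosen for illustration; the abstract does not state which cost scheme HypoDetect used.

```python
import math

def balanced_class_weights(n_positive, n_negative):
    """Inverse-frequency weights, w_c = N / (2 * n_c), for a binary task.

    Assumption: this common heuristic stands in for the paper's
    (unspecified) cost-sensitive scheme.
    """
    total = n_positive + n_negative
    return {1: total / (2 * n_positive), 0: total / (2 * n_negative)}

def weighted_log_loss(labels, probabilities, weights):
    """Weighted-average binary cross-entropy.

    Mistakes on the rare (heavily weighted) class contribute more to the loss.
    """
    total_loss = 0.0
    total_weight = 0.0
    for y, p in zip(labels, probabilities):
        p = min(max(p, 1e-12), 1 - 1e-12)  # keep log() away from 0
        w = weights[y]
        total_loss += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        total_weight += w
    return total_loss / total_weight

# Class counts taken from the abstract: 114 positive threads, 2886 negative.
weights = balanced_class_weights(n_positive=114, n_negative=2886)
```

With these weights, both classes carry the same total weight (114 × w₁ = 2886 × w₀ = 1500), so a classifier can no longer minimize the loss simply by predicting the majority class.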
Affiliation(s)
- Jinying Chen
- Department of Population and Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, United States
- Bedford Veterans Affairs Medical Center, Center for Healthcare Organization and Implementation Research, Bedford, MA, United States
| | - John Lalor
- Bedford Veterans Affairs Medical Center, Center for Healthcare Organization and Implementation Research, Bedford, MA, United States
- College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA, United States
| | - Weisong Liu
- Bedford Veterans Affairs Medical Center, Center for Healthcare Organization and Implementation Research, Bedford, MA, United States
- Department of Computer Science, University of Massachusetts Lowell, Lowell, MA, United States
| | - Emily Druhl
- Bedford Veterans Affairs Medical Center, Center for Healthcare Organization and Implementation Research, Bedford, MA, United States
| | - Edgard Granillo
- Department of Population and Quantitative Health Sciences, University of Massachusetts Medical School, Worcester, MA, United States
- Bedford Veterans Affairs Medical Center, Center for Healthcare Organization and Implementation Research, Bedford, MA, United States
| | - Varsha G Vimalananda
- Bedford Veterans Affairs Medical Center, Center for Healthcare Organization and Implementation Research, Bedford, MA, United States
- School of Medicine, Boston University, Boston, MA, United States
| | - Hong Yu
- Bedford Veterans Affairs Medical Center, Center for Healthcare Organization and Implementation Research, Bedford, MA, United States
- College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, MA, United States
- Department of Computer Science, University of Massachusetts Lowell, Lowell, MA, United States
- Department of Medicine, University of Massachusetts Medical School, Worcester, MA, United States
|
36
|
A systematic review of natural language processing and text mining of symptoms from electronic patient-authored text data. Int J Med Inform 2019; 125:37-46. [PMID: 30914179 DOI: 10.1016/j.ijmedinf.2019.02.008] [Citation(s) in RCA: 92] [Impact Index Per Article: 15.3] [Received: 12/05/2018] [Revised: 01/13/2019] [Accepted: 02/19/2019] [Indexed: 02/04/2023]
Abstract
OBJECTIVE In this systematic review, we aim to synthesize the literature on the use of natural language processing (NLP) and text mining as they apply to symptom extraction and processing in electronic patient-authored text (ePAT). MATERIALS AND METHODS A comprehensive literature search of 1964 articles from PubMed and EMBASE was narrowed to 21 eligible articles. Data related to purpose, text source, number of users and/or posts, evaluation metrics, and quality indicators were recorded. RESULTS Pain (n = 18) and fatigue and sleep disturbance (n = 18) were the most frequently evaluated symptom clinical content categories. Studies accessed ePAT from sources such as Twitter and online community forums or patient portals focused on diseases, including diabetes, cancer, and depression. Fifteen studies used NLP as a primary methodology. Studies reported evaluation metrics including the precision, recall, and F-measure for symptom-specific research questions. DISCUSSION NLP and text mining have been used to extract and analyze patient-authored symptom data in a wide variety of online communities. Though there are computational challenges with accessing ePAT, the depth of information provided directly from patients offers new horizons for precision medicine, characterization of sub-clinical symptoms, and the creation of personal health libraries as outlined by the National Library of Medicine. CONCLUSION Future research should consider the needs of patients expressed through ePAT and its relevance to symptom science. Understanding the role that ePAT plays in health communication and real-time assessment of symptoms, through the use of NLP and text mining, is critical to a patient-centered health system.
|
37
|
Seale DE, LeRouge CM, Ohs JE, Tao D, Lach HW, Jupka K, Wray R. Exploring Early Adopter Baby Boomers' Approach to Managing Their Health and Healthcare. INTERNATIONAL JOURNAL OF E-HEALTH AND MEDICAL COMMUNICATIONS 2019. [DOI: 10.4018/ijehmc.2019010106] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 11/09/2022]
Abstract
The Patient 3.0 Profile is used to explore the patient engagement strategies of early adopter baby boomers in three domains: 1) patient relationships, 2) health information use, and 3) consumer health technology (CHT) use. Findings from six focus groups with early adopter boomers challenge prior notions about older adults' passive approach to patient engagement. Baby boomers want to make final healthcare decisions with input from providers. While adept at finding and critically assessing online health information for self-education and self-management, boomers want providers to curate relevant and trustworthy information. Boomers embrace CHTs offered through providers (i.e., patient portals, email and text messaging) and sponsored by wellness programs (i.e., diet and activity devices and apps). However, there is no indication they add information to their online medical records or use CHT for diagnosis, treatment or disease management. Additional resources are needed to encourage widespread adoption, support patient effectiveness, and confirm cost-benefit.
Affiliation(s)
- Helen W. Lach
- Saint Louis University, School of Nursing, Saint Louis, USA
| | - Keri Jupka
- National Center for Parents as Teachers, Saint Louis, USA
|
38
|
Liang WH, Madan-Swain A, Cronin RM, Jackson GP. Development of a Technology-Supported, Lay Peer-to-Peer Family Engagement Consultation Service in a Pediatric Hospital. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2018; 2018:730-739. [PMID: 30815115 PMCID: PMC6371240] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/09/2023]
Abstract
Patient and caregiver engagement in making decisions and taking actions to promote health is critically important for improving outcomes, enhancing healthcare experience satisfaction, and reducing costs. Patients and caregivers have a wealth of expertise in illness self-management and can aid others in attaining high levels of activation through peer-to-peer social support. We describe the development of a technology-supported, family engagement consultation service at Children's of Alabama that integrates parent volunteers as front-line, peer-to-peer support consultants with a multidisciplinary team of informatics professionals in the pediatric hospital setting. This service was adapted from an existing engagement consultation service with a traditional medical consultation model at Vanderbilt Children's Hospital. The unique features of the new model are articulated, along with plans for a shared knowledge database of consumer health resources to meet needs. The layperson peer-to-peer design is highly innovative and relevant as healthcare transitions towards increasingly participatory and personalized medicine.
Affiliation(s)
- Wayne H Liang
- University of Alabama at Birmingham, Birmingham, Alabama
|
39
|
Lee DJ, Cronin R, Robinson J, Anders S, Unertl K, Kelly K, Hankins H, Skeens R, Jackson GP. Common Consumer Health-Related Needs in the Pediatric Hospital Setting: Lessons from an Engagement Consultation Service. Appl Clin Inform 2018; 9:595-603. [PMID: 30089333 PMCID: PMC6082659 DOI: 10.1055/s-0038-1667205] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 02/22/2018] [Accepted: 06/16/2018] [Indexed: 10/28/2022] Open
Abstract
BACKGROUND Informed and engaged parents may influence outcomes for childhood illness. Understanding the needs of the caregivers of pediatric patients is a critical first step in promoting engagement in their child's care. In 2014, we developed an Engagement Consultation Service at the Monroe Carell Jr. Children's Hospital at Vanderbilt. This service determines the health-related needs of the caregivers of hospitalized children and makes educational or technology recommendations to meet those needs and support engagement. OBJECTIVES This report describes the most common health-related needs identified in the caregivers of hospitalized pediatric patients and details the recommended interventions to meet those needs. METHODS The most commonly reported consumer health-related needs from our 3-year experience with the Engagement Consultation Service were extracted from consultation notes. Each need was classified by semantic type using a taxonomy of consumer health needs. Typical recommendations for each need and their administration were detailed. RESULTS The most frequently recognized needs involved communicating with health care providers after discharge, using medical devices, distinguishing between benign and concerning symptoms, knowing what questions to ask providers and remembering them, finding trustworthy sources of information online, understanding disease prognosis, and getting emotional support. A variety of apps, Web sites, printed materials, and online groups were recommended. CONCLUSION The parents of hospitalized patients share several common health-related needs that can be addressed with educational and technology interventions. An inpatient Engagement Consultation Service provides a generalizable framework for identifying health-related needs and delivers tools to meet those needs and promote engagement during and after hospitalizations.
Affiliation(s)
- Daniel J. Lee
- Department of Urology, Vanderbilt University Medical Center, Vanderbilt University, Nashville, Tennessee, United States
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Vanderbilt University, Nashville, Tennessee, United States
| | - Robert Cronin
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Vanderbilt University, Nashville, Tennessee, United States
- Department of Internal Medicine, Vanderbilt University Medical Center, Vanderbilt University, Nashville, Tennessee, United States
- Department of Pediatrics, Vanderbilt University Medical Center, Vanderbilt University, Nashville, Tennessee, United States
| | - Jamie Robinson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Vanderbilt University, Nashville, Tennessee, United States
- Department of Surgery, Vanderbilt University Medical Center, Vanderbilt University, Nashville, Tennessee, United States
| | - Shilo Anders
- Department of Anesthesiology, Vanderbilt University Medical Center, Vanderbilt University, Nashville, Tennessee, United States
| | - Kim Unertl
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Vanderbilt University, Nashville, Tennessee, United States
| | - Katherine Kelly
- Department of Pediatrics, Vanderbilt University Medical Center, Vanderbilt University, Nashville, Tennessee, United States
| | - Heather Hankins
- Surgical Outcomes Center for Kids, Vanderbilt University Medical Center, Vanderbilt University, Nashville, Tennessee, United States
| | - Ryan Skeens
- Department of Pediatrics, Vanderbilt University Medical Center, Vanderbilt University, Nashville, Tennessee, United States
| | - Gretchen P. Jackson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, Vanderbilt University, Nashville, Tennessee, United States
- Department of Pediatrics, Vanderbilt University Medical Center, Vanderbilt University, Nashville, Tennessee, United States
- Department of Pediatric Surgery, Vanderbilt University Medical Center, Vanderbilt University, Nashville, Tennessee, United States
|
40
|
Robinson JR, Anders SH, Novak LL, Simpson CL, Holroyd LE, Bennett KA, Jackson GP. Consumer health-related needs of pregnant women and their caregivers. JAMIA Open 2018; 1:57-66. [PMID: 30474071 PMCID: PMC6241505 DOI: 10.1093/jamiaopen/ooy018] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 12/31/2017] [Revised: 05/02/2018] [Accepted: 06/04/2018] [Indexed: 11/14/2022] Open
Abstract
Objectives To build effective applications, technology designers must understand consumer health needs. Pregnancy is a common health condition, and expectant families have unanswered questions. This study examined consumer health-related needs in pregnant women and caregivers and determined the types of needs that were not met. Materials and Methods We enrolled pregnant women <36 weeks’ gestational age and caregivers from advanced maternal–fetal and group prenatal care settings. Participant characteristics were collected through surveys, and health-related needs were elicited in semi-structured interviews. Researchers categorized needs by semantic type and whether they were met (ie, met, partially met, or unmet). Inter-rater reliability was measured by Cohen’s kappa. Results Seventy-one pregnant women and 29 caregivers participated and reported 1054 needs, 28% unmet, and 49% partially met. Need types were 66.2% informational, 15.9% logistical, 8.9% social, 8.6% medical, and 0.3% other. Inter-rater reliability was near perfect (κ=0.95, P < 0.001). Discussion Common topics of unmet needs were prognosis, life management, and need for emotional support. For pregnant women, these unmet needs focused around being healthy, childbirth, infant care, and being a good mother; caregivers’ needs involved caring for the mother, the natural course of pregnancy, and life after pregnancy. Conclusion Pregnant women and caregivers have a rich set of health-related needs with many not fully met. Caregivers’ needs differed from those of pregnant women and may not be adequately addressed by resources designed for mothers. Many unmet needs involved stress and life management. Knowledge about consumer health needs can inform the design of better technologies for pregnancy.
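The inter-rater reliability statistic reported in this abstract, Cohen's kappa, compares observed agreement between two raters with the agreement expected by chance. The sketch below computes it for two label sequences; the function name `cohen_kappa` and the six toy labels are hypothetical and are not data from the study.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two equal-length label lists."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:  # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)

# Toy example only: two raters coding six needs as met / partial / unmet.
rater_a = ["met", "unmet", "partial", "met", "unmet", "met"]
rater_b = ["met", "unmet", "partial", "met", "met", "met"]
kappa = cohen_kappa(rater_a, rater_b)
```

In the toy example the raters agree on 5 of 6 items (p_o = 5/6) and chance agreement is 15/36, giving kappa = 5/7, or about 0.71.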
Affiliation(s)
- Jamie R Robinson
- Department of Surgery, Vanderbilt University Medical Center, 1161 21st Ave S, CCC-4312 MCN, Nashville, Tennessee 37232-2730, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End, Suite 14107, Nashville, Tennessee 37203, USA
- Corresponding Author: Jamie R. Robinson, MD, MS, Department of Surgery, Vanderbilt University Medical Center, 1161 21st Ave S, CCC-4312 MCN, Nashville, TN 37232-2730, USA
| | - Shilo H Anders
- Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End, Suite 14107, Nashville, Tennessee 37203, USA
- Department of Anesthesiology, Vanderbilt University Medical Center, 1211 Medical Center Drive, Nashville, Tennessee 37232, USA
| | - Laurie L Novak
- Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End, Suite 14107, Nashville, Tennessee 37203, USA
| | - Christopher L Simpson
- Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End, Suite 14107, Nashville, Tennessee 37203, USA
| | - Lauren E Holroyd
- School of Medicine, Vanderbilt University, 2215 Garland Avenue, Light Hall, Nashville, Tennessee 37203, USA
| | - Kelly A Bennett
- Department of Obstetrics and Gynecology, 1211 Medical Center Drive, Nashville, Tennessee 37232, USA
| | - Gretchen P Jackson
- Department of Surgery, Vanderbilt University Medical Center, 1161 21st Ave S, CCC-4312 MCN, Nashville, Tennessee 37232-2730, USA
- Department of Biomedical Informatics, Vanderbilt University Medical Center, 2525 West End, Suite 14107, Nashville, Tennessee 37203, USA
- Department of Pediatrics, 1211 Medical Center Drive, Nashville, Tennessee 37232, USA
|
41
|
Abstract
BACKGROUND Health question-answering (QA) systems have become a typical application scenario of Artificial Intelligence (AI). An annotated question corpus is a prerequisite for training machines to understand the health information needs of users. Thus, we aimed to develop an annotated classification corpus of Chinese health questions (Qcorp) and make it openly accessible. METHODS We developed a two-layered classification schema and corresponding annotation rules on the basis of our previous work. Using the schema, we annotated 5000 questions that were randomly selected from 5 Chinese health websites within 6 broad sections. Eight annotators participated in the annotation task, and the inter-annotator agreement was evaluated to ensure the corpus quality. Furthermore, the distribution and relationship of the annotated tags were measured using descriptive statistics and a social network map. RESULTS The questions were annotated using 7101 tags that cover 29 topic categories in the two-layered schema. In our released corpus, the distribution of questions across the top-layer categories was treatment (64.22%), diagnosis (37.14%), epidemiology (14.96%), healthy lifestyle (10.38%), and health provider choice (4.54%). Both the annotated health questions and the annotation schema were openly accessible on the Qcorp website. Users can download the annotated Chinese questions in CSV, XML, and HTML formats. CONCLUSIONS We developed a Chinese health question corpus including 5000 manually annotated questions. It is openly accessible and should contribute to the development of intelligent health QA systems.
Affiliation(s)
- Haihong Guo
- Institute of Medical Information / Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Xu Na
- Institute of Medical Information / Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China
| | - Jiao Li
- Institute of Medical Information / Medical Library, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing, China.
|
42
|
Sulieman L, Gilmore D, French C, Cronin RM, Jackson GP, Russell M, Fabbri D. Classifying patient portal messages using Convolutional Neural Networks. J Biomed Inform 2017; 74:59-70. [DOI: 10.1016/j.jbi.2017.08.014] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Received: 03/29/2017] [Revised: 08/02/2017] [Accepted: 08/28/2017] [Indexed: 12/31/2022]
|