1
Li T, Ke X, Shi H. Topic modeling and evolutionary trends of China's language policy: A LDA-ARIMA approach. PLoS One 2025; 20:e0324644. [PMID: 40435369 PMCID: PMC12119018 DOI: 10.1371/journal.pone.0324644] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2024] [Accepted: 04/27/2025] [Indexed: 06/01/2025] Open
Abstract
BACKGROUND Language policy serves as an essential tool for governments to guide and regulate language development. However, China's current language policy faces challenges such as outdated analytical methods, inefficiencies caused by policy misalignment, and the absence of predictive frameworks. This study provides a comprehensive overview of China's language policy by identifying key topics and predicting future trends. METHODS We employ the Latent Dirichlet Allocation (LDA) topic model and the Autoregressive Integrated Moving Average (ARIMA) model to systematically analyze and predict the evolution of China's language policy. Using a large-scale corpus of 1,420 policy texts published on official websites from 2001 to 2023, we achieve both topic extraction and evolution prediction. RESULTS This study reveals that: (1) Language life, language education, and language resources have high popularity indexes, and language education and language planning exhibit high expected values. (2) The theme intensity of most topics has shown a significant upward trend since 2014, with significant fluctuations during T1-T2. (3) From 2001 to 2023, the actual and fitted values show an overall positive trend. In 2024-2028, the predicted value of language resources stabilizes after a brief decline in 2024, while other topics show upward trends. CONCLUSIONS This study extracts 1,420 policy texts from official websites and outlines the following findings: (1) Language policies focus on maintaining a harmonious linguistic environment, addressing educational inequality, and protecting language resources. (2) Since 2014, most topics have exhibited a fluctuating yet sustained growth trend, particularly in language education and research. (3) Except for language resources, the predicted values of the remaining six topics will show a growing trend from 2024 to 2028. Based on these findings, we propose policy recommendations such as strengthening language research, developing a multilingual education system, and optimizing language resource management.
Affiliation(s)
- Tianxin Li
  - Department of Literature, Shaanxi Normal University, Xi’an, Shaanxi, China
- Xigang Ke
  - Department of Literature, Shaanxi Normal University, Xi’an, Shaanxi, China
- Hui Shi
  - Department of Literature, Nanjing Normal University, Nanjing, Jiangsu, China
2
Venkata Krishna Reddy M, Raghavendar Raju L, Sai Prasad K, Kumari DDA, Veerabhadram V, Yamsani N. Enhanced effective convolutional attention network with squeeze-and-excitation inception module for multi-label clinical document classification. Sci Rep 2025; 15:16988. [PMID: 40379823 PMCID: PMC12084642 DOI: 10.1038/s41598-025-98719-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/17/2024] [Accepted: 04/14/2025] [Indexed: 05/19/2025] Open
Abstract
Clinical Document Classification (CDC) is crucial in healthcare for organizing and categorizing large volumes of medical information, leading to improved patient care, streamlined research, and enhanced administrative efficiency. With the advancement of artificial intelligence, automatic CDC is now achievable through deep learning techniques. While existing research has shown promising results, more effective and accurate classification of long clinical documents is still desired. To address this, we propose a new model called the Enhanced Effective Convolutional Attention Network (EECAN), which incorporates a Squeeze-and-Excitation (SE) Inception module to improve feature representation by adaptively recalibrating channel-wise feature responses. This architecture introduces an Encoder and Attention-Based Clinical Document Classification (EAB-CDC) strategy, which uses sum-pooling and multi-layer attention mechanisms to extract salient features from clinical document representations. EECAN is the overall model architecture, and EAB-CDC is a core strategy implemented within it: EAB-CDC is not a standalone model but a functional component that performs discriminative feature extraction through sum-pooling and multi-layer attention. With this integrated design, EECAN can transform the general and label-specific contexts of multi-label clinical texts without losing information. Our empirical study, conducted on benchmark datasets such as MIMIC-III and MIMIC-III-50, demonstrates that the proposed EECAN model outperforms several existing deep learning approaches, achieving AUC scores of 99.70% and 99.80% using sum-pooling and multi-layer attention, respectively. These results highlight the model's substantial potential for integration into clinical systems, such as Electronic Health Record (EHR) platforms, for the automated classification of clinical texts and improved healthcare decision-making support.
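The channel-wise recalibration that the abstract attributes to the Squeeze-and-Excitation module can be illustrated with a minimal sketch. This is the generic SE pattern (global average pooling, a bottleneck layer with ReLU, a sigmoid gate, then per-channel scaling), not the EECAN implementation; the function name `squeeze_excite`, the weight matrices `w1` and `w2`, and the omission of biases are assumptions made for illustration.

```python
import math

def squeeze_excite(features, w1, w2):
    """Generic squeeze-and-excitation over a C x L feature map (lists of floats).

    w1 has shape (C/r) x C and w2 has shape C x (C/r); biases are omitted.
    All names and shapes here are illustrative assumptions.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [sum(channel) / len(channel) for channel in features]
    # Excitation, step 1: bottleneck projection followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Excitation, step 2: project back to C channels and apply a sigmoid gate.
    gates = [1 / (1 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: multiply every value in a channel by that channel's gate.
    return [[gate * x for x in channel] for gate, channel in zip(gates, features)]

# Two channels, bottleneck of size one, all-zero weights (so every gate is 0.5).
example_features = [[1.0, 3.0], [2.0, 2.0]]
example_output = squeeze_excite(example_features, w1=[[0.0, 0.0]], w2=[[0.0], [0.0]])
```

With all-zero weights every gate equals 0.5, so each channel is simply halved; trained weights would instead emphasize some channels and suppress others.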
Affiliation(s)
- M Venkata Krishna Reddy
  - Department of Computer Science and Engineering, Chaitanya Bharathi Institute of Technology (Autonomous), Gandipet, Hyderabad, India
- L Raghavendar Raju
  - Department of Computer Science and Engineering, Matrusri Engineering College, Hyderabad, India
- Kashi Sai Prasad
  - Department of CSE-AI&ML, MLR Institute of Technology, Hyderabad, India
- Dr D Anitha Kumari
  - Professor, Department of CSM, TKR College of Engineering and Technology, Hyderabad, India
- Nagendar Yamsani
  - School of Computer Science and Artificial Intelligence, SR University, Warangal, India
3
Xiao N, Huang X, Wu Y, Li B, Zang W, Shinwari K, Tuzankina IA, Chereshnev VA, Liu G. Opportunities and challenges with artificial intelligence in allergy and immunology: a bibliometric study. Front Med (Lausanne) 2025; 12:1523902. [PMID: 40270494 PMCID: PMC12014590 DOI: 10.3389/fmed.2025.1523902] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2024] [Accepted: 03/27/2025] [Indexed: 04/25/2025] Open
Abstract
Introduction The fields of allergy and immunology are increasingly recognizing the transformative potential of artificial intelligence (AI). Its adoption is reshaping research directions, clinical practices, and healthcare systems. However, a systematic overview identifying current statuses, emerging trends, and future research hotspots is lacking. Methods This study applied bibliometric analysis methods to systematically evaluate the global research landscape of AI applications in allergy and immunology. Data from 3,883 articles published by 21,552 authors across 1,247 journals were collected and analyzed to identify leading contributors, prevalent research themes, and collaboration patterns. Results Analysis revealed that the USA and China are currently leading in research output and scientific impact in this domain. AI methodologies, especially machine learning (ML) and deep learning (DL), are predominantly applied in drug discovery and development, disease classification and prediction, immune response modeling, clinical decision support, diagnostics, healthcare system digitalization, and medical education. Emerging trends indicate significant movement toward personalized medical systems integration. Discussion The findings demonstrate the dynamic evolution of AI in allergy and immunology, highlighting the broadening scope from basic diagnostics to comprehensive personalized healthcare systems. Despite advancements, critical challenges persist, including technological limitations, ethical concerns, and regulatory frameworks that could potentially hinder further implementation and integration. Conclusion AI holds considerable promise for advancing allergy and immunology globally by enhancing healthcare precision, efficiency, and accessibility. Addressing existing technological, ethical, and regulatory challenges will be crucial to fully realizing its potential, ultimately improving global health outcomes and patient well-being.
Affiliation(s)
- Ningkun Xiao
  - Department of Immunochemistry, Institution of Chemical Engineering, Ural Federal University, Yekaterinburg, Russia
  - Laboratory for Brain and Neurocognitive Development, Department of Psychology, Institution of Humanities, Ural Federal University, Yekaterinburg, Russia
- Xinlin Huang
  - Laboratory for Brain and Neurocognitive Development, Department of Psychology, Institution of Humanities, Ural Federal University, Yekaterinburg, Russia
- Yujun Wu
  - Preventive Medicine and Software Engineering, West China School of Public Health, Sichuan University, Chengdu, China
- Baoheng Li
  - Engineering School of Information Technologies, Telecommunications and Control Systems, Ural Federal University, Yekaterinburg, Russia
- Wanli Zang
  - Postgraduate School, University of Harbin Sport, Harbin, China
- Khyber Shinwari
  - Laboratório de Biologia Molecular de Microrganismos, Universidade São Francisco, Bragança Paulista, Brazil
  - Department of Biology, Nangrahar University, Nangrahar, Afghanistan
- Irina A. Tuzankina
  - Institute of Immunology and Physiology of the Ural Branch of the Russian Academy of Sciences, Yekaterinburg, Russia
- Valery A. Chereshnev
  - Department of Immunochemistry, Institution of Chemical Engineering, Ural Federal University, Yekaterinburg, Russia
  - Institute of Immunology and Physiology of the Ural Branch of the Russian Academy of Sciences, Yekaterinburg, Russia
- Guojun Liu
  - School of Life Science and Technology, Inner Mongolia University of Science and Technology, Baotou, China
4
Azarpey A, Thomas J, Ring D, Franko O. Natural Language Processing of Sentiments Identified in Patient Comments Associated with Less Than Top-Rated Care. J Patient Exp 2025; 12:23743735251323677. [PMID: 40125346 PMCID: PMC11930458 DOI: 10.1177/23743735251323677] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/25/2025] Open
Abstract
Background: Natural language processing (NLP) analysis of patient comments about their care can inform improvement initiatives. Objective: We used NLP to quantify sentiments and identify topics in patient comments associated with submaximal ratings of experience. Methods: Using a set of 1117 patient comments associated with ratings 1-4 out of 5 from a commercial source, we analyzed associated sentiments measured by Linguistic Inquiry and Word Count software and associated themes using topic modeling. Results: In the sentiment analysis, positive sentiments were associated with better numerical ratings while word count, numbers, ethnicity, and negative tones were associated with lower ratings. Topics of "listening, concern, and collaboration" were associated with 1-star ratings and "logistics" and "pain" with 4-star ratings. Conclusion: The finding that NLP analysis of comments from submaximal patient ratings of experience is consistent with evidence that the worst ratings are associated with relationship issues and more moderate ratings are associated with process issues affirms the ability of NLP to analyze large amounts of patient comments to identify opportunities to improve patient experience of care.
Affiliation(s)
- Ali Azarpey
  - Department of Surgery and Perioperative Care, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Jacob Thomas
  - Department of Surgery and Perioperative Care, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- David Ring
  - Department of Surgery and Perioperative Care, Dell Medical School, The University of Texas at Austin, Austin, TX, USA
- Orrin Franko
  - East Bay Hand Medical Center, San Leandro, CA, USA
5
Yanagisawa Y, Watabe S, Yokoyama S, Sayama K, Kizaki H, Tsuchiya M, Imai S, Someya M, Taniguchi R, Yada S, Aramaki E, Hori S. Identifying Adverse Events in Outpatients With Prostate Cancer Using Pharmaceutical Care Records in Community Pharmacies: Application of Named Entity Recognition. JMIR Cancer 2025; 11:e69663. [PMID: 40068144 PMCID: PMC11937706 DOI: 10.2196/69663] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/05/2024] [Revised: 01/27/2025] [Accepted: 02/20/2025] [Indexed: 03/29/2025] Open
Abstract
BACKGROUND Androgen receptor axis-targeting reagents (ARATs) have become key drugs for patients with castration-resistant prostate cancer (CRPC). ARATs are taken long term in outpatient settings, and effective adverse event (AE) monitoring can help prolong treatment duration for patients with CRPC. Despite the importance of monitoring, few studies have identified which AEs can be captured and assessed in community pharmacies, where pharmacists in Japan dispense medications, provide counseling, and monitor potential AEs for outpatients prescribed ARATs. Therefore, we anticipated that a named entity recognition (NER) system might be used to extract AEs recorded in pharmaceutical care records generated by community pharmacists. OBJECTIVE This study aimed to evaluate whether an NER system can effectively and systematically identify AEs in outpatients undergoing ARAT therapy by reviewing pharmaceutical care records generated by community pharmacists, focusing on assessment notes, which often contain detailed records of AEs. Additionally, the study sought to determine whether outpatient pharmacotherapy monitoring can be enhanced by using NER to systematically collect AEs from pharmaceutical care records. METHODS We used an NER system based on the widely used Japanese medical term extraction system MedNER-CR-JA, which uses Bidirectional Encoder Representations from Transformers (BERT). To evaluate its performance for pharmaceutical care records by community pharmacists, the NER system was first applied to 1008 assessment notes in records related to anticancer drug prescriptions. Three pharmaceutically proficient researchers compared the results with the annotated notes assigned symptom tags according to annotation guidelines and evaluated the performance of the NER system on the assessment notes in the pharmaceutical care records. The system was then applied to 2193 assessment notes for patients prescribed ARATs. 
RESULTS The F1-score for exact matches of all symptom tags between the NER system and annotators was 0.72, confirming the NER system has sufficient performance for application to pharmaceutical care records. The NER system automatically assigned 1900 symptom tags for the 2193 assessment notes from patients prescribed ARATs; 623 tags (32.8%) were positive symptom tags (symptoms present), while 1067 tags (56.2%) were negative symptom tags (symptoms absent). Positive symptom tags included ARAT-related AEs such as "pain," "skin disorders," "fatigue," and "gastrointestinal symptoms." Many other symptoms were classified as serious AEs. Furthermore, differences in symptom tag profiles reflecting pharmacists' AE monitoring were observed between androgen synthesis inhibition and androgen receptor signaling inhibition. CONCLUSIONS The NER system successfully extracted AEs from pharmaceutical care records of patients prescribed ARATs, demonstrating its potential to systematically track the presence and absence of AEs in outpatients. Based on the analysis of a large volume of pharmaceutical medical records using the NER system, community pharmacists not only detect potential AEs but also actively monitor the absence of severe AEs, offering valuable insights for the continuous improvement of patient safety management.
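The tag percentages reported above can be reproduced from the stated counts, and the F1-score is the harmonic mean of precision and recall. The sketch below checks that arithmetic; the helper names `share` and `f1_score` and the true-positive, false-positive, and false-negative counts in the F1 example are hypothetical, since the abstract reports only the final score of 0.72.

```python
# Counts reported in the abstract.
TOTAL_TAGS = 1900
POSITIVE_TAGS = 623    # symptoms present
NEGATIVE_TAGS = 1067   # symptoms absent

def share(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`, rounded to one decimal."""
    return round(100 * count / total, 1)

def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall computed from match counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

positive_share = share(POSITIVE_TAGS, TOTAL_TAGS)   # 32.8
negative_share = share(NEGATIVE_TAGS, TOTAL_TAGS)   # 56.2
# Tags that are neither positive nor negative; the abstract does not say what these are.
remaining_tags = TOTAL_TAGS - POSITIVE_TAGS - NEGATIVE_TAGS

# Hypothetical counts chosen only to illustrate how an F1 of 0.72 can arise.
illustrative_f1 = f1_score(tp=72, fp=28, fn=28)
```

The two reported shares sum to 89.0%, which leaves 210 tags unaccounted for in the abstract.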
Affiliation(s)
- Yuki Yanagisawa
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Satoshi Watabe
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Sakura Yokoyama
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Kyoko Sayama
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Hayato Kizaki
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Masami Tsuchiya
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Shungo Imai
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Shuntaro Yada
  - Nara Institute of Science and Technology, Nara, Japan
  - Faculty of Library, Information and Media Science, University of Tsukuba, Tsukuba, Japan
- Eiji Aramaki
  - Nara Institute of Science and Technology, Nara, Japan
- Satoko Hori
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
6
Shi YV, Komiak S. Unveiling patient-centric interactions in virtual consultation: A comprehensive text mining approach. Health Informatics J 2025; 31:14604582251327093. [PMID: 40098374 DOI: 10.1177/14604582251327093] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/19/2025]
Abstract
This study aims to explore patient perceptions and interactions with virtual consultation (VC) systems to understand the factors influencing their adoption and satisfaction. We analyzed 21,839 patient reviews from four major virtual consultation platforms (MDLive, Doctor on Demand, Maple, and HealthTap), collected from publicly accessible sources. Sentiment analysis, word frequency analysis, topic modeling using Latent Dirichlet Allocation (LDA), and association rule mining were used to extract insights. The findings reveal a generally positive sentiment among patients, with recurring themes focusing on app functionality and the important role of doctors in the virtual consultation experience. Virtual consultation systems were found to play a dual role: as a communicator during initial interactions and as a medium facilitating patient-doctor communication. The analysis also identified key doctor-related factors, categorized by the Theory of Planned Behavior, including attitudes (e.g., empathy), subjective norms (e.g., cultural competence), and perceived behavioral control (e.g., time management). The study provides valuable insights for enhancing healthcare system design and improving virtual consultation quality. However, limitations include potential bias in patient reviews, limited platform focus, and the lack of demographic data. Future research should explore advanced machine learning techniques and investigate relationships between different factors to improve virtual healthcare.
Affiliation(s)
- Yuxi Vania Shi
  - Sobey School of Business, Saint Mary's University, Halifax, NS, Canada
- Sherrie Komiak
  - Faculty of Business Administration, Memorial University of Newfoundland, St. John's, NL, Canada
7
Stanhope V, Yoo N, Matthews E, Baslock D, Hu Y. The Impact of Collaborative Documentation on Person-Centered Care: Textual Analysis of Clinical Notes. JMIR Med Inform 2024; 12:e52678. [PMID: 39302636 PMCID: PMC11429664 DOI: 10.2196/52678] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/12/2023] [Revised: 06/07/2024] [Accepted: 06/26/2024] [Indexed: 09/22/2024] Open
Abstract
Background Collaborative documentation (CD) is a behavioral health practice involving shared writing of clinic visit notes by providers and consumers. Despite widespread dissemination of CD, research on its effectiveness or impact on person-centered care (PCC) has been limited. Principles of PCC planning, a recovery-based approach to service planning that operationalizes PCC, can inform the measurement of person-centeredness within clinical documentation. Objective This study aims to use the clinical informatics approach of natural language processing (NLP) to examine the impact of CD on person-centeredness in clinic visit notes. Using a dictionary-based approach, this study conducts a textual analysis of clinic notes from a community mental health center before and after staff were trained in CD. Methods This study used visit notes (n=1981) from 10 providers in a community mental health center 6 months before and after training in CD. LIWC-22 was used to assess all notes using the Linguistic Inquiry and Word Count (LIWC) dictionary, which categorizes over 5000 linguistic and psychological words. Twelve LIWC categories were selected and mapped onto PCC planning principles through the consensus of 3 domain experts. The LIWC-22 contextualizer was used to extract sentence fragments from notes corresponding to LIWC categories. Then, fixed-effects modeling was used to identify differences in notes before and after CD training while accounting for nesting within the provider. Results Sentence fragments identified by the contextualizing process illustrated how visit notes demonstrated PCC. The fixed effects analysis found a significant positive shift toward person-centeredness; this was observed in 6 of the selected LIWC categories post CD. Specifically, there was a notable increase in words associated with achievement (β=.774, P<.001), power (β=.831, P<.001), money (β=.204, P<.001), and physical health (β=.427, P=.03), while leisure words decreased (β=-.166, P=.002). 
Conclusions By using a dictionary-based approach, the study identified how CD might influence the integration of PCC principles within clinical notes. Although the results were mixed, the findings highlight the potential effectiveness of CD in enhancing person-centeredness in clinic notes. By leveraging NLP techniques, this research illuminated the value of narrative clinical notes in assessing the quality of care in behavioral health contexts. These findings underscore the promise of NLP for quality assurance in health care settings and emphasize the need for refining algorithms to more accurately measure PCC.
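The dictionary-based approach described above amounts to counting how many words in a note belong to each category and normalizing by note length. A minimal sketch follows; the mini-dictionary `CATEGORY_WORDS`, the function `category_rates`, and the example note are all hypothetical, because the actual LIWC dictionary is proprietary and far larger.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary mapping a category to its words.
# These entries are illustrative only and are not taken from LIWC.
CATEGORY_WORDS = {
    "achievement": {"goal", "goals", "achieve", "progress"},
    "leisure": {"movie", "game", "hobby"},
}

def category_rates(note: str, dictionary: dict[str, set[str]]) -> dict[str, float]:
    """Return, per category, the percentage of words in `note` found in that category."""
    tokens = re.findall(r"[a-z']+", note.lower())
    counts = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    total = len(tokens)
    return {c: (100 * counts[c] / total if total else 0.0) for c in dictionary}

# Hypothetical note text, written for this example.
example_note = "Client set a goal to return to work and reported progress this week."
rates = category_rates(example_note, CATEGORY_WORDS)
```

Here two of the thirteen words fall in the illustrative "achievement" category and none in "leisure"; per-note rates like these could then serve as outcomes in a before-and-after comparison such as the one the study describes.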
Affiliation(s)
- Victoria Stanhope
  - Silver School of Social Work, New York University, 1 Washington Square N, New York, NY, 10003, United States
- Nari Yoo
  - Silver School of Social Work, New York University, 1 Washington Square N, New York, NY, 10003, United States
- Elizabeth Matthews
  - Graduate School of Service, Fordham University, New York, NY, United States
- Daniel Baslock
  - School of Social Work, Virginia Commonwealth University, Richmond, VA, United States
- Yuanyuan Hu
  - School of Social Work, University of Minnesota, St Paul, MN, United States
8
Watabe S, Watanabe T, Yada S, Aramaki E, Yajima H, Kizaki H, Hori S. Exploring a method for extracting concerns of multiple breast cancer patients in the domain of patient narratives using BERT and its optimization by domain adaptation using masked language modeling. PLoS One 2024; 19:e0305496. [PMID: 39241041 PMCID: PMC11379386 DOI: 10.1371/journal.pone.0305496] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Accepted: 05/30/2024] [Indexed: 09/08/2024] Open
Abstract
Narratives posted on the internet by patients contain a vast amount of information about various concerns. This study aimed to extract multiple concerns from interviews with breast cancer patients using the natural language processing (NLP) model bidirectional encoder representations from transformers (BERT). A total of 508 interview transcriptions of breast cancer patients written in Japanese were labeled with five types of concern labels: "treatment," "physical," "psychological," "work/financial," and "family/friends." The labeled texts were used to create a multi-label classifier by fine-tuning a pre-trained BERT model. Prior to fine-tuning, we also created several classifiers with domain adaptation using (1) breast cancer patients' blog articles and (2) breast cancer patients' interview transcriptions. The performance of the classifiers was evaluated in terms of precision through 5-fold cross-validation. The multi-label classifiers with only fine-tuning had precision values of over 0.80 for "physical" and "work/financial" out of the five concerns. On the other hand, precision for "treatment" was low at approximately 0.25. However, for the classifiers using domain adaptation, the precision of this label ranged from 0.40 to 0.51, with some cases improving by more than 0.2. This study showed that combining domain adaptation with a multi-label classifier on target data made it possible to efficiently extract multiple concerns from interviews.
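Per-label precision in a multi-label setting, the metric the abstract reports, is the fraction of texts predicted to carry a label that truly carry it. A minimal sketch is shown below; the label names come from the abstract, while the function `per_label_precision` and the gold and predicted label sets are hypothetical examples.

```python
LABELS = ["treatment", "physical", "psychological", "work/financial", "family/friends"]

def per_label_precision(y_true, y_pred, labels):
    """Precision = TP / (TP + FP) per label; None when a label is never predicted."""
    result = {}
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if label in p and label in t)
        fp = sum(1 for t, p in zip(y_true, y_pred) if label in p and label not in t)
        result[label] = tp / (tp + fp) if tp + fp else None
    return result

# Hypothetical gold and predicted label sets for four texts.
gold = [{"treatment", "physical"}, {"psychological"}, {"physical"}, {"work/financial"}]
pred = [{"treatment"}, {"treatment", "psychological"}, {"physical"}, {"work/financial"}]
precision = per_label_precision(gold, pred, LABELS)
```

In this toy data "treatment" is predicted twice but correct only once, giving a precision of 0.5, which mirrors how a single over-predicted label can drag its precision down while the others stay high.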
Affiliation(s)
- Satoshi Watabe
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Tomomi Watanabe
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Shuntaro Yada
  - Nara Institute of Science and Technology, Nara, Japan
- Eiji Aramaki
  - Nara Institute of Science and Technology, Nara, Japan
- Hayato Kizaki
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Satoko Hori
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
9
Levis M, Levy J, Dimambro M, Dufort V, Ludmer DJ, Goldberg M, Shiner B. Using natural language processing to evaluate temporal patterns in suicide risk variation among high-risk Veterans. Psychiatry Res 2024; 339:116097. [PMID: 39083961 PMCID: PMC11488589 DOI: 10.1016/j.psychres.2024.116097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/11/2023] [Revised: 06/24/2024] [Accepted: 07/21/2024] [Indexed: 08/02/2024]
Abstract
Measuring suicide risk fluctuation remains difficult, especially for high-suicide risk patients. Our study addressed this issue by leveraging Dynamic Topic Modeling, a natural language processing method that evaluates topic changes over time, to analyze high-suicide risk Veterans Affairs patients' unstructured electronic health records. Our sample included all high-risk patients who died by suicide (cases) or did not (controls) in 2017 and 2018. Cases and controls shared the same risk, location, and treatment intervals and received nine months of mental health care during the year before the relevant end date. Each case was matched with five controls. We analyzed case records from diagnosis until death and control records from diagnosis until the matched case's death date. Our final sample included 218 cases and 943 controls. We analyzed the corpus using a Python-based Dynamic Topic Modeling algorithm. We identified five distinct topics: "Medication," "Intervention," "Treatment Goals," "Suicide," and "Treatment Focus." We observed divergent change patterns over time, with pathology-focused care increasing for cases and supportive care increasing for controls. The case topics tended to fluctuate more than the control topics, suggesting the importance of monitoring lability. Our study provides a method for monitoring risk fluctuation and strengthens the groundwork for time-sensitive risk measurement.
Affiliation(s)
- Maxwell Levis
  - White River Junction VA Medical Center, White River Junction, VT, USA; Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Joshua Levy
  - Geisel School of Medicine at Dartmouth, Hanover, NH, USA
- Monica Dimambro
  - White River Junction VA Medical Center, White River Junction, VT, USA
- Vincent Dufort
  - White River Junction VA Medical Center, White River Junction, VT, USA
- Dana J Ludmer
  - National Institute for the Psychotherapies, New York, NY, USA
- Brian Shiner
  - White River Junction VA Medical Center, White River Junction, VT, USA; Geisel School of Medicine at Dartmouth, Hanover, NH, USA; National Center for PTSD Executive Division, White River Junction, VT, USA
10
Rudroff T, Rainio O, Klén R. Leveraging Artificial Intelligence to Optimize Transcranial Direct Current Stimulation for Long COVID Management: A Forward-Looking Perspective. Brain Sci 2024; 14:831. [PMID: 39199522 PMCID: PMC11353063 DOI: 10.3390/brainsci14080831] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 07/23/2024] [Revised: 08/12/2024] [Accepted: 08/18/2024] [Indexed: 09/01/2024] Open
Abstract
Long COVID (Coronavirus disease), affecting millions globally, presents unprecedented challenges to healthcare systems due to its complex, multifaceted nature and the lack of effective treatments. This perspective review explores the potential of artificial intelligence (AI)-guided transcranial direct current stimulation (tDCS) as an innovative approach to address the urgent need for effective Long COVID management. The authors examine how AI could optimize tDCS protocols, enhance clinical trial design, and facilitate personalized treatment for the heterogeneous manifestations of Long COVID. Key areas discussed include AI-driven personalization of tDCS parameters based on individual patient characteristics and real-time symptom fluctuations, the use of machine learning for patient stratification, and the development of more sensitive outcome measures in clinical trials. This perspective addresses ethical considerations surrounding data privacy, algorithmic bias, and equitable access to AI-enhanced treatments. It also explores challenges and opportunities for implementing AI-guided tDCS across diverse healthcare settings globally. Future research directions are outlined, including the need for large-scale validation studies and investigations of long-term efficacy and safety. The authors argue that while AI-guided tDCS shows promise for addressing the complex nature of Long COVID, significant technical, ethical, and practical challenges remain. They emphasize the importance of interdisciplinary collaboration, patient-centered approaches, and a commitment to global health equity in realizing the potential of this technology. This perspective article provides a roadmap for researchers, clinicians, and policymakers involved in developing and implementing AI-guided neuromodulation therapies for Long COVID and potentially other neurological and psychiatric conditions.
Affiliation(s)
- Thorsten Rudroff
  - Turku PET Centre, University of Turku, Turku University Hospital, 20520 Turku, Finland
11
Park JI, Park JW, Zhang K, Kim D. Advancing equity in breast cancer care: natural language processing for analysing treatment outcomes in under-represented populations. BMJ Health Care Inform 2024; 31:e100966. [PMID: 38955389 PMCID: PMC11218025 DOI: 10.1136/bmjhci-2023-100966] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/16/2023] [Accepted: 06/21/2024] [Indexed: 07/04/2024] Open
Abstract
OBJECTIVE The study aimed to develop natural language processing (NLP) algorithms to automate extracting patient-centred breast cancer treatment outcomes from clinical notes in electronic health records (EHRs), particularly for women from under-represented populations. METHODS The study used clinical notes from 2010 to 2021 from a tertiary hospital in the USA. The notes were processed through various NLP techniques, including vectorisation methods (term frequency-inverse document frequency (TF-IDF), Word2Vec, Doc2Vec) and classification models (support vector classification, K-nearest neighbours (KNN), random forest (RF)). Feature selection and optimisation through random search and fivefold cross-validation were also conducted. RESULTS The study annotated 100 out of 1000 clinical notes, using 970 notes to build the text corpus. TF-IDF and Doc2Vec combined with RF showed the highest performance, while Word2Vec was less effective. The RF classifier demonstrated the best performance, although with lower recall rates, suggesting more false negatives. KNN showed lower recall due to its sensitivity to data noise. DISCUSSION The study highlights the significance of using NLP in analysing clinical notes to understand breast cancer treatment outcomes in under-represented populations. The TF-IDF and Doc2Vec models were more effective in capturing relevant information than Word2Vec. The study observed lower recall rates in RF models, attributed to the dataset's imbalanced nature and the complexity of clinical notes. CONCLUSION The study developed a high-performing NLP pipeline to capture treatment outcomes for breast cancer in under-represented populations, demonstrating the importance of document-level vectorisation and ensemble methods in clinical notes analysis. The findings provide insights for more equitable healthcare strategies and show the potential for broader NLP applications in clinical settings.
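As a rough illustration of the TF-IDF vectorisation step named in the abstract above, the sketch below computes term weights in plain Python. It assumes one common textbook weighting (term count divided by document length, multiplied by the natural log of N over document frequency); the study's actual implementation and parameters are not given in the abstract, and the function name `tf_idf` and the toy notes are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting (a textbook variant, not necessarily the one
    used in the study): tf = count / document length,
    idf = ln(N / document frequency).
    """
    tokenised = [doc.lower().split() for doc in docs]
    n_docs = len(tokenised)

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for tokens in tokenised:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy notes, invented for illustration only.
notes = [
    "patient reports fatigue after treatment",
    "patient reports no pain",
    "patient tolerated treatment well",
]
vectors = tf_idf(notes)
```

Under this weighting a term that occurs in every note (here "patient") receives weight zero, while a term unique to one note (here "fatigue") receives the largest weight in that note, which is the property that makes TF-IDF useful as classifier input.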
Affiliation(s)
- Jung In Park
  - University of California Irvine, Irvine, California, USA
- Jong Won Park
  - Yonsei Cancer Center, Yonsei University College of Medicine, Seoul, South Korea
- Kexin Zhang
  - Donald Bren School of Information & Computer Sciences, University of California Irvine, Irvine, California, USA
- Doyop Kim
  - Independent Researcher, Irvine, California, USA
12
Ohno Y, Kato R, Ishikawa H, Nishiyama T, Isawa M, Mochizuki M, Aramaki E, Aomori T. Using the Natural Language Processing System Medical Named Entity Recognition-Japanese to Analyze Pharmaceutical Care Records: Natural Language Processing Analysis. JMIR Form Res 2024; 8:e55798. [PMID: 38833694 PMCID: PMC11185902 DOI: 10.2196/55798] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/26/2023] [Revised: 04/05/2024] [Accepted: 04/26/2024] [Indexed: 06/06/2024] Open
Abstract
BACKGROUND Large language models have propelled recent advances in artificial intelligence technology, facilitating the extraction of medical information from unstructured data such as medical records. Although named entity recognition (NER) is used to extract data from physicians' records, it has yet to be widely applied to pharmaceutical care records. OBJECTIVE In this study, we aimed to investigate the feasibility of automatic extraction of the information regarding patients' diseases and symptoms from pharmaceutical care records. The verification was performed using Medical Named Entity Recognition-Japanese (MedNER-J), a Japanese disease-extraction system designed for physicians' records. METHODS MedNER-J was applied to subjective, objective, assessment, and plan data from the care records of 49 patients who received cefazolin sodium injection at Keio University Hospital between April 2018 and March 2019. The performance of MedNER-J was evaluated in terms of precision, recall, and F1-score. RESULTS The F1-scores of NER for subjective, objective, assessment, and plan data were 0.46, 0.70, 0.76, and 0.35, respectively. In NER and positive-negative classification, the F1-scores were 0.28, 0.39, 0.64, and 0.077, respectively. The F1-scores of NER for objective (0.70) and assessment data (0.76) were higher than those for subjective and plan data, which supported the superiority of NER performance for objective and assessment data. This might be because objective and assessment data contained many technical terms, similar to the training data for MedNER-J. Meanwhile, the F1-score of NER and positive-negative classification was high for assessment data alone (F1-score=0.64), which was attributed to the similarity of its description format and contents to those of the training data. CONCLUSIONS MedNER-J successfully read pharmaceutical care records and showed the best performance for assessment data. However, challenges remain in analyzing records other than assessment data. Therefore, it will be necessary to reinforce the training data for subjective data in order to apply the system to pharmaceutical care records.
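The evaluation metrics named in the abstract above (precision, recall, and F1-score) can be sketched from raw counts as follows. This is a minimal illustration, not the study's evaluation code: the function name `precision_recall_f1` and the example counts are hypothetical, and the study's matching criteria for what counts as a true positive are not stated in the abstract.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from true-positive,
    false-positive, and false-negative counts."""
    # Guard against division by zero when nothing was predicted
    # or nothing was present in the reference annotations.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, not taken from the study.
p, r, f1 = precision_recall_f1(tp=7, fp=3, fn=1)
```

Because F1 is a harmonic mean, it is pulled towards the lower of the two values, so a system with weak recall on one record type scores low on F1 even if its precision is high.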
Affiliation(s)
- Yukiko Ohno
  - Faculty of Pharmacy, Keio University, Tokyo, Japan
- Riri Kato
  - Faculty of Pharmacy, Keio University, Tokyo, Japan
- Minae Isawa
  - Faculty of Pharmacy, Keio University, Tokyo, Japan
- Eiji Aramaki
  - Nara Institute of Science and Technology, Nara, Japan
- Tohru Aomori
  - Faculty of Pharmacy, Takasaki University of Health and Welfare, Gunma, Japan
13
Ragab M, Almuhammadi A, Mansour RF, Kadry S. RETRACTED: Natural language processing with deep learning enabled hybrid content retrieval model for digital library management. Expert Systems 2024; 41. [DOI: 10.1111/exsy.13135] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/04/2022] [Accepted: 08/23/2022] [Indexed: 10/28/2024]
Abstract
In recent times, natural language processing (NLP) techniques have received significant attention in the content retrieval (CR) domain. The emergence of digital libraries in recent years enables people across the globe to access and store books, documents, and literature of many kinds. The development of NLP models has considerably improved performance in digital library management. In this scenario, artificial intelligence‐based expert systems are required to handle the massive quantities of data in digital libraries and achieve effective CR performance. Against this background, the current study designs an NLP with deep learning enabled hybrid content retrieval (NLPDL‐HCR) model for digital library management. The aim of the presented NLPDL‐HCR model is to effectively retrieve images as well as textual data from digital libraries based on a user's query. The proposed NLPDL‐HCR model encompasses two major stages, namely text retrieval and image retrieval (IR). During text retrieval, the model uses a term frequency-inverse document frequency vectorizer with an optimal gated recurrent unit (GRU) model. The hyperparameters of the GRU model are optimally adjusted with the help of the RMSProp approach. The IR process involves three sub‐processes, namely densely connected networks‐based feature extraction, butterfly optimization algorithm‐based hyperparameter tuning, and Euclidean distance‐based similarity measurement. The experimental results obtained by the proposed NLPDL‐HCR model on benchmark datasets highlighted its superior performance over recent state‐of‐the‐art approaches.
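The Euclidean distance-based similarity measurement mentioned in the abstract above can be illustrated with a short sketch that ranks stored feature vectors by their distance to a query vector. The names `euclidean` and `rank_by_distance` and the toy vectors are hypothetical; the abstract does not describe how the model's feature vectors are produced or stored, so they are simply assumed to be equal-length lists of numbers.

```python
import math


def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_by_distance(query, items):
    """Return item names ordered from most to least similar.

    `items` maps a name to its feature vector; a smaller distance
    to the query is treated as a higher similarity.
    """
    return sorted(items, key=lambda name: euclidean(query, items[name]))


# Hypothetical two-dimensional feature vectors for illustration.
library = {"a": [3.0, 4.0], "b": [1.0, 0.0], "c": [0.0, 2.0]}
ranking = rank_by_distance([0.0, 0.0], library)
```

A full sort is the simplest choice for a small collection; a large library would more plausibly use an approximate nearest-neighbour index, which is outside the scope of this sketch.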
Affiliation(s)
- Mahmoud Ragab
  - Information Technology Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
  - Mathematics Department, Faculty of Science, Al‐Azhar University, Naser City, Egypt
  - Center of Excellence in Smart Environment Research, King Abdulaziz University, Jeddah, Saudi Arabia
- Anas Almuhammadi
  - English Language Institute, King Abdulaziz University, Jeddah, Saudi Arabia
- Romany F. Mansour
  - Department of Mathematics, Faculty of Science, New Valley University, El‐Kharga, Egypt
- Seifedine Kadry
  - Faculty of Applied Computing and Technology, Noroff University College, Kristiansand, Norway
14
Wessel D, Pogrebnyakov N. Using Social Media as a Source of Real-World Data for Pharmaceutical Drug Development and Regulatory Decision Making. Drug Saf 2024; 47:495-511. [PMID: 38446405 PMCID: PMC11018692 DOI: 10.1007/s40264-024-01409-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/07/2024] [Indexed: 03/07/2024]
Abstract
INTRODUCTION While pharmaceutical companies aim to leverage real-world data (RWD) to bridge the gap between clinical drug development and real-world patient outcomes, extant research has mainly focused on the use of social media in a post-approval safety-surveillance setting. Recent regulatory and technological developments indicate that social media may serve as a rich source to expand the evidence base to pre-approval and drug development activities. However, use cases related to drug development have been largely omitted, thereby missing some of the benefits of RWD. In addition, an applied end-to-end understanding of RWD rooted in both industry and regulations is lacking. OBJECTIVE We aimed to investigate how social media can be used as a source of RWD to support regulatory decision making and drug development in the pharmaceutical industry. We aimed to specifically explore the data pipeline and examine how social-media derived RWD can align with regulatory guidance from the US Food and Drug Administration and industry needs. METHODS A machine learning pipeline was developed to extract patient insights related to anticoagulants from X (Twitter) data. These findings were then analysed from an industry perspective, and complemented by interviews with professionals from a pharmaceutical company. RESULTS The analysis reveals several use cases where RWD derived from social media can be beneficial, particularly in generating hypotheses around patient and therapeutic area needs. We also note certain limitations of social media data, particularly around inferring causality. CONCLUSIONS Social media display considerable potential as a source of RWD for guiding efforts in pharmaceutical drug development and pre-approval settings. Although further regulatory guidance on the use of social media for RWD is needed to encourage its use, regulatory and technological developments are suggested to warrant at least exploratory uses for drug development.
Affiliation(s)
- Didrik Wessel
  - Copenhagen Business School, Frederiksberg, Denmark.
  - Nørrebrogade 18A 3TH, 2200, Copenhagen N, Denmark.
15
Nishioka S, Watabe S, Yanagisawa Y, Sayama K, Kizaki H, Imai S, Someya M, Taniguchi R, Yada S, Aramaki E, Hori S. Adverse Event Signal Detection Using Patients' Concerns in Pharmaceutical Care Records: Evaluation of Deep Learning Models. J Med Internet Res 2024; 26:e55794. [PMID: 38625718 PMCID: PMC11061790 DOI: 10.2196/55794] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/25/2023] [Revised: 02/14/2024] [Accepted: 03/09/2024] [Indexed: 04/17/2024] Open
Abstract
BACKGROUND Early detection of adverse events and their management are crucial to improving anticancer treatment outcomes, and listening to patients' subjective opinions (patients' voices) can make a major contribution to improving safety management. Recent progress in deep learning technologies has enabled various new approaches for the evaluation of safety-related events based on patient-generated text data, but few studies have focused on the improvement of real-time safety monitoring for individual patients. In addition, no study has yet been performed to validate deep learning models for screening patients' narratives for clinically important adverse event signals that require medical intervention. In our previous work, novel deep learning models were developed to detect adverse event signals for hand-foot syndrome or adverse events limiting patients' daily lives from the authored narratives of patients with cancer, aiming ultimately to use them as safety monitoring support tools for individual patients. OBJECTIVE This study was designed to evaluate whether our deep learning models can screen clinically important adverse event signals that require intervention by health care professionals. The applicability of our deep learning models to data on patients' concerns at pharmacies was also assessed. METHODS Pharmaceutical care records at community pharmacies were used for the evaluation of our deep learning models. The records followed the SOAP format, consisting of subjective (S), objective (O), assessment (A), and plan (P) columns. Because of the unique combination of patients' concerns in the S column and the professional records of the pharmacists, this was considered a suitable data source for the present purpose. Our deep learning models were applied to the S records of patients with cancer, and the extracted adverse event signals were assessed in relation to medical actions and prescribed drugs.
RESULTS From 30,784 S records of 2479 patients with at least 1 prescription of anticancer drugs, our deep learning models extracted true adverse event signals with more than 80% accuracy for both hand-foot syndrome (n=152, 91%) and adverse events limiting patients' daily lives (n=157, 80.1%). The deep learning models were also able to screen adverse event signals that require medical intervention by health care providers. The extracted adverse event signals could reflect the side effects of anticancer drugs used by the patients based on analysis of prescribed anticancer drugs. "Pain or numbness" (n=57, 36.3%), "fever" (n=46, 29.3%), and "nausea" (n=40, 25.5%) were common symptoms out of the true adverse event signals identified by the model for adverse events limiting patients' daily lives. CONCLUSIONS Our deep learning models were able to screen clinically important adverse event signals that require intervention for symptoms. It was also confirmed that these deep learning models could be applied to patients' subjective information recorded in pharmaceutical care records accumulated during pharmacists' daily work.
Affiliation(s)
- Satoshi Nishioka
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Satoshi Watabe
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Yuki Yanagisawa
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Kyoko Sayama
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Hayato Kizaki
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Shungo Imai
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
- Shuntaro Yada
  - Nara Institute of Science and Technology, Nara, Japan
- Eiji Aramaki
  - Nara Institute of Science and Technology, Nara, Japan
- Satoko Hori
  - Division of Drug Informatics, Keio University Faculty of Pharmacy, Tokyo, Japan
16
Hassan E, Abd El-Hafeez T, Shams MY. Optimizing classification of diseases through language model analysis of symptoms. Sci Rep 2024; 14:1507. [PMID: 38233458 PMCID: PMC10794698 DOI: 10.1038/s41598-024-51615-5] [Citation(s) in RCA: 29] [Impact Index Per Article: 29.0] [Received: 10/07/2023] [Accepted: 01/07/2024] [Indexed: 01/19/2024] Open
Abstract
This paper investigated the use of language models and deep learning techniques for automating disease prediction from symptoms. Specifically, we explored the use of two Medical Concept Normalization-Bidirectional Encoder Representations from Transformers (MCN-BERT) models and a Bidirectional Long Short-Term Memory (BiLSTM) model, each optimized with a different hyperparameter optimization method, to predict diseases from symptom descriptions. We utilized two distinct datasets, called Dataset-1 and Dataset-2. Dataset-1 consists of 1,200 data points, with each point representing a unique combination of disease labels and symptom descriptions. Dataset-2, by contrast, is designed to identify Adverse Drug Reactions (ADRs) from Twitter data, comprising 23,516 rows categorized as ADR (1) or Non-ADR (0) tweets. The results indicate that the MCN-BERT model optimized with AdamP achieved 99.58% accuracy for Dataset-1 and 96.15% accuracy for Dataset-2. The MCN-BERT model optimized with AdamW performed well with 98.33% accuracy for Dataset-1 and 95.15% for Dataset-2, while the BiLSTM model optimized with Hyperopt achieved 97.08% accuracy for Dataset-1 and 94.15% for Dataset-2. Our findings suggest that language models and deep learning techniques have promise for supporting earlier detection and more prompt treatment of diseases, as well as expanding remote diagnostic capabilities. The MCN-BERT and BiLSTM models demonstrated robust performance in accurately predicting diseases from symptoms, indicating the potential for further related research.
Affiliation(s)
- Esraa Hassan
  - Faculty of Artificial Intelligence, Kafrelsheikh University, Kafrelsheikh, 33516, Egypt.
- Tarek Abd El-Hafeez
  - Department of Computer Science, Faculty of Science, Minia University, Minia, 61519, Egypt.
  - Computer Science Unit, Deraya University, Minia University, Minia, 61765, Egypt.
- Mahmoud Y Shams
  - Faculty of Artificial Intelligence, Kafrelsheikh University, Kafrelsheikh, 33516, Egypt.
17
Scharp D, Hobensack M, Davoudi A, Topaz M. Natural Language Processing Applied to Clinical Documentation in Post-acute Care Settings: A Scoping Review. J Am Med Dir Assoc 2024; 25:69-83. [PMID: 37838000 PMCID: PMC10792659 DOI: 10.1016/j.jamda.2023.09.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 09/05/2023] [Accepted: 09/07/2023] [Indexed: 10/16/2023]
Abstract
OBJECTIVES To determine the scope of the application of natural language processing to free-text clinical notes in post-acute care and provide a foundation for future natural language processing-based research in these settings. DESIGN Scoping review; reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews guidelines. SETTING AND PARTICIPANTS Post-acute care (ie, home health care, long-term care, skilled nursing facilities, and inpatient rehabilitation facilities). METHODS PubMed, Cumulative Index of Nursing and Allied Health Literature, and Embase were searched in February 2023. Eligible studies had quantitative designs that used natural language processing applied to clinical documentation in post-acute care settings. The quality of each study was appraised. RESULTS Twenty-one studies were included. Almost all studies were conducted in home health care settings. Most studies extracted data from electronic health records to examine the risk for negative outcomes, including acute care utilization, medication errors, and suicide mortality. About half of the studies did not report age, sex, race, or ethnicity data or use standardized terminologies. Only 8 studies included variables from socio-behavioral domains. Most studies fulfilled all quality appraisal indicators. CONCLUSIONS AND IMPLICATIONS The application of natural language processing is nascent in post-acute care settings. Future research should apply natural language processing using standardized terminologies to leverage free-text clinical notes in post-acute care to promote timely, comprehensive, and equitable care. Natural language processing could be integrated with predictive models to help identify patients who are at risk of negative outcomes. Future research should incorporate socio-behavioral determinants and diverse samples to improve health equity in informatics tools.
Affiliation(s)
- Anahita Davoudi
  - VNS Health, Center for Home Care Policy & Research, New York, NY, USA
- Maxim Topaz
  - Columbia University School of Nursing, New York, NY, USA
18
Sim JA, Huang X, Horan MR, Stewart CM, Robison LL, Hudson MM, Baker JN, Huang IC. Natural language processing with machine learning methods to analyze unstructured patient-reported outcomes derived from electronic health records: A systematic review. Artif Intell Med 2023; 146:102701. [PMID: 38042599 PMCID: PMC10693655 DOI: 10.1016/j.artmed.2023.102701] [Citation(s) in RCA: 17] [Impact Index Per Article: 8.5] [Received: 04/06/2023] [Revised: 09/30/2023] [Accepted: 10/29/2023] [Indexed: 12/04/2023]
Abstract
OBJECTIVE Natural language processing (NLP) combined with machine learning (ML) techniques are increasingly used to process unstructured/free-text patient-reported outcome (PRO) data available in electronic health records (EHRs). This systematic review summarizes the literature reporting NLP/ML systems/toolkits for analyzing PROs in clinical narratives of EHRs and discusses the future directions for the application of this modality in clinical care. METHODS We searched PubMed, Scopus, and Web of Science for studies written in English between 1/1/2000 and 12/31/2020. Seventy-nine studies meeting the eligibility criteria were included. We abstracted and summarized information related to the study purpose, patient population, type/source/amount of unstructured PRO data, linguistic features, and NLP systems/toolkits for processing unstructured PROs in EHRs. RESULTS Most of the studies used NLP/ML techniques to extract PROs from clinical narratives (n = 74) and mapped the extracted PROs into specific PRO domains for phenotyping or clustering purposes (n = 26). Some studies used NLP/ML to process PROs for predicting disease progression or onset of adverse events (n = 22) or developing/validating NLP/ML pipelines for analyzing unstructured PROs (n = 19). Studies used different linguistic features, including lexical, syntactic, semantic, and contextual features, to process unstructured PROs. Among the 25 NLP systems/toolkits we identified, 15 used rule-based NLP, 6 used hybrid NLP, and 4 used non-neural ML algorithms embedded in NLP. CONCLUSIONS This study supports the potential utility of different NLP/ML techniques in processing unstructured PROs available in EHRs for clinical care. Though using annotation rules for NLP/ML to analyze unstructured PROs is dominant, deploying novel neural ML-based methods is warranted.
Affiliation(s)
- Jin-Ah Sim
  - Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States; School of AI Convergence, Hallym University, Chuncheon, Republic of Korea
- Xiaolei Huang
  - Department of Computer Science, University of Memphis, Memphis, TN, United States
- Madeline R Horan
  - Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States
- Christopher M Stewart
  - Institute for Intelligent Systems, University of Memphis, Memphis, TN, United States
- Leslie L Robison
  - Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States
- Melissa M Hudson
  - Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States; Department of Oncology, St. Jude Children's Research Hospital, Memphis, TN, United States
- Justin N Baker
  - Department of Pediatrics, Stanford University, Stanford, CA, United States
- I-Chan Huang
  - Department of Epidemiology and Cancer Control, St. Jude Children's Research Hospital, Memphis, TN, United States.
19
Das RK, Islam M, Hasan MM, Razia S, Hassan M, Khushbu SA. Sentiment analysis in multilingual context: Comparative analysis of machine learning and hybrid deep learning models. Heliyon 2023; 9:e20281. [PMID: 37809397 PMCID: PMC10560063 DOI: 10.1016/j.heliyon.2023.e20281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/01/2023] [Revised: 09/13/2023] [Accepted: 09/18/2023] [Indexed: 10/10/2023] Open
Abstract
This research paper investigates the efficacy of various machine learning models, including deep learning and hybrid models, for text classification in the English and Bangla languages. The study focuses on sentiment analysis of comments from a popular Bengali e-commerce site, "DARAZ," which comprises both Bangla and translated English reviews. The primary objective of this study is to conduct a comparative analysis of various models, evaluating their efficacy in the domain of sentiment analysis. The research methodology includes implementing seven machine learning models and deep learning models, such as Long Short-Term Memory (LSTM), Bidirectional LSTM (Bi-LSTM), Convolutional 1D (Conv1D), and a combined Conv1D-LSTM. Preprocessing techniques are applied to a modified text set to enhance model accuracy. The major conclusion of the study is that Support Vector Machine (SVM) models exhibit superior performance compared to other models, achieving an accuracy of 82.56% for English text sentiment analysis and 86.43% for Bangla text sentiment analysis using the Porter stemming algorithm. Additionally, the Bi-LSTM-based model demonstrates the best performance among the deep learning models, achieving an accuracy of 78.10% for English text and 83.72% for Bangla text using Porter stemming. This study represents significant progress in natural language processing research, particularly for Bangla, by improving text classification models and methodologies. The results of this research make a significant contribution to the field of sentiment analysis and offer valuable insights for future research and practical applications.
Affiliation(s)
- Rajesh Kumar Das
  - Department of Computer Science and Engineering, Daffodil International University, Dhaka 1341, Bangladesh
- Mirajul Islam
  - Department of Computer Science and Engineering, Daffodil International University, Dhaka 1341, Bangladesh
  - Faculty of Graduate Studies, Daffodil International University, Dhaka 1341, Bangladesh
- Md Mahmudul Hasan
  - Department of Computer Science and Engineering, Daffodil International University, Dhaka 1341, Bangladesh
- Sultana Razia
  - Department of Computer Science and Engineering, Daffodil International University, Dhaka 1341, Bangladesh
- Mocksidul Hassan
  - Department of Computer Science and Engineering, Daffodil International University, Dhaka 1341, Bangladesh
- Sharun Akter Khushbu
  - Department of Computer Science and Engineering, Daffodil International University, Dhaka 1341, Bangladesh
20
Fraile Navarro D, Ijaz K, Rezazadegan D, Rahimi-Ardabili H, Dras M, Coiera E, Berkovsky S. Clinical named entity recognition and relation extraction using natural language processing of medical free text: A systematic review. Int J Med Inform 2023; 177:105122. [PMID: 37295138 DOI: 10.1016/j.ijmedinf.2023.105122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2022] [Revised: 04/14/2023] [Accepted: 06/03/2023] [Indexed: 06/12/2023]
Abstract
BACKGROUND Natural Language Processing (NLP) applications have developed over the past years in various fields, including their application to clinical free text for named entity recognition and relation extraction. However, developments over the last few years have been so rapid that no current overview exists. Moreover, it is unclear how these models and tools have been translated into clinical practice. We aim to synthesize and review these developments. METHODS We reviewed literature from 2010 to date, searching PubMed, Scopus, the Association for Computational Linguistics (ACL), and Association for Computing Machinery (ACM) libraries for studies of NLP systems performing general-purpose (i.e., not disease- or treatment-specific) information extraction and relation extraction tasks in unstructured clinical text (e.g., discharge summaries). RESULTS We included in the review 94 studies with 30 studies published in the last three years. Machine learning methods were used in 68 studies, rule-based in 5 studies, and both in 22 studies. 63 studies focused on Named Entity Recognition, 13 on Relation Extraction and 18 performed both. The most frequently extracted entities were "problem", "test" and "treatment". 72 studies used public datasets and 22 studies used proprietary datasets alone. Only 14 studies clearly defined a clinical or information task to be addressed by the system and just three studies reported its use outside the experimental setting. Only 7 studies shared a pre-trained model and only 8 an available software tool. DISCUSSION Machine learning-based methods have dominated the NLP field on information extraction tasks. More recently, Transformer-based language models are taking the lead and showing the strongest performance. However, these developments are mostly based on a few datasets and generic annotations, with very few real-world use cases. This raises questions about the generalizability of findings and their translation into practice, and highlights the need for robust clinical evaluation.
Affiliation(s)
- David Fraile Navarro
  - Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia.
- Kiran Ijaz
  - Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Dana Rezazadegan
  - Department of Computer Science and Software Engineering. School of Software and Electrical Engineering, Swinburne University of Technology, Melbourne, Australia
- Hania Rahimi-Ardabili
  - Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Mark Dras
  - Department of Computing, Macquarie University, Sydney, Australia
- Enrico Coiera
  - Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
- Shlomo Berkovsky
  - Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, Sydney, Australia
21
Stojanovski K, King EJ, O'Connell S, Gallagher KS, Theall KP, Geronimus AT. Spiraling Risk: Visualizing the multilevel factors that socially pattern HIV risk among gay, bisexual & other men who have sex with men using Complex Systems Theory. Curr HIV/AIDS Rep 2023; 20:206-217. [PMID: 37486568 PMCID: PMC10403445 DOI: 10.1007/s11904-023-00664-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 06/23/2023] [Indexed: 07/25/2023]
Abstract
PURPOSE OF REVIEW Global disparities in HIV infection, particularly among gay, bisexual, and other men who have sex with men (GBMSM), indicate the importance of exploring the multi-level processes that shape HIV's spread. We used Complex Systems Theory and the PRISMA guidelines to conduct a systematic review of 63 global reviews to understand how HIV is socially patterned among GBMSM. The purpose was to conduct a thematic analysis of the reviews to (1) synthesize the multi-level risk factors for HIV, (2) categorize risk across the socioecological model, and (3) develop a conceptual model that visualizes the interrelated factors that shape GBMSM's HIV "risk." RECENT FINDINGS We included 49 studies of high and moderate quality. Results indicated that GBMSM's HIV risk stems from the individual, interpersonal, and structural levels of the socioecological model. We identified a few themes that shape GBMSM's risk of HIV infection related to biomedical prevention methods; sexual and sex-seeking behaviors; behavioral prevention methods; individual-level characteristics and syndemic infections; lived experiences and interpersonal relationships; country-level income; country-level HIV prevalence; and structural stigma. The multi-level factors, in tandem, serve to perpetuate GBMSM's risk of HIV infection globally. The amalgamation of our thematic analyses from our systematic review of reviews suggests that the risk of HIV infection operates in an emergent, dynamic, and complex nature across multiple levels of the socioecological model. Applying complex systems theory indicates how multilevel factors create a dynamic and reinforcing system of HIV risk among GBMSM.
Affiliation(s)
- K Stojanovski
  - Department of Social, Behavioral and Population Sciences, Tulane School of Public Health & Tropical Medicine, New Orleans, USA
- E J King
  - Department of Health Behavior & Health Education, School of Public Health, University of Michigan, Ann Arbor, USA
- S O'Connell
  - Department of Epidemiology, Tulane School of Public Health & Tropical Medicine, New Orleans, USA
- K S Gallagher
  - Department of Health Policy and Management, Tulane School of Public Health & Tropical Medicine, New Orleans, USA
- K P Theall
  - Department of Social, Behavioral and Population Sciences, Tulane School of Public Health & Tropical Medicine, New Orleans, USA
  - Department of Epidemiology, Tulane School of Public Health & Tropical Medicine, New Orleans, USA
- A T Geronimus
  - Department of Health Behavior & Health Education, School of Public Health, University of Michigan, Ann Arbor, USA
  - Institute for Social Research, University of Michigan, Ann Arbor, USA
22
Trajanov D, Trajkovski V, Dimitrieva M, Dobreva J, Jovanovik M, Klemen M, Žagar A, Robnik-Šikonja M. Review of Natural Language Processing in Pharmacology. Pharmacol Rev 2023; 75:714-738. [PMID: 36931724 DOI: 10.1124/pharmrev.122.000715] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 08/04/2022] [Revised: 01/18/2023] [Accepted: 03/07/2023] [Indexed: 03/19/2023] Open
Abstract
Natural language processing (NLP) is an area of artificial intelligence that applies information technologies to process the human language, understand it to a certain degree, and use it in various applications. This area has rapidly developed in the past few years and now employs modern variants of deep neural networks to extract relevant patterns from large text corpora. The main objective of this work is to survey the recent use of NLP in the field of pharmacology. As our work shows, NLP is a highly relevant information extraction and processing approach for pharmacology. It has been used extensively, from intelligent searches through thousands of medical documents to finding traces of adversarial drug interactions in social media. We split our coverage into five categories to survey modern NLP: methodology, commonly addressed tasks, relevant textual data, knowledge bases, and useful programming libraries. We split each of the five categories into appropriate subcategories, describe their main properties and ideas, and summarize them in a tabular form. The resulting survey presents a comprehensive overview of the area, useful to practitioners and interested observers. SIGNIFICANCE STATEMENT: The main objective of this work is to survey the recent use of NLP in the field of pharmacology in order to provide a comprehensive overview of the current state in the area after the rapid developments that occurred in the past few years. The resulting survey will be useful to practitioners and interested observers in the domain.
Affiliation(s)
- Dimitar Trajanov, Vangel Trajkovski, Makedonka Dimitrieva, Jovana Dobreva, Milos Jovanovik, Matej Klemen, Aleš Žagar, Marko Robnik-Šikonja
  - Faculty of Computer Science and Engineering, Ss. Cyril and Methodius University in Skopje, North Macedonia (D.T., V.T., M.D., J.D., M.J.); Computer Science Department, Metropolitan College, Boston University, Boston, Massachusetts (D.T.); and Faculty of Computer and Information Science, University of Ljubljana, Slovenia (M.K., A.Ž., M.R.-Š.)
23
Levis M, Levy J, Dufort V, Russ CJ, Shiner B. Dynamic suicide topic modelling: Deriving population-specific, psychosocial and time-sensitive suicide risk variables from Electronic Health Record psychotherapy notes. Clin Psychol Psychother 2023; 30:795-810. [PMID: 36797651 PMCID: PMC11172400 DOI: 10.1002/cpp.2842] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/13/2023] [Accepted: 02/14/2023] [Indexed: 02/18/2023]
Abstract
In the machine learning subfield of natural language processing, a topic model is a type of unsupervised method that is used to uncover abstract topics within a corpus of text. Dynamic topic modelling (DTM) is used for capturing change in these topics over time. The study deploys DTM on a corpus of electronic health record psychotherapy notes. This retrospective study examines whether DTM helps distinguish closely matched patients who did and did not die by suicide. The cohort consists of United States Department of Veterans Affairs (VA) patients diagnosed with Posttraumatic Stress Disorder (PTSD) between 2004 and 2013. Each case (those who died by suicide during the year following diagnosis) was matched with five controls (those who remained alive) who shared psychotherapists and had similar suicide risk based on the VA's suicide prediction algorithm. The cohort was restricted to patients who received psychotherapy for 9+ months after initial PTSD diagnosis (cases = 77; controls = 362). For cases, psychotherapy notes from diagnosis until death were examined. For controls, psychotherapy notes from diagnosis until the matched case's death date were examined. A Python-based DTM algorithm was utilized. Derived topics identified population-specific themes, including PTSD, psychotherapy, medication, communication and relationships. Control topics changed significantly more over time than case topics. Topic differences highlighted engagement, expressivity and therapeutic alliance. This study strengthens groundwork for deriving population-specific, psychosocial and time-sensitive suicide risk variables.
Affiliation(s)
- Maxwell Levis
  - White River Junction VA Medical Center, Hartford, Vermont, USA
  - Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA
- Joshua Levy
  - Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA
- Vincent Dufort
  - White River Junction VA Medical Center, Hartford, Vermont, USA
- Carey J. Russ
  - White River Junction VA Medical Center, Hartford, Vermont, USA
  - Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA
- Brian Shiner
  - White River Junction VA Medical Center, Hartford, Vermont, USA
  - Geisel School of Medicine at Dartmouth, Hanover, New Hampshire, USA
  - National Center for PTSD Executive Division, Hartford, Vermont, USA
24
Mitha S, Schwartz J, Hobensack M, Cato K, Woo K, Smaldone A, Topaz M. Natural Language Processing of Nursing Notes: An Integrative Review. Comput Inform Nurs 2023; 41:377-384. [PMID: 36730744 PMCID: PMC11499545 DOI: 10.1097/cin.0000000000000967] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/04/2023]
Abstract
Natural language processing includes a variety of techniques that help to extract meaning from narrative data. In healthcare, medical natural language processing has been a growing field of study; however, little is known about its use in nursing. We searched PubMed, EMBASE, and CINAHL and found 689 studies, narrowed to 43 eligible studies using natural language processing in nursing notes. Data related to the study purpose, patient population, methodology, performance evaluation metrics, and quality indicators were extracted for each study. The majority (86%) of the studies were conducted from 2015 to 2021. Most of the studies (58%) used inpatient data. One of four studies used data from open-source databases. The most common standard terminologies used were the Unified Medical Language System and Systematized Nomenclature of Medicine, whereas nursing-specific standard terminologies were used only in eight studies. Full system performance metrics (eg, F score) were reported for 61% of applicable studies. The overall number of nursing natural language processing publications remains relatively small compared with the other medical literature. Future studies should evaluate and report appropriate performance metrics and use existing standard nursing terminologies to enable future scalability of the methods and findings.
Affiliation(s)
- Shazia Mitha
  - Columbia University School of Nursing, New York
25
Dinari F, Bahaadinbeigy K, Bassiri S, Mashouf E, Bastaminejad S, Moulaei K. Benefits, barriers, and facilitators of using speech recognition technology in nursing documentation and reporting: A cross-sectional study. Health Sci Rep 2023; 6:e1330. [PMID: 37313530 PMCID: PMC10259462 DOI: 10.1002/hsr2.1330] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 01/31/2023] [Revised: 05/18/2023] [Accepted: 05/31/2023] [Indexed: 06/15/2023] Open
Abstract
Background and Aim Nursing reports are necessary for clinical communication and provide an accurate reflection of nursing assessments, care provided, changes in clinical status, and patient-related information to support the multidisciplinary team in providing individualized care. Nurses always face challenges in recording and documenting nursing reports. Speech recognition systems (SRS), as one of the documentation technologies, can potentially play a role in recording medical reports. Therefore, this study seeks to identify the barriers, benefits, and facilitators of utilizing speech recognition technology in nursing reports. Materials and Methods This cross-sectional study was conducted through a researcher-made questionnaire in 2022. Invitations were sent to 200 ICU nurses working in the three educational hospitals of Imam Reza (AS), Qaem and Imam Zaman in Mashhad city (Iran), 125 of whom accepted our invitation. Finally, 73 nurses were included in the study based on inclusion and exclusion criteria. Data analysis was performed using SPSS 22.0. Results According to the nurses, "paperwork reduction" (3.96, ±1.96), "performance improvement" (3.96, ±0.93), and "cost reduction" (3.95, ±1.07) were the most common benefits of using the SRS. "Lack of specialized, technical, and experienced staff to teach nurses how to work with speech recognition systems" (3.59, ±1.18), "insufficient training of nurses" (3.59, ±1.11), and "need to edit and control quality and correct documents" (3.59, ±1.03) were the most common barriers to using SRS. "Ability to fully review documentation processes" (3.62, ±1.13), "creation of integrated data in record documentation" (3.58, ±1.15), and "possibility of error correction for nurses" (3.51, ±1.16) were the most common facilitators. There was no significant relationship between nurses' demographic information and the benefits, barriers, and facilitators.
Conclusions By providing information on the benefits, barriers, and facilitators of using this technology, hospital managers, nursing managers, and information technology managers of healthcare centers can make more informed decisions in selecting and implementing SRS for nursing report documentation. This will help to avoid potential challenges that may reduce the efficiency, effectiveness, and productivity of the systems.
Affiliation(s)
- Fatemeh Dinari
  - Medical Informatics Research Center, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran
- Kambiz Bahaadinbeigy
  - Medical Informatics Research Center, Institute for Futures Studies in Health, Kerman University of Medical Sciences, Kerman, Iran
- Somayyeh Bassiri
  - Branch Artificial Intelligent, Islamic Azad University Mashhad, Mashhad, Iran
- Esmat Mashouf
  - Department of Health Information Technology, Varastegan Institute for Medical Sciences, Mashhad, Iran
- Saiyad Bastaminejad
  - Department of Genetics, Faculty of Paramedical, Ilam University of Medical Sciences, Ilam, Iran
- Khadijeh Moulaei
  - Department of Health Information Technology, Faculty of Paramedical, Ilam University of Medical Sciences, Ilam, Iran
26
Ismail A, Al-Zoubi T, El Naqa I, Saeed H. The role of artificial intelligence in hastening time to recruitment in clinical trials. BJR Open 2023; 5:20220023. [PMID: 37953865 PMCID: PMC10636341 DOI: 10.1259/bjro.20220023] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/28/2022] [Revised: 03/20/2023] [Accepted: 04/11/2023] [Indexed: 09/01/2023] Open
Abstract
Novel and developing artificial intelligence (AI) systems can be integrated into healthcare settings in numerous ways. For example, in the case of automated image classification and natural language processing, AI systems are beginning to demonstrate near-expert-level performance in detecting abnormalities such as seizure activity. This paper, however, focuses on AI integration into clinical trials. During the clinical trial recruitment process, considerable labor and time are spent sifting through electronic health records and interviewing patients. With the advancement of deep learning techniques such as natural language processing, intricate electronic health record data can be efficiently processed. This provides utility to workflows such as recruitment for clinical trials. Studies are starting to show promise in shortening the time to recruitment and reducing workload for those involved in clinical trial design. Additionally, numerous guidelines are being constructed to encourage integration of AI into the healthcare setting with meaningful impact. The goal would be to improve the clinical trial process by reducing bias in patient composition, improving retention of participants, and lowering costs and labor.
Affiliation(s)
- Abdalah Ismail
  - Advocate Aurora Health Care Department of Diagnostic Radiology, Aurora, United States
- Hina Saeed
  - Lynn Cancer Institute-Baptist Health City, Boca Raton, United States
27
Rani S, Jain A. Optimizing healthcare system by amalgamation of text processing and deep learning: a systematic review. MULTIMEDIA TOOLS AND APPLICATIONS 2023:1-25. [PMID: 37362695 PMCID: PMC10183315 DOI: 10.1007/s11042-023-15539-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/23/2021] [Revised: 05/18/2022] [Accepted: 04/19/2023] [Indexed: 06/28/2023]
Abstract
The explosion of clinical textual data has drawn the attention of researchers. Owing to the abundance of clinical data, it is becoming difficult for healthcare professionals to take real-time measures. The tools and methods are lacking when compared to the amount of clinical data generated every day. This review aims to survey the text processing pipeline with deep learning methods such as CNN, RNN, LSTM, and GRU in the healthcare domain and discuss various applications such as clinical concept detection and extraction, medically aware dialogue systems, sentiment analysis of drug reviews shared online, clinical trial matching, and pharmacovigilance. In addition, we highlighted the major challenges in deploying text processing with deep learning to clinical textual data and identified the scope of research in this domain. Furthermore, we have discussed various resources that can be used in the future to optimize the healthcare domain by amalgamating text processing and deep learning.
Affiliation(s)
- Somiya Rani
  - Department of Computer Science and Engineering, NSUT East Campus (erstwhile AIACTR), Affiliated to Guru Gobind Singh Indraprastha University, Delhi, India
- Amita Jain
  - Department of Computer Science and Engineering, Netaji Subhas University of Technology, Delhi, India
28
Neubauer L, Straw I, Mariconti E, Tanczer LM. A Systematic Literature Review of the Use of Computational Text Analysis Methods in Intimate Partner Violence Research. JOURNAL OF FAMILY VIOLENCE 2023; 38:1-20. [PMID: 37358974 PMCID: PMC10028783 DOI: 10.1007/s10896-023-00517-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 02/15/2023] [Indexed: 06/28/2023]
Abstract
Purpose Computational text mining methods are proposed as a useful methodological innovation in Intimate Partner Violence (IPV) research. Text mining can offer researchers access to existing or new datasets, sourced from social media or from IPV-related organisations, that would be too large to analyse manually. This article aims to give an overview of current work applying text mining methodologies in the study of IPV, as a starting point for researchers wanting to use such methods in their own work. Methods This article reports the results of a systematic review of academic research using computational text mining to research IPV. A review protocol was developed according to PRISMA guidelines, and a literature search of 8 databases was conducted, identifying 22 unique studies that were included in the review. Results The included studies cover a wide range of methodologies and outcomes. Supervised and unsupervised approaches are represented, including rule-based classification (n = 3), traditional Machine Learning (n = 8), Deep Learning (n = 6) and topic modelling (n = 4) methods. Datasets are mostly sourced from social media (n = 15), with other data being sourced from police forces (n = 3), health or social care providers (n = 3), or litigation texts (n = 1). Evaluation methods mostly used a held-out, labelled test set, or k-fold Cross Validation, with Accuracy and F1 metrics reported. Only a few studies commented on the ethics of computational IPV research. Conclusions Text mining methodologies offer promising data collection and analysis techniques for IPV research. Future work in this space must consider ethical implications of computational approaches.
Affiliation(s)
- Lilly Neubauer
  - University College London, Gower Street, London, WC1E 6BT UK
- Isabel Straw
  - University College London, Gower Street, London, WC1E 6BT UK
29
Durieux BN, Zverev SR, Tarbi EC, Kwok A, Sciacca K, Pollak KI, Tulsky JA, Lindvall C. Development of a keyword library for capturing PRO-CTCAE-focused "symptom talk" in oncology conversations. JAMIA Open 2023; 6:ooad009. [PMID: 36789287 PMCID: PMC9912707 DOI: 10.1093/jamiaopen/ooad009] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/10/2022] [Revised: 01/18/2023] [Accepted: 02/02/2023] [Indexed: 02/12/2023] Open
Abstract
Objectives As computational methods for detecting symptoms can help us better attend to patient suffering, the objectives of this study were to develop and evaluate the performance of a natural language processing keyword library for detecting symptom talk, and to describe symptom communication within our dataset to generate insights for future model building. Materials and Methods This was a secondary analysis of 121 transcribed outpatient oncology conversations from the Communication in Oncologist-Patient Encounters trial. Through an iterative process of identifying symptom expressions via inductive and deductive techniques, we generated a library of keywords relevant to the Patient-Reported Outcome version of the Common Terminology Criteria for Adverse Events (PRO-CTCAE) framework from 90 conversations, and tested the library on 31 additional transcripts. To contextualize symptom expressions and the nature of misclassifications, we qualitatively analyzed 450 mislabeled and properly labeled symptom-positive turns. Results The final library, comprising 1320 terms, identified symptom talk among conversation turns with an F1 of 0.82 against a PRO-CTCAE-focused gold standard, and an F1 of 0.61 against a broad gold standard. Qualitative observations suggest that physical symptoms are more easily detected than psychological symptoms (eg, anxiety), and ambiguity persists throughout symptom communication. Discussion This rudimentary keyword library captures most PRO-CTCAE-focused symptom talk, but the ambiguity of symptom speech limits the utility of rule-based methods alone, and limits to generalizability must be considered. Conclusion Our findings highlight opportunities for more advanced computational models to detect symptom expressions from transcribed clinical conversations. Future improvements in speech-to-text could enable real-time detection at scale.
Affiliation(s)
- Brigitte N Durieux
  - Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
- Samuel R Zverev
  - Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - NYU School of Medicine, New York University, New York, New York, USA
- Elise C Tarbi
  - Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - Department of Nursing, University of Vermont, Burlington, Vermont, USA
- Anne Kwok
  - Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
- Kate Sciacca
  - Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - Department of Palliative Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, USA
- Kathryn I Pollak
  - Department of Population Health Sciences, Duke University School of Medicine, Duke University, Durham, North Carolina, USA
  - Cancer Prevention and Control Program, Duke Cancer Institute, Duke University, Durham, North Carolina, USA
- James A Tulsky
  - Department of Psychosocial Oncology and Palliative Care, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - Department of Medicine, Brigham and Women’s Hospital, Boston, Massachusetts, USA
- Charlotta Lindvall
  - Corresponding Author: Charlotta Lindvall, MD, PhD, Department of Psychosocial Oncology & Palliative Care, Dana-Farber Cancer Institute, 450 Brookline Ave, LW670, Boston, MA 02215, USA
30
Ng JY, Verhoeff N, Steen J. What are the ways in which social media is used in the context of complementary and alternative medicine in the health and medical scholarly literature? a scoping review. BMC Complement Med Ther 2023; 23:32. [PMID: 36732809 PMCID: PMC9893203 DOI: 10.1186/s12906-023-03856-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 12/11/2021] [Accepted: 01/20/2023] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND Despite the increased use of social media to share health-related information and the substantial impact that complementary and alternative medicine (CAM) can have on individuals' health and wellbeing, to our knowledge there is currently no review that compiles research on how social media is used in the context of CAM. The objective of this study was to summarize the ways in which social media is used in the context of CAM. METHODS A scoping review was conducted, following Arksey and O'Malley's five-stage methodological framework. MEDLINE, EMBASE, PsycINFO, AMED, and CINAHL databases were systematically searched from inception until October 3, 2020, in addition to the Canadian Agency for Drugs and Technology in Health (CADTH) website. Eligible studies had to have investigated how at least one social media platform is used in the context of a single or multiple types of CAM treatments. RESULTS Searches retrieved 1714 items following deduplication, of which 1687 titles and abstracts were eliminated, leaving 94 full-text articles to be considered. Of those, 65 were not eligible, leaving a total of 29 articles eligible for review. Three themes emerged from our analysis: 1) social media is used to share user/practitioner beliefs, attitudes, and experiences about CAM, 2) social media acts as a vehicle for the spread of misinformation about CAM, and 3) there are unique challenges with social media research in the context of CAM. CONCLUSIONS In addition to being a useful tool for sharing user/practitioner beliefs, attitudes, and experiences about CAM, social media has been shown to be an accessible, effective, and viable option for delivering CAM therapies and information. Social media has also been shown to spread a large amount of misleading and false information in the context of CAM. Additionally, this review highlights the challenges with conducting social media research in the context of CAM, particularly in collecting a representative sample.
Affiliation(s)
- Jeremy Y Ng
  - Department of Health Research Methods, Evidence, and Impact, Faculty of Health Sciences, McMaster University, Michael G. DeGroote Centre for Learning and Discovery, Room 2112, 1280 Main Street West, Hamilton, ON, L8S 4K1, Canada
- Natasha Verhoeff
  - Department of Health Research Methods, Evidence, and Impact, Faculty of Health Sciences, McMaster University, Michael G. DeGroote Centre for Learning and Discovery, Room 2112, 1280 Main Street West, Hamilton, ON, L8S 4K1, Canada
- Jeremy Steen
  - Department of Health Research Methods, Evidence, and Impact, Faculty of Health Sciences, McMaster University, Michael G. DeGroote Centre for Learning and Discovery, Room 2112, 1280 Main Street West, Hamilton, ON, L8S 4K1, Canada
31
Pethani F, Dunn AG. Natural language processing for clinical notes in dentistry: A systematic review. J Biomed Inform 2023; 138:104282. [PMID: 36623780 DOI: 10.1016/j.jbi.2023.104282] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 08/29/2022] [Revised: 12/01/2022] [Accepted: 01/04/2023] [Indexed: 01/09/2023]
Abstract
OBJECTIVE To identify and synthesise research on applications of natural language processing (NLP) for information extraction and retrieval from clinical notes in dentistry. MATERIALS AND METHODS A predefined search strategy was applied in EMBASE, CINAHL and Medline. Studies eligible for inclusion were those that described, evaluated, or applied NLP to clinical notes containing either human or simulated patient information. Quality of the study design and reporting was independently assessed based on a set of questions derived from relevant tools including CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS). A narrative synthesis was conducted to present the results. RESULTS Of the 17 included studies, 10 developed and evaluated NLP methods and 7 described applications of NLP-based information retrieval methods in dental records. Studies were published between 2015 and 2021, most were missing key details needed for reproducibility, and there was no consistency in design or reporting. The 10 studies developing or evaluating NLP methods used document classification or entity extraction, and 4 compared NLP methods to non-NLP methods. The quality of reporting on NLP studies in dentistry has modestly improved over time. CONCLUSIONS Study design heterogeneity and incomplete reporting of studies currently limit our ability to synthesise NLP applications in dental records. Standardisation of reporting and improved connections between NLP methods and applied NLP in dentistry may improve how we can make use of clinical notes from dentistry in population health or decision support systems. PROTOCOL REGISTRATION PROSPERO CRD42021227823.
Affiliation(s)
- Farhana Pethani
  - Biomedical Informatics and Digital Health, Faculty of Medicine and Health, the University of Sydney, Sydney, Australia
- Adam G Dunn
  - Biomedical Informatics and Digital Health, Faculty of Medicine and Health, the University of Sydney, Sydney, Australia
32
Robinson T, Condell J, Ramsey E, Leavey G. Self-Management of Subclinical Common Mental Health Disorders (Anxiety, Depression and Sleep Disorders) Using Wearable Devices. INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH 2023; 20:ijerph20032636. [PMID: 36768002 PMCID: PMC9916237 DOI: 10.3390/ijerph20032636] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/22/2022] [Revised: 01/21/2023] [Accepted: 01/28/2023] [Indexed: 05/05/2023]
Abstract
RATIONALE Common mental health disorders (CMD) (anxiety, depression, and sleep disorders) are among the leading causes of disease burden globally. The economic burden associated with such disorders was estimated at $2.4 trillion as of 2010 and is expected to reach $16 trillion by 2030. The UK has observed a 21-fold increase in the economic burden associated with CMD over the past decade. The recent COVID-19 pandemic was a catalyst for adopting technologies for mental health support and services, thereby increasing the reception of personal health data and wearables. Wearables hold considerable promise to empower users concerning the management of subclinical common mental health disorders. However, there are significant challenges to adopting wearables as a tool for the self-management of the symptoms of common mental health disorders. AIMS This review aims to evaluate the potential utility of wearables for the self-management of sub-clinical anxiety and depressive mental health disorders. Furthermore, we seek to understand the potential of wearables to reduce the burden on the healthcare system. METHODOLOGY A systematic review was conducted of research papers on wearable devices for the self-management of CMD released between 2018 and 2022, with a primary focus on mental health management using technology. RESULTS We screened 445 papers and analysed the reports from 12 wearable devices concerning their device type, year, biometrics used, and machine learning algorithm deployed. Electrodermal activity (EDA/GSR/SC/Skin Temperature), physical activity, and heart rate (HR) are the most common biometrics with nine, six and six reference counts, respectively. Additionally, while smartwatches have greater penetration and integration within the marketplace, fitness trackers have the most significant public value benefit of £513.9 M, likely due to greater retention.
Affiliation(s)
- Tony Robinson
  - School of Computing, Engineering, and Intelligent Systems, Ulster University, Magee Campus, Derry/Londonderry BT48 7JL, UK
- Joan Condell
  - School of Computing, Engineering, and Intelligent Systems, Ulster University, Magee Campus, Derry/Londonderry BT48 7JL, UK
- Elaine Ramsey
  - Department of Global Business and Enterprise, Ulster University, Magee Campus, Derry/Londonderry BT48 7JL, UK
- Gerard Leavey
  - The Bamford Centre for Mental Health and Wellbeing, School of Psychology, Ulster University, Coleraine Campus, Cromore Rd., Coleraine BT52 1SA, UK
|
33
|
Komkov AA, Mazaev VP, Ryazanova SV, Kobak AA, Bazaeva EV, Samochatov DN, Koshkina EV, Bushueva EV, Drapkina OM. Application of the program for artificial intelligence analytics of paper text and segmentation by specified parameters in clinical practice. КАРДИОВАСКУЛЯРНАЯ ТЕРАПИЯ И ПРОФИЛАКТИКА 2023. [DOI: 10.15829/1728-8800-2022-3458] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/20/2023] Open
Abstract
The development of novel technologies using elements of artificial intelligence (AI) in medicine is aimed at practical clinical implementation and at addressing key issues, including better use of routine clinical data, practical relevance, standardization, confidentiality and patient safety. Aim. To evaluate the effectiveness of the RuPatient electronic health record (EHR) system in real clinical practice for extracting and structuring medical data. Material and methods. Data were extracted and recognized using the EHR from the following sources: outpatient records, statements, routine medical reports, epicrisis and other structured and unstructured medical information, based on the developed technology of intelligent text analytics, optical character recognition for specified words and phrases, and elements of machine learning. A particular criterion for evaluating the effectiveness of the EHR is the time spent on filling out electronic medical records compared to real clinical practice. Results. The time of entering and processing information by the medical documentation recognition system included in the RuPatient EHR was shorter than in standard practice (20.3±1.4 minutes vs 25.1±1.5 minutes, respectively, p<0.001); the average document recognition time was 30±4.3 seconds. In the ROC analysis, the threshold value that allows discharge epicrisis images to be recognized with high accuracy using the RuPatient system was 83.5%, with an area under the curve (AUC) of 0.76. Conclusions. The developed RuPatient EHR has a medical documentation recognition module for creating structured data based on AI technology elements and can be used to create an electronic medical history and accumulate structured data for practical and scientific big data and AI projects in medicine. When using the RuPatient system, the burden on medical staff during document management can be reduced and access to primary medical information simplified.
Affiliation(s)
- A. A. Komkov
  - National Medical Research Center for Therapy and Preventive Medicine; L.A. Vorokhobov City Clinical Hospital № 67
- V. P. Mazaev
  - National Medical Research Center for Therapy and Preventive Medicine
- S. V. Ryazanova
  - National Medical Research Center for Therapy and Preventive Medicine
- E. V. Bazaeva
  - National Medical Research Center for Therapy and Preventive Medicine
- O. M. Drapkina
  - National Medical Research Center for Therapy and Preventive Medicine
|
34
|
Jeon E, Kim A, Lee J, Heo H, Lee H, Woo K. Developing a Classification Algorithm for Prediabetes Risk Detection From Home Care Nursing Notes: Using Natural Language Processing. Comput Inform Nurs 2023:00024665-990000000-00087. [PMID: 37165830 DOI: 10.1097/cin.0000000000001000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/12/2023]
Abstract
This study developed and validated a rule-based classification algorithm for prediabetes risk detection using natural language processing from home care nursing notes. First, we developed prediabetes-related symptomatic terms in English and Korean. Second, we used natural language processing to preprocess the notes. Third, we created a rule-based classification algorithm with 31 484 notes, excluding 315 instances of missing data. The final algorithm was validated by measuring accuracy, precision, recall, and the F1 score against a gold standard testing set (400 notes). The developed terms comprised 11 categories and 1639 words in Korean and 1181 words in English. Using the rule-based classification algorithm, 42.2% of the notes comprised one or more prediabetic symptoms. The algorithm achieved high performance when applied to the gold standard testing set. We proposed a rule-based natural language processing algorithm to optimize the classification of the prediabetes risk group, depending on whether the home care nursing notes contain prediabetes-related symptomatic terms. Tokenization based on white space and the rule-based algorithm were brought into effect to detect the prediabetes symptomatic terms. Applying this algorithm to electronic health records systems will increase the possibility of preventing diabetes onset through early detection of risk groups and provision of tailored intervention.
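The abstract above describes white-space tokenization followed by rule-based matching against a list of symptomatic terms. A minimal sketch of that idea follows, assuming a tiny made-up English lexicon (`SYMPTOM_TERMS` and its three categories are hypothetical; the study's own lexicon has 11 categories and is not reproduced here) and a simple rule that any matched term marks the note as containing a prediabetic symptom.

```python
# Hypothetical lexicon for illustration only; not the study's term list.
SYMPTOM_TERMS = {
    "thirst": {"thirst", "thirsty"},
    "urination": {"polyuria", "urinating"},
    "fatigue": {"fatigue", "tired"},
}

PUNCTUATION = ".,;:!?()\"'"


def tokenize(note):
    """Split a note on white space, then strip punctuation and lowercase."""
    tokens = (raw.strip(PUNCTUATION).lower() for raw in note.split())
    return [tok for tok in tokens if tok]


def matched_categories(note):
    """Return the symptom categories whose terms occur in the note."""
    tokens = set(tokenize(note))
    return {cat for cat, terms in SYMPTOM_TERMS.items() if tokens & terms}


def has_prediabetic_symptom(note):
    """Assumed rule: a note is flagged when at least one term matches."""
    return bool(matched_categories(note))
```

For example, `matched_categories("Patient reports feeling tired and very thirsty.")` returns the `fatigue` and `thirst` categories, while a note with no lexicon terms is not flagged.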
Affiliation(s)
- Eunjoo Jeon
  - Author Affiliations: Technology Research, SamsungSDS (Dr Jeon); College of Nursing, Seoul National University (Mss Kim, J. Lee, and H. Lee and Dr Woo); and Seoul National University Hospital (Ms Heo), Seoul, South Korea
|
35
|
Okada A, Tsuchihashi-Makaya M, Nagao N, Ochiai R. Somatic Changes Perceived by Patients With Heart Failure During Acute Exacerbation: A Qualitative Study Using Text Mining. J Cardiovasc Nurs 2023; 38:23-32. [PMID: 35467568 DOI: 10.1097/jcn.0000000000000915] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 12/15/2022]
Abstract
BACKGROUND Patients with heart failure (HF) often inadequately perceive their symptoms. This may be because the medical terms do not match the somatic changes experienced by patients. To improve symptom perception, healthcare professionals must understand the somatic changes as perceived by patients. OBJECTIVE This study aims to analyze the narratives of patients with HF about somatic changes by text mining and to clarify the overall description of somatic changes using patients' expressions. METHODS Semistructured interviews were conducted with 21 patients hospitalized for acute exacerbation of HF. Qualitative data obtained from the interviews were analyzed by content analysis through text mining. RESULTS Among the 21 patients, 76.2% were men. The mean (SD) age was 71.3 (13.7) years. The most frequently used terms were "breath," "distressed," "feet," and "ha-ha (gasping sound)" (46, 40, 29, and 28 times, respectively). The somatic changes noticed by patients could be categorized into medical jargon such as "dyspnea on exertion," "exercise intolerance," "fatigue," "paroxysmal nocturnal dyspnea," "frequent urination," "increased sputum," "weight gain," "feet and face edema," "abdominal edema," and "ankle edema." However, the expressions of somatic changes used by the patients were diverse. CONCLUSIONS The findings of patient-specific expressions of symptoms suggest that there is a need to assess symptoms not only using medical jargon but also by focusing on patient-specific expressions.
|
36
|
A Natural Language Interface for Automatic Generation of Data Flow Diagram using Web Extraction Techniques. JOURNAL OF KING SAUD UNIVERSITY - COMPUTER AND INFORMATION SCIENCES 2023. [DOI: 10.1016/j.jksuci.2023.01.006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/13/2023]
|
37
|
Binsfeld Gonçalves L, Nesic I, Obradovic M, Stieltjes B, Weikert T, Bremerich J. Natural Language Processing and Graph Theory: Making Sense of Imaging Records in a Novel Representation Frame. JMIR Med Inform 2022; 10:e40534. [PMID: 36542426 PMCID: PMC9813822 DOI: 10.2196/40534] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/29/2022] [Revised: 09/13/2022] [Accepted: 11/30/2022] [Indexed: 12/24/2022] Open
Abstract
BACKGROUND A concise visualization framework of related reports would increase readability and improve patient management. To this end, temporal referrals to prior comparative exams are an essential connection to previous exams in written reports. Due to unstructured narrative texts' variable structure and content, their extraction is hampered by poor computer readability. Natural language processing (NLP) permits the extraction of structured information from unstructured texts automatically and can serve as an essential input for such a novel visualization framework. OBJECTIVE This study proposes and evaluates an NLP-based algorithm capable of extracting the temporal referrals in written radiology reports, applies it to all the radiology reports generated for 10 years, introduces a graphical representation of imaging reports, and investigates its benefits for clinical and research purposes. METHODS In this single-center, university hospital, retrospective study, we developed a convolutional neural network capable of extracting the date of referrals from imaging reports. The model's performance was assessed by calculating precision, recall, and F1-score using an independent test set of 149 reports. Next, the algorithm was applied to our department's radiology reports generated from 2011 to 2021. Finally, the reports and their metadata were represented in a modulable graph. RESULTS For extracting the date of referrals, the named-entity recognition (NER) model had a high precision of 0.93, a recall of 0.95, and an F1-score of 0.94. A total of 1,684,635 reports were included in the analysis. Temporal reference was mentioned in 53.3% (656,852/1,684,635), explicitly stated as not available in 21.0% (258,386/1,684,635), and omitted in 25.7% (317,059/1,684,635) of the reports. Imaging records can be visualized in a directed and modulable graph, in which the referring links represent the connecting arrows. 
CONCLUSIONS Automatically extracting the date of referrals from unstructured radiology reports using deep learning NLP algorithms is feasible. Graphs refined the selection of distinct pathology pathways, facilitated the revelation of missing comparisons, and enabled the query of specific referring exam sequences. Further work is needed to evaluate its benefits in clinics, research, and resource planning.
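As a quick consistency check on the reported NER metrics, the F1-score is the harmonic mean of precision $P$ and recall $R$. Plugging in the values from the abstract (precision 0.93, recall 0.95) reproduces the reported 0.94; the symbols $P$ and $R$ are shorthand introduced here, not notation from the paper.

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.93 \times 0.95}{0.93 + 0.95}
    = \frac{1.767}{1.88}
    \approx 0.94
```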
Affiliation(s)
- Laurent Binsfeld Gonçalves
  - Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Ivan Nesic
  - Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Marko Obradovic
  - Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Bram Stieltjes
  - Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Thomas Weikert
  - Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
- Jens Bremerich
  - Clinic of Radiology & Nuclear Medicine, University Hospital Basel, University of Basel, Basel, Switzerland
|
38
|
Fang C, Markuzon N, Patel N, Rueda JD. Natural Language Processing for Automated Classification of Qualitative Data From Interviews of Patients With Cancer. VALUE IN HEALTH : THE JOURNAL OF THE INTERNATIONAL SOCIETY FOR PHARMACOECONOMICS AND OUTCOMES RESEARCH 2022; 25:1995-2002. [PMID: 35840523 DOI: 10.1016/j.jval.2022.06.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2021] [Revised: 05/19/2022] [Accepted: 06/12/2022] [Indexed: 06/15/2023]
Abstract
OBJECTIVES This study sought to explore the use of novel natural language processing (NLP) methods for classifying unstructured, qualitative textual data from interviews of patients with cancer to identify patient-reported symptoms and impacts on quality of life. METHODS We tested the ability of 4 NLP models to accurately classify text from interview transcripts as "symptom," "quality of life impact," and "other." Interview data sets from patients with hepatocellular carcinoma (HCC) (n = 25), biliary tract cancer (BTC) (n = 23), and gastric cancer (n = 24) were used. Models were cross-validated with transcript subsets designated for training, validation, and testing. Multiclass classification performance of the 4 models was evaluated at paragraph and sentence level using the HCC testing data set and analyzed by the one-versus-rest technique quantified by the receiver operating characteristic area under the curve (ROC AUC) score. RESULTS NLP models accurately classified multiclass text from patient interviews. The Bidirectional Encoder Representations from Transformers model generally outperformed all other models at paragraph and sentence level. The highest predictive performance of the Bidirectional Encoder Representations from Transformers model was observed using the HCC data set to train and BTC data set to test (mean ROC AUC, 0.940 [SD 0.028]), with similarly high predictive performance using balanced and imbalanced training data sets from BTC and gastric cancer populations. CONCLUSIONS NLP models were accurate in predicting multiclass classification of text from interviews of patients with cancer, with most surpassing 0.9 ROC AUC at paragraph level. NLP may be a useful tool for scaling up processing of patient interviews in clinical studies and, thus, could serve to facilitate patient input into drug development and improving patient care.
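The one-versus-rest ROC AUC evaluation mentioned above can be illustrated in a few lines. The sketch below is not the authors' pipeline: it assumes each text already has a per-class score from some classifier, uses invented toy scores, and computes the binary AUC as the probability that a randomly chosen positive example outranks a randomly chosen negative one (ties count one half). The function names and the `qol_impact` label are hypothetical.

```python
def binary_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest_auc(y_true, class_scores, classes):
    """Treat each class in turn as positive and all others as negative."""
    return {
        c: binary_auc(
            [1 if y == c else 0 for y in y_true],
            [scores[c] for scores in class_scores],
        )
        for c in classes
    }


# Toy data (invented): true labels and per-class scores for four paragraphs.
CLASSES = ["symptom", "qol_impact", "other"]
y_true = ["symptom", "qol_impact", "other", "symptom"]
class_scores = [
    {"symptom": 0.70, "qol_impact": 0.20, "other": 0.10},
    {"symptom": 0.30, "qol_impact": 0.60, "other": 0.10},
    {"symptom": 0.20, "qol_impact": 0.30, "other": 0.50},
    {"symptom": 0.25, "qol_impact": 0.65, "other": 0.10},
]
aucs = one_vs_rest_auc(y_true, class_scores, CLASSES)
mean_auc = sum(aucs.values()) / len(aucs)
```

Averaging the per-class values, as in `mean_auc`, is one common way to summarize multiclass performance in a single number; whether the paper used a macro average is an assumption here.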
Affiliation(s)
- Chao Fang
  - Oncology Biometrics ML/AI, AstraZeneca, Waltham, MA, USA
- Nikunj Patel
  - US Medical Affairs, AstraZeneca, Gaithersburg, MD, USA
- Juan-David Rueda
  - Oncology Market Access and Pricing, AstraZeneca, Gaithersburg, MD, USA
|
39
|
Benda NC, Rogers C, Sharma M, Narain W, Diamond LC, Ancker J, Seier K, Stetson PD, Sulieman L, Armstrong M, Peng Y. Identifying Nonpatient Authors of Patient Portal Secure Messages in Oncology: A Proof-of-Concept Demonstration of Natural Language Processing Methods. JCO Clin Cancer Inform 2022; 6:e2200071. [PMID: 36542818 PMCID: PMC10476725 DOI: 10.1200/cci.22.00071] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/11/2022] [Revised: 10/03/2022] [Accepted: 10/26/2022] [Indexed: 12/24/2022] Open
Abstract
PURPOSE Patient portal secure messages are not always authored by the patient account holder. Understanding who authored the message is particularly important in an oncology setting where symptom reporting is crucial to patient treatment. Natural language processing has the potential to detect messages not authored by the patient automatically. METHODS Patient portal secure messages from the Memorial Sloan Kettering Cancer Center were retrieved and manually annotated as a predicted unregistered proxy (ie, not written by the patient) or a presumed patient. After randomly splitting the annotated messages into training and test sets in a 70:30 ratio, a bag-of-words approach was used to extract features and then a Least Absolute Shrinkage and Selection Operator (LASSO) model was trained and used for classification. RESULTS Portal secure messages (n = 2,000) were randomly selected from unique patient accounts and manually annotated. We excluded 335 messages from the data set as the annotators could not determine if they were written by a patient or proxy. Using the remaining 1,665 messages, a LASSO model was developed that achieved an area under the curve of 0.932 and an area under the precision recall curve of 0.748. The sensitivity and specificity related to classifying true-positive cases (predicted unregistered proxy-authored messages) and true negatives (presumed patient-authored messages) were 0.681 and 0.960, respectively. CONCLUSION Our work demonstrates the feasibility of using unstructured, heterogeneous patient portal secure messages to determine portal secure message authorship. Identifying patient authorship in real time can improve patient portal account security and can be used to improve the quality of the information extracted from the patient portal, such as patient-reported outcomes.
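Three of the steps named in this abstract (the 70:30 random split, bag-of-words feature extraction, and sensitivity/specificity scoring) are simple enough to sketch with the Python standard library. The sketch below deliberately leaves out the LASSO model itself, and all function names, the fixed seed, and the lowercase white-space tokenization are assumptions for illustration rather than details taken from the paper.

```python
import random
from collections import Counter


def split_70_30(items, seed=0):
    """Shuffle a copy of the items and split it 70:30 into train and test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10
    return shuffled[:cut], shuffled[cut:]


def build_vocabulary(messages):
    """Sorted list of distinct lowercase tokens seen in the training messages."""
    return sorted({word for msg in messages for word in msg.lower().split()})


def bag_of_words(message, vocabulary):
    """Count vector of the message over the vocabulary; unseen words are ignored."""
    counts = Counter(message.lower().split())
    return [counts[word] for word in vocabulary]


def sensitivity_specificity(y_true, y_pred):
    """Label 1 = proxy-authored (positive), 0 = patient-authored (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)
```

The count vectors from `bag_of_words` would then be the input to an L1-penalized classifier; building the vocabulary from the training split only avoids leaking test-set words into the features.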
Affiliation(s)
- Natalie C. Benda
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY
- Christopher Rogers
  - Department of Health Informatics, Memorial Sloan Kettering Cancer Center, New York, NY
- Mohit Sharma
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY
- Wazim Narain
  - Department of Health Informatics, Memorial Sloan Kettering Cancer Center, New York, NY
- Lisa C. Diamond
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY
  - Immigrant Health and Cancer Disparities Service, Department of Psychiatry and Behavioral Sciences, Memorial Sloan Kettering Cancer Center, New York, NY
  - Hospital Medicine Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY
- Jessica Ancker
  - Hospital Medicine Service, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Kenneth Seier
  - Department of Epidemiology- Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY
- Peter D. Stetson
  - Department of Health Informatics, Memorial Sloan Kettering Cancer Center, New York, NY
- Lina Sulieman
  - Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, TN
- Misha Armstrong
  - Department of Surgery, New York Presbyterian-Weill Cornell Medicine, New York, NY
- Yifan Peng
  - Department of Population Health Sciences, Weill Cornell Medicine, New York, NY
|
40
|
Krauer F, Schmid BV. Mapping the plague through natural language processing. Epidemics 2022; 41:100656. [PMID: 36410316 DOI: 10.1016/j.epidem.2022.100656] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/07/2021] [Revised: 05/27/2022] [Accepted: 11/12/2022] [Indexed: 11/19/2022] Open
Abstract
Pandemic diseases such as plague have produced a vast amount of literature providing information about the spatiotemporal extent, transmission, or countermeasures. However, the manual extraction of such information from running text is a tedious process, and much of this information remains locked into a narrative format. Natural language processing (NLP) is a promising tool for the automated extraction of epidemiological data, and can facilitate the establishment of datasets. In this paper, we explore the utility of NLP to assist in the creation of a plague outbreak dataset. We produced a gold standard list of toponyms by manual annotation of a German plague treatise published by Sticker in 1908. We investigated the performance of five pre-trained NLP libraries (Google, Stanford CoreNLP, spaCy, germaNER and Geoparser) for the automated extraction of location data compared to the gold standard. Of all tested algorithms, spaCy performed best (sensitivity 0.92, F1 score 0.83), followed closely by Stanford CoreNLP (sensitivity 0.81, F1 score 0.87). Google NLP had a slightly lower performance (F1 score 0.72, sensitivity 0.78). Geoparser and germaNER had a poor sensitivity (0.41 and 0.61). We then evaluated how well automated geocoding services such as Google geocoding, Geonames and Geoparser located these outbreaks correctly. All geocoding services performed poorly, particularly for historical regions, and returned the correct GIS information in only 60.4%, 52.7% and 33.8% of all cases. Finally, we compared our newly digitized plague dataset to a re-digitized version of the plague treatise by Biraben and provide an update of the spatio-temporal extent of the second pandemic plague outbreaks. We conclude that NLP tools have their limitations, but they are potentially useful to accelerate the collection of data and the generation of a global plague outbreak database.
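Comparing a tool's extracted toponyms with a manually annotated gold standard comes down to counting matches and misses. The following sketch assumes a simplified setting in which both lists are compared as sets of distinct place names (the paper may well have scored individual mentions instead); the function name and the example place names are illustrative only.

```python
def evaluate_toponyms(gold, predicted):
    """Score predicted place names against a gold standard, as sets of names."""
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)   # names found by both
    fn = len(gold_set - pred_set)   # gold names the tool missed
    fp = len(pred_set - gold_set)   # names the tool invented or mislabelled
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"sensitivity": sensitivity, "precision": precision, "f1": f1}


# Invented example: the tool misses "Venedig" and wrongly tags "Sticker".
gold = ["Marseille", "Venedig", "Wien", "Basel"]
predicted = ["Marseille", "Wien", "Basel", "Sticker"]
metrics = evaluate_toponyms(gold, predicted)
```

Because F1 also penalizes false positives, a tool can have the highest sensitivity without having the highest F1, which is why both figures are worth reporting side by side.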
Affiliation(s)
- Fabienne Krauer
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, 0316 Oslo, Norway
- Boris V Schmid
  - Centre for Ecological and Evolutionary Synthesis, Department of Biosciences, University of Oslo, 0316 Oslo, Norway
|
41
|
Bakken S, Dreisbach C. Informatics and data science perspective on Future of Nursing 2020-2030: Charting a pathway to health equity. Nurs Outlook 2022; 70:S77-S87. [PMID: 36446542 DOI: 10.1016/j.outlook.2022.04.004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/03/2021] [Revised: 04/16/2022] [Accepted: 04/21/2022] [Indexed: 11/27/2022]
Abstract
The Future of Nursing 2020 to 2030 report explicitly addresses the need for integration of nursing expertise in designing, generating, analyzing, and applying data to support initiatives focused on social determinants of health (SDOH) and health equity. The metrics necessary to enable and evaluate progress on all recommendations require harnessing existing data sources and developing new ones, as well as transforming and integrating data into information systems to facilitate communication, information sharing, and decision making among the key stakeholders. We examine the recommendations of the 2021 report through an interdisciplinary lens that integrates nursing, biomedical informatics, and data science by addressing three critical questions: (a) what data are needed?, (b) what infrastructure and processes are needed to transform data into information?, and (c) what information systems are needed to "level up" nurse-led interventions from the micro-level to the meso- and macro-levels to address social determinants of health and advance health equity?
Affiliation(s)
- Suzanne Bakken
  - School of Nursing, Columbia University, New York, NY 10032, United States; Department of Biomedical Informatics, Columbia University, New York, NY, United States; Data Science Institute, Columbia University, New York, NY, United States
- Caitlin Dreisbach
  - Data Science Institute, Columbia University, New York, NY, United States
|
42
|
Carrillo-Larco RM, Castillo-Cara M, Lovón-Melgarejo J. Government plans in the 2016 and 2021 Peruvian presidential elections: A natural language processing analysis of the health chapters. Wellcome Open Res 2022. [DOI: 10.12688/wellcomeopenres.16867.5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022] Open
Abstract
Background: While the use of electronic health records for Natural Language Processing (NLP) analyses has exploded in clinical medicine, public health and health policy research have not yet adopted these algorithms. We aimed to dissect the health chapters of the government plans of the 2016 and 2021 Peruvian presidential elections, and to compare different NLP algorithms. Methods: From the government plans (18 in 2016; 19 in 2021) we extracted each sentence from the health chapters. We used five NLP algorithms to extract keywords and phrases from each plan: Term Frequency-Inverse Document Frequency (TF-IDF), Latent Dirichlet Allocation (LDA), TextRank, Keywords Bidirectional Encoder Representations from Transformers (KeyBERT), and Rapid Automatic Keywords Extraction (Rake). Results: In 2016 we analysed 630 sentences, whereas in 2021 there were 1,685 sentences. The TF-IDF algorithm showed that in 2016, 26 terms appeared with a frequency of 0.08 or greater, while in 2021 27 terms met this criterion. The LDA algorithm defined two groups. The first included terms related to things the population would receive (e.g., 'insurance'), while the second included terms about the health system (e.g., 'capacity'). In 2021, most of the government plans belonged to the second group. The TextRank analysis provided keywords showing that 'universal health coverage' appeared frequently in 2016, while in 2021 keywords about the COVID-19 pandemic were often found. The KeyBERT algorithm provided keywords based on the context of the text. These keywords identified some underlying characteristics of the political party (e.g., political spectrum such as left-wing). The Rake algorithm delivered phrases, in which we found 'universal health coverage' in 2016 and 2021. Conclusion: The NLP analysis could be used to inform on the underlying priorities in each government plan. NLP analysis could also be included in research of health policies and politics during general elections and provide informative summaries for the general population.
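Of the five algorithms listed in this abstract, TF-IDF is the easiest to show from first principles. The sketch below uses the plain, unsmoothed variant (term frequency as a share of the document's tokens, inverse document frequency as the natural log of N over document frequency); the paper does not say which variant or library it used, so the weighting, the white-space tokenization, and the three toy "plans" are assumptions.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document using unsmoothed TF-IDF."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy corpus (invented), standing in for the health chapters of three plans.
plans = [
    "universal health coverage",
    "health system capacity",
    "health insurance coverage",
]
scores = tf_idf(plans)
```

In this toy corpus 'health' occurs in every plan, so its score is zero everywhere, while a term unique to one plan such as 'universal' scores highest there. That is the property that lets TF-IDF surface plan-specific priorities.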
|
43
|
Carrillo-Larco RM, Castillo-Cara M, Lovón-Melgarejo J. Government plans in the 2016 and 2021 Peruvian presidential elections: A natural language processing analysis of the health chapters. Wellcome Open Res 2022; 6:177. [PMID: 39931661 PMCID: PMC11809155 DOI: 10.12688/wellcomeopenres.16867.3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 10/17/2022] [Indexed: 02/13/2025] Open
Abstract
Background: While the use of electronic health records for Natural Language Processing (NLP) analyses has exploded in clinical medicine, public health and health policy research have not yet adopted these algorithms. We aimed to dissect the health chapters of the government plans of the 2016 and 2021 Peruvian presidential elections, and to compare different NLP algorithms. Methods: From the government plans (18 in 2016; 19 in 2021) we extracted each sentence from the health chapters. We used five NLP algorithms to extract keywords and phrases from each plan: Term Frequency-Inverse Document Frequency (TF-IDF), Latent Dirichlet Allocation (LDA), TextRank, Keywords Bidirectional Encoder Representations from Transformers (KeyBERT), and Rapid Automatic Keywords Extraction (Rake). Results: In 2016 we analysed 630 sentences, whereas in 2021 there were 1,685 sentences. The TF-IDF algorithm showed that in 2016, 26 terms appeared with a frequency of 0.08 or greater, while in 2021 27 terms met this criterion. The LDA algorithm defined two groups. The first included terms related to things the population would receive (e.g., 'insurance'), while the second included terms about the health system (e.g., 'capacity'). In 2021, most of the government plans belonged to the second group. The TextRank analysis provided keywords showing that 'universal health coverage' appeared frequently in 2016, while in 2021 keywords about the COVID-19 pandemic were often found. The KeyBERT algorithm provided keywords based on the context of the text. These keywords identified some underlying characteristics of the political party (e.g., political spectrum such as left-wing). The Rake algorithm delivered phrases, in which we found 'universal health coverage' in 2016 and 2021. Conclusions: The NLP analysis could be used to inform on the underlying priorities in each government plan. NLP analysis could also be included in research of health policies and politics during general elections and provide informative summaries for the general population.
Affiliation(s)
- Rodrigo M. Carrillo-Larco
  - Department of Epidemiology and Biostatistics, Imperial College London, London, SW7 2AZ, UK
  - CRONICAS Centre of Excellence in Chronic Diseases, Universidad Peruana Cayetano Heredia, Lima, Peru
|
44
|
Martinez-Millana A, Saez-Saez A, Tornero-Costa R, Azzopardi-Muscat N, Traver V, Novillo-Ortiz D. Artificial intelligence and its impact on the domains of universal health coverage, health emergencies and health promotion: An overview of systematic reviews. Int J Med Inform 2022; 166:104855. [PMID: 35998421 PMCID: PMC9551134 DOI: 10.1016/j.ijmedinf.2022.104855] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 05/27/2022] [Revised: 08/01/2022] [Accepted: 08/11/2022] [Indexed: 12/04/2022]
Abstract
BACKGROUND Artificial intelligence is fueling a new revolution in medicine and in the healthcare sector. Despite the growing evidence on the benefits of artificial intelligence there are several aspects that limit the measure of its impact in people's health. It is necessary to assess the current status on the application of AI towards the improvement of people's health in the domains defined by WHO's Thirteenth General Programme of Work (GPW13) and the European Programme of Work (EPW), to inform about trends, gaps, opportunities, and challenges. OBJECTIVE To perform a systematic overview of systematic reviews on the application of artificial intelligence in the people's health domains as defined in the GPW13 and provide a comprehensive and updated map on the application specialties of artificial intelligence in terms of methodologies, algorithms, data sources, outcomes, predictors, performance, and methodological quality. METHODS A systematic search in MEDLINE, EMBASE, Cochrane and IEEEXplore was conducted between January 2015 and June 2021 to collect systematic reviews using a combination of keywords related to the domains of universal health coverage, health emergencies protection, and better health and wellbeing as defined by the WHO's PGW13 and EPW. Eligibility criteria was based on methodological quality and the inclusion of practical implementation of artificial intelligence. Records were classified and labeled using ICD-11 categories into the domains of the GPW13. Descriptors related to the area of implementation, type of modeling, data entities, outcomes and implementation on care delivery were extracted using a structured form and methodological aspects of the included reviews studies was assessed using the AMSTAR checklist. RESULTS The search strategy resulted in the screening of 815 systematic reviews from which 203 were assessed for eligibility and 129 were included in the review. 
The most predominant domain for artificial intelligence applications was Universal Health Coverage (N = 98) followed by Health Emergencies (N = 16) and Better Health and Wellbeing (N = 15). Neoplasms area on Universal Health Coverage was the disease area featuring most of the applications (21.7 %, N = 28). The reviews featured analytics primarily over both public and private data sources (67.44 %, N = 87). The most used type of data was medical imaging (31.8 %, N = 41) and predictors based on regions of interest and clinical data. The most prominent subdomain of Artificial Intelligence was Machine Learning (43.4 %, N = 56), in which Support Vector Machine method was predominant (20.9 %, N = 27). Regarding the purpose, the application of Artificial Intelligence I is focused on the prediction of the diseases (36.4 %, N = 47). With respect to the validation, more than a half of the reviews (54.3 %, N = 70) did not report a validation procedure and, whenever available, the main performance indicator was the accuracy (28.7 %, N = 37). According to the methodological quality assessment, a third of the reviews (34.9 %, N = 45) implemented methods for analysis the risk of bias and the overall AMSTAR score below was 5 (4.01 ± 1.93) on all the included systematic reviews. CONCLUSION Artificial intelligence is being used for disease modelling, diagnose, classification and prediction in the three domains of GPW13. However, the evidence is often limited to laboratory and the level of adoption is largely unbalanced between ICD-11 categoriesand diseases. Data availability is a determinant factor on the developmental stage of artificial intelligence applications. Most of the reviewed studies show a poor methodological quality and are at high risk of bias, which limits the reproducibility of the results and the reliability of translating these applications to real clinical scenarios. 
The analyzed papers show results only in laboratory and testing scenarios, not in clinical trials or case studies, which limits the evidence supporting the transfer of artificial intelligence to actual care delivery.
Affiliation(s)
- Antonio Martinez-Millana
- Instituto Universitario de Investigación de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas (ITACA), Universitat Politècnica de València, Camino de Vera S/N, Valencia 46022, Spain
- Aida Saez-Saez
- Instituto Universitario de Investigación de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas (ITACA), Universitat Politècnica de València, Camino de Vera S/N, Valencia 46022, Spain
- Roberto Tornero-Costa
- Instituto Universitario de Investigación de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas (ITACA), Universitat Politècnica de València, Camino de Vera S/N, Valencia 46022, Spain
- Natasha Azzopardi-Muscat
- Division of Country Health Policies and Systems, World Health Organization, Regional Office for Europe, Copenhagen, Denmark
- Vicente Traver
- Instituto Universitario de Investigación de Aplicaciones de las Tecnologías de la Información y de las Comunicaciones Avanzadas (ITACA), Universitat Politècnica de València, Camino de Vera S/N, Valencia 46022, Spain
- David Novillo-Ortiz
- Division of Country Health Policies and Systems, World Health Organization, Regional Office for Europe, Copenhagen, Denmark.
|
45
|
Sprint G, Cook DJ, Schmitter-Edgecombe M, Holder LB. Multimodal Fusion of Smart Home and Text-based Behavior Markers for Clinical Assessment Prediction. ACM TRANSACTIONS ON COMPUTING FOR HEALTHCARE 2022; 3:41. [PMID: 36381500 PMCID: PMC9645787 DOI: 10.1145/3531231] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/01/2021] [Accepted: 04/11/2022] [Indexed: 01/27/2023]
Abstract
New modes of technology are offering unprecedented opportunities to unobtrusively collect data about people's behavior. While there are many use cases for such information, we explore its utility for predicting multiple clinical assessment scores. Because clinical assessments are typically used as screening tools for impairment and disease, such as mild cognitive impairment (MCI), automatically mapping behavioral data to assessment scores can help detect changes in health and behavior across time. In this paper, we aim to extract behavior markers from two modalities, a smart home environment and a custom digital memory notebook app, for mapping to ten clinical assessments that are relevant for monitoring MCI onset and changes in cognitive health. Smart home-based behavior markers reflect hourly, daily, and weekly activity patterns, while app-based behavior markers reflect app usage and writing content/style derived from free-form journal entries. We describe machine learning techniques for fusing these multimodal behavior markers and utilizing joint prediction. We evaluate our approach using three regression algorithms and data from 14 participants with MCI living in a smart home environment. We observed moderate to large correlations between predicted and ground-truth assessment scores, ranging from r = 0.601 to r = 0.871 for each clinical assessment.
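The correlations quoted above are Pearson coefficients between predicted and ground-truth scores. As a minimal sketch of how such a value is computed, the following uses only the standard library; the function name `pearson_r` and the example scores are hypothetical and are not taken from the study.

```python
from math import sqrt
from typing import Sequence


def pearson_r(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Pearson correlation between predicted and ground-truth scores."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 values")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_o)


# Hypothetical assessment scores for five participants (illustration only).
predicted_scores = [24.0, 27.5, 21.0, 29.0, 25.5]
observed_scores = [23.0, 28.0, 22.0, 30.0, 24.0]
r = pearson_r(predicted_scores, observed_scores)
print(round(r, 3))
```

With only 14 participants, as in the study, a coefficient like this carries wide uncertainty, so it is best read alongside the sample size.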
Affiliation(s)
- Gina Sprint
- Department of Computer Science, Gonzaga University
- Diane J Cook
- School of Electrical Engineering and Computer Science, Washington State University
- Lawrence B Holder
- School of Electrical Engineering and Computer Science, Washington State University
|
46
|
Abdulmohsin HA, Al-Khateeb B, Hasan SS, Dwivedi R. Automatic illness prediction system through speech. COMPUTERS & ELECTRICAL ENGINEERING : AN INTERNATIONAL JOURNAL 2022; 102:108224. [PMID: 35880184 PMCID: PMC9302036 DOI: 10.1016/j.compeleceng.2022.108224] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/30/2022] [Revised: 06/29/2022] [Accepted: 07/08/2022] [Indexed: 06/15/2023]
Abstract
Because of the COVID-19 epidemic and the curfews it caused, many people have looked for an automated disease prediction system (ADPS) on the internet in the last few years. This hints at a new age of medical treatment, all the more so if the number of internet users continues to grow. As a result, online automatic illness prediction applications have attracted the interest of many researchers worldwide. This work aims to develop and implement an automated illness prediction system based on speech. The system was intended to predict the specific ailment a patient is suffering from based on their voice, but this was not feasible during the trial; the diseases were therefore divided into three categories (painful, light pain and psychological pain), and the diagnosis process was implemented accordingly. The medical dataset named "speech, transcription, and intent" served as the baseline for this study. The smoothness, Mel frequency cepstral coefficient (MFCC), and spectral centroid variability (SCV) features were used in this work and proved highly representative of patients' medical conditions. A forward-backward noise reduction filter was used to remove noise from the wave files captured online, compensating for the high level of noise in the dataset. For this study, a hybrid feature selection method was built that combined the output of a genetic algorithm (GA) with the inputs of a neural network (NN) algorithm. Classification was performed using a support vector machine (SVM), a neural network, and a Gaussian mixture model (GMM). The best result obtained was an illness classification accuracy of 94.55%, achieved with the SVM. The results showed that diagnosing illness through speech is difficult, especially when each type of illness is diagnosed separately; when the illness types were grouped according to the amount of pain and the psychological state of the patient, the results were much better.
Key Words
- ADPS, Automated Disease Prediction System
- Automatic disease prediction
- CPU, Central Processing Unit
- Forward-backward filter
- GA, Genetic Algorithm
- GB, Giga Byte
- GMM, Gaussian Mixture Model
- MFCC, Mel Frequency Cepstral Co-efficient
- Medical speech transcription and intent dataset
- Mel frequency Cepstral coefficient
- NN, Neural Network
- Neural network
- RAM, Random Access Memory
- RSM, Response Service Methodology
- SCV, Spectral Centroid Variability
- SVM, Support Vector Machine
- Spectral centroid variability
Affiliation(s)
- Husam Ali Abdulmohsin
- Computer Science Department, College of Science, University of Baghdad, Baghdad, Iraq
- Belal Al-Khateeb
- College of Computer Science and Information Technology, University of Anbar, Anbar, Iraq
- Samer Sami Hasan
- Computer Science Department, College of Science, University of Baghdad, Baghdad, Iraq
- Rinky Dwivedi
- Maharaja Surajmal Institute of Technology, New Delhi, India
|
47
|
Li Z, Wang X, Xu M, Li Y, Wang Y, Chen Y, Li S, Li Z, Yang J, Tang C, Xiong F, Jian W, He P, Zhan Y, Zheng J, Ye F. Development and clinical application of an electronic health record quality control system for pulmonary aspergillosis based on guidelines and natural language processing technology. J Thorac Dis 2022; 14:3398-3407. [PMID: 36245604 PMCID: PMC9562533 DOI: 10.21037/jtd-22-532] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 04/18/2022] [Accepted: 08/19/2022] [Indexed: 11/26/2022]
Abstract
Background There are considerable differences in the diagnosis and treatment of pulmonary aspergillosis (PA) between specialized and primary hospitals, and between developed and underdeveloped areas, in China. There is a lack of electronic systems that assist respiratory physicians in standardizing the diagnosis and treatment of PA. Methods We extracted 26 quality control points from the latest guidelines related to PA and developed a PA quality control system for electronic health records (EHRs) based on natural language processing (NLP) techniques. We obtained PA patient records from the Department of Respiratory Medicine of the First Affiliated Hospital of Guangzhou Medical University to verify the effectiveness of the system by comparison with manual evaluation by respiratory experts. Results We successfully developed a quality control system for PA; 699 PA medical records from the EHR of the First Affiliated Hospital of Guangzhou Medical University between January 2015 and March 2020 were obtained and assessed by the system; 162 defects were found, which included 19 medical records with diagnostic defects, 76 medical records with examination defects, and 80 medical records with treatment defects; 200 medical records were sampled for validation, which showed that the sensitivity and accuracy of the quality control system for pulmonary aspergillosis (QCSA) were 0.99 and 0.96, the F1 value was 0.85, and the recall rate was 0.77 compared with experts' evaluation. Conclusions Our system successfully uses medical guidelines and NLP technology to detect defects in the diagnosis and treatment of PA, which helps to improve the management quality of PA patients.
Affiliation(s)
- Zhengtu Li
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Xidong Wang
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Mengke Xu
- Guangzhou Tianpeng Technology Co., Ltd., Guangzhou, China
- Yongming Li
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Yinguang Wang
- Guangzhou Tianpeng Technology Co., Ltd., Guangzhou, China
- Yijun Chen
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Shaoqiang Li
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Zhun Li
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Jinglu Yang
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Chun Tang
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Fangshu Xiong
- Guangzhou Tianpeng Technology Co., Ltd., Guangzhou, China
- Wenhua Jian
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Peimei He
- Guangzhou Tianpeng Technology Co., Ltd., Guangzhou, China
- Yangqing Zhan
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Jinping Zheng
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
- Feng Ye
- State Key Laboratory of Respiratory Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, the First Affiliated Hospital of Guangzhou Medical University, Guangzhou, China
|
48
|
Darzidehkalani E, Ghasemi-Rad M, van Ooijen PMA. Federated Learning in Medical Imaging: Part I: Toward Multicentral Health Care Ecosystems. J Am Coll Radiol 2022; 19:969-974. [PMID: 35483439 DOI: 10.1016/j.jacr.2022.03.015] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 12/01/2021] [Revised: 03/28/2022] [Accepted: 03/29/2022] [Indexed: 11/28/2022]
Abstract
With recent developments in medical imaging facilities, extensive medical imaging data are produced every day. This increasing amount of data provides an opportunity for researchers to develop data-driven methods and deliver better health care. However, data-driven models require a large amount of data to be adequately trained. Furthermore, there is always a limited amount of data available in each data center. Hence, deep learning models trained on local data centers might not reach their total performance capacity. One solution could be to accumulate all data from different centers into one center. However, data privacy regulations do not allow medical institutions to easily combine their data, and this becomes increasingly difficult when institutions from multiple countries are involved. Another solution is to use privacy-preserving algorithms, which can make use of all the data available in multiple centers while keeping the sensitive data private. Federated learning (FL) is such a mechanism that enables deploying large-scale machine learning models trained on different data centers without sharing sensitive data. In FL, instead of transferring data, a general model is trained on local data sets and transferred between data centers. FL has been identified as a promising field of research, with extensive possible uses in medical research and practice. This article introduces FL, with a comprehensive look into its concepts and recent research trends in medical imaging.
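The idea that a model, not the data, moves between centers can be illustrated with a small sketch. The abstract does not name a specific algorithm; the code below assumes federated averaging (FedAvg), one common FL scheme, applied to a toy linear model. The function names, the learning rate, and the synthetic data are all hypothetical.

```python
from typing import List, Tuple

# (x, y) pairs held privately by one center; they are never pooled.
Dataset = List[Tuple[float, float]]


def local_update(w: float, b: float, data: Dataset,
                 lr: float = 0.05, epochs: int = 5) -> Tuple[float, float]:
    """Run a few gradient-descent steps for y ~ w*x + b on one center's data."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_average(centers: List[Dataset], rounds: int = 200) -> Tuple[float, float]:
    """Assumed FedAvg loop: only model parameters travel, not the raw data."""
    w, b = 0.0, 0.0
    total = sum(len(d) for d in centers)
    for _ in range(rounds):
        # Each center starts from the shared model and trains locally.
        updates = [(local_update(w, b, d), len(d)) for d in centers]
        # The coordinator averages the returned parameters, weighted by sample count.
        w = sum(params[0] * n for params, n in updates) / total
        b = sum(params[1] * n for params, n in updates) / total
    return w, b


# Synthetic data lying on y = 2x + 1, split across three hypothetical centers.
centers = [
    [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],
    [(3.0, 7.0), (4.0, 9.0)],
    [(-1.0, -1.0), (0.5, 2.0), (1.5, 4.0), (2.5, 6.0)],
]
w, b = federated_average(centers)
print(round(w, 2), round(b, 2))
```

Because every synthetic point lies on the same line, the averaged model should settle close to a slope of 2 and an intercept of 1. Real medical imaging data are rarely this homogeneous across centers, which is one reason FL remains an active research area.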
Affiliation(s)
- Erfan Darzidehkalani
- Department of Radiotherapy, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands; Machine Learning Lab, Data Science Center in Health, University Medical Center Groningen, University of Groningen, the Netherlands.
- Mohammad Ghasemi-Rad
- Assistant Professor of Radiology, Department of Interventional Radiology, Baylor College of Medicine, Houston, Texas
- P M A van Ooijen
- Department of Radiotherapy, University Medical Center Groningen, University of Groningen, Groningen, the Netherlands; Coordinator Machine Learning Lab, Data Science Center in Health, University Medical Center Groningen, University of Groningen, the Netherlands
|
49
|
Boggs JM, Kafka JM. A Critical Review of Text Mining Applications for Suicide Research. CURR EPIDEMIOL REP 2022; 9:126-134. [PMID: 35911089 PMCID: PMC9315081 DOI: 10.1007/s40471-022-00293-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Accepted: 03/23/2022] [Indexed: 11/28/2022]
Abstract
Purpose of Review Applying text mining to suicide research holds a great deal of promise. In this manuscript, literature from 2019 to 2021 is critically reviewed for text mining projects that use electronic health records, social media data, and death records. Recent Findings Text mining has helped identify risk factors for suicide in general and specific populations (e.g., older adults), has been combined with structured variables in EHRs to predict suicide risk, and has been used to track trends in social media suicidal discourse following population level events (e.g., COVID-19, celebrity suicides). Summary Future research should utilize text mining along with data linkage methods to capture more complete information on risk factors and outcomes across data sources (e.g., combining death records and EHRs), evaluate effectiveness of NLP-based intervention programs that use suicide risk prediction, establish standards for reporting accuracy of text mining programs to enable comparison across studies, and incorporate implementation science to understand feasibility, acceptability, and technical considerations.
Affiliation(s)
- Jennifer M Boggs
- Kaiser Permanente Colorado, Institute for Health Research, Aurora, CO USA
- Julie M Kafka
- Department of Health Behavior, Gillings School of Global Public Health at University of North Carolina Chapel Hill, Chapel Hill, NC USA
|
50
|
Cheng X, Cao Q, Liao SS. An overview of literature on COVID-19, MERS and SARS: Using text mining and latent Dirichlet allocation. J Inf Sci 2022; 48:304-320. [PMID: 38603038 PMCID: PMC7464068 DOI: 10.1177/0165551520954674] [Citation(s) in RCA: 22] [Impact Index Per Article: 7.3] [Indexed: 02/06/2023]
Abstract
The unprecedented outbreak of COVID-19 is one of the most serious global threats to public health in this century. During this crisis, specialists in information science could play key roles in supporting the efforts of scientists in the health and medical community to combat COVID-19. In this article, we demonstrate that information specialists can support the health and medical community by applying a text mining technique with the latent Dirichlet allocation procedure to produce an overview of a large body of coronavirus literature. This overview presents the generic research themes of the coronavirus diseases COVID-19, MERS and SARS, reveals the representative literature for each main research theme and displays a network visualisation to explore the overlaps, similarities and differences among these themes. The overview can help the health and medical communities to extract useful information and interrelationships from coronavirus-related studies.
Affiliation(s)
- Xian Cheng
- Business School, Sichuan University, China
- Qiang Cao
- Department of Information Systems, City University of Hong Kong, China
|