1
Kuhn D, Harrison NE, Musey PI, Crandall DJ, Pang PS, Welch JL, Harle CA. Preliminary findings regarding the association between patient demographics and ED experience scores across a regional health system: A cross sectional study using natural language processing of patient comments. Int J Med Inform 2025; 195:105748. [PMID: 39671851 DOI: 10.1016/j.ijmedinf.2024.105748] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 11/02/2024] [Accepted: 11/30/2024] [Indexed: 12/15/2024]
Abstract
OBJECTIVE Existing literature shows associations between patient demographics and reported experiences of care, but this relationship is poorly understood. Our objective was to use natural language processing of patient comments to gain insight into associations between patient demographics and experiences of care. METHODS This is a cross-sectional study of 14,848 unique emergency department (ED) patient visits from 1/1/2020 to 12/31/2020. Patients discharged from one of 16 ED sites in a regional health system who filled out a patient experience survey with comments were included. This study had two outcome variables: (1) positive vs. non-positive (negative/neutral) comment sentiment, and (2) promoter vs. non-promoter status (based on NRCHealth's Net Promoter Score; patients with a likelihood-to-recommend score of 9 or 10 are considered "promoters", while those with scores of 8 or below are "non-promoters"). We used natural language processing to sort patient comments into topics and sentiments. Logistic regression with mediation analysis was used to estimate the associations between patient demographics and the following: (1) comments about compassion vs. other topics, (2) positive comments, and (3) patient experience, defined as likelihood to recommend. RESULTS Comments about care and compassion (51% of total comments) had highly positive sentiment (97%), compared to mixed sentiment for other topics. Older, male, and Asian patients were more likely to comment on compassion and most likely to make positive comments. Our mediation analysis suggests that the demographic association with positive patient comments and net promoter scores was mediated by their focus on care and compassion as a primary comment theme for their visit. Notably, the overall percentage of patients providing comments was only 1.8%, raising concerns about whether data currently used for hospital and physician feedback has adequate validity to yield meaningful insights.
CONCLUSIONS The increased likelihood of specific patient sub-groups to comment on compassionate care may explain previously reported differences in experience by patient demographics.
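The promoter cutoff described in the METHODS above can be sketched as a small helper. This is an illustrative sketch only: the function name `promoter_status` and the assumption that scores are integers on a 0 to 10 scale are not taken from the study, and the abstract does not show the authors' own analysis code.

```python
def promoter_status(likelihood_to_recommend: int) -> str:
    """Classify a likelihood-to-recommend score using the cutoff in the abstract.

    Assumption (not from the source): scores are integers from 0 to 10.
    """
    if not 0 <= likelihood_to_recommend <= 10:
        raise ValueError("score must be between 0 and 10")
    # 9 or 10 -> promoter; 8 or below -> non-promoter
    return "promoter" if likelihood_to_recommend >= 9 else "non-promoter"


if __name__ == "__main__":
    for score in (10, 9, 8, 0):
        print(score, promoter_status(score))
```

A binary label of this kind is what a logistic regression on promoter vs. non-promoter status would take as its outcome variable.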
Affiliation(s)
- Diane Kuhn
  - Department of Emergency Medicine, Indiana University School of Medicine, 720 Eskenazi Ave, Indianapolis, IN 46202, USA; Regenstrief Institute, 1101 W 10th St, Indianapolis, IN 46202, USA
- Nicholas E Harrison
  - Department of Emergency Medicine, Indiana University School of Medicine, 720 Eskenazi Ave, Indianapolis, IN 46202, USA
- Paul I Musey
  - Department of Emergency Medicine, Indiana University School of Medicine, 720 Eskenazi Ave, Indianapolis, IN 46202, USA; Regenstrief Institute, 1101 W 10th St, Indianapolis, IN 46202, USA
- David J Crandall
  - Luddy School of Informatics, Computing, and Engineering, Indiana University Bloomington, 1015 E. 11th St, Bloomington, IN 47408, USA
- Peter S Pang
  - Department of Emergency Medicine, Indiana University School of Medicine, 720 Eskenazi Ave, Indianapolis, IN 46202, USA; Regenstrief Institute, 1101 W 10th St, Indianapolis, IN 46202, USA
- Julie L Welch
  - Department of Emergency Medicine, Indiana University School of Medicine, 720 Eskenazi Ave, Indianapolis, IN 46202, USA
- Christopher A Harle
  - Regenstrief Institute, 1101 W 10th St, Indianapolis, IN 46202, USA; Department of Health Policy and Management, Indiana University Richard M Fairbanks School of Public Health, 1050 Wishard Blvd, Indianapolis, IN 46202, USA
2
Khosravi P, Mohammadi S, Zahiri F, Khodarahmi M, Zahiri J. AI-Enhanced Detection of Clinically Relevant Structural and Functional Anomalies in MRI: Traversing the Landscape of Conventional to Explainable Approaches. J Magn Reson Imaging 2024; 60:2272-2289. [PMID: 38243677 DOI: 10.1002/jmri.29247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2023] [Revised: 01/05/2024] [Accepted: 01/08/2024] [Indexed: 01/21/2024] Open
Abstract
Anomaly detection in medical imaging, particularly within the realm of magnetic resonance imaging (MRI), stands as a vital area of research with far-reaching implications across various medical fields. This review meticulously examines the integration of artificial intelligence (AI) in anomaly detection for MR images, spotlighting its transformative impact on medical diagnostics. We delve into the forefront of AI applications in MRI, exploring advanced machine learning (ML) and deep learning (DL) methodologies that are pivotal in enhancing the precision of diagnostic processes. The review provides a detailed analysis of preprocessing, feature extraction, classification, and segmentation techniques, alongside a comprehensive evaluation of commonly used metrics. Further, this paper explores the latest developments in ensemble methods and explainable AI, offering insights into future directions and potential breakthroughs. This review synthesizes current insights, offering a valuable guide for researchers, clinicians, and medical imaging experts. It highlights AI's crucial role in improving the precision and speed of detecting key structural and functional irregularities in MRI. Our exploration of innovative techniques and trends furthers MRI technology development, aiming to refine diagnostics, tailor treatments, and elevate patient care outcomes. LEVEL OF EVIDENCE: 5 TECHNICAL EFFICACY: Stage 1.
Affiliation(s)
- Pegah Khosravi
  - Department of Biological Sciences, New York City College of Technology, CUNY, New York City, New York, USA
  - The CUNY Graduate Center, City University of New York, New York City, New York, USA
- Saber Mohammadi
  - Department of Biological Sciences, New York City College of Technology, CUNY, New York City, New York, USA
  - Department of Biophysics, Tarbiat Modares University, Tehran, Iran
- Fatemeh Zahiri
  - Department of Cell and Molecular Sciences, Kharazmi University, Tehran, Iran
- Javad Zahiri
  - Department of Neuroscience, University of California San Diego, San Diego, California, USA
3
Jeng AC, Sibley IJ, Bale TL. A global perspective on AI innovation and effective use in the research lab. Neuroscience 2024; 560:106-108. [PMID: 39307414 DOI: 10.1016/j.neuroscience.2024.09.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2024] [Accepted: 09/17/2024] [Indexed: 09/28/2024]
Affiliation(s)
- Alyssa C Jeng
  - Department of Psychiatry, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Isabelle J Sibley
  - Department of Psychiatry, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Tracy L Bale
  - Department of Psychiatry, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
4
Leung LY, Puttock E, Kallmes DF, Luetmer P, Fu S, Zheng CX, Liu H, Chen W, Kent DM. Statins are rarely prescribed for incidentally discovered covert cerebrovascular disease: a retrospective cohort in a large electronic health record (EHR) identified using natural language processing. BMJ Neurol Open 2024; 6:e000855. [PMID: 39534403 PMCID: PMC11555097 DOI: 10.1136/bmjno-2024-000855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/01/2024] [Accepted: 10/21/2024] [Indexed: 11/16/2024] Open
Abstract
Introduction While incidentally discovered covert cerebrovascular diseases (id-CCD) are associated with future stroke, it is not known if patients with id-CCD are prescribed statins. Methods Patients age ≥50 with id-CCD on neuroimaging from 2009 to 2019 with no prior ischaemic stroke, transient ischaemic attack or dementia were identified using natural language processing in a large real-world cohort. Robust Poisson multivariable regression was used to assess statin prescription among patients without prior statins. Results Among 241 050 patients, 74 975 patients (31.1%; 4.7% with covert brain infarcts (CBI); 29.0% with white matter disease (WMD)) had id-CCD. 53.5% (95% CI 53.2 to 53.9%) were not on statins within 6 months prior to the scan. Of those, 12.0% (95% CI 11.7 to 12.3%) were prescribed statins in the next 6 months compared with 9.3% (95% CI 9.1 to 9.4%) in those without CCD, a 2.7% (95% CI 2.4 to 3.1%) absolute increase in statin prescription for those with id-CCD. In adjusted analyses, the presence of id-CCD was only associated with minor increases in statin prescription (CBI or WMD (risk ratio (RR) 1.09, 95% CI 1.05 to 1.13), CBI alone (RR 1.34, 95% CI 1.21 to 1.47), WMD alone (RR 1.05, 95% CI 1.01 to 1.09), and CBI and WMD (RR 1.23, 95% CI 1.12 to 1.35)). Discussion Identification of id-CCD is not associated with substantial changes in statin prescription in routine clinical practice.
Affiliation(s)
- Eric Puttock
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, California, USA
- Sunyang Fu
  - The University of Texas Health Science Center at Houston School of Biomedical Informatics, Houston, Texas, USA
- Chengyi X Zheng
  - Kaiser Permanente Southern California Department of Research & Evaluation, Pasadena, California, USA
- Hongfang Liu
  - The University of Texas Health Science Center at Houston School of Biomedical Informatics, Houston, Texas, USA
- Wansu Chen
  - Kaiser Permanente Southern California Department of Research & Evaluation, Pasadena, California, USA
- David M Kent
  - Tufts Medical Center Predictive Analytics and Comparative Effectiveness (PACE) Center, Boston, Massachusetts, USA
5
Tabari P, Costagliola G, De Rosa M, Boeker M. State-of-the-Art Fast Healthcare Interoperability Resources (FHIR)-Based Data Model and Structure Implementations: Systematic Scoping Review. JMIR Med Inform 2024; 12:e58445. [PMID: 39316433 PMCID: PMC11472501 DOI: 10.2196/58445] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2024] [Revised: 07/28/2024] [Accepted: 08/17/2024] [Indexed: 09/25/2024] Open
Abstract
BACKGROUND Data models are crucial for clinical research as they enable researchers to fully use the vast amount of clinical data stored in medical systems. Standardized data and well-defined relationships between data points are necessary to guarantee semantic interoperability. Using the Fast Healthcare Interoperability Resources (FHIR) standard for clinical data representation would be a practical methodology to enhance and accelerate interoperability and data availability for research. OBJECTIVE This research aims to provide a comprehensive overview of the state-of-the-art and current landscape in FHIR-based data models and structures. In addition, we intend to identify and discuss the tools, resources, limitations, and other critical aspects mentioned in the selected research papers. METHODS To ensure the extraction of reliable results, we followed the instructions of the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) checklist. We analyzed the indexed articles in PubMed, Scopus, Web of Science, IEEE Xplore, the ACM Digital Library, and Google Scholar. After identifying, extracting, and assessing the quality and relevance of the articles, we synthesized the extracted data to identify common patterns, themes, and variations in the use of FHIR-based data models and structures across different studies. RESULTS On the basis of the reviewed articles, we could identify 2 main themes: dynamic (pipeline-based) and static data models. The articles were also categorized into health care use cases, including chronic diseases, COVID-19 and infectious diseases, cancer research, acute or intensive care, random and general medical notes, and other conditions. Furthermore, we summarized the important or common tools and approaches of the selected papers. These items included FHIR-based tools and frameworks, machine learning approaches, and data storage and security. 
The most common resource was "Observation" followed by "Condition" and "Patient." The limitations and challenges of developing data models were categorized based on the issues of data integration, interoperability, standardization, performance, and scalability or generalizability. CONCLUSIONS FHIR serves as a highly promising interoperability standard for developing real-world health care apps. The implementation of FHIR modeling for electronic health record data facilitates the integration, transmission, and analysis of data while also advancing translational research and phenotyping. Generally, FHIR-based exports of local data repositories improve data interoperability for systems and data warehouses across different settings. However, ongoing efforts to address existing limitations and challenges are essential for the successful implementation and integration of FHIR data models.
Affiliation(s)
- Parinaz Tabari
  - Department of Informatics, University of Salerno, Fisciano, Italy
- Mattia De Rosa
  - Department of Informatics, University of Salerno, Fisciano, Italy
- Martin Boeker
  - Institute for Artificial Intelligence and Informatics in Medicine, Medical Center rechts der Isar, School of Medicine and Health, Technical University of Munich, Munich, Germany
6
Woo KMC, Simon GW, Akindutire O, Aphinyanaphongs Y, Austrian JS, Kim JG, Genes N, Goldenring JA, Major VJ, Pariente CS, Pineda EG, Kang SK. Evaluation of GPT-4 ability to identify and generate patient instructions for actionable incidental radiology findings. J Am Med Inform Assoc 2024; 31:1983-1993. [PMID: 38778578 PMCID: PMC11339516 DOI: 10.1093/jamia/ocae117] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Revised: 03/30/2024] [Accepted: 05/06/2024] [Indexed: 05/25/2024] Open
Abstract
OBJECTIVES To evaluate the proficiency of a HIPAA-compliant version of GPT-4 in identifying actionable, incidental findings from unstructured radiology reports of Emergency Department patients. To assess appropriateness of artificial intelligence (AI)-generated, patient-facing summaries of these findings. MATERIALS AND METHODS Radiology reports extracted from the electronic health record of a large academic medical center were manually reviewed to identify non-emergent, incidental findings with high likelihood of requiring follow-up, further sub-stratified as "definitely actionable" (DA) or "possibly actionable-clinical correlation" (PA-CC). Instruction prompts to GPT-4 were developed and iteratively optimized using a validation set of 50 reports. The optimized prompt was then applied to a test set of 430 unseen reports. GPT-4 performance was primarily graded on accuracy identifying either DA or PA-CC findings, then secondarily for DA findings alone. Outputs were reviewed for hallucinations. AI-generated patient-facing summaries were assessed for appropriateness via Likert scale. RESULTS For the primary outcome (DA or PA-CC), GPT-4 achieved 99.3% recall, 73.6% precision, and 84.5% F-1. For the secondary outcome (DA only), GPT-4 demonstrated 95.2% recall, 77.3% precision, and 85.3% F-1. No findings were "hallucinated" outright. However, 2.8% of cases included generated text about recommendations that were inferred without specific reference. The majority of True Positive AI-generated summaries required no or minor revision. CONCLUSION GPT-4 demonstrates proficiency in detecting actionable, incidental findings after refined instruction prompting. AI-generated patient instructions were most often appropriate, but rarely included inferred recommendations. While this technology shows promise to augment diagnostics, active clinician oversight via "human-in-the-loop" workflows remains critical for clinical implementation.
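The F-1 values reported above can be cross-checked against the stated recall and precision, since F-1 is the harmonic mean of the two. The following is a minimal sketch: the helper name `f1_score` is hypothetical, and the inputs are the rounded percentages from the abstract, so agreement is expected only to one decimal place.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded figures quoted in the abstract.
primary = f1_score(precision=0.736, recall=0.993)    # DA or PA-CC findings
secondary = f1_score(precision=0.773, recall=0.952)  # DA findings only

print(f"primary F-1:   {primary:.1%}")    # prints "primary F-1:   84.5%"
print(f"secondary F-1: {secondary:.1%}")  # prints "secondary F-1: 85.3%"
```

Both results match the reported 84.5% and 85.3%, which illustrates how very high recall combined with more modest precision still yields an F-1 in the mid-80s.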
Affiliation(s)
- Kar-mun C Woo
  - Ronald O. Perelman Department of Emergency Medicine, NYU Grossman School of Medicine, New York, NY 10016, United States
- Gregory W Simon
  - Ronald O. Perelman Department of Emergency Medicine, NYU Grossman School of Medicine, New York, NY 10016, United States
- Olumide Akindutire
  - Ronald O. Perelman Department of Emergency Medicine, NYU Grossman School of Medicine, New York, NY 10016, United States
- Yindalon Aphinyanaphongs
  - Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States
  - Department of Health Informatics, Medical Center IT, NYU Langone Health, New York, NY 10016, United States
- Jonathan S Austrian
  - Department of Health Informatics, Medical Center IT, NYU Langone Health, New York, NY 10016, United States
  - Department of Medicine, NYU Grossman School of Medicine, New York, NY 10016, United States
- Jung G Kim
  - Ronald O. Perelman Department of Emergency Medicine, NYU Grossman School of Medicine, New York, NY 10016, United States
  - Institute for Innovations in Medical Education, NYU Langone Health, New York, NY 10016, United States
- Nicholas Genes
  - Ronald O. Perelman Department of Emergency Medicine, NYU Grossman School of Medicine, New York, NY 10016, United States
  - Department of Health Informatics, Medical Center IT, NYU Langone Health, New York, NY 10016, United States
- Jacob A Goldenring
  - Ronald O. Perelman Department of Emergency Medicine, NYU Grossman School of Medicine, New York, NY 10016, United States
- Vincent J Major
  - Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States
  - Department of Health Informatics, Medical Center IT, NYU Langone Health, New York, NY 10016, United States
- Chloé S Pariente
  - Department of Health Informatics, Medical Center IT, NYU Langone Health, New York, NY 10016, United States
- Edwin G Pineda
  - MCIT Clinical Systems—ASAP application, NYU Langone Health, New York, NY 10016, United States
- Stella K Kang
  - Department of Population Health, NYU Grossman School of Medicine, New York, NY 10016, United States
  - Department of Radiology, NYU Grossman School of Medicine, New York, NY 10016, United States
7
Clancy Ú, Puttock EJ, Chen W, Whiteley W, Vickery EM, Leung LY, Luetmer PH, Kallmes DF, Fu S, Zheng C, Liu H, Kent DM. Mortality Outcomes in a Large Population with and without Covert Cerebrovascular Disease. Aging Dis 2024; 16:AD.2024.0211. [PMID: 38421836 PMCID: PMC11745435 DOI: 10.14336/ad.2024.0211] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/12/2023] [Accepted: 02/11/2024] [Indexed: 03/02/2024] Open
Abstract
Covert cerebrovascular disease (CCD) is frequently reported on neuroimaging and associates with increased dementia and stroke risk. We aimed to determine how incidentally-discovered CCD during clinical neuroimaging in a large population associates with mortality. We screened CT and MRI reports of adults aged ≥50 in the Kaiser Permanente Southern California health system who underwent neuroimaging for a non-stroke clinical indication from 2009-2019. Natural language processing identified incidental covert brain infarcts (CBI) and/or white matter hyperintensities (WMH), grading WMH as mild/moderate/severe. Models adjusted for age, sex, ethnicity, multimorbidity, vascular risks, depression, exercise, and imaging modality. Of n=241,028, the mean age was 64.9 (SD=10.4); mean follow-up 4.46 years; 178,554 (74.1%) had CT; 62,474 (25.9%) had MRI; 11,328 (4.7%) had CBI; and 69,927 (29.0%) had WMH. The mortality rate per 1,000 person-years with CBI was 59.0 (95%CI 57.0-61.1); with WMH=46.5 (45.7-47.2); with neither=17.4 (17.1-17.7). In adjusted models, mortality risk associated with CBI was modified by age, e.g. HR 1.34 [1.21-1.48] at age 56.1 years vs HR 1.22 [1.17-1.28] at age 72 years. Mortality associated with WMH was modified by both age and imaging modality e.g., WMH on MRI at age 56.1 HR = 1.26 [1.18-1.35]; WMH on MRI at age 72 HR 1.15 [1.09-1.21]; WMH on CT at age 56.1 HR 1.41 [1.33-1.50]; WMH on CT at age 72 HR 1.28 [1.24-1.32], vs. patients without CBI or without WMH, respectively. Increasing WMH severity associated with higher mortality, e.g. mild WMH on MRI had adjusted HR=1.13 [1.06-1.20] while severe WMH on CT had HR=1.45 [1.33-1.59]. Incidentally-detected CBI and WMH on population-based clinical neuroimaging can predict higher mortality rates. We need treatments and healthcare planning for individuals with CCD.
Affiliation(s)
- Úna Clancy
  - Centre for Clinical Brain Sciences, Edinburgh Imaging, and UK Dementia Research Institute, University of Edinburgh, Edinburgh EH16 4SB, United Kingdom
- Eric J. Puttock
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, California, USA
- Wansu Chen
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, California, USA
- William Whiteley
  - Centre for Clinical Brain Sciences, Edinburgh Imaging, and UK Dementia Research Institute, University of Edinburgh, Edinburgh EH16 4SB, United Kingdom
- Ellen M. Vickery
  - Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, Massachusetts, USA
- Lester Y. Leung
  - Department of Neurology, Tufts Medical Center, Boston, Massachusetts, USA
- David F. Kallmes
  - Department of Radiology, Mayo Clinic, Rochester, Minnesota, USA
- Sunyang Fu
  - Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center, Houston, Texas, USA
- Chengyi Zheng
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, California, USA
- Hongfang Liu
  - Center for Translational AI Excellence and Applications in Medicine, University of Texas Health Science Center, Houston, Texas, USA
- David M. Kent
  - Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, Massachusetts, USA
8
Chien A, Tang H, Jagessar B, Chang KW, Peng N, Nael K, Salamon N. AI-Assisted Summarization of Radiologic Reports: Evaluating GPT3davinci, BARTcnn, LongT5booksum, LEDbooksum, LEDlegal, and LEDclinical. AJNR Am J Neuroradiol 2024; 45:244-248. [PMID: 38238092 PMCID: PMC11285993 DOI: 10.3174/ajnr.a8102] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2023] [Accepted: 11/09/2023] [Indexed: 02/09/2024]
Abstract
BACKGROUND AND PURPOSE The review of clinical reports is an essential part of monitoring disease progression. Synthesizing multiple imaging reports is also important for clinical decisions. It is critical to aggregate information quickly and accurately. Machine learning natural language processing (NLP) models hold promise to address an unmet need for report summarization. MATERIALS AND METHODS We evaluated NLP methods to summarize longitudinal aneurysm reports. A total of 137 clinical reports and 100 PubMed case reports were used in this study. Models were 1) compared against expert-generated summary using longitudinal imaging notes collected in our institute and 2) compared using publicly accessible PubMed case reports. Five AI models were used to summarize the clinical reports, and a sixth model, the online GPT3davinci NLP large language model (LLM), was added for the summarization of PubMed case reports. We assessed the summary quality through comparison with expert summaries using quantitative metrics and quality reviews by experts. RESULTS In clinical summarization, BARTcnn had the best performance (BERTscore = 0.8371), followed by LongT5Booksum and LEDlegal. In the analysis using PubMed case reports, GPT3davinci demonstrated the best performance, followed by models BARTcnn and then LEDbooksum (BERTscore = 0.894, 0.872, and 0.867, respectively). CONCLUSIONS AI NLP summarization models demonstrated great potential in summarizing longitudinal aneurysm reports, though none yet reached the level of quality for clinical usage. We found the online GPT LLM outperformed the others; however, the BARTcnn model is potentially more useful because it can be implemented on-site. Future work to improve summarization, address other types of neuroimaging reports, and develop structured reports may allow NLP models to ease clinical workflow.
Affiliation(s)
- Aichi Chien
  - From the Department of Radiological Science (A.C., H.T., B.J., K.N., N.S.), David Geffen School of Medicine at UCLA, Los Angeles, California
- Hubert Tang
  - From the Department of Radiological Science (A.C., H.T., B.J., K.N., N.S.), David Geffen School of Medicine at UCLA, Los Angeles, California
- Bhavita Jagessar
  - From the Department of Radiological Science (A.C., H.T., B.J., K.N., N.S.), David Geffen School of Medicine at UCLA, Los Angeles, California
- Kai-Wei Chang
  - Department of Computer Science (K.C., N.P.), University of California, Los Angeles, Los Angeles, California
- Nanyun Peng
  - Department of Computer Science (K.C., N.P.), University of California, Los Angeles, Los Angeles, California
- Kambiz Nael
  - From the Department of Radiological Science (A.C., H.T., B.J., K.N., N.S.), David Geffen School of Medicine at UCLA, Los Angeles, California
- Noriko Salamon
  - From the Department of Radiological Science (A.C., H.T., B.J., K.N., N.S.), David Geffen School of Medicine at UCLA, Los Angeles, California
9
Meinel TR, Triulzi CB, Kaesmacher J, Mujanovic A, Pasi M, Leung LY, Kent DM, Sui Y, Seiffge D, Bücke P, Umarova R, Arnold M, Roten L, Nguyen TN, Wardlaw J, Fischer U. Management of covert brain infarction survey: A call to care for and trial this neglected population. Eur Stroke J 2023; 8:1079-1088. [PMID: 37427426 PMCID: PMC10683731 DOI: 10.1177/23969873231187444] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 05/08/2023] [Accepted: 06/26/2023] [Indexed: 07/11/2023] Open
Abstract
BACKGROUND Covert brain infarction (CBI) is highly prevalent and linked with stroke risk factors, increased mortality, and morbidity. Evidence to guide management is sparse. We sought to gain information on current practice and attitudes toward CBI and to compare differences in management according to CBI phenotype. METHODS We conducted a web-based, structured, international survey from November 2021 to February 2022 among neurologists and neuroradiologists. The survey captured respondents' baseline characteristics, general approach toward CBI and included two case scenarios designed to evaluate management decisions taken upon incidental detection of an embolic-phenotype and a small-vessel-disease phenotype. RESULTS Of 627 respondents (38% vascular neurologists, 24% general neurologists, and 26% neuroradiologists), 362 (58%) had a partial, and 305 (49%) a complete response. Most respondents were university hospital senior faculty members experienced in stroke, mostly from Europe and Asia. Only 66 (18%) of respondents had established institutional written protocols to manage CBI. The majority indicated that they were uncertain regarding useful investigations and further management of CBI patients (median 67 on a slider 0-100, 95% CI 35-81). Almost all respondents (97%) indicated that they would assess vascular risk factors. Although most would investigate and treat similarly to ischemic stroke for both phenotypes, including initiating antithrombotic treatment, there was considerable diagnostic and therapeutic heterogeneity. Less than half of respondents (42%) would assess cognitive function or depression. CONCLUSIONS There is a high degree of uncertainty and heterogeneity regarding management of two common types of CBI, even among experienced stroke physicians. Respondents were more proactive regarding the diagnostic and therapeutic management than the minimum recommended by current expert opinions. 
More data are required to guide management of CBI; meantime, more consistent approaches to identification and consistent application of current knowledge, that also consider cognition and mood, would be promising first steps to improve consistency of care.
Affiliation(s)
- Thomas R Meinel
- Neurology, Stroke Research Center Bern, Bern University Hospital, University of Bern, Bern, Switzerland
| | - Camilla B Triulzi
- Neurology, Stroke Research Center Bern, Bern University Hospital, University of Bern, Bern, Switzerland
| | - Johannes Kaesmacher
- Institute of Diagnostic and Interventional Neuroradiology, Inselspital, Bern University Hospital, and University of Bern, Bern, Switzerland
| | - Adnan Mujanovic
- Neurology, Stroke Research Center Bern, Bern University Hospital, University of Bern, Bern, Switzerland
- Institute of Diagnostic and Interventional Neuroradiology, Inselspital, Bern University Hospital, and University of Bern, Bern, Switzerland
| | - Marco Pasi
- University of Lille, Inserm, CHU Lille, U1172-Lille Neuroscience & Cognition (LilNCog), Lille, France
| | - Lester Y Leung
- Department of Neurology, Tufts Medical Center, Boston, MA, USA
| | - David M Kent
- Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA, USA
| | - Yi Sui
- Department of Neurology, The First Affiliated Hospital of China Medical University, Shenyang, China
- Department of Neurology, Shenyang First People’s Hospital, Shenyang Brain Institute, Shenyang, China
| | - David Seiffge
- Neurology, Stroke Research Center Bern, Bern University Hospital, University of Bern, Bern, Switzerland
| | - Philipp Bücke
- Neurology, Stroke Research Center Bern, Bern University Hospital, University of Bern, Bern, Switzerland
| | - Roza Umarova
- Neurology, Stroke Research Center Bern, Bern University Hospital, University of Bern, Bern, Switzerland
| | - Marcel Arnold
- Neurology, Stroke Research Center Bern, Bern University Hospital, University of Bern, Bern, Switzerland
| | - Laurent Roten
- Cardiology, Bern University Hospital, University of Bern, Bern, Switzerland
| | - Thanh N Nguyen
- Neurology and Radiology, Boston Medical Center, Boston, MA, USA
| | - Joanna Wardlaw
- Division of Neuroimaging Sciences, Brain Research Imaging Centre, Centre for Clinical Brain Sciences, UK Dementia Research Institute at the University of Edinburgh, Edinburgh, UK
| | - Urs Fischer
- Neurology, Stroke Research Center Bern, Bern University Hospital, University of Bern, Bern, Switzerland
- Neurology, Basel University Hospital, University of Basel, Basel, Switzerland
|
10
|
Casey A, Davidson E, Grover C, Tobin R, Grivas A, Zhang H, Schrempf P, O’Neil AQ, Lee L, Walsh M, Pellie F, Ferguson K, Cvoro V, Wu H, Whalley H, Mair G, Whiteley W, Alex B. Understanding the performance and reliability of NLP tools: a comparison of four NLP tools predicting stroke phenotypes in radiology reports. Front Digit Health 2023; 5:1184919. [PMID: 37840686 PMCID: PMC10569314 DOI: 10.3389/fdgth.2023.1184919] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2023] [Accepted: 09/06/2023] [Indexed: 10/17/2023] Open
Abstract
Background Natural language processing (NLP) has the potential to automate the reading of radiology reports, but there is a need to demonstrate that NLP methods are adaptable and reliable for use in real-world clinical applications. Methods We tested the F1 score, precision, and recall to compare NLP tools on a cohort from a study on delirium using images and radiology reports from NHS Fife and a population-based cohort (Generation Scotland) that spans multiple National Health Service health boards. We compared four off-the-shelf rule-based and neural NLP tools (namely, EdIE-R, ALARM+, ESPRESSO, and Sem-EHR) and reported on their performance for three cerebrovascular phenotypes, namely, ischaemic stroke, small vessel disease (SVD), and atrophy. Clinical experts from the EdIE-R team defined phenotypes using labelling techniques established during the development of EdIE-R, in conjunction with an expert researcher who read underlying images. Results EdIE-R obtained the highest F1 score in both cohorts for ischaemic stroke, ≥93%, followed by ALARM+, ≥87%. The F1 score of ESPRESSO was ≥74%, whilst that of Sem-EHR was ≥66%, although ESPRESSO had the highest precision in both cohorts, 90% and 98%. For SVD, EdIE-R had an F1 score of ≥98% and ALARM+ ≥90%. ESPRESSO scored lowest with ≥77% and Sem-EHR ≥81%. In NHS Fife, F1 scores for atrophy by EdIE-R and ALARM+ were 99%, dropping in Generation Scotland to 96% for EdIE-R and 91% for ALARM+. Sem-EHR performed lowest for atrophy at 89% in NHS Fife and 73% in Generation Scotland. When comparing NLP tool output with brain image reads using F1 scores, ALARM+ scored 80%, outperforming EdIE-R at 66% in ischaemic stroke. For SVD, EdIE-R performed best, scoring 84%, with Sem-EHR 82%. For atrophy, EdIE-R and both ALARM+ versions were comparable at 80%. Conclusions The four NLP tools show varying F1 (and precision/recall) scores across all three phenotypes, although the variation is more apparent for ischaemic stroke. 
If NLP tools are to be used in clinical settings, this cannot be performed "out of the box." It is essential to understand the context of their development to assess whether they are suitable for the task at hand or whether further training, re-training, or modification is required to adapt tools to the target task.
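The F1, precision, and recall figures quoted in this abstract all derive from the same confusion counts. As a minimal sketch (the function name and the counts below are hypothetical and are not taken from the study), the three metrics for one tool on one phenotype could be computed as follows:

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive
    and false-negative counts. Empty denominators yield 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for one tool on one phenotype (illustrative only).
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

With these made-up counts, precision is high while recall is lower, so F1 sits between the two. This is one way a tool can have the highest precision without having the highest F1.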
Affiliation(s)
- Arlene Casey
- Advanced Care Research Centre, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Emma Davidson
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Claire Grover
- School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Richard Tobin
- School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Andreas Grivas
- School of Informatics, University of Edinburgh, Edinburgh, United Kingdom
- Huayu Zhang
- Advanced Care Research Centre, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom
- Patrick Schrempf
- Canon Medical Research Europe Ltd., AI Research, Edinburgh, United Kingdom
- School of Computer Science, University of St Andrews, St Andrews, United Kingdom
- Alison Q. O’Neil
- Canon Medical Research Europe Ltd., AI Research, Edinburgh, United Kingdom
- School of Engineering, University of Edinburgh, Edinburgh, United Kingdom
- Liam Lee
- Medical School, University of Edinburgh, Edinburgh, United Kingdom
- Michael Walsh
- Intensive Care Department, University Hospitals Bristol and Weston, Bristol, United Kingdom
- Freya Pellie
- National Horizons Centre, Teesside University, Darlington, United Kingdom
- School of Health and Life Sciences, Teesside University, Middlesbrough, United Kingdom
- Karen Ferguson
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Vera Cvoro
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Department of Geriatric Medicine, NHS Fife, Fife, United Kingdom
- Honghan Wu
- Institute of Health Informatics, University College London, London, United Kingdom
- Alan Turing Institute, London, United Kingdom
- Heather Whalley
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Generation Scotland, Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, United Kingdom
- Grant Mair
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Neuroradiology, Department of Clinical Neurosciences, NHS Lothian, Edinburgh, United Kingdom
- William Whiteley
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom
- Neuroradiology, Department of Clinical Neurosciences, NHS Lothian, Edinburgh, United Kingdom
- Beatrice Alex
- Edinburgh Futures Institute, University of Edinburgh, Edinburgh, United Kingdom
- School of Literatures, Languages and Cultures, University of Edinburgh, Edinburgh, United Kingdom
|
11
|
Teghipco A, Newman-Norlund R, Fridriksson J, Rorden C, Bonilha L. Distinct brain morphometry patterns revealed by deep learning improve prediction of aphasia severity. RESEARCH SQUARE 2023:rs.3.rs-3126126. [PMID: 37461696 PMCID: PMC10350198 DOI: 10.21203/rs.3.rs-3126126/v1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/28/2023]
Abstract
Emerging evidence suggests that post-stroke aphasia severity depends on the integrity of the brain beyond the stroke lesion. While measures of lesion anatomy and brain integrity combine synergistically to explain aphasic symptoms, significant interindividual variability remains unaccounted for. A possible explanatory factor may be the spatial distribution of brain atrophy beyond the lesion. This includes not just the specific brain areas showing atrophy, but also distinct three-dimensional patterns of atrophy. Here, we tested whether deep learning with Convolutional Neural Networks (CNN) on whole brain morphometry (i.e., segmented tissue volumes) and lesion anatomy can better predict which individuals with chronic stroke (N=231) have severe aphasia, and whether encoding spatial dependencies in the data might be capable of improving predictions by identifying unique individualized spatial patterns. We observed that CNN achieves significantly higher accuracy and F1 scores than Support Vector Machine (SVM), even when the SVM is nonlinear or integrates linear and nonlinear dimensionality reduction techniques. Performance parity was only achieved when the SVM was directly trained on the latent features learned by the CNN. Saliency maps demonstrated that the CNN leveraged widely distributed patterns of brain atrophy predictive of aphasia severity, whereas the SVM focused almost exclusively on the area around the lesion. Ensemble clustering of CNN saliency maps revealed distinct morphometry patterns that were unrelated to lesion size, highly consistent across individuals, and implicated unique brain networks associated with different cognitive processes as measured by the wider neuroimaging literature. Individualized predictions of severity depended on both ipsilateral and contralateral features outside of the location of stroke. 
Our findings illustrate that three-dimensional network distributions of atrophy in individuals with aphasia are directly associated with aphasia severity, underscoring the potential for deep learning to improve prognostication of behavioral outcomes from neuroimaging data, and highlighting the prospective benefits of interrogating spatial dependence at different scales in multivariate feature space.
|
12
|
Rietberg MT, Nguyen VB, Geerdink J, Vijlbrief O, Seifert C. Accurate and Reliable Classification of Unstructured Reports on Their Diagnostic Goal Using BERT Models. Diagnostics (Basel) 2023; 13:diagnostics13071251. [PMID: 37046469 PMCID: PMC10093295 DOI: 10.3390/diagnostics13071251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/21/2023] [Revised: 03/09/2023] [Accepted: 03/17/2023] [Indexed: 03/29/2023] Open
Abstract
The diagnostic goal of a medical report is valuable information for understanding patient flows. This work focuses on extracting the reason for taking an MRI scan of Multiple Sclerosis (MS) patients from the attached free-form reports: Diagnosis, Progression or Monitoring. We investigate the performance of domain-dependent and general state-of-the-art language models and their alignment with domain expertise. To this end, eXplainable Artificial Intelligence (XAI) techniques are used to gain insight into the inner workings of the model, and the resulting explanations are verified for trustworthiness. The verified XAI explanations are then compared with explanations from a domain expert to indirectly determine the reliability of the model. BERTje, a Dutch Bidirectional Encoder Representations from Transformers (BERT) model, outperforms RobBERT and MedRoBERTa.nl in both accuracy and reliability. MedRoBERTa.nl is a domain-specific model, while BERTje is a generic model, showing that domain-specific models are not always superior. Our validation of BERTje in a small prospective study shows promising results for the potential uptake of the model in a practical setting.
|
13
|
Fu S, Wang L, Moon S, Zong N, He H, Pejaver V, Relevo R, Walden A, Haendel M, Chute CG, Liu H. Recommended practices and ethical considerations for natural language processing-assisted observational research: A scoping review. Clin Transl Sci 2023; 16:398-411. [PMID: 36478394 PMCID: PMC10014687 DOI: 10.1111/cts.13463] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 06/30/2022] [Revised: 11/03/2022] [Accepted: 11/18/2022] [Indexed: 12/12/2022] Open
Abstract
An increasing number of studies have reported using natural language processing (NLP) to assist observational research by extracting clinical information from electronic health records (EHRs). Currently, no standardized reporting guidelines for NLP-assisted observational studies exist. The absence of detailed reporting guidelines may create ambiguity in the use of NLP-derived content, knowledge gaps in current research reporting practices, and reproducibility challenges. To address these issues, we conducted a scoping review of NLP-assisted observational clinical studies and examined their reporting practices, focusing on NLP methodology and evaluation. We found high variation in reporting practices, such as inconsistent use of references for measurement studies, variation in the reporting location (reference, appendix, and manuscript), and differing granularity of NLP methodology and evaluation details. To promote the wide adoption and utilization of NLP solutions in clinical research, we outline several perspectives that align with the six principles released by the World Health Organization (WHO) that guide the ethical use of artificial intelligence for health.
Affiliation(s)
- Sunyang Fu
- Department of AI and Informatics Research, Mayo Clinic, Rochester, Minnesota, USA
- Liwei Wang
- Department of AI and Informatics Research, Mayo Clinic, Rochester, Minnesota, USA
- Sungrim Moon
- Department of AI and Informatics Research, Mayo Clinic, Rochester, Minnesota, USA
- Nansu Zong
- Department of AI and Informatics Research, Mayo Clinic, Rochester, Minnesota, USA
- Huan He
- Department of AI and Informatics Research, Mayo Clinic, Rochester, Minnesota, USA
- Vikas Pejaver
- Institute for Genomic Health, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
- Rose Relevo
- The National Center for Data to Health, Bethesda, Maryland, USA
- Anita Walden
- The National Center for Data to Health, Bethesda, Maryland, USA
- Melissa Haendel
- Center for Health AI, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
- Hongfang Liu
- Department of AI and Informatics Research, Mayo Clinic, Rochester, Minnesota, USA
|
14
|
IKAR: An Interdisciplinary Knowledge-Based Automatic Retrieval Method from Chinese Electronic Medical Record. INFORMATION 2023. [DOI: 10.3390/info14010049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023] Open
Abstract
To date, information retrieval methods in the medical field have mainly focused on English medical reports, and little work has studied Chinese electronic medical reports, especially in the field of obstetrics and gynecology. In this paper, a dataset of 180,000 complete Chinese ultrasound reports in obstetrics and gynecology was established and made publicly available. Based on the ultrasound reports in the dataset, a new information retrieval method (IKAR) is proposed to extract key information from the ultrasound reports and automatically generate the corresponding ultrasound diagnostic results. The model can both extract what is already in the report and infer what is not stated in the report. Applied to the dataset, the method achieved 89.38% accuracy, 91.09% recall, and a 90.23% F-score. Moreover, the method achieves an F-score of over 90% on 50% of the 10 components of the report. This study provides a quality dataset for the field of electronic medical records and offers a reference for information retrieval methods in obstetrics and gynecology and in other fields.
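The reported F-score can be checked against the other two figures. As a sketch, assuming the F-score is the harmonic mean of precision and recall and that the 89.38% figure plays the role of precision (the abstract calls it accuracy, so this reading is an assumption rather than something the source states), the arithmetic is:

```python
def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


# Figures from the abstract; treating 89.38% as precision is an assumption.
value = f_score(0.8938, 0.9109)
print(f"{value:.4f}")  # 0.9023
```

Under that assumption the result rounds to 90.23%, which matches the F-score quoted in the abstract.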
|
15
|
Kent DM, Leung LY, Zhou Y, Luetmer PH, Kallmes DF, Nelson J, Fu S, Puttock EJ, Zheng C, Liu H, Chen W. Association of Incidentally Discovered Covert Cerebrovascular Disease Identified Using Natural Language Processing and Future Dementia. J Am Heart Assoc 2023; 12:e027672. [PMID: 36565208 PMCID: PMC9973577 DOI: 10.1161/jaha.122.027672] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/30/2022] [Accepted: 11/09/2022] [Indexed: 12/25/2022]
Abstract
Background Covert cerebrovascular disease (CCD) has been shown to be associated with dementia in population-based studies with magnetic resonance imaging (MRI) screening, but dementia risk associated with incidentally discovered CCD is not known. Methods and Results Individuals aged ≥50 years enrolled in the Kaiser Permanente Southern California health system receiving head computed tomography (CT) or MRI for nonstroke indications from 2009 to 2019, without prior ischemic stroke/transient ischemic attack, dementia/Alzheimer disease, or visit reason/scan indication suggestive of cognitive decline or stroke were included. Natural language processing identified incidentally discovered covert brain infarction (id-CBI) and white matter disease (id-WMD) on the neuroimage report; white matter disease was characterized as mild, moderate, severe, or undetermined. We estimated risk of dementia associated with id-CBI and id-WMD. Among 241 050 qualified individuals, natural language processing identified 69 931 (29.0%) with id-WMD and 11 328 (4.7%) with id-CBI. Dementia incidence rates (per 1000 person-years) were 23.5 (95% CI, 22.9-24.0) for patients with id-WMD, 29.4 (95% CI, 27.9-31.0) with id-CBI, and 6.0 (95% CI, 5.8-6.2) without id-CCD. The association of id-WMD with future dementia was stronger in younger (aged <70 years) versus older (aged ≥70 years) patients and for CT- versus MRI-discovered lesions. For patients with versus without id-WMD on CT, the adjusted HR was 2.87 (95% CI, 2.58-3.19) for older and 1.87 (95% CI, 1.79-1.95) for younger patients. For patients with versus without id-WMD on MRI, the adjusted HR for dementia risk was 2.28 (95% CI, 1.99-2.62) for older and 1.48 (95% CI, 1.32-1.66) for younger patients. The adjusted HR for id-CBI was 2.02 (95% CI, 1.70-2.41) for older and 1.22 (95% CI, 1.15-1.30) for younger patients for either modality. 
Dementia risk was strongly correlated with id-WMD severity; adjusted HRs compared with patients who were negative for id-WMD by MRI ranged from 1.41 (95% CI, 1.25-1.60) for those with mild disease on MRI to 4.11 (95% CI, 3.58-4.72) for those with severe disease on CT. Conclusions Incidentally discovered CCD is common and associated with a high risk of dementia, representing an opportunity for prevention. The association is strengthened when discovered at younger age, by increasing id-WMD severity, and when id-WMD is detected by CT scan rather than MRI.
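The incidence rates in this abstract are expressed per 1000 person-years. As a minimal sketch of that unit (the function name and the event and follow-up counts are hypothetical, chosen only so that the output resembles the headline rates), a crude rate could be computed like this:

```python
def incidence_rate_per_1000(events: int, person_years: float) -> float:
    """Crude incidence rate: events per 1000 person-years of follow-up."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 1000.0 * events / person_years


# Hypothetical counts (not from the study), for illustration only.
with_finding = incidence_rate_per_1000(events=235, person_years=10_000)
without_finding = incidence_rate_per_1000(events=60, person_years=10_000)
print(with_finding, without_finding)
```

A ratio of two such crude rates is not comparable to the adjusted hazard ratios reported above, which account for covariates and are stratified by age and imaging modality.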
Affiliation(s)
- David M. Kent
- Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA
- Yichen Zhou
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA
- Jason Nelson
- Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA
- Sunyang Fu
- Department of AI and Informatics, Mayo Clinic, Rochester, MN
- Eric J. Puttock
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA
- Chengyi Zheng
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA
- Hongfang Liu
- Department of AI and Informatics, Mayo Clinic, Rochester, MN
- Wansu Chen
- Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA
|
16
|
Fu S, Vassilaki M, Ibrahim OA, Petersen RC, Pagali S, St Sauver J, Moon S, Wang L, Fan JW, Liu H, Sohn S. Quality assessment of functional status documentation in EHRs across different healthcare institutions. Front Digit Health 2022; 4:958539. [PMID: 36238199 PMCID: PMC9552292 DOI: 10.3389/fdgth.2022.958539] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2022] [Accepted: 09/05/2022] [Indexed: 11/29/2022] Open
Abstract
The secondary use of electronic health records (EHRs) faces challenges arising from varying data quality. To address this, we retrospectively assessed the quality of functional status documentation in the EHRs of persons participating in the Mayo Clinic Study of Aging (MCSA). We used a convergent parallel design to collect quantitative and qualitative data and independently analyzed the findings. We discovered a heterogeneous documentation process, where the care practice teams, institutions, and EHR systems all play an important role in how text data is documented and organized. Four prevalent instrument-assisted documentation (iDoc) expressions were identified based on three distinct instruments: Epic smart form, questionnaire, and occupational therapy and physical therapy templates. We found strong differences in the usage, information quality (intrinsic and contextual), and naturality of language among different types of iDoc expressions. These variations can be caused by different source instruments, information providers, practice settings, care events, and institutions. In addition, iDoc expressions are context specific and thus should not be viewed and processed uniformly. We recommend conducting a data quality assessment of unstructured EHR text prior to using the information.
Affiliation(s)
- Sunyang Fu
- Department of AI and Informatics, Mayo Clinic, Rochester, MN, United States
- Maria Vassilaki
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
- Omar A. Ibrahim
- Department of AI and Informatics, Mayo Clinic, Rochester, MN, United States
- Ronald C. Petersen
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
- Department of Neurology, Mayo Clinic, Rochester, MN, United States
- Sandeep Pagali
- Department of Medicine, Mayo Clinic, Rochester, MN, United States
- Jennifer St Sauver
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
- Sungrim Moon
- Department of AI and Informatics, Mayo Clinic, Rochester, MN, United States
- Liwei Wang
- Department of AI and Informatics, Mayo Clinic, Rochester, MN, United States
- Jungwei W. Fan
- Department of AI and Informatics, Mayo Clinic, Rochester, MN, United States
- Department of Quantitative Health Sciences, Mayo Clinic, Rochester, MN, United States
- Hongfang Liu
- Department of AI and Informatics, Mayo Clinic, Rochester, MN, United States
- Sunghwan Sohn
- Department of AI and Informatics, Mayo Clinic, Rochester, MN, United States
- Correspondence: Sunghwan Sohn
|
17
|
Han P, Fu S, Kolis J, Hughes R, Hallstrom BR, Carvour M, Maradit-Kremers H, Sohn S, Vydiswaran VGV. Multicenter Validation of Natural Language Processing Algorithms for the Detection of Common Data Elements in Operative Notes for Total Hip Arthroplasty: Algorithm Development and Validation. JMIR Med Inform 2022; 10:e38155. [PMID: 36044253 PMCID: PMC9475406 DOI: 10.2196/38155] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 03/21/2022] [Revised: 05/30/2022] [Accepted: 07/12/2022] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Natural language processing (NLP) methods are powerful tools for extracting and analyzing critical information from free-text data. MedTaggerIE, an open-source NLP pipeline for information extraction based on text patterns, has been widely used in the annotation of clinical notes. A rule-based system, MedTagger-total hip arthroplasty (THA), developed based on MedTaggerIE, was previously shown to correctly identify the surgical approach, fixation, and bearing surface from the THA operative notes at Mayo Clinic. OBJECTIVE This study aimed to assess the implementability, usability, and portability of MedTagger-THA at two external institutions, Michigan Medicine and the University of Iowa, and provide lessons learned for best practices. METHODS We conducted iterative test-apply-refinement processes with three involved sites-the development site (Mayo Clinic) and two deployment sites (Michigan Medicine and the University of Iowa). Mayo Clinic was the primary NLP development site, with the THA registry as the gold standard. The activities at the two deployment sites included the extraction of the operative notes, gold standard development (Michigan: registry data; Iowa: manual chart review), the refinement of NLP algorithms on training data, and the evaluation of test data. Error analyses were conducted to understand language variations across sites. To further assess the model specificity for approach and fixation, we applied the refined MedTagger-THA to arthroscopic hip procedures and periacetabular osteotomy cases, as neither of these operative notes should contain any approach or fixation keywords. RESULTS MedTagger-THA algorithms were implemented and refined independently for both sites. At Michigan, the study comprised THA-related notes for 2569 patient-date pairs. Before model refinement, MedTagger-THA algorithms demonstrated excellent accuracy for approach (96.6%, 95% CI 94.6%-97.9%) and fixation (95.7%, 95% CI 92.4%-97.6%). 
These results were comparable with internal accuracy at the development site (99.2% for approach and 90.7% for fixation). Model refinement improved accuracies slightly for both approach (99%, 95% CI 97.6%-99.6%) and fixation (98%, 95% CI 95.3%-99.3%). The specificity of approach identification was 88.9% for arthroscopy cases, and the specificity of fixation identification was 100% for both periacetabular osteotomy and arthroscopy cases. At the Iowa site, the study comprised an overall data set of 100 operative notes (50 training notes and 50 test notes). MedTagger-THA algorithms achieved moderate-high performance on the training data. After model refinement, the model achieved high performance for approach (100%, 95% CI 91.3%-100%), fixation (98%, 95% CI 88.3%-100%), and bearing surface (92%, 95% CI 80.5%-97.3%). CONCLUSIONS High performance across centers was achieved for the MedTagger-THA algorithms, demonstrating that they were sufficiently implementable, usable, and portable to different deployment sites. This study provided important lessons learned during the model deployment and validation processes, and it can serve as a reference for transferring rule-based electronic health record models.
Affiliation(s)
- Peijin Han
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, United States
- Sunyang Fu
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Julie Kolis
- Department of Orthopedic Surgery, University of Michigan, Ann Arbor, MI, United States
- Richard Hughes
- Department of Orthopedic Surgery, University of Michigan, Ann Arbor, MI, United States
- Brian R Hallstrom
- Department of Orthopedic Surgery, University of Michigan, Ann Arbor, MI, United States
- Martha Carvour
- Department of Internal Medicine and Epidemiology, University of Iowa, Iowa City, IA, United States
- Hilal Maradit-Kremers
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- Departments of Orthopedic Surgery, Mayo Clinic, Rochester, MN, United States
- Sunghwan Sohn
- Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN, United States
- V G Vinod Vydiswaran
- Department of Learning Health Sciences, Medical School, University of Michigan, Ann Arbor, MI, United States
- School of Information, University of Michigan, Ann Arbor, MI, United States
|
18
|
Li J, Lin Y, Zhao P, Liu W, Cai L, Sun J, Zhao L, Yang Z, Song H, Lv H, Wang Z. Automatic text classification of actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformer (BERT) and in-domain pre-training (IDPT). BMC Med Inform Decis Mak 2022; 22:200. [PMID: 35907966 PMCID: PMC9338483 DOI: 10.1186/s12911-022-01946-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 05/14/2022] [Accepted: 07/18/2022] [Indexed: 11/17/2022] Open
Abstract
Background Given the increasing number of people suffering from tinnitus, the accurate categorization of patients with actionable reports is attractive in assisting clinical decision making. However, this process requires experienced physicians and significant human labor. Natural language processing (NLP) has shown great potential in big data analytics of medical texts; yet, its application to domain-specific analysis of radiology reports is limited. Objective The aim of this study is to propose a novel approach to classifying actionable radiology reports of tinnitus patients using bidirectional encoder representations from transformers (BERT)-based models and to evaluate the benefits of in-domain pre-training (IDPT) along with a sequence adaptation strategy. Methods A total of 5864 temporal bone computed tomography (CT) reports were labeled by two experienced radiologists as follows: (1) normal findings without notable lesions; (2) notable lesions but uncorrelated to tinnitus; and (3) at least one lesion considered as a potential cause of tinnitus. We then constructed a framework consisting of deep learning (DL) neural networks and self-supervised BERT models. A tinnitus domain-specific corpus was used to pre-train the BERT model to further improve its embedding weights. In addition, we conducted an experiment to evaluate multiple groups of max sequence length settings in BERT to reduce the excessive quantity of calculations. After a comprehensive comparison of all metrics, we determined the most promising approach through the performance comparison of F1-scores and AUC values. Results In the first experiment, the fine-tuned BERT model achieved a more promising result (AUC 0.868, F1 0.760) compared with that of the Word2Vec-based models (AUC 0.767, F1 0.733) on validation data. In the second experiment, the BERT in-domain pre-training model (AUC 0.948, F1 0.841) performed significantly better than the BERT-based model (AUC 0.868, F1 0.760). 
Additionally, among the variants of BERT fine-tuning models, Mengzi achieved the highest AUC of 0.878 (F1 0.764). Finally, we found that the BERT max-sequence-length of 128 tokens achieved an AUC of 0.866 (F1 0.736), which is almost equal to the BERT max-sequence-length of 512 tokens (AUC 0.868, F1 0.760). Conclusion In conclusion, we developed a reliable BERT-based framework for tinnitus diagnosis from Chinese radiology reports, along with a sequence adaptation strategy to reduce computational resources while maintaining accuracy. The findings could provide a reference for NLP development in Chinese radiology reports. Supplementary Information The online version contains supplementary material available at 10.1186/s12911-022-01946-y.
Affiliation(s)
- Jia Li
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95 YongAn Road, Beijing, 100050, People's Republic of China
- Yucong Lin
- School of Medical Technology, Beijing Institute of Technology, No. 5 Zhongguancun East Road, Beijing, 100050, People's Republic of China
- Pengfei Zhao
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95 YongAn Road, Beijing, 100050, People's Republic of China
- Wenjuan Liu
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95 YongAn Road, Beijing, 100050, People's Republic of China
- Linkun Cai
- School of Biological Science and Medical Engineering, Beihang University, No. 37 XueYuan Road, Beijing, 100191, People's Republic of China
- Jing Sun
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95 YongAn Road, Beijing, 100050, People's Republic of China
- Lei Zhao
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95 YongAn Road, Beijing, 100050, People's Republic of China
- Zhenghan Yang
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95 YongAn Road, Beijing, 100050, People's Republic of China
- Hong Song
- School of Computer Science and Technology, Beijing Institute of Technology, No. 5, South Street, Zhongguancun, Haidian District, Beijing, 100050, People's Republic of China
- Han Lv
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95 YongAn Road, Beijing, 100050, People's Republic of China
- Zhenchang Wang
- Department of Radiology, Beijing Friendship Hospital, Capital Medical University, No. 95 YongAn Road, Beijing, 100050, People's Republic of China
- School of Biological Science and Medical Engineering, Beihang University, No. 37 XueYuan Road, Beijing, 100191, People's Republic of China
| |
Collapse
|
19
|
Wang AY, Leung LY, Puttock EJ, Luetmer PH, Kallmes DF, Nelson J, Fu S, Zheng C, Liu H, Chen W, Kent DM. Stratifying Future Stroke Risk with Incidentally Discovered White Matter Disease Severity and Covert Brain Infarct Site. Cerebrovasc Dis 2022; 52:117-122. [PMID: 35760063 PMCID: PMC9792629 DOI: 10.1159/000524723] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2021] [Accepted: 03/17/2022] [Indexed: 02/04/2023] Open
Abstract
BACKGROUND Covert cerebrovascular disease (CCD) includes white matter disease (WMD) and covert brain infarction (CBI). Incidentally discovered CCD is associated with increased risk of subsequent symptomatic stroke. However, it is unknown whether the severity of WMD or the location of CBI predicts risk. OBJECTIVES The aim of this study was to examine the association of incidentally discovered WMD severity and CBI location with risk of subsequent symptomatic stroke. METHOD This retrospective cohort study includes patients aged ≥50 years old in the Kaiser Permanente Southern California health system who received neuroimaging for a nonstroke indication between 2009 and 2019. Incidental CBI and WMD were identified via natural language processing of the neuroimage report, and WMD severity was classified into grades. RESULTS A total of 261,960 patients received neuroimaging; 78,555 patients (30.0%) were identified to have incidental WMD and 12,857 patients (4.9%) to have incidental CBI. Increasing WMD severity is associated with an increased incidence rate of future stroke. However, the stroke incidence rate in CT-identified WMD is higher at each level of severity compared to rates in MRI-identified WMD. Patients with mild WMD via CT have a stroke incidence rate of 24.9 per 1,000 person-years, similar to that of patients with severe WMD via MRI. Among incidentally discovered CBI patients with a determined CBI location, 97.9% are subcortical rather than cortical infarcts. CBI confers a similar risk of future stroke, whether cortical or subcortical or whether MRI- or CT-detected. CONCLUSIONS Increasing severity of incidental WMD is associated with an increased risk of future symptomatic stroke, dependent on the imaging modality. Subcortical and cortical CBI conferred similar risks.
Affiliation(s)
- Andy Y. Wang
  - Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA, USA
- Lester Y. Leung
  - Department of Neurology, Tufts Medical Center, Boston, MA, USA
- Eric J. Puttock
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA, USA
- Jason Nelson
  - Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA, USA
- Sunyang Fu
  - Department of AI and Informatics, Mayo Clinic, Rochester, MN, USA
- Chengyi Zheng
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA, USA
- Hongfang Liu
  - Department of AI and Informatics, Mayo Clinic, Rochester, MN, USA
- Wansu Chen
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA, USA
- David M. Kent
  - Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA, USA
20
Leung LY, Zhou Y, Fu S, Zheng C, Luetmer PH, Kallmes DF, Liu H, Chen W, Kent DM. Risk Factors for Silent Brain Infarcts and White Matter Disease in a Real-World Cohort Identified by Natural Language Processing. Mayo Clin Proc 2022; 97:1114-1122. [PMID: 35487789 PMCID: PMC9284412 DOI: 10.1016/j.mayocp.2021.11.038] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/16/2021] [Revised: 11/17/2021] [Accepted: 11/29/2021] [Indexed: 11/21/2022]
Abstract
OBJECTIVE To assess the frequency of silent brain infarcts (SBIs) and white matter disease (WMD) and associations with stroke risk factors (RFs) in a real-world population. PATIENTS AND METHODS This was an observational study of patients 50 years or older in the Kaiser Permanente Southern California health system from January 1, 2009, through June 30, 2019, with head computed tomography or magnetic resonance imaging for nonstroke indications and no history of stroke, transient ischemic attack, or dementia. A natural language processing (NLP) algorithm was applied to the electronic health record to identify individuals with reported SBIs or WMD. Multivariable Poisson regression estimated risk ratios of demographic characteristics, RFs, and scan modality on the presence of SBIs or WMD. RESULTS Among 262,875 individuals, the NLP identified 13,154 (5.0%) with SBIs and 78,330 (29.8%) with WMD. Stroke RFs were highly prevalent. Advanced age was strongly associated with increased risk of SBIs (adjusted relative risks [aRRs], 1.90, 3.23, and 4.72 for those aged in their 60s, 70s, and ≥80s compared with those in their 50s) and increased risk of WMD (aRRs, 1.79, 3.02, and 4.53 for those aged in their 60s, 70s, and ≥80s compared with those in their 50s). Magnetic resonance imaging was associated with a reduced risk of SBIs (aRR, 0.87; 95% CI, 0.83 to 0.91) and an increased risk of WMD (aRR, 2.86; 95% CI, 2.83 to 2.90). Stroke RFs had modest associations with increased risk of SBIs or WMD. CONCLUSION An NLP algorithm can identify a large cohort of patients with incidentally discovered SBIs and WMD. Advanced age is strongly associated with incidentally discovered SBIs and WMD.
Affiliation(s)
- Lester Y Leung
  - Department of Neurology, Tufts Medical Center, Boston, MA, USA.
- Yichen Zhou
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA, USA
- Sunyang Fu
  - Department of AI and Informatics, Mayo Clinic, Rochester, MN, USA
- Chengyi Zheng
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA, USA
- Hongfang Liu
  - Department of AI and Informatics, Mayo Clinic, Rochester, MN, USA
- Wansu Chen
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA, USA
- David M Kent
  - Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA, USA
21
Fu S, Lopes GS, Pagali SR, Thorsteinsdottir B, LeBrasseur NK, Wen A, Liu H, Rocca WA, Olson JE, St. Sauver J, Sohn S. Ascertainment of Delirium Status Using Natural Language Processing From Electronic Health Records. J Gerontol A Biol Sci Med Sci 2022; 77:524-530. [PMID: 35239951 PMCID: PMC8893184 DOI: 10.1093/gerona/glaa275] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/13/2020] [Indexed: 12/12/2022]
Abstract
BACKGROUND Delirium is underdiagnosed in clinical practice and is not routinely coded for billing. Manual chart review can be used to identify the occurrence of delirium; however, it is labor-intensive and impractical for large-scale studies. Natural language processing (NLP) has the capability to process raw text in electronic health records (EHRs) and determine the meaning of the information. We developed and validated NLP algorithms to automatically identify the occurrence of delirium from EHRs. METHODS This study used a randomly selected cohort from the population-based Mayo Clinic Biobank (N = 300, age ≥65). We adopted the standardized evidence-based framework confusion assessment method (CAM) to develop and evaluate NLP algorithms to identify the occurrence of delirium using clinical notes in EHRs. Two NLP algorithms were developed based on CAM criteria: one based on the original CAM (NLP-CAM; delirium vs no delirium) and another based on our modified CAM (NLP-mCAM; definite, possible, and no delirium). The sensitivity, specificity, and accuracy were used for concordance in delirium status between NLP algorithms and manual chart review as the gold standard. The prevalence of delirium cases was examined using International Classification of Diseases, 9th Revision (ICD-9), NLP-CAM, and NLP-mCAM. RESULTS NLP-CAM demonstrated a sensitivity, specificity, and accuracy of 0.919, 1.000, and 0.967, respectively. NLP-mCAM demonstrated sensitivity, specificity, and accuracy of 0.827, 0.913, and 0.827, respectively. The prevalence analysis of delirium showed that the NLP-CAM algorithm identified 12 651 (9.4%) delirium patients, the NLP-mCAM algorithm identified 20 611 (15.3%) definite delirium cases, and 10 762 (8.0%) possible cases. CONCLUSIONS NLP algorithms based on the standardized evidence-based CAM framework demonstrated high performance in delineating delirium status in an expeditious and cost-effective manner.
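The sensitivity, specificity, and accuracy reported in this abstract are all derived from a 2x2 table comparing algorithm output with the chart-review gold standard. A minimal sketch of that arithmetic follows; the class name, function names, and the counts are hypothetical illustrations and are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of algorithm output versus a gold standard (hypothetical helper)."""
    tp: int  # algorithm positive, gold standard positive
    fp: int  # algorithm positive, gold standard negative
    fn: int  # algorithm negative, gold standard positive
    tn: int  # algorithm negative, gold standard negative


def sensitivity(c: ConfusionCounts) -> float:
    # Share of gold-standard positives that the algorithm found.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Share of gold-standard negatives that the algorithm left unflagged.
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    # Share of all cases where algorithm and gold standard agree.
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


# Illustrative counts only (assumed, not taken from the paper).
example = ConfusionCounts(tp=45, fp=10, fn=5, tn=240)
print(sensitivity(example), specificity(example), accuracy(example))
```

Because accuracy weights both classes by their frequency, it can stay high in a cohort with few positive cases even when sensitivity is modest, which is one reason the three figures are usually reported together.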
Affiliation(s)
- Sunyang Fu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota
  - University of Minnesota, Minneapolis
- Guilherme S Lopes
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota
- Nathan K LeBrasseur
  - Department of Physical Medicine & Rehabilitation, Mayo Clinic, Rochester, Minnesota
  - Department of Physiology & Biomedical Engineering, Mayo Clinic, Rochester, Minnesota
- Andrew Wen
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota
- Hongfang Liu
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota
- Walter A Rocca
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota
- Janet E Olson
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota
- Sunghwan Sohn
  - Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota
22
Crema C, Attardi G, Sartiano D, Redolfi A. Natural language processing in clinical neuroscience and psychiatry: A review. Front Psychiatry 2022; 13:946387. [PMID: 36186874 PMCID: PMC9515453 DOI: 10.3389/fpsyt.2022.946387] [Citation(s) in RCA: 15] [Impact Index Per Article: 5.0] [Received: 05/17/2022] [Accepted: 08/22/2022] [Indexed: 11/13/2022]
Abstract
Natural language processing (NLP) is rapidly becoming an important topic in the medical community. The ability to automatically analyze any type of medical document could be the key factor to fully exploit the data it contains. Cutting-edge artificial intelligence (AI) architectures, particularly machine learning and deep learning, have begun to be applied to this topic and have yielded promising results. We conducted a literature search for 1,024 papers that used NLP technology in neuroscience and psychiatry from 2010 to early 2022. After a selection process, 115 papers were evaluated. Each publication was classified into one of three categories: information extraction, classification, and data inference. Automated understanding of clinical reports in electronic health records has the potential to improve healthcare delivery. Overall, the performance of NLP applications is high, with an average F1-score and AUC above 85%. We also derived a composite measure in the form of Z-scores to better compare the performance of NLP models and their different classes as a whole. No statistical differences were found in the unbiased comparison. Strong asymmetry between English and non-English models, difficulty in obtaining high-quality annotated data, and train biases causing low generalizability are the main limitations. This review suggests that NLP could be an effective tool to help clinicians gain insights from medical reports, clinical research forms, and more, making NLP an effective tool to improve the quality of healthcare services.
Affiliation(s)
- Claudio Crema
  - Laboratory of Neuroinformatics, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy
- Daniele Sartiano
  - Istituto di Informatica e Telematica, Consiglio Nazionale delle Ricerche, Pisa, Italy
- Alberto Redolfi
  - Laboratory of Neuroinformatics, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy
23
Automatic detection of actionable radiology reports using bidirectional encoder representations from transformers. BMC Med Inform Decis Mak 2021; 21:262. [PMID: 34511100 PMCID: PMC8436473 DOI: 10.1186/s12911-021-01623-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Received: 02/17/2021] [Accepted: 08/23/2021] [Indexed: 01/27/2023]
Abstract
Background It is essential for radiologists to communicate actionable findings to the referring clinicians reliably. Natural language processing (NLP) has been shown to help identify free-text radiology reports including actionable findings. However, the application of recent deep learning techniques to radiology reports, which can improve the detection performance, has not been thoroughly examined. Moreover, free-text that clinicians input in the ordering form (order information) has seldom been used to identify actionable reports. This study aims to evaluate the benefits of two new approaches: (1) bidirectional encoder representations from transformers (BERT), a recent deep learning architecture in NLP, and (2) using order information in addition to radiology reports. Methods We performed a binary classification to distinguish actionable reports (i.e., radiology reports tagged as actionable in actual radiological practice) from non-actionable ones (those without an actionable tag). 90,923 Japanese radiology reports in our hospital were used, of which 788 (0.87%) were actionable. We evaluated four methods, statistical machine learning with logistic regression (LR) and with gradient boosting decision tree (GBDT), and deep learning with a bidirectional long short-term memory (LSTM) model and a publicly available Japanese BERT model. Each method was used with two different inputs, radiology reports alone and pairs of order information and radiology reports. Thus, eight experiments were conducted to examine the performance. Results Without order information, BERT achieved the highest area under the precision-recall curve (AUPRC) of 0.5138, which showed a statistically significant improvement over LR, GBDT, and LSTM, and the highest area under the receiver operating characteristic curve (AUROC) of 0.9516. Simply coupling the order information with the radiology reports slightly increased the AUPRC of BERT but did not lead to a statistically significant improvement. 
This may be due to the complexity of clinical decisions made by radiologists. Conclusions BERT was assumed to be useful to detect actionable reports. More sophisticated methods are required to use order information effectively.
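With only 788 of 90,923 reports tagged as actionable, the precision-recall curve is the more informative summary, because its chance level equals the positive-class prevalence rather than 0.5. The sketch below computes average precision, one common estimator of AUPRC; whether the authors used this estimator is not stated in the abstract, and the function name and toy scores are assumptions for illustration.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of precision values at the rank of each positive.

    Assumes binary labels (1 = positive) and distinct scores (ties are not
    handled specially in this sketch).
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")
    # Rank items from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Chance-level AUPRC equals prevalence; these two counts come from the abstract.
chance_level = 788 / 90923

# Toy example (invented values): positives ranked 1st and 3rd of five items.
toy_ap = average_precision([1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.5])
print(round(chance_level, 4), round(toy_ap, 4))
```

Read against a chance level below 0.01, an AUPRC near 0.51 represents a large improvement over random ranking even though the absolute value looks modest next to an AUROC above 0.95.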
24
Kent DM, Leung LY, Zhou Y, Luetmer PH, Kallmes DF, Nelson J, Fu S, Zheng C, Liu H, Chen W. Association of Silent Cerebrovascular Disease Identified Using Natural Language Processing and Future Ischemic Stroke. Neurology 2021; 97:e1313-e1321. [PMID: 34376505 PMCID: PMC8480402 DOI: 10.1212/wnl.0000000000012602] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 03/05/2021] [Accepted: 07/20/2021] [Indexed: 11/15/2022]
Abstract
BACKGROUND AND OBJECTIVE Silent cerebrovascular disease (SCD), comprised of silent brain infarction (SBI) and white matter disease (WMD), is commonly found incidentally on neuroimaging scans obtained in routine clinical care. However, their prognostic significance is not known. We aimed to estimate the incidence of, and risk increase in, future stroke in patients with incidentally-discovered SCD. METHODS Patients in the Kaiser Permanente Southern California (KPSC) health system aged ≥ 50, without prior ischemic stroke, transient ischemic attack, or dementia/Alzheimer's disease, receiving a head CT or MRI between 2009 and 2019 were included. SBI and WMD were identified by natural language processing (NLP) from the neuroimage report. RESULTS Among 262,875 individuals receiving neuroimaging, NLP identified 13,154 (5.0%) with SBI and 78,330 (29.8%) with WMD. The incidence of future stroke was 32.5 (95% CI 31.1, 33.9) per 1,000 patient-years for patients with SBI; 19.3 (95% CI 18.9, 19.8) for patients with WMD; and 6.8 (95% CI 6.7, 7.0) for patients without SCD. The crude HR associated with SBI was 3.40 (95% CI 3.25 to 3.56), and for WMD it was 2.63 (95% CI 2.54 to 2.71). With MRI-discovered SBI, the adjusted HR was 2.95 (95% CI 2.53 to 3.44) for those < age 65 and 2.15 (95% CI 1.91 to 2.41) for those ≥ age 65. With CT scan, the adjusted HR was 2.48 (95% CI 2.19 to 2.81) for those < age 65 and 1.81 (95% CI 1.71 to 1.91) for those ≥ age 65. The adjusted HR associated with a finding of WMD was 1.76 (95% CI 1.69 to 1.82) and was not modified by age or imaging modality. DISCUSSION Incidentally-discovered SBI and WMD are common and associated with increased risk of subsequent symptomatic stroke, representing an important opportunity for stroke prevention.
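The incidence figures in this abstract are rates per 1,000 patient-years with 95% confidence intervals. As a rough sketch of how such a figure can be produced, the code below divides events by follow-up time and builds an interval on the log scale, assuming Poisson-distributed event counts. The interval method, the function name, and the example counts are assumptions; the abstract does not state how the authors computed their intervals.

```python
import math
from statistics import NormalDist


def incidence_rate_per_1000(events: int, patient_years: float,
                            confidence: float = 0.95) -> tuple[float, float, float]:
    """Return (rate, lower, upper) per 1,000 patient-years.

    Uses the log-scale normal approximation: SE(log rate) = 1 / sqrt(events).
    This is one standard choice, not necessarily the one used in the study.
    """
    if events <= 0 or patient_years <= 0:
        raise ValueError("events and patient_years must be positive")
    rate = events / patient_years * 1000.0
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    factor = math.exp(z / math.sqrt(events))
    return rate, rate / factor, rate * factor


# Hypothetical counts for illustration: 100 strokes over 5,000 patient-years.
rate, lower, upper = incidence_rate_per_1000(events=100, patient_years=5000.0)
print(f"{rate:.1f} ({lower:.1f}, {upper:.1f}) per 1,000 patient-years")
```

The interval narrows with the number of events rather than the number of patients, which is why the large WMD group in the abstract has a much tighter interval than the smaller SBI group.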
Affiliation(s)
- David M Kent
  - Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA
- Lester Y Leung
  - Department of Neurology, Tufts Medical Center, Boston, MA
- Yichen Zhou
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA
- Jason Nelson
  - Predictive Analytics and Comparative Effectiveness Center, Tufts Medical Center, Boston, MA
- Sunyang Fu
  - Division of Digital Health Services, Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Chengyi Zheng
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA
- Hongfang Liu
  - Division of Digital Health Services, Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Wansu Chen
  - Department of Research and Evaluation, Kaiser Permanente Southern California, Pasadena, CA
25
Canales L, Menke S, Marchesseau S, D'Agostino A, Del Rio-Bermudez C, Taberna M, Tello J. Assessing the Performance of Clinical Natural Language Processing Systems: Development of an Evaluation Methodology. JMIR Med Inform 2021; 9:e20492. [PMID: 34297002 PMCID: PMC8367121 DOI: 10.2196/20492] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 05/20/2020] [Revised: 07/31/2020] [Accepted: 06/17/2021] [Indexed: 12/22/2022]
Abstract
Background Clinical natural language processing (cNLP) systems are of crucial importance due to their increasing capability in extracting clinically important information from free text contained in electronic health records (EHRs). The conversion of a nonstructured representation of a patient’s clinical history into a structured format enables medical doctors to generate clinical knowledge at a level that was not possible before. Finally, the interpretation of the insights gained provided by cNLP systems has a great potential in driving decisions about clinical practice. However, carrying out robust evaluations of those cNLP systems is a complex task that is hindered by a lack of standard guidance on how to systematically approach them. Objective Our objective was to offer natural language processing (NLP) experts a methodology for the evaluation of cNLP systems to assist them in carrying out this task. By following the proposed phases, the robustness and representativeness of the performance metrics of their own cNLP systems can be assured. Methods The proposed evaluation methodology comprised five phases: (1) the definition of the target population, (2) the statistical document collection, (3) the design of the annotation guidelines and annotation project, (4) the external annotations, and (5) the cNLP system performance evaluation. We presented the application of all phases to evaluate the performance of a cNLP system called “EHRead Technology” (developed by Savana, an international medical company), applied in a study on patients with asthma. As part of the evaluation methodology, we introduced the Sample Size Calculator for Evaluations (SLiCE), a software tool that calculates the number of documents needed to achieve a statistically useful and resourceful gold standard. Results The application of the proposed evaluation methodology on a real use-case study of patients with asthma revealed the benefit of the different phases for cNLP system evaluations. 
By using SLiCE to adjust the number of documents needed, a meaningful and resourceful gold standard was created. In the presented use-case, using as little as 519 EHRs, it was possible to evaluate the performance of the cNLP system and obtain performance metrics for the primary variable within the expected CIs. Conclusions We showed that our evaluation methodology can offer guidance to NLP experts on how to approach the evaluation of their cNLP systems. By following the five phases, NLP experts can assure the robustness of their evaluation and avoid unnecessary investment of human and financial resources. Besides the theoretical guidance, we offer SLiCE as an easy-to-use, open-source Python library.
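The abstract describes sizing a gold standard so that performance metrics fall within expected confidence intervals. The sketch below shows the textbook normal-approximation formula for estimating a proportion within a chosen margin, n = z² p(1 − p) / d². It is not the SLiCE algorithm, whose internals are not described here, and the function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist


def documents_needed(expected_metric: float, half_width: float,
                     confidence: float = 0.95) -> int:
    """Items needed so a proportion's CI has roughly the given half-width.

    expected_metric: anticipated value of the metric (for example, precision).
    half_width: desired margin of error on either side of the estimate.
    Generic normal-approximation formula; an assumption, not SLiCE itself.
    """
    if not 0.0 < expected_metric < 1.0:
        raise ValueError("expected_metric must be strictly between 0 and 1")
    if half_width <= 0.0:
        raise ValueError("half_width must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    n = (z ** 2) * expected_metric * (1.0 - expected_metric) / (half_width ** 2)
    return math.ceil(n)


# Worst case (metric near 0.5) versus a more optimistic expectation of 0.8.
print(documents_needed(0.5, 0.05), documents_needed(0.8, 0.05))
```

For recall-type metrics the count applies to annotated positive instances rather than documents, so the number of documents to sample also depends on how often the target variable appears in the collection.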
Affiliation(s)
- Lea Canales
  - Department of Software and Computing System, University of Alicante, Alicante, Spain
26
Casey A, Davidson E, Poon M, Dong H, Duma D, Grivas A, Grover C, Suárez-Paniagua V, Tobin R, Whiteley W, Wu H, Alex B. A systematic review of natural language processing applied to radiology reports. BMC Med Inform Decis Mak 2021; 21:179. [PMID: 34082729 PMCID: PMC8176715 DOI: 10.1186/s12911-021-01533-7] [Citation(s) in RCA: 87] [Impact Index Per Article: 21.8] [Received: 02/09/2021] [Accepted: 05/17/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Natural language processing (NLP) has a significant role in advancing healthcare and has been found to be key in extracting structured information from radiology reports. Understanding recent developments in NLP application to radiology is of significance but recent reviews on this are limited. This study systematically assesses and quantifies recent literature in NLP applied to radiology reports. METHODS We conduct an automated literature search yielding 4836 results using automated filtering, metadata enriching steps and citation search combined with manual review. Our analysis is based on 21 variables including radiology characteristics, NLP methodology, performance, study, and clinical application characteristics. RESULTS We present a comprehensive analysis of the 164 publications retrieved with publications in 2019 almost triple those in 2015. Each publication is categorised into one of 6 clinical application categories. Deep learning use increases in the period but conventional machine learning approaches are still prevalent. Deep learning remains challenged when data is scarce and there is little evidence of adoption into clinical practice. Despite 17% of studies reporting greater than 0.85 F1 scores, it is hard to comparatively evaluate these approaches given that most of them use different datasets. Only 14 studies made their data and 15 their code available with 10 externally validating results. CONCLUSIONS Automated understanding of clinical narratives of the radiology reports has the potential to enhance the healthcare process and we show that research in this field continues to grow. Reproducibility and explainability of models are important if the domain is to move applications into clinical use. More could be done to share code enabling validation of methods on different institutional data and to reduce heterogeneity in reporting of study properties allowing inter-study comparisons. 
Our results have significance for researchers in the field providing a systematic synthesis of existing work to build on, identify gaps, opportunities for collaboration and avoid duplication.
Affiliation(s)
- Arlene Casey
  - School of Literatures, Languages and Cultures (LLC), University of Edinburgh, Edinburgh, Scotland
- Emma Davidson
  - Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland
- Michael Poon
  - Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland
- Hang Dong
  - Centre for Medical Informatics, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, Scotland
  - Health Data Research UK, London, UK
- Daniel Duma
  - School of Literatures, Languages and Cultures (LLC), University of Edinburgh, Edinburgh, Scotland
- Andreas Grivas
  - Institute for Language, Cognition and Computation, School of informatics, University of Edinburgh, Edinburgh, Scotland
- Claire Grover
  - Institute for Language, Cognition and Computation, School of informatics, University of Edinburgh, Edinburgh, Scotland
- Víctor Suárez-Paniagua
  - Centre for Medical Informatics, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, Scotland
  - Health Data Research UK, London, UK
- Richard Tobin
  - Institute for Language, Cognition and Computation, School of informatics, University of Edinburgh, Edinburgh, Scotland
- William Whiteley
  - Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, Scotland
  - Nuffield Department of Population Health, University of Oxford, Oxford, UK
- Honghan Wu
  - Health Data Research UK, London, UK
  - Institute of Health Informatics, University College London, London, UK
- Beatrice Alex
  - School of Literatures, Languages and Cultures (LLC), University of Edinburgh, Edinburgh, Scotland
  - Edinburgh Futures Institute, University of Edinburgh, Edinburgh, Scotland
27
Mastorakos G, Khurana A, Huang M, Fu S, Tafti AP, Fan J, Liu H. Probing Patient Messages Enhanced by Natural Language Processing: A Top-Down Message Corpus Analysis. Health Data Science 2021; 2021:1504854. [PMID: 38487509 PMCID: PMC10877700 DOI: 10.34133/2021/1504854] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/14/2020] [Accepted: 02/07/2021] [Indexed: 03/17/2024]
Abstract
Background. Patients increasingly use asynchronous communication platforms to converse with care teams. Natural language processing (NLP) to classify content and automate triage of these messages has great potential to enhance clinical efficiency. We characterize the contents of a corpus of portal messages generated by patients using NLP methods. We aim to demonstrate descriptive analyses of patient text that can contribute to the development of future sophisticated NLP applications. Methods. We collected approximately 3,000 portal messages from the cardiology, dermatology, and gastroenterology departments at Mayo Clinic. After labeling these messages as either Active Symptom, Logistical, Prescription, or Update, we used NER (named entity recognition) to identify medical concepts based on the UMLS library. We hierarchically analyzed the distribution of these messages in terms of departments, message types, medical concepts, and keywords therewithin. Results. Active Symptom and Logistical content types comprised approximately 67% of the message cohort. The "Findings" medical concept had the largest number of keywords across all groupings of content types and departments. "Anatomical Sites" and "Disorders" keywords were more prevalent in Active Symptom messages, while "Drugs" keywords were most prevalent in Prescription messages. Logistical messages tended to have lower proportions of "Anatomical Sites," "Disorders," "Drugs," and "Findings" keywords than other message content types. Conclusions. This descriptive corpus analysis sheds light on the content and foci of portal messages. The insight into the content and differences among message themes can inform the development of more robust NLP models.
Affiliation(s)
- George Mastorakos
  - Mayo Clinic Alix School of Medicine, Mayo Clinic, Scottsdale, AZ, USA
- Aditya Khurana
  - Mayo Clinic Alix School of Medicine, Mayo Clinic, Scottsdale, AZ, USA
- Ming Huang
  - Mayo Clinic, Department of Health Sciences Research, Rochester, MN, USA
- Sunyang Fu
  - Mayo Clinic, Department of Health Sciences Research, Rochester, MN, USA
- Ahmad P. Tafti
  - Computer Science Department, University of Southern Maine, Portland, Maine, USA
  - Dubyak Center for Digital Science and Innovation, University of Southern Maine, Portland, Maine, USA
- Jungwei Fan
  - Mayo Clinic, Department of Health Sciences Research, Rochester, MN, USA
- Hongfang Liu
  - Mayo Clinic, Department of Health Sciences Research, Rochester, MN, USA
28
Datta S, Khanpara S, Riascos RF, Roberts K. Leveraging Spatial Information in Radiology Reports for Ischemic Stroke Phenotyping. AMIA Jt Summits Transl Sci Proc 2021; 2021:170-179. [PMID: 34457131 PMCID: PMC8378604] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/13/2023]
Abstract
Classifying fine-grained ischemic stroke phenotypes relies on identifying important clinical information. Radiology reports provide relevant information with context to determine such phenotype information. We focus on stroke phenotypes with location-specific information: brain region affected, laterality, stroke stage, and lacunarity. We use an existing fine-grained spatial information extraction system, Rad-SpatialNet, to identify clinically important information and apply simple domain rules on the extracted information to classify phenotypes. The performance of our proposed approach is promising (recall of 89.62% for classifying brain region and 74.11% for classifying brain region, side, and stroke stage together). Our work demonstrates that an information extraction system based on a fine-grained schema can be utilized to determine complex phenotypes with the inclusion of simple domain rules. These phenotypes have the potential to facilitate stroke research focusing on post-stroke outcome and treatment planning based on the stroke location.
Affiliation(s)
- Surabhi Datta
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX
- Shekhar Khanpara
  - McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX
- Roy F Riascos
  - McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, TX
- Kirk Roberts
  - School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX
29
Leung LY, Fu S, Luetmer PH, Kallmes DF, Madan N, Weinstein G, Lehman VT, Rydberg CH, Nelson J, Liu H, Kent DM. Agreement between neuroimages and reports for natural language processing-based detection of silent brain infarcts and white matter disease. BMC Neurol 2021; 21:189. [PMID: 33975556 PMCID: PMC8111708 DOI: 10.1186/s12883-021-02221-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/23/2020] [Accepted: 04/30/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND There are numerous barriers to identifying patients with silent brain infarcts (SBIs) and white matter disease (WMD) in routine clinical care. A natural language processing (NLP) algorithm may identify patients from neuroimaging reports, but it is unclear if these reports contain reliable information on these findings. METHODS Four radiology residents reviewed 1000 neuroimaging reports (RI) of patients age > 50 years without clinical histories of stroke, TIA, or dementia for the presence, acuity, and location of SBIs, and the presence and severity of WMD. Four neuroradiologists directly reviewed a subsample of 182 images (DR). An NLP algorithm was developed to identify findings in reports. We assessed interrater reliability for DR and RI, and agreement between these two and with NLP. RESULTS For DR, interrater reliability was moderate for the presence of SBIs (κ = 0.58, 95% CI 0.46-0.69) and WMD (κ = 0.49, 95% CI 0.35-0.63), and moderate to substantial for characteristics of SBI and WMD. Agreement between DR and RI was substantial for the presence of SBIs and WMD, and fair to substantial for characteristics of SBIs and WMD. Agreement between NLP and DR was substantial for the presence of SBIs (κ = 0.64, 95% CI 0.53-0.76) and moderate (κ = 0.52, 95% CI 0.39-0.65) for the presence of WMD. CONCLUSIONS Neuroimaging reports in routine care capture the presence of SBIs and WMD. An NLP algorithm can identify these findings (comparable to direct imaging review) and can likely be used for cohort identification.
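The agreement figures in this abstract are chance-corrected kappa statistics. As a minimal sketch of how Cohen's kappa is computed for two raters: the function name `cohen_kappa` and the labels below are hypothetical, and the study's own procedure (which involved more than two reviewers) is not described here, so this is an illustration rather than the authors' method.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items (illustrative)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined,
        # so this sketch reports perfect agreement by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Made-up labels (1 = finding present, 0 = absent); not data from the study.
reader_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
reader_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
kappa = cohen_kappa(reader_1, reader_2)
```

Here the raters agree on 8 of 10 items while chance alone would predict 0.52 agreement, so raw agreement overstates reliability; kappa rescales the excess over chance.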
Affiliation(s)
- Lester Y Leung
- Department of Neurology, Tufts Medical Center, Box 314, 800 Washington Street, Boston, MA, 02111, USA.
- Sunyang Fu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Neel Madan
- Department of Radiology, Tufts Medical Center, Boston, MA, USA
- Gene Weinstein
- Department of Radiology, Tufts Medical Center, Boston, MA, USA
- Vance T Lehman
- Department of Radiology, Mayo Clinic, Rochester, MN, USA
- Jason Nelson
- Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, USA
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- David M Kent
- Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, USA
30
Fu S, Chen D, He H, Liu S, Moon S, Peterson KJ, Shen F, Wang L, Wang Y, Wen A, Zhao Y, Sohn S, Liu H. Clinical concept extraction: A methodology review. J Biomed Inform 2020; 109:103526. [PMID: 32768446 PMCID: PMC7746475 DOI: 10.1016/j.jbi.2020.103526] [Citation(s) in RCA: 70] [Impact Index Per Article: 14.0] [Received: 03/02/2020] [Revised: 07/30/2020] [Accepted: 08/02/2020] [Indexed: 01/11/2023]
Abstract
BACKGROUND Concept extraction, a subdomain of natural language processing (NLP) with a focus on extracting concepts of interest, has been adopted to computationally extract clinical information from text for a wide range of applications ranging from clinical decision support to care quality improvement. OBJECTIVES In this literature review, we provide a methodology review of clinical concept extraction, aiming to catalog development processes, available methods and tools, and specific considerations when developing clinical concept extraction applications. METHODS Based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines, a literature search was conducted for retrieving EHR-based information extraction articles written in English and published from January 2009 through June 2019 from Ovid MEDLINE In-Process & Other Non-Indexed Citations, Ovid MEDLINE, Ovid EMBASE, Scopus, Web of Science, and the ACM Digital Library. RESULTS A total of 6,686 publications were retrieved. After title and abstract screening, 228 publications were selected. The methods used for developing clinical concept extraction applications were discussed in this review.
Affiliation(s)
- Sunyang Fu
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States; University of Minnesota - Twin Cities, Minneapolis, MN 55455, United States.
- David Chen
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Huan He
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Sijia Liu
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Sungrim Moon
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Kevin J Peterson
- Department of Information Technology, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States; University of Minnesota - Twin Cities, Minneapolis, MN 55455, United States.
- Feichen Shen
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Liwei Wang
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Yanshan Wang
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Andrew Wen
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Yiqing Zhao
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Sunghwan Sohn
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States.
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, 200 First Street SW, Rochester, MN 55905, United States; University of Minnesota - Twin Cities, Minneapolis, MN 55455, United States.
31
Zhao Y, Yu H, Fu S, Shen F, Davila JI, Liu H, Wang C. Data-driven Sublanguage Analysis for Cancer Genomics Knowledge Modeling: Applications in Mining Oncological Genetics Information from Patients' Genetic Reports. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2020; 2020:720-729. [PMID: 32477695 PMCID: PMC7233104] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
Despite an abundance of information in clinical genetic testing reports, information is oftentimes not well documented/utilized for decision making. Unstructured information in genetic reports can contribute to long-term patient management and future translational research. Thus, we proposed a knowledge model that could manage unstructured information in medical genetic reports and facilitate knowledge extraction, curation and updating. For this pilot study, we used a dataset including 1,565 cancer genetics reports of Mayo Clinic patients. We used a previously developed, data-driven discovery pipeline that involves both semantic annotation and co-occurrence association analysis to establish a knowledge model. We showed that compared to genetic reports, around 56% of testing results are missing or incomplete in the clinical notes. We built a genetic report knowledge model and highlighted four key semantic groups including "Genes and Gene Products" and "Treatments". Coverage of term annotation was 99.5%. Accuracies of term annotation and relationship extraction were 98.9% and 92.9% respectively.
Affiliation(s)
- Yiqing Zhao
- Division of Digital Health Sciences, Mayo Clinic, Rochester, MN
- Hanzhong Yu
- Division of Digital Health Sciences, Mayo Clinic, Rochester, MN
- Sunyang Fu
- Division of Digital Health Sciences, Mayo Clinic, Rochester, MN
- Feichen Shen
- Division of Digital Health Sciences, Mayo Clinic, Rochester, MN
- Jaime I Davila
- Division of Digital Health Sciences, Mayo Clinic, Rochester, MN
- Hongfang Liu
- Division of Digital Health Sciences, Mayo Clinic, Rochester, MN
- Chen Wang
- Division of Digital Health Sciences, Mayo Clinic, Rochester, MN
32
Fu S, Leung LY, Raulli AO, Kallmes DF, Kinsman KA, Nelson KB, Clark MS, Luetmer PH, Kingsbury PR, Kent DM, Liu H. Assessment of the impact of EHR heterogeneity for clinical research through a case study of silent brain infarction. BMC Med Inform Decis Mak 2020; 20:60. [PMID: 32228556 PMCID: PMC7106829 DOI: 10.1186/s12911-020-1072-9] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 04/08/2019] [Accepted: 03/12/2020] [Indexed: 01/14/2023] Open
Abstract
Background The rapid adoption of electronic health records (EHRs) holds great promise for advancing medicine through practice-based knowledge discovery. However, the validity of EHR-based clinical research is questionable due to poor research reproducibility caused by the heterogeneity and complexity of healthcare institutions and EHR systems, the cross-disciplinary nature of the research team, and the lack of standard processes and best practices for conducting EHR-based clinical research. Method We developed a data abstraction framework to standardize the process for multi-site EHR-based clinical studies, aiming to enhance research reproducibility. The framework was implemented for a multi-site EHR-based research project, the ESPRESSO project, with the goal of identifying individuals with silent brain infarctions (SBI) at Tufts Medical Center (TMC) and Mayo Clinic. The heterogeneity of healthcare institutions, EHR systems, documentation, and process variation in case identification was assessed quantitatively and qualitatively. Result We discovered significant variation in the patient populations, neuroimaging reporting, EHR systems, and abstraction processes across the two sites. The prevalence of SBI for patients over age 50 at TMC and Mayo is 7.4% and 12.5%, respectively. Neuroimaging reporting also varies: TMC's reports are lengthy, standardized, and descriptive, while Mayo’s reports are short and definitive with more textual variations. Furthermore, differences in the EHR system, technology infrastructure, and data collection process were identified. Conclusion The implementation of the framework identified the institutional and process variations and the heterogeneity of EHRs across the sites participating in the case study. The experiment demonstrates the necessity of a standardized process for data abstraction when conducting EHR-based clinical studies.
Affiliation(s)
- Sunyang Fu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Lester Y Leung
- Department of Neurology, Tufts Medical Center, Boston, MA, USA
- Paul R Kingsbury
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- David M Kent
- Institute for Clinical Research and Health Policy Studies, Tufts Medical Center, Boston, MA, USA
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA.
33
Wen A, Fu S, Moon S, El Wazir M, Rosenbaum A, Kaggal VC, Liu S, Sohn S, Liu H, Fan J. Desiderata for delivering NLP to accelerate healthcare AI advancement and a Mayo Clinic NLP-as-a-service implementation. NPJ Digit Med 2019; 2:130. [PMID: 31872069 PMCID: PMC6917754 DOI: 10.1038/s41746-019-0208-8] [Citation(s) in RCA: 65] [Impact Index Per Article: 10.8] [Received: 09/05/2019] [Accepted: 11/25/2019] [Indexed: 12/23/2022] Open
Abstract
Data is foundational to high-quality artificial intelligence (AI). Given that a substantial amount of clinically relevant information is embedded in unstructured data, natural language processing (NLP) plays an essential role in extracting valuable information that can benefit decision making, administration reporting, and research. Here, we share several desiderata pertaining to development and usage of NLP systems, derived from two decades of experience implementing clinical NLP at the Mayo Clinic, to inform the healthcare AI community. Using a framework we developed as an example implementation, the desiderata emphasize the importance of a user-friendly platform, efficient collection of domain expert inputs, seamless integration with clinical data, and a highly scalable computing infrastructure.
Affiliation(s)
- Andrew Wen
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA
- Sunyang Fu
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA
- Sungrim Moon
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA
- Mohamed El Wazir
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN USA
- Andrew Rosenbaum
- Department of Cardiovascular Medicine, Mayo Clinic, Rochester, MN USA
- Vinod C Kaggal
- Advanced Analytics Service Unit, Department of Information Technology, Mayo Clinic, Rochester, MN USA
- Sijia Liu
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA
- Sunghwan Sohn
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA
- Hongfang Liu
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA
- Jungwei Fan
- Division of Digital Health Sciences, Department of Health Sciences Research, Mayo Clinic, Rochester, MN USA
34
Wheater E, Mair G, Sudlow C, Alex B, Grover C, Whiteley W. A validated natural language processing algorithm for brain imaging phenotypes from radiology reports in UK electronic health records. BMC Med Inform Decis Mak 2019; 19:184. [PMID: 31500613 PMCID: PMC6734359 DOI: 10.1186/s12911-019-0908-7] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 05/27/2019] [Accepted: 09/03/2019] [Indexed: 11/26/2022] Open
Abstract
Background Manual coding of phenotypes in brain radiology reports is time consuming. We developed a natural language processing (NLP) algorithm to enable automatic identification of brain imaging in radiology reports performed in routine clinical practice in the UK National Health Service (NHS). Methods We used anonymized text brain imaging reports from a cohort study of stroke/TIA patients and from a regional hospital to develop and test an NLP algorithm. Two experts marked up text in 1692 reports for 24 cerebrovascular and other neurological phenotypes. We developed and tested a rule-based NLP algorithm first within the cohort study, and further evaluated it in the reports from the regional hospital. Results The agreement between expert readers was excellent (Cohen’s κ =0.93) in both datasets. In the final test dataset (n = 700) in unseen regional hospital reports, the algorithm had very good performance for a report of any ischaemic stroke [sensitivity 89% (95% CI:81–94); positive predictive value (PPV) 85% (76–90); specificity 100% (95% CI:0.99–1.00)]; any haemorrhagic stroke [sensitivity 96% (95% CI: 80–99), PPV 72% (95% CI:55–84); specificity 100% (95% CI:0.99–1.00)]; brain tumours [sensitivity 96% (CI:87–99); PPV 84% (73–91); specificity: 100% (95% CI:0.99–1.00)] and cerebral small vessel disease and cerebral atrophy (sensitivity, PPV and specificity all > 97%). We obtained few reports of subarachnoid haemorrhage, microbleeds or subdural haematomas. In 110,695 reports from NHS Tayside, atrophy (n = 28,757, 26%), small vessel disease (15,015, 14%) and old, deep ischaemic strokes (10,636, 10%) were the commonest findings. Conclusions An NLP algorithm can be developed in UK NHS radiology records to allow identification of cohorts of patients with important brain imaging phenotypes at a scale that would otherwise not be possible.
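Sensitivity, positive predictive value (PPV), and specificity, as reported in this abstract, all derive from a 2x2 confusion table of algorithm output against the expert reference. A minimal sketch follows, assuming a hypothetical helper `diagnostic_metrics` and made-up counts; these are not the study's counts, and the abstract does not say how its confidence intervals were computed, so none are shown.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, PPV and specificity from confusion-table counts (illustrative)."""

    def ratio(numerator, denominator):
        # An empty denominator makes the metric undefined; report NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        # Share of reference-positive reports the algorithm flagged.
        "sensitivity": ratio(tp, tp + fn),
        # Share of algorithm-flagged reports that are truly positive.
        "ppv": ratio(tp, tp + fp),
        # Share of reference-negative reports the algorithm left unflagged.
        "specificity": ratio(tn, tn + fp),
    }


# Made-up counts for a 700-report test set; not taken from the paper.
metrics = diagnostic_metrics(tp=89, fp=16, fn=11, tn=584)
```

PPV depends on how common the phenotype is in the test set, so a PPV measured in one hospital's reports may not carry over to a population with a different prevalence, whereas sensitivity and specificity are less affected.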
Affiliation(s)
- Emily Wheater
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, UK
- Grant Mair
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, UK
- Cathie Sudlow
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, UK; Centre for Medical Informatics, Usher Institute of Population Health Sciences and Informatics, University of Edinburgh, Edinburgh, UK; Health Data Research UK Scotland, Edinburgh, UK
- Beatrice Alex
- The Alan Turing Institute, British Library, 96 Euston Road, London, UK; Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, Edinburgh, UK
- Claire Grover
- The Alan Turing Institute, British Library, 96 Euston Road, London, UK; Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh, Edinburgh, UK
- William Whiteley
- Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, UK; Nuffield Department of Population Health, University of Oxford, Oxford, UK.