1
Le KDR, Tay SBP, Choy KT, Verjans J, Sasanelli N, Kong JCH. Applications of natural language processing tools in the surgical journey. Front Surg 2024; 11:1403540. [PMID: 38826809] [PMCID: PMC11140056] [DOI: 10.3389/fsurg.2024.1403540]
Abstract
Background Natural language processing tools are being adopted increasingly across multiple industries worldwide. They have shown promising results; however, their use in the field of surgery remains under-recognised. To date, their benefits have been assessed mainly in small trials, and these promising results require confirmation before large-scale adoption in surgery can be considered. This study aims to review current research and insights into the potential for implementing natural language processing tools in surgery. Methods A narrative review was conducted following a computer-assisted literature search of the Medline, EMBASE and Google Scholar databases. Papers relating to natural language processing tools and their use, or potential use, in surgery were included. Results Current applications of natural language processing tools within surgery are limited. The literature provides evidence of potential improvements in surgical capability and service delivery, such as using these technologies to streamline processes including surgical triaging, data collection and auditing, and surgical communication and documentation. There is also potential to extend these capabilities to surgical academia, improving processes in surgical research and enabling innovation in the development of educational resources. Despite these outcomes, the supporting evidence is limited by small sample sizes with limited applicability to broader settings. Conclusion With the increasing adoption of natural language processing technology, popularised in forms such as ChatGPT, research into the use of these tools to improve surgical workflow and efficiency has grown. This review highlights the multifaceted applications of natural language processing within surgery, albeit with clear limitations owing to the infancy of the infrastructure available to leverage these technologies.
There remains room for more rigorous research into the broader capabilities of natural language processing within surgery, and a need for cross-sectoral collaboration to establish how these algorithms can best be integrated.
Affiliation(s)
- Khang Duy Ricky Le
  - Department of General Surgical Specialties, The Royal Melbourne Hospital, Melbourne, VIC, Australia
  - Department of Surgical Oncology, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
  - Geelong Clinical School, Deakin University, Geelong, VIC, Australia
  - Department of Medical Education, The University of Melbourne, Melbourne, VIC, Australia
- Samuel Boon Ping Tay
  - Department of Anaesthesia and Pain Medicine, Eastern Health, Box Hill, VIC, Australia
- Kay Tai Choy
  - Department of Surgery, Austin Health, Melbourne, VIC, Australia
- Johan Verjans
  - Australian Institute for Machine Learning (AIML), University of Adelaide, Adelaide, SA, Australia
  - Lifelong Health Theme (Platform AI), South Australian Health and Medical Research Institute, Adelaide, SA, Australia
- Nicola Sasanelli
  - Division of Information Technology, Engineering and the Environment, University of South Australia, Adelaide, SA, Australia
  - Department of Operations (Strategic and International Partnerships), SmartSAT Cooperative Research Centre, Adelaide, SA, Australia
  - Agora High Tech, Adelaide, SA, Australia
- Joseph C. H. Kong
  - Department of Surgical Oncology, Peter MacCallum Cancer Centre, Melbourne, VIC, Australia
  - Monash University Department of Surgery, Alfred Hospital, Melbourne, VIC, Australia
  - Department of Colorectal Surgery, Alfred Hospital, Melbourne, VIC, Australia
  - Sir Peter MacCallum Department of Oncology, The University of Melbourne, Melbourne, VIC, Australia
2
Rajaganapathy S, Chowdhury S, Buchner V, He Z, Jiang X, Yang P, Cerhan JR, Zong N. Synoptic Reporting by Summarizing Cancer Pathology Reports using Large Language Models. medRxiv 2024:2024.04.26.24306452. [PMID: 38746270] [PMCID: PMC11092736] [DOI: 10.1101/2024.04.26.24306452]
Abstract
Background Synoptic reporting, the documenting of clinical information in a structured manner, is known to improve patient care by reducing errors and increasing readability, interoperability, and report completeness. Despite its advantages, manually synthesizing synoptic reports from narrative reports is expensive and error-prone when the number of structured fields is large. While recent revolutionary developments in Large Language Models (LLMs) have significantly advanced natural language processing, their potential for innovation in medicine has yet to be fully evaluated. Objectives In this study, we explore the strengths and challenges of utilizing state-of-the-art language models in the automatic synthesis of synoptic reports. Materials and Methods We use a corpus of 7,774 cancer-related narrative pathology reports with annotated reference synoptic reports from the Mayo Clinic EHR. Using these annotations as a reference, we reconfigure state-of-the-art large language models, such as LLAMA-2, to generate the synoptic reports. Our annotated reference synoptic reports contain 22 unique data elements. To evaluate the accuracy of the reports generated by the LLMs, we use several metrics, including the BERTScore F1, and verify our results by manual validation. Results We show that using fine-tuned LLAMA-2 models, we can obtain a BERTScore F1 of 0.86 or higher across all data elements, and of 0.94 or higher on over 50% (11 of 22) of the questions. These BERTScore F1 values translate to average accuracies of 76%, and as high as 81% for short clinical reports. Conclusions We demonstrate successful automatic synoptic report generation by fine-tuning large language models.
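As a rough illustration of the field-level evaluation described above, a lexical token-overlap F1 can stand in for the embedding-based BERTScore F1 used in the study (BERTScore matches contextual embeddings rather than exact tokens); the field names and values below are invented for the example.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a generated field value and its reference.

    A simplified, purely lexical stand-in for BERTScore F1: matching is on
    exact lowercase tokens instead of contextual embeddings.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# Score each generated synoptic field against its annotated reference
# (illustrative fields, not the study's 22 data elements).
generated = {"histology": "invasive ductal carcinoma", "grade": "2"}
reference = {"histology": "invasive ductal carcinoma", "grade": "grade 2"}
scores = {k: token_f1(generated[k], reference[k]) for k in reference}
```

In practice a library implementation of BERTScore would replace `token_f1`; the per-field loop and reference annotations play the same role either way.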
3
Truhn D, Loeffler CM, Müller-Franzes G, Nebelung S, Hewitt KJ, Brandner S, Bressem KK, Foersch S, Kather JN. Extracting structured information from unstructured histopathology reports using generative pre-trained transformer 4 (GPT-4). J Pathol 2024; 262:310-319. [PMID: 38098169] [DOI: 10.1002/path.6232]
Abstract
Deep learning applied to whole-slide histopathology images (WSIs) has the potential to enhance precision oncology and alleviate the workload of experts. However, developing these models necessitates large amounts of data with ground truth labels, which can be both time-consuming and expensive to obtain. Pathology reports are typically unstructured or poorly structured texts, and efforts to implement structured reporting templates have been unsuccessful, as these efforts lead to perceived extra workload. In this study, we hypothesised that large language models (LLMs), such as the generative pre-trained transformer 4 (GPT-4), can extract structured data from unstructured plain language reports using a zero-shot approach without requiring any re-training. We tested this hypothesis by utilising GPT-4 to extract information from histopathological reports, focusing on two extensive sets of pathology reports for colorectal cancer and glioblastoma. We found a high concordance between LLM-generated structured data and human-generated structured data. Consequently, LLMs could potentially be employed routinely to extract ground truth data for machine learning from unstructured pathology reports in the future. © 2023 The Authors. The Journal of Pathology published by John Wiley & Sons Ltd on behalf of The Pathological Society of Great Britain and Ireland.
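The zero-shot setup described above can be sketched as a prompt template plus a tolerant JSON parser. The field names, prompt wording, and canned model reply below are illustrative assumptions, and the actual GPT-4 call is stubbed out with a hard-coded string.

```python
import json

# Illustrative subset of fields; the study's extraction schemas for
# colorectal cancer and glioblastoma reports differ.
FIELDS = ["diagnosis", "grade", "resection_margin"]

def build_prompt(report_text: str) -> str:
    """Zero-shot instruction asking the model to emit only a JSON object."""
    return (
        "Extract the following fields from the pathology report below and "
        "answer with a single JSON object using exactly these keys: "
        + ", ".join(FIELDS)
        + ". Use null when a field is not mentioned.\n\nReport:\n"
        + report_text
    )

def parse_response(raw: str) -> dict:
    """Parse the model's reply, tolerating extra text around the JSON object."""
    start, end = raw.find("{"), raw.rfind("}")
    data = json.loads(raw[start : end + 1])
    return {k: data.get(k) for k in FIELDS}

# A canned reply stands in for the GPT-4 call here.
reply = 'Sure: {"diagnosis": "glioblastoma", "grade": "WHO 4", "resection_margin": null}'
structured = parse_response(reply)
```

With a real API client, `build_prompt` would feed the chat request and `parse_response` would consume the completion; no retraining is involved, which is the point of the zero-shot approach.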
Affiliation(s)
- Daniel Truhn
  - Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Chiara ML Loeffler
  - Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany
  - Department of Medicine I, University Hospital Dresden, Dresden, Germany
  - Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany
- Gustav Müller-Franzes
  - Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Sven Nebelung
  - Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen, Aachen, Germany
- Katherine J Hewitt
  - Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany
  - Department of Medicine III, University Hospital RWTH Aachen, Aachen, Germany
- Sebastian Brandner
  - Department of Neurosurgery, University Hospital Erlangen, Erlangen, Germany
- Keno K Bressem
  - Department of Radiology, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Berlin, Germany
- Sebastian Foersch
  - Institute of Pathology, University Medical Center Mainz, Mainz, Germany
- Jakob Nikolas Kather
  - Else Kroener Fresenius Center for Digital Health, Technical University Dresden, Dresden, Germany
  - Department of Medicine I, University Hospital Dresden, Dresden, Germany
  - Medical Oncology, National Center for Tumor Diseases (NCT), University Hospital Heidelberg, Heidelberg, Germany
  - Pathology and Data Analytics, Leeds Institute of Medical Research at St James's, University of Leeds, Leeds, UK
4
Kaufmann B, Busby D, Das CK, Tillu N, Menon M, Tewari AK, Gorin MA. Validation of a Zero-shot Learning Natural Language Processing Tool to Facilitate Data Abstraction for Urologic Research. Eur Urol Focus 2024:S2405-4569(24)00012-9. [PMID: 38278710] [DOI: 10.1016/j.euf.2024.01.009]
Abstract
BACKGROUND Urologic research often requires data abstraction from unstructured text contained within the electronic health record. A number of natural language processing (NLP) tools have been developed to aid with this time-consuming task; however, the generalizability of these tools is typically limited by the need for task-specific training. OBJECTIVE To describe the development and validation of a zero-shot learning NLP tool to facilitate data abstraction from unstructured text for use in downstream urologic research. DESIGN, SETTING, AND PARTICIPANTS An NLP tool based on the GPT-3.5 model from OpenAI was developed and compared with three physicians for time to task completion and accuracy when abstracting 14 unique variables from a set of 199 deidentified radical prostatectomy pathology reports. The reports were processed in vectorized and scanned formats to establish the impact of optical character recognition on data abstraction. INTERVENTION A zero-shot learning NLP tool for data abstraction. OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS The tool was compared with the human abstractors in terms of superiority for data abstraction speed and noninferiority for accuracy. RESULTS AND LIMITATIONS The human abstractors required a median (interquartile range) of 93 s (72-122 s) per report for data abstraction, whereas the software required a median of 12 s (10-15 s) for the vectorized reports and 15 s (13-17 s) for the scanned reports (p < 0.001 for all paired comparisons). The accuracies of the three human abstractors were 94.7% (95% confidence interval [CI], 93.8-95.5%), 97.8% (95% CI, 97.2-98.3%), and 96.4% (95% CI, 95.6-97.0%) for the combined set of 2786 data points. The tool had an accuracy of 94.2% (95% CI, 93.3-94.9%) for the vectorized reports and was noninferior to the human abstractors at a margin of -10% (α = 0.025).
The tool had a slightly lower accuracy of 88.7% (95% CI, 87.5-89.9%) for the scanned reports, making it noninferior to two of the three human abstractors. CONCLUSIONS The developed zero-shot learning NLP tool offers urologic researchers a highly generalizable and accurate method for data abstraction from unstructured text. An open-access version of the tool is available for immediate use by the urologic community. PATIENT SUMMARY In this report, we describe the design and validation of an artificial intelligence tool for abstracting discrete data from unstructured notes contained within the electronic medical record. This freely available tool, which is based on GPT-3.5 technology from OpenAI, is intended to facilitate research and scientific discovery by the urologic community.
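The noninferiority comparison above can be illustrated with a Wilson score interval on the tool's accuracy, declaring noninferiority when the interval's lower bound clears the comparator's accuracy minus the margin. This is a simplified sketch, not the paper's exact statistical procedure; the counts are chosen only to mirror the reported 94.2% accuracy over 2786 data points.

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score interval (lower, upper) for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

def noninferior(tool_correct: int, n: int, comparator_acc: float,
                margin: float = 0.10) -> bool:
    """Noninferior when the CI lower bound exceeds comparator_acc - margin."""
    lower, _ = wilson_ci(tool_correct, n)
    return lower > comparator_acc - margin

# Illustrative counts in the spirit of the study: ~94.2% tool accuracy
# on 2786 data points versus a 94.7% human comparator at a -10% margin.
ok = noninferior(tool_correct=2624, n=2786, comparator_acc=0.947)
```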
Affiliation(s)
- Basil Kaufmann
  - Milton and Carroll Petrie Department of Urology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
  - Department of Urology, University Hospital Zurich, University of Zurich, Zurich, Switzerland
- Dallin Busby
  - Milton and Carroll Petrie Department of Urology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Chandan Krushna Das
  - Milton and Carroll Petrie Department of Urology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Neeraja Tillu
  - Milton and Carroll Petrie Department of Urology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Mani Menon
  - Milton and Carroll Petrie Department of Urology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Ashutosh K Tewari
  - Milton and Carroll Petrie Department of Urology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
- Michael A Gorin
  - Milton and Carroll Petrie Department of Urology, Icahn School of Medicine at Mount Sinai, New York, NY, USA
5
Hochheiser H, Finan S, Yuan Z, Durbin EB, Jeong JC, Hands I, Rust D, Kavuluru R, Wu XC, Warner JL, Savova G. DeepPhe-CR: Natural Language Processing Software Services for Cancer Registrar Case Abstraction. medRxiv 2023:2023.05.05.23289524. [PMID: 37205575] [PMCID: PMC10187451] [DOI: 10.1101/2023.05.05.23289524]
Abstract
Objective The manual extraction of case details from patient records for cancer surveillance efforts is a resource-intensive task. Natural Language Processing (NLP) techniques have been proposed for automating the identification of key details in clinical notes. Our goal was to develop NLP application programming interfaces (APIs) for integration into cancer registry data abstraction tools in a computer-assisted abstraction setting. Methods We used cancer registry manual abstraction processes to guide the design of DeepPhe-CR, a web-based NLP service API. The coding of key variables was done through NLP methods validated using established workflows. A container-based implementation including the NLP methods was developed. Existing registry data abstraction software was modified to include results from DeepPhe-CR. An initial usability study with data registrars provided early validation of the feasibility of the DeepPhe-CR tools. Results API calls support submission of single documents and summarization of cases across multiple documents. The container-based implementation uses a REST router to handle requests and support a graph database for storing results. NLP modules extract topography, histology, behavior, laterality, and grade at 0.79-1.00 F1 across common and rare cancer types (breast, prostate, lung, colorectal, ovary, and pediatric brain) on data from two cancer registries. Usability study participants were able to use the tool effectively and expressed interest in adopting it. Discussion Our DeepPhe-CR system provides a flexible architecture for building cancer-specific NLP tools directly into registrar workflows in a computer-assisted abstraction setting. Improving user interactions in client tools may be needed to realize the potential of these approaches. DeepPhe-CR: https://deepphe.github.io/.
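The two API entry points described (single-document submission and multi-document case summarization) might be exercised by a client along these lines. The base URL, routes, and payload field names below are hypothetical illustrations, not the actual DeepPhe-CR interface; a real client would POST these bodies over HTTP.

```python
import json

# Hypothetical endpoint for illustration only.
API_BASE = "https://deepphe.example.org/api"

def single_document_request(doc_id: str, text: str) -> tuple:
    """Build the URL and JSON body to submit one clinical note."""
    body = json.dumps({"documentId": doc_id, "text": text}).encode()
    return f"{API_BASE}/document", body

def case_summary_request(patient_id: str, doc_ids: list) -> tuple:
    """Build the URL and JSON body to summarize a case across documents."""
    body = json.dumps({"patientId": patient_id, "documents": doc_ids}).encode()
    return f"{API_BASE}/summary", body

url, payload = single_document_request(
    "doc-001", "Invasive ductal carcinoma, left breast.")
summary_url, summary_payload = case_summary_request("pt-42", ["doc-001", "doc-002"])
```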
Affiliation(s)
- Harry Hochheiser
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
- Sean Finan
  - Boston Children's Hospital, Boston, MA, USA, and Harvard Medical School, Boston, MA, USA
- Zhou Yuan
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Eric B Durbin
  - Kentucky Cancer Registry, Markey Cancer Center, Lexington, KY, USA
  - Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY, USA
- Jong Cheol Jeong
  - Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY, USA
- Isaac Hands
  - Kentucky Cancer Registry, Markey Cancer Center, Lexington, KY, USA
  - Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY, USA
- David Rust
  - Kentucky Cancer Registry, Markey Cancer Center, Lexington, KY, USA
- Ramakanth Kavuluru
  - Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY, USA
- Jeremy L Warner
  - Lifespan Health System, Providence, RI, USA
  - Legorreta Cancer Center at Brown University, Providence, RI, USA
- Guergana Savova
  - Boston Children's Hospital, Boston, MA, USA, and Harvard Medical School, Boston, MA, USA
6
Hochheiser H, Finan S, Yuan Z, Durbin EB, Jeong JC, Hands I, Rust D, Kavuluru R, Wu XC, Warner JL, Savova G. DeepPhe-CR: Natural Language Processing Software Services for Cancer Registrar Case Abstraction. JCO Clin Cancer Inform 2023; 7:e2300156. [PMID: 38113411] [PMCID: PMC10752457] [DOI: 10.1200/cci.23.00156]
Abstract
PURPOSE Manual extraction of case details from patient records for cancer surveillance is a resource-intensive task. Natural Language Processing (NLP) techniques have been proposed for automating the identification of key details in clinical notes. Our goal was to develop NLP application programming interfaces (APIs) for integration into cancer registry data abstraction tools in a computer-assisted abstraction setting. METHODS We used cancer registry manual abstraction processes to guide the design of DeepPhe-CR, a web-based NLP service API. The coding of key variables was performed through NLP methods validated using established workflows. A container-based implementation of the NLP methods and the supporting infrastructure was developed. Existing registry data abstraction software was modified to include results from DeepPhe-CR. An initial usability study with data registrars provided early validation of the feasibility of the DeepPhe-CR tools. RESULTS API calls support submission of single documents and summarization of cases across one or more documents. The container-based implementation uses a REST router to handle requests and support a graph database for storing results. NLP modules extract topography, histology, behavior, laterality, and grade at 0.79-1.00 F1 across multiple cancer types (breast, prostate, lung, colorectal, ovary, and pediatric brain) from data of two population-based cancer registries. Usability study participants were able to use the tool effectively and expressed interest in the tool. CONCLUSION The DeepPhe-CR system provides an architecture for building cancer-specific NLP tools directly into registrar workflows in a computer-assisted abstraction setting. Improved user interactions in client tools may be needed to realize the potential of these approaches.
Affiliation(s)
- Harry Hochheiser
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA
  - Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA
- Sean Finan
  - Boston Children's Hospital, Boston, MA
  - Harvard Medical School, Boston, MA
- Zhou Yuan
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA
- Eric B. Durbin
  - Kentucky Cancer Registry, Markey Cancer Center, Lexington, KY
  - Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY
- Jong Cheol Jeong
  - Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY
- Isaac Hands
  - Kentucky Cancer Registry, Markey Cancer Center, Lexington, KY
  - Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY
- David Rust
  - Kentucky Cancer Registry, Markey Cancer Center, Lexington, KY
- Ramakanth Kavuluru
  - Division of Biomedical Informatics, College of Medicine, University of Kentucky, Lexington, KY
- Jeremy L. Warner
  - Lifespan Health System, Providence, RI
  - Legorreta Cancer Center at Brown University, Providence, RI
- Guergana Savova
  - Boston Children's Hospital, Boston, MA
  - Harvard Medical School, Boston, MA
7
Petch J, Kempainnen J, Pettengell C, Aviv S, Butler B, Pond G, Saha A, Bogach J, Allard-Coutu A, Sztur P, Ranisau J, Levine M. Developing a Data and Analytics Platform to Enable a Breast Cancer Learning Health System at a Regional Cancer Center. JCO Clin Cancer Inform 2023; 7:e2200182. [PMID: 37001040] [PMCID: PMC10281330] [DOI: 10.1200/cci.22.00182]
Abstract
PURPOSE This study documents the creation of an automated, longitudinal, and prospective data and analytics platform for breast cancer at a regional cancer center. This platform combines principles of data warehousing with natural language processing (NLP) to provide the integrated, timely, meaningful, high-quality, and actionable data required to establish a learning health system. METHODS Data from six hospital information systems and one external data source were integrated on a nightly basis by automated extract/transform/load jobs. Free-text clinical documentation was processed using a commercial NLP engine. RESULTS The platform contains 141 data elements for 7,019 patients with newly diagnosed breast cancer who received care at our regional cancer center from January 1, 2014, to June 3, 2022. Daily updating of the database takes an average of 56 minutes. Evaluation of the tuning of the NLP jobs found overall high performance, with an F1 of 1.0 for 19 variables and an F1 above 0.95 for a further 16. CONCLUSION This study describes how data warehousing combined with NLP can be used to create a prospective data and analytics platform to enable a learning health system. Although the upfront time investment required to create the platform was considerable, now that it has been developed, daily data processing is completed automatically in less than an hour.
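A nightly extract/transform/load pass of the kind described can be sketched with in-memory SQLite. The table names, columns, and the trivial keyword-matching "NLP" step are invented for illustration; the platform itself uses six hospital systems and a commercial NLP engine.

```python
import sqlite3

# Toy source system: raw free-text pathology reports.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE pathology (mrn TEXT, report TEXT)")
src.executemany("INSERT INTO pathology VALUES (?, ?)",
                [("A1", "ER POSITIVE"), ("A2", "er negative")])

# Toy warehouse: one coded fact table.
wh = sqlite3.connect(":memory:")
wh.execute("CREATE TABLE breast_facts (mrn TEXT, er_status TEXT)")

def etl(source: sqlite3.Connection, warehouse: sqlite3.Connection) -> int:
    """Extract reports, transform free text into a coded field, load rows."""
    rows = source.execute("SELECT mrn, report FROM pathology").fetchall()
    coded = [(mrn, "positive" if "positive" in report.lower() else "negative")
             for mrn, report in rows]
    warehouse.executemany("INSERT INTO breast_facts VALUES (?, ?)", coded)
    warehouse.commit()
    return len(coded)

loaded = etl(src, wh)
```

A production job would add incremental loading (only new or changed rows), error handling, and scheduling; the extract-transform-load shape stays the same.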
Affiliation(s)
- Jeremy Petch
  - Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, Canada
  - Institute for Health Policy Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, Toronto, Canada
  - Division of Cardiology, Department of Medicine, Faculty of Health Sciences, McMaster University, Hamilton, Canada
  - Population Health Research Institute, Hamilton Health Sciences, Hamilton, Canada
- Joel Kempainnen
  - Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, Canada
- Greg Pond
  - Escarpment Cancer Research Institute, Hamilton Health Sciences, Hamilton, Canada
- Ashirbani Saha
  - Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, Canada
  - Escarpment Cancer Research Institute, Hamilton Health Sciences, Hamilton, Canada
  - Department of Oncology, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- Jessica Bogach
  - Department of Surgery, Faculty of Health Sciences, McMaster University, Hamilton, Canada
- Peter Sztur
  - Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, Canada
- Jonathan Ranisau
  - Centre for Data Science and Digital Health, Hamilton Health Sciences, Hamilton, Canada
- Mark Levine
  - Hamilton Health Sciences, Hamilton, Canada
  - Escarpment Cancer Research Institute, Hamilton Health Sciences, Hamilton, Canada
8
Baxter SL, Saseendrakumar BR, Cheung M, Savides TJ, Longhurst CA, Sinsky CA, Millen M, Tai-Seale M. Association of Electronic Health Record Inbasket Message Characteristics With Physician Burnout. JAMA Netw Open 2022; 5:e2244363. [PMID: 36449288] [PMCID: PMC9713605] [DOI: 10.1001/jamanetworkopen.2022.44363]
Abstract
IMPORTANCE Physician burnout is an ongoing epidemic; electronic health record (EHR) use has been associated with burnout, and the burden of EHR inbasket messages has grown in the context of the COVID-19 pandemic. Understanding how EHR inbasket messages are associated with physician burnout may uncover new insights for intervention strategies. OBJECTIVE To evaluate associations between EHR inbasket message characteristics and physician burnout. DESIGN, SETTING, AND PARTICIPANTS Cross-sectional study in a single academic medical center involving physicians from multiple specialties. Data collection took place April to September 2020, and data were analyzed September to December 2020. EXPOSURES Physicians responded to a survey including the validated Mini-Z 5-point burnout scale. MAIN OUTCOMES AND MEASURES Physician burnout according to the self-reported burnout scale. A sentiment analysis model was used to calculate sentiment scores for EHR inbasket messages extracted for participating physicians. Multivariable modeling was used to model risk of physician burnout using factors such as message characteristics, physician demographics, and clinical practice characteristics. RESULTS Of 609 physicians who responded to the survey, 297 (48.8%) were women, 343 (56.3%) were White, 391 (64.2%) practiced in outpatient settings, and 428 (70.28%) had been in medical practice for 15 years or less. Half (307 [50.4%]) reported burnout (score of 3 or higher). A total of 1 453 245 inbasket messages were extracted, of which 630 828 (43.4%) were patient messages. Among negative messages, common words included medical conditions, expletives and/or profanity, and words related to violence. There were no significant associations between message characteristics (including sentiment scores) and burnout. 
Odds of burnout were significantly higher among Hispanic/Latino physicians (odds ratio [OR], 3.44; 95% CI, 1.18-10.61; P = .03) and women (OR, 1.60; 95% CI, 1.13-2.27; P = .01), and significantly lower among physicians in clinical practice for more than 15 years (OR, 0.46; 95% CI, 0.30-0.68; P < .001). CONCLUSIONS AND RELEVANCE In this cross-sectional study, message characteristics were not associated with physician burnout, but the presence of expletives and violent words represents an opportunity for improving patient engagement, EHR portal design, or filters. Natural language processing represents a novel approach to understanding potential associations between EHR inbasket messages and physician burnout and may also help inform quality improvement initiatives aimed at improving patient experience.
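The message-level sentiment scoring step can be illustrated with a toy lexicon-based scorer. The study used a trained sentiment analysis model, so this word-counting sketch, with word lists invented for the example, only conveys the idea of assigning each inbasket message a numeric score.

```python
# Tiny illustrative lexicons; a real system would use a trained model
# or a validated sentiment lexicon, not these hand-picked words.
NEGATIVE = {"pain", "angry", "worse", "never", "terrible"}
POSITIVE = {"thanks", "better", "great", "appreciate", "improved"}

def sentiment_score(message: str) -> float:
    """Score in [-1, 1]: (positive - negative) / matched words; 0 if none match."""
    words = [w.strip(".,!?").lower() for w in message.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

score = sentiment_score("Thanks, the new dose made things better!")
```

Scores like these could then be aggregated per physician and entered into a regression model alongside demographics and practice characteristics, mirroring the study's multivariable analysis.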
Affiliation(s)
- Sally L Baxter
  - Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California, San Diego, La Jolla
  - Department of Medicine, University of California, San Diego, La Jolla
- Bharanidharan Radha Saseendrakumar
  - Division of Ophthalmology Informatics and Data Science, Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California, San Diego, La Jolla
- Michael Cheung
  - Department of Family Medicine, University of California, San Diego, La Jolla
- Thomas J Savides
  - Department of Medicine, University of California, San Diego, La Jolla
- Christopher A Longhurst
  - Department of Medicine, University of California, San Diego, La Jolla
  - Department of Pediatrics, University of California, San Diego, La Jolla
- Marlene Millen
  - Department of Medicine, University of California, San Diego, La Jolla
- Ming Tai-Seale
  - Department of Medicine, University of California, San Diego, La Jolla
  - Department of Family Medicine, University of California, San Diego, La Jolla
9
Causa Andrieu P, Golia Pernicka JS, Yaeger R, Lupton K, Batch K, Zulkernine F, Simpson AL, Taya M, Gazit L, Nguyen H, Nicholas K, Gangai N, Sevilimedu V, Dickinson S, Paroder V, Bates DD, Do R. Natural Language Processing of Computed Tomography Reports to Label Metastatic Phenotypes With Prognostic Significance in Patients With Colorectal Cancer. JCO Clin Cancer Inform 2022; 6:e2200014. [PMID: 36103642] [PMCID: PMC9848599] [DOI: 10.1200/cci.22.00014]
Abstract
PURPOSE Natural language processing (NLP) applied to radiology reports can help identify clinically relevant M1 subcategories of patients with colorectal cancer (CRC). The primary purpose was to compare the overall survival (OS) of CRC according to American Joint Committee on Cancer (AJCC) TNM staging and explore an alternative classification. The secondary objective was to estimate the frequency of metastasis for each organ. METHODS Retrospective study of patients with CRC who underwent computed tomography (CT) of the chest, abdomen, and pelvis between July 1, 2009, and March 26, 2019, at a tertiary cancer center, with reports previously labeled for the presence or absence of metastasis by an NLP prediction model. Patients were classified as M0, M1a, M1b, or M1c (AJCC), or by an alternative classification based on the number of organs with metastases: M1, single organ; M2, two organs; M3, three or more organs. Cox regression models were used to estimate hazard ratios, and Kaplan-Meier curves were used to visualize survival under the two M1 subclassifications. RESULTS Nine thousand nine hundred twenty-eight patients with a total of 48,408 CT reports of the chest, abdomen, and pelvis were included. On the basis of the NLP predictions, the median OS of M1a, M1b, and M1c was 4.47, 1.72, and 1.52 years, respectively. The median OS of M1, M2, and M3 was 4.24, 2.05, and 1.04 years, respectively. Metastases occurred most often in the liver (35.8%), abdominopelvic lymph nodes (32.9%), lungs (29.3%), peritoneum (22.0%), thoracic nodes (19.9%), bones (9.2%), and pelvic organs (7.5%). Spleen and adrenal metastases occurred in < 5%. CONCLUSION NLP applied to a large radiology report database can identify clinically relevant metastatic phenotypes and be used to investigate new M1 substaging for CRC. Patients with metastatic disease in three or more organs have the worst prognosis, with an OS of about 1 year.
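The alternative organ-count classification is straightforward to express in code. A minimal sketch, assuming the NLP model's output is reduced to a per-organ boolean metastasis flag for each patient:

```python
def m_substage(organ_flags: dict) -> str:
    """Map per-organ metastasis flags to the organ-count classification.

    M0: no involved organ; M1: one; M2: two; M3: three or more,
    following the alternative substaging explored in the study.
    """
    n = sum(organ_flags.values())
    if n == 0:
        return "M0"
    if n == 1:
        return "M1"
    if n == 2:
        return "M2"
    return "M3"

stage = m_substage({"liver": True, "lung": True, "peritoneum": True, "bone": False})
```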
Affiliation(s)
- Rona Yaeger, Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY
- Kaelan Lupton, School of Computing, Queen's University, Kingston, Canada
- Karen Batch, School of Computing, Queen's University, Kingston, Canada
- Michio Taya, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY
- Lior Gazit, Department of Strategy and Innovation, Memorial Sloan Kettering Cancer Center, New York, NY
- Huy Nguyen, Department of Strategy and Innovation, Memorial Sloan Kettering Cancer Center, New York, NY
- Kevin Nicholas, Department of Strategy and Innovation, Memorial Sloan Kettering Cancer Center, New York, NY
- Natalie Gangai, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY
- Varadan Sevilimedu, Biostatistics Service, Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY
- Shannan Dickinson, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY
- Viktoriya Paroder, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY
- David D.B. Bates, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY
- Richard Do, Department of Radiology, Memorial Sloan Kettering Cancer Center, New York, NY
10
Duda SN, Kennedy N, Conway D, Cheng AC, Nguyen V, Zayas-Cabán T, Harris PA. HL7 FHIR-based tools and initiatives to support clinical research: a scoping review. J Am Med Inform Assoc 2022; 29:1642-1653. [PMID: 35818340 DOI: 10.1093/jamia/ocac105]
Abstract
OBJECTIVES The HL7® fast healthcare interoperability resources (FHIR®) specification has emerged as the leading interoperability standard for the exchange of healthcare data. We conducted a scoping review to identify trends and gaps in the use of FHIR for clinical research. MATERIALS AND METHODS We reviewed published literature, federally funded project databases, application websites, and other sources to discover FHIR-based papers, projects, and tools (collectively, "FHIR projects") available to support clinical research activities. RESULTS Our search identified 203 different FHIR projects applicable to clinical research. Most were associated with preparations to conduct research, such as data mapping to and from FHIR formats (n = 66, 32.5%) and managing ontologies with FHIR (n = 30, 14.8%), or post-study data activities, such as sharing data using repositories or registries (n = 24, 11.8%), general research data sharing (n = 23, 11.3%), and management of genomic data (n = 21, 10.3%). With the exception of phenotyping (n = 19, 9.4%), fewer FHIR-based projects focused on needs within the clinical research process itself. DISCUSSION Funding and usage of FHIR-enabled solutions for research are expanding, but most projects appear focused on establishing data pipelines and linking clinical systems such as electronic health records, patient-facing data systems, and registries, possibly due to the relative newness of FHIR and the incentives for FHIR integration in health information systems. Fewer FHIR projects were associated with research-only activities. CONCLUSION The FHIR standard is becoming an essential component of the clinical research enterprise. To develop FHIR's full potential for clinical research, funding and operational stakeholders should address gaps in FHIR-based research tools and methods.
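The most common project category found above is data mapping to and from FHIR formats. As a hedged sketch of what that means in practice, the snippet below flattens a simplified, hand-built FHIR R4 Observation resource into a tabular research record; the resource is an illustration, not output from any specific FHIR server:

```python
# A (simplified) FHIR R4 Observation resource, hand-built for illustration.
# Real resources carry many more elements (identifiers, effective dates, etc.).
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"coding": [{"system": "http://loinc.org", "code": "718-7",
                         "display": "Hemoglobin [Mass/volume] in Blood"}]},
    "subject": {"reference": "Patient/123"},
    "valueQuantity": {"value": 13.2, "unit": "g/dL"},
}

def flatten_observation(obs):
    """Extract the fields a flat research dataset typically needs."""
    coding = obs["code"]["coding"][0]
    return {
        "patient_id": obs["subject"]["reference"].split("/")[-1],
        "loinc": coding["code"],
        "name": coding["display"],
        "value": obs["valueQuantity"]["value"],
        "unit": obs["valueQuantity"]["unit"],
    }

row = flatten_observation(observation)
```

Pipelines of this shape sit at the "preparation" end of the research process that the review found most heavily represented.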
Affiliation(s)
- Stephany N Duda, Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Nan Kennedy, Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Douglas Conway, Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA
- Alex C Cheng, Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
- Viet Nguyen, Stratametrics LLC, Salt Lake City, Utah, USA; HL7 Da Vinci Project, Ann Arbor, Michigan, USA
- Teresa Zayas-Cabán, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA
- Paul A Harris, Vanderbilt Institute for Clinical and Translational Research, Vanderbilt University Medical Center, Nashville, Tennessee, USA; Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee, USA
11
Wang L, Fu S, Wen A, Ruan X, He H, Liu S, Moon S, Mai M, Riaz IB, Wang N, Yang P, Xu H, Warner JL, Liu H. Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing. JCO Clin Cancer Inform 2022; 6:e2200006. [PMID: 35917480 PMCID: PMC9470142 DOI: 10.1200/cci.22.00006]
Abstract
PURPOSE The advancement of natural language processing (NLP) has promoted the use of detailed textual data in electronic health records (EHRs) to support cancer research and to facilitate patient care. In this review, we aim to assess EHRs for cancer research and patient care using the Minimal Common Oncology Data Elements (mCODE), a community-driven effort to define a minimal set of data elements for cancer research and practice. Specifically, we assess the alignment of NLP-extracted data elements with mCODE and review existing NLP methodologies for extracting those data elements. METHODS Published literature was searched to retrieve cancer-related NLP articles written in English and published between January 2010 and September 2020 in major literature databases. After retrieval, articles with EHRs as the data source were manually identified. A charting form was developed for relevant study analysis and used to categorize data under four main topics: metadata, EHR data and targeted cancer types, NLP methodology, and oncology data elements and standards. RESULTS A total of 123 publications were ultimately selected and included in our analysis. We found that, as expected, cancer research and patient care require some data elements beyond mCODE. Transparency and reproducibility of NLP methods were insufficient, and NLP evaluation was inconsistent across studies. CONCLUSION We conducted a comprehensive review of cancer NLP for research and patient care using EHR data. Issues and barriers to wide adoption of cancer NLP were identified and discussed.
Affiliation(s)
- Liwei Wang, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sunyang Fu, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Andrew Wen, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Xiaoyang Ruan, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Huan He, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sijia Liu, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Sungrim Moon, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Michelle Mai, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
- Irbaz B. Riaz, Department of Hematology/Oncology, Mayo Clinic, Scottsdale, AZ
- Nan Wang, Department of Computer Science and Engineering, College of Science and Engineering, University of Minnesota, Minneapolis, MN
- Ping Yang, Department of Quantitative Health Sciences, Mayo Clinic, Scottsdale, AZ
- Hua Xu, School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX
- Jeremy L. Warner, Department of Medicine (Hematology/Oncology), Vanderbilt University, Nashville, TN; Department of Biomedical Informatics, Vanderbilt University, Nashville, TN
- Hongfang Liu, Department of Artificial Intelligence and Informatics, Mayo Clinic, Rochester, MN
12
Estevez M, Benedum CM, Jiang C, Cohen AB, Phadke S, Sarkar S, Bozkurt S. Considerations for the Use of Machine Learning Extracted Real-World Data to Support Evidence Generation: A Research-Centric Evaluation Framework. Cancers (Basel) 2022; 14:3063. [PMID: 35804834 PMCID: PMC9264846 DOI: 10.3390/cancers14133063]
Abstract
A vast amount of real-world data, such as pathology reports and clinical notes, is captured as unstructured text in electronic health records (EHRs). However, this information is both difficult and costly to extract through human abstraction, especially when scaling to large datasets is needed. Fortunately, natural language processing (NLP) and machine learning (ML) techniques provide promising solutions for a variety of information extraction tasks, such as identifying a group of patients who have a specific diagnosis, share common characteristics, or show progression of a disease. However, using these ML-extracted data for research still introduces unique challenges in assessing validity and generalizability to different cohorts of interest. To enable effective and accurate use of ML-extracted real-world data (RWD) to support research and real-world evidence generation, we propose a research-centric evaluation framework for model developers, ML-extracted data users, and other RWD stakeholders. This framework covers the fundamentals of evaluating RWD produced using ML methods to maximize the use of EHR data for research purposes.
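One concrete component such an evaluation framework includes is measuring agreement between ML-extracted labels and human abstraction on an overlap sample. A minimal sketch with invented labels (not the framework's full criteria):

```python
# Agreement between an ML-extracted binary variable and human-abstracted
# labels on an overlap sample, treating abstraction as the reference
# standard. The label vectors below are invented for illustration.

def extraction_quality(extracted, abstracted):
    """Sensitivity and positive predictive value of ML extraction."""
    tp = sum(1 for e, a in zip(extracted, abstracted) if e and a)
    fp = sum(1 for e, a in zip(extracted, abstracted) if e and not a)
    fn = sum(1 for e, a in zip(extracted, abstracted) if not e and a)
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "ppv": tp / (tp + fp) if tp + fp else None,
    }

# Hypothetical check: did the pipeline correctly flag a documented finding?
ml_labels = [1, 1, 0, 0, 1]
chart_review = [1, 0, 0, 1, 1]
quality = extraction_quality(ml_labels, chart_review)
```

In practice a framework like the one proposed above layers cohort-level and longitudinal checks on top of simple per-variable agreement, but the per-variable comparison is the usual starting point.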
Affiliation(s)
- Melissa Estevez, Flatiron Health, Inc., 233 Spring Street, New York, NY 10013, USA
- Corey M. Benedum, Flatiron Health, Inc., 233 Spring Street, New York, NY 10013, USA
- Chengsheng Jiang, Flatiron Health, Inc., 233 Spring Street, New York, NY 10013, USA
- Aaron B. Cohen, Flatiron Health, Inc., 233 Spring Street, New York, NY 10013, USA; Department of Medicine, NYU Grossman School of Medicine, New York, NY 10016, USA
- Sharang Phadke, Flatiron Health, Inc., 233 Spring Street, New York, NY 10013, USA
- Somnath Sarkar, Flatiron Health, Inc., 233 Spring Street, New York, NY 10013, USA
- Selen Bozkurt, Flatiron Health, Inc., 233 Spring Street, New York, NY 10013, USA
13
Zhou S, Wang N, Wang L, Liu H, Zhang R. CancerBERT: a cancer domain-specific language model for extracting breast cancer phenotypes from electronic health records. J Am Med Inform Assoc 2022; 29:1208-1216. [PMID: 35333345 PMCID: PMC9196678 DOI: 10.1093/jamia/ocac040]
Abstract
OBJECTIVE Accurate extraction of breast cancer patients' phenotypes is important for clinical decision support and clinical research. This study developed and evaluated cancer-domain-pretrained CancerBERT models for extracting breast cancer phenotypes from clinical texts. We also investigated the effect of a customized cancer-related vocabulary on the performance of CancerBERT models. MATERIALS AND METHODS A cancer-related corpus of breast cancer patients was extracted from the electronic health records of a local hospital. We annotated named entities in 200 pathology reports and 50 clinical notes for 8 cancer phenotypes for fine-tuning and evaluation. We continued pretraining the BlueBERT model on the cancer corpus with expanded vocabularies (built using both term frequency-based and manually reviewed methods) to obtain the CancerBERT models. The CancerBERT models were evaluated and compared with baseline models on the cancer phenotype extraction task. RESULTS All CancerBERT models outperformed all other models on the cancer phenotyping NER task. Both CancerBERT models with customized vocabularies outperformed the CancerBERT model with the original BERT vocabulary. The CancerBERT model with the manually reviewed customized vocabulary achieved the best performance, with macro F1 scores of 0.876 (95% CI, 0.873 to 0.879) and 0.904 (95% CI, 0.902 to 0.906) for exact and lenient matching, respectively. CONCLUSIONS The CancerBERT models were developed to extract cancer phenotypes from clinical notes and pathology reports. The results confirm that a customized vocabulary may further improve the performance of domain-specific BERT models on clinical NLP tasks. The CancerBERT models developed in this study could further aid clinical decision support.
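The term frequency-based side of that vocabulary expansion can be sketched simply: count corpus tokens absent from a base vocabulary and propose the frequent ones as additions. The tiny corpus and base vocabulary below are invented stand-ins, not BlueBERT's WordPiece vocabulary or the study's corpus:

```python
import re
from collections import Counter

# Invented base vocabulary standing in for a pretrained model's vocabulary.
BASE_VOCAB = {"the", "tumor", "is", "positive", "negative", "for"}

def propose_vocab_additions(corpus, base_vocab, min_count=2):
    """Return out-of-vocabulary tokens seen at least min_count times,
    most frequent first."""
    tokens = re.findall(r"[a-z0-9]+", corpus.lower())
    counts = Counter(t for t in tokens if t not in base_vocab)
    return [t for t, c in counts.most_common() if c >= min_count]

# Toy "clinical" corpus: domain terms like er/her2 are missing from the
# base vocabulary and would otherwise be fragmented into subword pieces.
corpus = ("The tumor is positive for her2. ER status: er positive. "
          "her2 amplification noted. er and pr negative.")
```

Adding whole domain terms like these to the vocabulary avoids aggressive subword splitting, which is the mechanism the study credits for the customized models' gains.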
Affiliation(s)
- Sicheng Zhou, Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA
- Nan Wang, School of Statistics, University of Minnesota, Minneapolis, Minnesota, USA
- Liwei Wang, Department of AI and Informatics Research, Mayo Clinic, Rochester, Minnesota, USA
- Hongfang Liu, Department of AI and Informatics Research, Mayo Clinic, Rochester, Minnesota, USA
- Rui Zhang, Institute for Health Informatics, University of Minnesota, Minneapolis, Minnesota, USA; Department of Pharmaceutical Care & Health Systems, University of Minnesota, Minneapolis, Minnesota, USA
14
Real-world study of surgical treatment of pancreatic cancer in China: Annual Report of China Pancreas Data Center (2016–2020). J Pancreatol 2022. [DOI: 10.1097/jp9.0000000000000086]
15
Hu D, Li S, Zhang H, Wu N, Lu X. Using Natural Language Processing and Machine Learning to Preoperatively Predict Lymph Node Metastasis for Non-Small Cell Lung Cancer With Electronic Medical Records: Development and Validation Study. JMIR Med Inform 2022; 10:e35475. [PMID: 35468085 PMCID: PMC9086872 DOI: 10.2196/35475]
Abstract
Background Lymph node metastasis (LNM) is critical for treatment decision making in patients with resectable non-small cell lung cancer, but it is difficult to diagnose precisely before surgery. Electronic medical records (EMRs) contain a large volume of valuable information about LNM, but some key information is recorded in free text, which hinders its secondary use. Objective This study aims to develop LNM prediction models based on EMRs using natural language processing (NLP) and machine learning algorithms. Methods We developed a multiturn question-answering NLP model to extract features about the primary tumor and lymph nodes from computed tomography (CT) reports. We then combined these features with other structured clinical characteristics to develop LNM prediction models using machine learning algorithms. We conducted extensive experiments to explore the effectiveness of the predictive models and compared them with size criteria based on CT image findings (a maximum lymph node short-axis diameter >10 mm was regarded as indicating a metastatic node) and with clinician evaluation. Because the NLP model may extract features with mistakes, we also calculated the concordance correlation between the predicted probabilities of models using NLP-extracted features and gold standard features to explore the influence of NLP-driven automatic extraction. Results The random forest models achieved the best performance, with an area under the receiver operating characteristic curve (AUC) of 0.792 and an average precision (AP) of 0.456 for pN2 LNM prediction, and an AUC of 0.768 and AP of 0.524 for pN1&N2 LNM prediction. All machine learning models outperformed the size criteria and clinician evaluation. The concordance correlation between the random forest models using NLP-extracted features and gold standard features was 0.950, improving to 0.984 when the top 5 most important NLP-extracted features were replaced with gold standard features.
Conclusions The LNM models can achieve performance competitive with clinician evaluation using only limited EMR data, such as CT reports and tumor markers. The multiturn question-answering NLP model can extract features effectively to support the development of LNM prediction models, which may facilitate the clinical application of predictive models.
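The modeling step, combining NLP-extracted CT features with structured variables in a random forest, can be sketched as below. All data, feature names, and thresholds are synthetic inventions for illustration, not the study's EMR data or model configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 200
# Invented features: node short-axis diameter (mm, the kind of value the
# NLP model would extract from a CT report), primary tumor size (cm), and
# a serum tumor marker level.
diameter = rng.uniform(2, 25, n)
tumor_size = rng.uniform(0.5, 6.0, n)
marker = rng.uniform(0, 10, n)
# Synthetic "ground truth" tied deterministically to two of the features.
y = ((diameter > 10) & (marker > 4)).astype(int)
X = np.column_stack([diameter, tumor_size, marker])

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
train_acc = model.score(X, y)
```

In the study, features like these feed the classifier after NLP extraction; the concordance analysis quoted above then asks how much extraction errors in such features shift the predicted probabilities.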
Affiliation(s)
- Danqing Hu, College of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou, China
- Shaolei Li, Department of Thoracic Surgery II, Peking University Cancer Hospital and Institute, Beijing, China
- Huanyao Zhang, College of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou, China
- Nan Wu, Department of Thoracic Surgery II, Peking University Cancer Hospital and Institute, Beijing, China
- Xudong Lu, College of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou, China
16
Santos T, Tariq A, Gichoya JW, Trivedi H, Banerjee I. Automatic Classification of Cancer Pathology Reports: A Systematic Review. J Pathol Inform 2022; 13:100003. [PMID: 35242443 PMCID: PMC8860734 DOI: 10.1016/j.jpi.2022.100003]
Abstract
Pathology reports consist primarily of unstructured free text, so the clinical information they contain is not trivial to access or query. Multiple natural language processing (NLP) techniques have been proposed to automate the coding of pathology reports via text classification. In this systematic review, we follow the guidelines of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; Page et al., 2020, BMJ) to identify NLP systems for classifying pathology reports published between 2010 and 2021. Based on our search criteria, a total of 3445 records were retrieved, and 25 articles met the final review criteria. We benchmarked the systems by methodology, complexity of the prediction task, and core type of NLP model: i) rule-based and intelligent systems, ii) statistical machine learning, and iii) deep learning. While certain tasks are well addressed by these models, many others have limitations and remain open challenges, such as the extraction of many cancer characteristics (size, shape, cancer type, and others) from pathology reports. We examined the final set of 25 papers and discussed both their potential and their limitations. We hope that this systematic review helps researchers prioritize the development of innovative approaches to tackle the current limitations and advance cancer research.
Affiliation(s)
- Thiago Santos, Department of Computer Science, Emory University, Atlanta, GA, USA; Department of Biomedical Informatics, Emory School of Medicine, Atlanta, GA, USA
- Amara Tariq, Department of Radiology, Mayo Clinic, Phoenix, AZ, USA
- Judy Wawira Gichoya, Department of Biomedical Informatics, Emory School of Medicine, Atlanta, GA, USA; Department of Radiology, Emory School of Medicine, Atlanta, GA, USA
- Hari Trivedi, Department of Biomedical Informatics, Emory School of Medicine, Atlanta, GA, USA; Department of Radiology, Emory School of Medicine, Atlanta, GA, USA
- Imon Banerjee, Department of Radiology, Mayo Clinic, Phoenix, AZ, USA; Department of Computer Engineering, Arizona State University, AZ, USA
17
Kim NH, Kim JM, Park DM, Ji SR, Kim JW. Analysis of depression in social media texts through the Patient Health Questionnaire-9 and natural language processing. Digit Health 2022; 8:20552076221114204. [PMID: 35874865 PMCID: PMC9297458 DOI: 10.1177/20552076221114204]
Abstract
Objective Although depression is emerging as a major social problem, use of mental health services remains low. The purpose of this study was to classify sentences written by social media users according to the nine depression symptoms of the Patient Health Questionnaire-9 (PHQ-9) using natural language processing, and to assess users' depression from the results. Methods First, we trained two sentence classifiers: a Y/N sentence classifier, which categorizes whether a user's sentence is related to depression, and a 0-9 sentence classifier, which further categorizes the sentence according to PHQ-9 depression symptomology. A depression classifier, a logistic regression model, was then generated to classify the sentence writer's depression. These trained sentence classifiers and the depression classifier were used to analyze users' social media text data and determine their depression status. Results Our experimental results showed that the proposed depression classifier achieved 68.3% average accuracy, better than a baseline depression classifier that used only the Y/N sentence classifier and achieved 53.3% average accuracy. Conclusions This study is significant in that it demonstrates the possibility of determining depression from social media users' textual data alone.
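Between the sentence classifiers and the user-level logistic regression sits a feature-assembly step: counting each user's sentences per PHQ-9 symptom. The sketch below illustrates one plausible encoding; the label convention (None for "not depression-related", 0 for related-but-no-symptom, 1-9 for symptom categories) is an assumption for illustration, and the paper's actual features may differ:

```python
PHQ9_SYMPTOMS = 9  # symptom categories 1..9; 0 = related but no symptom

def user_features(sentence_labels):
    """Build a user-level feature vector from per-sentence labels.
    sentence_labels: output of the 0-9 classifier per sentence, with
    None meaning the Y/N classifier said 'not depression-related'."""
    counts = [0] * (PHQ9_SYMPTOMS + 1)
    related = 0
    for label in sentence_labels:
        if label is None:
            continue
        related += 1
        counts[label] += 1
    total = len(sentence_labels) or 1
    # Normalized symptom profile plus overall fraction of related sentences.
    return [c / total for c in counts] + [related / total]
```

A vector like this is what a downstream logistic regression could consume to classify the writer's depression status.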
Affiliation(s)
- Nam Hyeok Kim, Department of Mathematics, Hanyang University, Seoul, Republic of Korea
- Ji Min Kim, Business Administration, Hanyang University, Seoul, Republic of Korea
- Da Mi Park, Business Administration, Hanyang University, Seoul, Republic of Korea
- Su Ryeon Ji, Department of Mathematics, Hanyang University, Seoul, Republic of Korea
- Jong Woo Kim, School of Business, Hanyang University, Seoul, Republic of Korea
18
Abedian S, Sholle ET, Adekkanattu PM, Cusick MM, Weiner SE, Shoag JE, Hu JC, Campion TR. Automated Extraction of Tumor Staging and Diagnosis Information From Surgical Pathology Reports. JCO Clin Cancer Inform 2021; 5:1054-1061. [PMID: 34694896 PMCID: PMC8812635 DOI: 10.1200/cci.21.00065]
Abstract
PURPOSE Typically stored as unstructured notes, surgical pathology reports contain data elements valuable to cancer research that otherwise require labor-intensive manual extraction. Although studies have described natural language processing (NLP) of surgical pathology reports to automate information extraction, efforts have focused on specific cancer subtypes rather than across multiple oncologic domains. To address this gap, we developed and evaluated an NLP method to extract tumor staging and diagnosis information across multiple cancer subtypes. METHODS The NLP pipeline was implemented on an open-source framework called Leo. We used a total of 555,681 surgical pathology reports from 329,076 patients to develop the pipeline and evaluated our approach on subsets of reports from patients with breast, prostate, colorectal, and randomly selected cancer subtypes. RESULTS Averaged across all four cancer subtypes, the NLP pipeline achieved an accuracy of 1.00 for International Classification of Diseases, Tenth Revision (ICD-10) codes, 0.89 for T staging, 0.90 for N staging, and 0.97 for M staging. It achieved an F1 score of 1.00 for ICD-10 codes, 0.88 for T staging, 0.90 for N staging, and 0.24 for M staging. CONCLUSION The NLP pipeline was developed to extract tumor staging and diagnosis information across multiple cancer subtypes to support the research enterprise in our institution. Although it was not possible to demonstrate the generalizability of our NLP pipeline to other institutions, other institutions may find value in adopting a similar NLP approach, and reusing the code available on GitHub, to support the oncology research enterprise with elements extracted from surgical pathology reports.
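In the same spirit as, though far simpler than, the Leo-based pipeline described above, a rule-based sketch of pulling TNM stage mentions out of report text might look like this; the report snippet and patterns are illustrative only and not the study's implementation:

```python
import re

# Matches patterns like "pT2 pN1 pM0" or "T1, N0"; the optional "p" prefix
# marks pathologic staging. A real pipeline handles many more variants.
TNM_PATTERN = re.compile(
    r"\bp?(T[0-4][a-c]?|Tis|TX)\s*,?\s*p?(N[0-3][a-c]?|NX)\s*,?\s*p?(M[01]|MX)?",
    re.IGNORECASE,
)

def extract_tnm(report_text):
    """Return the first T/N(/M) mention found, or None."""
    m = TNM_PATTERN.search(report_text)
    if not m:
        return None
    t, n, m_stage = m.groups()
    return {"T": t.upper(), "N": n.upper(),
            "M": m_stage.upper() if m_stage else None}

report = "FINAL DIAGNOSIS: Invasive ductal carcinoma. Staging: pT2 pN1 pM0."
```

The low M-staging F1 reported above hints at why rules like these are fragile in practice: M status is often implied rather than written as an explicit "M" token.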
Affiliation(s)
- Sajjad Abedian, Information Technologies and Services Department, Weill Cornell Medicine, New York, NY
- Evan T. Sholle, Information Technologies and Services Department, Weill Cornell Medicine, New York, NY; Department of Population Health Sciences, Weill Cornell Medicine, New York, NY
- Marika M. Cusick, Information Technologies and Services Department, Weill Cornell Medicine, New York, NY
- Stephanie E. Weiner, Information Technologies and Services Department, Weill Cornell Medicine, New York, NY
- Jonathan E. Shoag, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY
- Jim C. Hu, Department of Population Health Sciences, Weill Cornell Medicine, New York, NY
- Thomas R. Campion, Information Technologies and Services Department, Weill Cornell Medicine, New York, NY; Department of Urology, Weill Cornell Medicine, New York, NY; Clinical and Translational Science Center, Weill Cornell Medicine, New York, NY; Department of Pediatrics, Weill Cornell Medicine, New York, NY
19
Yuan Z, Finan S, Warner J, Savova G, Hochheiser H. Interactive Exploration of Longitudinal Cancer Patient Histories Extracted From Clinical Text. JCO Clin Cancer Inform 2020; 4:412-420. [PMID: 32383981 PMCID: PMC7265796 DOI: 10.1200/cci.19.00115]
Abstract
PURPOSE Retrospective cancer research requires identification of patients matching both categorical and temporal inclusion criteria, often on the basis of factors exclusively available in clinical notes. Although natural language processing approaches for inferring higher-level concepts have shown promise for bringing structure to clinical texts, interpreting results is often challenging, involving the need to move between abstracted representations and constituent text elements. Our goal was to build interactive visual tools to support the process of interpreting rich representations of histories of patients with cancer. METHODS Qualitative inquiry into user tasks and goals, a structured data model, and an innovative natural language processing pipeline were used to guide design. RESULTS The resulting information visualization tool provides cohort- and patient-level views with linked interactions between components. CONCLUSION Interactive tools hold promise for facilitating the interpretation of patient summaries and identification of cohorts for retrospective research.
Affiliation(s)
- Zhou Yuan, University of Pittsburgh, Pittsburgh, PA
20
Belenkaya R, Gurley MJ, Golozar A, Dymshyts D, Miller RT, Williams AE, Ratwani S, Siapos A, Korsik V, Warner J, Campbell WS, Rivera D, Banokina T, Modina E, Bethusamy S, Stewart HM, Patel M, Chen R, Falconer T, Park RW, You SC, Jeon H, Shin SJ, Reich C. Extending the OMOP Common Data Model and Standardized Vocabularies to Support Observational Cancer Research. JCO Clin Cancer Inform 2021; 5:12-20. [PMID: 33411620 PMCID: PMC8140810 DOI: 10.1200/cci.20.00079]
Affiliation(s)
- Michael J Gurley, Clinical and Translational Sciences Institute, Northwestern University, Evanston, IL
- Robert T Miller, Tufts Clinical and Translational Science Institute, Boston, MA
- Andrew E Williams, Tufts Institute for Clinical Research and Health Policy Studies, Boston, MA
- Ruijun Chen, Department of Biomedical Informatics, Columbia University, New York City, NY
- Thomas Falconer, Department of Biomedical Informatics, Columbia University, New York City, NY
- Rae Woong Park, Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, South Korea
- Seng Chan You, Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, South Korea
- Hokyun Jeon, Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, South Korea
- Soe Jeong Shin, Department of Biomedical Informatics, Ajou University School of Medicine, Suwon, South Korea
21
Hu D, Zhang H, Li S, Wang Y, Wu N, Lu X. Automatic Extraction of Lung Cancer Staging Information From Computed Tomography Reports: Deep Learning Approach. JMIR Med Inform 2021; 9:e27955. [PMID: 34287213 PMCID: PMC8339987 DOI: 10.2196/27955]
Abstract
BACKGROUND Lung cancer is the leading cause of cancer deaths worldwide. Clinical staging of lung cancer plays a crucial role in making treatment decisions and evaluating prognosis. However, in clinical practice, approximately one-half of the clinical stages of lung cancer patients are inconsistent with their pathological stages. As one of the most important diagnostic modalities for staging, chest computed tomography (CT) provides a wealth of information about cancer staging, but the free-text nature of the CT reports obstructs their computerization. OBJECTIVE We aimed to automatically extract the staging-related information from CT reports to support accurate clinical staging of lung cancer. METHODS In this study, we developed an information extraction (IE) system to extract the staging-related information from CT reports. The system consisted of the following three parts: named entity recognition (NER), relation classification (RC), and postprocessing (PP). We first summarized 22 questions about lung cancer staging based on the TNM staging guideline. Next, three state-of-the-art NER algorithms were implemented to recognize the entities of interest. Next, we designed a novel RC method using the relation sign constraint (RSC) to classify the relations between entities. Finally, a rule-based PP module was established to obtain the formatted answers using the results of NER and RC. RESULTS We evaluated the developed IE system on a clinical data set containing 392 chest CT reports collected from the Department of Thoracic Surgery II in the Peking University Cancer Hospital. 
The experimental results showed that the bidirectional encoder representation from transformers (BERT) model outperformed the iterated dilated convolutional neural networks-conditional random field (ID-CNN-CRF) and bidirectional long short-term memory networks-conditional random field (Bi-LSTM-CRF) for NER tasks with macro-F1 scores of 80.97% and 90.06% under the exact and inexact matching schemes, respectively. For the RC task, the proposed RSC showed better performance than the baseline methods. Further, the BERT-RSC model achieved the best performance with a macro-F1 score of 97.13% and a micro-F1 score of 98.37%. Moreover, the rule-based PP module could correctly obtain the formatted results using the extractions of NER and RC, achieving a macro-F1 score of 94.57% and a micro-F1 score of 96.74% for all the 22 questions. CONCLUSIONS We conclude that the developed IE system can effectively and accurately extract information about lung cancer staging from CT reports. Experimental results show that the extracted results have significant potential for further use in stage verification and prediction to facilitate accurate clinical staging.
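The exact-matching scheme used to score NER output above can be illustrated with a short, self-contained sketch: predicted BIO tag sequences are converted into entity spans and compared span-for-span against the gold standard. The tag sequences below are invented examples, not data from the study.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into a set of (start, end, type) spans."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(tags + ["O"]):          # sentinel flushes the last span
        if tag == "O" or tag.startswith("B-"):
            if start is not None:
                spans.add((start, i, etype))
                start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
        # "I-" tags simply extend the currently open span
    return spans

def exact_match_f1(gold_tags, pred_tags):
    """Precision, recall, and F1 where a predicted entity counts only if its
    boundaries and type match a gold entity exactly."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = ["B-TUMOR", "I-TUMOR", "O", "B-NODE", "O"]
pred = ["B-TUMOR", "I-TUMOR", "O", "B-NODE", "I-NODE"]   # NODE boundary is wrong
```

Under the inexact (partial-overlap) scheme, a predicted span would instead count as correct if it overlaps a gold span of the same type, which is why the reported inexact macro-F1 is higher than the exact one.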
Affiliation(s)
- Danqing Hu
- College of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou, China
- Key Laboratory for Biomedical Engineering, Ministry of Education, Hangzhou, China
- Huanyao Zhang
- College of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou, China
- Key Laboratory for Biomedical Engineering, Ministry of Education, Hangzhou, China
- Shaolei Li
- Department of Thoracic Surgery II, Peking University Cancer Hospital & Institute, Beijing, China
- Yuhong Wang
- College of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou, China
- Key Laboratory for Biomedical Engineering, Ministry of Education, Hangzhou, China
- Nan Wu
- Department of Thoracic Surgery II, Peking University Cancer Hospital & Institute, Beijing, China
- Xudong Lu
- College of Biomedical Engineering and Instrumental Science, Zhejiang University, Hangzhou, China
- Key Laboratory for Biomedical Engineering, Ministry of Education, Hangzhou, China
22
Development of a generalizable natural language processing pipeline to extract physician-reported pain from clinical reports: Generated using publicly-available datasets and tested on institutional clinical reports for cancer patients with bone metastases. J Biomed Inform 2021; 120:103864. [PMID: 34265451 DOI: 10.1016/j.jbi.2021.103864] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2020] [Revised: 06/30/2021] [Accepted: 07/04/2021] [Indexed: 11/22/2022]
Abstract
OBJECTIVE The majority of cancer patients suffer from severe pain at the advanced stage of their illness. In most cases, cancer pain is underestimated by clinical staff and is not properly managed until it reaches a critical stage. Therefore, detecting and addressing cancer pain early can potentially improve the quality of life of cancer patients. The objective of this research project was to develop a generalizable Natural Language Processing (NLP) pipeline to find and classify physician-reported pain in the radiation oncology consultation notes of cancer patients with bone metastases. MATERIALS AND METHODS The texts of 1249 publicly-available hospital discharge notes in the i2b2 database were used as a training and validation set. The MetaMap and NegEx algorithms were implemented for medical term extraction. Sets of NLP rules were developed to score pain terms in each note. By averaging pain scores, each note was assigned one of three verbally-declared pain (VDP) labels: no pain, pain, or no mention of pain. Without further training, the generalizability of our pipeline in scoring individual pain terms was tested independently using 30 hospital discharge notes from the MIMIC-III database and 30 consultation notes of cancer patients with bone metastases from our institution's radiation oncology electronic health record. Finally, 150 notes from our institution were used to assess the pipeline's performance at assigning VDP. RESULTS Our NLP pipeline successfully detected and quantified pain in the i2b2 summary notes with 93% overall precision and 92% overall recall. Testing on the MIMIC-III database achieved precision and recall of 91% and 86%, respectively. The pipeline successfully detected pain with 89% precision and 82% recall on our institutional radiation oncology corpus. Finally, our pipeline assigned a VDP label to each note in our institutional corpus with 84% precision and 82% recall.
CONCLUSION Our NLP pipeline enables the detection and classification of physician-reported pain in our radiation oncology corpus. This portable and ready-to-use pipeline can be used to automatically extract and classify physician-reported pain from clinical notes where the pain is not otherwise documented through structured data entry.
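The scoring idea described above can be sketched in a few lines: scan each note for pain terms, apply a NegEx-style check for a preceding negation cue, and average the term scores into one of the three VDP labels. The term list, cue list, and window size here are illustrative assumptions, not the study's actual rules.

```python
import re

# Illustrative term list, cue list, and window; not the study's actual rules.
PAIN_TERMS = re.compile(r"\b(pain|ache|tenderness)\b", re.I)
NEG_CUES = re.compile(r"\b(no|denies|without)\b", re.I)
WINDOW = 30   # characters scanned before a pain term for a negation cue

def vdp_label(note):
    """Score each pain term (1 = asserted, 0 = negated), average into a VDP label."""
    scores = []
    for m in PAIN_TERMS.finditer(note):
        preceding = note[max(0, m.start() - WINDOW):m.start()]
        scores.append(0 if NEG_CUES.search(preceding) else 1)
    if not scores:
        return "no mention of pain"
    return "pain" if sum(scores) / len(scores) >= 0.5 else "no pain"
```

For example, `vdp_label("Patient denies chest pain.")` returns "no pain" because the cue "denies" falls inside the window before the pain term.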
23
Wen A, Rasmussen LV, Stone D, Liu S, Kiefer R, Adekkanattu P, Brandt PS, Pacheco JA, Luo Y, Wang F, Pathak J, Liu H, Jiang G. CQL4NLP: Development and Integration of FHIR NLP Extensions in Clinical Quality Language for EHR-driven Phenotyping. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2021; 2021:624-633. [PMID: 34457178 PMCID: PMC8378647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Lack of standardized representation of natural language processing (NLP) components in phenotyping algorithms hinders portability of the phenotyping algorithms and their execution in a high-throughput and reproducible manner. The objective of the study is to develop and evaluate a standard-driven approach - CQL4NLP - that integrates a collection of NLP extensions represented in the HL7 Fast Healthcare Interoperability Resources (FHIR) standard into the clinical quality language (CQL). A minimal NLP data model with 11 NLP-specific data elements was created, including six FHIR NLP extensions. All 11 data elements were identified from their usage in real-world phenotyping algorithms. An NLP ruleset generation mechanism was integrated into the NLP2FHIR pipeline and the NLP rulesets enabled comparable performance for a case study with the identification of obesity comorbidities. The NLP ruleset generation mechanism created a reproducible process for defining the NLP components of a phenotyping algorithm and its execution.
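As a rough illustration of the general mechanism, an NLP-derived FHIR resource can carry its provenance in nested extensions. The extension URL and element names below are invented for illustration and do not reproduce the actual NLP2FHIR/CQL4NLP extension definitions.

```python
import json

# A Condition resource annotated with hypothetical NLP provenance extensions.
# The URL and inner element names are assumptions, not the published extensions.
condition = {
    "resourceType": "Condition",
    "code": {"text": "obesity"},
    "extension": [{
        "url": "http://example.org/fhir/StructureDefinition/nlp-source",  # assumed URL
        "extension": [
            {"url": "algorithm", "valueString": "cTAKES"},
            {"url": "certainty", "valueString": "positive"},
        ],
    }],
}

serialized = json.dumps(condition)   # FHIR resources are exchanged as JSON
```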
Affiliation(s)
- Yuan Luo
- Northwestern University, Chicago, IL
- Fei Wang
- Weill Cornell Medicine, New York, NY
24
Bitterman DS, Miller TA, Mak RH, Savova GK. Clinical Natural Language Processing for Radiation Oncology: A Review and Practical Primer. Int J Radiat Oncol Biol Phys 2021; 110:641-655. [PMID: 33545300 DOI: 10.1016/j.ijrobp.2021.01.044] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2020] [Revised: 12/22/2020] [Accepted: 01/23/2021] [Indexed: 02/07/2023]
Abstract
Natural language processing (NLP), which aims to convert human language into expressions that can be analyzed by computers, is one of the most rapidly developing and widely used technologies in the field of artificial intelligence. Natural language processing algorithms convert unstructured free-text data into structured data that can be extracted and analyzed at scale. In medicine, unlocking the rich, expressive data within clinical free text in electronic medical records will help tap the full potential of big data for research and clinical purposes. Recent major NLP algorithmic advances have significantly improved the performance of these algorithms, leading to a surge in academic and industry interest in developing tools to automate information extraction and phenotyping from clinical texts. Thus, these technologies are poised to transform medical research and alter clinical practice in the future. Radiation oncology stands to benefit from NLP algorithms if they are appropriately developed and deployed, as they may enable advances such as automated inclusion of radiation therapy details in cancer registries, discovery of novel insights about cancer care, and improved patient data curation and presentation at the point of care. However, challenges remain before the full value of NLP is realized, such as the plethora of jargon specific to radiation oncology, nonstandard nomenclature, a lack of publicly available labeled data for model development, and interoperability limitations between radiation oncology data silos. Successful development and implementation of high-quality and high-value NLP models for radiation oncology will require close collaboration between computer scientists and the radiation oncology community.
Here, we present a primer on artificial intelligence algorithms in general and NLP algorithms in particular; provide guidance on how to assess the performance of such algorithms; review prior research on NLP algorithms for oncology; and describe future avenues for NLP in radiation oncology research and clinics.
Affiliation(s)
- Danielle S Bitterman
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, Massachusetts; Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts; Artificial Intelligence in Medicine Program, Brigham and Women's Hospital, Boston, Massachusetts
- Timothy A Miller
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts
- Raymond H Mak
- Department of Radiation Oncology, Brigham and Women's Hospital/Dana-Farber Cancer Institute, Boston, Massachusetts; Artificial Intelligence in Medicine Program, Brigham and Women's Hospital, Boston, Massachusetts
- Guergana K Savova
- Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts
25
Integrating Speculation Detection and Deep Learning to Extract Lung Cancer Diagnosis from Clinical Notes. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11020865] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]
Abstract
Despite efforts to develop models for extracting medical concepts from clinical notes, some challenges remain, in particular relating concepts to dates. The high number of clinical notes written for each single patient and the use of negation, speculation, and different date formats cause ambiguity that has to be resolved to reconstruct the patient’s natural history. In this paper, we concentrate on extracting the cancer diagnosis from clinical narratives and relating it to the diagnosis date. To address this challenge, a hybrid approach that combines deep learning-based and rule-based methods is proposed. The approach integrates three steps: (i) lung cancer named entity recognition, (ii) negation and speculation detection, and (iii) relating the cancer diagnosis to a valid date. In particular, we apply the proposed approach to extract the lung cancer diagnosis and its diagnosis date from clinical narratives written in Spanish. Results show an F-score of 90% in the named entity recognition task and an 89% F-score in the task of relating the cancer diagnosis to the diagnosis date. Our findings suggest that speculation detection is, together with negation detection, a key component for properly extracting cancer diagnoses from clinical notes.
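Step (iii) can be sketched with a naive heuristic: find a diagnosis mention and attach the nearest date in the text. The regexes and the nearest-date rule are illustrative assumptions, not the paper's actual method (which also handles negation, speculation, and multiple date formats, and operates on Spanish text).

```python
import re

# Hypothetical patterns; the paper's system is far richer (and in Spanish).
DATE = re.compile(r"\b(\d{1,2}/\d{1,2}/\d{4})\b")
DIAGNOSIS = re.compile(r"\blung (cancer|carcinoma)\b", re.I)

def diagnosis_date(note):
    """Link the first lung-cancer mention to the nearest date in the note."""
    dx = DIAGNOSIS.search(note)
    if not dx:
        return None
    dates = [(abs(m.start() - dx.start()), m.group(1)) for m in DATE.finditer(note)]
    return min(dates)[1] if dates else None

note = "Lung cancer diagnosed 15/01/2019. Follow-up visit 01/02/2020."
```

Here the character-distance heuristic attaches the diagnosis to 15/01/2019 rather than the later follow-up date.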
26
Hong JC, Fairchild AT, Tanksley JP, Palta M, Tenenbaum JD. Natural language processing for abstraction of cancer treatment toxicities: accuracy versus human experts. JAMIA Open 2020; 3:513-517. [PMID: 33623888 PMCID: PMC7886534 DOI: 10.1093/jamiaopen/ooaa064] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2020] [Revised: 10/26/2020] [Accepted: 10/30/2020] [Indexed: 12/29/2022] Open
Abstract
Objectives Expert abstraction of acute toxicities is critical in oncology research but is labor-intensive and variable. We assessed the accuracy of a natural language processing (NLP) pipeline to extract symptoms from clinical notes compared to physicians. Materials and Methods Two independent reviewers identified present and negated National Cancer Institute Common Terminology Criteria for Adverse Events (CTCAE) v5.0 symptoms from 100 randomly selected notes for on-treatment visits during radiation therapy, with adjudication by a third reviewer. An NLP pipeline based on the Apache clinical Text Analysis Knowledge Extraction System was developed and used to extract CTCAE terms. Accuracy was assessed by precision, recall, and F1. Results The NLP pipeline demonstrated high accuracy for common physician-abstracted symptoms, such as radiation dermatitis (F1 0.88), fatigue (0.85), and nausea (0.88). NLP had poor sensitivity for negated symptoms. Conclusion NLP accurately detects a subset of documented present CTCAE symptoms, though it is limited for negated symptoms. It may facilitate strategies to more consistently identify toxicities during cancer therapy.
Affiliation(s)
- Julian C Hong
- Department of Radiation Oncology, University of California, San Francisco, San Francisco, California, USA; Bakar Computational Health Sciences Institute, University of California, San Francisco, San Francisco, California, USA; Department of Radiation Oncology, Duke University, Durham, North Carolina, USA
- Andrew T Fairchild
- Department of Radiation Oncology, Duke University, Durham, North Carolina, USA
- Jarred P Tanksley
- Department of Radiation Oncology, Duke University, Durham, North Carolina, USA
- Manisha Palta
- Department of Radiation Oncology, Duke University, Durham, North Carolina, USA
- Jessica D Tenenbaum
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA
27
Zong N, Wen A, Stone DJ, Sharma DK, Wang C, Yu Y, Liu H, Shi Q, Jiang G. Developing an FHIR-Based Computational Pipeline for Automatic Population of Case Report Forms for Colorectal Cancer Clinical Trials Using Electronic Health Records. JCO Clin Cancer Inform 2020; 4:201-209. [PMID: 32134686 PMCID: PMC7113084 DOI: 10.1200/cci.19.00116] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open
Abstract
PURPOSE The Fast Healthcare Interoperability Resources (FHIR) standard is emerging as a next-generation framework developed by HL7 for exchanging electronic health care data. The modeling capability of FHIR in standardizing cancer data has been gaining increasing attention in the cancer research informatics community. However, few studies have examined the capability of FHIR in electronic data capture (EDC) applications for effective cancer clinical trials. The objective of this study was to design, develop, and evaluate an FHIR-based method that enables automated population of case report forms (CRFs) for cancer clinical trials using real-world electronic health records (EHRs). MATERIALS AND METHODS We developed an FHIR-based computational pipeline for EDC with a case study modeling colorectal cancer trials. We first leveraged an existing FHIR-based cancer profile to represent EHR data of patients with colorectal cancer, and then used the FHIR Questionnaire and QuestionnaireResponse resources to represent the CRFs and their data population. To test the accuracy and overall quality of the computational pipeline, we used synoptic reports of 287 Mayo Clinic patients with colorectal cancer from 2013 to 2019 with standard measures of precision, recall, and F1 score. RESULTS Using the computational pipeline, a total of 1,037 synoptic reports were successfully converted into instances of the FHIR-based cancer profile. The average accuracy for converting all data elements (excluding tumor perforation) of the cancer profile was 0.99, using 200 randomly selected records. The average F1 score for populating nine questions of the CRFs in a real-world colorectal cancer trial was 0.95, using 100 randomly selected records. CONCLUSION We demonstrated that it is feasible to populate CRFs with EHR data in an automated manner with satisfactory performance.
The outcome of the study provides helpful insight into future directions in implementing FHIR-based EDC applications for modern cancer clinical trials.
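The population step can be pictured as mapping extracted EHR values onto FHIR QuestionnaireResponse items. The linkIds and field values below are hypothetical, not taken from the study's CRFs.

```python
# linkIds and values below are hypothetical, not the trial's real CRF items.
def populate_crf(extracted):
    """Wrap extracted EHR field values as FHIR QuestionnaireResponse items."""
    return {
        "resourceType": "QuestionnaireResponse",
        "status": "completed",
        "item": [
            {"linkId": link_id, "answer": [{"valueString": str(value)}]}
            for link_id, value in extracted.items()
        ],
    }

response = populate_crf({"tumor-site": "sigmoid colon", "pT-stage": "pT3"})
```

Each CRF question becomes one `item` keyed by its `linkId`, which is how QuestionnaireResponse ties answers back to the Questionnaire that defines the form.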
Affiliation(s)
- Nansu Zong
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Andrew Wen
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Daniel J Stone
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Deepak K Sharma
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Chen Wang
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Yue Yu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Hongfang Liu
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Qian Shi
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
- Guoqian Jiang
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN
28
Abstract
Artificial intelligence (AI) has the potential to fundamentally alter the way medicine is practised. AI platforms excel in recognizing complex patterns in medical data and provide a quantitative, rather than purely qualitative, assessment of clinical conditions. Accordingly, AI could have particularly transformative applications in radiation oncology given the multifaceted and highly technical nature of this field of medicine with a heavy reliance on digital data processing and computer software. Indeed, AI has the potential to improve the accuracy, precision, efficiency and overall quality of radiation therapy for patients with cancer. In this Perspective, we first provide a general description of AI methods, followed by a high-level overview of the radiation therapy workflow with discussion of the implications that AI is likely to have on each step of this process. Finally, we describe the challenges associated with the clinical development and implementation of AI platforms in radiation oncology and provide our perspective on how these platforms might change the roles of radiotherapy medical professionals.
29
Baxter SL, Klie AR, Radha Saseendrakumar B, Ye GY, Hogarth M. Text Processing for Detection of Fungal Ocular Involvement in Critical Care Patients: Cross-Sectional Study. J Med Internet Res 2020; 22:e18855. [PMID: 32795984 PMCID: PMC7455861 DOI: 10.2196/18855] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2020] [Revised: 04/21/2020] [Accepted: 06/13/2020] [Indexed: 11/13/2022] Open
Abstract
Background Fungal ocular involvement can develop in patients with fungal bloodstream infections and can be vision-threatening. Ocular involvement has become less common in the current era of improved antifungal therapies. Retrospectively determining the prevalence of fungal ocular involvement is important for informing clinical guidelines, such as the need for routine ophthalmologic consultations. However, manual retrospective record review to detect cases is time-consuming. Objective This study aimed to determine the prevalence of fungal ocular involvement in a critical care database using both structured and unstructured electronic health record (EHR) data. Methods We queried microbiology data from 46,467 critical care patients over 12 years (2000-2012) from the Medical Information Mart for Intensive Care III (MIMIC-III) to identify 265 patients with culture-proven fungemia. For each fungemic patient, demographic data, fungal species present in blood culture, and risk factors for fungemia (eg, presence of indwelling catheters, recent major surgery, diabetes, immunosuppressed status) were ascertained. All structured diagnosis codes and free-text narrative notes associated with each patient’s hospitalization were also extracted. Screening for fungal endophthalmitis was performed using two approaches: (1) by querying a wide array of eye- and vision-related diagnosis codes, and (2) by utilizing a custom regular expression pipeline to identify and collate relevant text matches pertaining to fungal ocular involvement. Both approaches were validated using manual record review. The main outcome measure was the documentation of any fungal ocular involvement. Results In total, 265 patients had culture-proven fungemia, with Candida albicans (n=114, 43%) and Candida glabrata (n=74, 28%) being the most common fungal species in blood culture. In-hospital mortality was 46% (n=121).
In total, 7 patients were identified as having eye- or vision-related diagnosis codes, none of whom had fungal endophthalmitis based on record review. There were 26,830 free-text narrative notes associated with these 265 patients. A regular expression pipeline based on relevant terms yielded possible matches in 683 notes from 108 patients. Subsequent manual record review again demonstrated that no patients had fungal ocular involvement. Therefore, the prevalence of fungal ocular involvement in this cohort was 0%. Conclusions MIMIC-III contained no cases of ocular involvement among fungemic patients, consistent with prior studies reporting low rates of ocular involvement in fungemia. This study demonstrates an application of natural language processing to expedite the review of narrative notes. This approach is highly relevant for ophthalmology, where diagnoses are often based on physical examination findings that are documented within clinical notes.
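The regular-expression screening approach reads, in miniature, like the sketch below: a small pattern panel is run over each note and candidate matches are collated for manual review. The terms shown are illustrative, not the study's full pattern set.

```python
import re

# Illustrative pattern panel; the study's full term set was larger.
OCULAR_FUNGAL = re.compile(
    r"\b(fungal endophthalmitis|chorioretinitis|candida.{0,20}(eye|ocular|retina))\b",
    re.I,
)

def screen_notes(notes):
    """Return (note index, matched text) pairs flagged for manual review."""
    return [(i, m.group(0)) for i, note in enumerate(notes)
            for m in OCULAR_FUNGAL.finditer(note)]

hits = screen_notes([
    "Blood cultures grew C. albicans; no ocular complaints.",
    "Dilated exam to rule out fungal endophthalmitis.",
])
```

As in the study, the regex stage only narrows the haystack; a human still reviews each hit, since a match like "rule out fungal endophthalmitis" does not confirm the diagnosis.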
Affiliation(s)
- Sally L Baxter
- Viterbi Family Department of Ophthalmology and Shiley Eye Institute, University of California San Diego, La Jolla, CA, United States; Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, CA, United States
- Adam R Klie
- Bioinformatics and Systems Biology, University of California San Diego, La Jolla, CA, United States
- Gordon Y Ye
- Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, CA, United States
- Michael Hogarth
- Division of Biomedical Informatics, Department of Medicine, University of California San Diego, La Jolla, CA, United States
30
Griffin AC, Topaloglu U, Davis S, Chung AE. From Patient Engagement to Precision Oncology: Leveraging Informatics to Advance Cancer Care. Yearb Med Inform 2020; 29:235-242. [PMID: 32823322 PMCID: PMC7442514 DOI: 10.1055/s-0040-1701983] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open
Abstract
OBJECTIVES Conduct a survey of the literature for advancements in cancer informatics over the last three years in three specific areas where there has been unprecedented growth: 1) digital health; 2) machine learning; and 3) precision oncology. We also highlight the ethical implications and future opportunities within each area. METHODS A search was conducted over a three-year period in two electronic databases (PubMed, Google Scholar) to identify peer-reviewed articles and conference proceedings. Search terms included variations of the following: neoplasms[MeSH], informatics[MeSH], cancer, oncology, clinical cancer informatics, medical cancer informatics. The search returned too many articles for practical review (23,994 from PubMed and 23,100 from Google Scholar). Thus, we conducted searches of key PubMed-indexed informatics journals and proceedings. We further limited our search to manuscripts that demonstrated a clear focus on clinical or translational cancer informatics. Manuscripts were then selected based on their methodological rigor, scientific impact, innovation, and contribution towards cancer informatics as a field or on their impact on cancer care and research. RESULTS Key developments and opportunities in cancer informatics research in the areas of digital health, machine learning, and precision oncology were summarized. CONCLUSION While there are numerous innovations in the field of cancer informatics to advance prevention and clinical care, considerable challenges remain related to data sharing and privacy, digital accessibility, and algorithm biases and interpretation. The implementation and application of these findings in cancer care necessitates further consideration and research.
Affiliation(s)
| | - Umit Topaloglu
- Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - Sean Davis
- National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
| | - Arlene E. Chung
- University of North Carolina at Chapel Hill, Chapel Hill, NC, USA
- University of North Carolina at Chapel Hill School of Medicine, Chapel Hill, NC, USA
- UNC Lineberger Comprehensive Cancer Center, Chapel Hill, NC, USA
| |
31
Miller TA, Avillach P, Mandl KD. Experiences implementing scalable, containerized, cloud-based NLP for extracting biobank participant phenotypes at scale. JAMIA Open 2020; 3:185-189. [PMID: 32734158 PMCID: PMC7382623 DOI: 10.1093/jamiaopen/ooaa016] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2020] [Revised: 04/03/2020] [Accepted: 04/14/2020] [Indexed: 11/30/2022] Open
Abstract
OBJECTIVE To develop scalable natural language processing (NLP) infrastructure for processing the free text in electronic health records (EHRs). MATERIALS AND METHODS We extend the open-source Apache cTAKES NLP software with several standard technologies for scalability. We remove processing bottlenecks by monitoring component queue size. We process EHR free text for patients in the PrecisionLink Biobank at Boston Children's Hospital. The extracted concepts are made searchable via a web-based portal. RESULTS We processed over 1.2 million notes for over 8000 patients, extracting 154 million concepts. Our largest tested configuration processes over 1 million notes per day. DISCUSSION The unique information represented by extracted NLP concepts has great potential to provide a more complete picture of patient status. CONCLUSION NLP over large EHR document collections can be performed efficiently, in service of high-throughput phenotyping.
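The bottleneck-monitoring idea mentioned above can be sketched as components joined by queues, where the deepest queue marks the component to scale out. Stage names and counts here are invented for illustration.

```python
import queue

# Pipeline components joined by queues; the monitor samples queue depths.
# Stage names and backlog sizes are invented for illustration.
stages = {name: queue.Queue() for name in ("tokenizer", "ner", "writer")}

# Simulate uneven progress: notes pile up in front of the NER component.
for _ in range(50):
    stages["ner"].put("note")
for _ in range(3):
    stages["writer"].put("note")

def deepest_stage(stages):
    """Return the component with the largest backlog -- the one to scale out."""
    return max(stages, key=lambda name: stages[name].qsize())
```

In a deployment, sampling depths periodically and adding workers to the deepest stage keeps throughput balanced without profiling each component in isolation.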
Affiliation(s)
- Timothy A Miller
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, USA
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
- Paul Avillach
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, Massachusetts, USA
- Kenneth D Mandl
- Computational Health Informatics Program, Boston Children’s Hospital, Boston, Massachusetts, USA
- Department of Pediatrics, Harvard Medical School, Boston, Massachusetts, USA
32
Rahimian M, Warner JL, Jain SK, Davis RB, Zerillo JA, Joyce RM. Significant and Distinctive n-Grams in Oncology Notes: A Text-Mining Method to Analyze the Effect of OpenNotes on Clinical Documentation. JCO Clin Cancer Inform 2020; 3:1-9. [PMID: 31184919 PMCID: PMC6873977 DOI: 10.1200/cci.19.00012] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open
Abstract
PURPOSE OpenNotes is a national movement established in 2010 that gives patients access to their visit notes through online patient portals, and its goal is to improve transparency and communication. To determine whether granting patients access to their medical notes will have a measurable effect on provider behavior, we developed novel methods to quantify changes in the length and frequency of use of n-grams (sets of words used in exact sequence) in the notes. METHODS We analyzed 102,135 notes of 36 hematology/oncology clinicians before and after the OpenNotes debut at Beth Israel Deaconess Medical Center. We applied methods to quantify changes in the length and frequency of use of sequential co-occurrence of words (n-grams) in the unstructured content of the notes by unsupervised hierarchical clustering and proportional analysis of n-grams. RESULTS The number of significant n-grams averaged over all providers did not change, but for individual providers, there were significant changes. That is, all significant observed changes were provider specific. We identified eight providers who were late note signers. This group significantly reduced its late signing behavior after OpenNotes implementation. CONCLUSION Although the number of significant n-grams averaged over all providers did not change, our text-mining method detected major content changes in specific providers' documentation at the n-gram level. The method successfully identified a group of providers who decreased their late note signing behavior.
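The n-gram proportion analysis can be sketched as follows: count word bigrams in notes before and after the intervention and measure each bigram's change in usage proportion. The toy notes are invented; the study applied significance testing across 102,135 real notes.

```python
from collections import Counter

def ngrams(text, n=2):
    """Split text into lowercase word n-grams (bigrams by default)."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def proportion_shift(before_notes, after_notes, n=2):
    """Change in each n-gram's share of all n-grams, after minus before."""
    before = Counter(g for note in before_notes for g in ngrams(note, n))
    after = Counter(g for note in after_notes for g in ngrams(note, n))
    tb, ta = sum(before.values()) or 1, sum(after.values()) or 1
    return {g: after[g] / ta - before[g] / tb for g in set(before) | set(after)}

shift = proportion_shift(["patient doing well"], ["patient doing well overall"])
```

Comparing proportions rather than raw counts is what lets per-provider corpora of different sizes be compared before and after the OpenNotes debut.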
Affiliation(s)
- Maryam Rahimian
- Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, MA
- Jeremy L Warner
- Vanderbilt University Medical Center, Nashville, TN; Vanderbilt University, Nashville, TN
- Sandeep K Jain
- Vanderbilt University, Nashville, TN; St Louis University, St Louis, MO
- Roger B Davis
- Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, MA
- Jessica A Zerillo
- Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, MA
- Robin M Joyce
- Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, MA
33
Giannaris PS, Al-Taie Z, Kovalenko M, Thanintorn N, Kholod O, Innokenteva Y, Coberly E, Frazier S, Laziuk K, Popescu M, Shyu CR, Xu D, Hammer RD, Shin D. Artificial Intelligence-Driven Structurization of Diagnostic Information in Free-Text Pathology Reports. J Pathol Inform 2020; 11:4. [PMID: 32166042 PMCID: PMC7045509 DOI: 10.4103/jpi.jpi_30_19] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2019] [Accepted: 12/18/2019] [Indexed: 12/15/2022] Open
Abstract
BACKGROUND Free-text sections of pathology reports contain the most important information from a diagnostic standpoint. However, this information is largely underutilized for computer-based analytics. The vast majority of NLP-based methods lack the capacity to accurately extract complex diagnostic entities and relationships among them, as well as to provide an adequate knowledge representation for downstream data-mining applications. METHODS In this paper, we introduce a novel informatics pipeline that extends open information extraction (openIE) techniques with artificial intelligence (AI) based modeling to extract and transform complex diagnostic entities and relationships among them into Knowledge Graphs (KGs) of relational triples (RTs). RESULTS Evaluation studies demonstrated that the pipeline's output significantly differs from a random process. The semantic similarity with the original reports is high (Mean Weighted Overlap of 0.83). The precision and recall of extracted RTs based on experts' assessment were 0.925 and 0.841, respectively (P < 0.0001). Inter-rater agreement was significant at 93.6% and inter-rater reliability was 81.8%. CONCLUSION The results demonstrated important properties of the pipeline such as high accuracy, minimality, and adequate knowledge representation. Therefore, we conclude that the pipeline can be used in various downstream data-mining applications to assist diagnostic medicine.
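The pipeline's output form, relational triples assembled into a knowledge graph, can be pictured with the toy sketch below; the single regex stands in for the paper's openIE-plus-AI extraction and is purely illustrative.

```python
import re

# A single naive pattern stands in for the paper's openIE + AI extraction.
PATTERN = re.compile(
    r"(?P<subj>[\w\s]+?)\s+(?P<rel>shows|involves|is positive for)\s+(?P<obj>[\w\s]+)",
    re.I,
)

def extract_triples(sentence):
    """Extract at most one (subject, relation, object) triple per sentence."""
    m = PATTERN.search(sentence)
    return [(m["subj"].strip(), m["rel"].lower(), m["obj"].strip())] if m else []

def build_kg(sentences):
    """Collect relational triples into a subject -> [(relation, object)] graph."""
    kg = {}
    for sentence in sentences:
        for subj, rel, obj in extract_triples(sentence):
            kg.setdefault(subj, []).append((rel, obj))
    return kg

kg = build_kg(["Specimen shows invasive ductal carcinoma."])
```

The triple store makes the free-text findings queryable by subject or relation, which is the knowledge-representation property the abstract emphasizes.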
Affiliation(s)
- Pericles S. Giannaris: Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, United States; Department of Pathology and Anatomical Sciences, School of Medicine, University of Missouri, Columbia, MO 65212, United States
- Zainab Al-Taie: Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, United States; Department of Computer Science, College of Science for Women, University of Baghdad, Baghdad, Iraq
- Mikhail Kovalenko: Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, United States; Department of Pathology and Anatomical Sciences, School of Medicine, University of Missouri, Columbia, MO 65212, United States
- Nattapon Thanintorn: Department of Pathology and Anatomical Sciences, School of Medicine, University of Missouri, Columbia, MO 65212, United States
- Olha Kholod: Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, United States; Department of Pathology and Anatomical Sciences, School of Medicine, University of Missouri, Columbia, MO 65212, United States
- Yulia Innokenteva: Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, United States
- Emily Coberly: Department of Pathology and Anatomical Sciences, School of Medicine, University of Missouri, Columbia, MO 65212, United States
- Shellaine Frazier: Department of Pathology and Anatomical Sciences, School of Medicine, University of Missouri, Columbia, MO 65212, United States
- Katsiarina Laziuk: Department of Pathology and Anatomical Sciences, School of Medicine, University of Missouri, Columbia, MO 65212, United States
- Mihail Popescu: Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, United States; Department of Electrical Engineering and Computer Science, College of Engineering, University of Missouri, Columbia, MO 65211, United States; Department of Health Management and Informatics, School of Medicine, University of Missouri, Columbia, MO 65212, United States
- Chi-Ren Shyu: Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, United States; Department of Electrical Engineering and Computer Science, College of Engineering, University of Missouri, Columbia, MO 65211, United States
- Dong Xu: Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, United States; Department of Electrical Engineering and Computer Science, College of Engineering, University of Missouri, Columbia, MO 65211, United States
- Richard D. Hammer: Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, United States; Department of Pathology and Anatomical Sciences, School of Medicine, University of Missouri, Columbia, MO 65212, United States
- Dmitriy Shin: Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, United States; Department of Pathology and Anatomical Sciences, School of Medicine, University of Missouri, Columbia, MO 65212, United States; Department of Electrical Engineering and Computer Science, College of Engineering, University of Missouri, Columbia, MO 65211, United States
34
Ju M, Short AD, Thompson P, Bakerly ND, Gkoutos GV, Tsaprouni L, Ananiadou S. Annotating and detecting phenotypic information for chronic obstructive pulmonary disease. JAMIA Open 2020; 2:261-271. [PMID: 31984360 PMCID: PMC6951876 DOI: 10.1093/jamiaopen/ooz009] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 12/19/2018] [Revised: 02/21/2019] [Accepted: 03/19/2019] [Indexed: 12/29/2022]
Abstract
Objectives Chronic obstructive pulmonary disease (COPD) phenotypes cover a range of lung abnormalities. To allow text mining methods to identify pertinent and potentially complex information about these phenotypes from textual data, we have developed a novel annotated corpus, which we use to train a neural network-based named entity recognizer to detect fine-grained COPD phenotypic information. Materials and methods Since COPD phenotype descriptions often mention other concepts within them (proteins, treatments, etc.), our corpus annotations include both outermost phenotype descriptions and concepts nested within them. Our layered neural bidirectional long short-term memory conditional random field (BiLSTM-CRF) network first recognizes nested mentions, which are fed into subsequent BiLSTM-CRF layers to help recognize enclosing phenotype mentions. Results Our corpus of 30 full papers (available at: http://www.nactem.ac.uk/COPD) is annotated by experts with 27 030 phenotype-related concept mentions, most of which are automatically linked to UMLS Metathesaurus concepts. When trained on the corpus, our BiLSTM-CRF network outperforms other popular approaches in recognizing detailed phenotypic information. Discussion Information extracted by our method can facilitate efficient location and exploration of detailed information about phenotypes, for example, those specifically concerning reactions to treatments. Conclusion The importance of our corpus for developing methods to extract fine-grained information about COPD phenotypes is demonstrated through its successful use to train a layered BiLSTM-CRF network to extract phenotypic information at various levels of granularity. The minimal human intervention needed for training should permit ready adaptation to extracting phenotypic information about other diseases.
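The layered recognition idea described here (inner mentions handled first, then fed to outer layers) rests on decomposing nested annotations into non-overlapping layers. The sketch below shows only that data-preparation step under simplified assumptions, not the BiLSTM-CRF model itself; the span values in the example are hypothetical.

```python
def spans_to_layers(spans):
    """Split possibly nested (start, end, label) spans into non-overlapping
    layers, innermost first, mirroring the layered-CRF decomposition."""
    # Shorter (inner) spans are placed into earlier layers first.
    remaining = sorted(spans, key=lambda s: (s[1] - s[0], s[0]))
    layers = []
    for span in remaining:
        placed = False
        for layer in layers:
            # A span fits a layer if it overlaps nothing already there.
            if all(span[1] <= s[0] or span[0] >= s[1] for s in layer):
                layer.append(span)
                placed = True
                break
        if not placed:
            layers.append([span])
    return layers
```

For instance, a Protein mention nested inside a Phenotype description ends up in an earlier layer than its enclosing mention, so each layer can be tagged with a flat sequence labeler.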
Affiliation(s)
- Meizhi Ju: National Centre for Text Mining, School of Computer Science, The University of Manchester, Manchester, UK
- Andrea D Short: Faculty of Biology, Medicine and Health, The University of Manchester, Manchester, UK
- Paul Thompson: National Centre for Text Mining, School of Computer Science, The University of Manchester, Manchester, UK
- Nawar Diar Bakerly: Salford Royal NHS Foundation Trust; School of Health Sciences, The University of Manchester, Manchester, UK
- Georgios V Gkoutos: Centre for Computational Biology, Institute of Cancer and Genomic Sciences, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK; Institute of Translational Medicine, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK; MRC Health Data Research UK (HDR UK); NIHR Experimental Cancer Medicine Centre, Birmingham, UK; NIHR Surgical Reconstruction and Microbiology Research Centre, Birmingham, UK; NIHR Biomedical Research Centre, Birmingham, UK
- Loukia Tsaprouni: Centre for Life and Sport Sciences, School of Health Sciences, Birmingham City University, Birmingham, UK
- Sophia Ananiadou: National Centre for Text Mining, School of Computer Science, The University of Manchester, Manchester, UK
35
Wang L, Luo L, Wang Y, Wampfler J, Yang P, Liu H. Natural language processing for populating lung cancer clinical research data. BMC Med Inform Decis Mak 2019; 19:239. [PMID: 31801515 PMCID: PMC6894100 DOI: 10.1186/s12911-019-0931-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Indexed: 01/14/2023]
Abstract
Background Lung cancer is the second most common cancer for men and women; the wide adoption of electronic health records (EHRs) offers the potential to accelerate cohort-related epidemiological studies using informatics approaches. Since manual extraction from large volumes of text materials is time-consuming and labor-intensive, some efforts have emerged to automatically extract information from text for lung cancer patients using natural language processing (NLP), an artificial intelligence technique. Methods In this study, using an existing cohort of 2311 lung cancer patients with manually ascertained information about stage, histology, tumor grade, and therapies (chemotherapy, radiotherapy, and surgery), we developed and evaluated an NLP system to extract information on these variables automatically for the same patients from clinical narratives, including clinical notes, pathology reports, and surgery reports. Results Evaluation showed promising results: recall for stage, histology, tumor grade, and therapies reached 89%, 98%, 78%, and 100%, respectively, and precision was 70%, 88%, 90%, and 100%, respectively. Conclusion This study demonstrated the feasibility and accuracy of automatically extracting pre-defined information from clinical narratives for lung cancer research.
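The recall and precision figures reported in this abstract follow the standard definitions, which can be computed as below; the counts in the usage line are illustrative, not taken from the study.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Standard information-extraction precision and recall."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall

# Hypothetical counts for one extracted variable.
p, r = precision_recall(70, 30, 10)
```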
Affiliation(s)
- Liwei Wang: Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN, 55901, USA
- Lei Luo: Department of Good Clinical Practice, Guizhou Province People's Hospital, Guiyang, China
- Yanshan Wang: Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN, 55901, USA
- Jason Wampfler: Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN, 55901, USA
- Ping Yang: Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN, 55901, USA
- Hongfang Liu: Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, MN, 55901, USA
36
Lacson R, Laroya R, Wang A, Kapoor N, Glazer DI, Shinagare A, Ip IK, Malhotra S, Hentel K, Khorasani R. Integrity of clinical information in computerized order requisitions for diagnostic imaging. J Am Med Inform Assoc 2019; 25:1651-1656. [PMID: 30517649 DOI: 10.1093/jamia/ocy133] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 04/26/2018] [Accepted: 09/19/2018] [Indexed: 11/14/2022]
Abstract
Objective To assess information integrity (concordance and completeness of documented exam indications from the electronic health record [EHR] imaging order requisition, compared to EHR provider notes), and to assess the potential impact of indication inaccuracies on exam planning and interpretation. Methods This retrospective study, approved by the Institutional Review Board, was conducted at a tertiary academic medical center. A total of 139 lumbar spine MRI (LS-MRI) and 176 CT abdomen/pelvis orders performed between 4/1/2016 and 5/31/2016 were randomly selected and reviewed by four radiologists for concordance and completeness of relevant exam indications in order requisitions compared to provider notes, and for the potential impact of indication inaccuracies on exam planning and interpretation. Forty LS-MRI and forty CT abdomen/pelvis orders were re-reviewed to assess kappa agreement. Results Requisition indications were more likely to be incomplete (256/315, 81%) than discordant (133/315, 42%) compared to provider notes (p < 0.0001). The potential impact of discrepancy between clinical information in requisitions and provider notes was higher for the radiologist's interpretation than for exam planning (135/315, 43%, vs 25/315, 8%, p < 0.0001). Agreement among radiologists for concordance, completeness, and potential impact was moderate to strong (kappa 0.66-0.89). Indications in EHR order requisitions are frequently incomplete or discordant compared to physician notes, potentially impacting imaging exam planning, interpretation, and accurate diagnosis. Such inaccuracies could also diminish the relevance of clinical decision support alerts if based on information in order requisitions. Conclusions Improved availability of relevant documented clinical information within EHR imaging requisitions is necessary for optimal exam planning and interpretation.
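The kappa agreement reported in this study can be sketched as follows, assuming Cohen's (two-rater) kappa over categorical judgments; the rater labels in the example are hypothetical.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters judging the same items with categorical labels."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items where the raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence of the raters' label distributions.
    labels = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n) for label in labels
    )
    return (observed - expected) / (1 - expected)
```

Perfect agreement yields 1.0, while agreement no better than chance yields 0.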
Affiliation(s)
- Ronilda Lacson: Department of Radiology, Brigham and Women's Hospital, Center for Evidence-Based Imaging, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Romeo Laroya: Department of Radiology, Brigham and Women's Hospital, Center for Evidence-Based Imaging, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Aijia Wang: Department of Radiology, Brigham and Women's Hospital, Center for Evidence-Based Imaging, Boston, MA, USA
- Neena Kapoor: Department of Radiology, Brigham and Women's Hospital, Center for Evidence-Based Imaging, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Daniel I Glazer: Harvard Medical School, Boston, MA, USA; Department of Radiology, Brigham and Women's Hospital, Boston, MA, USA
- Atul Shinagare: Department of Radiology, Brigham and Women's Hospital, Center for Evidence-Based Imaging, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Ivan K Ip: Department of Radiology, Brigham and Women's Hospital, Center for Evidence-Based Imaging, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
- Sameer Malhotra: Department of Healthcare Policy and Research, Weill Cornell Medicine, New York, NY, USA
- Keith Hentel: Department of Radiology, Weill Cornell Medicine, New York, NY, USA
- Ramin Khorasani: Department of Radiology, Brigham and Women's Hospital, Center for Evidence-Based Imaging, Boston, MA, USA; Harvard Medical School, Boston, MA, USA
37
Haleem A, Javaid M, Khan IH. Current status and applications of Artificial Intelligence (AI) in medical field: An overview. Curr Med Res Pract 2019. [DOI: 10.1016/j.cmrp.2019.11.005] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Indexed: 10/25/2022]
38
Savova GK, Danciu I, Alamudun F, Miller T, Lin C, Bitterman DS, Tourassi G, Warner JL. Use of Natural Language Processing to Extract Clinical Cancer Phenotypes from Electronic Medical Records. Cancer Res 2019; 79:5463-5470. [PMID: 31395609 PMCID: PMC7227798 DOI: 10.1158/0008-5472.can-19-0579] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Received: 02/15/2019] [Revised: 06/17/2019] [Accepted: 07/29/2019] [Indexed: 12/12/2022]
Abstract
Current models for correlating electronic medical records with -omics data largely ignore clinical text, which is an important source of phenotype information for patients with cancer. This data convergence has the potential to reveal new insights about cancer initiation, progression, metastasis, and response to treatment. Insights from this real-world data will catalyze clinical care, research, and regulatory activities. Natural language processing (NLP) methods are needed to extract these rich cancer phenotypes from clinical text. Here, we review the advances of NLP and information extraction methods relevant to oncology based on publications from PubMed as well as NLP and machine learning conference proceedings in the last 3 years. Given the interdisciplinary nature of the fields of oncology and information extraction, this analysis serves as a critical trail marker on the path to higher fidelity oncology phenotypes from real-world data.
Affiliation(s)
- Guergana K Savova: Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts; Harvard Medical School, Boston, Massachusetts
- Timothy Miller: Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts; Harvard Medical School, Boston, Massachusetts
- Chen Lin: Computational Health Informatics Program, Boston Children's Hospital, Boston, Massachusetts
- Danielle S Bitterman: Harvard Medical School, Boston, Massachusetts; Dana Farber Cancer Institute, Boston, Massachusetts
39
Hong N, Wen A, Stone DJ, Tsuji S, Kingsbury PR, Rasmussen LV, Pacheco JA, Adekkanattu P, Wang F, Luo Y, Pathak J, Liu H, Jiang G. Developing a FHIR-based EHR phenotyping framework: A case study for identification of patients with obesity and multiple comorbidities from discharge summaries. J Biomed Inform 2019; 99:103310. [PMID: 31622801 PMCID: PMC6990976 DOI: 10.1016/j.jbi.2019.103310] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Received: 04/01/2019] [Revised: 09/15/2019] [Accepted: 10/11/2019] [Indexed: 12/16/2022]
Abstract
BACKGROUND Standards-based clinical data normalization has become a key component of effective data integration and accurate phenotyping for secondary use of electronic healthcare records (EHR) data. HL7 Fast Healthcare Interoperability Resources (FHIR) is an emerging clinical data standard for exchanging electronic healthcare data and has been used in modeling and integrating both structured and unstructured EHR data for a variety of clinical research applications. The overall objective of this study is to develop and evaluate a FHIR-based EHR phenotyping framework for identification of patients with obesity and its multiple comorbidities from semi-structured discharge summaries leveraging a FHIR-based clinical data normalization pipeline (known as NLP2FHIR). METHODS We implemented a multi-class and multi-label classification system based on the i2b2 Obesity Challenge task to evaluate the FHIR-based EHR phenotyping framework. Two core parts of the framework are: (a) the conversion of discharge summaries into corresponding FHIR resources - Composition, Condition, MedicationStatement, Procedure and FamilyMemberHistory using the NLP2FHIR pipeline, and (b) the implementation of four machine learning algorithms (logistic regression, support vector machine, decision tree, and random forest) to train classifiers to predict disease state of obesity and 15 comorbidities using features extracted from standard FHIR resources and terminology expansions. We used the macro- and micro-averaged precision (P), recall (R), and F1 score (F1) measures to evaluate the classifier performance. We validated the framework using a second obesity dataset extracted from the MIMIC-III database. RESULTS Using the NLP2FHIR pipeline, 1237 clinical discharge summaries from the 2008 i2b2 obesity challenge dataset were represented as the instances of the FHIR Composition resource consisting of 5677 records with 16 unique section types. 
After the NLP processing and FHIR modeling, a set of 244,438 FHIR clinical resource instances were generated. Among the four machine learning classifiers, the random forest algorithm performed best, with micro-/macro-averaged F1 scores of 0.9466/0.7887 for intuitive classification (reflecting medical professionals' judgments) and 0.9536/0.6524 for textual classification (reflecting judgments based on explicitly reported disease information), respectively. The MIMIC-III obesity dataset was successfully integrated for prediction with minimal configuration of the NLP2FHIR pipeline and machine learning models. CONCLUSIONS The study demonstrated that the FHIR-based EHR phenotyping approach could effectively identify the state of obesity and multiple comorbidities using semi-structured discharge summaries. Our FHIR-based phenotyping approach is a first concrete step towards improving the data aspect of phenotyping portability across EHR systems and enhancing the interpretability of machine learning-based phenotyping algorithms.
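The micro- and macro-averaged F1 measures used in this evaluation can be sketched for a multi-label setting as below; the label names are hypothetical and this is not the authors' evaluation code.

```python
def f1_scores(y_true, y_pred, labels):
    """Micro- and macro-averaged F1 over per-label binary decisions.
    y_true / y_pred: one set of labels per instance."""
    per_label = []
    total_tp = total_fp = total_fn = 0
    for label in labels:
        tp = sum(label in t and label in p for t, p in zip(y_true, y_pred))
        fp = sum(label not in t and label in p for t, p in zip(y_true, y_pred))
        fn = sum(label in t and label not in p for t, p in zip(y_true, y_pred))
        total_tp += tp; total_fp += fp; total_fn += fn
        per_label.append(2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0)
    macro = sum(per_label) / len(per_label)   # average of per-label F1
    micro = 2 * total_tp / (2 * total_tp + total_fp + total_fn)  # pooled counts
    return micro, macro
```

Micro-averaging weights frequent labels more heavily, which is why the micro and macro figures in such studies can diverge.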
Affiliation(s)
- Na Hong: Mayo Clinic, Rochester, MN, USA
- Luke V Rasmussen: Northwestern University Feinberg School of Medicine, Chicago, IL, USA
- Fei Wang: Weill Cornell Medicine, New York City, NY, USA
- Yuan Luo: Northwestern University Feinberg School of Medicine, Chicago, IL, USA
40
[Basis and perspectives of artificial intelligence in radiation therapy]. Cancer Radiother 2019; 23:913-916. [PMID: 31645301 DOI: 10.1016/j.canrad.2019.08.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/06/2019] [Revised: 08/15/2019] [Accepted: 08/20/2019] [Indexed: 11/23/2022]
Abstract
Artificial intelligence is a highly polysemic term. In computer science, with the objective of solving entirely new problems in new contexts, artificial intelligence includes connectionism (neural networks) for learning and logics for reasoning. Artificial intelligence algorithms mimic tasks normally requiring human intelligence, such as deduction, induction, and abduction. All apply to radiation oncology. Combined with radiomics, neural networks have obtained good results in image classification, natural language processing, phenotyping based on electronic health records, and adaptive radiation therapy. Generative adversarial networks have been tested to generate synthetic data. Logic-based systems have been developed for providing formal domain ontologies, supporting clinical decisions, and checking the consistency of the systems. Artificial intelligence must integrate both deep learning and logic approaches to perform complex tasks and go beyond the so-called narrow artificial intelligence that is tailored to a single highly specialized task. Combined with mechanistic models, artificial intelligence has the potential to provide new tools such as digital twins for precision oncology.
41
Hong N, Wen A, Shen F, Sohn S, Wang C, Liu H, Jiang G. Developing a scalable FHIR-based clinical data normalization pipeline for standardizing and integrating unstructured and structured electronic health record data. JAMIA Open 2019; 2:570-579. [PMID: 32025655 PMCID: PMC6993992 DOI: 10.1093/jamiaopen/ooz056] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 06/25/2019] [Revised: 09/23/2019] [Accepted: 10/01/2019] [Indexed: 11/30/2022]
Abstract
Objective To design, develop, and evaluate a scalable clinical data normalization pipeline for standardizing unstructured electronic health record (EHR) data leveraging the HL7 Fast Healthcare Interoperability Resources (FHIR) specification. Methods We established an FHIR-based clinical data normalization pipeline known as NLP2FHIR that mainly comprises: (1) a module for a core natural language processing (NLP) engine with an FHIR-based type system; (2) a module for integrating structured data; and (3) a module for content normalization. We evaluated the FHIR modeling capability focusing on core clinical resources such as Condition, Procedure, MedicationStatement (including Medication), and FamilyMemberHistory using Mayo Clinic's unstructured EHR data. We constructed a gold standard by reusing annotation corpora from previous NLP projects. Results A total of 30 mapping rules, 62 normalization rules, and 11 NLP-specific FHIR extensions were created and implemented in the NLP2FHIR pipeline. The elements that need to integrate structured data from each clinical resource were identified. The performance of unstructured data modeling achieved F scores ranging from 0.69 to 0.99 for various FHIR element representations (0.69-0.99 for Condition; 0.75-0.84 for Procedure; 0.71-0.99 for MedicationStatement; and 0.75-0.95 for FamilyMemberHistory). Conclusion We demonstrated that the NLP2FHIR pipeline is feasible for modeling unstructured EHR data and integrating structured elements into the model. The outcomes of this work provide standards-based clinical data normalization tools that are indispensable for enabling portable EHR-driven phenotyping and large-scale data analytics, as well as useful insights for future development of the FHIR specification with regard to handling unstructured clinical data.
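A minimal sketch of the kind of mapping NLP2FHIR performs: turning an extracted problem mention into a FHIR Condition resource. The function name is hypothetical and the resource shape deliberately minimal; the pipeline's actual mapping rules, normalization rules, and NLP-specific extensions are far more elaborate.

```python
def mention_to_fhir_condition(text, snomed_code, patient_id):
    """Map an extracted problem mention onto a minimal FHIR Condition resource."""
    return {
        "resourceType": "Condition",
        "code": {
            "coding": [{
                # SNOMED CT code shown for illustration only.
                "system": "http://snomed.info/sct",
                "code": snomed_code,
                "display": text,
            }],
            "text": text,
        },
        "subject": {"reference": f"Patient/{patient_id}"},
    }
```

Normalizing NLP output into a shared resource model like this is what makes downstream phenotyping logic portable across EHR systems.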
Affiliation(s)
- Na Hong: Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Andrew Wen: Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Feichen Shen: Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Sunghwan Sohn: Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Chen Wang: Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Hongfang Liu: Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
- Guoqian Jiang: Department of Health Sciences Research, Mayo Clinic, Rochester, Minnesota, USA
42
A frame semantic overview of NLP-based information extraction for cancer-related EHR notes. J Biomed Inform 2019; 100:103301. [PMID: 31589927 DOI: 10.1016/j.jbi.2019.103301] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 04/02/2019] [Revised: 09/04/2019] [Accepted: 10/03/2019] [Indexed: 02/07/2023]
Abstract
OBJECTIVE There is a wealth of information about cancer in Electronic Health Record (EHR) notes that can be useful for biomedical research, provided natural language processing (NLP) methods are available to extract and structure this information. In this paper, we present a scoping review of the existing clinical NLP literature for cancer. METHODS We identified studies describing an NLP method to extract specific cancer-related information from EHR sources from PubMed, Google Scholar, ACL Anthology, and existing reviews. Two exclusion criteria were used: we excluded articles where the extraction techniques were too broad to be represented as frames (e.g., document classification) and articles where very low-level extraction methods were used (e.g., simply identifying clinical concepts). In total, 78 articles were included in the final review. We organized this information according to frame semantic principles to help identify common areas of overlap and potential gaps. RESULTS Frames were created from the reviewed articles pertaining to cancer information such as cancer diagnosis, tumor description, cancer procedure, breast cancer diagnosis, prostate cancer diagnosis, and pain in prostate cancer patients. These frames included both a definition and specific frame elements (i.e., extractable attributes). We found that cancer diagnosis was the most common frame among the reviewed papers (36 out of 78), with recent work focusing on extracting information related to treatment and breast cancer diagnosis. CONCLUSION The list of common frames described in this paper identifies important cancer-related information extracted by existing NLP techniques and serves as a useful resource for future researchers requiring cancer information extracted from EHR notes. We also argue, given the heavy duplication of cancer NLP systems, that a general-purpose resource of annotated cancer frames and corresponding NLP tools would be valuable.
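The frame-semantic organization described here (a definition plus extractable frame elements) can be sketched as a simple data structure; the frame name and element values below are illustrative, not taken from the review.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """A semantic frame: a definition plus extractable frame elements."""
    name: str
    definition: str
    elements: dict = field(default_factory=dict)  # element name -> extracted value

# Hypothetical instance of a cancer-diagnosis frame filled from one note.
cancer_dx = Frame(
    name="CancerDiagnosis",
    definition="A cancer diagnosis asserted in an EHR note.",
    elements={"site": "lung", "histology": "adenocarcinoma", "stage": "IIIA"},
)
```

Organizing extraction targets as frames makes it easy to see which attributes different NLP systems cover and where the gaps are.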
43
Soysal E, Warner JL, Wang J, Jiang M, Harvey K, Jain SK, Dong X, Song HY, Siddhanamatha H, Wang L, Dai Q, Chen Q, Du X, Tao C, Yang P, Denny JC, Liu H, Xu H. Developing Customizable Cancer Information Extraction Modules for Pathology Reports Using CLAMP. Stud Health Technol Inform 2019; 264:1041-1045. [PMID: 31438083 PMCID: PMC7359882 DOI: 10.3233/shti190383] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/17/2022]
Abstract
Natural language processing (NLP) technologies have been successfully applied to cancer research by enabling automated phenotypic information extraction from narratives in electronic health records (EHRs) such as pathology reports; however, developing customized NLP solutions requires substantial effort. To facilitate the adoption of NLP in cancer research, we have developed a set of customizable modules for extracting comprehensive types of cancer-related information in pathology reports (e.g., tumor size, tumor stage, and biomarkers), by leveraging the existing CLAMP system, which provides user-friendly interfaces for building customized NLP solutions for individual needs. Evaluation using annotated data at Vanderbilt University Medical Center showed that CLAMP-Cancer could extract diverse types of cancer information with good F-measures (0.80-0.98). We then applied CLAMP-Cancer to an information extraction task at Mayo Clinic and showed that we can quickly build a customized NLP system with comparable performance with an existing system at Mayo Clinic. CLAMP-Cancer is freely available for academic use.
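As an illustration of the kind of attribute extraction such modules perform (this is not CLAMP's actual implementation), a simple pattern-based sketch for tumor-size mentions in pathology text:

```python
import re

# Matches sizes like "2.5 x 1.8 x 1.2 cm" or "3 mm"; a deliberately naive pattern.
TUMOR_SIZE = re.compile(
    r"\b(\d+(?:\.\d+)?(?:\s*x\s*\d+(?:\.\d+)?){0,2})\s*(cm|mm)\b", re.I
)

def extract_tumor_sizes(report_text):
    """Return each matched size expression with its unit."""
    return [" ".join(m.groups()) for m in TUMOR_SIZE.finditer(report_text)]
```

Production systems like CLAMP layer dictionaries, machine-learned NER, and customizable rules on top of such patterns.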
Affiliation(s)
- Ergin Soysal: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas
- Jeremy L. Warner: Department of Medicine, Vanderbilt University, Nashville, Tennessee; Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee; Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, Tennessee
- Jingqi Wang: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas
- Min Jiang: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas
- Krysten Harvey: Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee
- Sandeep Kumar Jain: Vanderbilt School of Medicine, Vanderbilt University, Nashville, Tennessee
- Xiao Dong: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas
- Hsing-Yi Song: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas
- Harish Siddhanamatha: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas
- Liwei Wang: Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, Minnesota
- Qi Dai: Department of Medicine, Vanderbilt University, Nashville, Tennessee
- Qingxia Chen: Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
- Xianglin Du: School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas
- Cui Tao: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas
- Ping Yang: Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, Minnesota
- Joshua Charles Denny: Department of Medicine, Vanderbilt University, Nashville, Tennessee; Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee
- Hongfang Liu: Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, Minnesota
- Hua Xu: School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas
44
Warner JL, Dymshyts D, Reich CG, Gurley MJ, Hochheiser H, Moldwin ZH, Belenkaya R, Williams AE, Yang PC. HemOnc: A new standard vocabulary for chemotherapy regimen representation in the OMOP common data model. J Biomed Inform 2019; 96:103239. [PMID: 31238109 DOI: 10.1016/j.jbi.2019.103239] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 03/29/2019] [Revised: 06/20/2019] [Accepted: 06/21/2019] [Indexed: 10/26/2022]
Abstract
Systematic application of observational data to the understanding of impacts of cancer treatments requires detailed information models allowing meaningful comparisons between treatment regimens. Unfortunately, details of systemic therapies are scarce in registries and data warehouses, primarily due to the complex nature of the protocols and a lack of standardization. Since 2011, we have been creating a curated and semi-structured website of chemotherapy regimens, HemOnc.org. In coordination with the Observational Health Data Sciences and Informatics (OHDSI) Oncology Subgroup, we have transformed a substantial subset of this content into the OMOP common data model, with bindings to multiple external vocabularies, e.g., RxNorm and the National Cancer Institute Thesaurus. Currently, there are >73,000 concepts and >177,000 relationships in the full vocabulary. Content related to the definition and composition of chemotherapy regimens has been released within the ATHENA tool (athena.ohdsi.org) for widespread utilization by the OHDSI membership. Here, we describe the rationale, data model, and initial contents of the HemOnc vocabulary along with several use cases for which it may be valuable.
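The concepts-plus-relationships shape of such a vocabulary can be illustrated with a small sketch in the spirit of the OMOP CONCEPT and CONCEPT_RELATIONSHIP tables. This is a loose, hypothetical analogy, not the actual HemOnc/OMOP schema; every concept ID and relationship name below is invented for illustration.

```python
# Illustrative sketch (not the actual HemOnc/OMOP schema): a regimen
# vocabulary as concept records plus relationship triples. All IDs and
# relationship names here are hypothetical.
from dataclasses import dataclass

@dataclass(frozen=True)
class Concept:
    concept_id: int
    name: str
    domain: str  # e.g. "Regimen" or "Drug"

# Hypothetical concepts: the FOLFOX regimen and its components.
folfox = Concept(1001, "FOLFOX", "Regimen")
fluorouracil = Concept(2001, "fluorouracil", "Drug")
oxaliplatin = Concept(2002, "oxaliplatin", "Drug")
leucovorin = Concept(2003, "leucovorin", "Drug")

# Relationship triples: (subject_id, relationship_name, object_id).
relationships = [
    (folfox.concept_id, "Has antineoplastic component", fluorouracil.concept_id),
    (folfox.concept_id, "Has antineoplastic component", oxaliplatin.concept_id),
    (folfox.concept_id, "Has supportive component", leucovorin.concept_id),
]

def components_of(regimen: Concept) -> list[int]:
    """All component concept IDs linked from a regimen concept."""
    return [obj for subj, _, obj in relationships if subj == regimen.concept_id]
```

Queries over regimen composition (e.g. "which regimens contain oxaliplatin?") then reduce to scans or joins over the triple list, which is essentially how relationship tables are used in the common data model.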
Affiliation(s)
- Jeremy L Warner
- Vanderbilt University Medical Center, Nashville, TN, United States; HemOnc.org, LLC, Lexington, MA, United States.
- Zachary H Moldwin
- University of Illinois at Chicago College of Pharmacy, Chicago, IL, United States
- Rimma Belenkaya
- Memorial Sloan Kettering Cancer Center, New York, NY, United States
- Peter C Yang
- HemOnc.org, LLC, Lexington, MA, United States; Massachusetts General Hospital, Harvard Medical School, Boston, MA, United States
45
Meystre SM, Heider PM, Kim Y, Aruch DB, Britten CD. Automatic trial eligibility surveillance based on unstructured clinical data. Int J Med Inform 2019; 129:13-19. [PMID: 31445247] [DOI: 10.1016/j.ijmedinf.2019.05.018]
Abstract
INTRODUCTION Insufficient patient enrollment in clinical trials remains a serious and costly problem and is often considered the most critical issue to solve for the clinical trials community. In this project, we assessed the feasibility of automatically detecting a patient's eligibility for a sample of breast cancer clinical trials by mapping coded clinical trial eligibility criteria to the corresponding clinical information automatically extracted from text in the EHR. METHODS Three open breast cancer clinical trials were selected by oncologists. Their eligibility criteria were manually abstracted from trial descriptions using the OHDSI ATLAS web application. Patients enrolled or screened for these trials were selected as 'positive' or 'possible' cases. Other patients diagnosed with breast cancer were selected as 'negative' cases. A selection of the clinical data and all clinical notes of these 229 selected patients was extracted from the MUSC clinical data warehouse and stored in a database implementing the OMOP common data model. Eligibility criteria were extracted from clinical notes using either manually crafted pattern matching (regular expressions) or a new natural language processing (NLP) application. These extracted criteria were then compared with reference criteria from trial descriptions. This comparison was performed with three different versions of a new application: rule-based, cosine similarity-based, and machine learning-based. RESULTS For eligibility criteria extraction from clinical notes, the machine learning-based NLP application achieved the highest accuracy, with a micro-averaged recall of 90.9% and precision of 89.7%.
CONCLUSION NLP can be used to extract eligibility criteria from EHR clinical notes and automatically discover patients possibly eligible for a clinical trial with good accuracy, which could be leveraged to reduce the workload of humans screening patients for trials.
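The cosine similarity-based version of the criterion comparison can be sketched as a bag-of-words similarity between a mention extracted from a note and each reference criterion from the trial description. This is a simplified stand-in for illustration, not the authors' application; the function names and example criteria below are hypothetical.

```python
# Minimal sketch of cosine-similarity matching between a criterion
# mention extracted from a clinical note and a trial's reference
# criteria. A bag-of-words stand-in, not the paper's implementation
# (which also had rule-based and machine learning-based variants).
import math
import re
from collections import Counter

def bow(text: str) -> Counter:
    """Lower-cased bag-of-words vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_match(extracted: str, reference_criteria: list[str]) -> tuple[str, float]:
    """Reference criterion most similar to the extracted mention."""
    scored = [(ref, cosine(bow(extracted), bow(ref))) for ref in reference_criteria]
    return max(scored, key=lambda x: x[1])

criteria = [
    "Histologically confirmed breast cancer",
    "ECOG performance status 0-1",
    "No prior chemotherapy",
]
match, score = best_match("patient has confirmed breast cancer", criteria)
```

In practice the similarity score would be thresholded to decide whether an extracted mention satisfies a given criterion, which is where the per-trial AUC figures reported above come from.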
Affiliation(s)
- Stéphane M Meystre
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, United States; Division of Hematology/Oncology, Medical University of South Carolina, Charleston, SC, United States.
- Paul M Heider
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, United States
- Youngjun Kim
- Biomedical Informatics Center, Medical University of South Carolina, Charleston, SC, United States
- Daniel B Aruch
- Division of Hematology/Oncology, Medical University of South Carolina, Charleston, SC, United States
- Carolyn D Britten
- Division of Hematology/Oncology, Medical University of South Carolina, Charleston, SC, United States
46
Si Y, Roberts K. A Frame-Based NLP System for Cancer-Related Information Extraction. AMIA Annu Symp Proc 2018; 2018:1524-1533. [PMID: 30815198] [PMCID: PMC6371330]
Abstract
We propose a frame-based natural language processing (NLP) method that extracts cancer-related information from clinical narratives. We focus on three frames: cancer diagnosis, cancer therapeutic procedure, and tumor description. We utilize a deep learning-based approach, a bidirectional Long Short-Term Memory Conditional Random Field (BiLSTM-CRF), which uses both character and word embeddings. The system consists of two constituent sequence classifiers: a frame identification (lexical unit) classifier and a frame element classifier. The system achieves an F1 of 93.70 for cancer diagnosis, 96.33 for therapeutic procedure, and 87.18 for tumor description. These represent improvements of 10.72, 0.85, and 8.04 over a baseline heuristic, respectively. Additionally, we demonstrate that the combination of both GloVe and MIMIC-III embeddings has the best representational effect. Overall, this study demonstrates the effectiveness of deep learning methods for extracting frame semantic information from clinical narratives.
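Sequence classifiers of this kind emit token-level labels; turning those predictions into frame-element spans is a small decoding step that can be sketched as follows. The BIO tag names and example sentence below are hypothetical, and the sketch covers only span decoding, not the BiLSTM-CRF model itself.

```python
# Sketch of assembling token-level BIO predictions into frame-element
# spans, the post-processing step downstream of a sequence classifier
# such as a BiLSTM-CRF. Tags and example text are hypothetical; this is
# not the paper's model, only the span-decoding convention.
def bio_to_spans(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Collapse B-/I- tags into (frame_element, text) spans."""
    spans, current_label, current_toks = [], None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_label:
                spans.append((current_label, " ".join(current_toks)))
            current_label, current_toks = tag[2:], [tok]
        elif tag.startswith("I-") and current_label == tag[2:]:
            current_toks.append(tok)
        else:  # "O" or an inconsistent I- tag closes the open span
            if current_label:
                spans.append((current_label, " ".join(current_toks)))
            current_label, current_toks = None, []
    if current_label:
        spans.append((current_label, " ".join(current_toks)))
    return spans

tokens = ["invasive", "ductal", "carcinoma", "of", "the", "left", "breast"]
tags = ["B-Diagnosis", "I-Diagnosis", "I-Diagnosis", "O", "O", "B-Site", "I-Site"]
```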
Affiliation(s)
- Yuqi Si
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Kirk Roberts
- School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
47
Ahlbrandt J, Lablans M, Glocker K, Stahl-Toyota S, Maier-Hein K, Maier-Hein L, Ückert F. Modern Information Technology for Cancer Research: What's in IT for Me? An Overview of Technologies and Approaches. Oncology 2018; 98:363-369. [PMID: 30439700] [DOI: 10.1159/000493638]
Abstract
Information technology (IT) can enhance or change many scenarios in cancer research for the better. In this paper, we introduce several examples, starting with clinical data reuse and collaboration, including data sharing in research networks. Key challenges are semantic interoperability and data access (including data privacy). We then address the gathering and analysis of genomic information, where cloud computing, uncertainty, and reproducibility challenge researchers. We also present new sources of additional phenotypic data, such as patient-reported outcomes and machine learning in imaging. Finally, we focus on therapy assistance, introducing tools used in molecular tumor boards and techniques for computer-assisted surgery. We discuss the need for metadata to aggregate and analyze data sets reliably. We conclude with an outlook towards a learning health care system in oncology, which connects bench and bedside by employing modern IT solutions.
Affiliation(s)
- Frank Ückert
- German Cancer Research Center, Heidelberg, Germany
| |
48
Khor RC, Nguyen A, O'Dwyer J, Kothari G, Sia J, Chang D, Ng SP, Duchesne GM, Foroudi F. Extracting tumour prognostic factors from a diverse electronic record dataset in genito-urinary oncology. Int J Med Inform 2018; 121:53-57. [PMID: 30545489] [DOI: 10.1016/j.ijmedinf.2018.10.008]
Abstract
OBJECTIVES To implement a system for unsupervised extraction of tumor stage and prognostic data in patients with genitourinary cancers using clinicopathological and radiology text. METHODS A corpus of 1054 electronic notes (clinician notes, radiology reports and pathology reports) was annotated for tumor stage, prostate specific antigen (PSA) and Gleason grade. Annotations from five clinicians were reconciled to form a gold standard dataset. A training dataset of 386 documents was sequestered. The Medtex algorithm was adapted using the training dataset. RESULTS Adapted Medtex equaled or exceeded human performance in most annotations, except for implicit M stage (F-measure of 0.69 vs 0.84) and PSA (0.92 vs 0.96). Overall Medtex performed with an F-measure of 0.86 compared to human annotations of 0.92. There was significant inter-observer variability when comparing human annotators to the gold standard. CONCLUSIONS The Medtex algorithm performed similarly to human annotators for extracting stage and prognostic data from varied clinical texts.
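A minimal pattern-matching baseline for pulling PSA values and Gleason scores from free text, the kind of simple extractor that purpose-built systems like Medtex outperform, can be sketched with regular expressions. The patterns below are illustrative assumptions only, not the Medtex algorithm, which handles far more surface variation.

```python
# Illustrative regex baseline for extracting PSA values and Gleason
# scores from free text. Simplified example patterns; NOT the Medtex
# algorithm described in the paper.
import re

PSA_RE = re.compile(r"\bPSA\b[^0-9]{0,15}(\d+(?:\.\d+)?)", re.IGNORECASE)
GLEASON_RE = re.compile(r"\bGleason\b[^0-9]{0,15}(\d)\s*\+\s*(\d)", re.IGNORECASE)

def extract_prognostic(text: str) -> dict:
    """Return PSA and Gleason (primary, secondary, sum) if found."""
    out = {}
    if m := PSA_RE.search(text):
        out["psa"] = float(m.group(1))
    if m := GLEASON_RE.search(text):
        primary, secondary = int(m.group(1)), int(m.group(2))
        out["gleason"] = (primary, secondary, primary + secondary)
    return out

note = "PSA was 6.2 ng/mL. Biopsy: Gleason score 3+4=7 adenocarcinoma."
```

Such regex baselines fail on negation, historical mentions, and unit variants, which is precisely the gap the F-measure comparison against human annotators quantifies.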
Affiliation(s)
- Richard C Khor
- Peter MacCallum Cancer Centre, Department of Radiation Oncology, Melbourne, Australia; University of Melbourne, Sir Peter MacCallum Department of Oncology, Melbourne, Australia; Austin Health, Department of Radiation Oncology, Melbourne, Australia.
- Anthony Nguyen
- The Australian e-Health Research Centre, CSIRO, Brisbane, Australia
- John O'Dwyer
- The Australian e-Health Research Centre, CSIRO, Brisbane, Australia
- Gargi Kothari
- Peter MacCallum Cancer Centre, Department of Radiation Oncology, Melbourne, Australia
- Joseph Sia
- Peter MacCallum Cancer Centre, Department of Radiation Oncology, Melbourne, Australia
- David Chang
- Peter MacCallum Cancer Centre, Department of Radiation Oncology, Melbourne, Australia
- Sweet Ping Ng
- Peter MacCallum Cancer Centre, Department of Radiation Oncology, Melbourne, Australia
- Gillian M Duchesne
- University of Melbourne, Sir Peter MacCallum Department of Oncology, Melbourne, Australia; Department of Medical Radiations, Monash University, Melbourne, Australia; Department of Biochemistry, Monash University, Melbourne, Australia
- Farshad Foroudi
- Austin Health, Department of Radiation Oncology, Melbourne, Australia; Department of Cancer Medicine, Latrobe University, Melbourne, Australia
49
Abstract
Objective:
To summarize significant research contributions on cancer informatics published in 2017.
Methods:
An extensive search using PubMed/Medline, Google Scholar, and manual review was conducted to identify the scientific contributions published in 2017 that address topics in cancer informatics. The selection process comprised three steps: (i) 15 candidate best papers were first selected by the two section editors, (ii) external reviewers from internationally renowned research teams reviewed each candidate best paper, and (iii) the final selection of three best papers was conducted by the editorial board of the Yearbook.
Results:
The three selected best papers present studies addressing many facets of cancer informatics, with immediate applicability in the research and clinical domains.
Conclusion:
Cancer informatics is a broad and vigorous subfield of biomedical informatics. Strides in knowledge management, crowdsourcing, and visualization are especially notable in 2017.
Affiliation(s)
- Jeremy L Warner
- Associate Professor, Departments of Medicine and Biomedical Informatics, Vanderbilt University, Nashville, TN, USA
50
Learning Eligibility in Cancer Clinical Trials Using Deep Neural Networks. Appl Sci (Basel) 2018. [DOI: 10.3390/app8071206]
Abstract
Interventional cancer clinical trials are generally too restrictive, and some patients are often excluded on the basis of comorbidity, past or concomitant treatments, or the fact that they are over a certain age. The efficacy and safety of new treatments for patients with these characteristics are, therefore, not defined. In this work, we built a model to automatically predict whether short clinical statements were considered inclusion or exclusion criteria. We used protocols from cancer clinical trials that were available in public registries from the last 18 years to train word embeddings, and we constructed a dataset of 6M short free-texts labeled as eligible or not eligible. A text classifier was trained using deep neural networks, with pre-trained word embeddings as inputs, to predict whether or not short free-text statements describing clinical information were considered eligible. We additionally analyzed the semantic reasoning of the word-embedding representations obtained and were able to identify, for a given tumor type, equivalent treatments analogous to the drugs used to treat other tumors. We show that representation learning using deep neural networks can be successfully leveraged to extract medical knowledge from clinical trial protocols and potentially assist practitioners when prescribing treatments.
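The dataset-construction step described above, labeling short statements by the protocol section they come from, can be sketched as a simple parse of an eligibility section. The section headers, parsing rules, and protocol text below are hypothetical simplifications; real registry protocols are far less uniform, and the paper built roughly 6M such labeled texts at scale.

```python
# Sketch of building (statement, label) training pairs from a trial
# protocol's eligibility section: bullets under "Inclusion Criteria"
# become "eligible", bullets under "Exclusion Criteria" become
# "not eligible". Headers and protocol text are hypothetical.
import re

def label_criteria(protocol: str) -> list[tuple[str, str]]:
    """Return (statement, label) pairs from an eligibility section."""
    examples = []
    label = None
    for line in protocol.splitlines():
        line = line.strip()
        if re.match(r"inclusion criteria", line, re.IGNORECASE):
            label = "eligible"
        elif re.match(r"exclusion criteria", line, re.IGNORECASE):
            label = "not eligible"
        elif line.startswith("-") and label:
            examples.append((line.lstrip("- ").strip(), label))
    return examples

protocol = """
Inclusion Criteria:
- Histologically confirmed breast cancer
- Age 18 years or older
Exclusion Criteria:
- Prior systemic chemotherapy
"""
```

The resulting pairs are exactly the kind of short labeled free-texts a word-embedding-based text classifier can be trained on.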