Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Savova GK, Tseytlin E, Finan S, Castine M, Miller T, Medvedeva O, Harris D, Hochheiser H, Lin C, Chavan G, Jacobson RS. DeepPhe: A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records. Cancer Res 2017;77:e115-e118. [PMID: 29092954 DOI: 10.1158/0008-5472.can-17-0615] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2017] [Revised: 07/20/2017] [Accepted: 10/02/2017] [Indexed: 11/16/2022]

For:	Savova GK, Tseytlin E, Finan S, Castine M, Miller T, Medvedeva O, Harris D, Hochheiser H, Lin C, Chavan G, Jacobson RS. DeepPhe: A Natural Language Processing System for Extracting Cancer Phenotypes from Clinical Records. Cancer Res 2017;77:e115-e118. [PMID: 29092954 DOI: 10.1158/0008-5472.can-17-0615] [Citation(s) in RCA: 50] [Impact Index Per Article: 7.1] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/07/2017] [Revised: 07/20/2017] [Accepted: 10/02/2017] [Indexed: 11/16/2022]

Number

Cited by Other Article(s)

Le KDR, Tay SBP, Choy KT, Verjans J, Sasanelli N, Kong JCH. Applications of natural language processing tools in the surgical journey. Front Surg 2024;11:1403540. [PMID: 38826809 PMCID: PMC11140056 DOI: 10.3389/fsurg.2024.1403540] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2024] [Accepted: 05/07/2024] [Indexed: 06/04/2024] Open

Abstract

Background

Natural language processing tools are becoming increasingly adopted in multiple industries worldwide. They have shown promising results however their use in the field of surgery is under-recognised. Many trials have assessed these benefits in small settings with promising results before large scale adoption can be considered in surgery. This study aims to review the current research and insights into the potential for implementation of natural language processing tools into surgery.

Methods

A narrative review was conducted following a computer-assisted literature search on Medline, EMBASE and Google Scholar databases. Papers related to natural language processing tools and consideration into their use for surgery were considered.

Results

Current applications of natural language processing tools within surgery are limited. From the literature, there is evidence of potential improvement in surgical capability and service delivery, such as through the use of these technologies to streamline processes including surgical triaging, data collection and auditing, surgical communication and documentation. Additionally, there is potential to extend these capabilities to surgical academia to improve processes in surgical research and allow innovation in the development of educational resources. Despite these outcomes, the evidence to support these findings are challenged by small sample sizes with limited applicability to broader settings.

Conclusion

With the increasing adoption of natural language processing technology, such as in popular forms like ChatGPT, there has been increasing research in the use of these tools within surgery to improve surgical workflow and efficiency. This review highlights multifaceted applications of natural language processing within surgery, albeit with clear limitations due to the infancy of the infrastructure available to leverage these technologies. There remains room for more rigorous research into broader capability of natural language processing technology within the field of surgery and the need for cross-sectoral collaboration to understand the ways in which these algorithms can best be integrated.

Collapse

Rajaganapathy S, Chowdhury S, Buchner V, He Z, Jiang X, Yang P, Cerhan JR, Zong N. Synoptic Reporting by Summarizing Cancer Pathology Reports using Large Language Models. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2024:2024.04.26.24306452. [PMID: 38746270 PMCID: PMC11092736 DOI: 10.1101/2024.04.26.24306452] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/16/2024]

Truhn D, Loeffler CM, Müller-Franzes G, Nebelung S, Hewitt KJ, Brandner S, Bressem KK, Foersch S, Kather JN. Extracting structured information from unstructured histopathology reports using generative pre-trained transformer 4 (GPT-4). J Pathol 2024;262:310-319. [PMID: 38098169 DOI: 10.1002/path.6232] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/03/2023] [Revised: 09/16/2023] [Accepted: 11/03/2023] [Indexed: 02/06/2024]

Kaufmann B, Busby D, Das CK, Tillu N, Menon M, Tewari AK, Gorin MA. Validation of a Zero-shot Learning Natural Language Processing Tool to Facilitate Data Abstraction for Urologic Research. Eur Urol Focus 2024:S2405-4569(24)00012-9. [PMID: 38278710 DOI: 10.1016/j.euf.2024.01.009] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2023] [Revised: 12/18/2023] [Accepted: 01/15/2024] [Indexed: 01/28/2024]

Abstract

BACKGROUND

Urologic research often requires data abstraction from unstructured text contained within the electronic health record. A number of natural language processing (NLP) tools have been developed to aid with this time-consuming task; however, the generalizability of these tools is typically limited by the need for task-specific training.

OBJECTIVE

To describe the development and validation of a zero-shot learning NLP tool to facilitate data abstraction from unstructured text for use in downstream urologic research.

DESIGN, SETTING, AND PARTICIPANTS

An NLP tool based on the GPT-3.5 model from OpenAI was developed and compared with three physicians for time to task completion and accuracy for abstracting 14 unique variables from a set of 199 deidentified radical prostatectomy pathology reports. The reports were processed in vectorized and scanned formats to establish the impact of optical character recognition on data abstraction.

INTERVENTION

A zero-shot learning NLP tool for data abstraction.

OUTCOME MEASUREMENTS AND STATISTICAL ANALYSIS

The tool was compared with the human abstractors in terms of superiority for data abstraction speed and noninferiority for accuracy.

RESULTS AND LIMITATIONS

The human abstractors required a median (interquartile range) of 93 s (72-122 s) per report for data abstraction, whereas the software required a median of 12 s (10-15 s) for the vectorized reports and 15 s (13-17 s) for the scanned reports (p < 0.001 for all paired comparisons). The accuracies of the three human abstractors were 94.7% (95% confidence interval [CI], 93.8-95.5%), 97.8% (95% CI, 97.2-98.3%), and 96.4% (95% CI, 95.6-97%) for the combined set of 2786 data points. The tool had accuracy of 94.2% (95% CI, 93.3-94.9%) for the vectorized reports and was noninferior to the human abstractors at a margin of -10% (α = 0.025). The tool had slightly lower accuracy of 88.7% (95% CI 87.5-89.9%) for the scanned reports, making it noninferior to two of three human abstractors.

CONCLUSIONS

The developed zero-shot learning NLP tool offers urologic researchers a highly generalizable and accurate method for data abstraction from unstructured text. An open access version of the tool is available for immediate use by the urologic community.

PATIENT SUMMARY

In this report, we describe the design and validation of an artificial intelligence tool for abstracting discrete data from unstructured notes contained within the electronic medical record. This freely available tool, which is based on the GPT-3.5 technology from OpenAI, is intended to facilitate research and scientific discovery by the urologic community.

Collapse

Hochheiser H, Finan S, Yuan Z, Durbin EB, Jeong JC, Hands I, Rust D, Kavuluru R, Wu XC, Warner JL, Savova G. DeepPhe-CR: Natural Language Processing Software Services for Cancer Registrar Case Abstraction. MEDRXIV : THE PREPRINT SERVER FOR HEALTH SCIENCES 2023:2023.05.05.23289524. [PMID: 37205575 PMCID: PMC10187451 DOI: 10.1101/2023.05.05.23289524] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/21/2023]

Abstract

Objective

The manual extraction of case details from patient records for cancer surveillance efforts is a resource-intensive task. Natural Language Processing (NLP) techniques have been proposed for automating the identification of key details in clinical notes. Our goal was to develop NLP application programming interfaces (APIs) for integration into cancer registry data abstraction tools in a computer-assisted abstraction setting.

Methods

We used cancer registry manual abstraction processes to guide the design of DeepPhe-CR, a web-based NLP service API. The coding of key variables was done through NLP methods validated using established workflows. A container-based implementation including the NLP wasdeveloped. Existing registry data abstraction software was modified to include results from DeepPhe-CR. An initial usability study with data registrars provided early validation of the feasibility of the DeepPhe-CR tools.

Results

API calls support submission of single documents and summarization of cases across multiple documents. The container-based implementation uses a REST router to handle requests and support a graph database for storing results. NLP modules extract topography, histology, behavior, laterality, and grade at 0.79-1.00 F1 across common and rare cancer types (breast, prostate, lung, colorectal, ovary and pediatric brain) on data from two cancer registries. Usability study participants were able to use the tool effectively and expressed interest in adopting the tool.

Discussion

Our DeepPhe-CR system provides a flexible architecture for building cancer-specific NLP tools directly into registrar workflows in a computer-assisted abstraction setting. Improving user interactions in client tools, may be needed to realize the potential of these approaches. DeepPhe-CR: https://deepphe.github.io/.

Collapse

Hochheiser H, Finan S, Yuan Z, Durbin EB, Jeong JC, Hands I, Rust D, Kavuluru R, Wu XC, Warner JL, Savova G. DeepPhe-CR: Natural Language Processing Software Services for Cancer Registrar Case Abstraction. JCO Clin Cancer Inform 2023;7:e2300156. [PMID: 38113411 PMCID: PMC10752457 DOI: 10.1200/cci.23.00156] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2023] [Revised: 10/04/2023] [Accepted: 10/04/2023] [Indexed: 12/21/2023] Open

Petch J, Kempainnen J, Pettengell C, Aviv S, Butler B, Pond G, Saha A, Bogach J, Allard-Coutu A, Sztur P, Ranisau J, Levine M. Developing a Data and Analytics Platform to Enable a Breast Cancer Learning Health System at a Regional Cancer Center. JCO Clin Cancer Inform 2023;7:e2200182. [PMID: 37001040 PMCID: PMC10281330 DOI: 10.1200/cci.22.00182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/29/2022] [Accepted: 02/10/2023] [Indexed: 04/03/2023] Open

Baxter SL, Saseendrakumar BR, Cheung M, Savides TJ, Longhurst CA, Sinsky CA, Millen M, Tai-Seale M. Association of Electronic Health Record Inbasket Message Characteristics With Physician Burnout. JAMA Netw Open 2022;5:e2244363. [PMID: 36449288 PMCID: PMC9713605 DOI: 10.1001/jamanetworkopen.2022.44363] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Indexed: 12/05/2022] Open

Abstract

IMPORTANCE

Physician burnout is an ongoing epidemic; electronic health record (EHR) use has been associated with burnout, and the burden of EHR inbasket messages has grown in the context of the COVID-19 pandemic. Understanding how EHR inbasket messages are associated with physician burnout may uncover new insights for intervention strategies.

OBJECTIVE

To evaluate associations between EHR inbasket message characteristics and physician burnout.

DESIGN, SETTING, AND PARTICIPANTS

Cross-sectional study in a single academic medical center involving physicians from multiple specialties. Data collection took place April to September 2020, and data were analyzed September to December 2020.

EXPOSURES

Physicians responded to a survey including the validated Mini-Z 5-point burnout scale.

MAIN OUTCOMES AND MEASURES

Physician burnout according to the self-reported burnout scale. A sentiment analysis model was used to calculate sentiment scores for EHR inbasket messages extracted for participating physicians. Multivariable modeling was used to model risk of physician burnout using factors such as message characteristics, physician demographics, and clinical practice characteristics.

RESULTS

Of 609 physicians who responded to the survey, 297 (48.8%) were women, 343 (56.3%) were White, 391 (64.2%) practiced in outpatient settings, and 428 (70.28%) had been in medical practice for 15 years or less. Half (307 [50.4%]) reported burnout (score of 3 or higher). A total of 1 453 245 inbasket messages were extracted, of which 630 828 (43.4%) were patient messages. Among negative messages, common words included medical conditions, expletives and/or profanity, and words related to violence. There were no significant associations between message characteristics (including sentiment scores) and burnout. Odds of burnout were significantly higher among Hispanic/Latino physicians (odds ratio [OR], 3.44; 95% CI, 1.18-10.61; P = .03) and women (OR, 1.60; 95% CI, 1.13-2.27; P = .01), and significantly lower among physicians in clinical practice for more than 15 years (OR, 0.46; 95% CI, 0.30-0.68; P < .001).

CONCLUSIONS AND RELEVANCE

In this cross-sectional study, message characteristics were not associated with physician burnout, but the presence of expletives and violent words represents an opportunity for improving patient engagement, EHR portal design, or filters. Natural language processing represents a novel approach to understanding potential associations between EHR inbasket messages and physician burnout and may also help inform quality improvement initiatives aimed at improving patient experience.

Collapse

Causa Andrieu P, Golia Pernicka JS, Yaeger R, Lupton K, Batch K, Zulkernine F, Simpson AL, Taya M, Gazit L, Nguyen H, Nicholas K, Gangai N, Sevilimedu V, Dickinson S, Paroder V, Bates DD, Do R. Natural Language Processing of Computed Tomography Reports to Label Metastatic Phenotypes With Prognostic Significance in Patients With Colorectal Cancer. JCO Clin Cancer Inform 2022;6:e2200014. [PMID: 36103642 PMCID: PMC9848599 DOI: 10.1200/cci.22.00014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2022] [Revised: 06/04/2022] [Accepted: 08/04/2022] [Indexed: 01/21/2023] Open

Abstract

PURPOSE

Natural language processing (NLP) applied to radiology reports can help identify clinically relevant M1 subcategories of patients with colorectal cancer (CRC). The primary purpose was to compare the overall survival (OS) of CRC according to American Joint Committee on Cancer TNM staging and explore an alternative classification. The secondary objective was to estimate the frequency of metastasis for each organ.

METHODS

Retrospective study of CRC who underwent computed tomography (CT) chest, abdomen, and pelvis between July 1, 2009, and March 26, 2019, at a tertiary cancer center, previously labeled for the presence or absence of metastasis by an NLP prediction model. Patients were classified in M0, M1a, M1b, and M1c (American Joint Committee on Cancer), or an alternative classification on the basis of the metastasis organ number: M1, single; M2, two; M3, three or more organs. Cox regression models were used to estimate hazard ratios; Kaplan-Meier curves were used to visualize survival curves using the two M1 subclassifications.

RESULTS

Nine thousand nine hundred twenty-eight patients with a total of 48,408 CT chest, abdomen, and pelvis reports were included. On the basis of NLP prediction, the median OS of M1a, M1b, and M1c was 4.47, 1.72, and 1.52 years, respectively. The median OS of M1, M2, and M3 was 4.24, 2.05, and 1.04 years, respectively. Metastases occurred most often in liver (35.8%), abdominopelvic lymph nodes (32.9%), lungs (29.3%), peritoneum (22.0%), thoracic nodes (19.9%), bones (9.2%), and pelvic organs (7.5%). Spleen and adrenal metastases occurred in < 5%.

CONCLUSION

NLP applied to a large radiology report database can identify clinically relevant metastatic phenotypes and be used to investigate new M1 substaging for CRC. Patients with three or more metastatic disease organs have the worst prognosis, with an OS of 1 year.

Collapse

Duda SN, Kennedy N, Conway D, Cheng AC, Nguyen V, Zayas-Cabán T, Harris PA. HL7 FHIR-based tools and initiatives to support clinical research: a scoping review. J Am Med Inform Assoc 2022;29:1642-1653. [PMID: 35818340 DOI: 10.1093/jamia/ocac105] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/09/2021] [Revised: 05/23/2022] [Accepted: 06/20/2022] [Indexed: 11/14/2022] Open

Abstract

OBJECTIVES

The HL7® fast healthcare interoperability resources (FHIR®) specification has emerged as the leading interoperability standard for the exchange of healthcare data. We conducted a scoping review to identify trends and gaps in the use of FHIR for clinical research.

MATERIALS AND METHODS

We reviewed published literature, federally funded project databases, application websites, and other sources to discover FHIR-based papers, projects, and tools (collectively, "FHIR projects") available to support clinical research activities.

RESULTS

Our search identified 203 different FHIR projects applicable to clinical research. Most were associated with preparations to conduct research, such as data mapping to and from FHIR formats (n = 66, 32.5%) and managing ontologies with FHIR (n = 30, 14.8%), or post-study data activities, such as sharing data using repositories or registries (n = 24, 11.8%), general research data sharing (n = 23, 11.3%), and management of genomic data (n = 21, 10.3%). With the exception of phenotyping (n = 19, 9.4%), fewer FHIR-based projects focused on needs within the clinical research process itself.

DISCUSSION

Funding and usage of FHIR-enabled solutions for research are expanding, but most projects appear focused on establishing data pipelines and linking clinical systems such as electronic health records, patient-facing data systems, and registries, possibly due to the relative newness of FHIR and the incentives for FHIR integration in health information systems. Fewer FHIR projects were associated with research-only activities.

CONCLUSION

The FHIR standard is becoming an essential component of the clinical research enterprise. To develop FHIR's full potential for clinical research, funding and operational stakeholders should address gaps in FHIR-based research tools and methods.

Collapse

Wang L, Fu S, Wen A, Ruan X, He H, Liu S, Moon S, Mai M, Riaz IB, Wang N, Yang P, Xu H, Warner JL, Liu H. Assessment of Electronic Health Record for Cancer Research and Patient Care Through a Scoping Review of Cancer Natural Language Processing. JCO Clin Cancer Inform 2022;6:e2200006. [PMID: 35917480 PMCID: PMC9470142 DOI: 10.1200/cci.22.00006] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2022] [Revised: 03/18/2022] [Accepted: 06/15/2022] [Indexed: 11/20/2022] Open

Estevez M, Benedum CM, Jiang C, Cohen AB, Phadke S, Sarkar S, Bozkurt S. Considerations for the Use of Machine Learning Extracted Real-World Data to Support Evidence Generation: A Research-Centric Evaluation Framework. Cancers (Basel) 2022;14:cancers14133063. [PMID: 35804834 PMCID: PMC9264846 DOI: 10.3390/cancers14133063] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/25/2022] [Revised: 06/17/2022] [Accepted: 06/17/2022] [Indexed: 02/04/2023] Open

Zhou S, Wang N, Wang L, Liu H, Zhang R. CancerBERT: a cancer domain-specific language model for extracting breast cancer phenotypes from electronic health records. J Am Med Inform Assoc 2022;29:1208-1216. [PMID: 35333345 PMCID: PMC9196678 DOI: 10.1093/jamia/ocac040] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/15/2021] [Revised: 03/06/2022] [Accepted: 03/09/2022] [Indexed: 11/16/2022] Open

Real-world study of surgical treatment of pancreatic cancer in China: Annual Report of China Pancreas Data Center (2016–2020). JOURNAL OF PANCREATOLOGY 2022. [DOI: 10.1097/jp9.0000000000000086] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/26/2022] Open

Hu D, Li S, Zhang H, Wu N, Lu X. Using Natural Language Processing and Machine Learning to Preoperatively Predict Lymph Node Metastasis for Non-Small Cell Lung Cancer With Electronic Medical Records: Development and Validation Study. JMIR Med Inform 2022;10:e35475. [PMID: 35468085 PMCID: PMC9086872 DOI: 10.2196/35475] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2021] [Revised: 03/31/2022] [Accepted: 04/11/2022] [Indexed: 11/21/2022] Open

Abstract

Background

Lymph node metastasis (LNM) is critical for treatment decision making of patients with resectable non–small cell lung cancer, but it is difficult to precisely diagnose preoperatively. Electronic medical records (EMRs) contain a large volume of valuable information about LNM, but some key information is recorded in free text, which hinders its secondary use.

Objective

This study aims to develop LNM prediction models based on EMRs using natural language processing (NLP) and machine learning algorithms.

Methods

We developed a multiturn question answering NLP model to extract features about the primary tumor and lymph nodes from computed tomography (CT) reports. We then combined these features with other structured clinical characteristics to develop LNM prediction models using machine learning algorithms. We conducted extensive experiments to explore the effectiveness of the predictive models and compared them with size criteria based on CT image findings (the maximum short axis diameter of lymph node >10 mm was regarded as a metastatic node) and clinician’s evaluation. Since the NLP model may extract features with mistakes, we also calculated the concordance correlation between the predicted probabilities of models using NLP-extracted features and gold standard features to explore the influence of NLP-driven automatic extraction.

Results

Experimental results show that the random forest models achieved the best performances with 0.792 area under the receiver operating characteristic curve (AUC) value and 0.456 average precision (AP) value for pN2 LNM prediction and 0.768 AUC value and 0.524 AP value for pN1&N2 LNM prediction. And all machine learning models outperformed the size criteria and clinician’s evaluation. The concordance correlation between the random forest models using NLP-extracted features and gold standard features is 0.950 and improved to 0.984 when the top 5 important NLP-extracted features were replaced with gold standard features.

Conclusions

The LNM models developed can achieve competitive performance using only limited EMR data such as CT reports and tumor markers in comparison with the clinician’s evaluation. The multiturn question answering NLP model can extract features effectively to support the development of LNM prediction models, which may facilitate the clinical application of predictive models.

Collapse

Santos T, Tariq A, Gichoya JW, Trivedi H, Banerjee I. Automatic Classification of Cancer Pathology Reports: A Systematic Review. J Pathol Inform 2022;13:100003. [PMID: 35242443 PMCID: PMC8860734 DOI: 10.1016/j.jpi.2022.100003] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2021] [Accepted: 11/12/2021] [Indexed: 11/30/2022] Open

Kim NH, Kim JM, Park DM, Ji SR, Kim JW. Analysis of depression in social media texts through the Patient Health Questionnaire-9 and natural language processing. Digit Health 2022;8:20552076221114204. [PMID: 35874865 PMCID: PMC9297458 DOI: 10.1177/20552076221114204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2021] [Accepted: 06/30/2022] [Indexed: 12/05/2022] Open

Abedian S, Sholle ET, Adekkanattu PM, Cusick MM, Weiner SE, Shoag JE, Hu JC, Campion TR. Automated Extraction of Tumor Staging and Diagnosis Information From Surgical Pathology Reports. JCO Clin Cancer Inform 2021;5:1054-1061. [PMID: 34694896 PMCID: PMC8812635 DOI: 10.1200/cci.21.00065] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/14/2021] [Revised: 08/25/2021] [Accepted: 09/29/2021] [Indexed: 11/20/2022] Open

Yuan Z, Finan S, Warner J, Savova G, Hochheiser H. Interactive Exploration of Longitudinal Cancer Patient Histories Extracted From Clinical Text. JCO Clin Cancer Inform 2021;4:412-420. [PMID: 32383981 PMCID: PMC7265796 DOI: 10.1200/cci.19.00115] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Belenkaya R, Gurley MJ, Golozar A, Dymshyts D, Miller RT, Williams AE, Ratwani S, Siapos A, Korsik V, Warner J, Campbell WS, Rivera D, Banokina T, Modina E, Bethusamy S, Stewart HM, Patel M, Chen R, Falconer T, Park RW, You SC, Jeon H, Shin SJ, Reich C. Extending the OMOP Common Data Model and Standardized Vocabularies to Support Observational Cancer Research. JCO Clin Cancer Inform 2021;5:12-20. [PMID: 33411620 PMCID: PMC8140810 DOI: 10.1200/cci.20.00079] [Citation(s) in RCA: 27] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Hu D, Zhang H, Li S, Wang Y, Wu N, Lu X. Automatic Extraction of Lung Cancer Staging Information From Computed Tomography Reports: Deep Learning Approach. JMIR Med Inform 2021;9:e27955. [PMID: 34287213 PMCID: PMC8339987 DOI: 10.2196/27955] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2021] [Revised: 05/27/2021] [Accepted: 06/07/2021] [Indexed: 01/04/2023] Open

Abstract

BACKGROUND

Lung cancer is the leading cause of cancer deaths worldwide. Clinical staging of lung cancer plays a crucial role in making treatment decisions and evaluating prognosis. However, in clinical practice, approximately one-half of the clinical stages of lung cancer patients are inconsistent with their pathological stages. As one of the most important diagnostic modalities for staging, chest computed tomography (CT) provides a wealth of information about cancer staging, but the free-text nature of the CT reports obstructs their computerization.

OBJECTIVE

We aimed to automatically extract the staging-related information from CT reports to support accurate clinical staging of lung cancer.

METHODS

In this study, we developed an information extraction (IE) system to extract the staging-related information from CT reports. The system consisted of the following three parts: named entity recognition (NER), relation classification (RC), and postprocessing (PP). We first summarized 22 questions about lung cancer staging based on the TNM staging guideline. Next, three state-of-the-art NER algorithms were implemented to recognize the entities of interest. Next, we designed a novel RC method using the relation sign constraint (RSC) to classify the relations between entities. Finally, a rule-based PP module was established to obtain the formatted answers using the results of NER and RC.

RESULTS

We evaluated the developed IE system on a clinical data set containing 392 chest CT reports collected from the Department of Thoracic Surgery II in the Peking University Cancer Hospital. The experimental results showed that the bidirectional encoder representation from transformers (BERT) model outperformed the iterated dilated convolutional neural networks-conditional random field (ID-CNN-CRF) and bidirectional long short-term memory networks-conditional random field (Bi-LSTM-CRF) for NER tasks with macro-F1 scores of 80.97% and 90.06% under the exact and inexact matching schemes, respectively. For the RC task, the proposed RSC showed better performance than the baseline methods. Further, the BERT-RSC model achieved the best performance with a macro-F1 score of 97.13% and a micro-F1 score of 98.37%. Moreover, the rule-based PP module could correctly obtain the formatted results using the extractions of NER and RC, achieving a macro-F1 score of 94.57% and a micro-F1 score of 96.74% for all the 22 questions.

CONCLUSIONS

We conclude that the developed IE system can effectively and accurately extract information about lung cancer staging from CT reports. Experimental results show that the extracted results have significant potential for further use in stage verification and prediction to facilitate accurate clinical staging.

Collapse

Development of a generalizable natural language processing pipeline to extract physician-reported pain from clinical reports: Generated using publicly-available datasets and tested on institutional clinical reports for cancer patients with bone metastases. J Biomed Inform 2021;120:103864. [PMID: 34265451 DOI: 10.1016/j.jbi.2021.103864] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/27/2020] [Revised: 06/30/2021] [Accepted: 07/04/2021] [Indexed: 11/22/2022]

Abstract

OBJECTIVE

The majority of cancer patients suffer from severe pain at the advanced stage of their illness. In most cases, cancer pain is underestimated by clinical staff and is not properly managed until it reaches a critical stage. Therefore, detecting and addressing cancer pain early can potentially improve the quality of life of cancer patients. The objective of this research project was to develop a generalizable Natural Language Processing (NLP) pipeline to find and classify physician-reported pain in the radiation oncology consultation notes of cancer patients with bone metastases.

MATERIALS AND METHODS

The texts of 1249 publicly-available hospital discharge notes in the i2b2 database were used as a training and validation set. The MetaMap and NegEx algorithms were implemented for medical terms extraction. Sets of NLP rules were developed to score pain terms in each note. By averaging pain scores, each note was assigned to one of the three verbally-declared pain (VDP) labels, including no pain, pain, and no mention of pain. Without further training, the generalizability of our pipeline in scoring individual pain terms was tested independently using 30 hospital discharge notes from the MIMIC-III database and 30 consultation notes of cancer patients with bone metastasis from our institution's radiation oncology electronic health record. Finally, 150 notes from our institution were used to assess the pipeline's performance at assigning VDP.

RESULTS

Our NLP pipeline successfully detected and quantified pain in the i2b2 summary notes with 93% overall precision and 92% overall recall. Testing on the MIMIC-III database achieved precision and recall of 91% and 86% respectively. The pipeline successfully detected pain with 89% precision and 82% recall on our institutional radiation oncology corpus. Finally, our pipeline assigned a VDP to each note in our institutional corpus with 84% and 82% precision and recall, respectively.

CONCLUSION

Our NLP pipeline enables the detection and classification of physician-reported pain in our radiation oncology corpus. This portable and ready-to-use pipeline can be used to automatically extract and classify physician-reported pain from clinical notes where the pain is not otherwise documented through structured data entry.

Collapse

Wen A, Rasmussen LV, Stone D, Liu S, Kiefer R, Adekkanattu P, Brandt PS, Pacheco JA, Luo Y, Wang F, Pathak J, Liu H, Jiang G. CQL4NLP: Development and Integration of FHIR NLP Extensions in Clinical Quality Language for EHR-driven Phenotyping. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE PROCEEDINGS. AMIA JOINT SUMMITS ON TRANSLATIONAL SCIENCE 2021;2021:624-633. [PMID: 34457178 PMCID: PMC8378647] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]

Bitterman DS, Miller TA, Mak RH, Savova GK. Clinical Natural Language Processing for Radiation Oncology: A Review and Practical Primer. Int J Radiat Oncol Biol Phys 2021;110:641-655. [PMID: 33545300 DOI: 10.1016/j.ijrobp.2021.01.044] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2020] [Revised: 12/22/2020] [Accepted: 01/23/2021] [Indexed: 02/07/2023]

Abstract

Natural language processing (NLP), which aims to convert human language into expressions that can be analyzed by computers, is one of the most rapidly developing and widely used technologies in the field of artificial intelligence. Natural language processing algorithms convert unstructured free text data into structured data that can be extracted and analyzed at scale. In medicine, this unlocking of the rich, expressive data within clinical free text in electronic medical records will help untap the full potential of big data for research and clinical purposes. Recent major NLP algorithmic advances have significantly improved the performance of these algorithms, leading to a surge in academic and industry interest in developing tools to automate information extraction and phenotyping from clinical texts. Thus, these technologies are poised to transform medical research and alter clinical practices in the future. Radiation oncology stands to benefit from NLP algorithms if they are appropriately developed and deployed, as they may enable advances such as automated inclusion of radiation therapy details into cancer registries, discovery of novel insights about cancer care, and improved patient data curation and presentation at the point of care. However, challenges remain before the full value of NLP is realized, such as the plethora of jargon specific to radiation oncology, nonstandard nomenclature, a lack of publicly available labeled data for model development, and interoperability limitations between radiation oncology data silos. Successful development and implementation of high quality and high value NLP models for radiation oncology will require close collaboration between computer scientists and the radiation oncology community. Here, we present a primer on artificial intelligence algorithms in general and NLP algorithms in particular; provide guidance on how to assess the performance of such algorithms; review prior research on NLP algorithms for oncology; and describe future avenues for NLP in radiation oncology research and clinics.

Collapse

Integrating Speculation Detection and Deep Learning to Extract Lung Cancer Diagnosis from Clinical Notes. APPLIED SCIENCES-BASEL 2021. [DOI: 10.3390/app11020865] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/24/2022]

Hong JC, Fairchild AT, Tanksley JP, Palta M, Tenenbaum JD. Natural language processing for abstraction of cancer treatment toxicities: accuracy versus human experts. JAMIA Open 2020;3:513-517. [PMID: 33623888 PMCID: PMC7886534 DOI: 10.1093/jamiaopen/ooaa064] [Citation(s) in RCA: 14] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/31/2020] [Revised: 10/26/2020] [Accepted: 10/30/2020] [Indexed: 12/29/2022] Open

Zong N, Wen A, Stone DJ, Sharma DK, Wang C, Yu Y, Liu H, Shi Q, Jiang G. Developing an FHIR-Based Computational Pipeline for Automatic Population of Case Report Forms for Colorectal Cancer Clinical Trials Using Electronic Health Records. JCO Clin Cancer Inform 2020;4:201-209. [PMID: 32134686 PMCID: PMC7113084 DOI: 10.1200/cci.19.00116] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open

Abstract

PURPOSE

The Fast Healthcare Interoperability Resources (FHIR) is emerging as a next-generation standards framework developed by HL7 for exchanging electronic health care data. The modeling capability of FHIR in standardizing cancer data has been gaining increasing attention by the cancer research informatics community. However, few studies have been conducted to examine the capability of FHIR in electronic data capture (EDC) applications for effective cancer clinical trials. The objective of this study was to design, develop, and evaluate an FHIR-based method that enables the automation of the case report forms (CRFs) population for cancer clinical trials using real-world electronic health records (EHRs).

MATERIALS AND METHODS

We developed an FHIR-based computational pipeline of EDC with a case study for modeling colorectal cancer trials. We first leveraged an existing FHIR-based cancer profile to represent EHR data of patients with colorectal cancer, and then we used the FHIR Questionnaire and QuestionnaireResponse resources to represent the CRFs and their data population. To test the accuracy of and overall quality of the computational pipeline, we used synoptic reports of 287 Mayo Clinic patients with colorectal cancer from 2013 to 2019 with standard measures of precision, recall, and F1 score.

RESULTS

Using the computational pipeline, a total of 1,037 synoptic reports were successfully converted as the instances of the FHIR-based cancer profile. The average accuracy for converting all data elements (excluding tumor perforation) of the cancer profile was 0.99, using 200 randomly selected records. The average F1 score for populating nine questions of the CRFs in a real-world colorectal cancer trial was 0.95, using 100 randomly selected records.

CONCLUSION

We demonstrated that it is feasible to populate CRFs with EHR data in an automated manner with satisfactory performance. The outcome of the study provides helpful insight into future directions in implementing FHIR-based EDC applications for modern cancer clinical trials.

Collapse

Artificial intelligence in radiation oncology. Nat Rev Clin Oncol 2020;17:771-781. [PMID: 32843739 DOI: 10.1038/s41571-020-0417-8] [Citation(s) in RCA: 138] [Impact Index Per Article: 34.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Accepted: 07/14/2020] [Indexed: 12/14/2022]

Baxter SL, Klie AR, Radha Saseendrakumar B, Ye GY, Hogarth M. Text Processing for Detection of Fungal Ocular Involvement in Critical Care Patients: Cross-Sectional Study. J Med Internet Res 2020;22:e18855. [PMID: 32795984 PMCID: PMC7455861 DOI: 10.2196/18855] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2020] [Revised: 04/21/2020] [Accepted: 06/13/2020] [Indexed: 11/13/2022] Open

Abstract

Background

Fungal ocular involvement can develop in patients with fungal bloodstream infections and can be vision-threatening. Ocular involvement has become less common in the current era of improved antifungal therapies. Retrospectively determining the prevalence of fungal ocular involvement is important for informing clinical guidelines, such as the need for routine ophthalmologic consultations. However, manual retrospective record review to detect cases is time-consuming.

Objective

This study aimed to determine the prevalence of fungal ocular involvement in a critical care database using both structured and unstructured electronic health record (EHR) data.

Methods

We queried microbiology data from 46,467 critical care patients over 12 years (2000-2012) from the Medical Information Mart for Intensive Care III (MIMIC-III) to identify 265 patients with culture-proven fungemia. For each fungemic patient, demographic data, fungal species present in blood culture, and risk factors for fungemia (eg, presence of indwelling catheters, recent major surgery, diabetes, immunosuppressed status) were ascertained. All structured diagnosis codes and free-text narrative notes associated with each patient’s hospitalization were also extracted. Screening for fungal endophthalmitis was performed using two approaches: (1) by querying a wide array of eye- and vision-related diagnosis codes, and (2) by utilizing a custom regular expression pipeline to identify and collate relevant text matches pertaining to fungal ocular involvement. Both approaches were validated using manual record review. The main outcome measure was the documentation of any fungal ocular involvement.

Results

In total, 265 patients had culture-proven fungemia, with Candida albicans (n=114, 43%) and Candida glabrata (n=74, 28%) being the most common fungal species in blood culture. The in-hospital mortality rate was 121 (46%). In total, 7 patients were identified as having eye- or vision-related diagnosis codes, none of whom had fungal endophthalmitis based on record review. There were 26,830 free-text narrative notes associated with these 265 patients. A regular expression pipeline based on relevant terms yielded possible matches in 683 notes from 108 patients. Subsequent manual record review again demonstrated that no patients had fungal ocular involvement. Therefore, the prevalence of fungal ocular involvement in this cohort was 0%.

Conclusions

MIMIC-III contained no cases of ocular involvement among fungemic patients, consistent with prior studies reporting low rates of ocular involvement in fungemia. This study demonstrates an application of natural language processing to expedite the review of narrative notes. This approach is highly relevant for ophthalmology, where diagnoses are often based on physical examination findings that are documented within clinical notes.

Collapse

Griffin AC, Topaloglu U, Davis S, Chung AE. From Patient Engagement to Precision Oncology: Leveraging Informatics to Advance Cancer Care. Yearb Med Inform 2020;29:235-242. [PMID: 32823322 PMCID: PMC7442514 DOI: 10.1055/s-0040-1701983] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open

Miller TA, Avillach P, Mandl KD. Experiences implementing scalable, containerized, cloud-based NLP for extracting biobank participant phenotypes at scale. JAMIA Open 2020;3:185-189. [PMID: 32734158 PMCID: PMC7382623 DOI: 10.1093/jamiaopen/ooaa016] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/25/2020] [Revised: 04/03/2020] [Accepted: 04/14/2020] [Indexed: 11/30/2022] Open

Rahimian M, Warner JL, Jain SK, Davis RB, Zerillo JA, Joyce RM. Significant and Distinctive n-Grams in Oncology Notes: A Text-Mining Method to Analyze the Effect of OpenNotes on Clinical Documentation. JCO Clin Cancer Inform 2020;3:1-9. [PMID: 31184919 PMCID: PMC6873977 DOI: 10.1200/cci.19.00012] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/20/2022] Open

Giannaris PS, Al-Taie Z, Kovalenko M, Thanintorn N, Kholod O, Innokenteva Y, Coberly E, Frazier S, Laziuk K, Popescu M, Shyu CR, Xu D, Hammer RD, Shin D. Artificial Intelligence-Driven Structurization of Diagnostic Information in Free-Text Pathology Reports. J Pathol Inform 2020;11:4. [PMID: 32166042 PMCID: PMC7045509 DOI: 10.4103/jpi.jpi_30_19] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2019] [Accepted: 12/18/2019] [Indexed: 12/15/2022] Open

Affiliation(s)

Pericles S. Giannaris Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, United States Department of Pathology and Anatomical Sciences, School of Medicine, University of Missouri, Columbia, MO 65212, United States
Zainab Al-Taie Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, United States Department of Computer Science, College of Science for Women, University of Baghdad, Baghdad, Iraq
Mikhail Kovalenko Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, United States Department of Pathology and Anatomical Sciences, School of Medicine, University of Missouri, Columbia, MO 65212, United States
Nattapon Thanintorn Department of Pathology and Anatomical Sciences, School of Medicine, University of Missouri, Columbia, MO 65212, United States
Olha Kholod Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, United States Department of Pathology and Anatomical Sciences, School of Medicine, University of Missouri, Columbia, MO 65212, United States
Yulia Innokenteva Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, United States
Emily Coberly Department of Pathology and Anatomical Sciences, School of Medicine, University of Missouri, Columbia, MO 65212, United States
Shellaine Frazier Department of Pathology and Anatomical Sciences, School of Medicine, University of Missouri, Columbia, MO 65212, United States
Katsiarina Laziuk Department of Pathology and Anatomical Sciences, School of Medicine, University of Missouri, Columbia, MO 65212, United States
Mihail Popescu Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, United States Department of Electrical Engineering and Computer Science, College of Engineering, University of Missouri, Columbia, MO 65211, United States Department of Health Management and Informatics, School of Medicine, University of Missouri, Columbia, MO 65212, United States
Chi-Ren Shyu Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, United States Department of Electrical Engineering and Computer Science, College of Engineering, University of Missouri, Columbia, MO 65211, United States
Dong Xu Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, United States Department of Electrical Engineering and Computer Science, College of Engineering, University of Missouri, Columbia, MO 65211, United States
Richard D. Hammer Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, United States Department of Pathology and Anatomical Sciences, School of Medicine, University of Missouri, Columbia, MO 65212, United States
Dmitriy Shin Institute for Data Science and Informatics, University of Missouri, Columbia, MO 65211, United States Department of Pathology and Anatomical Sciences, School of Medicine, University of Missouri, Columbia, MO 65212, United States Department of Electrical Engineering and Computer Science, College of Engineering, University of Missouri, Columbia, MO 65211, United States

Collapse

Ju M, Short AD, Thompson P, Bakerly ND, Gkoutos GV, Tsaprouni L, Ananiadou S. Annotating and detecting phenotypic information for chronic obstructive pulmonary disease. JAMIA Open 2020;2:261-271. [PMID: 31984360 PMCID: PMC6951876 DOI: 10.1093/jamiaopen/ooz009] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2018] [Revised: 02/21/2019] [Accepted: 03/19/2019] [Indexed: 12/29/2022] Open

Abstract

Objectives

Chronic obstructive pulmonary disease (COPD) phenotypes cover a range of lung abnormalities. To allow text mining methods to identify pertinent and potentially complex information about these phenotypes from textual data, we have developed a novel annotated corpus, which we use to train a neural network-based named entity recognizer to detect fine-grained COPD phenotypic information.

Materials and methods

Since COPD phenotype descriptions often mention other concepts within them (proteins, treatments, etc.), our corpus annotations include both outermost phenotype descriptions and concepts nested within them. Our neural layered bidirectional long short-term memory conditional random field (BiLSTM-CRF) network firstly recognizes nested mentions, which are fed into subsequent BiLSTM-CRF layers, to help to recognize enclosing phenotype mentions.

Results

Our corpus of 30 full papers (available at: http://www.nactem.ac.uk/COPD) is annotated by experts with 27 030 phenotype-related concept mentions, most of which are automatically linked to UMLS Metathesaurus concepts. When trained using the corpus, our BiLSTM-CRF network outperforms other popular approaches in recognizing detailed phenotypic information.

Discussion

Information extracted by our method can facilitate efficient location and exploration of detailed information about phenotypes, for example, those specifically concerning reactions to treatments.

Conclusion

The importance of our corpus for developing methods to extract fine-grained information about COPD phenotypes is demonstrated through its successful use to train a layered BiLSTM-CRF network to extract phenotypic information at various levels of granularity. The minimal human intervention needed for training should permit ready adaption to extracting phenotypic information about other diseases.

Collapse

Wang L, Luo L, Wang Y, Wampfler J, Yang P, Liu H. Natural language processing for populating lung cancer clinical research data. BMC Med Inform Decis Mak 2019;19:239. [PMID: 31801515 PMCID: PMC6894100 DOI: 10.1186/s12911-019-0931-8] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/14/2023] Open

Lacson R, Laroya R, Wang A, Kapoor N, Glazer DI, Shinagare A, Ip IK, Malhotra S, Hentel K, Khorasani R. Integrity of clinical information in computerized order requisitions for diagnostic imaging. J Am Med Inform Assoc 2019;25:1651-1656. [PMID: 30517649 DOI: 10.1093/jamia/ocy133] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2018] [Accepted: 09/19/2018] [Indexed: 11/14/2022] Open

Abstract

Objective

Assess information integrity (concordance and completeness of documented exam indications from the electronic health record [EHR] imaging order requisition, compared to EHR provider notes), and assess potential impact of indication inaccuracies on exam planning and interpretation.

Methods

This retrospective study, approved by the Institutional Review Board, was conducted at a tertiary academic medical center. There were 139 MRI lumbar spine (LS-MRI) and 176 CT abdomen/pelvis orders performed 4/1/2016-5/31/2016 randomly selected and reviewed by 4 radiologists for concordance and completeness of relevant exam indications in order requisitions compared to provider notes, and potential impact of indication inaccuracies on exam planning and interpretation. Forty each LS-MRI and CT abdomen/pelvis were re-reviewed to assess kappa agreement.

Results

Requisition indications were more likely to be incomplete (256/315, 81%) than discordant (133/315, 42%) compared to provider notes (p < 0.0001). Potential impact of discrepancy between clinical information in requisitions and provider notes was higher for radiologist's interpretation than for exam planning (135/315, 43%, vs 25/315, 8%, p < 0.0001). Agreement among radiologists for concordance, completeness, and potential impact was moderate to strong (Kappa 0.66-0.89). Indications in EHR order requisitions are frequently incomplete or discordant compared to physician notes, potentially impacting imaging exam planning, interpretation and accurate diagnosis. Such inaccuracies could also diminish the relevance of clinical decision support alerts if based on information in order requisitions.

Conclusions

Improved availability of relevant documented clinical information within EHR imaging requisition is necessary for optimal exam planning and interpretation.

Collapse

Haleem A, Javaid M, Khan IH. Current status and applications of Artificial Intelligence (AI) in medical field: An overview. ACTA ACUST UNITED AC 2019. [DOI: 10.1016/j.cmrp.2019.11.005] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]

Savova GK, Danciu I, Alamudun F, Miller T, Lin C, Bitterman DS, Tourassi G, Warner JL. Use of Natural Language Processing to Extract Clinical Cancer Phenotypes from Electronic Medical Records. Cancer Res 2019;79:5463-5470. [PMID: 31395609 PMCID: PMC7227798 DOI: 10.1158/0008-5472.can-19-0579] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2019] [Revised: 06/17/2019] [Accepted: 07/29/2019] [Indexed: 12/12/2022]

Hong N, Wen A, Stone DJ, Tsuji S, Kingsbury PR, Rasmussen LV, Pacheco JA, Adekkanattu P, Wang F, Luo Y, Pathak J, Liu H, Jiang G. Developing a FHIR-based EHR phenotyping framework: A case study for identification of patients with obesity and multiple comorbidities from discharge summaries. J Biomed Inform 2019;99:103310. [PMID: 31622801 PMCID: PMC6990976 DOI: 10.1016/j.jbi.2019.103310] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/01/2019] [Revised: 09/15/2019] [Accepted: 10/11/2019] [Indexed: 12/16/2022]

Abstract

BACKGROUND

Standards-based clinical data normalization has become a key component of effective data integration and accurate phenotyping for secondary use of electronic healthcare records (EHR) data. HL7 Fast Healthcare Interoperability Resources (FHIR) is an emerging clinical data standard for exchanging electronic healthcare data and has been used in modeling and integrating both structured and unstructured EHR data for a variety of clinical research applications. The overall objective of this study is to develop and evaluate a FHIR-based EHR phenotyping framework for identification of patients with obesity and its multiple comorbidities from semi-structured discharge summaries leveraging a FHIR-based clinical data normalization pipeline (known as NLP2FHIR).

METHODS

We implemented a multi-class and multi-label classification system based on the i2b2 Obesity Challenge task to evaluate the FHIR-based EHR phenotyping framework. Two core parts of the framework are: (a) the conversion of discharge summaries into corresponding FHIR resources - Composition, Condition, MedicationStatement, Procedure and FamilyMemberHistory using the NLP2FHIR pipeline, and (b) the implementation of four machine learning algorithms (logistic regression, support vector machine, decision tree, and random forest) to train classifiers to predict disease state of obesity and 15 comorbidities using features extracted from standard FHIR resources and terminology expansions. We used the macro- and micro-averaged precision (P), recall (R), and F1 score (F1) measures to evaluate the classifier performance. We validated the framework using a second obesity dataset extracted from the MIMIC-III database.

RESULTS

Using the NLP2FHIR pipeline, 1237 clinical discharge summaries from the 2008 i2b2 obesity challenge dataset were represented as the instances of the FHIR Composition resource consisting of 5677 records with 16 unique section types. After the NLP processing and FHIR modeling, a set of 244,438 FHIR clinical resource instances were generated. As the results of the four machine learning classifiers, the random forest algorithm performed the best with F1-micro(0.9466)/F1-macro(0.7887) and F1-micro(0.9536)/F1-macro(0.6524) for intuitive classification (reflecting medical professionals' judgments) and textual classification (reflecting the judgments based on explicitly reported information of diseases), respectively. The MIMIC-III obesity dataset was successfully integrated for prediction with minimal configuration of the NLP2FHIR pipeline and machine learning models.

CONCLUSIONS

The study demonstrated that the FHIR-based EHR phenotyping approach could effectively identify the state of obesity and multiple comorbidities using semi-structured discharge summaries. Our FHIR-based phenotyping approach is a first concrete step towards improving the data aspect of phenotyping portability across EHR systems and enhancing interpretability of the machine learning-based phenotyping algorithms.

Collapse

[Basis and perspectives of artificial intelligence in radiation therapy]. Cancer Radiother 2019;23:913-916. [PMID: 31645301 DOI: 10.1016/j.canrad.2019.08.005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/06/2019] [Revised: 08/15/2019] [Accepted: 08/20/2019] [Indexed: 11/23/2022]

Hong N, Wen A, Shen F, Sohn S, Wang C, Liu H, Jiang G. Developing a scalable FHIR-based clinical data normalization pipeline for standardizing and integrating unstructured and structured electronic health record data. JAMIA Open 2019;2:570-579. [PMID: 32025655 PMCID: PMC6993992 DOI: 10.1093/jamiaopen/ooz056] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/25/2019] [Revised: 09/23/2019] [Accepted: 10/01/2019] [Indexed: 11/30/2022] Open

Abstract

Objective

To design, develop, and evaluate a scalable clinical data normalization pipeline for standardizing unstructured electronic health record (EHR) data leveraging the HL7 Fast Healthcare Interoperability Resources (FHIR) specification.

Methods

We established an FHIR-based clinical data normalization pipeline known as NLP2FHIR that mainly comprises: (1) a module for a core natural language processing (NLP) engine with an FHIR-based type system; (2) a module for integrating structured data; and (3) a module for content normalization. We evaluated the FHIR modeling capability focusing on core clinical resources such as Condition, Procedure, MedicationStatement (including Medication), and FamilyMemberHistory using Mayo Clinic’s unstructured EHR data. We constructed a gold standard reusing annotation corpora from previous NLP projects.

Results

A total of 30 mapping rules, 62 normalization rules, and 11 NLP-specific FHIR extensions were created and implemented in the NLP2FHIR pipeline. The elements that need to integrate structured data from each clinical resource were identified. The performance of unstructured data modeling achieved F scores ranging from 0.69 to 0.99 for various FHIR element representations (0.69–0.99 for Condition; 0.75–0.84 for Procedure; 0.71–0.99 for MedicationStatement; and 0.75–0.95 for FamilyMemberHistory).

Conclusion

We demonstrated that the NLP2FHIR pipeline is feasible for modeling unstructured EHR data and integrating structured elements into the model. The outcomes of this work provide standards-based tools of clinical data normalization that is indispensable for enabling portable EHR-driven phenotyping and large-scale data analytics, as well as useful insights for future developments of the FHIR specifications with regard to handling unstructured clinical data.

Collapse

A frame semantic overview of NLP-based information extraction for cancer-related EHR notes. J Biomed Inform 2019;100:103301. [PMID: 31589927 DOI: 10.1016/j.jbi.2019.103301] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2019] [Revised: 09/04/2019] [Accepted: 10/03/2019] [Indexed: 02/07/2023]

Abstract

OBJECTIVE

There is a lot of information about cancer in Electronic Health Record (EHR) notes that can be useful for biomedical research provided natural language processing (NLP) methods are available to extract and structure this information. In this paper, we present a scoping review of existing clinical NLP literature for cancer.

METHODS

We identified studies describing an NLP method to extract specific cancer-related information from EHR sources from PubMed, Google Scholar, ACL Anthology, and existing reviews. Two exclusion criteria were used in this study. We excluded articles where the extraction techniques used were too broad to be represented as frames (e.g., document classification) and also where very low-level extraction methods were used (e.g. simply identifying clinical concepts). 78 articles were included in the final review. We organized this information according to frame semantic principles to help identify common areas of overlap and potential gaps.

RESULTS

Frames were created from the reviewed articles pertaining to cancer information such as cancer diagnosis, tumor description, cancer procedure, breast cancer diagnosis, prostate cancer diagnosis and pain in prostate cancer patients. These frames included both a definition as well as specific frame elements (i.e. extractable attributes). We found that cancer diagnosis was the most common frame among the reviewed papers (36 out of 78), with recent work focusing on extracting information related to treatment and breast cancer diagnosis.

CONCLUSION

The list of common frames described in this paper identifies important cancer-related information extracted by existing NLP techniques and serves as a useful resource for future researchers requiring cancer information extracted from EHR notes. We also argue, due to the heavy duplication of cancer NLP systems, that a general purpose resource of annotated cancer frames and corresponding NLP tools would be valuable.

Collapse

Soysal E, Warner JL, Wang J, Jiang M, Harvey K, Jain SK, Dong X, Song HY, Siddhanamatha H, Wang L, Dai Q, Chen Q, Du X, Tao C, Yang P, Denny JC, Liu H, Xu H. Developing Customizable Cancer Information Extraction Modules for Pathology Reports Using CLAMP. Stud Health Technol Inform 2019;264:1041-1045. [PMID: 31438083 PMCID: PMC7359882 DOI: 10.3233/shti190383] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/17/2022]

Affiliation(s)

Ergin Soysal School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas
Jeremy L. Warner Department of Medicine, Vanderbilt University, Nashville, Tennessee Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee Vanderbilt-Ingram Cancer Center, Vanderbilt University Medical Center, Nashville, Tennessee
Jingqi Wang School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas
Min Jiang School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas
Krysten Harvey Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee
Sandeep Kumar Jain Vanderbilt School of Medicine, Vanderbilt University, Nashville, Tennessee
Xiao Dong School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas
Hsing-Yi Song School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas
Harish Siddhanamatha School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas
Liwei Wang Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, Minnesota
Qi Dai Department of Medicine, Vanderbilt University, Nashville, Tennessee
Qingxia Chen Department of Biostatistics, Vanderbilt University Medical Center, Nashville, Tennessee
Xianglin Du School of Public Health, The University of Texas Health Science Center at Houston, Houston, Texas
Cui Tao School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas
Ping Yang Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, Minnesota
Joshua Charles Denny Department of Medicine, Vanderbilt University, Nashville, Tennessee Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee
Hongfang Liu Department of Health Sciences Research, Mayo Clinic College of Medicine, Rochester, Minnesota
Hua Xu School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas

Collapse

Warner JL, Dymshyts D, Reich CG, Gurley MJ, Hochheiser H, Moldwin ZH, Belenkaya R, Williams AE, Yang PC. HemOnc: A new standard vocabulary for chemotherapy regimen representation in the OMOP common data model. J Biomed Inform 2019;96:103239. [PMID: 31238109 DOI: 10.1016/j.jbi.2019.103239] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/29/2019] [Revised: 06/20/2019] [Accepted: 06/21/2019] [Indexed: 10/26/2022]

Meystre SM, Heider PM, Kim Y, Aruch DB, Britten CD. Automatic trial eligibility surveillance based on unstructured clinical data. Int J Med Inform 2019;129:13-19. [PMID: 31445247 DOI: 10.1016/j.ijmedinf.2019.05.018] [Citation(s) in RCA: 27] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/15/2018] [Revised: 12/06/2018] [Accepted: 05/21/2019] [Indexed: 12/26/2022]

Abstract

INTRODUCTION

Insufficient patient enrollment in clinical trials remains a serious and costly problem and is often considered the most critical issue to solve for the clinical trials community. In this project, we assessed the feasibility of automatically detecting a patient's eligibility for a sample of breast cancer clinical trials by mapping coded clinical trial eligibility criteria to the corresponding clinical information automatically extracted from text in the EHR.

METHODS

Three open breast cancer clinical trials were selected by oncologists. Their eligibility criteria were manually abstracted from trial descriptions using the OHDSI ATLAS web application. Patients enrolled or screened for these trials were selected as 'positive' or 'possible' cases. Other patients diagnosed with breast cancer were selected as 'negative' cases. A selection of the clinical data and all clinical notes of these 229 selected patients was extracted from the MUSC clinical data warehouse and stored in a database implementing the OMOP common data model. Eligibility criteria were extracted from clinical notes using either manually crafted pattern matching (regular expressions) or a new natural language processing (NLP) application. These extracted criteria were then compared with reference criteria from trial descriptions. This comparison was realized with three different versions of a new application: rule-based, cosine similarity-based, and machine learning-based.

RESULTS

For eligibility criteria extraction from clinical notes, the machine learning-based NLP application allowed for the highest accuracy with a micro-averaged recall of 90.9% and precision of 89.7%. For trial eligibility determination, the highest accuracy was reached by the machine learning-based approach with a per-trial AUC between 75.5% and 89.8%.

CONCLUSION

NLP can be used to extract eligibility criteria from EHR clinical notes and automatically discover patients possibly eligible for a clinical trial with good accuracy, which could be leveraged to reduce the workload of humans screening patients for trials.

Collapse

Si Y, Roberts K. A Frame-Based NLP System for Cancer-Related Information Extraction. AMIA ... ANNUAL SYMPOSIUM PROCEEDINGS. AMIA SYMPOSIUM 2018;2018:1524-1533. [PMID: 30815198 PMCID: PMC6371330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]

Ahlbrandt J, Lablans M, Glocker K, Stahl-Toyota S, Maier-Hein K, Maier-Hein L, Ückert F. Modern Information Technology for Cancer Research: What's in IT for Me? An Overview of Technologies and Approaches. Oncology 2018;98:363-369. [PMID: 30439700 DOI: 10.1159/000493638] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/22/2018] [Accepted: 09/10/2018] [Indexed: 11/19/2022]

Khor RC, Nguyen A, O'Dwyer J, Kothari G, Sia J, Chang D, Ng SP, Duchesne GM, Foroudi F. Extracting tumour prognostic factors from a diverse electronic record dataset in genito-urinary oncology. Int J Med Inform 2018;121:53-57. [PMID: 30545489 DOI: 10.1016/j.ijmedinf.2018.10.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/27/2017] [Revised: 09/17/2018] [Accepted: 10/21/2018] [Indexed: 11/27/2022]

Warner JL, Patt DA. Cancer Informatics in 2017: A New Beginning and a Bright Future. Yearb Med Inform 2018;27:223-226. [PMID: 30157527 PMCID: PMC6115215 DOI: 10.1055/s-0038-1667086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022] Open

Learning Eligibility in Cancer Clinical Trials Using Deep Neural Networks. APPLIED SCIENCES-BASEL 2018. [DOI: 10.3390/app8071206] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]